20121102

Led Astray

It's frustrating, and it's my own damn fault.

I read in the HP v1910 switch documentation that an 802.3ad bond would utilize all connections for the transmission of data.  Even with static aggregation, I thought I'd get something different from what, in fact, I received.  To quote their introduction on the concept of link aggregation:

"Link aggregation delivers the following benefits: * Increases bandwidth beyond the limits of any single link.  In an aggregate link, traffic is distributed across the member ports."
I'll spare you the rest.  It's my own damn fault because I took that little piece of marketing with an assumption: that "traffic" meant my TCP packets, regardless of their source or destination.  I know better now, and I do bow and scrape to the Prophets of Linux Bonding, the deities that espouse Whole Technical Truth.  I am not worthy!

Despite my best efforts, I cannot get more than 1 Gbit/s between two LACP-connected machines.  Running iperf -s on one and iperf -c on the other, the connection saturates as though a single channel were all that was available.  The only benefit, then, is that traffic to and from different machines gets distributed across these multiple connections.  To those reading this who knew better than I did: I am sorry.  I'm an idiot.  May this blog serve to save others from my fate.
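
What I eventually worked out (and what the bonding documentation spells out) is that both the switch and the Linux bonding driver assign each conversation to a single member link so that frames stay in order - no hash policy splits one stream across links.  Here's a rough reconstruction of the test, with 10.0.0.2 standing in for the real peer's address:

    # On the receiving machine:
    iperf -s

    # On the sending machine.  A single TCP stream hashes onto one member
    # link, so it tops out around one link's worth of bandwidth:
    iperf -c 10.0.0.2 -t 30

    # Even several parallel streams to the same peer tend to land on the
    # same member port when the distribution hashes on MAC addresses:
    iperf -c 10.0.0.2 -t 30 -P 4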

Static aggregation, as far as my HP switches are concerned, does nothing for mode-0 (balance-rr) connections.  I can get a little better throughput, but watching the up-and-down of the flow rates suggests there is much evil happening, and I don't like it.  Plus, as far as I know, I can't really distribute a static aggregation across my switches - maybe the HP switch stacking feature would help with this, but I sense much evil there too and don't want to go near it.
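
For the record, the mode-0 setup itself is nothing exotic.  This is roughly how I brought it up for testing, following the sysfs examples in the bonding docs - eth1, eth2 and the 10.0.0.1 address are placeholders for whatever your hardware and subnet actually look like:

    # load the driver; bond0 appears by default
    modprobe bonding
    # the mode can only be set while the bond is down and has no slaves
    echo balance-rr > /sys/class/net/bond0/bonding/mode
    echo 100 > /sys/class/net/bond0/bonding/miimon
    ip addr add 10.0.0.1/24 dev bond0
    ip link set bond0 up
    # slaves have to be down before they can be enslaved
    ip link set eth1 down; echo +eth1 > /sys/class/net/bond0/bonding/slaves
    ip link set eth2 down; echo +eth2 > /sys/class/net/bond0/bonding/slaves
    # sanity-check the mode and slave status
    cat /proc/net/bonding/bond0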

The only benefit I can derive from RR comes from placing each connection into its own separate VLAN.  That, of course, kills any notion of redundancy and shared connectivity.  First, it's like having multiple switches, but if a single connection from a single machine goes down, then that whole machine is unable to communicate with the other machines across those virtual switches.  So, bollocks to that.

Second, it's damn hard to figure out a good, robust and non-impossible way to configure these VLANs to also communicate with the rest of the world.  I guess it all boils down to my desire to use the maximum possible throughput to and from any given machine, without having to jump through hoops like creating gateway hosts just to aggregate all these connections into something recognizable by other networking hardware.  I am also not willing to sacrifice ports to an active-passive role, even though that would allow me at least one switch or link failure before catastrophic consequences took hold.

It's my own damn fault because I didn't take the time to read the bonding driver kernel documentation that the Good Lords of Kernel Development took the time to write.  I didn't, at least, until last night.  I pored over it, reading the telling tales of switches and support and the best way to get at certain kinds of redundancy or throughput.

802.3ad (mode-4) obviously doesn't do much for me either.  After reading the docs, I know this.  It does make aggregation on a single switch rather easy, but no more or less easy than mode-6 bonding.  Well, I take that back.  It IS less easy, because the switch needs its ports configured.  It also doesn't support my need for multi-switch redundancy, so 802.3ad is out, too.
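
For what it's worth, the 802.3ad attempt was only a couple of knobs away from the sketch above (plus a matching LACP aggregation group on the switch side), and even its transmit hash policy only decides which member link a given conversation lands on:

    # set before enslaving, with the switch ports in an LACP group
    echo 802.3ad > /sys/class/net/bond0/bonding/mode
    # hashing on IPs and ports spreads *different* flows more evenly,
    # but a single stream still rides exactly one member link
    echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy

(layer3+4 is the kernel's name for the IP-and-port hash; the default, layer2, hashes on MAC addresses only.)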

In short, if you're thinking of bonding two bonds together, don't.  It's just not worth it.  The trouble, the init scripts, the switch configuration - none of it will do you any good.  You'll still be stuck with 1 Gbit/s per machine connection.  Even worse, you might not even get your links back quickly enough if someone trips over the power strip running your two highly-available switches.

I considered the VLAN solution, minus its connection to the world, thereby encapsulating my SAN-to-Hypervisor subnet in its own universe of ultra-high throughput.  3 Gbit/s seemed a nice thing.  I managed to get close to that throughput, but sadly, given that single-link failures would be catastrophic, I can't afford to take that risk.  Redundancy is too important.  I will relegate myself to mode-6 (balance-alb), as it appears to be the most flexible, the most robust and the most reliable with regard to even link distribution.
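
So the plan is the same bring-up as the mode-0 sketch earlier, just with balance-alb as the mode.  The bonding docs note that balance-alb needs slave drivers that can change their MAC address while the device is open, so that's worth checking against your NICs first:

    # set while bond0 is still down and slave-free, then enslave as before
    echo balance-alb > /sys/class/net/bond0/bonding/mode
    # the Bonding Mode line should now report adaptive load balancing
    cat /proc/net/bonding/bond0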

I hope the price of 10GigE drops sooner rather than later...
