Unbalanced - Why your load balancer hates you
Your centralized load balancer secretly hates you. You just haven't realized it yet.
(Another in my 947-part series on why anything centralized inherently sucks. I barely even like central air.)
Let's rip off the band-aid: it might work well enough for brochureware, but for web-based applications, centralized load balancing is just a dumb idea. It started out as a hack to make a bunch of servers look like one server; a workaround for the fact that HTTP and the web weren't designed for modern applications. It's an unnecessary middleman and a bottleneck. It forces you toward certain strictures that are not suitable for the modern web. It doesn't like you, and it doesn't like your friends.
HAProxy load balancing sucks
- Because it ties you to one IP address / data center. (Do you really want to deal with setting up anycast?)
- Because monitoring bogged/failing hosts has an unacceptable delay. Your users will get a hung/refused connection for at least several seconds before the monitoring catches it. Your client has fresher health data than your load balancer.
- Because it's a single point of failure. I've had more load balancers fail than I have fingers. That's at least eleven times too many.
DNS round robin sucks (ok, actually it's HTTP/1.1 that sucks)
- Because some browsers will only try one IP address from the group.
- Because some browsers will try another IP, but only after 30 seconds of your users sitting there wondering why they hired you. Browser authors are squeamish about DoS'ing your servers with concurrent connection attempts.
Active DNS sucks even harder
- Because the provider's monitoring assumes your clients have a route to the server just because the monitor does (and that they don't just because the monitor doesn't)
- Because some ISPs ignore DNS TTL values
- Because 30 seconds is way too long to wait for a good connection
- Because DYN is a horrible company
If you're schlepping adult toys or hotel reservations, then sure, you could get away with a regular, old-fashioned centralized load balancer. BUT if you're doing something serious, like gaming or automation, or you otherwise give a carp about your customers, you might start to realize that 30 seconds is wayyy too long to wait when you're having an issue with server X.
The solution is simple: Push the load balancer to the client
The fix is stunningly simple, in principle at least. Load the basic app via your CDN or whatever, and include a list of servers. Use WebSockets (or Ajax, if you must) and connect to at least two of them simultaneously. You can make the client as gentle or as aggressive as you like. Want to take a server offline? Go for it. Design your client to simply try another one. If raw performance is what you want, connect to multiple servers and measure the RTT to each. Server A responds within 30ms and server B takes 300ms, so route requests to server A and use B as a standby, or maybe give server C a poke.
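To make the "route to A, keep B as a standby" idea concrete, here's a minimal sketch of just the decision step, assuming you already have one recent RTT sample per server. The Probe and Routing shapes, the chooseRoute name, and the example.com hostnames are all made up for illustration; how you actually collect the samples (ping frames, timed requests, whatever) is up to you.

```typescript
// Hypothetical shape: the latest RTT sample for each server in the list.
interface Probe {
  url: string;          // placeholder hostnames below, not real endpoints
  rttMs: number | null; // null means the probe failed or timed out
}

// Hypothetical result: where to send traffic right now.
interface Routing {
  primary: string | null; // fastest reachable server
  standby: string | null; // second fastest, kept connected as a fallback
  unreachable: string[];  // servers to poke again later
}

function chooseRoute(probes: Probe[]): Routing {
  // Keep only servers that answered, fastest first.
  const alive = probes
    .filter((p): p is Probe & { rttMs: number } => p.rttMs !== null)
    .sort((a, b) => a.rttMs - b.rttMs);

  const unreachable = probes
    .filter((p) => p.rttMs === null)
    .map((p) => p.url);

  return {
    primary: alive[0]?.url ?? null,
    standby: alive[1]?.url ?? null,
    unreachable,
  };
}

// Server A at 30ms, server B at 300ms, server C not answering.
const route = chooseRoute([
  { url: "wss://b.example.com", rttMs: 300 },
  { url: "wss://a.example.com", rttMs: 30 },
  { url: "wss://c.example.com", rttMs: null },
]);
console.log(route);
```

Re-run the decision whenever a fresh sample arrives. A real client would probably smooth the samples (a moving average, say) so one slow packet doesn't make it flap between servers.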
For my use case, we use an exponential backoff. The app could have 10 or more connection attempts going at a time if necessary, and that's ok. If your servers are geographically distributed and you're contending with a backbone cut or some other routing problem, it'll automatically find a node it can reach. Try that with a centralized load balancer.
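For the curious, a backoff calculation can be as small as the sketch below. This is not our production code: the base delay, the cap, and the function name are assumptions picked for illustration, and the random jitter is an extra I've added here so a fleet of clients doesn't retry in lockstep.

```typescript
// Illustrative constants only; tune them for your own app.
const BASE_DELAY_MS = 250;
const MAX_DELAY_MS = 30_000;

// Delay before retry number `attempt` (0-based) against a given server.
// The ceiling doubles with each attempt and is capped at MAX_DELAY_MS.
// `random` is injectable so the jitter can be made deterministic in tests.
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  // Pick uniformly in [0, ceiling) so clients spread their retries out.
  return Math.floor(random() * ceiling);
}

// Ceilings for the first few attempts: 250, 500, 1000, 2000, ... up to 30000.
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(attempt, backoffDelayMs(attempt));
}
```

Keep one attempt counter per server and reset it once a connection succeeds. That way a dead node gets poked less and less often while the healthy ones carry the traffic.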
The moral of the story
Cut out the middleman. Load balancers in the middle are not your friend. They're a liability. Push the load balancing and failover to the client. Your users will be much happier, and you'll get a lot fewer calls at 4am.