HA Proxy - roundrobin vs leastconn
Solution 1:
I haven't experimented with leastconn, but my understanding is that the typical use case for leastconn is when you are load balancing something that can have long-lived connections. The reason for this is that leastconn focuses on ensuring balanced concurrency, whereas roundrobin provides a more balanced arrival rate. If this distinction isn't clear, see my answer on the difference.
When you say the load is not evenly distributed, it might help to define "load" a little better. If you mean server resources, then I suggest identifying what exactly is causing the increased load (i.e. certain types of connections) and working backwards from there.
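For context, the algorithm is chosen per backend with the balance directive. A minimal sketch, with hypothetical server names and documentation-range addresses:

    backend app_servers
        # roundrobin balances the arrival rate of new connections;
        # change this to "balance leastconn" to balance concurrency instead
        balance roundrobin
        server app1 192.0.2.11:8080 check
        server app2 192.0.2.12:8080 check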
Solution 2:
It depends on the protocol and the use case being balanced. For anything where the number of connections correlates with load/usage, it's better to use leastconn. Because of the way networks and applications work, that is almost always the case, so you're better off using leastconn by default.
RDP / X11 remote desktops / Jump Hosts
For example, a company has a pool of remote desktops that employees connect to. You would like employees to be distributed somewhat evenly across desktops.
The number of active connections in that use case is roughly "how many employees are using that desktop right now". The host with the least connections has the fewest employees using it and is probably the least loaded. Use leastconn in these circumstances; it spreads the load evenly with the number of users.
An ideal load balancer would be aware of the remote desktop load. How many users? How many applications? How much memory and CPU are consumed? There are commercial solutions dedicated to remote desktops (Microsoft/Citrix/etc.); they typically measure these metrics to spread usage very well. HAProxy is a simple network load balancer and can't do better than counting connections with leastconn.
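To make that concrete, here is a minimal sketch of such a pool in HAProxy (hostnames and addresses are invented for the example, and session persistence is left out):

    listen rdp_pool
        bind :3389
        mode tcp
        # send each new session to the desktop with the fewest active connections,
        # i.e. roughly the fewest logged-in employees
        balance leastconn
        server desktop1 192.0.2.21:3389 check
        server desktop2 192.0.2.22:3389 check
        server desktop3 192.0.2.23:3389 check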
HTTP / HTTPS
With HTTP, an active connection means that the server is busy processing a request. Connections are directly proportional to the load. You want to select the server with the fewest active connections (requests in progress). Use leastconn for HTTP(S) traffic.
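A minimal HTTP example along those lines (names and addresses are assumptions for illustration):

    frontend www
        bind :80
        mode http
        default_backend web_servers

    backend web_servers
        mode http
        # an active connection is a request in progress, so pick the least busy server
        balance leastconn
        server web1 192.0.2.31:8080 check
        server web2 192.0.2.32:8080 check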
Imagine a scenario with two HTTP servers, where one server is slower to process requests (maybe it's overloaded, maybe it has older hardware).
roundrobin will distribute requests 50/50 between the two servers. That's very inefficient; the faster server should take more. Worse yet, the slower server could be overloaded: it will get even slower as more requests come in and could start dropping requests at any time. You don't want that.
leastconn would detect that the servers are uneven. The slower server holds connections for longer, so it has a higher connection count. leastconn accounts for that and prefers the other server.
In my experience, including roles where I was exclusively doing performance testing for moderate to large websites, leastconn can be 300% as efficient as roundrobin for HTTP(S). roundrobin doesn't distribute connections fairly and will cause instability under high load.
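To illustrate the difference in configuration terms (names, addresses and weights are made up): with roundrobin, the only way to favour the faster server is to guess static weights, while leastconn adapts on its own because the slow server accumulates connections and gets picked less often.

    backend uneven_pool
        mode http
        # static compensation with roundrobin would look like:
        #   balance roundrobin
        #   server fast 192.0.2.41:8080 check weight 200
        #   server slow 192.0.2.42:8080 check weight 100
        # leastconn needs no such tuning:
        balance leastconn
        server fast 192.0.2.41:8080 check
        server slow 192.0.2.42:8080 check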
DNS Request
(Let's ignore that HAProxy doesn't support UDP and that UDP is connectionless.)
One last example. DNS is a simple protocol. The client sends a single UDP message to request a domain and the DNS server replies in a single message.
In this case, there isn't really a connection. Even if there were, it would be instantly closed (theoretically).
It wouldn't make sense to count connections in these circumstances; it's not a good fit for leastconn. A simple roundrobin can distribute the messages.
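If you did balance DNS with HAProxy, it would have to be DNS over TCP; a purely hypothetical sketch (addresses invented):

    listen dns_tcp
        bind :53
        mode tcp
        # requests are one-shot, so counting connections adds nothing; roundrobin is enough
        balance roundrobin
        server ns1 192.0.2.51:53 check
        server ns2 192.0.2.52:53 check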
A Common Misunderstanding
People sometimes believe that they should not use leastconn for short-lived connections (similar to the last example). Even the HAProxy documentation is misleading about that.
leastconn
Use of this algorithm is recommended where very long sessions are
expected, such as LDAP, SQL, TSE, etc... but is not very well
suited for protocols using short sessions such as HTTP.
[misleading advice, should ignore it]
In the real world, "short connections" are not a thing.
Applications are built on top of TCP. Messages are delivered and often processed in order. When a server is slow or overloaded, "short" connections become longer. If there are (more) connections, there is probably some (more) work being done. Connection count and connection duration vary and have meaning.
Think of a basic HTTP server. Some assets take a few milliseconds, some API calls take a few seconds, a page could take any amount of time to load with any number of requests within it, etc. Requests are not uniformly short-lived; their lifetime follows what's being processed and on which server. leastconn understands the ongoing activity and adjusts the distribution, which is exactly what you want from a load balancer.