Do clients typically implement failover/load-balancing on multiple A records?
The short answer is that it varies.
When multiple address records are present in the answer set, the queried DNS server normally returns them in a randomized order. The operating system will typically present the returned record set to the application in the order in which it was received. That said, there are options on both sides of the transaction (the nameserver and the OS) that can result in different behavior, though they are usually not used. As an example, a little-known file called /etc/gai.conf controls this on glibc-based systems.
The Zytrax book (DNS for Rocket Scientists) has a good summary of the history of this topic, and concludes that RFC 6724 is the current standard that applications and resolver implementations should adhere to.
From here it's worth noting a choice quote from RFC 6724:
Well-behaved applications SHOULD NOT simply use the first address
returned from an API such as getaddrinfo() and then give up if it
fails. For many applications, it is appropriate to iterate through
the list of addresses returned from getaddrinfo() until a working
address is found. For other applications, it might be appropriate to
try multiple addresses in parallel (e.g., with some small delay in
between) and use the first one to succeed.
The standard encourages applications not to stop at the first address on failure, but this is neither a requirement nor the behavior that many casually written applications implement. You should never rely solely on multiple address records for high availability unless you are certain that most (or at least the most important) of your consuming applications will play nicely. Modern browsers tend to be good about this, but remember that they are not the only consumers you are dealing with.
(Also, as @kasperd notes below, it's important to distinguish between what this buys you for HA and what it buys you for load balancing.)
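To make the RFC 6724 recommendation concrete, here is a minimal Python sketch of a client that walks the list returned by getaddrinfo() instead of giving up after the first address. The function name connect_first_working and the 3-second default timeout are my own choices for illustration, not anything taken from the RFC or from a particular application.

```python
import socket


def connect_first_working(host, port, timeout=3.0):
    """Try each address from getaddrinfo() in order; return the first socket that connects.

    Hypothetical helper for illustration only. The timeout value is an assumption.
    """
    last_error = None
    # The addresses arrive in whatever order the resolver/OS decided on.
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # first address that works wins
        except OSError as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    if last_error is None:
        raise OSError(f"no addresses found for {host!r}")
    raise last_error
```

Python's own socket.create_connection() performs a similar loop internally, so a client built on it already gets this sequential failover. A client that only ever tries the first entry of the list gets no failover from multiple A records at all.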
My guess is that the DNS TTL for the record is set really low, so curl has to resolve the name again every time and gets another IP from the DNS server.
Neither curl nor the kernel is at all aware that this DNS-level load balancing is happening, and you can't reasonably expect them to be.