A DNS record that will frequently change?
Solution 1:
This is called fast flux DNS, and it's usually how malware authors hide their infrastructure servers.
While this will work for your plan, it's not the best plan. You will likely need to have one or more spare servers online and doing nothing almost all the time. Only when you have an issue with the main server would you switch to the next one.
Even if you have a TTL of 1 minute, one record will most likely be valid for more than that:
Browser caches
Browsers usually cache the DNS records for a variable amount of time. Firefox uses 60 seconds, Chrome uses 60 seconds too, IE 3.x and earlier cached for 24 hours, IE 4.x and above cached for 30 minutes.
OS cache
Windows will not usually honour the TTL. A TTL for DNS is not like the TTL for an IPv4 packet. It's more an indication of freshness than a mandatory refresh. Linux can have nscd configured to cache for whatever amount of time the user wants, disregarding the DNS TTL. It can cache entries for a week, for example.
ISP cache
ISPs can (and some will) use aggressive caching to decrease traffic. They can not only change the TTL, but also cache the records and return them to clients without even asking upstream DNS servers. This is more prevalent on mobile ISPs, as they raise the TTL so mobile clients don't complain about latency.
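To see how the three cache layers above can stack up, here is a rough sketch. It assumes a simplified model in which every layer ignores the record's TTL, holds an answer for its own fixed time, and refills from the layer above just before that layer expires. The hold times are made-up illustrative numbers, not measurements.

```python
def worst_case_staleness(cache_hold_seconds):
    """Upper bound on how long an old record can survive a chain of caches.

    Simplified model (an assumption): each layer ignores the DNS TTL, keeps
    an answer for its own fixed hold time, and refills from the layer above
    at the last moment before that layer expires, so the hold times add up.
    """
    return sum(cache_hold_seconds.values())


# Hypothetical hold times in seconds; real values vary by client and ISP.
layers = {"browser": 60, "os": 300, "isp": 900}
total = worst_case_staleness(layers)
print(f"With a 60 s TTL, the old address could still be served for up to {total} s")
```

The point is only that the record's own TTL does not appear anywhere in the calculation once intermediate caches stop honouring it.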
A load balancer is made to do exactly what you want. With a load balancer in place, you can have 2 or 4 or 10 servers all online at the same time, dividing the load. If one of them goes offline, the service will not be affected. Changing DNS records will have a downtime between the time when the server goes off and the DNS is changed. It will take more than one minute, because you have to detect the downtime, change the records, and wait for them to propagate.
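As a toy illustration of why a failed server doesn't interrupt service behind a balancer, here is a minimal round-robin selector that skips back ends marked as down. The class and method names are hypothetical, and a real balancer (HAProxy, nginx, ELB and so on) would run its own health checks rather than rely on something calling `mark_down` by hand.

```python
import itertools


class RoundRobinBalancer:
    """Toy round-robin selection that skips back ends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # In a real balancer a failed health check would trigger this.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Look at each back end at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy back ends available")


lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
lb.mark_down("10.0.0.2")
print([lb.pick() for _ in range(4)])
```

Requests keep flowing to the remaining servers the moment one is marked down; no DNS change and no client-side cache is involved.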
So use a load balancer. It's made to do what you want, and you know exactly what to expect. A fast flux DNS setup will have mixed and inconsistent results.
Solution 2:
DNS and how it works is perhaps accompanied by more misunderstanding, legend, superstition, and mythology than any other aspect of IT.
Even those of us who know that we are essentially lying (or at least drastically oversimplifying) when we talk about "propagation" of changes still tend to use the term to describe something that is -- simultaneously -- extremely simple and straightforward... yet difficult to explain... and has nothing to do with propagation per se, but everything to do with caching and negative caching, both of which are an essential component of how the system works (and, arguably, how it avoids outright collapse under its own weight) -- essentially the inside-out, opposite of actual "propagation," pull -- not push.
For all the worrying and hand-wringing about short TTLs, they tend to work more often than not, to the point that it may be in your interests to simply try them. At ${day_job}, when our sites migrate from an "old" platform to a "new" platform, it often means they are migrating in such a way that nothing in the infrastructure is shared. My first step in such a migration is dropping the TTL to 60s far enough in advance of the cut so that the old TTL has multiple multiples of itself to run out, giving me a reasonable assurance that these transitional RRs with short TTLs will "propagate out." When I am ready for the cut, I reconfigure the old balancer¹ to hairpin traffic to the new system -- across the Internet -- such that the balancer is no longer balancing multiple internal systems, but instead, is "balancing" all the requests to a single external system -- the balancer of the new platform.²
Then I cut over the DNS, and watch the new balancer and the old.
I am always pleasantly surprised at how rapidly the transition occurs. The holdouts seem to almost always be search spiders and third party "health checking" sites that inexplicably latch on to the old records.
But there's one scenario that breaks down predictably: when a user's browser windows remain open, they tend to latch to the already-discovered address, and often it persists until all of their browser windows are closed.
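The "drop the TTL far enough in advance" step described above is just date arithmetic. A small sketch, where the function name and the default of three multiples are my own illustrative choices rather than a rule:

```python
from datetime import datetime, timedelta


def latest_ttl_drop(cutover, old_ttl_seconds, multiples=3):
    """Latest moment to publish the short TTL so that the old, long TTL
    has had `multiples` full lifetimes to expire before the cutover."""
    return cutover - timedelta(seconds=old_ttl_seconds * multiples)


# Hypothetical example: the record currently has a 24-hour TTL.
cutover = datetime(2024, 6, 1, 12, 0)
print(latest_ttl_drop(cutover, old_ttl_seconds=86400))
```

With a 24-hour TTL and three multiples, the short-TTL record has to be published at least three days before the cut.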
But in the above narrative, you find the solution to the problem: a "load balancer" -- specifically and more precisely, a reverse proxy -- can be the system that your exposed DNS record points to.
The reverse proxy then forwards the request to the correct target IP address, which it resolves using a second "dummy" hostname with a short TTL, which points to the real back-end server.³ Because the proxy always honors the DNS TTL on that dummy DNS entry, you are assured of a rapid and complete switchover.
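What "honors the DNS TTL" means for the proxy can be sketched roughly as follows. The `lookup` callable is a hypothetical stand-in for a real DNS query that returns an address together with its TTL (Python's standard `socket` functions don't expose the TTL), and the clock is injectable only so the behaviour is easy to exercise.

```python
import time


class TtlHonoringResolver:
    """Caches a back-end address for exactly the record's TTL, no longer."""

    def __init__(self, lookup, clock=time.monotonic):
        # lookup: hypothetical callable, hostname -> (ip, ttl_seconds)
        self._lookup = lookup
        self._clock = clock
        self._cache = {}  # hostname -> (ip, expires_at)

    def resolve(self, hostname):
        now = self._clock()
        cached = self._cache.get(hostname)
        if cached is not None and now < cached[1]:
            return cached[0]
        # Entry missing or expired: ask again and remember the new expiry.
        ip, ttl = self._lookup(hostname)
        self._cache[hostname] = (ip, now + ttl)
        return ip


# Simulated DNS data and clock, purely for illustration.
records = {"backend.example.com": ("192.0.2.10", 60)}
fake_now = [0.0]
resolver = TtlHonoringResolver(lambda h: records[h], clock=lambda: fake_now[0])

print(resolver.resolve("backend.example.com"))   # first lookup
records["backend.example.com"] = ("192.0.2.20", 60)
fake_now[0] = 61.0                                # TTL has run out
print(resolver.resolve("backend.example.com"))   # picks up the new address
```

Because only the proxy does this lookup, you control exactly one cache, and it is one that behaves.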
The down-side is that you may be routing the traffic through unnecessary extra infrastructure, or paying more for transport across multiple network boundaries, redundantly.
There are services that provide this kind of capability on a global scale, and the one with which I am most familiar is CloudFront. (Most likely, Cloudflare would serve exactly the same purpose, as the small amount of testing I have done indicates that it also behaves correctly, and I'm sure there are others.)
Although primarily marketed as a CDN, CloudFront is at its core a global network of reverse proxies with the capability of optionally caching the responses. If www.example.com points to CloudFront and CloudFront is configured to forward these requests to backend.example.com, and the DNS record for backend.example.com uses a short TTL, then CloudFront will do the right thing, because it does honor that short TTL. When the back-end record changes, the traffic will all migrate by the time the TTL runs down.
The TTL on the front-side record pointing to CloudFront, and whether browsers and caching resolvers are honoring it, is unimportant, because changes to the back-end destination do not require changes on the www.example.com record... so the notion that "The Internet" has, with regard to the correct target for www.example.com, is consistent, regardless of where the back-end system happens to be.
This, to me, solves the problem completely by relieving the browser of any need to "follow" changes to the origin server's IP.
tl;dr: route the requests to a system that serves as a proxy for the real web server, so that only the proxy configuration needs to accommodate the change in origin server IP -- not the browser-facing DNS.
Note that CloudFront also minimizes latency by some DNS magic it imposes on the front side, which results in www.example.com resolving to the most optimal CloudFront edge location based on the location of the browser that's querying www.example.com, so there is minimal chance of traffic taking an unnecessarily circuitous route from browser to edge to origin... but this part is transparent and automatic and outside the scope of the question.
And, of course, content caching may also be of value by reducing load on the origin server or transport -- I have configured web sites on CloudFront where the origin server was on an ADSL circuit, and ADSL is inherently constrained for upstream bandwidth. The origin server where CloudFront connects in order to fetch the content does not need to be a server inside the AWS ecosystem.
¹ I speak of balancer as a single entity when in fact it has multiple nodes. When the balancer is an ELB, a machine behind the balancer acts as a dummy app server and does the actual hairpinning to the new platform's balancer, since ELB can't do this on its own.
² The new balancer's only knowledge about the old one is that it needs to trust the old balancer's X-Forwarded-For and that it should not do any IP-based rate limiting on the old balancer's source addresses.
³ When the proxy is one or more servers you control, you have the option of skipping using DNS on the back-side, and simply using IP addresses in the proxy config, but the hosted/distributed scenario discussed subsequently needs that second layer of DNS.
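The trust rule from footnote ² (believe X-Forwarded-For only when the request comes from the old balancer) can be sketched like this. The function name and the network ranges are hypothetical; real proxies usually have a built-in setting for this rather than hand-written code.

```python
import ipaddress


def client_ip(peer_ip, x_forwarded_for, trusted_networks):
    """Return the address to treat as the real client.

    X-Forwarded-For is only believed when the TCP peer is a trusted proxy
    (for example, the old balancer). The header is then read right to left,
    skipping trusted hops, so entries a client forged on the left are ignored.
    Assumes every hop in the header is a syntactically valid IP address.
    """
    trusted = [ipaddress.ip_network(net) for net in trusted_networks]

    def is_trusted(ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in trusted)

    if not is_trusted(peer_ip):
        return peer_ip

    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return peer_ip


# Hypothetical: the old balancer's nodes live in 203.0.113.0/24.
old_balancer = ["203.0.113.0/24"]
print(client_ip("203.0.113.5", "198.51.100.7", old_balancer))
```

The address returned here is also the one to use for any IP-based rate limiting, so that the old balancer's own source addresses are never throttled.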
Solution 3:
When I change DNS records, the old IP address will often still be used for months. Having said that, a TTL of only a few seconds is what Amazon uses to create fallback service, for example.
Instead of changing DNS, you can put a proxy server / load balancer in front of it.