Do servers hold one website only?
Basically: the browser includes the domain name in the HTTP request, so the webserver knows which domain was requested and can respond accordingly.
HTTP requests
Here's how your typical HTTP request happens:
- The user provides a URL, in the form http://host:port/path.
- The browser extracts the host (domain) part of the URL and translates it into an IP address if necessary, in a process known as name resolution. This translation can occur via DNS, but it does not have to (for example, the local hosts file on common OSes bypasses DNS).
- The browser opens a TCP connection to the specified port, or defaults to port 80, on that IP address.
- The browser sends an HTTP request. For HTTP/1.1, it looks like this:

    GET /path HTTP/1.1
    Host: example.com

(The Host header is standard and required in HTTP/1.1. It was not specified in the HTTP/1.0 spec, but some servers support it anyway.)
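To make the steps above concrete, here is a minimal Python sketch of what a client derives from a URL before it touches the network. The function name build_request and the added Connection header are my own choices for illustration, not part of any particular browser, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, raw_request_bytes) for a plain-HTTP GET.

    Hypothetical helper for illustration; real browsers send many more headers.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme != "http" or host is None:
        raise ValueError("expected an http:// URL with a host")
    port = parts.port or 80          # fall back to port 80 when the URL omits it
    path = parts.path or "/"         # an empty path is requested as "/"
    if parts.query:
        path += "?" + parts.query
    # The Host header repeats the name from the URL; the port is
    # included only when it is not the default.
    host_header = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, port, request.encode("ascii")


if __name__ == "__main__":
    host, port, raw = build_request("http://example.com/path")
    print(host, port)
    print(raw.decode("ascii"))
```

The host would then go through name resolution, and the bytes would be written to a TCP connection opened to the resulting IP address on that port.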
From here, the webserver has several pieces of information it can use to decide what the response should be. Note that it is possible for a single webserver to be bound to multiple IP addresses.
- The requested IP address, from the TCP socket
- The IP address of the client is also available, but this is rarely used - sometimes for blocking/filtering
- The requested port, from the TCP socket
- The requested hostname, as specified in the Host header by the browser in the HTTP request
- The requested path
- Any other headers (cookies, etc.)
As you seem to have noticed, the most common shared hosting setup these days puts multiple websites on a single IP address:port combination, leaving just the Host header to differentiate between websites.
This is known as a Name-based Virtual Host in Apache-land, while Nginx calls them Server Names in Server Blocks and IIS prefers Virtual Server.
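Conceptually, name-based virtual hosting boils down to a table lookup keyed on the Host header. The sketch below is a toy illustration of that idea, not how Apache, Nginx or IIS actually implement it; the SITES table, its paths and the function names are all made up.

```python
def normalize_host(host_header):
    """Lowercase the Host header value and drop an optional :port suffix.

    IPv6 literals are ignored here for brevity.
    """
    return host_header.strip().lower().split(":", 1)[0]


# Hypothetical table: every name here shares one IP address and port.
SITES = {
    "example.com": "/var/www/example",
    "www.example.com": "/var/www/example",
    "blog.example.org": "/var/www/blog",
}


def document_root(host_header):
    """Return the document root for the requested name, or None if unknown."""
    return SITES.get(normalize_host(host_header))


if __name__ == "__main__":
    print(document_root("Example.COM:80"))
    print(document_root("blog.example.org"))
```

Hostnames are case-insensitive, which is why the lookup normalizes the header before consulting the table.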
What about HTTPS?
HTTPS is a bit different. Everything is identical up to the establishment of the TCP connection, but after that an encrypted TLS tunnel must be established. The goal is to not leak any information about the request.
In order to verify that the server actually owns this domain, the server must send a certificate signed by a trusted third party. The browser will then compare this certificate with the domain it requested.
This presents a problem. How does the server know which website's certificate to send, if it has to send it before the HTTP request is received?
Traditionally, this was solved by having a dedicated IP address (or port) for every website requiring HTTPS. Obviously, this becomes problematic as we start running out of IPv4 addresses.
Enter SNI (Server Name Indication). The browser now passes the hostname during the TLS negotiations, so the server has this info early enough to send the correct certificate. On the server side, configuration is very similar to how HTTP virtual hosts are configured.
The downside is the hostname is now passed as plain text before encryption, and is essentially leaked information. This is usually considered an acceptable tradeoff, considering the hostname is normally exposed in a DNS query anyway.
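For the curious, here is roughly what server-side SNI selection can look like with Python's standard ssl module, which exposes an sni_callback hook on SSLContext. This is a sketch under assumptions: the helper names, the hostname-to-certificate mapping and the file paths are hypothetical, and a real server would add error handling.

```python
import ssl


def make_sni_callback(contexts, default_context):
    """Build a callback that picks a per-hostname SSLContext during the handshake.

    contexts maps lowercase hostnames to ssl.SSLContext objects.
    """
    def callback(ssl_socket, server_name, initial_context):
        # server_name is None when the client sent no SNI extension.
        chosen = contexts.get((server_name or "").lower(), default_context)
        ssl_socket.context = chosen   # switch to that site's certificate
        return None                   # None lets the handshake continue
    return callback


def build_listener_context(cert_files, default_host):
    """cert_files maps hostname -> (certfile, keyfile); the paths are placeholders."""
    contexts = {}
    for hostname, (certfile, keyfile) in cert_files.items():
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        ctx.load_cert_chain(certfile, keyfile)
        contexts[hostname.lower()] = ctx
    listener = contexts[default_host.lower()]
    listener.sni_callback = make_sni_callback(contexts, listener)
    return listener
```

The listening socket would be wrapped with the context returned by build_listener_context; clients that send no SNI, or an unknown name, get the default host's certificate.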
What if you request a site by IP address only?
What the server does when it does not know which specific host you requested depends on the server implementation and configuration. Typically, there is a "default", "catchall" or "fallback" site specified that will provide responses to all requests that do not explicitly specify a host.
This default site can be its own independent site (often showing an error message), or it could be any of the other sites on the server, depending on the preference of the server admin.
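As a rough illustration of the fallback behaviour, the sketch below treats a missing Host header, an IP literal, or an unknown name as "no specific host" and returns a default site. The function names and the site labels are invented for the example; real servers each have their own rules for choosing the default.

```python
import ipaddress


def is_ip_literal(host):
    """True if host is an IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host.strip("[]"))  # brackets wrap IPv6 in URLs
        return True
    except ValueError:
        return False


def choose_site(host, sites, default_site):
    """Pick a site for a request.

    host is the Host header value with any :port already removed, or None.
    """
    if not host or is_ip_literal(host):
        return default_site
    return sites.get(host.lower(), default_site)


if __name__ == "__main__":
    sites = {"example.com": "main site"}
    print(choose_site("203.0.113.10", sites, "default site"))
    print(choose_site("example.com", sites, "default site"))
```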
I have this explanation for non-tech people.
Jack, Jill and Joe live at a dormitory, and they don't have cellphones.
In the phonebook, they are all listed with the same number. (A-record)
You dial the number, and somebody picks up the phone; you say "I'd like to speak to Jill", and you get her on the line.
Instead of an A-record (a phone number/IP address) in the phonebook, the entry may just say "Dormitory X"; then you must look up the number for Dormitory X. This is a CNAME record.
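For readers who prefer code to phonebooks, the lookup can be sketched as a toy table in Python. The names, the address and the resolve function are all made up for illustration; a real resolver queries DNS servers and handles many more record types.

```python
# Toy phonebook: each name has either an A record (a number)
# or a CNAME record (another name to look up instead).
RECORDS = {
    "dormitory-x.example": ("A", "192.0.2.10"),
    "jill.example": ("CNAME", "dormitory-x.example"),
    "jack.example": ("CNAME", "dormitory-x.example"),
}


def resolve(name, records, max_hops=8):
    """Follow CNAME entries until an A record is found."""
    for _ in range(max_hops):
        record_type, value = records[name]   # KeyError means "no such name"
        if record_type == "A":
            return value
        name = value                         # CNAME: look up the target name
    raise RuntimeError("CNAME chain too long")


if __name__ == "__main__":
    print(resolve("jill.example", RECORDS))
```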
If Jill is not available, you might get
- 404 Jill is not here
- 410 Jill is dead.
- 301 Jill has moved in with Peter
- 302 Jill is visiting Peter, call him instead
- 400 I can't understand you.
- 401 Who are you? What is the password? or We don't allow male callers after 10pm
- 402 Payment Required (Are you sure Jill is her real name ;-) )
- 403 No, that is not the right password.
- 418 Jill is a teapot :-)
- 429 Jill can't take any more calls.
- 451 You are violating your restraining order.
- 500 Our phone system has broken down.
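If you want the official wording next to the dormitory version, Python's standard library knows the reason phrases. A small sketch (the reason helper is my own; 418 is left out because older Python versions do not define it):

```python
from http import HTTPStatus


def reason(code):
    """Return the status code followed by its standard reason phrase."""
    return f"{code} {HTTPStatus(code).phrase}"


if __name__ == "__main__":
    for code in (404, 410, 301, 302, 400, 401, 402, 403, 429, 451, 500):
        print(reason(code))
```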
From what I understand, DNS links the domain name with the IP address of the server the website is stored on. Does that mean each server can only hold one website?
First, you need to understand that there are a number of distinct concepts here.
- Web site, a group of web pages that form a coherent whole.
- IP address, a numerical address (32-bit for IPv4, 128-bit for IPv6) used by the Internet Protocol as the source or destination for traffic.
- Server, a machine whose job is to serve requests from clients.
- Hostname, a name used to identify a machine in DNS (e.g. "www.example.com" or "en.wikipedia.org")
There is not a one-to-one relationship between any of these things. One server can have multiple IP addresses; multiple hostnames can point at one IP address; one hostname can point at multiple IP addresses. Multiple websites can be under the same hostname. One website can be spread across multiple hostnames.
If not, how does the server know which website I want when I call its IP address and there are many websites on the same server?
In the old days (HTTP 1.0 and before) each hostname that the server wanted to handle differently had to have its own IP address. This was rather wasteful.
HTTP 1.1 added the Host header as a mandatory field in the HTTP request (IIRC some vendors had previously supported this as an extension). This told the server which hostname had been requested and hence allowed it to serve different content for different hostnames on the same IP address. Support for HTTP 1.1 in clients is now ubiquitous.
Unfortunately, SSL (later TLS) added a wrinkle. Establishing an SSL/TLS session requires the server to present a certificate to the client that covers the requested hostname, but the HTTP request doesn't arrive until after the SSL/TLS session is established.
It is possible to have one certificate cover multiple hostnames through the use of the SubjectAltName field or the use of wildcards in the CommonName field. However, this poses administrative challenges, especially if the hostnames involved are under domains with different ownership.
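To give a feel for how one certificate can cover several names, here is a deliberately simplified matching sketch in Python. It assumes a wildcard stands for exactly one leftmost label and ignores the finer points of real certificate validation; the function names are mine, and actual TLS libraries do this check for you.

```python
def name_matches(pattern, hostname):
    """Compare a certificate name such as "*.example.com" with a hostname."""
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard covers exactly one leftmost label.
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def certificate_covers(cert_names, hostname):
    """True if any name listed in the certificate matches the hostname."""
    return any(name_matches(name, hostname) for name in cert_names)


if __name__ == "__main__":
    names = ["example.com", "*.example.com"]
    print(certificate_covers(names, "www.example.com"))
    print(certificate_covers(names, "a.b.example.com"))
```

Under this simplified rule, "*.example.com" covers "www.example.com" but neither "example.com" itself nor "a.b.example.com", which is why certificates often list the bare domain separately.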
So TLS introduced the "server name indication" (SNI) extension. With this extension, the client sends the requested hostname to the server during the TLS handshake procedure. The server can then present the appropriate certificate. Unfortunately, while current versions of all major SSL/TLS implementations support SNI, it has taken a long time for older versions to fall out of use.