When webservers send a page, why don't they send all required CSS, JS, and images without being asked?
The short answer is "Because HTTP wasn't designed for it".
Tim Berners-Lee did not design an efficient and extensible network protocol. His one design goal was simplicity. (The professor of my networking class in college said that he should have left the job to the professionals.) The problem that you outline is just one of the many problems with the HTTP protocol. In its original form:
- There was no protocol version, just a request for a resource
- There were no headers
- Each request required a new TCP connection
- There was no compression
The protocol was later revised to address many of these problems:
- Requests were versioned; a request now looks like
    GET /foo.html HTTP/1.1
- Headers were added for meta information with both the request and response
- Connections were allowed to be reused with
    Connection: keep-alive
- Chunked responses were introduced to allow connections to be reused even when the document size is not known ahead of time.
- Gzip compression was added
At this point HTTP has been taken about as far as possible without breaking backwards compatibility.
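To make the keep-alive and compression points above concrete, here is a minimal Python sketch of two HTTP/1.1 requests reusing one TCP connection. example.com is just a placeholder host and /style.css a hypothetical resource; a real client would also parse Content-Length or the chunked framing to find the end of each body.

    import socket

    # One TCP connection, reused for two requests thanks to keep-alive.
    sock = socket.create_connection(("example.com", 80))

    for path in ("/", "/style.css"):      # /style.css is a hypothetical resource
        request = (
            f"GET {path} HTTP/1.1\r\n"    # versioned request line
            "Host: example.com\r\n"       # headers carry the meta information
            "Accept-Encoding: gzip\r\n"   # opt in to compression
            "Connection: keep-alive\r\n"  # ask the server to keep the socket open
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Print only the response headers; a real client would keep reading
        # until Content-Length (or the chunked framing) says the body is done.
        print(sock.recv(4096).decode("latin-1").split("\r\n\r\n")[0])

    sock.close()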
You are not the first person to suggest that a page and all its resources should be pushed to the client. In fact, Google designed a protocol that can do exactly that, called SPDY.
Today both Chrome and Firefox can use SPDY instead of HTTP when talking to servers that support it. From the SPDY website, its main features compared to HTTP are:
- SPDY allows client and server to compress request and response headers, which cuts down on bandwidth usage when the similar headers (e.g. cookies) are sent over and over for multiple requests.
- SPDY allows multiple, simultaneously multiplexed requests over a single connection, saving on round trips between client and server, and preventing low-priority resources from blocking higher-priority requests.
- SPDY allows the server to actively push resources to the client that it knows the client will need (e.g. JavaScript and CSS files) without waiting for the client to request them, allowing the server to make efficient use of unutilized bandwidth.
If you want to serve your website with SPDY to browsers that support it, you can do so; for example, Apache has mod_spdy.
SPDY went on to become the basis for HTTP/2, which standardizes server push.
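To get a feel for the multiplexing described above, here is a small sketch using the third-party httpx package (installed with its http2 extra) against a server that speaks HTTP/2. The paths are placeholders; the point is simply that all three requests share one connection as separate streams.

    import asyncio

    import httpx  # third-party: pip install "httpx[http2]"

    async def main() -> None:
        # One client, one connection; the requests below are issued concurrently
        # and multiplexed over that connection as separate HTTP/2 streams.
        async with httpx.AsyncClient(http2=True, base_url="https://example.com") as client:
            paths = ["/", "/app.css", "/app.js"]   # hypothetical page resources
            responses = await asyncio.gather(*(client.get(p) for p in paths))
            for path, resp in zip(paths, responses):
                print(path, resp.http_version, resp.status_code)

    asyncio.run(main())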
Your web browser doesn't know about the additional resources until it downloads the web page (HTML) from the server, which contains the links to those resources.
You might be wondering, why doesn't the server just parse its own HTML and send all the additional resources to the web browser during the initial request for the web page? It's because the resources might be spread across multiple servers, and the web browser might not need all those resources since it already has some of them cached, or may not support them.
The web browser maintains a cache of resources so it does not have to download the same resources over and over from the servers that host them. When navigating different pages on a website that all use the same jQuery library, you don't want to download that library every time, just the first time.
So when the web browser gets a web page from the server, it checks what linked resources it DOESN'T already have in the cache, then makes additional HTTP requests for those resources. Pretty simple, very flexible and extensible.
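The cache check goes further than "do I already have this file": a browser can also revalidate a cached copy with a conditional request and get back a tiny 304 instead of the full body. A minimal standard-library sketch of that mechanism, assuming the server returns an ETag and using example.com as a placeholder:

    import urllib.error
    import urllib.request

    URL = "https://example.com/"          # placeholder resource

    # First visit: download the resource and remember its validator (ETag).
    with urllib.request.urlopen(URL) as first:
        body = first.read()
        etag = first.headers.get("ETag")  # assumes the response carried an ETag

    # Later visit: send the validator back; 304 means "your cached copy is fine".
    revalidate = urllib.request.Request(URL, headers={"If-None-Match": etag})
    try:
        with urllib.request.urlopen(revalidate) as fresh:
            print("Changed on the server, downloaded", len(fresh.read()), "bytes")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            print("304 Not Modified, reused", len(body), "cached bytes")
        else:
            raise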
A web browser can make several HTTP requests in parallel (RFC 2616 recommended at most two persistent connections per server; modern browsers typically open around six per host). This is not unlike AJAX: both are asynchronous ways of loading a web page, one fetching files and the other fetching content. With keep-alive we can make several requests over one connection, and with pipelining we can send several requests without waiting for the responses. Both techniques help a great deal, because most of the overhead usually comes from opening and closing TCP connections.
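Here is what pipelining looks like at the socket level, as a rough Python sketch. example.com stands in for any host, and the paths are placeholders; real browsers ship with pipelining disabled because many servers and proxies mishandle it.

    import socket

    HOST = "example.com"                  # placeholder host

    # Pipelining: write both requests before reading anything, then read the
    # responses back in the same order on the same connection.
    sock = socket.create_connection((HOST, 80))
    sock.sendall(f"GET / HTTP/1.1\r\nHost: {HOST}\r\nConnection: keep-alive\r\n\r\n".encode())
    sock.sendall(f"GET /favicon.ico HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode())

    # The second request asked the server to close the connection afterwards,
    # so we can simply drain the socket and count the status lines we received.
    data = b""
    while chunk := sock.recv(4096):
        data += chunk
    sock.close()
    print(data.count(b"HTTP/1.1 "), "responses on a single connection")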
A bit of web history...
Web pages started as plain text email, with computer systems being engineered around this idea, forming a somewhat free-for-all communication platform; web servers were still proprietary at the time. Later, more layers were added to the "email spec" in the form of additional MIME types, such as images, styles, scripts, etc. After all, MIME stands for Multi-Purpose Internet Mail Extension. Sooner or later we had what is essentially multimedia email communication, standardized web servers, and web pages.
HTTP requires that data be transmitted in the context of email-like messages, although the data most often is not actually email.
As technology like this evolves, it needs to let developers adopt new features progressively without breaking existing software. For example, when a new MIME type is added to the spec - let's say JPEG - it takes some time for web servers and web browsers to implement it. You don't suddenly force JPEG into the spec and start sending it to every web browser; you allow the web browser to request the resources that it supports, which keeps everyone happy and the technology moving forward. Does a screen reader need all the JPEGs on a web page? Probably not. Should you be forced to download a bunch of JavaScript files if your device doesn't support JavaScript? Probably not. Does Googlebot need to download all your JavaScript files in order to index your site properly? Nope.
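That negotiation happens through request headers: the client states what it can handle, and the server answers with a representation the client supports. A small illustrative sketch using only the standard library; the URL is hypothetical.

    import urllib.request

    # The client declares what it is able (and willing) to handle; the server
    # can then respond with a representation the client supports.
    request = urllib.request.Request(
        "https://example.com/logo",        # hypothetical negotiated resource
        headers={"Accept": "image/webp,image/jpeg;q=0.8,image/*;q=0.5"},
    )
    with urllib.request.urlopen(request) as response:
        print(response.headers.get("Content-Type"))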
Source: I've developed an event-based web server like Node.js. It's called Rapid Server.
References:
- HTTP persistent connection (keep-alive)
- HTTP pipelining
- HTTP/1.1 Connections (RFC 2616)
Further reading:
- Is SPDY any different than http multiplexing over keep alive connections
- Why HTTP/2.0 does not seem interesting
Because they do not know what those resources are. The assets a web page requires are coded into the HTML. Only after a parser determines what those assets are can they be requested by the user-agent.
Additionally, once those assets are known, they need to be served individually so that the proper headers (e.g. Content-Type) can be sent with each one, telling the user-agent how to handle it.
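To see both points in one place, here is a rough standard-library sketch of what a user-agent does: parse the HTML for assets, then fetch each one separately and read its Content-Type. example.com is only a placeholder page, so the asset list it yields may well be empty.

    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    PAGE = "https://example.com/"          # placeholder page

    class AssetFinder(HTMLParser):
        """Collect the asset URLs a user-agent discovers while parsing HTML."""

        def __init__(self):
            super().__init__()
            self.assets = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag in ("img", "script") and attrs.get("src"):
                self.assets.append(attrs["src"])
            elif tag == "link" and attrs.get("href"):
                self.assets.append(attrs["href"])

    html = urllib.request.urlopen(PAGE).read().decode("utf-8", "replace")
    finder = AssetFinder()
    finder.feed(html)

    # Each asset is requested individually, and each response carries its own
    # Content-Type header telling the user-agent how to handle that resource.
    for url in finder.assets:
        with urllib.request.urlopen(urljoin(PAGE, url)) as response:
            print(url, "->", response.headers.get("Content-Type"))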