Why does Chrome sometimes download a PDF instead of opening it?
Basically, this happens because the website tells the browser to do it. Occasionally, it's because the website developer wants this behaviour, which is common on file sharing sites. Other times, it's because it's a default option for whatever software they're using (e.g. forum or blogging software). Sometimes it's because the site dev has no idea what they're doing.
Content-Disposition
That's usually because the site sends a Content-Disposition header in the response. Specifically, it can send either inline or attachment.
inline is the default if not otherwise specified, and means the browser will open the file within the browser window if it is able to.
attachment means to always download the file, never attempt to open it inside the browser.
If you open your browser's developer tools, you'll see that particular link sends the following response headers:
Content-Disposition: attachment; filename="Schubert-Sonata-21-B-flat.pdf"
Content-Type: application/pdf
This tells the browser to always download (attachment) the file, and to give it the default filename of Schubert-Sonata-21-B-flat.pdf rather than inferring it from the URL. Additionally, it does tell the browser (correctly) that it's an application/pdf file - but since it's an attachment the browser will still default to downloading.
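To make the structure of that header concrete, here is a minimal Python sketch that splits a Content-Disposition value into its disposition and filename using the standard library's email.message module. The helper name parse_content_disposition is made up for illustration; browsers use their own parsers and handle more edge cases than this does.

```python
from email.message import Message


def parse_content_disposition(value):
    """Split a Content-Disposition value into (disposition, filename).

    Hypothetical helper for illustration; browsers use their own parsers.
    """
    msg = Message()
    msg["Content-Disposition"] = value
    # get_content_disposition() returns the lowercased disposition type
    # without its parameters, or None if the header is missing.
    disposition = msg.get_content_disposition()
    # get_filename() reads the filename parameter and strips the quotes;
    # it returns None when no filename was given.
    filename = msg.get_filename()
    return disposition, filename


header = 'attachment; filename="Schubert-Sonata-21-B-flat.pdf"'
print(parse_content_disposition(header))
```

For the header shown above, this should report the attachment disposition together with the suggested filename; a bare inline value yields no filename at all.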
Inline handling details
When the Content-Disposition is inline (or unspecified), the browser will try to open the file in the default embedded viewer. This only works when the browser knows what file type it is and knows how to open that type.
Type detection
The file type can be specified by the server with a Content-Type header. For example, the most common inline types are text/html, application/javascript and text/css, making up the three major parts of a modern website. You can also have more esoteric types like application/pdf.
Another possibility is the server has specified a Content-Type of application/octet-stream. This is the most generic type, and it tells the browser that the file is just arbitrary data - at which point the only thing the browser can do is download it (in theory - we'll get to that).
When a Content-Type is not specified by the server (and sometimes even when it is), the browser can perform what is known as sniffing to try to guess the type by reading the file and looking for patterns.
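As a rough illustration of what "looking for patterns" means, the following Python sketch guesses a type from the first few bytes of a file. The signature table and the sniff_type name are assumptions made for this example; real browsers follow far more detailed rules (see the WHATWG MIME Sniffing standard) and do not simply use a table like this.

```python
# Leading byte patterns ("magic numbers") for a few well-known formats.
# This table is a tiny illustrative subset, not what any browser ships.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_type(data):
    """Guess a MIME type from the first bytes of a file (hypothetical helper)."""
    for prefix, mime_type in SIGNATURES:
        if data.startswith(prefix):
            return mime_type
    # Nothing matched: fall back to the generic "arbitrary bytes" type.
    return "application/octet-stream"


print(sniff_type(b"%PDF-1.7\n"))
```

A PDF file starts with the bytes %PDF-, which is why a browser can sometimes recognise one even when the server labelled it with a generic type.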
Type handling
Upon receiving a file with an inline or unspecified disposition, the browser needs to try to open it within the browser if possible. To do this, it looks at the file type, and if it recognises the type it will try to open it. Most browsers will open any text/ type in a simple text viewer, will try to render text/html as a webpage, might open application/json in a special syntax-highlighted viewer, etc.
The type application/octet-stream is handled specially. Since it's supposed to be the most generic type, denoting an arbitrary stream of bytes, there isn't supposed to be any handler that can apply to all files of this "type". For example, in Firefox, this manifests as an inability to set the default handler for application/octet-stream.
Some websites have also used non-standard types. I've seen application/force-download used - which ends up as a download because the browser does not recognise or know what else to do with the type, but does not enjoy the special handling that application/octet-stream does.
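The rules described so far can be pulled together into a small decision function. This Python sketch is a simplified model rather than any browser's actual logic: the INLINE_HANDLERS set and the choose_action name are hypothetical, and sniffing is left out entirely.

```python
# Types this imaginary browser can display itself. Purely illustrative.
INLINE_HANDLERS = {
    "text/html",
    "text/css",
    "application/javascript",
    "application/json",
    "application/pdf",
}


def choose_action(content_type=None, disposition=None):
    """Return "download" or "open" for a response (simplified model)."""
    # An explicit attachment disposition always wins.
    if disposition == "attachment":
        return "download"
    # Generic binary data has no handler by design.
    if content_type == "application/octet-stream":
        return "download"
    # inline or unspecified: open only if the type is known and has a viewer.
    if content_type is not None and (
        content_type in INLINE_HANDLERS or content_type.startswith("text/")
    ):
        return "open"
    # Unknown or non-standard types (e.g. application/force-download) and,
    # in this model, responses with no Content-Type at all.
    return "download"


print(choose_action("application/pdf", "attachment"))
print(choose_action("application/pdf", "inline"))
```

Note that the same application/pdf response flips from opening to downloading purely because of the disposition, which matches the headers shown earlier.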
A bit of a history lesson
To see how PDFs are handled, we can delve a bit into web history. See, in the past, browsers had no idea what a PDF is. So they could not open it. But we've seen PDFs being opened in browsers long before built-in PDF viewers were a thing, so how did that work?
It used to be possible to extend browser functionality with far more control than what you can do with limited extensions/addons these days. Those were most generically known as plugins. In Internet Explorer, they were ActiveX controls; in Mozilla Firefox and later Google Chrome they were NPAPI plugins. These plugins were capable of doing everything any other program could, and could additionally register themselves as a handler for a specific file type that might be otherwise unrecognised by the browser. (Incidentally, this was later found to be a huge security risk and support for these powerful plugins was gradually dropped...)
In the days of plugins, you would go and install Adobe Acrobat Reader, which would then install an ActiveX or NPAPI plugin that would register the application/pdf MIME type and tell the browser to open those types inline using the plugin.
Of course, after a number of security and performance issues caused by these plugins, the major browser vendors decided to incorporate their own PDF viewers while phasing out support for most plugins. The only one we still see is Adobe Shockwave Flash, which handles application/x-shockwave-flash.
There are actually still some leftover controls for this, e.g. in Firefox the Preview in Firefox option still exists:
In the past, this would have allowed the choice between multiple plugins that registered that type. For example, the list of registered types for Flash:
Those days were also before a lot of the media support that came with HTML5. It wasn't just PDFs - your browser would have no idea how to handle an MP4 container or H.264 video, no idea how to play an MP3 file, etc. You would see plugins provided by media players like VLC or even Windows Media Player, or websites would embed a media player built in Flash.
I found an explanation. According to an answer I found, it appears that Chrome will download a PDF if the MIME content type is set not to application/pdf but rather to an "incorrect or generic MIME type", application/octet-stream.
Furthermore, "Most web servers send unknown-type resources using the default application/octet-stream MIME type. For security reasons, most browsers do not allow setting a custom default action for such resources, forcing the user to store it to disk to use it."
This is due to the HTTP Content-Disposition header specifying that the file is an attachment. This instructs the browser to download the file, rather than to open it directly.
There is a Chrome add-on that can override this behavior. The following image is from the Firefox developer tools: