Why do some web servers still provide information on vendor and version in the HTTP response headers?
Yes, in theory, advertising the vendor and version in the banner makes an attacker's job easier, but only a little bit.
Even if you don't advertise it, it can be figured out from the behaviour of the app. Take, for example, the network scanning tool nmap:
Nmap provides a number of features for probing computer networks, including host discovery and service and operating system detection. These features are extensible by scripts that provide more advanced service detection, vulnerability detection, and other features.
Source: Wikipedia
After TCP and/or UDP ports are discovered using one of the other scan methods, version detection interrogates those ports to determine more about what is actually running. The nmap-service-probes database contains probes for querying various services and match expressions to recognize and parse responses.
Source: nmap documentation.
Basically, nmap will start by trying to decide whether it's Apache, nginx, IIS, etc., which it does by sending a specific probe that it knows the different web servers answer differently. Then, once it knows that it's Apache, it will send probes that target behaviours that have changed between one version and another, possibly because of a bug fix or because of a new feature.
As noted by @paj28 in comments, nmap is quite good at detecting the vendor, but hit-and-miss with the version. When nmap displays detailed version info, that's probably because there was a banner. Of course, if you're trying to hack in, why bother trying to figure out the version at all? Once you know it's Apache, why not just run all the Apache exploit scripts from Metasploit at it and see if any of them stick?
So yes, in theory it's possible to write a specification for "Here's how all web servers MUST behave", but that means that you're not allowing web servers to ever innovate, or fix bugs. Moreover, web servers will want to differentiate themselves with features that other servers don't have.
As pointed out by @TripeHound, version banners can be considered a win for security as they allow sysadmins to easily inventory their systems and keep them up to date.
TL;DR: Banner or not, you can figure out what web server and OS you're talking to.
Nmap Example
Since this post got popular, I'll add some examples:
Apache httpd puts version info right in an HTTP header:
HTTP/1.1 200 OK
Date: Thu, 17 Oct 2019 14:26:10 GMT
Server: Apache/2.4.6 (CentOS)
and unsurprisingly, nmap picks this up:
~ nmap 192.168.56.107 -p 80 -A --version-all
Starting Nmap 7.60 ( https://nmap.org ) at 2019-10-17 07:58 EDT
Nmap scan report for 192.168.56.107
Host is up (0.0020s latency).
PORT STATE SERVICE VERSION
80/tcp open http Apache httpd 2.4.6 ((CentOS))
| http-methods:
|_ Potentially risky methods: TRACE
|_http-server-header: Apache/2.4.6 (CentOS)
|_http-title: Apache HTTP Server Test Page powered by CentOS
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 14.23 seconds
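Reading such a banner programmatically takes very little effort, which is part of why it is so convenient for scanners and admins alike. Here is a minimal sketch; `Banner` and `parse_server_header` are hypothetical names made up for this illustration, and the regex only looks at the first product token, so it is not a full parser for the header grammar and not how nmap does its matching.

```python
import re
from typing import NamedTuple, Optional


class Banner(NamedTuple):
    product: str
    version: Optional[str]
    comment: Optional[str]


# First product token, optional "/version", optional "(comment)".
# Anything after that (e.g. module tokens) is ignored by this sketch.
_BANNER_RE = re.compile(r"^\s*([^\s/()]+)(?:/([^\s()]+))?(?:\s+\(([^)]*)\))?")


def parse_server_header(value: str) -> Optional[Banner]:
    """Split a Server header value into product, version and comment."""
    match = _BANNER_RE.match(value)
    if match is None:
        return None
    return Banner(match.group(1), match.group(2), match.group(3))


if __name__ == "__main__":
    # The header value from the httpd response above.
    print(parse_server_header("Apache/2.4.6 (CentOS)"))
    # A trimmed-down banner carries a product but no version.
    print(parse_server_header("Apache"))
```

With a full banner, all three fields are filled in; with a bare `Apache` banner, only the product is known and the version has to come from somewhere else.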
If we try something a bit trickier, Apache Tomcat does not give an obvious banner:
HTTP/1.1 200
Accept-Ranges: bytes
ETag: W/"1896-1527518937672"
Last-Modified: Mon, 28 May 2018 14:48:57 GMT
Content-Type: text/html
Content-Length: 1896
Date: Thu, 17 Oct 2019 14:35:05 GMT
Connection: close
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Apache Tomcat</title>
</head>
...
we see that nmap gets the vendor right, but does not have a guess at the version:
~ nmap -A -p 8080 192.168.56.1
Starting Nmap 7.60 ( https://nmap.org ) at 2019-10-17 10:34 EDT
Nmap scan report for 192.168.56.1
Host is up (0.0013s latency).
PORT STATE SERVICE VERSION
8080/tcp open http Apache Tomcat
| http-methods:
|_ Potentially risky methods: PUT DELETE
|_http-open-proxy: Proxy might be redirecting requests
|_http-title: Apache Tomcat
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 21.71 seconds
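To give a feel for how a server can be recognised without a banner, here is a rough sketch that looks only at the shape of the response: whether the status line carries a reason phrase, and which headers come first once `Server` is ignored. The two signatures are copied from the two sample responses in this post and are purely illustrative; `extract_features`, `SIGNATURES` and `guess_vendor` are hypothetical names, and this is not how nmap's probe database actually works.

```python
from typing import Dict, Optional, Tuple

Features = Tuple[bool, Tuple[str, ...]]


def extract_features(raw_response: str) -> Features:
    """Return (status line has a reason phrase, header names in wire order).

    The Server header is dropped so the result does not depend on the banner.
    """
    head = raw_response.replace("\r\n", "\n").split("\n\n", 1)[0]
    status_line, *header_lines = head.split("\n")
    # "HTTP/1.1 200 OK" splits into three parts, "HTTP/1.1 200" into two.
    has_reason = len(status_line.split(None, 2)) == 3
    names = tuple(
        line.split(":", 1)[0].strip().lower()
        for line in header_lines
        if ":" in line
    )
    return has_reason, tuple(name for name in names if name != "server")


# Illustrative signatures taken from the two sample responses in this post.
# A real tool would need many more probes and far more data points.
SIGNATURES: Dict[str, Features] = {
    "Apache httpd": (True, ("date",)),
    "Apache Tomcat": (False, ("accept-ranges", "etag", "last-modified")),
}


def guess_vendor(raw_response: str) -> Optional[str]:
    """Return the first vendor whose signature matches, or None."""
    has_reason, names = extract_features(raw_response)
    for vendor, (sig_reason, sig_prefix) in SIGNATURES.items():
        if has_reason == sig_reason and names[: len(sig_prefix)] == sig_prefix:
            return vendor
    return None
```

Even this toy version keeps working when the `Server` line is removed from the httpd response, which is the point: the banner is only one of many signals.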
Attackers just do not care. When you have a popular website, you find bots probing for every (vulnerable) version of phpMyAdmin on every conceivable path, even when your site does not use a database at all. In the same manner, they probably probe for security holes in every version of, e.g., Apache. Lying in or omitting the Server header does not buy you much security here.
But when you're the admin, you can use the Server header to quickly see whether a server is still unpatched, which may keep you from forgetting that you have not updated yet.
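As a sketch of that kind of admin check, the snippet below flags hosts whose banner version is below a baseline. The function names, the inventory and the baseline are all made up for illustration. One caveat to keep in mind: distribution packages may carry backported fixes under an old upstream version number, so a low number is a hint to look closer, not proof of a vulnerable server.

```python
import re
from typing import Dict, List, Optional, Tuple


def banner_version(server_header: str) -> Optional[Tuple[int, ...]]:
    """Extract a dotted numeric version such as 2.4.6 from a Server header."""
    match = re.search(r"/(\d+(?:\.\d+)*)", server_header)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def needs_attention(inventory: Dict[str, str], baseline: Tuple[int, ...]) -> List[str]:
    """Return hosts whose banner is below the baseline or carries no version."""
    flagged = []
    for host, header in sorted(inventory.items()):
        version = banner_version(header)
        # Tuples compare numerically, so (2, 4, 6) < (2, 4, 50) as intended.
        if version is None or version < baseline:
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    # Hypothetical inventory: host -> Server header collected from each box.
    inventory = {
        "web-1.example": "Apache/2.4.6 (CentOS)",
        "web-2.example": "Apache/2.4.58 (Unix)",
        "web-3.example": "Apache",
    }
    # Hypothetical baseline chosen only for this example.
    print(needs_attention(inventory, baseline=(2, 4, 50)))
```

Comparing integer tuples rather than strings matters here, because a plain string comparison would rank "2.4.6" above "2.4.50".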
Of course, your concerns are valid, e.g., when somebody is gathering information and manually picking exploits that only work against a known version. But in most cases, hiding the version is just security by obscurity. The important point is that you should keep the software updated, so that you do not need to care whether an attacker knows that you run the latest version.
Historically, many servers returned headers like this:
Server: Apache/2.4.1 (Unix) mod_php/1.2 mod_ssl/0.9
This does give away a bit more information than you want, so most servers now default to headers like this:
Server: Apache
In fact, most web servers make it difficult to turn off this header. The reason? Marketing. They want to appear in surveys of what the most common web servers are.
Regarding fingerprinting, a number of tools can quite reliably tell Apache, NGINX, IIS, etc. apart. But fingerprinting the version of a server is much harder: open source tools do not do it well, and it cannot be done with any real granularity.