What legitimate uses do browser proxies have?
One of the early uses of HTTP proxies was caching, both to make better use of expensive bandwidth and to speed up surfing by keeping heavily used content near the user. I remember a time when ISPs employed explicit mandatory proxies for their users. This was at a time when most content on the internet was static and not user-specific, so caching was very effective. But even many years later, transparent proxies (i.e. requiring no explicit configuration) were still used in mobile networks.
Another major and early use case is proxies in companies. These are used to restrict access and are still also used for caching content. While many perimeter firewalls employ transparent proxies where no configuration is needed, classic secure web gateways usually have at least the ability to require explicit (non-transparent) proxy access. Usually not all traffic is sent through the proxy: internal traffic is typically excluded, either with explicit configuration or by using a PAC file which selects the proxy depending on the target URL. A commonly used proxy in this area is the free Squid proxy, which provides extensive ACLs, distributed caching, authentication, content inspection, SSL interception, etc.
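To make the PAC mechanism concrete, here is a minimal sketch of such a file; the proxy name, internal domain, and address range are purely illustrative. The browser evaluates FindProxyForURL() for every request and follows the returned directive.

```javascript
// Minimal PAC sketch (illustrative names/addresses): internal traffic
// bypasses the proxy, everything else goes through the explicit proxy.
function FindProxyForURL(url, host) {
    if (isPlainHostName(host) ||                   // unqualified intranet names
        dnsDomainIs(host, ".corp.example.com") ||  // internal domain
        isInNet(host, "10.0.0.0", "255.0.0.0")) {  // internal address range
        return "DIRECT";
    }
    return "PROXY proxy.corp.example.com:3128";
}
```

(Note that isInNet() forces a DNS lookup for every request, so production PAC files are usually written with more care.)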
Using an explicit proxy for filtering has the advantage that in-band authentication against the proxy can be required, i.e. the user is identified by username instead of by source IP address. Another advantage is that the connection from the client ends at the proxy, and the proxy only forwards the request to the server given in the HTTP proxy request if the ACL check passes, possibly after rewriting parts of the request (like making sure that the Host header actually matches the target host). In contrast, with an inline IDS/IPS (the basic technology in many NGFW) the connection setup from the client is already forwarded to the final server, and ACL checks are done based on the Host header, which might or might not match the IP address the client is connecting to. This is actually used by some malware for C2 communication to bypass blocking or detection: it claims a whitelisted host in the Host header while the actual target is a different IP address. A sketch of this trick is shown below.
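A minimal Python sketch of that Host header trick, with hypothetical addresses: the TCP connection goes straight to an arbitrary server while the request claims a whitelisted name. A device that trusts the Host header sees only the allowed name; an explicit proxy would instead resolve and connect to whatever host the request actually names.

```python
import http.client

# Hypothetical values: 203.0.113.10 stands in for a C2 server,
# allowed.example.com for a host on the filter's whitelist.
conn = http.client.HTTPConnection("203.0.113.10", 80, timeout=5)

# The Host header claims the whitelisted name, but the packets go to
# 203.0.113.10 -- a mismatch a transparent inspection device that only
# checks the Host header will not notice.
conn.request("GET", "/", headers={"Host": "allowed.example.com"})
resp = conn.getresponse()
print(resp.status, resp.reason)
```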
Some examples are as follows:
- To enable a firewall rule like 'proxy server to any destination on 80, 443' instead of 'any internal to any external'
- To monitor all websites visited, via the proxy's logs
- To control, limit and filter the websites visited by enforcing rules - these could be lists of approved sites, blacklisted sites, content categories, etc.
- To enforce user authentication for internet access - e.g. limiting it to domain users or certificate holders (see the Squid sketch after this list)
- To have separate egress points for different contexts - e.g. whether you're on VPN, in a local office, on a workstation, or on a server
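As one possible way to combine the filtering and authentication points above, here is a hedged Squid configuration sketch; the port, file paths and helper location are illustrative and vary between distributions.

```
# Illustrative squid.conf fragment: require authentication, block a
# list of domains, log everything, deny anything not explicitly allowed.
http_port 3128

auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwd
auth_param basic realm Corporate proxy
acl authenticated proxy_auth REQUIRED

acl blocked_sites dstdomain "/etc/squid/blocked_domains.txt"

access_log /var/log/squid/access.log

http_access deny blocked_sites
http_access allow authenticated
http_access deny all
```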
For the "original" reason, think back to 1993, when Netscape 0.9 was released. It had a "Mail and Proxies" Options dialog (per a copy of the manual). At that time, most Internet links were 56-kbit or fractional T1 lines between university campuses and government. There were several ways an HTTP proxy could help or even be required:
- The web-browser might be on a TCP/IP LAN with no (IP-level) routing to the Internet, but with a host dual-homed on that LAN and the Internet. The dual-homed host could run an HTTP proxy service and a DNS proxy service, allowing clients on the LAN to access the Web. (Note that the RFCs describing the standard private-address ranges and NAT, RFC 1597 and RFC 1631, were not published until March and May of 1994.)
- Even if the LAN had routable addresses, or even after NAT was deployed, the off-site bandwidth was probably a lot less than the local bandwidth between clients and a potential proxy location. As long as the clients were browsing a significant amount of the same static or slowly-changing content, the proxy made things better for the clients (by returning cached content quickly) as well as for the network operator (by freeing up bandwidth for other network functions, or by reducing charges for data usage when billing was per-packet or per-byte).
- If enough end users were behind proxies, it took the edge off what would 10 years later be called the "Slashdot effect": origin servers for worldwide-popular content would only have to serve it to each proxy, not to each end user.
Of course, sending all HTTP traffic through a designated process also makes that process a good control point for applying security policy: filtering by keywords in the (decoded) request or response body, only allowing access to users who authenticate to the proxy, and logging.
Yes, there are organizations that "push" a proxy policy to end-user devices they control, and enforce it by a restrictive policy at the border routers.
Also note that even if you think you're browsing on a "clean" network, your traffic may be going through a proxy; see for example the Transparent Proxy with Linux and Squid mini-HOWTO.
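The core of such a transparent setup is a NAT rule on the gateway that silently diverts outbound HTTP to a local Squid instance. A hedged sketch, assuming the LAN sits behind eth1 and Squid listens in interception mode on port 3129 (both values are illustrative):

```
# On the Linux gateway: redirect plain-HTTP traffic from the LAN (eth1)
# to the local Squid interception port, with no client configuration at all.
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j REDIRECT --to-ports 3129

# Matching squid.conf line (recent Squid versions) so Squid accepts
# the intercepted connections:
#   http_port 3129 intercept
```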
But it's true that an explicitly-configured proxy may give little advantage, or even make browsing a lot worse, on today's Internet. Popular websites use CDNs, most content is dynamic, and even content that seems cacheable (like Google Maps tiles and YouTube video data) is varied based on browser version, OS, screen size, or even a random cookie meant to make it uncacheable, so caching saves little bandwidth for a cache near the end-user (although origin servers often have caches in front of them). For the uncacheable content, another cache just adds RTT to every request, making browsing slower.