Are URLs viewed during HTTPS transactions to one or more websites from a single IP distinguishable?
TLS reveals to an eavesdropper the following information:
- the site that you are contacting
- the (possibly approximate) length of the rest of the URL
- the (possibly approximate) length of the HTML of the page you visited (assuming it is not cached)
- the (possibly approximate) number of other resources (e.g., images, iframes, CSS stylesheets, etc.) on the page that you visited (assuming they are not cached)
- the time at which each packet is sent and each connection is initiated. (@nealmcb points out that the eavesdropper learns a lot about timing: the exact time each connection was initiated, the duration of the connection, the time each packet was sent and the time the response was sent, the time for the server to respond to each packet, etc.)
If you interact with a web site by clicking links in series, the eavesdropper can see each of these for each click on the web page. This information can be combined to try to infer what pages you are visiting.
Therefore, in your example, TLS reveals only A.com vs B.com, because the rest of the URL is the same length in all cases. However, your example is not representative of typical practice on the web. Usually, URL lengths on a particular site vary, and thus reveal information about the URL that you are accessing. Moreover, page lengths and the number of resources also vary, which reveals still more information.
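To make the URL-length point concrete, here is a rough Python sketch. It assumes the cipher adds no padding, so the ciphertext length equals the plaintext length plus a fixed per-record overhead. The value of `RECORD_OVERHEAD`, the minimal request format, and the paths are all hypothetical; real browsers send many more headers, and the real overhead depends on the TLS version and cipher suite.

```python
# Assumed fixed per-record overhead in bytes (hypothetical; the real value
# depends on the TLS version and cipher suite in use).
RECORD_OVERHEAD = 29


def http_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request (real browsers send more headers)."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")


def observed_length(host: str, path: str) -> int:
    """Ciphertext size an eavesdropper would see, assuming no padding."""
    return len(http_request(host, path)) + RECORD_OVERHEAD


# Paths of equal length are indistinguishable by size alone...
same = [observed_length("A.com", p) for p in ("/XXX", "/YYY", "/ZZZ")]
# ...but paths of different lengths produce different ciphertext sizes.
varied = [observed_length("A.com", p) for p in ("/", "/login", "/account/settings")]

print(same)
print(varied)
```

The three equal-length paths all yield the same observed size, while the three paths of different lengths yield three distinct sizes, which is the leak described above.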
There has been research suggesting that these leakages can reveal substantial information to eavesdroppers about what pages you are visiting. Therefore, you should not assume that TLS conceals which pages you are visiting from an eavesdropper. (I realize this is counterintuitive.)
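As a toy illustration of how such leakages could be combined (this is not the method of any particular paper), suppose the eavesdropper has crawled the site in advance and recorded the HTML size and resource count of each page. The catalog, its numbers, and the distance function below are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    html_bytes: int      # approximate size of the HTML response
    resource_count: int  # number of additional resources fetched


# Hypothetical catalog built by crawling the site beforehand.
CATALOG = {
    "/": Fingerprint(18_200, 34),
    "/pricing": Fingerprint(9_400, 12),
    "/help/reset-password": Fingerprint(9_600, 5),
}


def distance(a: Fingerprint, b: Fingerprint) -> float:
    """Sum of relative differences in HTML size and resource count."""
    size_diff = abs(a.html_bytes - b.html_bytes) / max(a.html_bytes, b.html_bytes, 1)
    count_diff = abs(a.resource_count - b.resource_count) / max(
        a.resource_count, b.resource_count, 1
    )
    return size_diff + count_diff


def guess_page(observed: Fingerprint) -> str:
    """Return the catalog path whose fingerprint is closest to the observation."""
    return min(CATALOG, key=lambda path: distance(observed, CATALOG[path]))


# What the eavesdropper measured on the wire (approximate values).
observed = Fingerprint(html_bytes=9_550, resource_count=5)
print(guess_page(observed))
```

HTML size alone cannot reliably separate the two smaller pages here, but the resource count can. Real attacks in the literature use richer features and statistical classifiers.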
Added: Here are citations to some research in the literature on traffic analysis of HTTPS:
Shuo Chen, Rui Wang, XiaoFeng Wang, Kehuan Zhang. Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow, IEEE Security & Privacy 2010. This paper is fairly mind-blowing; for instance, it shows how AJAX-based search suggestions can reveal what characters you are typing, even over SSL. Here is a high-level overview of the paper.
Kehuan Zhang, Zhou Li, Rui Wang, XiaoFeng Wang, Shuo Chen. Sidebuster: Automated Detection and Quantification of Side-Channel Leaks in Web Application Development. CCS 2010.
Marc Liberatore, Brian Neil Levine. Inferring the Source of Encrypted HTTPS Connections. CCS 2006.
George Danezis. Traffic Analysis of the HTTP Protocol over TLS, unpublished.
George Dean Bissias, Marc Liberatore, Brian Neil Levine. Privacy vulnerabilities in encrypted HTTPS streams. PET 2005.
Qixiang Sun, Daniel R. Simon, Yi-Min Wang, Wilf Russell, Venkata N. Padmanabhan, Lili Qiu. Statistical identification of encrypted web browsing traffic. IEEE Security & Privacy 2002.
Andrew Hintz. Fingerprinting websites using traffic analysis. PET 2002.
Heyning Cheng, Ron Avnur. Traffic analysis of SSL encrypted web browsing. Class project, 1998.
Shailen Mistry, Bhaskaran Raman. Quantifying Traffic Analysis of Encrypted Web-Browsing. Class project, 1998.
The second choice. Mostly.
When a browser visits an HTTPS web site, it establishes a TLS tunnel, which involves an asymmetric key exchange (client and server agree on a shared secret). That key exchange mechanism uses the server's public key, which the server presents as part of its certificate. The server certificate contains the server name (e.g. A.com) and the client verifies that the name matches the one it expects (i.e. the server name in the URL). The server certificate is necessarily sent before the key exchange, hence in plain view. (This describes TLS 1.2 and earlier. TLS 1.3 encrypts the certificate, but the client usually still sends the server name in clear in the Server Name Indication extension.)
The rest of the URL is sent as part of the HTTP request which occurs within the encrypted tunnel, hence invisible to third parties. A given tunnel may be reused for several other HTTP requests, but (by construction) they are all for the same server (the same domain name).
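The split between what is exposed and what travels inside the tunnel can be sketched with Python's standard `urllib.parse`. The function name `split_visibility`, the dictionary keys, and the sample URL are made up for illustration. The sketch ignores the traffic-analysis side channels (lengths, timing) and only shows which parts of the URL appear in the handshake versus the encrypted HTTP request.

```python
from urllib.parse import urlsplit


def split_visibility(url: str) -> dict:
    """Separate the server name (exposed during the handshake) from the
    rest of the URL (sent in the HTTP request inside the TLS tunnel)."""
    parts = urlsplit(url)
    hidden = parts.path or "/"
    if parts.query:
        hidden += "?" + parts.query
    # The fragment (after '#') is not sent to the server at all, so it is omitted.
    return {
        "visible_to_eavesdropper": parts.hostname,  # urlsplit lowercases the host
        "inside_tunnel": hidden,
    }


result = split_visibility("https://A.com/some/page?id=42")
print(result)
```

Note that `urlsplit` normalizes the host name to lowercase, so `A.com` is reported as `a.com`.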