Why is the weakness of SHA-1 considered a threat to TLS security?
If they find a different value that produces the same SHA-1 digest as the website's actual public key, what have they gained?
Let's assume that we are dealing with MD5. SHA-1 attacks are still a work in progress, and I expect them to follow roughly the same lines as the attacks that are already possible on MD5.
With MD5 it is possible to create two different values that produce a hash collision (within certain boundaries). That however doesn't mean that it is possible to create a hash collision given just any input data and the hash value over that data.
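To make the difference between the two attacks concrete, here is a minimal sketch. It assumes a toy hash made by truncating SHA-1 to 24 bits (the name `toy_digest`, the message formats and the width are made up for illustration), because a real collision search on MD5 or SHA-1 is far out of reach for a short script. The point is only the difference in effort: choosing both inputs is cheap, hitting one fixed digest is not.

```python
import hashlib
from itertools import count

BITS = 24  # toy digest width (assumption); MD5 has 128 bits, SHA-1 has 160


def toy_digest(data: bytes) -> int:
    """First BITS bits of SHA-1, standing in for a weak hash."""
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - BITS)


def find_collision():
    """Attacker picks BOTH inputs: expected around 2**(BITS/2) tries."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        d = toy_digest(msg)
        if d in seen:
            return seen[d], msg, i + 1
        seen[d] = msg


def find_second_preimage(target: bytes, limit: int):
    """Attacker must match ONE fixed digest: expected around 2**BITS tries."""
    want = toy_digest(target)
    for i in range(limit):
        msg = b"forged-%d" % i
        if toy_digest(msg) == want:
            return msg, i + 1
    return None, limit


a, b, tries = find_collision()
print("collision after", tries, "tries:", a, b)

# Same budget spent on a fixed target; this will usually come back empty.
forged, spent = find_second_preimage(b"someone else's certificate", tries)
print("second preimage within the same budget:", forged)
```

The collision search finishes quickly, while the same number of tries against a fixed target normally finds nothing. That gap is why an existing certificate is not directly at risk.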
In your example that means that you cannot just take an existing MD5 or SHA-1 signed certificate and create a fake from it. So internal CA certificates created within a CA aren't much at risk. It is however possible to send a specially formatted certificate request to a CA and then change the signed data in such a way that the signature still verifies.
So the same organization that requests the certificate is the one doing the attacking. This means they already hold the private key. So replacing the public key could be possible, but would not make too much sense.
So what can they do? Well, it's about the "some other information" within the certificate. That can be changed by the organization requesting the certificate. Basically they can put whatever they want in there. This means that a certificate for a certain domain may become a certificate for another domain. The certificate could even become one that can be used to create other certificates (although path length constraints may alleviate that issue).
In other words, the certificate authority signs a hash, which may be valid for other data. The certificate authority doesn't know what certificate it is signing anymore! You, as a user, may get a certificate for mail.google.com carrying a valid signature that was created for happy.attacker.com, with the attacker holding the private key. In other words, you may not be communicating with Google at all!
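A toy sketch of that effect, under loud assumptions: textbook RSA with a tiny key (p=61, q=53) and no padding, an 11-bit truncated SHA-1 as the "broken" hash, and a made-up `subject=...;filler=...` body format instead of real X.509. None of this resembles a real CA; it only shows that a signature over a digest is equally valid for any other data with that digest.

```python
import hashlib
from itertools import count

# Toy textbook-RSA key (p=61, q=53). Far too small for real use, no padding.
N, E, D = 3233, 17, 2753
BITS = 11  # toy digest width, chosen so every digest is smaller than N


def toy_digest(data: bytes) -> int:
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - BITS)


def ca_sign(tbs: bytes) -> int:
    """The CA only ever exponentiates the digest, never the data itself."""
    return pow(toy_digest(tbs), D, N)


def verify(tbs: bytes, signature: int) -> bool:
    return pow(signature, E, N) == toy_digest(tbs)


def body(subject: str, filler: int) -> bytes:
    # 'filler' stands in for attacker-controlled bytes inside the certificate.
    return f"subject={subject};filler={filler}".encode()


# The attacker prepares many harmless-looking requests...
benign = {toy_digest(body("happy.attacker.com", i)): i for i in range(256)}

# ...and searches for a rogue body whose digest matches one of them.
for j in count():
    rogue = body("mail.google.com", j)
    if toy_digest(rogue) in benign:
        harmless = body("happy.attacker.com", benign[toy_digest(rogue)])
        break

signature = ca_sign(harmless)        # the CA sees only the harmless request
print(verify(harmless, signature))   # True
print(verify(rogue, signature))      # True: same digest, same signature
```

The CA never saw the mail.google.com body, yet its signature verifies for it.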
Misconceptions:
The hashing function is applied to a subject's public key.
No, it's applied to the certificate's attributes that require explicit trust (so also the domain name in the example); not just the public key.
The CA then uses their private key to encrypt the hash output.
No, signature generation is not encryption. For RSA it also uses modular exponentiation, but that doesn't make it encryption. Even the RSA specifications - PKCS#1 - differentiate between signature generation and encryption.
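One place where the difference is visible is the PKCS#1 v1.5 padding that is applied before the modular exponentiation. The sketch below builds both encoded blocks for a key of `k` bytes; the function names are mine, and only the padding step is shown, not the RSA operation itself.

```python
import hashlib
import secrets

# DER prefix of the DigestInfo structure for SHA-1, as listed in PKCS#1.
SHA1_DIGESTINFO_PREFIX = bytes.fromhex("3021300906052b0e03021a05000414")


def pad_for_signature(message: bytes, k: int) -> bytes:
    """Signature padding: 00 01 FF..FF 00 DigestInfo(hash). Deterministic."""
    t = SHA1_DIGESTINFO_PREFIX + hashlib.sha1(message).digest()
    if k < len(t) + 11:
        raise ValueError("modulus too short")
    ps = b"\xff" * (k - len(t) - 3)
    return b"\x00\x01" + ps + b"\x00" + t


def pad_for_encryption(message: bytes, k: int) -> bytes:
    """Encryption padding: 00 02 <random nonzero bytes> 00 message."""
    if len(message) > k - 11:
        raise ValueError("message too long")
    ps = bytes(secrets.randbelow(255) + 1 for _ in range(k - len(message) - 3))
    return b"\x00\x02" + ps + b"\x00" + message


k = 256  # 2048-bit modulus
print(pad_for_signature(b"certificate data", k)[:4].hex())
print(pad_for_encryption(b"session key", k)[:2].hex())
```

The signature block is fixed for a given message and carries a hash of it; the encryption block is randomized and carries the message itself.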
The output of that encryption along with some other information becomes the certificate.
That "some other information" is quite important, and a lot of it is signed along with the public key. Validity period, issuer, public key, signature, subject, key usage, path length constraints and revocation-related information are all pretty important.
So, say someone is determined to impersonate a site that has an SHA-1 certificate and attacks the SHA-1 digest, by trying to produce a hash collision.
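That is an attack against one fixed, already existing digest rather than a collision between two values the attacker chooses. This would still take about 2^159 operations. So no, that's not likely to ever be attempted by a serious researcher.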
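For a feeling of the scale, a back-of-the-envelope calculation. The attacker speed of 10^12 hashes per second is an arbitrary assumption of mine; pick any other figure and the conclusion stays the same.

```python
OPERATIONS = 2 ** 159
HASHES_PER_SECOND = 10 ** 12          # assumed attacker speed, illustrative only
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years = OPERATIONS // (HASHES_PER_SECOND * SECONDS_PER_YEAR)
print(f"about {years:.2e} years")
```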
They could now forge a certificate that appears to be signed by the CA but instead of the actual public key contains X, but wouldn't this be useless to them without a matching private key for X?
Well, yes. And actually, a public key doesn't consist of a single value X; it consists of a modulus and an exponent, which are ASN.1/DER encoded within X.509 certificates. So X would have to have a pretty specific structure if you try to break the certificate this way (which - as already established above - is infeasible). So it would only be useful to replace the key with a specific X.
The encoding of the key and certificate that is signed may however contain additional information. This can be abused by an attacker, and it is actually required to perform the actual MD5 attack.
I thought hashes are used in digital signatures mainly for performance reasons, due to the fact that public key cryptography like RSA is slow for large inputs.
No, not just for performance reasons. RSA requires padding, which leaves about 11 bytes less than the key (modulus) size for the message - at least when the older PKCS#1 v1.5 padding is used. The signed attributes would in general be larger than that, so you need to compress the data using a hash function. ECDSA, another public key algorithm, cannot actually encrypt any data at all.
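The size limit is simple arithmetic. A small sketch (the helper name is mine) of how many bytes fit under PKCS#1 v1.5 padding for common RSA key sizes:

```python
def max_direct_payload(modulus_bits: int) -> int:
    """Bytes left for the message after 11 bytes of PKCS#1 v1.5 overhead."""
    return modulus_bits // 8 - 11


for bits in (1024, 2048, 4096):
    print(bits, max_direct_payload(bits))
# 1024 117
# 2048 245
# 4096 501

# A SHA-1 DigestInfo is 15 prefix bytes plus a 20-byte hash.
SHA1_DIGESTINFO_LEN = 15 + 20
print(SHA1_DIGESTINFO_LEN <= max_direct_payload(1024))  # True
```

The hashed form always fits, whatever the size of the certificate contents.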
In general it is better to think of the hash method as a configuration option for the signing algorithm. In other words, it is an integral part of the signature algorithm. This is also how it is specified in the various standards.
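You can see this in the signature algorithm identifiers used in X.509 certificates: each OID names the signing scheme and the hash together. A small lookup sketch follows; the table is a partial selection, and the `WEAK_HASHES` set and helper are my own illustration, not part of any standard.

```python
# OID -> (name, signature scheme, hash). Partial list for illustration.
SIGNATURE_ALGORITHMS = {
    "1.2.840.113549.1.1.4": ("md5WithRSAEncryption", "RSA PKCS#1 v1.5", "MD5"),
    "1.2.840.113549.1.1.5": ("sha1WithRSAEncryption", "RSA PKCS#1 v1.5", "SHA-1"),
    "1.2.840.113549.1.1.11": ("sha256WithRSAEncryption", "RSA PKCS#1 v1.5", "SHA-256"),
    "1.2.840.10045.4.1": ("ecdsa-with-SHA1", "ECDSA", "SHA-1"),
    "1.2.840.10045.4.3.2": ("ecdsa-with-SHA256", "ECDSA", "SHA-256"),
}

WEAK_HASHES = {"MD5", "SHA-1"}  # assumption: the hashes discussed in this answer


def uses_weak_hash(oid: str) -> bool:
    """True if the signature algorithm named by this OID relies on a weak hash."""
    _name, _scheme, hash_name = SIGNATURE_ALGORITHMS[oid]
    return hash_name in WEAK_HASHES


for oid, (name, scheme, hash_name) in SIGNATURE_ALGORITHMS.items():
    print(f"{name}: {scheme} with {hash_name}, weak={uses_weak_hash(oid)}")
```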