How exactly HTTPS (ssl) works
What I don't understand is, couldn't a hacker just intercept the public key it sends back to the "customer's browser", and be able to decrypt anything the customer can.
Public/private key encryption is based on modulo arithmetics using prime numbers.
Such asymmetric encryption was only discovered in the mid-1970s. It is credited to Diffie and Hellman, and to Rivest, Shamir and Adleman. (Though, both actually rediscovered things already known by the British secret services.)
The wikipedia page on Diffie-Hellman has a detailed example of a secret key exchange through a public channel. While it does not describe SSL itself, it should be handy to make sense of why knowing a public key doesn't reveal the contents of a message.
You might also find this simple RSA example interesting.
What I don't understand is, couldn't a hacker just intercept the public key it sends back to the "customer's browser", and be able to decrypt anything the customer can.
First, understand that there are generally two steps of HTTPs communication.
1) Generate a shared symmetric key which can only be known between client and server, no one else knows it
2) With this shared symmetric key, client and server is able to safely communicate with each other without worrying about information being intercepted and decrypted by others.
So the question becomes, how can the client and server generate a secret shared key without being known by others in this open internet? This is the asymmetric algorithm coming to play, a demo flow is like below:
-- Client receives public key from server.
-- Client generates a key string "DummySharedKey" which will be later used as shared key, and encrypt it into "7$&^^%####LDD!!@" with server's public key, Man-in-the-middle may have the public key and might be able to intercept the encrypted data, but the data is useless to him as the data can only be decrypted by sever's private key.
-- Server receives the encrypted key string "7$&^^%####LDD!!@", and decrypt it into "DummySharedKey" with its private key
Above key exchange steps makes sure that only Client and Server can know the shared key is "DummySharedKey", no one else knows it.
So it's critical to understand that it is Client's responsibility to generate the shared key, NOT SERVER! (i think this is what confused you)
I also recommend you to take a look at this video which explains HTTPs very well. https://www.youtube.com/watch?v=JCvPnwpWVUQ&list=FLt15cp4Q09BpKVUryBOg2hQ&index=3
I'm studying related topics and read several blogs like how-https-works and how-does-https-work-rsa-encryption-explained/.
I have summarized a sequence diagram based on my study and hope it can be helpful to someone who comes to this thread.
For more details, you can check the blog mentioned.
Why is HTTPS required?
To verify whether the website is authenticated/certified or not (uncertified websites can do evil things). An authenticated website has a unique personal certificate purchased from one of the CA’s.
Who are CA’s (Certificate Authorities)?
CA’s are globally trusted companies like GoDaddy, GeoTrust, VeriSign etc who provide digital certificates to the websites.
What are public keys and private keys?
Keys are nothing but long random numbers used to encrypt/decrypt data.
Public keys are keys which can be shared with others. Private keys are meant to be kept private.
Suppose Jerry generates a private key and public key. He makes many copies of that public key and shares with others.
Now, others can only encrypt the data using the public key and that data can only be decrypted by the private key of Jerry.
Another approach is to use public keys to only decrypt the data and private keys to only encrypt the data.
How does a company get a certificate?
Website owner first generates a public key and private key, keeping the private key secret.
He gives a Certificate Signing Request file (CSR) and his public key to the CA.
CA then creates a personal certificate based on CSR including domain name, owner name, expiry date, serial no. etc and also adds an encrypted text (= digital signature) to the certificate and finally encrypts the whole certificate with the public key of the server and sends it back to the website owner.
This certificate is then decrypted with the private key of the website owner and finally, he installs it on the website.
Note: That encrypted text is the digital signature of the CA. That text is encrypted by the private key of the CA and can only be decrypted by a public key of CA.
When you install your operating system or Browser, root-certificates from many trusted CA's like GeoTrust, VeriSign, GoDaddy etc. come with it. These root-certificates contain the public key of that CA provider which helps decrypt the signature.
HTTPS security can be split into 2 parts (Handshakes):
1. To validate the certificate of a website:
1) When you enter the URL www.Google.com, Google’s server gives its public key and certificate (which was signed by GeoTrust) to the Browser.
2) Now browser has to verify the authenticity of the certificate i.e. it’s actually signed from GeoTrust or not. As browsers come with a pre-installed list of public keys from all the major CA’s, it picks the public key of the GeoTrust and tries to decrypt the digital signature of the certificate which was encrypted by the private key of GeoTrust.
3) If it’s able to decrypt the signature (which means it’s a trustworthy website) then it proceeds to the next step else it stops and shows a red cross before the URL.
2. To create a secure connection (encrypts outgoing and incoming data) so that no one else can read it:
1) As I mentioned, Google sends its public key when you enter www.Google.com . Any data encrypted with this public key can only be decrypted by Google’s private key which Google doesn’t share with anyone.
2) After validating the certificate, browser creates a new key let’s call it Session Key and make 2 copies of it. These keys can encrypt as well as decrypt the data.
3) The browser then encrypts (1 copy of session key + other request data) with the Google's public key . Then it sends it back to the Google server.
4) Google’s server decrypts the encrypted data using its private key and gets the session key , and other request data.
Now, see, server and browser both have got the same copies of session key of the browser. No one else has this key, therefore, only server and browser can encrypt and decrypt the data. This key will now be used for both to decrypt and to encrypt the data.
5) When Google sends the data like requested HTML document and other HTTP data to the browser it first encrypts the data with this session key and browser decrypts the data with the other copy of the session key.
6) Similarly, when browser sends the data to the Google server it encrypts it with the session key which server decrypts on the other side.
Note: This session key is only used for that session only. If the user closes the website and opens again, a new session key would be created.
can't get enough of web? behind the scenes when u type www.google.com in browser