SHA, RSA and the relation between them
RSA is actually two algorithms, one for asymmetric encryption, and one for digital signatures (the signature algorithm is traditionally -- but incorrectly -- described as "encryption with the private key" and this is an endless source of confusion).
Asymmetric encryption uses keys. Keys are parameters to the algorithm; the algorithm itself is the same for everybody (in software terms, it is the executable file) while keys vary between users. In a key pair, the public key is the key which is used to encrypt data (convert a piece of data, i.e. a sequence of bytes, into another sequence of bytes which is unfathomable for everybody) while the private key is the key which allows one to decrypt data (i.e. reverse the encryption).
Whereas in symmetric encryption, the encryption and decryption keys are identical, but with asymmetric encryption, the encryption and decryption keys are distinct from each other (hence the name); they are mathematically linked together, but it should be unfeasible (i.e. too hard to do with a mere bunch of computers) to recover the decryption key from the encryption key. This is why the encryption key can be made public while the decryption key is kept private: revealing the public key does not reveal the private key.
What asymmetric encryption achieves is no trivial feat. The possibility to reveal the public key while not saying too much about the private key, but such that both keys work together (what is encrypted with the public key can be decrypted by the corresponding private key, but none other), requires a lot of mathematics! RSA is full of mathematics. This contrasts with symmetric encryption algorithms which are "just" ways to make a big mess of data by mixing bits together.
Asymmetric encryption is the natural tool to use when we want to allow for confidential transmissions between any two users among a big population. If you have 1000 users, and you want any of the two users to be able to exchange data with each other without allowing anybody to spy on them (including the 998 other users), then the classical solution would be to distribute keys for symmetric encryption to every pair of users. Alice and Bob would have a known, common key; Alice and Charlie would also have a shared key (not the same); and so would Bob and Charlie; and so on. Each user would need to remember his "shared key" with every other one of the 999 other users, and you would have 499500 keys in total. Adding a 1001th user would involve creating 1000 additional symmetric keys, and give one to each of the 1000 existing users. The whole key distribution soon turns into an unusable/infeasible nightmare. With asymmetric encryption though, things are much more straightforward in terms of key distribution: every user just has to remember his/her own private key; and the public keys(being public) can be distributed through some sort of broadcasting (e.g. a directory).
RSA has some operational constraints. With the most used variant (the one known as PKCS#1 v1.5), if the size of the RSA key is "1024 bits" (meaning that the central mathematical component of the key pair is a 1024-bit integer), then RSA can encrypt a message of up to 117 bytes in length, and yield an encrypted message of length 128 bytes. That limited size, and the size increase when encrypting, are unavoidable consequences of the mathematical structure of the RSA encryption process. Due to these constraints, we do not usually encrypt data directly with RSA; instead, we select a small sequence of random bytes, which we call session key. We encrypt the session key with RSA; and then we use the session key with a symmetric encryption algorithm to process the whole message. This is called hybrid encryption.
SHA is the common name for a family of cryptographic hash functions. The very first member of that family was described under the name 'SHA' but was soon deprecated after a serious weakness was found in it; a fixed version was published under the name SHA-1
(the weak version is colloquially known as SHA-0
). Four new SHA-like functions were added to the family later on (SHA-224
, SHA-256
, SHA-384
and SHA-512
: which are collectively known as 'SHA-2').
Hash functions have no key. A hash function is an executable algorithm which is pure code. There is one SHA-1 and everybody uses the same.
Hash functions "just" make a big mess of the input data, which is not meant to be unraveled. Actually, it is meant to be resilient to unraveling. Even though everybody knows all that is to be known about a hash function (there is no key, only code, and nothing of it is secret), it still turns out to be "too hard" to recompute a matching input message, given the hash function output. It is even unfeasible to find two distinct input messages which, when given to the hash function, yield the same output; there must exist such pairs of messages -- called collisions -- because a hash function output has a fixed small size, while accepted inputs can be widely larger, so there are more possible inputs than possible outputs. It is a mathematical certainty that collisions exist for every hash function, but actually finding one is another matter.
A hash function, as itself, does not do anything of immediate high value, but it is a very important building block for other algorithms. For instance, they are used with digital signatures. A digital signature "proves" conscious action of a designated signer over a piece of data; like asymmetric encryption, this involves key pairs and mathematics, and associated constraints on the signed data. A hash function h is such that signing h(m) is as good as signing m itself: since it is unfeasible to find two distinct messages which hash to the same value, approval of the hash output is good enough. The point being that the output of the hash function is small enough to be usable with the mathematics hidden in the signature algorithm, even if the message itself is big (SHA-1 can process gigabytes of data, and yield a 20-byte output).
It can be noted that some recent variants of RSA-the-encryption-algorithm (with the 'OAEP padding' from PKCS#1 v2.0) internally uses hash functions. Hash functions are good "randomizers" (the output of a hash function does not exhibit recognizable structure) and this makes them appropriate for building more elaborate cryptographic algorithms with good security features.
In SSL/TLS (HTTPS is just HTTP-within-a-SSL/TLS-tunnel), hash functions are used for several things:
- as part of asymmetric encryption and/or digital signatures;
- as part of HMAC to allow client and server to verify that exchanged data has not been altered in transit;
- as a building brick for a Key Derivation Function, which "expands" a given session key into several symmetric keys used for symmetric encryption and integrity checks in both directions of the tunnel.
The KDF relies on the "randomizing" and non-invertibility of the hash function. In SSL/TLS up to TLS 1.1, the KDF is built over two hash functions, MD5 and SHA-1, in an attempt to make it robust even if weaknesses were later found in either MD5 or SHA-1. It turns out that weaknesses were found in both, but it did not allow any break on the KDF as used in SSL/TLS. Nevertheless, TLS 1.2 switched to another KDF which uses a single, configurable hash function, usually SHA-256, for which no weakness is currently known.
SHA is not used in RSA.
However, cryptographic protocols like SSL, SSH and others, use different algorithms like SHA and RSA for different purposes. SSL uses RSA (encryption) or DH (with RSA, DSA or ECDSA signature) for key negotiation and AES or 3DES for data encryption. In the PGP protocol/file format, RSA, DSA and ElGamal are used for signing and encrypting.