Computationally simple, lightweight replacement for SSL/TLS
Edit: after some effort, I re-implemented a RAM-efficient SSL library that can run in the amounts of RAM indicated below. It has many more features and much more flexibility than my previous creations, and yet it is still very small. More importantly, it is also open source (MIT license). Enjoy: https://www.bearssl.org/
It is possible to implement an SSL/TLS client (or server) in about 21 kB of ARM code (Thumb), requiring less than 20 kB of RAM when running(*). I know it can be done because I did it (sorry, not open source). Most of the complexity of TLS comes from its support for many kinds of cryptographic algorithms, which are negotiated during the initial handshake; if you concentrate on only one set of cryptographic algorithms, then you can strip the code down to something quite small. I recommend using TLS 1.2 with the TLS_RSA_WITH_AES_128_CBC_SHA256 cipher suite: for that one, you will only need implementations of RSA, AES and SHA-256 (for TLS 1.1 and earlier, you would also need implementations of both MD5 and SHA-1, which is not hard but will cost a few extra kB of code). Also, you can make it synchronous (in plain TLS, client and server may speak simultaneously, but nothing forces them to do so) and omit the "handshake renegotiation" part (client and server perform an initial handshake, and TLS lets them redo it later during the connection, but you do not have to support that).
The trickiest part of the protocol implementation is the certificates. The server and the client authenticate each other by using their respective private keys -- with RSA, the server performs an RSA decryption, while the client computes an RSA signature. This provides authentication as long as client and server know each other's public keys; therefore, they send their public keys to each other, wrapped in certificates, which are signed blobs. A certificate must be validated before use, i.e. its signature must be verified against an a priori known public key (often called "root CA" or "trust anchor"). The client cannot blindly use the public key that the server just sent, because that would allow man-in-the-middle attacks.
X.509 certificate parsing and validation is a bit complex (in my implementation, it was 6 kB of code, out of the 21 kB). Depending on your setup, you may have lighter options; for instance, if you can hardcode the server public key in the client, then the client can simply use that key and throw away the server certificate, which is "just a blob": no need for parsing, no certification, very robust protocol. You could also define your own "certificate" format. Another possibility is to use SRP, which is a key exchange mechanism where both parties authenticate each other with regards to the knowledge of a shared secret value (the magic of SRP is that it is robust even if the shared secret has relatively low entropy, e.g. is a password); use TLS_SRP_SHA_WITH_AES_128_CBC_SHA.
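To illustrate the "just a blob" idea, here is a minimal Python sketch of a closely related variant: rather than storing the whole server key, the client stores only a digest of it and checks the received blob against that digest without parsing it. This is pinning by digest, not exactly the scheme described above, and the names `PINNED_KEY_SHA256` and `server_key_is_pinned` as well as the placeholder key bytes are made up for the example.

```python
import hashlib
import hmac

# Hypothetical pinned value: the SHA-256 digest of the server's public key
# blob, computed once at build time and compiled into the client.
# The byte string below is a placeholder, not a real key.
PINNED_KEY_SHA256 = hashlib.sha256(b"example server public key bytes").digest()


def server_key_is_pinned(received_key_blob: bytes,
                         pinned_digest: bytes = PINNED_KEY_SHA256) -> bool:
    """Return True only if the received blob hashes to the pinned digest.

    The blob is treated as opaque bytes: no X.509 parsing, no chain building.
    """
    digest = hashlib.sha256(received_key_blob).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(digest, pinned_digest)
```

The price of this shortcut is that changing the server key means updating every client, which is usually acceptable only when you control both ends.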
The point here is that even with a custom protocol, you will not get something really lighter than a stripped-down TLS, at least if you want to keep it robust. And designing a robust protocol is not easy at all; TLS got to the point of being considered adequately secure through years of blood and tears. So it is really better to reuse TLS than to invent your own protocol. Also, this makes the code much easier to test (you can interoperate with existing SSL/TLS implementations).
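As a sketch of that kind of interoperability testing, the desktop side can be an ordinary TLS stack restricted to the one suite the small implementation supports. The example below uses Python's standard `ssl` module; `AES128-SHA256` is OpenSSL's name for TLS_RSA_WITH_AES_128_CBC_SHA256, the helper name `single_suite_context` is made up, and it assumes the local OpenSSL build still offers that suite (`set_ciphers` raises `ssl.SSLError` if it does not).

```python
import ssl

# OpenSSL's name for TLS_RSA_WITH_AES_128_CBC_SHA256.
SUITE = "AES128-SHA256"


def single_suite_context(server_side: bool = False) -> ssl.SSLContext:
    """Build a context that speaks only TLS 1.2 with the single suite above."""
    protocol = ssl.PROTOCOL_TLS_SERVER if server_side else ssl.PROTOCOL_TLS_CLIENT
    ctx = ssl.SSLContext(protocol)
    # Pin the protocol version so that no other version gets negotiated.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    # Raises ssl.SSLError if the local OpenSSL build cannot select this suite.
    ctx.set_ciphers(SUITE)
    return ctx
```

A server-side context would still need `load_cert_chain()` with a test certificate and key, and a client-side one needs its trust anchor loaded, before either can complete a handshake.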
(*) Out of the 20 kB of RAM, there is a 16.5 kB buffer for incoming "records", because TLS states that records may reach that size. If you control both client and server code, you can arrange for a smaller maximum record size, thus saving on the RAM requirements. Per-record overhead is not much -- less than 50 bytes on average -- so you could use 4 kB records and still have efficient communication.
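The per-record overhead can be estimated with a few lines of arithmetic. This sketch assumes the TLS 1.1/1.2 CBC record layout (5-byte header, explicit IV of one block, MAC, then the minimal padding allowed) and AES with 16-byte blocks; the function name `cbc_record_overhead` is made up for the example.

```python
def cbc_record_overhead(plaintext_len: int, mac_len: int = 32,
                        block_len: int = 16) -> int:
    """Bytes added to one record by a CBC cipher suite in TLS 1.1/1.2.

    mac_len is 32 for HMAC/SHA-256 suites and 20 for HMAC/SHA-1 suites.
    """
    header_len = 5               # type (1) + version (2) + length (2)
    explicit_iv_len = block_len  # one fresh IV per record
    # Padding, including the padding-length byte, is 1..block_len bytes and
    # brings plaintext + MAC up to a multiple of the block size.
    padding_len = block_len - (plaintext_len + mac_len) % block_len
    return header_len + explicit_iv_len + mac_len + padding_len


for size in (512, 4096, 16384):
    extra = cbc_record_overhead(size)
    print(f"{size:6d}-byte record: {extra} bytes overhead "
          f"({100 * extra / size:.2f}%)")
```

Under these assumptions the overhead is a few tens of bytes regardless of record size, and stays under 2% of a 4096-byte record for both a 20-byte and a 32-byte MAC.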
TLS still seems like the right tool for the job: just pick the ciphers wisely, and then profile which cipher implementations work well on your platform. Also, the official implementations in crypto libraries tend to be horribly slow compared to hand-optimized code, such as the algorithms from John the Ripper, so maybe you could use those. Have you tried MatrixSSL yet?
Whatever you do, I would go with an existing protocol that has been around for a while and strip away or hardcode some options to make it light, rather than invent your own protocol.
If your needs are simple enough that you could get by with preshared symmetric keys, then adding a subset of IPsec to the TCP/IP stack seems like it might be doable without a huge code footprint. It might even be possible to maintain interoperability.
Similarly, a stripped-down SSL or Kerberos is probably an option, as you won't need most of the fancier authentication and key management aspects, nor most of the ciphers.