Smallest AES implementation for microcontrollers?
I'm wondering how you got 7.5 kB of RAM usage with axTLS. Looking at the code, all the context is stored in this structure:
typedef struct aes_key_st
{
    uint16_t rounds;
    uint16_t key_size;
    uint32_t ks[(AES_MAXROUNDS+1)*8];
    uint8_t iv[AES_IV_SIZE];
} AES_CTX;
The size of this structure is 2 + 2 + 4 * 15 * 8 + 16 = 500 bytes. I see no global variables in aes.c, and the automatic variables are all small, so stack usage is also reasonable. So where does the 7.5 kB go? Perhaps you're trying to use the whole library instead of extracting just the AES implementation from it?
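To double-check that arithmetic on your own toolchain, a minimal sketch like the following can help. The two constants are assumptions read off the sum above (AES_MAXROUNDS = 14, AES_IV_SIZE = 16), not copied from the axTLS headers, and aes_ctx_size is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, chosen to match the "15 * 8" and "16" in the sum above. */
#define AES_MAXROUNDS 14
#define AES_IV_SIZE   16

typedef struct aes_key_st
{
    uint16_t rounds;
    uint16_t key_size;
    uint32_t ks[(AES_MAXROUNDS + 1) * 8];
    uint8_t iv[AES_IV_SIZE];
} AES_CTX;

/* Hypothetical helper: RAM needed for one context, padding included. */
size_t aes_ctx_size(void)
{
    /* The key schedule alone is 15 * 8 words of 4 bytes each. */
    assert(sizeof(((AES_CTX *)0)->ks) == 480);
    return sizeof(AES_CTX);
}
```

Printing aes_ctx_size() on the target shows whether the compiler added any padding; with the usual alignment rules the members pack without gaps, so the result is just the sum of the member sizes.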
Anyway, this implementation looks pretty simple, so I'd rather stick with this code and try to optimize it. I know it can be tricky, but learning the AES details will at least help you estimate the absolute minimum RAM usage.
Update: I've just compiled this library on IA-32 Linux and written a simple CBC AES-128 encryption test. I got the following results (the first hex number after the section name is the section size):
 22 .data         00000028  0804a010  0804a010  00001010  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 23 .bss          00000294  0804a040  0804a040  00001038  2**5
                  ALLOC
That's just 660 bytes of .bss (I've declared AES_CTX as a global variable). Most of .data is occupied by the IV and the key. I don't include .text here, as you'll get a totally different result on a PIC (the data sections should be nearly the same size on both architectures).
I know this question is a bit old, but I've just recently had to research it myself as I'm implementing AES128 on a PIC16 and an 8051, and so I was curious about this question too.
I've used something like this: http://cs.ucsb.edu/~koc/cs178/projects/JT/aes.c and my RAM usage is a couple of hundred bytes, with a binary size of less than 3 kB of ROM.
My best advice is to read the Wikipedia page http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation and understand the different modes, for instance how AES in OFB mode uses ECB mode (the raw block cipher) as its basic building block. Also, the XORing in OFB mode makes it a symmetrical operation, so encryption and decryption are the same function, which saves space: only the encrypt direction of the block cipher is ever needed.
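The point about OFB can be sketched roughly as follows. This is an illustration under assumptions, not code from any of the libraries mentioned here: the names ofb_xcrypt, block_encrypt_fn and toy_block_encrypt are hypothetical, and toy_block_encrypt is only a placeholder standing in for a real AES block encryption so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Any 16-byte block encryption routine; in practice this would be AES. */
typedef void (*block_encrypt_fn)(const uint8_t key[16],
                                 uint8_t block[BLOCK_SIZE]);

/*
 * OFB: repeatedly encrypt the feedback block (starting from the IV) to get
 * a keystream, and XOR that keystream into the data. Running the function
 * again with the same key and IV restores the original bytes, so this one
 * routine both encrypts and decrypts.
 */
void ofb_xcrypt(block_encrypt_fn encrypt, const uint8_t key[16],
                const uint8_t iv[BLOCK_SIZE], uint8_t *data, size_t len)
{
    uint8_t feedback[BLOCK_SIZE];
    size_t i;

    assert(encrypt != NULL);
    memcpy(feedback, iv, BLOCK_SIZE);
    for (i = 0; i < len; i++)
    {
        if (i % BLOCK_SIZE == 0)
            encrypt(key, feedback);          /* next keystream block */
        data[i] ^= feedback[i % BLOCK_SIZE];
    }
}

/* Placeholder so the sketch is self-contained; NOT a real cipher. */
void toy_block_encrypt(const uint8_t key[16], uint8_t block[BLOCK_SIZE])
{
    int i;
    for (i = 0; i < BLOCK_SIZE; i++)
        block[i] = (uint8_t)((block[i] + key[i]) * 7u + (uint8_t)i);
}
```

Calling ofb_xcrypt twice on the same buffer with the same key and IV gives back the plaintext. Because only the forward cipher is called, an OFB-only build can leave out the AES decryption routine and the inverse S-box entirely.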
When I understood how AES really worked, I could implement it in C and then test it against the NIST specification** (do this! much code found online is flawed) and only implement what I absolutely needed.
I was able to fit AES128 on an 8051 alongside some other RF firmware by doing this customization and optimization. The RAM usage (for the whole system) went down from ~2.5 kB to just below 2 kB, meaning we did not have to upgrade to an 8051 with 4 kB of SRAM, but could keep using the cheaper 2 kB SRAM version.
** Test Vectors are in Appendix F in: http://csrc.nist.gov/publications/nistpubs/800-38a/addendum-to-nist_sp800-38A.pdf
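A known-answer test along those lines might look like the sketch below. It uses the first ECB-AES128 encryption block from NIST SP 800-38A, Appendix F.1.1. The function-pointer signature is an assumption about how your implementation is exposed, and aes128_kat_passes and broken_encrypt are hypothetical names; broken_encrypt is a deliberately wrong stand-in that shows what a failure looks like.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed shape: encrypt one 16-byte block in place with a 128-bit key. */
typedef void (*aes128_block_fn)(const uint8_t key[16], uint8_t block[16]);

/* ECB-AES128 encrypt, block #1, NIST SP 800-38A Appendix F.1.1. */
static const uint8_t kat_key[16] = {
    0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6,
    0xab, 0xf7, 0x15, 0x88, 0x09, 0xcf, 0x4f, 0x3c };
static const uint8_t kat_plain[16] = {
    0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96,
    0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a };
static const uint8_t kat_cipher[16] = {
    0x3a, 0xd7, 0x7b, 0xb4, 0x0d, 0x7a, 0x36, 0x60,
    0xa8, 0x9e, 0xca, 0xf3, 0x24, 0x66, 0xef, 0x97 };

/* Returns 1 if the implementation reproduces the known answer, else 0. */
int aes128_kat_passes(aes128_block_fn encrypt)
{
    uint8_t block[16];

    assert(encrypt != NULL);
    memcpy(block, kat_plain, sizeof block);
    encrypt(kat_key, block);
    return memcmp(block, kat_cipher, sizeof block) == 0;
}

/* Deliberately broken stand-in: leaves the block unchanged, so it fails. */
void broken_encrypt(const uint8_t key[16], uint8_t block[16])
{
    (void)key;
    (void)block;
}
```

Pass your own block-encrypt function to aes128_kat_passes in a unit test or at start-up. A single vector only catches gross errors, so it is worth adding the remaining blocks and the vectors for whichever mode you actually ship.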
EDIT:
Finally got the code on Github: https://github.com/kokke/tiny-AES-c
I've optimized a bit for size. GCC size output when compiled for ARM:
$ arm-none-eabi-gcc -O2 -c aes.c -o aes.o
$ size aes.o
   text    data     bss     dec     hex filename
   1024       0     204    1228     4cc aes.o
So the resource usage is now 1 kB of code and 204 bytes of RAM.
I don't remember how to build for the PIC, but if the 8-bit Atmel AVR ATmega16 is anything like the PIC, the resource usage is:
$ avr-gcc -Wall -Wextra -mmcu=atmega16 -O2 -c aes.c -o aes.o
$ avr-size aes.o
   text    data     bss     dec     hex filename
   1553       0     198    1751     6d7 aes.o
So 1.5 kB of code and 198 bytes of RAM.
I recently took the axTLS implementation and worked on shrinking it as much as I could. You can easily generate the S-boxes yourself at startup and save a few hundred bytes of ROM (the two 256-byte tables then live in RAM instead):
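#include <stdint.h>

static uint8_t aes_sbox[256];   /** AES S-box */
static uint8_t aes_isbox[256];  /** AES iS-box */

void AES_generateSBox(void)
{
    uint32_t t[256], i;     /* t[i] = 3^i in GF(2^8); note: 1 KB of stack */
    uint32_t x;

    /* Build the table of powers of the generator 3. */
    for (i = 0, x = 1; i < 256; i++)
    {
        t[i] = x;
        x ^= (x << 1) ^ ((x >> 7) * 0x11B);
    }

    aes_sbox[0] = 0x63;     /* 0 has no multiplicative inverse */
    for (i = 0; i < 255; i++)
    {
        x = t[255 - i];     /* multiplicative inverse of t[i] */
        x |= x << 8;        /* duplicate the byte so shifts act as rotations */
        x ^= (x >> 4) ^ (x >> 5) ^ (x >> 6) ^ (x >> 7);
        aes_sbox[t[i]] = (x ^ 0x63) & 0xFF;
    }

    /* The inverse S-box is the forward table read backwards. */
    for (i = 0; i < 256; i++)
    {
        aes_isbox[aes_sbox[i]] = i;
    }
}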
You can get the full source at: http://ccodeblog.wordpress.com/2012/05/25/aes-implementation-in-300-lines-of-code/
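On a small microcontroller the uint32_t t[256] scratch table in the generator above costs 1 KB of stack. A possible variant, sketched here as my own adaptation rather than the linked author's code, narrows that table to bytes, which is safe because every value stored in it is below 256. The names generate_sboxes, sboxes_look_right and init_sboxes_checked are hypothetical; the spot-check compares a few entries against the S-box table published in FIPS-197.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t sbox[256];   /* forward S-box */
static uint8_t isbox[256];  /* inverse S-box */

/* Same algorithm as above; only the scratch table is narrowed to bytes. */
void generate_sboxes(void)
{
    uint8_t pow3[256];      /* pow3[i] = 3^i in GF(2^8), always < 256 */
    uint32_t i, x;

    for (i = 0, x = 1; i < 256; i++)
    {
        pow3[i] = (uint8_t)x;
        x ^= (x << 1) ^ ((x >> 7) * 0x11B);  /* multiply by 3 and reduce */
    }

    sbox[0] = 0x63;                          /* 0 has no inverse */
    for (i = 0; i < 255; i++)
    {
        x = pow3[255 - i];                   /* inverse of pow3[i] */
        x |= x << 8;                         /* duplicate byte for rotation */
        x ^= (x >> 4) ^ (x >> 5) ^ (x >> 6) ^ (x >> 7);
        sbox[pow3[i]] = (uint8_t)((x ^ 0x63) & 0xFF);
    }

    for (i = 0; i < 256; i++)
        isbox[sbox[i]] = (uint8_t)i;
}

/* Spot-check a few published entries and the round trip; 1 on success. */
int sboxes_look_right(void)
{
    uint32_t i;

    if (sbox[0x00] != 0x63 || sbox[0x01] != 0x7c ||
        sbox[0x53] != 0xed || sbox[0xff] != 0x16)
        return 0;
    for (i = 0; i < 256; i++)
        if (isbox[sbox[i]] != i)
            return 0;
    return 1;
}

/* Convenience wrapper: build the tables and stop early if they are wrong. */
void init_sboxes_checked(void)
{
    generate_sboxes();
    assert(sboxes_look_right());
}
```

Call init_sboxes_checked() once at start-up before any encryption. If only the encrypt direction is used (for example with OFB), the isbox table and its loop can be dropped to save another 256 bytes of RAM.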