How to allocate 16byte memory aligned data

The address returned by memalign function is 0x11fe010, which is a multiple of 0x10. So the function is doing a right thing. This also means that your array is properly aligned on a 16-byte boundary. What you are doing later is printing an address of every next element of type float in your array. Since float size is exactly 4 bytes in your case, every next address will be equal to the previous one +4. For instance, 0x11fe010 + 0x4 = 0x11FE014. Of course, address 0x11FE014 is not a multiple of 0x10. If you were to align all floats on 16 byte boundary, then you will have to waste 16 / 4 - 1 bytes per element. Double-check the requirements for the intrinsics that you are using.


The memory you allocate is 16-byte aligned. See:
&A[0] = 0x11fe010
But in an array of float, each element is 4 bytes, so the second is 4-byte aligned.

You can use an array of structures, each containing a single float, with the aligned attribute:

struct x {
    float y;
} __attribute__((aligned(16)));
struct x *A = memalign(...);

AFAIK, both memalign and posix_memalign are doing their job.

&A[0] = 0x11fe010

This is aligned to 16 byte.

&A[1] = 0x11fe014

When you do &A[1] you are telling the compiller to add one position to a float pointer. It will unavoidably lead to:

&A[0] + sizeof( float ) = 0x11fe010 + 4 = 0x11fe014

If you intend to have every element inside your vector aligned to 16 bytes, you should consider declaring an array of structures that are 16 byte wide.

struct float_16byte
{
    float data;
    float padding[ 3 ];
}
    A[ ELEMENT_COUNT ];

Then you must allocate memory for ELEMENT_COUNT (20, in your example) variables:

struct float_16byte *A = ( struct float_16byte * )memalign( 16, ELEMENT_COUNT * sizeof( struct float_16byte ) );

Tags:

C

Memory

Sse

Icc