Allocating aligned memory


Just a quick note that may be useful to someone else. As you may know, SSE2 introduced a new instruction: MOVDQA (MOVe Double Quadword Aligned). It moves 128 bits (16 bytes) of data between memory and XMM registers. With a memory operand, this instruction only works if the data is aligned on a 16-byte boundary; otherwise it raises a general-protection fault. There is also another instruction for the unaligned case (MOVDQU), but the aligned version is faster on many CPUs, particularly older ones. So let's summarize some techniques to get an aligned memory area:

  • For local, static and member variables you can append __attribute__ ((aligned (16))) to the type definition (this is a GCC extension, also supported by Clang). Example:
    struct A { int val; } __attribute__ ((aligned (16)));
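
To make the effect of the attribute visible, here is a minimal sketch that checks the addresses of a static and a local variable. It assumes GCC or Clang; the helpers is_aligned and check_struct_alignment are hypothetical names made up for this illustration, not part of any library.

```c
#include <stdint.h>

/* GCC/Clang-specific attribute: every object of type struct A starts on a
   16-byte boundary, and sizeof(struct A) is padded up to a multiple of 16. */
struct A { int val; } __attribute__ ((aligned (16)));

/* Hypothetical helper: returns 1 if ptr is a multiple of alignment. */
static int is_aligned(const void *ptr, uintptr_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Hypothetical check: returns 1 if all three objects are 16-byte aligned. */
static int check_struct_alignment(void)
{
    static struct A static_a;                      /* static storage */
    struct A local_a;                              /* automatic (stack) storage */
    int buffer[4] __attribute__ ((aligned (16)));  /* attribute on a variable */

    local_a.val = 0;
    buffer[0] = 0;

    return is_aligned(&static_a, 16)
        && is_aligned(&local_a, 16)
        && is_aligned(buffer, 16);
}
```

As the buffer line shows, the attribute can also be attached to a single variable instead of a type, which is handy for plain arrays that are going to be fed to MOVDQA.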

  • For dynamically allocated memory the usual malloc does not guarantee 16-byte alignment on every platform, but there is posix_memalign, which has the semantics that we need. It is declared in <stdlib.h> as:
    int posix_memalign(void **memptr, size_t alignment, size_t size);

So we have to pass a pointer to the pointer that will receive the newly allocated memory, the required alignment (which has to be a power of two and a multiple of sizeof(void *)) and the allocation size. The function returns 0 on success and an error number otherwise. Memory allocated this way can be freed using the usual free function; POSIX specifies this, so it is not limited to the glibc implementation.
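
Putting the pieces together, a call could look roughly like the sketch below. It assumes a POSIX system; alloc_aligned16 and aligned_alloc_roundtrip are hypothetical helper names used only for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* ask <stdlib.h> to declare posix_memalign */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate size bytes on a 16-byte boundary.
   Returns NULL on failure. posix_memalign reports errors through its
   return value (EINVAL or ENOMEM) rather than through errno. */
static void *alloc_aligned16(size_t size)
{
    void *ptr = NULL;
    if (posix_memalign(&ptr, 16, size) != 0)
        return NULL;
    return ptr;
}

/* Hypothetical round trip: allocate, verify the alignment, use, free.
   Returns 1 if the buffer was 16-byte aligned, 0 otherwise. */
static int aligned_alloc_roundtrip(void)
{
    void *buf = alloc_aligned16(64);
    if (buf == NULL)
        return 0;

    int ok = ((uintptr_t)buf % 16) == 0;
    memset(buf, 0, 64);   /* the memory is used like any malloc'ed block */
    free(buf);            /* released with the ordinary free */
    return ok;
}
```

Wrapping the call in a small helper keeps the slightly awkward out-parameter interface in one place; callers then get a malloc-style pointer-or-NULL result.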
