SIMD, which stands for “Single Instruction Multiple Data,” is a set of special operations supported by some processors to perform a single operation on several numbers (usually 2 or 4) simultaneously. SIMD floating-point instructions are available on several popular CPUs: SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision) on some PowerPCs (Apple G4 and higher), and MIPS Paired Single (currently only in FFTW 3.2.x). FFTW can be compiled to support the SIMD instructions on any of these systems.
A program linking to an FFTW library compiled with SIMD support can
obtain a nonnegligible speedup for most complex and r2c/c2r
transforms. In order to obtain this speedup, however, the arrays of
complex (or real) data passed to FFTW must be specially aligned in
memory (typically 16-byte aligned), and often this alignment is more
stringent than that provided by the usual malloc (etc.) allocation
routines.
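To make the alignment requirement concrete, here is a minimal sketch of how one might test whether a pointer is aligned to a given boundary. The helper name is_aligned is hypothetical and not part of FFTW, and 16 bytes is only the typical value mentioned above; the actual requirement depends on the SIMD instruction set in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFTW): returns 1 if the address p is a
   multiple of `alignment` bytes, 0 otherwise.
   `alignment` must be a power of two (e.g. 16). */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

Calling is_aligned(ptr, 16) on a pointer returned by plain malloc may or may not succeed, depending on the platform's allocator, which is why malloc alone cannot be relied upon here.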
Therefore, to guarantee proper alignment for SIMD in case your program
is ever linked against a SIMD-enabled FFTW, we recommend allocating
your transform data with fftw_malloc and deallocating it with
fftw_free.
These have exactly the same interface and behavior as malloc/free,
except that for a SIMD FFTW they ensure that the returned pointer has
the necessary alignment (by calling memalign or its equivalent on
your OS).
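As an illustration of what "memalign or its equivalent" can look like, the following sketch obtains 16-byte-aligned memory on a POSIX system with posix_memalign. It is not FFTW's actual implementation: the function name aligned_malloc16 is hypothetical, the 16-byte boundary is an assumption taken from the typical value above, and non-POSIX systems need a different call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign in <stdlib.h> */
#endif
#include <stdlib.h>

/* Hypothetical helper, not FFTW's implementation: allocate `size` bytes
   whose address is a multiple of 16. Returns NULL on failure.
   Memory from posix_memalign is released with the ordinary free(). */
static void *aligned_malloc16(size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, size) != 0)
        return NULL;
    return p;
}
```

In a program that uses FFTW, fftw_malloc and fftw_free remain the recommended pair, since the library picks the alignment appropriate to the way it was compiled.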
You are not required to use fftw_malloc. You can allocate your data
in any way that you like, from malloc to new (in C++) to a fixed-size
array declaration. If the array happens not to be properly aligned,
FFTW will not use the SIMD extensions.
Since fftw_malloc only ever needs to be used for real and complex
arrays, we provide two convenient wrapper routines,
fftw_alloc_real(N) and fftw_alloc_complex(N), that are equivalent to
(double*)fftw_malloc(sizeof(double) * N) and
(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N), respectively
(or their equivalents in other precisions).