MPI Plan Creation

Next: MPI Wisdom Communication, Previous: MPI Data Distribution Functions, Up: FFTW MPI Reference

6.12.5 MPI Plan Creation

Complex-data MPI DFTs

Plans for complex-data DFTs (see 2d MPI example) are created by:

     fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
                                    MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
                                    fftw_complex *in, fftw_complex *out,
                                    MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                    fftw_complex *in, fftw_complex *out,
                                    MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
                                 fftw_complex *in, fftw_complex *out,
                                 MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
                                      ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
                                      fftw_complex *in, fftw_complex *out,
                                      MPI_Comm comm, int sign, unsigned flags);

These are similar to their serial counterparts (see Complex DFTs) in specifying the dimensions, sign, and flags of the transform. The comm argument gives an MPI communicator that specifies the set of processes to participate in the transform; plan creation is a collective function that must be called for all processes in the communicator. The in and out pointers refer only to a portion of the overall transform data (see MPI Data Distribution) as specified by the ‘local_size’ functions in the previous section. Unless flags contains FFTW_ESTIMATE, these arrays are overwritten during plan creation as for the serial interface. For multi-dimensional transforms, any dimensions > 1 are supported; for one-dimensional transforms, only composite (non-prime) n0 are currently supported (unlike the serial FFTW). Requesting an unsupported transform size will yield a NULL plan. (As in the serial interface, highly composite sizes generally yield the best performance.)

The advanced-interface fftw_mpi_plan_many_dft additionally allows you to specify the block sizes for the first dimension (block) of the n₀ × n₁ × n₂ × … × n_d-1 input data and the first dimension (tblock) of the n₁ × n₀ × n₂ ×…× n_d-1 transposed data (at intermediate steps of the transform, and for the output if FFTW_TRANSPOSED_OUT is specified in flags). These must be the same block sizes as were passed to the corresponding ‘local_size’ function; you can pass FFTW_MPI_DEFAULT_BLOCK to use FFTW's default block size as in the basic interface. Also, the howmany parameter specifies that the transform is of contiguous howmany-tuples rather than individual complex numbers; this corresponds to the same parameter in the serial advanced interface (see Advanced Complex DFTs) with stride = howmany and dist = 1.

MPI flags

The flags can be any of those for the serial FFTW (see Planner Flags), and in addition may include one or more of the following MPI-specific flags, which improve performance at the cost of changing the output or input data formats.

FFTW_MPI_SCRAMBLED_OUT, FFTW_MPI_SCRAMBLED_IN: valid for 1d transforms only, these flags indicate that the output/input of the transform are in an undocumented “scrambled” order. A forward FFTW_MPI_SCRAMBLED_OUT transform can be inverted by a backward FFTW_MPI_SCRAMBLED_IN (times the usual 1/N normalization). See One-dimensional distributions.
FFTW_MPI_TRANSPOSED_OUT, FFTW_MPI_TRANSPOSED_IN: valid for multidimensional (rnk > 1) transforms only, these flags specify that the output or input of an n₀ × n₁ × n₂ × … × n_d-1 transform is transposed to n₁ × n₀ × n₂ ×…× n_d-1. See Transposed distributions.

Real-data MPI DFTs

Plans for real-input/output (r2c/c2r) DFTs (see Multi-dimensional MPI DFTs of Real Data) are created by:

     fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
                                        double *in, fftw_complex *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
                                        double *in, fftw_complex *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                        double *in, fftw_complex *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
                                     double *in, fftw_complex *out,
                                     MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
                                        fftw_complex *in, double *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
                                        fftw_complex *in, double *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                        fftw_complex *in, double *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
                                     fftw_complex *in, double *out,
                                     MPI_Comm comm, unsigned flags);

Similar to the serial interface (see Real-data DFTs), these transform logically n₀ × n₁ × n₂ × … × n_d-1 real data to/from n₀ × n₁ × n₂ × … × (n_d-1/2 + 1) complex data, representing the non-redundant half of the conjugate-symmetry output of a real-input DFT (see Multi-dimensional Transforms). However, the real array must be stored within a padded n₀ × n₁ × n₂ × … × [2 (n_d-1/2 + 1)]

array (much like the in-place serial r2c transforms, but here for out-of-place transforms as well). Currently, only multi-dimensional (rnk > 1) r2c/c2r transforms are supported (requesting a plan for rnk = 1 will yield NULL). As explained above (see Multi-dimensional MPI DFTs of Real Data), the data distribution of both the real and complex arrays is given by the ‘local_size’ function called for the dimensions of the complex array. Similar to the other planning functions, the input and output arrays are overwritten when the plan is created except in FFTW_ESTIMATE mode.

As for the complex DFTs above, there is an advance interface that allows you to manually specify block sizes and to transform contiguous howmany-tuples of real/complex numbers:

     fftw_plan fftw_mpi_plan_many_dft_r2c
                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                    ptrdiff_t iblock, ptrdiff_t oblock,
                    double *in, fftw_complex *out,
                    MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_many_dft_c2r
                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                    ptrdiff_t iblock, ptrdiff_t oblock,
                    fftw_complex *in, double *out,
                    MPI_Comm comm, unsigned flags);

MPI r2r transforms

There are corresponding plan-creation routines for r2r transforms (see More DFTs of Real Data), currently supporting multidimensional (rnk > 1) transforms only (rnk = 1 will yield a NULL plan):

     fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
                                    double *in, double *out,
                                    MPI_Comm comm,
                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1,
                                    unsigned flags);
     fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                    double *in, double *out,
                                    MPI_Comm comm,
                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
                                    unsigned flags);
     fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
                                 double *in, double *out,
                                 MPI_Comm comm, const fftw_r2r_kind *kind,
                                 unsigned flags);
     fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n,
                                      ptrdiff_t iblock, ptrdiff_t oblock,
                                      double *in, double *out,
                                      MPI_Comm comm, const fftw_r2r_kind *kind,
                                      unsigned flags);

The parameters are much the same as for the complex DFTs above, except that the arrays are of real numbers (and hence the outputs of the ‘local_size’ data-distribution functions should be interpreted as counts of real rather than complex numbers). Also, the kind parameters specify the r2r kinds along each dimension as for the serial interface (see Real-to-Real Transform Kinds). See Other Multi-dimensional Real-data MPI Transforms.

MPI transposition

FFTW also provides routines to plan a transpose of a distributed n0 by n1 array of real numbers, or an array of howmany-tuples of real numbers with specified block sizes (see FFTW MPI Transposes):

     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                       double *in, double *out,
                                       MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_many_transpose
                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      ptrdiff_t block0, ptrdiff_t block1,
                      double *in, double *out, MPI_Comm comm, unsigned flags);

These plans are used with the fftw_mpi_execute_r2r new-array execute function (see Using MPI Plans), since they count as (rank zero) r2r plans from FFTW's perspective.