Contains the main part of the sequential routines for Toeplitz algebra. More...

Functions
int	print_error_message (int error_number, char const *file, int line)
	Prints error message corresponding to an error number. More...

int	define_blocksize (int n, int lambda, int bs_flag, int fixed_bs)
	Defines an optimal size of the block used in the sliding windows algorithm. More...

int	define_nfft (int n_thread, int flag_nfft, int fixed_nfft)

int	tpltz_init (int n, int lambda, int nfft, int blocksize, fftw_complex *T_fft, double T, fftw_complex V_fft, double V_rfft, fftw_plan plan_f, fftw_plan plan_b, Flag flag_stgy)

int	fftw_init_omp_threads (int fftw_n_thread)
	Initialize omp threads for fftw plans. More...

int	rhs_init_fftw (const int nfft, int fft_size, fftw_complex V_fft, double V_rfft, fftw_plan plan_f, fftw_plan *plan_b, int fftw_flag)
	Initializes fftw array and plan for the right hand side, general matrix V. More...

int	circ_init_fftw (const double T, int fft_size, int lambda, fftw_complex *T_fft)

int	tpltz_cleanup (fftw_complex T_fft, fftw_complex V_fft, double *V_rfft, fftw_plan plan_f, fftw_plan *plan_b)

int	copy_block (int ninrow, int nincol, double Vin, int noutrow, int noutcol, double Vout, int inrow, int incol, int nblockrow, int nblockcol, int outrow, int outcol, double norm, int set_zero_flag)

int	scmm_direct (int fft_size, int nfft, fftw_complex C_fft, int ncol, double V_rfft, double *CV, fftw_complex V_fft, fftw_plan plan_f_V, fftw_plan plan_b_CV)

int	scmm_basic (double *V, int blocksize, int m, fftw_complex C_fft, double *CV, fftw_complex V_fft, double *V_rfft, int nfft, fftw_plan plan_f_V, fftw_plan plan_b_CV)

int	stmm_core (double *V, int n, int m, double T, fftw_complex T_fft, int blocksize, int lambda, fftw_complex V_fft, double *V_rfft, int nfft, fftw_plan plan_f, fftw_plan plan_b, int flag_offset, int flag_nofft)

int	stmm_main (double *V, int n, int m, int id0, int l, double T, fftw_complex T_fft, int lambda, fftw_complex V_fft, double *V_rfft, fftw_plan plan_f, fftw_plan plan_b, int blocksize, int nfft, Flag flag_stgy)

int	mpi_stmm (double *V, int n, int m, int id0, int l, double T, int lambda, Flag flag_stgy, MPI_Comm comm)

Variables
int	VERBOSE
	Verbose mode. More...

int	VERBOSE_FIRSTINIT = 1

int	PRINT_RANK = -1

Detailed Description

Contains the main part of the sequential routines for Toeplitz algebra.

version 1.2b, November 2012

Author: Frederic Dauvergne, Maude Le Jeune, Antoine Rogier, Radek Stompor

Project: Midapack library, ANR MIDAS'09 - Toeplitz Algebra module Purpose: Provide Toeplitz algebra tools suitable for Cosmic Microwave Background (CMB) data analysis.

Note: Copyright (c) 2010-2012 APC CNRS Université Paris Diderot; This program is free software; you can redistribute it and/or modify it under the terms; of the GNU Lesser General Public License as published by the Free Software Foundation;; either version 3 of the License, or (at your option) any later version. This program is; distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even; the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU; Lesser General Public License for more details.; You should have received a copy of the GNU Lesser General Public License along with this; program; if not, see http://www.gnu.org/licenses/lgpl.html; For more information about ANR MIDAS'09 project see :; http://www.apc.univ-paris7.fr/APC_CS/Recherche/Adamis/MIDAS09/index.html; ACKNOWLEDGMENT: This work has been supported in part by the French National Research; Agency (ANR) through COSINUS program (project MIDAS no. ANR-09-COSI-009).

Log: toeplitz*.c

Revision 1.0b 2012/05/07 Frederic Dauvergne (APC) Official release 1.0beta. The first installement of the library is the Toeplitz algebra module.

Revision 1.1b 2012/07/- Frederic Dauvergne (APC)

mpi_stbmm allows now rowi-wise order per process datas and no-blocking communications.
OMP improvment for optimal cpu time.
bug fixed for OMP in the stmm_basic routine.
distcorrmin is used to communicate only lambda-1 datas when it is needed.
new reshaping routines using transformation functions in stmm. Thus, only one copy at most is needed.
tpltz_init improvement using define_nfft and define_blocksize routines.
add Block struture to define each Toeplitz block.
add Flag structure and preprocessing parameters to define the computational strategy. All the flag parameters are then available directly from the API.

Revision 1.2b 2012/11/30 Frederic Dauvergne (APC)

extend the mpi product routine to rowwise order data distribution. This is now allowing tree kinds of distribution.
add int64 for some variables to extend the global volume of data you can use.
Openmp improvments.
Add toeplitz_wizard.c, which contains a set of easy to use routines with defined structures.

Definition in file toeplitz.c.

Function Documentation

◆ print_error_message()

int print_error_message	(	int	error_number,
		char const *	file,
		int	line
	)

Prints error message corresponding to an error number.

Parameters

error_number	error number
file	file name
line	line number

Definition at line 127 of file toeplitz.c.

◆ define_blocksize()

int define_blocksize	(	int	n,
		int	lambda,
		int	bs_flag,
		int	fixed_bs
	)

Defines an optimal size of the block used in the sliding windows algorithm.

The optimal block size is computed as the minimum power of two above 3*lambda, i.e. the smallest value equal to 2^x, where x is an integer, and above 3*lambda. If bs_flag is set to one, a different formula is used to compute the optimal block size (see MADmap: A MASSIVELY PARALLEL MAXIMUM LIKELIHOOD COSMIC MICROWAVE BACKGROUND MAP-MAKER, C. M. Cantalupo, J. D. Borrill, A. H. Jaffe, T. S. Kisner, and R. Stompor, The Astrophysical Journal Supplement Series, 187:212–227, 2010 March). To avoid using block size much bigger than the matrix, the block size is set to 3*lambda when his previous computed size is bigger than the matrix size n. This case append mostly for small matrix compared to his bandwith.

Parameters

n	matrix row dimension
lambda	half bandwidth of the Toeplitz matrix
bs_flag	flag to use a different formula for optimal block size computation
fixed_bs	fixed blocksize value if needed

Definition at line 168 of file toeplitz.c.

◆ define_nfft()

int define_nfft	(	int	n_thread,
		int	flag_nfft,
		int	fixed_nfft
	)

Defines the number of simultaneous ffts for the Toeplitz matrix product computation.

Parameters

n_thread	number of omp threads
flag_nfft	flag to set the strategy to define nfft
fixed_nfft	fixed nfft value if nedeed (used for the case where flag_nfft=1)

Definition at line 266 of file toeplitz.c.

◆ tpltz_init()

int tpltz_init	(	int	n,
		int	lambda,
		int *	nfft,
		int *	blocksize,
		fftw_complex **	T_fft,
		double *	T,
		fftw_complex **	V_fft,
		double **	V_rfft,
		fftw_plan *	plan_f,
		fftw_plan *	plan_b,
		Flag	flag_stgy
	)

Sets a block size and initializes all fftw arrays and plans needed for the computation.

Initializes the fftw arrays and plans is necessary before any computation of the Toeplitz matrix matrix product. Use tpltz_cleanup afterwards.

See also: tpltz_cleanup

Parameters

n	row size of the matrix used for later product
lambda	Toeplitz band width
nfft	maximum number of FFTs you want to compute at the same time
blocksize	optimal block size used in the sliding window algorithm to compute an optimize value)
T_fft	complex array used for FFTs
T	Toeplitz matrix
V_fft	complex array used for FFTs
V_rfft	real array used for FFTs
plan_f	fftw plan forward (r2c)
plan_b	fftw plan backward (c2r)

Definition at line 298 of file toeplitz.c.

◆ fftw_init_omp_threads()

int fftw_init_omp_threads ( int fftw_n_thread )

Initialize omp threads for fftw plans.

Initialize omp threads for fftw plans. The number of threads used for ffts (define by the variable n_thread) is read from OMP_NUM_THREAD environment variable. fftw multithreaded option is controlled by fftw_MULTITHREADING macro.

Definition at line 379 of file toeplitz.c.

◆ rhs_init_fftw()

int rhs_init_fftw	(	const int *	nfft,
		int	fft_size,
		fftw_complex **	V_fft,
		double **	V_rfft,
		fftw_plan *	plan_f,
		fftw_plan *	plan_b,
		int	fftw_flag
	)

Initializes fftw array and plan for the right hand side, general matrix V.

Initialize fftw array and plan for the right hand side matrix V.

Parameters

nfft	maximum number of FFTs you want to compute at the same time
fft_size	effective FFT size for the general matrix V (usually equal to blocksize)
V_fft	complex array used for FFTs
V_rfft	real array used for FFTs
plan_f	fftw plan forward (r2c)
plan_b	fftw plan backward (c2r)
fftw_flag	fftw plan allocation flag

Definition at line 408 of file toeplitz.c.

◆ circ_init_fftw()

int circ_init_fftw	(	const double *	T,
		int	fft_size,
		int	lambda,
		fftw_complex **	T_fft
	)

Initializes fftw array and plan for the circulant matrix T_circ obtained from T.

Builds the circulant matrix T_circ from T and initilizes its fftw arrays and plans. Use tpltz_cleanup afterwards.

See also: tpltz_cleanup

Parameters

T	Toeplitz matrix.
fft_size	effective FFT size for the circulant matrix (usually equal to blocksize)
lambda	Toeplitz band width.
T_fft	complex array used for FFTs.

Definition at line 441 of file toeplitz.c.

◆ tpltz_cleanup()

int tpltz_cleanup	(	fftw_complex **	T_fft,
		fftw_complex **	V_fft,
		double **	V_rfft,
		fftw_plan *	plan_f,
		fftw_plan *	plan_b
	)

Cleans fftw workspace used in the Toeplitz matrix matrix product's computation.

Destroy fftw plans, free memory and reset fftw workspace.

See also: tpltz_init

Parameters

T_fft	complex array used for FFTs
V_fft	complex array used for FFTs
V_rfft	real array used for FFTs
plan_f	fftw plan forward (r2c)
plan_b	fftw plan backward (c2r)

Definition at line 485 of file toeplitz.c.

◆ copy_block()

int copy_block	(	int	ninrow,
		int	nincol,
		double *	Vin,
		int	noutrow,
		int	noutcol,
		double *	Vout,
		int	inrow,
		int	incol,
		int	nblockrow,
		int	nblockcol,
		int	outrow,
		int	outcol,
		double	norm,
		int	set_zero_flag
	)

Copies (and potentially reshapes) a selected block of the input matrix to a specified position of the output matrix.

Copy a matrix block of a size nblockrow x nblockcol from the input matrix Vin (size ninrow x nincol) starting with the element (inrow, incol) to the output matrix Vout (size notrow x noutcol) starting with the element (outrow, outcol) after multiplying by norm. If the output matrix is larger than the block the extra elements are either left as they were on the input or zeroed if zero_flag is set to 1. If the block to be copied is larger than either the input or the output matrix an error occurs.

Definition at line 514 of file toeplitz.c.

◆ scmm_direct()

int scmm_direct	(	int	fft_size,
		int	nfft,
		fftw_complex *	C_fft,
		int	ncol,
		double *	V_rfft,
		double **	CV,
		fftw_complex *	V_fft,
		fftw_plan	plan_f_V,
		fftw_plan	plan_b_CV
	)

Performs the product of a circulant matrix C_fft by a matrix V_rfft using fftw plans.

Performs the product of a circulant matrix C_fft by a matrix V_rfft using fftw plans: forward - plan_f_V; and backward - plan_b_CV. C_fft is a Fourier (complex representation of the circulant matrix) of length fft_size/2+1; V_rfft is a matrix with ncol columns and fft_size rows; V_fft is a workspace of fft_size/2+1 complex numbers as required by the backward FFT (plan_b_CV); CV is the output matrix of the same size as the input V_rfft one. The FFTs transform ncol vectors simultanously.

Parameters

	fft_size	row dimension
	nfft	number of simultaneous FFTs
	C_fft	complex array used for FFTs
	ncol	column dimension
	V_rfft	real array used for FFTs
[out]	CV	product of the circulant matrix C_fft by the matrix V_rfft
	V_fft	complex array used for FFTs
	plan_f_V	fftw plan forward (r2c)
	plan_b_CV	fftw plan backward (c2r)

Definition at line 569 of file toeplitz.c.

◆ scmm_basic()

int scmm_basic	(	double **	V,
		int	blocksize,
		int	m,
		fftw_complex *	C_fft,
		double **	CV,
		fftw_complex *	V_fft,
		double *	V_rfft,
		int	nfft,
		fftw_plan	plan_f_V,
		fftw_plan	plan_b_CV
	)

Performs the product of a circulant matrix by a matrix using FFT's (an INTERNAL routine)

This routine multiplies a circulant matrix, represented by C_fft, by a general matrix V, and stores the output as a matrix CV. In addition the routine requires two workspace objects, V_fft and V_rfft, to be allocated prior to a call to it as well as two fftw plans: one forward (plan_f_V), and one backward (plan_b_TV). The sizes of the input general matrix V and the ouput CV are given by blocksize rows and m columns. They are stored as a vector in the column-wise order. The circulant matrix, which is assumed to be band-diagonal with a band-width lambda, is represented by a Fourier transform with its coefficients stored in a vector C_fft (length blocksize). blocksize also defines the size of the FFTs, which will be performed and therefore this is the value which has to be used while creating the fftw plans and allocating the workspaces. The latter are given as: nfft*(blocksize/2+1) for V_fft and nfft*blocksize for V_rfft. The fftw plans should correspond to doing the transforms of nfft vectors simultaneously. Typically, the parameters of this routine are fixed by a preceding call to Toeplitz_init(). The parameters are :

Parameters

	V	matrix (with the convention V(i,j)=V[i+j*n])
	blocksize	row dimension of V
	m	column dimension of V
	C_fft	complex array used for FFTs (FFT of the Toeplitz matrix)
[out]	CV	product of the circulant matrix C_fft by the matrix V_rfft
	V_fft	complex array used for FFTs
	V_rfft	real array used for FFTs
	nfft	number of simultaneous FFTs
	plan_f_V	fftw plan forward (r2c)
	plan_b_CV	fftw plan backward (c2r)

Definition at line 651 of file toeplitz.c.

◆ stmm_core()

int stmm_core	(	double **	V,
		int	n,
		int	m,
		double *	T,
		fftw_complex *	T_fft,
		int	blocksize,
		int	lambda,
		fftw_complex *	V_fft,
		double *	V_rfft,
		int	nfft,
		fftw_plan	plan_f,
		fftw_plan	plan_b,
		int	flag_offset,
		int	flag_nofft
	)

Performs the stand alone product of a Toeplitz matrix by a matrix using the sliding window algorithm. (an INTERNAL routine)

The product is performed block-by-block with a defined block size or a computed optimized block size that reflects a trade off between cost of a single FFT of a length block_size and a number of blocks needed to perform the mutiplicaton. The latter determines how many spurious values are computed extra due to overlaps between the blocks. Use flag_offset=0 for "classic" algorithm and flag_offset=1 to put an offset to avoid the first and last lambdas terms. Usefull when a reshaping was done before with optimal column for a nfft. Better be inside the arguments of the routine. The parameters are:

Parameters

V	[input] data matrix (with the convention V(i,j)=V[i+j*n]) ; [out] result of the product TV
n	number of rows of V
m	number of columns of V
T	Toeplitz matrix data composed of the non-zero entries of his first row
T_fft	complex array used for FFTs
blocksize	block size used in the sliding window algorithm
lambda	Toeplitz band width
V_fft	complex array used for FFTs
V_rfft	real array used for FFTs
nfft	number of simultaneous FFTs
plan_f	fftw plan forward (r2c)
plan_b	fftw plan backward (c2r)
flag_offset	flag to avoid extra 2*lambda padding to zeros on the edges
flag_nofft	flag to do product without using fft

Definition at line 731 of file toeplitz.c.

◆ stmm_main()

int stmm_main	(	double **	V,
		int	n,
		int	m,
		int	id0,
		int	l,
		double *	T,
		fftw_complex *	T_fft,
		int	lambda,
		fftw_complex *	V_fft,
		double *	V_rfft,
		fftw_plan	plan_f,
		fftw_plan	plan_b,
		int	blocksize,
		int	nfft,
		Flag	flag_stgy
	)

Performs the product of a Toeplitz matrix by a general matrix using the sliding window algorithm with optimize reshaping. (an INTERNAL routine)

The input matrix is formatted into an optimized matrix depending on the block size and the number of simultaneous ffts (defined with the variable nfft). The obtained number of columns represent the number of vectors FFTs of which are computed simulatenously. The multiplication is then performed block-by-block with the chosen block size using the core routine. The parameters are :

Parameters

V	[input] data matrix (with the convention V(i,j)=V[i+j*n]) ; [out] result of the product TV
n	number of rows of V
m	number of columns of V
id0	first index of V
l	length of V
T	Toeplitz matrix data composed of the non-zero entries of his first row
T_fft	complex array used for FFTs
lambda	Toeplitz band width
V_fft	complex array used for FFTs
V_rfft	real array used for FFTs
plan_f	fftw plan forward (r2c)
plan_b	fftw plan backward (c2r)
blocksize	block size
nfft	number of simultaneous FTTs
flag_stgy	flag strategy for the product computation

Definition at line 888 of file toeplitz.c.

◆ mpi_stmm()

int mpi_stmm	(	double **	V,
		int	n,
		int	m,
		int	id0,
		int	l,
		double *	T,
		int	lambda,
		Flag	flag_stgy,
		MPI_Comm	comm
	)

Performs the product of a Toeplitz matrix by a general matrix using MPI. We assume that the matrix has already been scattered. (a USER routine)

The multiplication is performed using FFT applied to circulant matrix in order to diagonalized it. The parameters are :

Parameters

V	[input] distributed data matrix (with the convention V(i,j)=V[i+j*n]); [out] result of the product TV
n	number of rows of V
m	number of columns of V
id0	first index of scattered V
l	length of the scattered V
T	Toeplitz matrix.
lambda	Toeplitz band width.
flag_stgy	flag strategy for the product computation
comm	communicator (usually MPI_COMM_WORLD)

Definition at line 1079 of file toeplitz.c.

Variable Documentation

◆ VERBOSE

int VERBOSE

Verbose mode.

Prints some informative messages during the computation.

Definition at line 113 of file toeplitz.c.

◆ VERBOSE_FIRSTINIT

int VERBOSE_FIRSTINIT = 1

Definition at line 114 of file toeplitz.c.

◆ PRINT_RANK

int PRINT_RANK = -1

Definition at line 117 of file toeplitz.c.

Functions

Variables

Detailed Description

Function Documentation

◆ print_error_message()

◆ define_blocksize()

◆ define_nfft()

◆ tpltz_init()

◆ fftw_init_omp_threads()

◆ rhs_init_fftw()

◆ circ_init_fftw()

◆ tpltz_cleanup()

◆ copy_block()

◆ scmm_direct()

◆ scmm_basic()

◆ stmm_core()

◆ stmm_main()

◆ mpi_stmm()

Variable Documentation

◆ VERBOSE

◆ VERBOSE_FIRSTINIT

◆ PRINT_RANK