Title
Discrete Fourier transform on multicore
Abstract
This article gives an overview of the techniques needed to implement the discrete Fourier transform (DFT) efficiently on current multicore systems. The focus is on Intel-compatible multicores, but we also discuss the IBM Cell and, briefly, graphics processing units (GPUs). The performance optimization is broken down into three key challenges: parallelization, vectorization, and memory hierarchy optimization. In each case, we use the Kronecker product formalism to derive the necessary algorithmic transformations from a few hardware parameters. Further code-level optimizations are discussed. The rigorous nature of this framework enables the complete automation of the implementation task, as shown by the program generator Spiral. Finally, we show and analyze DFT benchmarks of the fastest libraries available for the considered platforms.
Year
2009
DOI
10.1109/MSP.2009.934155
Venue
Signal Processing Magazine, IEEE
Keywords
multicore performance,gpu,dft,multicore systems,program generator,parallel architectures,graphics processing units,matrix algebra,discrete fourier transform,intel-compatible multicores,discrete fourier transforms,ibm cell,kronecker product formalism,performance evaluation,memory hierarchy optimization,coprocessors,spiral,code-level optimizations,spirals,instruction sets,graphics,kronecker product,optimization,central processing unit,multicore processing
DocType
Journal
Volume
26
Issue
6
ISSN
1053-5888
Citations
38
PageRank
2.35
References
25
Authors
6
Name                 Order  Citations  PageRank
Franz Franchetti     1      974        88.39
Markus P             2      38         2.35
Yevgen Voronenko     3      239        17.54
Srinivas Chellappa   4      93         9.51
José                 5      83         9.14
José M. F. Moura     6      5137       426.14