Abstract | ||
---|---|---|
Fourier Transform is one of the most critical algorithms, and is applied in a wide range of fields like signal processing and data compression. In real world applications, such as image compression (JPEG), Fourier Transform is concentrated in processing real number input. These transforms are called real DFT (real discrete fourier transform) in this paper. Thus it is critical to optimize real DFT for specific platforms. In this paper, we implement 1D and 2D real DFT on ARMv8 platform which is the flagship architecture of ARM. Real DFT kinds implemented and optimized include R2HC, HC2R, DHT, DCTI-IV, DSTI-IV and are especially optimized when input size is (2^{q}3^{n}5^{m}). In order to achieve high performance, optimization is carried out in following aspects: (1) Reduction of the computation complexity of real DFT. (2) Implementation of high performance 1D complex DFT algorithm to support real DFT. (3) For the 2D real DFT, we propose a cache-aware blocking approach to improve cache performance. Experimental results show that: Compared with FFTw 3.3.7, 1D-Float DFT gains 1.52x speedup in average across all real DFT kinds, maximum speedup reaches 1.79x; 1D-Double DFT gains 1.34x speedup in average across all real DFT kinds, maximum speedup reaches 1.61x; 2D-Float DFT gains 1.41x speedup in average across all real DFT kinds, maximum speedup reaches 1.70x; 2D-Double DFT gains 1.10x speedup across all real DFT kinds, maximum speedup reaches 1.25x. |
Year | Venue | Field |
---|---|---|
2018 | ICA3PP | Program optimization,Signal processing,Computer science,Parallel computing,Fourier transform,Computational science,Fast Fourier transform,Discrete Fourier transform,Data compression,Image compression,Speedup |
DocType | Citations | PageRank |
Conference | 0 | 0.34 |
References | Authors | |
8 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xiao Wang | 1 | 33 | 24.83 |
Haipeng Jia | 2 | 43 | 5.40 |
Zhihao Li | 3 | 17 | 5.10 |
Yunquan Zhang | 4 | 327 | 43.92 |