The Godson-3B (Loongson 3B) processor is the second product in the Godson-3 multi-core processor family and is aimed mainly at high-performance computing and high-end embedded applications. The Fast Fourier Transform (FFT) is a fundamental tool in digital signal processing, image processing, and related fields, so an efficient FFT implementation on the Godson-3B is essential. Existing FFT implementations, however, do not fully exploit the hardware features of the Godson-3B and therefore still deliver low performance. To address this, this paper analyzes the FFT algorithm and, taking the architectural features of the Godson-3B into account, proposes a radix-32 iterative vectorized FFT algorithm. Experimental results show that on the Godson-3B the radix-32 iterative vectorized FFT reaches an average performance of 765.15 Mflops, 2.12 times that of the FFTW package (Fastest Fourier Transform in the West) in the same environment, and a peak performance of 1341.12 Mflops, 3.51 times that of FFTW in the same environment.
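The abstract names a radix-32 iterative vectorized FFT but does not describe its structure. As a rough illustration only, the sketch below shows a generic iterative radix-r decimation-in-time Cooley-Tukey FFT (r = 32 by default) for lengths N = r^s: a base-r digit-reversal permutation followed by log_r(N) stages of r-point butterflies with twiddle factors. It is not the authors' Godson-3B code and uses no SIMD; the function names (`digit_reverse`, `radix_r_fft`, `dft`, `fft_mflops`) are hypothetical. `fft_mflops` assumes the 5 N log2 N flop-count convention used by FFTW's benchFFT; the abstract does not state how its own Mflops figures were computed.

```python
import cmath
import math

# Hypothetical illustration: a generic radix-r iterative (decimation-in-time)
# Cooley-Tukey FFT in pure Python. It is NOT the paper's vectorized Godson-3B code.


def digit_reverse(i, r, stages):
    """Reverse the base-r digits of index i (e.g. 0b011 -> 0b110 for r=2, stages=3)."""
    rev = 0
    for _ in range(stages):
        rev = rev * r + i % r
        i //= r
    return rev


def radix_r_fft(x, r=32):
    """Forward DFT X[f] = sum_t x[t] * exp(-2*pi*i*t*f/N), for len(x) a power of r."""
    n = len(x)
    stages, size = 0, 1
    while size < n:
        size *= r
        stages += 1
    if size != n:
        raise ValueError("length must be a power of the radix r")

    # 1) Base-r digit-reversal permutation puts the input in DIT order.
    y = [0j] * n
    for i, v in enumerate(x):
        y[digit_reverse(i, r, stages)] = complex(v)

    # r-point DFT kernel F[p][q] = W_r^(p*q), with W_r = exp(-2*pi*i/r).
    F = [[cmath.exp(-2j * math.pi * (p * q % r) / r) for q in range(r)]
         for p in range(r)]

    # 2) log_r(N) stages; each merges r sub-DFTs of length L into one of length m = r*L.
    L = 1
    while L < n:
        m = L * r
        for j in range(L):
            # Twiddle factors W_m^(q*j) depend only on j, so compute them once per j.
            tw = [cmath.exp(-2j * math.pi * q * j / m) for q in range(r)]
            for base in range(0, n, m):
                # Gather the r butterfly inputs (stride L) and apply the twiddles.
                t = [tw[q] * y[base + j + q * L] for q in range(r)]
                # r-point DFT of the twiddled inputs, written back in place.
                for p in range(r):
                    y[base + j + p * L] = sum(F[p][q] * t[q] for q in range(r))
        L = m
    return y


def dft(x):
    """Naive O(N^2) DFT, used only as a reference to check radix_r_fft."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * ((t * f) % n) / n) for t in range(n))
            for f in range(n)]


def fft_mflops(n, microseconds):
    """Assumed convention (FFTW benchFFT): 5*N*log2(N) flops per complex FFT."""
    return 5 * n * math.log2(n) / microseconds
```

For a 1024-point input, `radix_r_fft(x)` runs two radix-32 stages, whereas `radix_r_fft(x, 2)` runs ten radix-2 stages over the same data. Fewer passes over the data is a commonly cited benefit of higher radices; the abstract does not say whether this motivated the choice of radix 32.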