Citation: QIN Shujia, MIAO Lei, CUI Long, XI Ning. Parallelized Algorithm for Radix-2 Fast Hadamard Transform[J]. Information and Control, 2016, 45(6): 707-712, 721. DOI: 10.13976/j.cnki.xk.2016.0707

Parallelized Algorithm for Radix-2 Fast Hadamard Transform

Abstract: The fast Hadamard transform (FHT) is widely used in signal and image processing, communication systems, digital logic, and other fields. For very large problem sizes, the serial FHT may not meet computation-time requirements, and parallelizing the algorithm is an effective remedy. Taking compressed-sensing image reconstruction for a single-pixel camera as the application background, and exploiting the structural similarity between the radix-2 FHT and the fast Fourier transform (FFT), we propose a general task-level parallel algorithm for the radix-2 FHT and prove by construction that its results are equivalent to those of the serial algorithm. Simulations show that for input vectors shorter than $2^{20}$ and fewer than $2^{10}$ parallel subtasks, the squared Euclidean distance between the outputs of the parallel and serial algorithms is below $10^{-18}$, which supports the correctness of the parallel algorithm. Experiments with a POSIX-threads implementation on a PC with a multicore CPU show that, on that specific platform and configuration, the parallel speedup is 1.33 to 1.42 for input vector lengths from $2^{20}$ to $2^{25}$, demonstrating the feasibility and effectiveness of the proposed method.
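The abstract does not spell out the task decomposition, so what follows is a minimal Python sketch under our own assumptions, not the authors' code. It relies on the standard Kronecker-product identity for the Sylvester-ordered Hadamard matrix, $H_n = (H_P \otimes I_L)(I_P \otimes H_L)$ with $P$ subtasks and $L = n/P$, which mirrors how a radix-2 FFT splits into independent sub-transforms. Stage 1 runs $P$ independent length-$L$ transforms on contiguous blocks. Stage 2 runs $L$ independent length-$P$ transforms on stride-$L$ columns. Together the two stages execute exactly the butterflies of the serial algorithm, so the squared Euclidean distance between the two outputs should be zero or at round-off level. The names `fht`, `parallel_fht` and `squared_distance` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fht(x):
    """Serial radix-2 FHT in natural (Sylvester) order: log2(n) butterfly stages."""
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Stage with span h: combine a[j] and a[j + h], like a twiddle-free FFT butterfly.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    return a


def parallel_fht(x, tasks):
    """Hypothetical two-stage split via H_n = (H_P kron I_L)(I_P kron H_L), P = tasks."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if tasks < 1 or tasks & (tasks - 1) or tasks > n:
        raise ValueError("tasks must be a power of two not exceeding the length")
    L = n // tasks
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        # Stage 1: P independent length-L FHTs on contiguous blocks
        # (the same butterflies as the serial algorithm's first log2(L) stages).
        blocks = pool.map(lambda k: fht(x[k * L:(k + 1) * L]), range(tasks))
        y = [v for block in blocks for v in block]
        # Stage 2: L independent length-P FHTs on the stride-L columns y[r::L]
        # (the remaining log2(P) stages); the pool spreads columns over workers.
        columns = pool.map(lambda r: fht(y[r::L]), range(L))
        out = [0] * n
        for r, col in enumerate(columns):
            out[r::L] = col
    return out


def squared_distance(a, b):
    """Squared Euclidean distance, the error measure quoted in the abstract."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


if __name__ == "__main__":
    vec = [random.uniform(-1.0, 1.0) for _ in range(1 << 12)]
    print(squared_distance(fht(vec), parallel_fht(vec, 16)))
```

Because of CPython's global interpreter lock, these threads illustrate only the independence of the subtasks, not a speedup. The figures reported in the abstract come from a POSIX-threads implementation on a multicore CPU. Both stages are free of data dependencies between tasks, so the only synchronization point is the barrier between stage 1 and stage 2.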
