C++ Performance Optimization Series: Matrix Transpose (8), IPP Transpose API Performance Test

怼烎 @ 2022-11-29 12:09

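This post records how ippiTranspose\_8u\_C1R, the transpose function in Intel's high-performance computing library IPP, performs. The result serves as a baseline for comparing the transpose implementations in the rest of this performance-optimization series.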

## Function description ##

The explanation below comes from the IPP 2018 documentation:  
Intel® Integrated Performance Primitives Developer Reference, Volume 2: Image Processing  
**Function API**  
IPPAPI(IppStatus, ippiTranspose\_8u\_C1R, ( const Ipp8u\* pSrc, int srcStep, Ipp8u\* pDst, int dstStep, IppiSize roiSize ))  
**Parameters**  
pSrc: Pointer to the source image ROI.  
srcStep: Distance, in bytes, between the starting points of consecutive lines in the source image.  
pDst: Pointer to the destination image ROI.  
dstStep: Distance, in bytes, between the starting points of consecutive lines in the destination image.  
pSrcDst: Pointer to the source and destination ROI for in-place operation.  
srcDstStep: Distance, in bytes, between the starting points of consecutive lines in the source and destination image buffer for the in-place operation.  
roiSize: Size of the source ROI in pixels.  
**Description**  
This function operates with ROI. This function transposes the source image pSrc (pSrcDst for in-place flavors) and stores the result in pDst (pSrcDst). The destination image is obtained from the source image by transforming the columns to the rows: pDst(x,y) = pSrc(y,x). The parameter roiSize is specified for the source image. The value of the roiSize.width parameter for the destination image is equal to roiSize.height for the source image, and roiSize.height for the destination image is equal to roiSize.width for the source image.  
![Image][watermark_type_ZmFuZ3poZW5naGVpdGk_shadow_10_text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3lhbjMxNDE1_size_16_color_FFFFFF_t_70_pic_center]  
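The documentation already explains the API's basic behavior and the meaning of its parameters clearly. Two additional notes:  
1. Meaning of C1R: C stands for channel (not complex), and C1 means the image has a single channel. Here each pixel is one 8-bit unsigned integer (8u), so C1 applies. R means the function operates on a ROI.  
2. The interface transposes a ROI, so when writing your own implementation you can tune the size of the region processed at a time to suit the machine's cache.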
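
To make the `pDst(x,y) = pSrc(y,x)` rule and the cache remark in point 2 concrete, here is a minimal sketch in plain C++, without IPP. `TransposeNaive` and `TransposeBlocked` are hypothetical names introduced for illustration, and the default tile size of 64 is an assumption to be tuned per machine. Nothing here claims to reflect how IPP is implemented internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference transpose: dst(x, y) = src(y, x).
// Steps are in bytes, as in the IPP signature; width/height describe the SOURCE region.
void TransposeNaive(const std::uint8_t* src, int srcStep,
                    std::uint8_t* dst, int dstStep,
                    int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dst[static_cast<std::size_t>(x) * dstStep + y] =
                src[static_cast<std::size_t>(y) * srcStep + x];
}

// Same result, but the source is visited in block x block tiles so that the
// rows read and the rows written by one tile can stay in cache together.
void TransposeBlocked(const std::uint8_t* src, int srcStep,
                      std::uint8_t* dst, int dstStep,
                      int width, int height, int block = 64)
{
    assert(block > 0);
    for (int by = 0; by < height; by += block) {
        const int yEnd = std::min(by + block, height);   // clip the last tile row
        for (int bx = 0; bx < width; bx += block) {
            const int xEnd = std::min(bx + block, width); // clip the last tile column
            for (int y = by; y < yEnd; ++y)
                for (int x = bx; x < xEnd; ++x)
                    dst[static_cast<std::size_t>(x) * dstStep + y] =
                        src[static_cast<std::size_t>(y) * srcStep + x];
        }
    }
}
```

Both functions take the same arguments in the same order as the IPP call, with `roiSize` split into `width` and `height`, so either can be dropped into a benchmark for a side-by-side comparison.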

## IPP performance test ##

Test program

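    #include <ctime>
    #include <iostream>
    #include <ipp.h>

    // NROW, NCOL, NREALCOL and REPEAT are assumed to be defined elsewhere in the series.
    // pSource: padded source matrix, NROW rows x NCOL columns, NREALCOL bytes per row.
    // pTarget: destination matrix, NCOL rows x NROW columns, NROW bytes per row.
    void IPPTranspose(unsigned char* pSource, unsigned char* pTarget)
    {
        clock_t begin = clock();
        // IppiSize is { width, height } of the source ROI (same as before for a square matrix).
        IppiSize ROI = { NCOL, NROW };
        for (int i = 0; i < REPEAT; ++i)
        {
            // Steps are in bytes: NREALCOL for the padded source, NROW for the destination.
            ippiTranspose_8u_C1R((const Ipp8u*)pSource, NREALCOL, (Ipp8u*)pTarget, NROW, ROI);
        }
        clock_t end = clock();
        // clock() counts ticks; the "(ms)" label assumes CLOCKS_PER_SEC == 1000.
        std::cout << "IPPTranspose 10240 Time " << (end - begin) << std::endl;
        std::cout << "IPPTranspose each Time (ms) " << ((float)(end - begin)) / (float)REPEAT << std::endl;
    }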

Note: the source matrix has already been padded in memory, so consecutive source rows start NREALCOL bytes apart.  
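
This post does not show how the padded buffer is built. As a rough sketch of one possibility, the code below assumes the row length is rounded up to a multiple of some alignment and the remaining bytes are zero-filled. `PaddedStep` and `PadRows` are hypothetical helpers, and the rounding rule is an assumption; the padding scheme actually used in this series may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: round a row length up to a multiple of `alignment` bytes.
inline int PaddedStep(int cols, int alignment)
{
    assert(cols > 0 && alignment > 0);
    return (cols + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: copy a dense rows x cols matrix into a buffer whose
// rows start `step` bytes apart; the padding bytes are left as zero.
std::vector<std::uint8_t> PadRows(const std::uint8_t* dense, int rows, int cols, int step)
{
    assert(step >= cols);
    std::vector<std::uint8_t> padded(static_cast<std::size_t>(rows) * step, 0);
    for (int r = 0; r < rows; ++r)
        std::memcpy(padded.data() + static_cast<std::size_t>(r) * step,
                    dense + static_cast<std::size_t>(r) * cols,
                    static_cast<std::size_t>(cols));
    return padded;
}
```

With a buffer built this way, the value returned by `PaddedStep` would play the role of NREALCOL, which is the `srcStep` argument of the transpose call.
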
Execution time:

    IPPTranspose 10240 Time 842
    IPPTranspose each Time (ms) 0.0822266
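
The per-call figure is the total divided by the repeat count: 842 / 10240 ≈ 0.0822266, which is consistent with REPEAT being 10240. Note that `clock()` returns ticks, so reading the total as milliseconds holds only where `CLOCKS_PER_SEC` is 1000 (as with MSVC). On POSIX systems `clock()` reports processor time with `CLOCKS_PER_SEC` equal to 1000000. A portable alternative is sketched below using `std::chrono::steady_clock`. `AverageMilliseconds` is a hypothetical helper, not part of the original test program.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: run `fn` `repeat` times and return the average
// wall-clock time per call in milliseconds.
template <typename Fn>
double AverageMilliseconds(Fn&& fn, int repeat)
{
    assert(repeat > 0);
    const auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < repeat; ++i)
        fn();
    const auto end = std::chrono::steady_clock::now();
    // Convert the elapsed interval to floating-point milliseconds.
    const std::chrono::duration<double, std::milli> elapsed = end - begin;
    return elapsed.count() / repeat;
}
```

The transpose call from the test program could be wrapped in a lambda and passed as `fn`, with REPEAT as `repeat`, so the reported unit no longer depends on the platform's tick rate.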

[watermark_type_ZmFuZ3poZW5naGVpdGk_shadow_10_text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3lhbjMxNDE1_size_16_color_FFFFFF_t_70_pic_center]: /images/20221124/b0487c864ced41978d79ad2bdf111653.png