Matrix Multiplication on SMP

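Parallel matrix multiplication is almost always done with Cannon's algorithm, but on shared memory I don't think Cannon's algorithm has any advantage.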

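Cannon's algorithm raises the degree of parallelism while guaranteeing that every variable exists as a single global copy. With shared memory there is no cost for copying variables, so a plain strip (row-block) partition can be used directly. That avoids the barrier overhead between iterations and improves efficiency.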
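To make the strip partition concrete, here is a minimal sketch, not the code from this post: it assumes flat row-major `std::vector<double>` storage, and the names `stripBounds` and `stripMultiply` are made up for illustration. Each worker owns a contiguous band of rows of c, only reads a and b, and never waits for another worker inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Row range [first, second) owned by worker p when n rows are split into
// `parts` contiguous strips; the first n % parts strips get one extra row.
// (Hypothetical helper, not part of matrixOperation.h.)
std::pair<std::size_t, std::size_t> stripBounds(std::size_t n, std::size_t parts, std::size_t p)
{
    assert(parts > 0 && p < parts);
    const std::size_t base = n / parts;
    const std::size_t extra = n % parts;
    const std::size_t begin = p * base + (p < extra ? p : extra);
    const std::size_t end = begin + base + (p < extra ? 1 : 0);
    return {begin, end};
}

// c = a * b for n x n row-major matrices, one strip of rows per worker.
// Every worker writes only its own rows of c, so no synchronization is
// needed until the parallel loop as a whole has finished.
void stripMultiply(const std::vector<double>& a, const std::vector<double>& b,
                   std::vector<double>& c, std::size_t n, int parts)
{
    assert(parts > 0);
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    // If OpenMP is not enabled the pragma is ignored and the strips run serially.
    #pragma omp parallel for num_threads(parts) schedule(static, 1)
    for (int p = 0; p < parts; ++p) {
        const std::pair<std::size_t, std::size_t> rows =
            stripBounds(n, static_cast<std::size_t>(parts), static_cast<std::size_t>(p));
        for (std::size_t i = rows.first; i < rows.second; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                double sum = 0.0;
                for (std::size_t k = 0; k < n; ++k)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        }
    }
}
```

For example, `stripBounds(10, 3, p)` gives the row ranges [0, 4), [4, 7) and [7, 10) for p = 0, 1, 2.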

Implementing matrix multiplication on SMP

#include "stdafx.h"
#include "matrixOperation.h"
#include <omp.h>
#include <iostream>
#include <ctime>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
	const int size=5000;
	double **a,**b,**c;
	a=new double*[size];
	b=new double*[size];
	c=new double*[size];
	for(int i=0;i<size;++i)
	{
		a[i]=new double[size];
		b[i]=new double[size];
		c[i]=new double[size];
	}
	cout<<"mem set"<<endl;
	//read file
	cout<<readMatrix("matrix",a,size)<<endl;
	cout<<readMatrix("matrix",b,size)<<endl;
	cout<<compareMatrix(a,b,size)<<endl;
	//for more cache hits:
	//transpose b so the inner loop reads a row of a and a row of b, both contiguous in memory
	matrixTransposition(b,size);
	cout<<"data prepared"<<endl<<"calculating"<<endl;
	long start=time(0);
//	omp_set_nested(true);
	#pragma omp parallel for num_threads(16) schedule(dynamic)
	for(int i=0;i<size;++i)
	{
//		#pragma omp parallel for firstprivate(i) num_threads(4)
		for(int j=0;j<size;++j)
		{
			c[i][j]=0;
			for(int k=0;k<size;++k)
			{
				c[i][j]+=a[i][k]*b[j][k];//b is transposed, so b[j][k] here is b[k][j] of the original formulation
			}
		}
		cout<<".";
	}
	long end=time(0);
	cout<<end-start<<" seconds"<<endl;
	writeMatrix("out",c,size);
	for(int i=0;i<size;++i)
	{
		delete[] a[i];
		delete[] b[i];
		delete[] c[i];
	}
	delete[] a;
	delete[] b;
	delete[] c;
	cin>>start;//wait for input so the console window stays open
	return 0;
}
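The listing relies on `matrixTransposition` from matrixOperation.h, which is not shown in this post. As a sketch only, assuming the same `double**` layout, an in-place transpose and the resulting row-by-row dot product might look like the following; `transposeInPlace` and `dotRows` are hypothetical names, not the author's helpers.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for matrixTransposition: swaps m[i][j] with
// m[j][i] in place for a size x size matrix stored as an array of rows.
void transposeInPlace(double** m, int size)
{
    assert(m != nullptr && size >= 0);
    for (int i = 0; i < size; ++i)
        for (int j = i + 1; j < size; ++j)
            std::swap(m[i][j], m[j][i]);
}

// Entry (i, j) of a * b, where bt holds b already transposed.
// Both a[i] and bt[j] are walked left to right, so each operand is read
// from one contiguous row instead of striding down a column of b.
double dotRows(double** a, double** bt, int i, int j, int size)
{
    double sum = 0.0;
    for (int k = 0; k < size; ++k)
        sum += a[i][k] * bt[j][k];
    return sum;
}
```

With a = [[1, 2], [3, 4]] and b = [[5, 6], [7, 8]], calling `transposeInPlace(b, 2)` and then `dotRows(a, b, 0, 0, 2)` yields 19, the top-left entry of the ordinary product.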

On an i7 2600 processor, the parameters above worked well for multiplying 5000*5000 matrices; the pure computation time is around 126 seconds.
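To put the 126 seconds in perspective: an n*n product needs about 2n^3 floating-point operations, so n = 5000 is 2.5e11 operations, or roughly 1.98 GFLOP/s sustained. The sketch below does that arithmetic and also shows a timer with finer resolution than `time(0)`, which only counts whole seconds. The names `matmulFlops`, `gflops` and `secondsTaken` are illustrative and not part of the original program.

```cpp
#include <cassert>
#include <chrono>

// Floating-point operations for an n x n by n x n product:
// n*n output entries, each needing n multiplies and n adds.
double matmulFlops(double n)
{
    return 2.0 * n * n * n;
}

// Sustained rate in GFLOP/s for a run that took `seconds`.
double gflops(double n, double seconds)
{
    assert(seconds > 0.0);
    return matmulFlops(n) / seconds / 1e9;
}

// Wall-clock seconds taken by `work`, measured with a monotonic clock.
template <typename Work>
double secondsTaken(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Wrapping the triple loop in a lambda and passing it to `secondsTaken` would give a fractional-second reading that can be fed straight into `gflops(5000, seconds)`.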

The matrixOperation header file is in another post: http://blog.csdn.net/pouloghost/article/details/8746913

Author: pouloghost. Posted 2013-4-1 15:47:29.