Improving multiprocessor performance with fine-grain coherence bypass

Source: Science China (Information Sciences)
An efficient and scalable cache coherence protocol is crucial to high-performance shared-memory servers. Directory-based cache coherence protocols scale better than snooping-based protocols. However, even directory-based protocols are difficult to scale to a large number of cores because of the additional area the directories require. We observed that a significant percentage of the referenced memory blocks are accessed by only a single core (even in parallel applications) and can therefore be considered private memory blocks. The intuition that follows is that memory blocks accessed by a single core do not require coherence maintenance. The challenge is to identify private blocks and to track changes in their access patterns. We propose a novel hardware approach that (1) dynamically identifies shared memory blocks at cache-block granularity and (2) bypasses the coherence procedure for private memory blocks. This approach increases the effectiveness of the directory-based protocol and therefore improves system performance. Experimental results show that, on average, our approach (1) avoids coherence tracking for about 54% of referenced memory blocks, (2) reduces the coherence overhead by 77%, (3) avoids 8% of L2 cache misses, and (4) shortens the execution time of parallel applications by 13%.
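The classification idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class `BlockClassifier`, its method names, and the "bypass"/"coherent" labels are hypothetical, and the model assumes a block counts as private until a second core touches it, after which it stays shared. It leaves out what real hardware would have to do at that transition, such as reconciling the copy held by the original owner.

```python
from dataclasses import dataclass, field


@dataclass
class BlockClassifier:
    """Toy model (hypothetical): a block is private until a second core accesses it."""
    owner: dict = field(default_factory=dict)  # block address -> first core to access it
    shared: set = field(default_factory=set)   # blocks that need coherence tracking

    def access(self, core: int, block: int) -> str:
        """Return "bypass" for a private access, "coherent" for a shared one."""
        if block in self.shared:
            return "coherent"
        first_core = self.owner.setdefault(block, core)
        if first_core == core:
            return "bypass"
        # A different core touched the block: reclassify it from private to shared.
        self.shared.add(block)
        return "coherent"

    def private_fraction(self) -> float:
        """Fraction of referenced blocks that never needed coherence tracking."""
        if not self.owner:
            return 0.0
        return 1.0 - len(self.shared) / len(self.owner)


# Illustrative trace of (core, block address) pairs; the values are made up.
trace = [(0, 0x10), (0, 0x10), (1, 0x20), (1, 0x10), (0, 0x30)]
classifier = BlockClassifier()
decisions = [classifier.access(core, block) for core, block in trace]
```

In this trace only block 0x10 is touched by two cores, so it is the only one that would need a directory entry in this model; the other blocks stay on the bypass path.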