【Affiliation】: National Engineering Research Center for Big Data Technology and System, School of Computer Science a
Excerpt from the paper
Real-world graphs are growing rapidly, and their size can easily exceed the on-chip (or on-board) storage capacity of an accelerator, which makes processing large-scale graphs on a single Field Programmable Gate Array (FPGA) difficult. Multi-FPGA acceleration is therefore both necessary and important. Many cloud providers (e.g., Amazon, Microsoft, and Baidu) now expose FPGAs to users in their data centers, providing opportunities to accelerate large-scale graph processing. In this paper, we present a communication library, called FDGLib, which can scale out any existing single-FPGA graph accelerator to a distributed version in a data center with minimal hardware engineering effort. FDGLib provides six APIs that can be integrated into any FPGA-based graph accelerator by modifying only a few lines of code. Because FPGAs in data centers are interconnected in a torus, FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes. We integrate FDGLib into AccuGraph, a state-of-the-art graph accelerator. Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than ForeGraph, a state-of-the-art distributed FPGA-based graph accelerator, and Gemini, a distributed CPU-based graph system, respectively, with better scalability.
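To illustrate why placement matters on a torus interconnect, the following is a minimal sketch, not FDGLib's actual scheme, which the abstract does not describe in detail. It computes the hop distance between two nodes of a 2D torus and the traffic-weighted cost of a placement. The function names, the traffic table, and the 4 x 8 shape (chosen only because it totals 32 nodes) are all assumptions made for illustration.

```python
def torus_hops(a, b, rows, cols):
    """Minimum hop count between nodes a and b on a rows x cols 2D torus.

    Each node is a (row, col) pair. Wraparound links mean the distance
    along each dimension is the shorter of the two directions.
    """
    (r1, c1), (r2, c2) = a, b
    dr = abs(r1 - r2)
    dc = abs(c1 - c2)
    return min(dr, rows - dr) + min(dc, cols - dc)


def placement_cost(traffic, placement, rows, cols):
    """Total traffic-weighted hop count of a placement.

    traffic:   maps a (p, q) pair of partition ids to a message volume
               (hypothetical numbers, not measured data).
    placement: maps each partition id to a (row, col) torus node.
    """
    return sum(
        volume * torus_hops(placement[p], placement[q], rows, cols)
        for (p, q), volume in traffic.items()
    )


# Hypothetical example: partition 0 talks heavily to 1 and lightly to 2.
ROWS, COLS = 4, 8  # assumed shape; only the node count (32) is from the text
traffic = {(0, 1): 10, (0, 2): 1}

far_placement = {0: (0, 0), 1: (0, 4), 2: (0, 1)}   # heavy pair 4 hops apart
near_placement = {0: (0, 0), 1: (0, 1), 2: (0, 4)}  # heavy pair 1 hop apart

cost_far = placement_cost(traffic, far_placement, ROWS, COLS)
cost_near = placement_cost(traffic, near_placement, ROWS, COLS)
```

In this toy setting, putting the heavily communicating partitions on adjacent nodes lowers the weighted hop count, which is the general intuition behind a torus-aware placement.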