Reading notes: partial content of a paper
Building on variable-sized chunking, this paper proposes a content-aware chunking scheme called CAC. CAC does not assume that file contents are fully random; instead, it takes the characteristics of each file type into account. When deduplicating file data, CAC uses a candidate anchor histogram together with file-type-specific knowledge to refine how anchors (chunk boundaries) are determined, and it enforces the selected average chunk size. As a result, CAC finds more chunks, which in turn yields smaller average chunks and a greater reduction in data. We present a detailed evaluation of CAC. The experimental results show that, for file types whose bytes are not randomly distributed, the scheme improves the chunking compression ratio by 11.3% to 16.7% (depending on the dataset) and improves write throughput by 9.7% on average.
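The abstract does not spell out CAC's algorithm, so the following is only a minimal sketch of the general idea it describes, under stated assumptions, and not the paper's method. Standard content-defined chunking usually declares anchors with a rolling hash such as a Rabin fingerprint; here that is swapped for a simpler content-aware rule: build a byte-value histogram of a sample, pick the rarest byte values whose combined frequency gives roughly one anchor per `avg_chunk_size` bytes, and cut chunks at those bytes within minimum and maximum size bounds. The function names `select_anchor_bytes`, `chunk`, and `dedup_ratio` and the rarest-first selection rule are hypothetical.

```python
import hashlib
from collections import Counter


def select_anchor_bytes(sample: bytes, avg_chunk_size: int) -> set[int]:
    """Hypothetical anchor selection from a byte histogram (not CAC itself).

    Greedily picks the rarest byte values until their total frequency would
    exceed the number of anchors needed for the target average chunk size.
    """
    if not sample:
        return set()
    hist = Counter(sample)  # iterating bytes yields ints 0..255
    target = len(sample) / avg_chunk_size  # desired anchor count in the sample
    chosen: set[int] = set()
    count = 0
    # Rarest values first keeps anchors sparse (an assumption of this sketch).
    for value, freq in sorted(hist.items(), key=lambda kv: (kv[1], kv[0])):
        if chosen and count + freq > target:
            break  # frequencies only grow from here, so stop
        chosen.add(value)
        count += freq
    return chosen


def chunk(data: bytes, anchors: set[int], min_size: int, max_size: int) -> list[bytes]:
    """Cut after an anchor byte once min_size is reached, or force a cut at max_size."""
    chunks: list[bytes] = []
    start = 0
    for i, b in enumerate(data):
        length = i - start + 1
        if (b in anchors and length >= min_size) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than min_size
    return chunks


def dedup_ratio(chunks: list[bytes]) -> float:
    """Logical bytes divided by bytes stored after keeping one copy per fingerprint."""
    unique = {hashlib.sha1(c).digest(): len(c) for c in chunks}
    total = sum(len(c) for c in chunks)
    return total / sum(unique.values()) if unique else 1.0
```

For example, on newline-delimited text the histogram picks `\n` as the anchor byte, so repeated lines become identical chunks: `chunk(b"hello\nworld\nhello\nworld\n", {10}, 1, 64)` gives four chunks, and `dedup_ratio` of those chunks is `2.0`. The `max_size` bound plays the role of a guard against runs with no anchors, much as variable-sized chunkers typically cap chunk length.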