This paper introduces the concept of Web page cleaning, proposes a rule-based Web page cleaning scheme, and implements a system based on that scheme. The cleaning operates on the DOM tree structure of the Web page, and the experiments were evaluated by manual judgment. The experimental results show that the scheme is feasible and that the cleaning method is both fast and accurate.
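
As a rough illustration of what rule-based cleaning over a DOM tree can look like, the sketch below parses a page into a small tree, drops every subtree that matches a rule, and returns the remaining text. It is not the paper's system: the abstract does not list the actual rules, so the two rules here (`is_noise_tag`, `has_noise_label`), the keyword list, and all class and function names are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal DOM node: tag name, attributes, and ordered children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Node objects or plain text strings


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())


# Hypothetical rule 1: tags that rarely carry main content.
def is_noise_tag(node):
    return node.tag in {"script", "style", "noscript", "iframe"}


# Hypothetical rule 2: class/id tokens that suggest ads or navigation chrome.
def has_noise_label(node):
    label = (node.attrs.get("class") or "") + " " + (node.attrs.get("id") or "")
    tokens = label.lower().split()
    return any(word in tokens for word in ("ad", "ads", "banner", "sidebar"))


RULES = [is_noise_tag, has_noise_label]


def clean(node, rules=RULES):
    """Remove, in place, every subtree whose root matches any rule."""
    kept = []
    for child in node.children:
        if isinstance(child, Node):
            if any(rule(child) for rule in rules):
                continue  # drop the whole subtree
            clean(child, rules)
        kept.append(child)
    node.children = kept
    return node


def text_of(node):
    """Concatenate the text that remains under a node."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def clean_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return text_of(clean(builder.root))
```

Keeping each rule as a separate predicate means rules can be added or removed without touching the tree traversal, which is one plausible reason to prefer a rule-based design; whether the paper structures its rules this way is not stated in the abstract.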