To address the problem of spam filtering, and taking into account both the characteristics of Chinese spam and the efficiency requirements of a filtering system, this work applies the principle of TEIRESIAS, a pattern extraction algorithm from bioinformatics, to design BioMatrix, a spam filtering algorithm based on biological sequence pattern extraction. A filtering system for Chinese and English mail is implemented on top of this algorithm. The system obtains its spam training set from quantity-control filtering and classifies mail by extracting the feature patterns found in that set. It identifies about 94.2% of spam, with a false positive rate (legitimate mail wrongly filtered) of about 0.04%. An experimental comparison with the Bayes filtering algorithm indicates that applying biological sequence pattern extraction to mail filtering has good research and practical value.
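The general idea of classifying mail by feature patterns drawn from a spam training set can be illustrated with a minimal sketch. This is not TEIRESIAS or BioMatrix, whose details the abstract does not give. It swaps in a much simpler technique: fixed-length character n-grams that recur across several spam messages. The function names `extract_patterns` and `is_spam`, and the parameters `length`, `min_support`, and `threshold`, are assumptions made for illustration only.

```python
from collections import Counter


def extract_patterns(spam_messages, length=4, min_support=2):
    """Return character n-grams found in at least `min_support` spam messages.

    Hypothetical stand-in for the pattern extraction step: a real
    sequence pattern extractor would also find variable-length patterns.
    """
    support = Counter()
    for msg in spam_messages:
        # A set counts each n-gram once per message, so the counter
        # holds the number of messages containing it.
        grams = {msg[i:i + length] for i in range(len(msg) - length + 1)}
        support.update(grams)
    return {g for g, n in support.items() if n >= min_support}


def is_spam(message, patterns, length=4, threshold=2):
    """Flag a message containing at least `threshold` distinct learned patterns."""
    grams = {message[i:i + length] for i in range(len(message) - length + 1)}
    return len(grams & patterns) >= threshold


training = ["buy cheap pills now", "cheap pills for sale", "win cash now"]
patterns = extract_patterns(training, length=5, min_support=2)
print(is_spam("cheap pills here", patterns, length=5))
print(is_spam("meeting at noon", patterns, length=5))
```

Working on raw characters rather than words means the same code applies to Chinese text without a word segmentation step, which is one reason sequence-style patterns are a natural fit for Chinese mail.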