[Purpose/Significance] This paper focuses on predicting fake reviewers from their observable features. It aims to reveal the characteristics and behavioral patterns of the "Internet water army" (paid posters) in a real online environment and to build a concise, clear and interpretable model for predicting reviewer identity, laying a foundation for deeper review-mining research. [Method/Process] Combining empirical analysis with machine learning techniques, the study explores the internal review-evaluation mechanism of the target website Dianping (大众点评网), uses factor analysis to extract features describing reviewers' attributes and behavior, and on that basis builds a prediction model based on Logistic regression. [Results/Conclusions] On the target website, the model classifies fake reviewers with a prediction accuracy of 73.8% and an AUC of 80.9%. Reviewers' contribution, activity and writing literacy are verified to have a statistically significant relationship with their identity, whereas reviewers' level, sentiment and rating bias have no significant effect on identity prediction. The experimental conclusions are broadly consistent with the empirical analysis, and the model can be reasonably interpreted.
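The modelling step summarized above (reviewer factor scores fed into a binary Logistic regression, evaluated by classification accuracy and AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three inputs (contribution, activity, literacy) stand in for factor scores that the study's factor analysis would produce, the data are synthetic and hypothetical, the helper names (`fit_logistic`, `make_toy_data`, etc.) are invented for illustration, and the model is fitted by plain batch gradient descent rather than whatever estimator the study actually used.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            # Residual of the predicted probability against the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    """Predicted probability that each reviewer is fake (label 1)."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(y, probs, threshold=0.5):
    """Share of reviewers whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (t == 1) for t, p in zip(y, probs))
    return hits / len(y)


def auc(y, probs):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [p for t, p in zip(y, probs) if t == 1]
    neg = [p for t, p in zip(y, probs) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def make_toy_data(n=200, seed=0):
    """Hypothetical factor scores (contribution, activity, literacy) with 0/1 labels.

    Synthetic only: the two classes differ by an arbitrary mean shift, which
    says nothing about the direction of any effect in the real study.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        shift = -0.8 if label == 1 else 0.8
        X.append([rng.gauss(shift, 1.0) for _ in range(3)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Hold out the last 50 reviewers so the metrics are not purely in-sample.
    w, b = fit_logistic(X[:150], y[:150])
    probs = predict_proba(w, b, X[150:])
    print("weights:", [round(v, 3) for v in w], "intercept:", round(b, 3))
    print("hold-out accuracy:", round(accuracy(y[150:], probs), 3))
    print("hold-out AUC:", round(auc(y[150:], probs), 3))
```

Running the script prints the fitted weights and the hold-out metrics; these toy figures are not comparable with the 73.8% accuracy and 80.9% AUC reported for Dianping. The factor-analysis step is not reproduced here; the factor scores could come from a library routine such as scikit-learn's `FactorAnalysis`, but that is an assumption, not the study's stated tool. AUC is computed through its rank (Mann-Whitney) interpretation, which is quadratic in sample size but easy to check by hand; accuracy depends on the chosen 0.5 threshold, whereas AUC does not.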