This paper proposes a term-weighting algorithm for the vector space model (VSM) based on Shannon information entropy. Combining it with IDF (Inverse Document Frequency), the classical method for computing the relevance weight between a term and a document, the paper further summarizes the specific formulas for two kinds of term-weight calculation in the VSM.
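The abstract does not reproduce the formulas themselves, so the following is only an illustrative sketch of the two kinds of weight it names. It assumes the textbook IDF, log(N / df), and a normalized-entropy weight, 1 - H(t) / log N, where H(t) is the Shannon entropy of a term's frequency distribution over the N documents. The function names `idf_weights` and `entropy_weights` and the toy corpus are hypothetical, and the paper's own formulas may differ.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Textbook IDF: log(N / df), where df is the number of documents containing the term."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def entropy_weights(docs):
    """Assumed entropy weight: 1 - H(t) / log(N).

    H(t) is the Shannon entropy of the term's occurrences across documents.
    A term spread evenly over all documents gets a weight near 0;
    a term concentrated in a single document gets 1.
    """
    n_docs = len(docs)
    per_doc = [Counter(doc) for doc in docs]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)

    weights = {}
    for term, total in totals.items():
        entropy = 0.0
        for counts in per_doc:
            tf = counts.get(term, 0)
            if tf:
                p = tf / total
                entropy -= p * math.log(p)
        # With a single document the normalizer log(N) is 0, so fall back to 1.0.
        weights[term] = 1.0 - entropy / math.log(n_docs) if n_docs > 1 else 1.0
    return weights


# Hypothetical toy corpus: each document is a list of tokens.
docs = [["a", "b"], ["a", "c"], ["a", "b", "b"]]
idf = idf_weights(docs)
ent = entropy_weights(docs)
```

In this toy corpus the term "a" occurs once in every document, so both weights treat it as uninformative, while "c" occurs in only one document and receives the highest weight under both schemes.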