Large-vocabulary continuous speech recognition systems must integrate several knowledge sources, such as the acoustic model, the language model, and the pronunciation lexicon. The decoding network is the foundation of the recognition engine and has a decisive influence on decoder performance. Integrating these knowledge sources effectively into a compact decoding network reduces the search space and the redundant computation during recognition, and significantly increases decoding speed. This paper studies dynamic decoding networks for speech recognition and proposes a word-end (WE) node push-forward algorithm. Combined with the conventional forward and backward merging algorithms, it yields a compact dynamic decoding network whose nodes are hidden Markov model states. The optimized network has 1/4 as many nodes and edges as the linear-lexicon decoding network and 1/2 as many as the open-source toolkit HDecode; the number of nodes at which language model prediction (look-ahead) scores must be computed is 1/2 that of HDecode. The acoustic model is based on triphone modeling and can be easily ported to other languages.
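To make the idea of network compaction concrete, the following is a minimal sketch of forward (prefix) merging only, compared with a linear lexicon network. It is not the paper's implementation: the toy lexicon, the function names, and the phone-level nodes are all assumptions made for illustration, whereas the paper works with HMM states of triphone models. The WE node push-forward step and backward (suffix) merging are not shown.

```python
# Hypothetical toy lexicon (not from the paper): word -> phone sequence.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spell": ["s", "p", "eh", "l"],
    "tell": ["t", "eh", "l"],
}


def linear_network_size(lexicon):
    """Count nodes and edges when every word is a separate chain from one root.

    Each chain has one node per phone plus one word-end (WE) node.
    """
    nodes, edges = 1, 0  # start with the shared root node
    for phones in lexicon.values():
        nodes += len(phones) + 1
        edges += len(phones) + 1
    return nodes, edges


def prefix_merged_size(lexicon):
    """Count nodes and edges after forward merging of common phone prefixes.

    Words that start with the same phones share those nodes; each word
    still ends in its own WE node, so homophones stay distinguishable.
    """
    root = {}
    nodes, edges = 1, 0  # start with the shared root node
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            if phone not in node:  # a new branch is needed only here
                node[phone] = {}
                nodes += 1
                edges += 1
            node = node[phone]
        node[("WE", word)] = {}  # word-end marker placed at the leaf
        nodes += 1
        edges += 1
    return nodes, edges


if __name__ == "__main__":
    print("linear network (nodes, edges):", linear_network_size(LEXICON))
    print("prefix-merged  (nodes, edges):", prefix_merged_size(LEXICON))
```

On this toy lexicon the linear network has 20 nodes and 19 edges, while the prefix-merged network has 15 nodes and 14 edges. The reduction ratios reported in the abstract come from the paper's full pipeline on a real lexicon and cannot be reproduced from this sketch.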