A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases

PANG Shanchen; YAO Jiamin; LIU Ting; ZHAO Hua; CHEN Hongqi

doi:10.1049/cje.2019.12.011

Citation: PANG Shanchen, YAO Jiamin, LIU Ting, ZHAO Hua, CHEN Hongqi. A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases[J]. Chinese Journal of Electronics, 2020, 29(2): 233-241. DOI: 10.1049/cje.2019.12.011

Abstract

Text similarity measurement is the basis for quantifying the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on digital fingerprints offer high detection speed, but they are only suitable for exact-match detection. We propose a Chinese text similarity measurement method based on the semantics of feature phrases. Natural language processing (NLP) techniques are used to pre-process the text, keywords are extracted with the Term Frequency-Inverse Document Frequency (TF-IDF) model, and the feature words are then screened out from these keywords. We obtain the exact meaning of each word and the semantic similarities between words using the HowNet semantic dictionary. We then substitute concepts for words to obtain the feature phrases, generate a semantic fingerprint from them, and calculate the similarity between fingerprints. The experimental results indicate that the proposed method outperforms the traditional digital fingerprinting method in similarity detection in terms of accuracy, recall, and F-value.
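
The pipeline outlined in the abstract (TF-IDF keyword extraction, concept substitution, fingerprint generation, similarity calculation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the `CONCEPTS` table is a toy, hypothetical stand-in for a HowNet lookup, the fingerprint is a SimHash-style bit vector (the abstract does not name the fingerprint algorithm), the similarity is one minus the normalised Hamming distance, and all function names are invented here. Input texts are assumed to be already tokenised.

```python
import hashlib
import math
from collections import Counter

# Hypothetical stand-in for a HowNet lookup: words sharing a meaning
# map to one concept label. A real system would query HowNet instead.
CONCEPTS = {"car": "vehicle", "automobile": "vehicle",
            "fast": "quick", "rapid": "quick"}


def tfidf_keywords(doc, corpus, top_k=5):
    """Return the top_k tokens of `doc` with their TF-IDF weights."""
    tf = Counter(doc)
    n = len(corpus)

    def idf(term):
        df = sum(1 for d in corpus if term in d)
        return math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF

    scores = {t: (c / len(doc)) * idf(t) for t, c in tf.items()}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return {t: scores[t] for t in ranked[:top_k]}


def to_concepts(weights):
    """Replace each keyword by its concept and merge the weights."""
    merged = Counter()
    for word, weight in weights.items():
        merged[CONCEPTS.get(word, word)] += weight
    return dict(merged)


def simhash(weights, bits=64):
    """SimHash-style fingerprint (assumed here, not taken from the paper)."""
    acc = [0.0] * bits
    for term, weight in weights.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(bits):
            acc[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if acc[i] > 0)


def similarity(fp_a, fp_b, bits=64):
    """1 minus the normalised Hamming distance between two fingerprints."""
    return 1.0 - bin(fp_a ^ fp_b).count("1") / bits


# Two texts that share no surface words but map to the same concepts.
doc_a = ["car", "fast"]
doc_b = ["automobile", "rapid"]
corpus = [doc_a, doc_b]

fp_a = simhash(to_concepts(tfidf_keywords(doc_a, corpus)))
fp_b = simhash(to_concepts(tfidf_keywords(doc_b, corpus)))
print(similarity(fp_a, fp_b))  # prints 1.0: identical concept weights
```

In this toy example the two texts have no words in common, so a fingerprint built from the raw words would treat them as unrelated, whereas substituting concepts before hashing makes the fingerprints coincide. That is the motivation the abstract gives for moving from exact-match fingerprints to semantic ones.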