PANG Shanchen, YAO Jiamin, LIU Ting, ZHAO Hua, CHEN Hongqi. A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases[J]. Chinese Journal of Electronics, 2020, 29(2): 233-241. DOI: 10.1049/cje.2019.12.011

A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases

  • Text similarity measurement is the basis for determining the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on digital fingerprints are fast, but they are suitable only for exact detection. We propose a method of Chinese text similarity measurement based on feature phrase semantics. Natural language processing (NLP) techniques are used to pre-process the text, keywords are extracted with the term frequency-inverse document frequency (TF-IDF) model, and the feature words are then screened out from these keywords. We obtain the exact meaning of each word and the semantic similarity between words from the HowNet semantic dictionary. We then substitute concepts to obtain the feature phrases, generate a semantic fingerprint, and calculate similarity. The experimental results indicate that the proposed method is superior to the traditional digital fingerprinting method in similarity detection in terms of accuracy rate, recall rate, and F-value.
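
The pipeline described in the abstract (TF-IDF weighting, concept substitution, fingerprint generation, similarity calculation) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the fingerprint is built SimHash-style from weighted term hashes, similarity is one minus the normalized Hamming distance, and the small `CONCEPTS` dictionary is a hypothetical stand-in for a HowNet lookup. The function names are illustrative, and the input is assumed to be already tokenized.

```python
import hashlib
import math
from collections import Counter

# Hypothetical concept map standing in for a HowNet lookup (assumption).
CONCEPTS = {"automobile": "car", "vehicle": "car"}


def normalize(tokens):
    """Replace each token with its canonical concept when one is known."""
    return [CONCEPTS.get(t, t) for t in tokens]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        weighted.append({
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return weighted


def fingerprint(weights, bits=64):
    """SimHash-style fingerprint (assumed scheme); bits must be <= 128."""
    acc = [0.0] * bits
    for term, w in weights.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each term votes for or against every bit with its weight.
            acc[i] += w if (h >> i) & 1 else -w
    return sum(1 << i for i in range(bits) if acc[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def similarity(a, b, bits=64):
    """One minus the normalized Hamming distance, in [0, 1]."""
    return 1.0 - hamming(a, b) / bits


raw = [
    ["car", "engine", "repair"],
    ["automobile", "engine", "repair"],
    ["stock", "market", "report"],
]
docs = [normalize(d) for d in raw]
fps = [fingerprint(w) for w in tf_idf(docs)]
print(similarity(fps[0], fps[1]), similarity(fps[0], fps[2]))
```

In this sketch the first two documents differ only by a synonym, so after concept substitution they yield identical fingerprints, whereas a purely lexical fingerprint would treat "car" and "automobile" as unrelated terms.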