Document Processing: Methods for Semantic Text Similarity Analysis

Abdul Wahab Qurashi, Violeta Holmes, Anju P. Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

47 Citations (Scopus)

Abstract

Measuring and analysing the similarity of document text is a growing application of Natural Language Processing. This paper presents the results of using different techniques to measure semantic text similarity in documents used for safety-critical systems. The research objective is to measure the degree of semantic equivalence between multi-word sentences that state rules and procedures in railway safety documents. These documents contain unstructured data in different formats, so they need to be preprocessed and cleaned before a set of Natural Language Processing toolkits and the Jaccard and cosine similarity metrics are applied. The results demonstrate that it is feasible to automate the identification of equivalent rules and procedures, and to measure the similarity of disparate safety-critical documents, using Natural Language Processing and similarity measurement techniques.
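The two metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the regex tokenizer, the small stop-word list, the raw term-frequency vectors, and the two example rules are all assumptions made for illustration, and the abstract does not say which toolkits or preprocessing steps were used. Note also that this version measures lexical overlap only, so "stop" and "stopped" count as different tokens.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; not taken from the paper.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "is", "be", "must", "in", "on"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between raw term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[t] * freq_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no usable tokens in at least one text
    return dot / (norm_a * norm_b)


# Invented example rules; not drawn from any railway safety document.
rule_a = "The driver must stop the train at a red signal."
rule_b = "The train must be stopped at the red signal by the driver."

tokens_a = preprocess(rule_a)
tokens_b = preprocess(rule_b)
jaccard = jaccard_similarity(tokens_a, tokens_b)
cosine = cosine_similarity(tokens_a, tokens_b)
print(f"Jaccard: {jaccard:.3f}  Cosine: {cosine:.3f}")
```

Jaccard compares sets of distinct tokens, while cosine compares frequency vectors, so the two scores generally differ for the same sentence pair. Adding stemming or lemmatisation from an NLP toolkit before scoring would bring inflected forms such as "stop" and "stopped" together.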

Original language: English
Title of host publication: 2020 International Conference on INnovations in Intelligent SysTems and Applications, Proceedings
Subtitle of host publication: INISTA 2020
Editors: Mirjana Ivanovic, Tulay Yildirim, Goce Trajcevski, Costin Badica, Ladjel Bellatreche, Igor Kotenko, Amelia Badica, Burcu Erkmen, Milos Savic
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781728167992
ISBN (Print): 9781728168005
Publication status: Published - 11 Sep 2020
Event: 2020 International Conference on INnovations in Intelligent SysTems and Applications - Novi Sad, Serbia
Duration: 24 Aug 2020 to 26 Aug 2020

Conference

Conference: 2020 International Conference on INnovations in Intelligent SysTems and Applications
Abbreviated title: INISTA 2020
Country/Territory: Serbia
City: Novi Sad
Period: 24/08/20 to 26/08/20
