Document text similarity measurement and analysis is a growing application of Natural Language Processing. This paper presents the results of applying different semantic text similarity techniques to documents used for safety-critical systems. The research objective of this work is to measure the degree of semantic equivalence between multi-word sentences that describe rules and procedures in railway safety documents. Because these documents contain unstructured data in different formats, they must be preprocessed and cleaned before a set of Natural Language Processing toolkits and the Jaccard and Cosine similarity metrics are applied. The results demonstrate that it is feasible to automate the identification of equivalent rules and procedures, and to measure the similarity of disparate safety-critical documents, using Natural Language Processing and similarity measurement techniques.
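To illustrate the two metrics named in the abstract, the Python below compares two sentences after a simple preprocessing step. This is a minimal sketch under stated assumptions, not the authors' pipeline: the regex tokenizer, the small stop-word list, and the two example rule sentences are hypothetical. Jaccard similarity is computed on token sets, and cosine similarity on raw term-frequency vectors. A fuller system might use TF-IDF weighting or an NLP toolkit's tokenizer instead.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; the paper's actual
# preprocessing toolkit and stop-word list are not specified here.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "must", "be",
              "is", "are", "on", "in", "at", "before"}


def preprocess(sentence: str) -> list[str]:
    """Lowercase, keep alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def jaccard_similarity(a: list[str], b: list[str]) -> float:
    """|A intersect B| / |A union B| over the two token sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention: two empty sentences count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    tf_a, tf_b = Counter(a), Counter(b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_a * norm_b)


# Two hypothetical railway rule sentences (not taken from the paper's data).
rule_a = "The driver must stop the train at a red signal."
rule_b = "A train driver must stop at every red signal."

if __name__ == "__main__":
    tokens_a, tokens_b = preprocess(rule_a), preprocess(rule_b)
    print(f"Jaccard: {jaccard_similarity(tokens_a, tokens_b):.3f}")  # Jaccard: 0.833
    print(f"Cosine:  {cosine_similarity(tokens_a, tokens_b):.3f}")   # Cosine:  0.913
```

Here the preprocessed sentences share five of six distinct content words, so Jaccard gives 5/6. Cosine is higher (5/√30) because it normalizes by each sentence's own length rather than by the size of the combined vocabulary.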
Title of host publication: 2020 International Conference on INnovations in Intelligent SysTems and Applications, Proceedings
Editors: Mirjana Ivanovic, Tulay Yildirim, Goce Trajcevski, Costin Badica, Ladjel Bellatreche, Igor Kotenko, Amelia Badica, Burcu Erkmen, Milos Savic
Publisher: Institute of Electrical and Electronics Engineers Inc.
Published: 11 Sep 2020
Conference: 2020 International Conference on INnovations in Intelligent SysTems and Applications, Novi Sad, Serbia
Duration: 24 Aug 2020 → 26 Aug 2020