New Synonyms Extraction Model Based on a Novel Terms Weighting Scheme

Ahmad Hussein Ababneh, Joan Lu, Qiang Xu

Research output: Contribution to journal > Article > peer-review

Abstract

The traditional statistical approach to synonym extraction is time-consuming, so a new model is needed to improve both efficiency and accuracy. This research presents a new synonym extraction model called Noun Based Distinctive Verbs (NBDV). During the numerical representation of the documents, the NBDV replaces the traditional tf-idf weighting scheme with a novel scheme called the Orbit Weighting Scheme (OWS). The OWS links nouns to their semantic space by examining the singular verbs in each context. The weight of a term is determined by three parameters: Verb_Noun Frequency, Verb_Noun Distribution, and Verb_Noun Distance. The Verb_Noun Distribution parameter is mathematically formulated to capture the semantic relation between a noun and the set of verbs that appear only in the context of that noun. We compared the new model with important models in the field, namely Skip-Gram, Continuous Bag of Words, and GloVe. The NBDV model was tested on both Arabic and English, and the results showed 47% recall and 51% precision in the dictionary-based evaluation and 57.5% precision in the human experts' evaluation. Compared with synonym extraction based on tf-idf, the NBDV obtained 11% higher recall and 10% higher precision. Regarding efficiency, we found that extracting the synonyms of a single noun requires processing 186 verbs on average, and that in 63% of the runs the number of singular verbs was less than 200. We conclude that the developed method is efficient and processes a single run in linear time.
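The dictionary-based evaluation mentioned in the abstract reports precision and recall. As a minimal sketch of how such figures are conventionally computed for one noun, the Python below compares a list of extracted candidates against a reference dictionary entry. The function name `precision_recall` and the toy word lists are assumptions made for illustration; they are not taken from the paper, and the authors' exact evaluation protocol may differ.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for one noun's candidate synonyms.

    extracted: candidates proposed by a model (hypothetical input).
    reference: synonyms listed in a reference dictionary (hypothetical input).
    """
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)  # candidates confirmed by the dictionary
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy data for illustration only; not results from the paper.
candidates = ["automobile", "vehicle", "road", "truck"]
dictionary_synonyms = ["automobile", "vehicle", "motorcar"]

p, r = precision_recall(candidates, dictionary_synonyms)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the four candidates are confirmed by the dictionary, and two of the three dictionary synonyms are recovered. Corpus-level figures would typically average these per-noun values over all evaluated nouns.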

Original language: English
Pages (from-to): 171-221
Number of pages: 51
Journal: Journal of Information and Organizational Sciences
Volume: 45
Issue number: 1
Publication status: Published - 15 Jun 2021
