TY - JOUR
T1 - New Synonyms Extraction Model Based on a Novel Terms Weighting Scheme
AU - Ababneh, Ahmad Hussein
AU - Lu, Joan
AU - Xu, Qiang
N1 - Publisher Copyright: © 2021, University of Zagreb, Faculty of Organization and Informatics. All rights reserved. Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/6/15
Y1 - 2021/6/15
AB - The traditional statistical approach to synonyms extraction is time-consuming. A new model is needed to improve efficiency and accuracy. This research presents a new model for synonyms extraction called Noun Based Distinctive Verbs (NBDV). During the documents’ numerical representation phase, the NBDV replaces the traditional tf-idf weighting scheme with a novel weighting scheme called the Orbit Weighting Scheme (OWS). The OWS links nouns to their semantic space by examining the singular verbs in each context. The weight of a term is determined by three parameters: Verb_Noun Frequency, Verb_Noun Distribution, and Verb_Noun Distance. The Verb_Noun Distribution parameter is mathematically formulated to depict the semantic relation between the noun and a certain set of verbs that appear only in the context of this noun. We compared the new model with important models in the field, such as the Skip-Gram, the Continuous Bag of Words, and the GloVe model. The NBDV model was tested on both Arabic and English, and the results showed 47% recall and 51% precision in the dictionary-based evaluation and 57.5% precision in the human experts’ evaluation. Compared with synonyms extraction based on tf-idf, the NBDV obtained 11% higher recall and 10% higher precision. Regarding efficiency, we found that, on average, the synonyms extraction of a single noun requires processing 186 verbs, and in 63% of the runs, the number of singular verbs was less than 200. It is concluded that the developed method is efficient and processes a single run in linear time.
KW - Automatic Synonyms Extraction
KW - Cosine Similarity
KW - Orbit Weighting Scheme
KW - Semantic Context Analysis
KW - Vector Space-based Extraction
UR - http://www.scopus.com/inward/record.url?scp=85108890461&partnerID=8YFLogxK
U2 - 10.31341/jios.45.1.9
DO - 10.31341/jios.45.1.9
M3 - Article
AN - SCOPUS:85108890461
VL - 45
SP - 171
EP - 221
JO - Journal of Information and Organizational Sciences
JF - Journal of Information and Organizational Sciences
SN - 1846-3312
IS - 1
ER -