TY - JOUR
T1 - An investigation into the deep learning approach in sentimental analysis using graph-based theories
AU - Kentour, Mohamed
AU - Lu, Joan
N1 - Publisher Copyright:
© 2021 Kentour, Lu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/12/2
Y1 - 2021/12/2
N2 - Sentiment analysis is a branch of natural language analytics that aims to correlate what is expressed which comes normally within unstructured format with what is believed and learnt. Several attempts have tried to address this gap (i.e., Naive Bayes, RNN, LSTM, word embedding, etc.), even though the deep learning models achieved high performance, their generative process remains a “black-box” and not fully disclosed due to the high dimensional feature and the non-deterministic weights assignment. Meanwhile, graphs are becoming more popular when modeling complex systems while being traceable and understood. Here, we reveal that a good trade-off transparency and efficiency could be achieved with a Deep Neural Network by exploring the Credit Assignment Paths theory. To this end, we propose a novel algorithm which alleviates the features’ extraction mechanism and attributes an importance level of selected neurons by applying a deterministic edge/node embeddings with attention scores on the input unit and backward path respectively. We experiment on the Twitter Health News dataset were the model has been extended to approach different approximations (tweet/aspect and tweets’ source levels, frequency, polarity/subjectivity), it was also transparent and traceable. Moreover, results of comparing with four recent models on same data corpus for tweets analysis showed a rapid convergence with an overall accuracy of ≈83% and 94% of correctly identified true positive sentiments. Therefore, weights can be ideally assigned to specific active features by following the proposed method. As opposite to other compared works, the inferred features are conditioned through the users’ preferences (i.e., frequency degree) and via the activation’s derivatives (i.e., reject feature if not scored). Future direction will address the inductive aspect of graph embeddings to include dynamic graph structures and expand the model resiliency by considering other datasets like SemEval task7, covid-19 tweets, etc.
AB - Sentiment analysis is a branch of natural language analytics that aims to correlate what is expressed which comes normally within unstructured format with what is believed and learnt. Several attempts have tried to address this gap (i.e., Naive Bayes, RNN, LSTM, word embedding, etc.), even though the deep learning models achieved high performance, their generative process remains a “black-box” and not fully disclosed due to the high dimensional feature and the non-deterministic weights assignment. Meanwhile, graphs are becoming more popular when modeling complex systems while being traceable and understood. Here, we reveal that a good trade-off transparency and efficiency could be achieved with a Deep Neural Network by exploring the Credit Assignment Paths theory. To this end, we propose a novel algorithm which alleviates the features’ extraction mechanism and attributes an importance level of selected neurons by applying a deterministic edge/node embeddings with attention scores on the input unit and backward path respectively. We experiment on the Twitter Health News dataset were the model has been extended to approach different approximations (tweet/aspect and tweets’ source levels, frequency, polarity/subjectivity), it was also transparent and traceable. Moreover, results of comparing with four recent models on same data corpus for tweets analysis showed a rapid convergence with an overall accuracy of ≈83% and 94% of correctly identified true positive sentiments. Therefore, weights can be ideally assigned to specific active features by following the proposed method. As opposite to other compared works, the inferred features are conditioned through the users’ preferences (i.e., frequency degree) and via the activation’s derivatives (i.e., reject feature if not scored). Future direction will address the inductive aspect of graph embeddings to include dynamic graph structures and expand the model resiliency by considering other datasets like SemEval task7, covid-19 tweets, etc.
KW - Graph-based theories
KW - Sentiment analysis
KW - deep learning
UR - http://www.scopus.com/inward/record.url?scp=85120696058&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0260761
DO - 10.1371/journal.pone.0260761
M3 - Article
VL - 16
JO - PLoS One
JF - PLoS One
SN - 1932-6203
IS - 12
M1 - e0260761
ER -