TY - JOUR
T1 - A deep semantic search method for random tweets
AU - Inuwa-Dutse, Isa
AU - Liptrott, Mark
AU - Korkontzelos, Ioannis
N1 - Funding Information:
The authors would like to thank Prof. Francesco Rizzuto for the fruitful discussions and exchange of ideas about a multitude of aspects related to the research. The third author has participated in this research work as part of the CROSSMINER Project, which has received funding from the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No. 732223.
Publisher Copyright:
© 2019 The Authors
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
Part of special issue:
SI: Social and Human Mining with Online Social Networks and Media
PY - 2019/9/1
Y1 - 2019/9/1
N2 - Contemporary social media platforms enable users to act as both producers and consumers of content, leading to the generation of enormous amounts of data. While this ability is empowering, it also poses many challenges concerning efficient searches for relevant information. Many search approaches have been proposed in the literature. However, searching for information on Twitter is particularly challenging due to both the inconsistency in writing styles and the high generation rate of spurious and duplicate content. The quest for instant and efficient data processing to retrieve relevant information renders many existing techniques ineffective when applied to Twitter. We present a multilevel approach based on state-of-the-art deep learning methods and a novel scalable windowing approach for pairwise-similarity search (SWAPS) to improve search efficiency. SWAPS optimises searches using a strategic balancing criterion to assess the trade-off between accuracy and search speed, thereby circumventing sequential search problems. Moreover, we propose a deep search strategy that establishes a relationship between the status of a tweet and its longevity measured in terms of engagement lifespan since posting. Deep search utilises a convolutional neural network for textual n-gram feature extraction and meta-features from the tweet to train a fully connected network on a vast number of tweets. This approach differs from existing ones by recognising the relationship between the status of a tweet and its engagement lifespan to ensure a better understanding of the compositional semantics in tweets. The results highlight interesting symmetrical properties with respect to similarity distribution and duration. We evaluate our approach on various benchmark datasets and demonstrate the efficacy and applicability of the method. Problems of event detection, clustering and ads, among others, can utilise this approach to detect items of interest effectively.
AB - Contemporary social media platforms enable users to act as both producers and consumers of content, leading to the generation of enormous amounts of data. While this ability is empowering, it also poses many challenges concerning efficient searches for relevant information. Many search approaches have been proposed in the literature. However, searching for information on Twitter is particularly challenging due to both the inconsistency in writing styles and the high generation rate of spurious and duplicate content. The quest for instant and efficient data processing to retrieve relevant information renders many existing techniques ineffective when applied to Twitter. We present a multilevel approach based on state-of-the-art deep learning methods and a novel scalable windowing approach for pairwise-similarity search (SWAPS) to improve search efficiency. SWAPS optimises searches using a strategic balancing criterion to assess the trade-off between accuracy and search speed, thereby circumventing sequential search problems. Moreover, we propose a deep search strategy that establishes a relationship between the status of a tweet and its longevity measured in terms of engagement lifespan since posting. Deep search utilises a convolutional neural network for textual n-gram feature extraction and meta-features from the tweet to train a fully connected network on a vast number of tweets. This approach differs from existing ones by recognising the relationship between the status of a tweet and its engagement lifespan to ensure a better understanding of the compositional semantics in tweets. The results highlight interesting symmetrical properties with respect to similarity distribution and duration. We evaluate our approach on various benchmark datasets and demonstrate the efficacy and applicability of the method. Problems of event detection, clustering and ads, among others, can utilise this approach to detect items of interest effectively.
KW - Deep learning
KW - Information search
KW - Semantic search
KW - Tweets
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85070498534&partnerID=8YFLogxK
U2 - 10.1016/j.osnem.2019.07.002
DO - 10.1016/j.osnem.2019.07.002
M3 - Article
AN - SCOPUS:85070498534
VL - 13
JO - Online Social Networks and Media
JF - Online Social Networks and Media
SN - 2468-6964
M1 - 100046
ER -