Lexical analysis of automated accounts on twitter

Isa Inuwa-Dutse, Bello Shehu Bello, Ioannis Korkontzelos

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review


In recent years, social bots have adopted increasingly sophisticated strategies that make them harder to detect. Although many detection approaches and features have been proposed, social bots still evade detection and interact much like humans, making it difficult to distinguish genuine human accounts from bot accounts. Detection systems have used various features under the broader categories of account profile, tweet content, network and temporal pattern. The use of tweet content features, however, has been limited to analysis of basic terms such as URLs, hashtags, named entities and sentiment. Given a set of tweets with no obvious pattern, can we distinguish content produced by social bots from content produced by humans? We aim to answer this question by analysing the lexical richness of tweets produced by the respective accounts, using large collections drawn from different datasets. Our results show a clear margin between the two classes in lexical diversity, lexical sophistication and distribution of emoticons. We found that the proposed lexical features significantly improve classification performance for both account types. These features are useful for training a standard machine learning classifier for effective detection of social bot accounts. A new dataset is made freely available for further exploration.
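
The abstract does not say which lexical diversity measure was used. As a rough illustration only, the sketch below computes the type-token ratio (distinct tokens divided by total tokens), one common measure of lexical diversity. The tokenizer, the function names and the sample tweets are all assumptions made for this example; they are not taken from the paper, and the sample values say nothing about the reported results.

```python
import re

# Hypothetical tokenizer: lowercase alphabetic tokens only.
# This is an assumption for illustration, not the authors' preprocessing.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Split a tweet into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def type_token_ratio(tweets):
    """Return distinct tokens / total tokens over a list of tweets."""
    tokens = [tok for tweet in tweets for tok in tokenize(tweet)]
    if not tokens:
        return 0.0  # avoid division by zero for empty input
    return len(set(tokens)) / len(tokens)


# Made-up tweets, used only to exercise the function.
repetitive = ["win a free phone now", "win a free phone today"]
varied = ["rainy morning, coffee first", "finally finished that long novel"]

print(type_token_ratio(repetitive))
print(type_token_ratio(varied))
```

A ratio like this could serve as one numeric feature per account alongside profile, network and temporal features. Note that the plain type-token ratio is sensitive to text length, so accounts with very different tweet volumes are not directly comparable without some normalisation.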

Original language: English
Title of host publication: Proceedings of the International Conferences on WWW/Internet 2018 and Applied Computing 2018
Editors: Pedro Isaias, Hans Weghorn
Publisher: IADIS Press
Number of pages: 8
ISBN (Electronic): 9789898533821
ISBN (Print): 9781510875401
Publication status: Published - 1 Dec 2018
Externally published: Yes
Event: International Conferences on WWW/Internet, ICWI 2018 and Applied Computing 2018 - Budapest, Hungary
Duration: 21 Oct 2018 to 23 Oct 2018


Conference: International Conferences on WWW/Internet, ICWI 2018 and Applied Computing 2018

