Towards potential content-based features evaluation to tackle meaningful citations

Faiza Qayyum, Harun Jamil, Faisal Jamil, Do Hyeun Kim

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)


The scientific community has presented various citation classification models to counter the premise of purely quantitative citation analysis systems, in which all citations are treated equally. However, only a small number of benchmark datasets exist, which makes data-driven modeling of asymmetric citation data quite complex. These models classify citations for varying reasons, mostly harnessing metadata and content-based features derived from research papers. Presently, researchers favor binary citation classification, believing that making the best possible use of datasets of an incomplete nature is adequate to address the issue. We argue that contemporary machine learning (ML) citation classification models overlook essential aspects when selecting appropriate features, which hinders the effective handling of asymmetric citation data. This study presents a novel binary citation classification model that exploits a list of potential natural language processing (NLP) based features. Machine learning classifiers, including support vector machine (SVM), kernel logistic regression (KLR), and random forest (RF), are used to classify citations into important and non-important classes. The evaluation is performed on two benchmark datasets containing a corpus of around 953 paper-citation pairs annotated by the citing authors and domain experts. The results show that the proposed model outperforms contemporary approaches, attaining a precision of 0.88.
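The abstract describes the pipeline only at a high level: derive content-based features for each paper-citation pair, train a classifier to separate important from non-important citations, and report precision. A minimal Python sketch of that pipeline is given here under loudly stated assumptions: the two features (in-text mention count, title similarity) and the six toy samples are hypothetical, and a plain linear logistic regression stands in for the SVM, KLR, and RF classifiers the authors use. It is not the authors' feature set or implementation.

```python
"""Minimal sketch of binary citation classification scored by precision.

Assumptions (not from the paper): the two toy features, the toy data, and
a plain linear logistic regression standing in for SVM, KLR, and RF.
"""
import math
from typing import List, Sequence, Tuple

# Label convention: 1 = important citation, 0 = non-important citation.
Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(data: Sequence[Sample], epochs: int = 500, lr: float = 0.1):
    """Fit weights and bias with per-sample (stochastic) gradient descent."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # derivative of log loss w.r.t. the linear score
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(model, x: Sequence[float]) -> int:
    """Return 1 (important) if the predicted probability is at least 0.5."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(score) >= 0.5 else 0


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """TP / (TP + FP) for the positive ("important") class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical features per citation: [in-text mention count, title similarity].
TOY_DATA: List[Sample] = [
    ([3.0, 0.9], 1), ([4.0, 0.8], 1), ([5.0, 0.7], 1),
    ([0.0, 0.1], 0), ([1.0, 0.2], 0), ([0.0, 0.3], 0),
]

if __name__ == "__main__":
    model = train_logreg(TOY_DATA)
    y_true = [y for _, y in TOY_DATA]
    y_pred = [predict(model, x) for x, _ in TOY_DATA]
    print("precision (important class):", precision(y_true, y_pred))
```

Precision here is computed for the important class only, TP / (TP + FP), which is the metric the abstract reports; scoring the non-important class instead would mean passing `positive=0`. A faithful reproduction would replace the toy features with the paper's NLP-based features and evaluate on held-out data rather than the training set.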

Original language: English
Article number: 1973
Number of pages: 19
Issue number: 10
Publication status: Published - 19 Oct 2021
Externally published: Yes

