A New Semantic Similarity Scheme for more Accurate Identification in Medical Data

Colin Wilcox, Soufiene Djahel, Vasilios Giagos, Kristopher Welsh, Nicholas Costen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


This paper aims to design a new measure of similarity between items of personal textual information retrieved from historic medical records, in order to correct errors introduced by poor encoding and data omission. The key motivation underlying our proposed layered algorithm, named the Semantic Similarity scheme (SSIM), is to create a consistent, complete and accurate data set that may then be used as a basis for the identification and authentication of individuals in a medical context. Such consistent data may serve as part of an access control system without compromising medical ethics or security. The evaluation results, obtained using four sample data sets from the UK, USA, Canada and Australia, highlight promising benefits compared to other similarity measures, including the Jaccard index, Sorensen-Dice and Cosine Similarity, especially when nicknames, abbreviations and synonyms are used to determine similarity.
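For readers unfamiliar with the baseline measures named in the abstract, the following is a minimal Python sketch of the Jaccard index, Sorensen-Dice coefficient and cosine similarity. It is not the authors' SSIM algorithm, which this page does not describe. The choice of character bigrams as the unit of comparison, the lower-casing step, and the function names `bigrams`, `jaccard`, `sorensen_dice` and `cosine` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def bigrams(text):
    """Split a string into overlapping character bigrams (assumed tokenisation)."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def jaccard(a, b):
    """Jaccard index: shared bigrams divided by all distinct bigrams."""
    x, y = set(bigrams(a)), set(bigrams(b))
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def sorensen_dice(a, b):
    """Sorensen-Dice coefficient: twice the shared bigrams over the summed set sizes."""
    x, y = set(bigrams(a)), set(bigrams(b))
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity between bigram count vectors."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(x[g] * y[g] for g in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # A name and a common nickname share few bigrams, so purely
    # lexical measures score them as dissimilar.
    for left, right in [("William", "Bill"), ("Smith", "smith")]:
        print(left, right,
              round(jaccard(left, right), 3),
              round(sorensen_dice(left, right), 3),
              round(cosine(left, right), 3))
```

Under these assumptions, "William" and "Bill" share only two bigrams and so score low on all three measures, which illustrates the kind of nickname case the abstract highlights as difficult for purely lexical comparison.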
Original language: English
Title of host publication: Proceedings of 2023 IEEE International Smart Cities Conference
Subtitle of host publication: (ISC2 2023)
Number of pages: 7
ISBN (Electronic): 9798350397758, 9798350397741
ISBN (Print): 9798350397765
Publication status: Published - 31 Oct 2023
Event: 9th IEEE International Smart Cities Conference - University Politehnica of Bucharest, Bucharest, Romania
Duration: 24 Sep 2023 to 27 Sep 2023
Conference number: 9

Publication series

Name: Proceedings of the IEEE International Smart Cities Conference
ISSN (Print): 2687-8852
ISSN (Electronic): 2687-8860


Conference: 9th IEEE International Smart Cities Conference
Abbreviated title: ISC2 2023