The Impact of Different Training Sets on Medical Documents Classification

Roberto Gatta, Mauro Vallati, Berardino De Bari, Mahmut Ozsahin

Research output: Contribution to journal › Conference article › peer-review

2 Citations (Scopus)


Clinical documents stored as unstructured text are a valuable source of information that can be extracted with Information Retrieval techniques. Classification algorithms can be used to organize this large amount of data, but they are usually tested on standardized corpora, which differ significantly from the actual clinical documents found in a modern hospital. As a result, observed performance differs from expected performance. Given such differences, it is unclear what the "right" training set should look like and how its characteristics affect classification performance. In this paper we present the results of an experimental analysis, conducted on actual clinical documents from a medical department, which aims to evaluate the impact of differently sized and assembled training sets on well-known classification techniques.

Original language: English
Pages (from-to): 1-5
Number of pages: 5
Journal: CEUR Workshop Proceedings
Publication status: Published - 17 Aug 2014
Event: 3rd International Workshop on Artificial Intelligence and Assistive Medicine - Prague, Czech Republic
Duration: 18 Aug 2014 → 18 Aug 2014
Conference number: 3

