Clinical documents stored as unstructured text are a valuable source of information that can be extracted with Information Retrieval techniques. Classification algorithms can be used to organize this large volume of data, but they are usually tested on standardized corpora, which differ significantly from the clinical documents actually found in a modern hospital. As a result, observed performance differs from expected performance. Given these differences, it is unclear what the "right" training set should look like and how its characteristics affect classification performance. In this paper we present the results of an experimental analysis, conducted on actual clinical documents from a medical department, that evaluates the impact of differently sized and assembled training sets on well-known classification techniques.
CEUR Workshop Proceedings
Published - 17 Aug 2014
3rd International Workshop on Artificial Intelligence and Assistive Medicine - Prague, Czech Republic
Duration: 18 Aug 2014 → 18 Aug 2014
Conference number: 3