Abstract
In recent years, large volumes of short text data have become easy to collect from platforms such as microblogs and product review sites. Such data often spans several domains, which makes effective multi-domain text processing difficult because the domains are hard to distinguish from one another. The concept of a multiple domain taxonomy (MDT) has shown promising performance in processing multi-domain text data. However, an MDT has to be constructed manually, which is time-consuming and requires substantial expert knowledge of the relevant domains. To address these issues, in this paper we introduce a semi-automatic method for constructing an MDT that requires only a small amount of manual input, combined with an unsupervised method for ranking multi-domain concepts based on semantic relationships learned from unlabeled data. We show that an MDT constructed iteratively with our semi-automatic method can achieve higher domain classification accuracy than existing methods, with improvements of up to 11%.
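The abstract describes the approach only at a high level. To make the general idea concrete (a few manually chosen seed concepts per domain, expanded iteratively by ranking candidate terms with similarities learned from unlabeled short texts, then used for domain classification), here is a minimal Python sketch. It is not the authors' method: the document-level co-occurrence vectors, cosine similarity, mean-similarity ranking rule, keyword-count classifier, every function name (`expand_taxonomy`, `rank_candidates`, `classify`, and so on) and the toy data are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical sketch, NOT the paper's algorithm: the similarity measure,
# ranking rule and classifier below are illustrative assumptions.
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(texts):
    """Represent each term by counts of the other terms it shares a short text with."""
    vectors = defaultdict(Counter)
    for text in texts:
        terms = set(text.lower().split())
        for t in terms:
            for u in terms:
                if t != u:
                    vectors[t][u] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(vectors, taxonomy, top_k):
    """For each domain, rank terms not yet in any domain by mean similarity to its concepts."""
    assigned = set().union(*taxonomy.values())
    ranked = {}
    for domain, concepts in taxonomy.items():
        known = [c for c in concepts if c in vectors]
        scores = {
            term: sum(cosine(vec, vectors[c]) for c in known) / len(known)
            for term, vec in vectors.items()
            if term not in assigned and known
        }
        # Highest score first; ties broken alphabetically for reproducibility.
        ranked[domain] = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    return ranked


def expand_taxonomy(texts, seeds, rounds=2, top_k=2):
    """Grow each domain's concept set from a few manual seeds, one round at a time."""
    vectors = cooccurrence_vectors(texts)
    taxonomy = {domain: set(terms) for domain, terms in seeds.items()}
    for _ in range(rounds):
        # In a semi-automatic workflow, a human could review these picks before adding them.
        for domain, terms in rank_candidates(vectors, taxonomy, top_k).items():
            taxonomy[domain].update(terms)
    return taxonomy


def classify(text, taxonomy):
    """Assign a short text to the domain whose concepts it mentions most often."""
    terms = set(text.lower().split())
    hits = {domain: len(terms & concepts) for domain, concepts in taxonomy.items()}
    return max(sorted(hits), key=lambda d: hits[d])


if __name__ == "__main__":
    # Toy unlabeled short texts (invented for illustration).
    unlabeled = [
        "battery screen phone charger",
        "phone screen camera battery",
        "camera lens phone",
        "pasta sauce recipe oven",
        "recipe oven bake bread",
        "bread pasta flour recipe",
    ]
    seeds = {"electronics": {"phone"}, "food": {"recipe"}}
    mdt = expand_taxonomy(unlabeled, seeds)
    for domain in sorted(mdt):
        print(domain, sorted(mdt[domain]))
    print(classify("new camera charger", mdt))
```

Replacing the co-occurrence vectors with pretrained word embeddings, or inserting a manual review step between rounds, would move the sketch closer to a semi-automatic workflow; both are assumptions rather than details taken from the paper.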
| Original language | English |
|---|---|
| Title of host publication | 29th International Conference on Database and Expert Systems Applications (DEXA 2018) |
| Place of Publication | Cham |
| Publisher | Springer Verlag |
| Pages | 501-509 |
| Number of pages | 9 |
| Edition | 1st |
| ISBN (Electronic) | 9783319988122 |
| ISBN (Print) | 9783319988115 |
| DOIs | |
| Publication status | Published - 9 Aug 2018 |
| Event | 29th International Conference on Database and Expert Systems Applications, Regensburg, Germany. Duration: 3 Sept 2018 → 6 Sept 2018. Conference number: 29. http://www.dexa.org/dexa2018 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Publisher | Springer |
Conference
| Conference | 29th International Conference on Database and Expert Systems Applications |
|---|---|
| Abbreviated title | DEXA 2018 |
| Country/Territory | Germany |
| City | Regensburg |
| Period | 3/09/18 → 6/09/18 |
| Internet address | |