Abstract
Recognition of environmental sound is usually based on two main architectures, depending on whether the model is trained with frame-level features or with aggregated descriptions of acoustic scenes or events. The former architecture is appropriate for applications where target categories are known in advance, while the latter affords a less supervised approach. In this paper, we propose a framework for environmental sound recognition based on blind segmentation and feature aggregation. We describe a new set of descriptors, based on Recurrence Quantification Analysis (RQA), which can be extracted from the similarity matrix of a time series of audio descriptors. We analyze their usefulness for recognition of acoustic scenes and events when used in addition to standard feature aggregation. Our results show the potential of non-linear time series analysis techniques for dealing with environmental sounds.
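The abstract does not specify which RQA descriptors the paper uses, but the general idea can be illustrated with two standard RQA measures, recurrence rate and determinism, computed from the thresholded self-similarity matrix of a feature time series. The sketch below is a minimal, hypothetical implementation of that idea (the fixed distance threshold and minimum diagonal-line length are illustrative choices, not taken from the paper):

```python
import numpy as np

def recurrence_matrix(features, threshold=0.1):
    """Binary recurrence (self-similarity) matrix from a feature time series.

    features: array of shape (n_frames, n_dims), e.g. a sequence of audio
    descriptors. Two frames "recur" if their Euclidean distance falls
    below `threshold` (a hypothetical fixed-radius choice).
    """
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    return (d < threshold).astype(int)

def rqa_descriptors(R, min_len=2):
    """Two classic RQA measures from a recurrence matrix R:

    - recurrence rate: fraction of recurrent points in the matrix
    - determinism: fraction of off-diagonal recurrent points lying on
      diagonal line segments of length >= min_len (periodic structure)
    """
    n = R.shape[0]
    rr = R.sum() / (n * n)
    diag_points = 0
    for k in range(1, n):  # off-diagonals above the main diagonal
        run = 0
        for v in np.append(np.diag(R, k), 0):  # sentinel flushes last run
            if v:
                run += 1
            else:
                if run >= min_len:
                    diag_points += run
                run = 0
    total_off = R.sum() - np.trace(R)  # exclude trivial self-recurrence
    det = (2 * diag_points) / total_off if total_off else 0.0
    return {"recurrence_rate": rr, "determinism": det}
```

On a strictly periodic feature sequence, most recurrent points fall on diagonal lines, so determinism approaches 1; for noisy, unstructured sound it drops, which is what makes such measures useful as aggregated descriptors.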
| Original language | English |
|---|---|
| Pages (from-to) | 457-475 |
| Number of pages | 19 |
| Journal | Journal of Intelligent Information Systems |
| Volume | 51 |
| Issue number | 3 |
| Early online date | 19 Aug 2017 |
| DOIs | |
| Publication status | Published - Dec 2018 |
| Externally published | Yes |
Fingerprint
The research topics of 'Environmental sound recognition using short-time feature aggregation' together form a unique fingerprint.
Profiles
- Gerard Roma (Person: Academic)
  - Department of History, English, Linguistics and Music: Senior Research Fellow in Interactive Machine Listening
  - School of Music, Humanities and Media
  - Centre for Research in New Music: Member