Abstract
The aim of the study was to develop a method for automatic classification of the three spatial audio scenes, differing in horizontal distribution of foreground and background audio content around a listener in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural-room-impulse-responses (BRIRs), representing room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method was assumption-free with regards to the number and characteristics of the audio sources. A least absolute shrinkage and selection operator was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits a satisfactory classification accuracy when it is trained and then tested on different stimuli but synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. This study demonstrates that in addition to the binaural cues, the Mel-frequency cepstral coefficients constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes.
| Original language | English |
|---|---|
| Article number | 1724 |
| Number of pages | 22 |
| Journal | Applied Sciences (Switzerland) |
| Volume | 9 |
| Issue number | 9 |
| Early online date | 26 Apr 2019 |
| DOIs | |
| Publication status | Published - 1 May 2019 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
-
SDG 9 Industry, Innovation, and Infrastructure
Fingerprint
Dive into the research topics of 'Automatic Spatial Audio Scene Classification in Binaural Recordings of Music'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver