Spatial Audio Scene Characterization (SASC): Automatic Localization of Front-, Back-, Up-, and Down-Positioned Music Ensembles in Binaural Recordings

Slawomir K. Zielinski, Paweł Antoniuk, Hyunkook Lee

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

The automatic localization of audio sources distributed symmetrically with respect to the coronal or transverse planes using binaural signals remains a challenging task, due to the front–back and up–down confusion effects. This paper demonstrates that a convolutional neural network (CNN) can be used to automatically localize music ensembles panned to the front, back, up, or down positions. The network was developed using a repository of binaural excerpts obtained by convolving multi-track music recordings with selected sets of head-related transfer functions (HRTFs). The excerpts were generated in such a way that a music ensemble (of circular shape in terms of its boundaries) was positioned in one of the following four locations with respect to the listener: front, back, up, or down. According to the obtained results, the CNN identified the location of the ensembles with average accuracy levels of 90.7% and 71.4% when tested under the HRTF-dependent and HRTF-independent conditions, respectively. For the HRTF-dependent tests, the accuracy decreased monotonically as the ensemble size increased. A modified image occlusion sensitivity technique revealed selected frequency bands as being particularly important to the localization process. These frequency bands are largely in accordance with the psychoacoustical literature.
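
The binaural synthesis step described above (convolving multi-track recordings with HRTFs) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `render_binaural`, the data layout, and the toy impulse responses are assumptions made for illustration, and the HRTFs are assumed to be available as time-domain impulse response pairs (one left-ear and one right-ear response per source direction).

```python
import numpy as np


def render_binaural(tracks, hrirs):
    """Mix mono tracks into a two-channel binaural signal (hypothetical helper).

    tracks: list of 1-D arrays, one mono signal per source
    hrirs:  list of (left_ir, right_ir) pairs, one pair per track,
            standing in for the HRTF measured at that source's direction
    Returns an array of shape (2, n_samples): row 0 is left, row 1 is right.
    """
    # Full linear convolution of length N with length M yields N + M - 1 samples.
    n_out = max(len(x) + max(len(hl), len(hr)) - 1
                for x, (hl, hr) in zip(tracks, hrirs))
    out = np.zeros((2, n_out))
    for x, (hl, hr) in zip(tracks, hrirs):
        left = np.convolve(x, hl)    # source filtered by the left-ear response
        right = np.convolve(x, hr)   # source filtered by the right-ear response
        out[0, :len(left)] += left   # sum all sources at the left ear
        out[1, :len(right)] += right # sum all sources at the right ear
    return out


# Toy example: two short "tracks" with made-up impulse responses.
tracks = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0])]
hrirs = [(np.array([0.5, 0.25]), np.array([0.0, 1.0])),
         (np.array([1.0]), np.array([1.0]))]
binaural = render_binaural(tracks, hrirs)
```

In a setup like the one the abstract describes, each track of an ensemble would presumably be assigned a direction inside the ensemble's circular boundary, and the impulse response pair for that direction would be taken from the chosen HRTF set; how the directions are assigned is not stated here and is left out of the sketch.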
Original language: English
Article number: 1569
Number of pages: 23
Journal: Applied Sciences (Switzerland)
Volume: 12
Issue number: 3
Publication status: Published - 1 Feb 2022
