Automatic Spatial Audio Scene Classification in Binaural Recordings of Music

Sławomir Zieliński, Hyunkook Lee

Research output: Contribution to journal › Article

Abstract

The aim of the study was to develop a method for the automatic classification of three spatial audio scenes that differ in the horizontal distribution of foreground and background audio content around the listener in binaurally rendered recordings of music. For the purposes of the study, audio recordings were synthesized using thirteen sets of binaural room impulse responses (BRIRs), representing the room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method made no assumptions about the number or characteristics of the audio sources. A least absolute shrinkage and selection operator (LASSO) was employed as the classifier. According to the results, it is possible to identify the spatial scenes automatically using a combination of binaural and spectro-temporal features. The method exhibits satisfactory classification accuracy when it is trained and then tested on different stimuli synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even under highly reverberant conditions. However, the generalizability of the method needs to be further improved. The study demonstrates that, in addition to the binaural cues, Mel-frequency cepstral coefficients constitute an important carrier of spatial information and are essential for the classification of spatial audio scenes.
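
As a rough illustration of the pipeline described in the abstract, the sketch below extracts a small set of binaural and spectro-temporal features (per-band interaural level differences, interaural coherence, and MFCCs) from a two-channel binaural excerpt and feeds them to an L1-regularised, LASSO-type classifier. This is a minimal sketch under stated assumptions, not the authors' implementation: the exact feature set, frame parameters, and the use of librosa and scikit-learn are assumptions made here for illustration.

# Minimal sketch (assumed tooling: numpy, librosa, scikit-learn) of the kind of
# feature extraction and LASSO-type classification described in the abstract.
# Feature choices, frame sizes, and parameter values are illustrative assumptions,
# not the authors' implementation.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def binaural_features(left, right, sr, n_fft=2048, hop=512, n_mfcc=13):
    """Return one feature vector for a two-channel (binaural) excerpt."""
    # Spectro-temporal part: MFCCs of the mid (L+R) signal, averaged over frames.
    mid = 0.5 * (left + right)
    mfcc = librosa.feature.mfcc(y=mid, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop)
    mfcc_mean = mfcc.mean(axis=1)

    # Binaural part: per-band interaural level difference (ILD) in dB and
    # interaural coherence (IC), both averaged over time frames.
    Xl = librosa.stft(left, n_fft=n_fft, hop_length=hop)
    Xr = librosa.stft(right, n_fft=n_fft, hop_length=hop)
    eps = 1e-10
    ild = 20.0 * np.log10((np.abs(Xl) + eps) / (np.abs(Xr) + eps)).mean(axis=1)
    ic = np.abs((Xl * np.conj(Xr)).mean(axis=1)) / np.sqrt(
        (np.abs(Xl) ** 2).mean(axis=1) * (np.abs(Xr) ** 2).mean(axis=1) + eps)

    return np.concatenate([mfcc_mean, ild, ic])

# X: one row of features per excerpt; y: scene labels (e.g. 0, 1, 2 for the three
# foreground/background distributions). L1-regularised logistic regression stands
# in here for the least absolute shrinkage and selection operator (LASSO);
# C controls the amount of shrinkage, i.e. how many features are retained.
clf = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)
# clf.fit(X_train, y_train)
# print(clf.score(X_test, y_test))  # accuracy on stimuli rendered with the same BRIRs

In this setting the L1 penalty performs the feature selection attributed to the LASSO in the abstract: features whose coefficients are shrunk to zero are effectively discarded, leaving the binaural cues and MFCCs that carry the spatial information.
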
Language: English
Article number: 1724
Number of pages: 22
Journal: Applied Sciences (Switzerland)
Volume: 9
Issue number: 9
DOI: 10.3390/app9091724
Publication status: Published - 26 Apr 2019

Fingerprint

music, rooms, recording, impulse response, impulses, head movement, horizontal distribution, audio recordings, cues, classifiers, shrinkage, stimuli, acoustics, operators, coefficients

Cite this

@article{519f79e9ab4047b5a8b236e6529928ad,
title = "Automatic Spatial Audio Scene Classification in Binaural Recordings of Music",
author = "Sławomir Zieliński and Hyunkook Lee",
year = "2019",
month = "4",
day = "26",
doi = "10.3390/app9091724",
language = "English",
volume = "9",
journal = "Applied Sciences (Switzerland)",
issn = "2076-3417",
publisher = "MDPI",
number = "9",

}
