TY - JOUR
T1 - A Comparison of Human against Machine-Classification of Spatial Audio Scenes in Binaural Recordings of Music
AU - Zieliński, Sławomir
AU - Lee, Hyunkook
AU - Antoniuk, Paweł
AU - Dadan, Oskar
PY - 2020/9/1
Y1 - 2020/9/1
N2 - The purpose of this paper is to compare the performance of human listeners against the selected machine learning algorithms in the task of the classification of spatial audio scenes in binaural recordings of music under practical conditions. The three scenes were subject to classification: (1) music ensemble (a group of musical sources) located in the front, (2) music ensemble located at the back, and (3) music ensemble distributed around a listener. In the listening test, undertaken remotely over the Internet, human listeners reached the classification accuracy of 42.5%. For the listeners who passed the post-screening test, the accuracy was greater, approaching 60%. The above classification task was also undertaken automatically using four machine learning algorithms: convolutional neural network, support vector machines, extreme gradient boosting framework, and logistic regression. The machine learning algorithms substantially outperformed human listeners, with the classification accuracy reaching 84%, when tested under the binaural-room-impulse-response (BRIR) matched conditions. However, when the algorithms were tested under the BRIR mismatched scenario, the accuracy obtained by the algorithms was comparable to that exhibited by the listeners who passed the post-screening test, implying that the machine learning algorithms capability to perform in unknown electro-acoustic conditions needs to be further improved.
AB - The purpose of this paper is to compare the performance of human listeners against the selected machine learning algorithms in the task of the classification of spatial audio scenes in binaural recordings of music under practical conditions. The three scenes were subject to classification: (1) music ensemble (a group of musical sources) located in the front, (2) music ensemble located at the back, and (3) music ensemble distributed around a listener. In the listening test, undertaken remotely over the Internet, human listeners reached the classification accuracy of 42.5%. For the listeners who passed the post-screening test, the accuracy was greater, approaching 60%. The above classification task was also undertaken automatically using four machine learning algorithms: convolutional neural network, support vector machines, extreme gradient boosting framework, and logistic regression. The machine learning algorithms substantially outperformed human listeners, with the classification accuracy reaching 84%, when tested under the binaural-room-impulse-response (BRIR) matched conditions. However, when the algorithms were tested under the BRIR mismatched scenario, the accuracy obtained by the algorithms was comparable to that exhibited by the listeners who passed the post-screening test, implying that the machine learning algorithms capability to perform in unknown electro-acoustic conditions needs to be further improved.
KW - Spatial audio scene classification
KW - Spatial audio information retrieval
KW - Convolutional neural networks
KW - Deep learning
UR - https://www.scopus.com/inward/record.uri?eid=2-s2.0-85090229147&doi=10.3390%2fapp10175956&partnerID=40&md5=516a3baedf0a4c945ed9a3d58
U2 - 10.3390/app10175956
DO - 10.3390/app10175956
M3 - Article
VL - 10
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
SN - 2076-3417
IS - 17
M1 - 5956
ER -