Environmental sound recognition using short-time feature aggregation

Gerard Roma, Perfecto Herrera, Waldo Nogueira

Research output: Contribution to journal › Article

Abstract

Recognition of environmental sound is usually based on two main architectures, depending on whether the model is trained with frame-level features or with aggregated descriptions of acoustic scenes or events. The former architecture is appropriate for applications where target categories are known in advance, while the latter affords a less supervised approach. In this paper, we propose a framework for environmental sound recognition based on blind segmentation and feature aggregation. We describe a new set of descriptors, based on Recurrence Quantification Analysis (RQA), which can be extracted from the similarity matrix of a time series of audio descriptors. We analyze their usefulness for recognition of acoustic scenes and events in addition to standard feature aggregation. Our results show the potential of non-linear time series analysis techniques for dealing with environmental sounds.
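
The abstract relies on two ideas that are easier to follow with a concrete picture: aggregating frame-level descriptors over a segment, and RQA descriptors read off a similarity (recurrence) matrix of those descriptors. The Python sketch below illustrates both under assumptions that do not come from the paper. Frames are plain lists of floats. Similarity is thresholded Euclidean distance with a hypothetical radius `eps`. Fixed-length windows stand in for the blind segmentation step. Only three textbook RQA measures are computed: recurrence rate (RR), determinism (DET) and longest diagonal line (Lmax). All function names are hypothetical, and the authors' descriptor set, distance and segmentation method may differ.

```python
import math
from statistics import mean, pstdev


def aggregate(frames):
    """Per-dimension mean and population standard deviation of a segment's frames."""
    dims = len(frames[0])
    cols = [[f[d] for f in frames] for d in range(dims)]
    return [mean(c) for c in cols] + [pstdev(c) for c in cols]


def recurrence_matrix(frames, eps):
    """Binary recurrence matrix: R[i][j] = 1 if frames i and j lie within Euclidean distance eps."""
    n = len(frames)
    return [[1 if math.dist(frames[i], frames[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def diagonal_lines(R, min_len):
    """Lengths of diagonal runs of recurrent points (at least min_len long).

    Only the upper triangle is scanned (R is symmetric) and the main diagonal,
    which is always recurrent, is excluded.
    """
    n = len(R)
    lengths = []
    for k in range(1, n):            # k-th diagonal above the main one
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                run += 1
            else:
                if run >= min_len:
                    lengths.append(run)
                run = 0
        if run >= min_len:           # close a run that reaches the matrix edge
            lengths.append(run)
    return lengths


def rqa(frames, eps=0.5, min_len=2):
    """Recurrence rate (RR), determinism (DET) and longest diagonal line (Lmax)."""
    R = recurrence_matrix(frames, eps)
    n = len(R)
    points = sum(R[i][j] for i in range(n) for j in range(i + 1, n))
    pairs = n * (n - 1) // 2         # off-diagonal cells in the upper triangle
    lines = diagonal_lines(R, min_len)
    return {
        "RR": points / pairs if pairs else 0.0,
        "DET": sum(lines) / points if points else 0.0,
        "Lmax": max(lines, default=0),
    }


def describe_segments(frames, win, eps=0.5, min_len=2):
    """Hypothetical pipeline: fixed windows of `win` frames (a stand-in for blind
    segmentation), each described by aggregated statistics plus RQA measures."""
    out = []
    for start in range(0, len(frames) - win + 1, win):
        seg = frames[start:start + win]
        q = rqa(seg, eps, min_len)
        out.append(aggregate(seg) + [q["RR"], q["DET"], q["Lmax"]])
    return out
```

For the alternating sequence `[[0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]` with `eps=0.5`, `rqa` gives RR = 0.4, DET = 1.0 and Lmax = 4. Every recurrence falls on a diagonal line, the signature of a strictly periodic pattern. The mean and standard deviation from `aggregate` (0.5 and 0.5) say nothing about that temporal order, which is one way to see why RQA measures might add information to standard feature aggregation.
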
Language: English
Pages: 457-475
Number of pages: 19
Journal: Journal of Intelligent Information Systems
Volume: 51
Issue number: 3
Early online date: 19 Aug 2017
DOI: 10.1007/s10844-017-0481-4
Publication status: Published - Dec 2018
Externally published: Yes

Cite this

Roma, Gerard; Herrera, Perfecto; Nogueira, Waldo. / Environmental sound recognition using short-time feature aggregation. In: Journal of Intelligent Information Systems. 2018; Vol. 51, No. 3. pp. 457-475.
@article{688566bdb2c24a1c9adf0640583bcc57,
title = "Environmental sound recognition using short-time feature aggregation",
abstract = "Recognition of environmental sound is usually based on two main architectures, depending on whether the model is trained with frame-level features or with aggregated descriptions of acoustic scenes or events. The former architecture is appropriate for applications where target categories are known in advance, while the latter affords a less supervised approach. In this paper, we propose a framework for environmental sound recognition based on blind segmentation and feature aggregation. We describe a new set of descriptors, based on Recurrence Quantification Analysis (RQA), which can be extracted from the similarity matrix of a time series of audio descriptors. We analyze their usefulness for recognition of acoustic scenes and events in addition to standard feature aggregation. Our results show the potential of non-linear time series analysis techniques for dealing with environmental sounds.",
keywords = "Audio databases, Audio features, Environmental sound recognition, Event detection, Pattern recognition, Recurrence quantification analysis",
author = "Gerard Roma and Perfecto Herrera and Waldo Nogueira",
year = "2018",
month = "12",
doi = "10.1007/s10844-017-0481-4",
language = "English",
volume = "51",
pages = "457--475",
journal = "Journal of Intelligent Information Systems",
issn = "0925-9902",
publisher = "Springer US",
number = "3",

}

Environmental sound recognition using short-time feature aggregation. / Roma, Gerard; Herrera, Perfecto; Nogueira, Waldo.

In: Journal of Intelligent Information Systems, Vol. 51, No. 3, 12.2018, p. 457-475.

TY  - JOUR
T1  - Environmental sound recognition using short-time feature aggregation
AU  - Roma, Gerard
AU  - Herrera, Perfecto
AU  - Nogueira, Waldo
PY  - 2018/12
Y1  - 2018/12
N2  - Recognition of environmental sound is usually based on two main architectures, depending on whether the model is trained with frame-level features or with aggregated descriptions of acoustic scenes or events. The former architecture is appropriate for applications where target categories are known in advance, while the latter affords a less supervised approach. In this paper, we propose a framework for environmental sound recognition based on blind segmentation and feature aggregation. We describe a new set of descriptors, based on Recurrence Quantification Analysis (RQA), which can be extracted from the similarity matrix of a time series of audio descriptors. We analyze their usefulness for recognition of acoustic scenes and events in addition to standard feature aggregation. Our results show the potential of non-linear time series analysis techniques for dealing with environmental sounds.
AB  - Recognition of environmental sound is usually based on two main architectures, depending on whether the model is trained with frame-level features or with aggregated descriptions of acoustic scenes or events. The former architecture is appropriate for applications where target categories are known in advance, while the latter affords a less supervised approach. In this paper, we propose a framework for environmental sound recognition based on blind segmentation and feature aggregation. We describe a new set of descriptors, based on Recurrence Quantification Analysis (RQA), which can be extracted from the similarity matrix of a time series of audio descriptors. We analyze their usefulness for recognition of acoustic scenes and events in addition to standard feature aggregation. Our results show the potential of non-linear time series analysis techniques for dealing with environmental sounds.
KW  - Audio databases
KW  - Audio features
KW  - Environmental sound recognition
KW  - Event detection
KW  - Pattern recognition
KW  - Recurrence quantification analysis
U2  - 10.1007/s10844-017-0481-4
DO  - 10.1007/s10844-017-0481-4
M3  - Article
VL  - 51
SP  - 457
EP  - 475
JO  - Journal of Intelligent Information Systems
T2  - Journal of Intelligent Information Systems
JF  - Journal of Intelligent Information Systems
SN  - 0925-9902
IS  - 3
ER  - 