Combining mask estimates for single channel audio source separation using deep neural networks

Emad M Grais, Gerard Roma, Andrew JR Simpson, Mark D Plumbley

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Citations (Scopus)

Abstract

Deep neural networks (DNNs) are usually used for single-channel source separation to predict either soft or binary time-frequency masks. The masks are then used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks, to gain the advantages and avoid the disadvantages of using each mask individually. We aim to achieve separated sources with low distortion and low interference between each other. Our experimental results show that combining the estimates of binary and soft masks using a DNN achieves lower distortion than using each estimate individually, and achieves interference as low as that of the binary mask. Copyright © 2016 ISCA.
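The difference between the two mask types mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the common textbook definitions (a ratio-style soft mask and a "dominant source" binary mask) computed from known source magnitudes, and it assumes magnitudes add in the mixture; the paper's exact mask formulas, DNN architectures, and the combining network are not given in this record and are not reproduced here. All function and variable names are hypothetical.

```python
import numpy as np

def soft_mask(s1_mag, s2_mag, eps=1e-8):
    """Ratio-style soft mask for source 1 (assumed definition), values in [0, 1]."""
    return s1_mag / (s1_mag + s2_mag + eps)

def binary_mask(s1_mag, s2_mag):
    """Binary mask for source 1 (assumed definition): 1 where source 1 dominates."""
    return (s1_mag > s2_mag).astype(float)

def apply_mask(mask, mix_mag):
    """Element-wise masking of the mixture magnitude spectrogram."""
    return mask * mix_mag

# Toy magnitude "spectrograms" (frequency bins x time frames), random for illustration.
rng = np.random.default_rng(0)
s1 = rng.random((4, 5))
s2 = rng.random((4, 5))
mix = s1 + s2  # simplifying assumption: magnitudes add in the mixture

soft = soft_mask(s1, s2)
hard = binary_mask(s1, s2)

est_soft = apply_mask(soft, mix)  # keeps a fraction of every bin
est_hard = apply_mask(hard, mix)  # keeps whole bins or zeroes them out
```

In this toy setting the binary mask discards every bin where the other source dominates, which removes interference but also removes part of the target; the soft mask retains a share of each bin. That trade-off is what the proposed combination is meant to balance.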
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 3339-3343
Number of pages: 5
DOIs: https://doi.org/10.21437/Interspeech.2016-216
Publication status: Published - Sep 2016
Externally published: Yes
Event: 17th Annual Conference of the International Speech Communication Association - Hyatt Regency, San Francisco, United States
Duration: 8 Sep 2016 to 12 Sep 2016

Publication series

Name: Interspeech2016 Proceedings
ISSN (Electronic): 1990-9772

Conference

Conference: 17th Annual Conference of the International Speech Communication Association
Abbreviated title: INTERSPEECH 2016
Country: United States
City: San Francisco
Period: 8/09/16 to 12/09/16

Fingerprint

Source separation
Masks
Deep neural networks

Cite this

Grais, E. M., Roma, G., Simpson, A. JR., & Plumbley, M. D. (2016). Combining mask estimates for single channel audio source separation using deep neural networks. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 3339-3343). Interspeech2016 Proceedings https://doi.org/10.21437/Interspeech.2016-216
Grais, Emad M ; Roma, Gerard ; Simpson, Andrew JR ; Plumbley, Mark D. / Combining mask estimates for single channel audio source separation using deep neural networks. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2016. pp. 3339-3343 (Interspeech2016 Proceedings).
@inproceedings{dbb1ab60952f4bc488ceac7a4d19f9fa,
title = "Combining mask estimates for single channel audio source separation using deep neural networks",
abstract = "Deep neural networks (DNNs) are usually used for single channel source separation to predict either soft or binary time frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks to achieve the advantages and avoid the disadvantages of using each mask individually. We aim to achieve separated sources with low distortion and low interference between each other. Our experimental results show that combining the estimates of binary and soft masks using DNN achieves lower distortion than using each estimate individually and achieves as low interference as the binary mask. Copyright {\copyright} 2016 ISCA.",
keywords = "Combining estimates, Deep learning, Deep neural networks, Neural network ensembles, Single channel source separation",
author = "Grais, {Emad M} and Gerard Roma and Simpson, {Andrew JR} and Plumbley, {Mark D}",
year = "2016",
month = "9",
doi = "10.21437/Interspeech.2016-216",
language = "English",
pages = "3339--3343",
booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

}

Grais, EM, Roma, G, Simpson, AJR & Plumbley, MD 2016, Combining mask estimates for single channel audio source separation using deep neural networks. in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Interspeech2016 Proceedings, pp. 3339-3343, 17th Annual Conference of the International Speech Communication Association, San Francisco, United States, 8/09/16. https://doi.org/10.21437/Interspeech.2016-216

Combining mask estimates for single channel audio source separation using deep neural networks. / Grais, Emad M; Roma, Gerard; Simpson, Andrew JR; Plumbley, Mark D.

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2016. p. 3339-3343 (Interspeech2016 Proceedings).

TY - GEN

T1 - Combining mask estimates for single channel audio source separation using deep neural networks

AU - Grais, Emad M

AU - Roma, Gerard

AU - Simpson, Andrew JR

AU - Plumbley, Mark D

PY - 2016/9

Y1 - 2016/9

N2 - Deep neural networks (DNNs) are usually used for single channel source separation to predict either soft or binary time frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks to achieve the advantages and avoid the disadvantages of using each mask individually. We aim to achieve separated sources with low distortion and low interference between each other. Our experimental results show that combining the estimates of binary and soft masks using DNN achieves lower distortion than using each estimate individually and achieves as low interference as the binary mask. Copyright © 2016 ISCA.

KW - Combining estimates

KW - Deep learning

KW - Deep neural networks

KW - Neural network ensembles

KW - Single channel source separation

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-84994242533&origin=resultslist&sort=plf-f&src=s&st1=Roma&st2=Gerard&nlo=1&nlr=20&nls=count-f&sid=74ef08fcc4f7901894ed324151f41ece&sot=anl&sdt=aut&sl=33&s=AU-ID%28%22Roma%2c+Gerard%22+57191952463%29&relpos=8&citeCnt=3&searchTerm=

U2 - 10.21437/Interspeech.2016-216

DO - 10.21437/Interspeech.2016-216

M3 - Conference contribution

SP - 3339

EP - 3343

BT - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

ER -

Grais EM, Roma G, Simpson AJR, Plumbley MD. Combining mask estimates for single channel audio source separation using deep neural networks. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2016. p. 3339-3343. (Interspeech2016 Proceedings). https://doi.org/10.21437/Interspeech.2016-216