Combining mask estimates for single channel audio source separation using deep neural networks

Emad M Grais, Gerard Roma, Andrew JR Simpson, Mark D Plumbley

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

23 Citations (Scopus)

Abstract

Deep neural networks (DNNs) are usually used for single channel source separation to predict either soft or binary time frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks to achieve the advantages and avoid the disadvantages of using each mask individually. We aim to achieve separated sources with low distortion and low interference between each other. Our experimental results show that combining the estimates of binary and soft masks using DNN achieves lower distortion than using each estimate individually and achieves as low interference as the binary mask. Copyright © 2016 ISCA.
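To make the distinction between the two mask types concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes oracle masks computed from known source magnitudes, whereas the paper predicts the masks with DNNs and combines them with a further DNN. The function names (`soft_mask`, `binary_mask`, `apply_mask`), the toy spectrograms, and the assumption that source magnitudes add in the mixture are all illustrative.

```python
import numpy as np

def soft_mask(mag_target, mag_other, eps=1e-12):
    """Soft (ratio) mask in [0, 1]: the target's share of the total magnitude."""
    return mag_target / (mag_target + mag_other + eps)

def binary_mask(mag_target, mag_other):
    """Binary mask: 1 where the target strictly dominates a bin, else 0."""
    return (mag_target > mag_other).astype(float)

def apply_mask(mask, mixture_mag):
    """Scale each time-frequency bin of the mixture by the mask value."""
    return mask * mixture_mag

# Toy magnitude spectrograms (2 frequency bins x 3 frames); values are made up.
S1 = np.array([[4.0, 1.0, 0.0],
               [2.0, 2.0, 3.0]])
S2 = np.array([[1.0, 3.0, 2.0],
               [2.0, 6.0, 1.0]])
X = S1 + S2  # illustrative assumption: magnitudes add in the mixture

soft_est = apply_mask(soft_mask(S1, S2), X)     # shares each bin between sources
bin_est = apply_mask(binary_mask(S1, S2), X)    # assigns each bin to one source
```

In this toy setting the soft mask splits every bin in proportion to the two sources, while the binary mask hands each bin entirely to whichever source dominates it. That all-or-nothing assignment is the trade-off the abstract describes: less interference from the other source, at the cost of more distortion.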
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 3339-3343
Number of pages: 5
Publication status: Published - Sept 2016
Externally published: Yes
Event: 17th Annual Conference of the International Speech Communication Association - Hyatt Regency, San Francisco, United States
Duration: 8 Sept 2016 to 12 Sept 2016

Publication series

Name
ISSN (Electronic): 1990-9772

Conference

Conference: 17th Annual Conference of the International Speech Communication Association
Abbreviated title: INTERSPEECH 2016
Country/Territory: United States
City: San Francisco
Period: 8/09/16 to 12/09/16

Fingerprint

  • Improving single-network single-channel separation of musical audio with convolutional layers

    Roma, G., Green, O. & Tremblay, P. A., 6 Jun 2018, Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings. Gannot, S., Deville, Y., Mason, R., Plumbley, M. D. & Ward, D. (eds.). Springer Verlag, p. 306-315 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 10891 LNCS).

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    Open Access
    6 Citations (Scopus)
  • Discriminative Enhancement for Single Channel Audio Source Separation Using Deep Neural Networks

    Grais, E. M., Roma, G., Simpson, A. J. & Plumbley, M. D., 15 Feb 2017, Latent Variable Analysis and Signal Separation. Tichavský, P., Babaie-Zadeh, M., Michel, O. J. J. & Thirion-Moreau, N. (eds.). Springer, Cham, p. 236-246 11 p. (Lecture Notes in Computer Science; vol. 10169).

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    Open Access
    8 Citations (Scopus)
  • Two-stage single-channel audio source separation using deep neural networks

    Grais, E. M., Roma, G., Simpson, A. J. & Plumbley, M. D., 1 Sept 2017, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing. 25, 9, p. 1773-1783 11 p.

    Research output: Contribution to journal > Article > peer-review

    46 Citations (Scopus)
  • Single-Channel Audio Source Separation Using Deep Neural Network Ensembles

    Grais, E. M., Roma, G., Simpson, A. J. & Plumbley, M. D., 26 May 2016, Audio Engineering Society Convention 140. Audio Engineering Society, 9494

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    Open Access
  • Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

    Simpson, A. J., Roma, G. & Plumbley, M. D., 15 Aug 2015, Latent Variable Analysis and Signal Separation. Vincent, E., Yeredor, A., Koldovský, Z. & Tichavský, P. (eds.). Springer, Cham, p. 429-436 8 p. (Lecture Notes in Computer Science; vol. 9237).

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    71 Citations (Scopus)
