Improving single-network single-channel separation of musical audio with convolutional layers

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Most convolutional neural network architectures explored so far for musical audio separation follow an autoencoder structure, where the mixture is considered to be a corrupted version of the original source. On the other hand, many approaches based on deep neural networks make use of several networks with different objectives for estimating the sources. In this paper we propose a discriminative approach based on traditional convolutional neural network architectures for image classification and speech recognition. Our results show that this architecture performs similarly to current state of the art approaches for separating singing voice, and that the addition of convolutional layers allows improving separation results with respect to using only fully-connected layers.

Original language: English
Title of host publication: Latent Variable Analysis and Signal Separation
Subtitle of host publication: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings
Editors: Sharon Gannot, Yannick Deville, Russell Mason, Mark D. Plumbley, Dominic Ward
Publisher: Springer Verlag
Pages: 306-315
ISBN (Electronic): 9783319937649
ISBN (Print): 9783319937632
DOI: https://doi.org/10.1007/978-3-319-93764-9_29
Publication status: Published - 6 Jun 2018
Event: 14th International Conference on Latent Variable Analysis and Signal Separation - University of Surrey, Guildford, United Kingdom
Duration: 2 Jul 2018 to 6 Jul 2018
http://cvssp.org/events/lva-ica-2018/ (Link to Conference Website)

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10891 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Conference on Latent Variable Analysis and Signal Separation
Abbreviated title: LVA / ICA 2018
Country: United Kingdom
City: Guildford
Period: 2/07/18 to 6/07/18
Internet address: http://cvssp.org/events/lva-ica-2018/

Fingerprint

Network architecture
Neural networks
Image classification
Speech recognition
Deep neural networks

Cite this

Roma, G., Green, O., & Tremblay, P. A. (2018). Improving single-network single-channel separation of musical audio with convolutional layers. In S. Gannot, Y. Deville, R. Mason, M. D. Plumbley, & D. Ward (Eds.), Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings (pp. 306-315). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10891 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-93764-9_29
Roma, Gerard ; Green, Owen ; Tremblay, Pierre Alexandre. / Improving single-network single-channel separation of musical audio with convolutional layers. Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings. editor / Sharon Gannot ; Yannick Deville ; Russell Mason ; Mark D. Plumbley ; Dominic Ward. Springer Verlag, 2018. pp. 306-315 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{e6448cb4ea924a7496cbd1ab92fd4b4a,
title = "Improving single-network single-channel separation of musical audio with convolutional layers",
abstract = "Most convolutional neural network architectures explored so far for musical audio separation follow an autoencoder structure, where the mixture is considered to be a corrupted version of the original source. On the other hand, many approaches based on deep neural networks make use of several networks with different objectives for estimating the sources. In this paper we propose a discriminative approach based on traditional convolutional neural network architectures for image classification and speech recognition. Our results show that this architecture performs similarly to current state of the art approaches for separating singing voice, and that the addition of convolutional layers allows improving separation results with respect to using only fully-connected layers.",
keywords = "audio source separation, convolutional neural networks",
author = "Gerard Roma and Owen Green and Tremblay, {Pierre Alexandre}",
note = "The final authenticated publication is available online at https://doi.org/10.1007/978-3-319-93764-9_29",
year = "2018",
month = "6",
day = "6",
doi = "10.1007/978-3-319-93764-9_29",
language = "English",
isbn = "9783319937632",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "306--315",
editor = "Sharon Gannot and Yannick Deville and Russell Mason and Plumbley, {Mark D.} and Dominic Ward",
booktitle = "Latent Variable Analysis and Signal Separation",
volume = "10891",
}

Roma, G, Green, O & Tremblay, PA 2018, Improving single-network single-channel separation of musical audio with convolutional layers. in S Gannot, Y Deville, R Mason, MD Plumbley & D Ward (eds), Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10891 LNCS, Springer Verlag, pp. 306-315, 14th International Conference on Latent Variable Analysis and Signal Separation, Guildford, United Kingdom, 2/07/18. https://doi.org/10.1007/978-3-319-93764-9_29

Improving single-network single-channel separation of musical audio with convolutional layers. / Roma, Gerard; Green, Owen; Tremblay, Pierre Alexandre.

Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings. ed. / Sharon Gannot; Yannick Deville; Russell Mason; Mark D. Plumbley; Dominic Ward. Springer Verlag, 2018. p. 306-315 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10891 LNCS).

TY - GEN

T1 - Improving single-network single-channel separation of musical audio with convolutional layers

AU - Roma, Gerard

AU - Green, Owen

AU - Tremblay, Pierre Alexandre

N1 - The final authenticated publication is available online at https://doi.org/10.1007/978-3-319-93764-9_29

PY - 2018/6/6

Y1 - 2018/6/6

AB - Most convolutional neural network architectures explored so far for musical audio separation follow an autoencoder structure, where the mixture is considered to be a corrupted version of the original source. On the other hand, many approaches based on deep neural networks make use of several networks with different objectives for estimating the sources. In this paper we propose a discriminative approach based on traditional convolutional neural network architectures for image classification and speech recognition. Our results show that this architecture performs similarly to current state of the art approaches for separating singing voice, and that the addition of convolutional layers allows improving separation results with respect to using only fully-connected layers.

KW - audio source separation

KW - convolutional neural networks

UR - http://cvssp.org/events/lva-ica-2018/

UR - http://www.scopus.com/inward/record.url?scp=85048585573&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-93764-9_29

DO - 10.1007/978-3-319-93764-9_29

M3 - Conference contribution

SN - 9783319937632

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 306

EP - 315

BT - Latent Variable Analysis and Signal Separation

A2 - Gannot, Sharon

A2 - Deville, Yannick

A2 - Mason, Russell

A2 - Plumbley, Mark D.

A2 - Ward, Dominic

PB - Springer Verlag

ER -

Roma G, Green O, Tremblay PA. Improving single-network single-channel separation of musical audio with convolutional layers. In Gannot S, Deville Y, Mason R, Plumbley MD, Ward D, editors, Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings. Springer Verlag. 2018. p. 306-315. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-93764-9_29