Improving single-network single-channel separation of musical audio with convolutional layers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Most convolutional neural network architectures explored so far for musical audio separation follow an autoencoder structure, where the mixture is assumed to be a corrupted version of the original source. On the other hand, many approaches based on deep neural networks make use of several networks with different objectives for estimating the sources. In this paper we propose a discriminative approach based on traditional convolutional neural network architectures for image classification and speech recognition. Our results show that this architecture performs similarly to current state-of-the-art approaches for separating singing voice, and that the addition of convolutional layers allows improving separation results with respect to using only fully-connected layers.
Language: English
Title of host publication: Latent Variable Analysis and Signal Separation
Subtitle of host publication: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings
Editors: Yannick Deville, Sharon Gannot, Russell Mason, Mark D. Plumbley, Dominic Ward
Publisher: Springer Verlag
Pages: 306-315
ISBN (Electronic): 9783319937649
ISBN (Print): 9783319937632
DOIs: https://doi.org/10.1007/978-3-319-93764-9_29
Publication status: Published - 6 Jun 2018
Event: 14th International Conference on Latent Variable Analysis and Signal Separation - University of Surrey, Guildford, United Kingdom
Duration: 2 Jul 2018 to 6 Jul 2018
http://cvssp.org/events/lva-ica-2018/ (Link to Conference Website)

Publication series

Name: Lecture Notes in Computer Science (LNCS)
Publisher: Springer
Volume: 10891
ISSN (Print): 0302-9743

Conference

Conference: 14th International Conference on Latent Variable Analysis and Signal Separation
Abbreviated title: LVA / ICA 2018
Country: United Kingdom
City: Guildford
Period: 2/07/18 to 6/07/18
Internet address: http://cvssp.org/events/lva-ica-2018/

Fingerprint

Network architecture
Neural networks
Image classification
Speech recognition
Deep neural networks

Cite this

Roma, G., Green, O., & Tremblay, P. A. (2018). Improving single-network single-channel separation of musical audio with convolutional layers. In Y. Deville, S. Gannot, R. Mason, M. D. Plumbley, & D. Ward (Eds.), Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings (pp. 306-315). (Lecture Notes in Computer Science (LNCS); Vol. 10891). Springer Verlag. https://doi.org/10.1007/978-3-319-93764-9_29
Roma, Gerard ; Green, Owen ; Tremblay, Pierre Alexandre. / Improving single-network single-channel separation of musical audio with convolutional layers. Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings. editor / Yannick Deville ; Sharon Gannot ; Russell Mason ; Mark D. Plumbley ; Dominic Ward. Springer Verlag, 2018. pp. 306-315 (Lecture Notes in Computer Science (LNCS)).
@inproceedings{e6448cb4ea924a7496cbd1ab92fd4b4a,
title = "Improving single-network single-channel separation of musical audio with convolutional layers",
abstract = "Most convolutional neural network architectures explored so far for musical audio separation follow an autoencoder structure, where the mixture is assumed to be a corrupted version of the original source. On the other hand, many approaches based on deep neural networks make use of several networks with different objectives for estimating the sources. In this paper we propose a discriminative approach based on traditional convolutional neural network architectures for image classification and speech recognition. Our results show that this architecture performs similarly to current state-of-the-art approaches for separating singing voice, and that the addition of convolutional layers allows improving separation results with respect to using only fully-connected layers.",
keywords = "audio source separation, convolutional neural networks",
author = "Gerard Roma and Owen Green and Tremblay, {Pierre Alexandre}",
note = "The final authenticated publication is available online at https://doi.org/10.1007/978-3-319-93764-9_29",
year = "2018",
month = "6",
day = "6",
doi = "10.1007/978-3-319-93764-9_29",
language = "English",
isbn = "9783319937632",
series = "Lecture Notes in Computer Science (LNCS)",
publisher = "Springer Verlag",
pages = "306--315",
editor = "Yannick Deville and Sharon Gannot and Russell Mason and Plumbley, {Mark D.} and Dominic Ward",
booktitle = "Latent Variable Analysis and Signal Separation",

}

Roma, G, Green, O & Tremblay, PA 2018, Improving single-network single-channel separation of musical audio with convolutional layers. in Y Deville, S Gannot, R Mason, MD Plumbley & D Ward (eds), Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings. Lecture Notes in Computer Science (LNCS), vol. 10891, Springer Verlag, pp. 306-315, 14th International Conference on Latent Variable Analysis and Signal Separation, Guildford, United Kingdom, 2/07/18. https://doi.org/10.1007/978-3-319-93764-9_29

Improving single-network single-channel separation of musical audio with convolutional layers. / Roma, Gerard; Green, Owen; Tremblay, Pierre Alexandre.

Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings. ed. / Yannick Deville; Sharon Gannot; Russell Mason; Mark D. Plumbley; Dominic Ward. Springer Verlag, 2018. p. 306-315 (Lecture Notes in Computer Science (LNCS); Vol. 10891).

TY - GEN

T1 - Improving single-network single-channel separation of musical audio with convolutional layers

AU - Roma, Gerard

AU - Green, Owen

AU - Tremblay, Pierre Alexandre

N1 - The final authenticated publication is available online at https://doi.org/10.1007/978-3-319-93764-9_29

PY - 2018/6/6

Y1 - 2018/6/6

N2 - Most convolutional neural network architectures explored so far for musical audio separation follow an autoencoder structure, where the mixture is assumed to be a corrupted version of the original source. On the other hand, many approaches based on deep neural networks make use of several networks with different objectives for estimating the sources. In this paper we propose a discriminative approach based on traditional convolutional neural network architectures for image classification and speech recognition. Our results show that this architecture performs similarly to current state-of-the-art approaches for separating singing voice, and that the addition of convolutional layers allows improving separation results with respect to using only fully-connected layers.

AB - Most convolutional neural network architectures explored so far for musical audio separation follow an autoencoder structure, where the mixture is assumed to be a corrupted version of the original source. On the other hand, many approaches based on deep neural networks make use of several networks with different objectives for estimating the sources. In this paper we propose a discriminative approach based on traditional convolutional neural network architectures for image classification and speech recognition. Our results show that this architecture performs similarly to current state-of-the-art approaches for separating singing voice, and that the addition of convolutional layers allows improving separation results with respect to using only fully-connected layers.

KW - audio source separation

KW - convolutional neural networks

UR - http://cvssp.org/events/lva-ica-2018/

U2 - 10.1007/978-3-319-93764-9_29

DO - 10.1007/978-3-319-93764-9_29

M3 - Conference contribution

SN - 9783319937632

T3 - Lecture Notes in Computer Science (LNCS)

SP - 306

EP - 315

BT - Latent Variable Analysis and Signal Separation

A2 - Deville, Yannick

A2 - Gannot, Sharon

A2 - Mason, Russell

A2 - Plumbley, Mark D.

A2 - Ward, Dominic

PB - Springer Verlag

ER -

Roma G, Green O, Tremblay PA. Improving single-network single-channel separation of musical audio with convolutional layers. In Deville Y, Gannot S, Mason R, Plumbley MD, Ward D, editors, Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings. Springer Verlag. 2018. p. 306-315. (Lecture Notes in Computer Science (LNCS)). https://doi.org/10.1007/978-3-319-93764-9_29