Multichannel Audio Source Separation with Probabilistic Reverberation Priors


We present source separation results obtained with the proposed method [1] and compare them with those of a baseline approach [2, 3] that places no priors on the mixing filters. The baseline corresponds to [2], except that the NMF source parameters are estimated as in [3]. Both algorithms are run from the same blind initialization.

The stereo source images have been created using room impulse responses simulated with the Roomsimove toolbox. The reverberation time is 128 ms.
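As a rough illustration of how such stereo source images are formed, the sketch below convolves each mono source with one impulse response per channel and sums the resulting images to obtain the mixture. It is a minimal sketch in plain Python, not the Roomsimove toolbox or the code of [1]: the function names, the toy signals, and the 3-tap "impulse responses" are all hypothetical and chosen only to keep the example short.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two 1-D sequences (plain Python, O(N*M))."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def source_image(source, rirs):
    """Multichannel image of a mono source: one convolution per channel."""
    return [convolve(source, h) for h in rirs]


def mix(images):
    """Mixture = sample-wise sum of the source images, channel by channel."""
    n_channels = len(images[0])
    length = max(len(img[c]) for img in images for c in range(n_channels))
    mixture = [[0.0] * length for _ in range(n_channels)]
    for img in images:
        for c in range(n_channels):
            for t, x in enumerate(img[c]):
                mixture[c][t] += x
    return mixture


# Toy data (hypothetical): two short mono sources and made-up 3-tap filters.
drums = [1.0, 0.0, 0.0, 0.0]
voice = [0.0, 1.0, 0.0, 0.0]
rirs_drums = [[1.0, 0.5, 0.25], [0.8, 0.4, 0.2]]  # left, right
rirs_voice = [[0.6, 0.3, 0.1], [1.0, 0.2, 0.1]]   # left, right

images = [source_image(drums, rirs_drums), source_image(voice, rirs_voice)]
mixture = mix(images)  # mixture[0] is the left channel, mixture[1] the right
```

With real impulse responses, which are thousands of samples long, an FFT-based convolution would be the practical choice; the direct double loop is used here only to avoid external dependencies.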

Matlab code for [1] is available here.

[1] S. Leglaive, R. Badeau, G. Richard. "Multichannel audio source separation with probabilistic reverberation priors", in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 12, pp. 2453-2465, 2016.
[2] A. Ozerov, C. Févotte. "Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation", in IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 550-563, 2010.
[3] A. Ozerov, C. Févotte, R. Blouet, J.-L. Durrieu. "Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011, pp. 257-260.


1. Excerpt from "TV on" by Kismet. It corresponds to mixture 4 in [1]. Source signals are available in the MTG MASS database.

Stereo mixture:

Original source images (stereo): Drums, Voice, Guitar 1, Guitar 2

Method                        Metric      Drums   Voice   Guitar 1   Guitar 2
Baseline (w/o priors) [2, 3]  SDR (dB)     -1.4     4.3        0.1       -0.5
                              SIR (dB)     -2.1    11.7       -0.2       -0.3
                              SAR (dB)     13.0     7.9        3.9        6.8
                              ISR (dB)      9.4     6.0        0.6        7.4
Proposed (w/ priors) [1]      SDR (dB)      6.4     3.7        7.1       -0.5
                              SIR (dB)      9.1     7.7       11.3       -0.9
                              SAR (dB)     11.5     8.6       11.8        6.6
                              ISR (dB)     11.5     6.0       11.2        4.7
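
To read the comparison for this mixture at a glance, the per-source SDR difference between the two methods can be computed directly from the values listed above. The snippet below is only a convenience sketch: the variable and function names are hypothetical, and the differences are simple arithmetic on the reported figures, not an additional result from [1].

```python
# SDR values (dB) copied from the "TV on" results above.
sources = ["Drums", "Voice", "Guitar 1", "Guitar 2"]
sdr_baseline = [-1.4, 4.3, 0.1, -0.5]
sdr_proposed = [6.4, 3.7, 7.1, -0.5]


def sdr_gains(baseline, proposed):
    """Per-source SDR difference (proposed minus baseline), rounded to 0.1 dB."""
    return [round(p - b, 1) for b, p in zip(baseline, proposed)]


gains = sdr_gains(sdr_baseline, sdr_proposed)
for name, gain in zip(sources, gains):
    print(f"{name}: {gain:+.1f} dB")
```

A positive value means the proposed method reaches a higher SDR than the baseline for that source.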

2. Excerpt from "Borrowed heart" by Hezekiah Jones. Source signals are available in the MedleyDB database.

Stereo mixture:

Original source images (stereo): Drums, Banjo, Guitar

Method                        Metric      Drums   Banjo   Guitar
Baseline (w/o priors) [2, 3]  SDR (dB)     -0.5     2.3      4.5
                              SIR (dB)      0.0    11.2      6.4
                              SAR (dB)     13.4     8.9      9.0
                              ISR (dB)     12.6     2.8      8.7
Proposed (w/ priors) [1]      SDR (dB)      1.9     3.9      4.4
                              SIR (dB)      2.9    12.3      6.6
                              SAR (dB)      9.4     7.9      9.3
                              ISR (dB)     16.8     5.3      9.4

3. Excerpt from "Sunrise" by Shannon Hurley. Source signals are available here.

Stereo mixture:

Original source images (stereo): Voice, Drums, Piano

Method                        Metric      Voice   Drums   Piano
Baseline (w/o priors) [2, 3]  SDR (dB)      4.5    -0.9     8.6
                              SIR (dB)     15.8    -0.6    11.5
                              SAR (dB)      9.3     8.0    12.7
                              ISR (dB)      5.6    11.3    16.0
Proposed (w/ priors) [1]      SDR (dB)     11.0     5.1    10.7
                              SIR (dB)     17.7     8.4    14.5
                              SAR (dB)     14.8     7.9    14.0
                              ISR (dB)     14.6    12.2    16.3