Multichannel Audio Source Separation:
Variational Inference of Time-Frequency Sources from Time-Domain Observations


We present source separation results obtained with the proposed method [1] and compare them with those of a baseline approach [2, 3]. The baseline corresponds to [2], except for the estimation of the NMF source parameters, which is done as in [3].

Both algorithms are run from oracle initializations, i.e. they are initialized using knowledge of the true source and mixing parameters.

The two methods are compared on a single stereo mixture under five different reverberation times. The musical excerpt is from "Ana" by Vieux Farka Toure. The source signals are available in the MTG MASS database.

The stereo source images have been created using room impulse responses simulated with the Roomsimove toolbox.
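As a rough illustration of what creating source images means here, the sketch below convolves each mono source with one impulse response per channel and sums the resulting stereo images to obtain the mixture. It is only a toy example under stated assumptions: the function names (convolve, source_image, mix) and all signal and impulse response values are hypothetical, the impulse responses are not simulated by Roomsimove, and plain Python lists are used to keep the example self-contained (in practice an FFT-based convolution from a numerical library would be preferred).

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two 1-D sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def source_image(source, rirs):
    """Stereo image of a mono source: one convolution per channel."""
    return [convolve(source, rir) for rir in rirs]


def mix(images):
    """Sum the source images channel by channel to obtain the mixture."""
    n_channels = len(images[0])
    length = max(len(channel) for image in images for channel in image)
    mixture = [[0.0] * length for _ in range(n_channels)]
    for image in images:
        for c, channel in enumerate(image):
            for n, value in enumerate(channel):
                mixture[c][n] += value
    return mixture


# Toy data (hypothetical values, not taken from the actual experiments).
drums = [1.0, 0.0, 0.0]
voice = [0.0, 2.0, 0.0]
rirs_drums = [[1.0, 0.5], [0.5, 0.25]]  # left and right channel responses
rirs_voice = [[0.5, 0.0], [1.0, 0.5]]

images = [source_image(drums, rirs_drums), source_image(voice, rirs_voice)]
mixture = mix(images)
print(mixture)  # [[1.0, 1.5, 0.0, 0.0], [0.5, 2.25, 1.0, 0.0]]
```

Longer impulse responses, as produced by longer reverberation times, spread each source over more samples of the mixture, which is why the reverberation time is the parameter varied in the examples.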

Matlab code for [1] is available here.

[1] S. Leglaive, R. Badeau, G. Richard. "Multichannel audio source separation: variational inference of time-frequency sources from time-domain observations", in Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 2017.
[2] A. Ozerov, C. Févotte. "Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation", IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 3, pp. 550-563, 2010.
[3] A. Ozerov, C. Févotte, R. Blouet, J.-L. Durrieu. "Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation", in Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011, pp. 257-260.


1. Reverberation time: 512 ms

Stereo mixture:

Original source images (stereo) Baseline [2, 3] Proposed [1]
Drums
Voice
Guitar 1
Guitar 2
Bass

2. Reverberation time: 256 ms

Stereo mixture:

Original source images (stereo) Baseline [2, 3] Proposed [1]
Drums
Voice
Guitar 1
Guitar 2
Bass

3. Reverberation time: 128 ms

Stereo mixture:

Original source images (stereo) Baseline [2, 3] Proposed [1]
Drums
Voice
Guitar 1
Guitar 2
Bass

4. Reverberation time: 64 ms

Stereo mixture:

Original source images (stereo) Baseline [2, 3] Proposed [1]
Drums
Voice
Guitar 1
Guitar 2
Bass

5. Reverberation time: 32 ms

Stereo mixture:

Original source images (stereo) Baseline [2, 3] Proposed [1]
Drums
Voice
Guitar 1
Guitar 2
Bass