Semi-blind Student’s t Source Separation for Multichannel Audio Convolutive Mixtures


We present source separation results obtained with the method proposed in [1], which combines Student's t source modeling in the MDCT domain with time-domain convolutive mixture modeling. We compare it with three methods from the literature:

  • [2]: Lasso method with ℓ1 regularization on the source time-frequency coefficients and exact time-domain convolutive mixture modeling;
  • [3, 4]: Gaussian NMF source model with an approximate (narrowband) convolutive mixture model in the STFT domain;
  • [5]: Gaussian NMF source model in the MDCT domain with exact time-domain convolutive mixture modeling.
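
As a reading aid, the generative model behind [1] can be summarized as below. This is a sketch in our own notation, not the paper's exact formulation: I = 2 channels, J sources, mixing filters a_{ij} of length L from source j to channel i, an additive noise term b_i (its presence here is an assumption), MDCT atoms ψ_{fn} with F frequency bins and N frames, and Student's t coefficients with α degrees of freedom and squared scale λ_{j,fn}. The Student's t prior is written through its standard Gaussian scale-mixture representation with an inverse-gamma mixing density; how λ_{j,fn} is parameterized is specified in [1].

```latex
x_i(t) = \sum_{j=1}^{J} \sum_{\tau=0}^{L-1} a_{ij}(\tau)\, s_j(t-\tau) + b_i(t), \qquad i = 1, \dots, I,

s_j(t) = \sum_{f=0}^{F-1} \sum_{n=0}^{N-1} s_{j,fn}\, \psi_{fn}(t),

s_{j,fn} \mid v_{j,fn} \sim \mathcal{N}(0, v_{j,fn}), \quad
v_{j,fn} \sim \mathcal{IG}\!\left(\tfrac{\alpha}{2}, \tfrac{\alpha}{2}\lambda_{j,fn}\right)
\;\Longrightarrow\; s_{j,fn} \sim \mathcal{T}_{\alpha}(0, \lambda_{j,fn}).
```

For small α the Student's t distribution is heavy-tailed, which favors sparse MDCT coefficients; this is the usual motivation for preferring it to a Gaussian prior.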

All algorithms are run from blindly initialized source parameters, while the mixing filters are known and kept fixed (hence "semi-blind").

The methods are compared on a single stereo mixture with a reverberation time of 256 ms. The musical excerpt is taken from "Ana" by Vieux Farka Toure, and the source signals are available in the MTG MASS database.

The stereo source images were created by convolving the source signals with room impulse responses simulated with the Roomsimove toolbox.
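
To make this spatialization step concrete, the following Python/NumPy sketch builds stereo source images by convolving each mono source with one impulse response per channel, then sums the images into the mixture, i.e. the time-domain convolutive mixture model used by [1], [2] and [5]. It is not Roomsimove: the impulse responses come from a hypothetical `synthetic_rir` helper (exponentially decaying white noise reaching -60 dB after 256 ms), and random noise stands in for the MASS source signals. Both are placeholders for illustration only.

```python
import numpy as np


def synthetic_rir(rt60, fs, length, rng):
    """Hypothetical placeholder RIR (NOT a Roomsimove simulation): white noise
    shaped by an exponential envelope whose amplitude is 60 dB down at t = rt60."""
    t = np.arange(length) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)  # 20*log10(envelope) = -60 dB at t = rt60
    return rng.standard_normal(length) * envelope


def source_images(sources, rirs):
    """Spatialize J mono sources (all of the same length T) with an (I, J, L)
    array of impulse responses.

    Returns an array of shape (J, I, T + L - 1): image j on channel i is the
    full linear convolution of source j with the filter from source j to mic i.
    """
    n_chan, n_src, rir_len = rirs.shape
    if len(sources) != n_src:
        raise ValueError("need one set of impulse responses per source")
    n_samples = len(sources[0])
    images = np.zeros((n_src, n_chan, n_samples + rir_len - 1))
    for j, s in enumerate(sources):
        for i in range(n_chan):
            images[j, i] = np.convolve(s, rirs[i, j])  # mode="full" by default
    return images


# Usage with placeholder data: 5 sources, stereo, reverberation time of 256 ms.
rng = np.random.default_rng(0)
fs = 8000                                   # hypothetical sampling rate
rir_len = int(0.256 * fs)                   # 2048 taps, covering the 60 dB decay
rirs = np.stack([
    np.stack([synthetic_rir(0.256, fs, rir_len, rng) for _ in range(5)])
    for _ in range(2)
])                                          # shape (I=2, J=5, L=2048)
sources = [rng.standard_normal(fs // 2) for _ in range(5)]  # stand-ins for MASS stems
images = source_images(sources, rirs)       # shape (5, 2, 4000 + 2048 - 1)
mixture = images.sum(axis=0)                # stereo mixture: sum of source images
```

Keeping the images separated by source, rather than returning only the mixture, leaves a per-source reference available for evaluating each separated signal; the mixture is then just their sum over the source axis.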

Matlab code for [1] is available.

[1] S. Leglaive, R. Badeau, G. Richard. "Semi-blind Student’s t source separation for multichannel audio convolutive mixtures", submitted to the European Signal Processing Conference (EUSIPCO), Kos Island, Greece, 2017.
[2] M. Kowalski, E. Vincent, R. Gribonval. "Beyond the narrowband approximation: Wideband convex methods for under-determined reverberant audio source separation", in IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 7, pp. 1818-1829, 2010.
[3] A. Ozerov, C. Févotte. "Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation", in IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 3, pp. 550-563, 2010.
[4] A. Ozerov, C. Févotte, R. Blouet, J.-L. Durrieu. "Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation", in Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011, pp. 257-260.
[5] S. Leglaive, R. Badeau, G. Richard. "Multichannel audio source separation: variational inference of time-frequency sources from time-domain observations", in Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 2017.


Stereo mixture:

Separated sources (audio examples; rows: Drums, Voice, Guitar 1, Guitar 2, Bass; columns: Original sources (mono), Student's t NMF model [1], Student's t sparse model [1], Kowalski et al. [2], Ozerov et al. [3, 4], Leglaive et al. [5])