Separating time-frequency sources from time-domain convolutive mixtures using non-negative matrix factorization


We present source separation results obtained with the proposed method [1] and compare them with two approaches from the literature: [2] and [3, 4].

All algorithms are run from blindly initialized source parameters while the mixing filters are known and fixed.

The methods are compared on a single stereo mixture with a reverberation time of 470 ms. The musical excerpt is from "Ana" by Vieux Farka Toure. The source signals are available in the MTG MASS database. The stereo source images were created using room impulse responses from the RWCP database.
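
To illustrate how stereo source images of this kind can be built, the sketch below convolves each mono source with one impulse response per channel and sums the resulting images into a stereo mixture. It is a minimal sketch under stated assumptions, not the code used for this demo: the signals and filters are toy values rather than MTG MASS or RWCP data, and the function names (convolve, source_image, mix) are hypothetical.

```python
# Sketch: stereo source images and mixture via time-domain convolution.
# All signals and filters are toy placeholders (assumptions), not demo data.

def convolve(signal, impulse_response):
    """Full linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out

def source_image(mono_source, rirs):
    """Spatial image of one mono source: one convolved signal per channel."""
    return [convolve(mono_source, rir) for rir in rirs]

def mix(images):
    """Sum the source images channel by channel to obtain the mixture."""
    n_channels = len(images[0])
    length = max(len(ch) for img in images for ch in img)
    mixture = [[0.0] * length for _ in range(n_channels)]
    for img in images:
        for c, channel in enumerate(img):
            for n, value in enumerate(channel):
                mixture[c][n] += value
    return mixture

# Two toy mono sources, each with a hypothetical (left, right) filter pair.
sources = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
filters = [
    ([1.0, 0.5], [0.5, 0.25]),  # hypothetical filters for source 1
    ([0.0, 1.0], [1.0, 0.0]),   # hypothetical filters for source 2
]
images = [source_image(s, f) for s, f in zip(sources, filters)]
stereo_mixture = mix(images)
```

For real room impulse responses, which span thousands of samples, an FFT-based convolution would be far faster than this direct double loop; the direct form is used here only to keep the example dependency-free.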

Matlab code for [1] is available here.

[1] S. Leglaive, R. Badeau, G. Richard. "Separating time-frequency sources from time-domain convolutive mixtures using non-negative matrix factorization", submitted for publication in Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2017.
[2] M. Kowalski, E. Vincent, R. Gribonval. "Beyond the narrowband approximation: Wideband convex methods for under-determined reverberant audio source separation", in IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 7, pp. 1818-1829, 2010.
[3] A. Ozerov, C. Févotte. "Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation", in IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 3, pp. 550-563, 2010.
[4] A. Ozerov, C. Févotte, R. Blouet, J.-L. Durrieu. "Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011, pp. 257-260.


Stereo mixture:

Audio examples, one clip per row and source:

Sources (columns): Voice, Guitar 1, Guitar 2, Bass, Drums

Rows:
Original sources (mono)
MDCT
OFSTFT - overlap 25%
OFSTFT - overlap 50%
OFSTFT - overlap 75%
Kowalski et al. [2]
Ozerov et al. [3, 4]