This web page presents source separation examples obtained with the proposed method [1]. It relies on a Student's t NMF-based source model defined in the MDCT domain, and the impulse responses of the mixing filters are also modeled with Student's t distributions (a schematic form of the source model is sketched just below). We compare this approach with others from the literature: the framework of Ozerov et al. [2] (rank-1 and rank-2 variants), the multichannel NMF of Sawada et al. [3] (rank 2), and our previous method based on unconstrained time-domain mixing filters [4].
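For reference, the sketch below gives the generic form of a Student's t NMF source model defined on MDCT coefficients. It is only an illustration: the symbols (s, w, h, ν) are our own notation, and the exact parameterization, priors and inference procedure are those described in [1].

```latex
% Schematic form of a Student's t NMF source model in the MDCT domain.
% Illustration only; the notation (s_{j,fn}, w, h, nu) is ours, not necessarily
% that of [1]. See the paper for the exact parameterization.
\[
  s_{j,fn} \;\sim\; \mathcal{T}_{\nu_j}\!\bigl(0,\, \sqrt{v_{j,fn}}\,\bigr),
  \qquad
  v_{j,fn} \;=\; \sum_{k} w_{j,fk}\, h_{j,kn},
  \quad w_{j,fk} \ge 0,\; h_{j,kn} \ge 0,
\]
where $s_{j,fn}$ is the real-valued MDCT coefficient of source $j$ at frequency
bin $f$ and time frame $n$, and $\mathcal{T}_{\nu}(\mu,\sigma)$ denotes a
Student's t distribution with $\nu$ degrees of freedom, location $\mu$ and
scale $\sigma$.
```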
The baseline algorithms and the proposed method are run using oracle NMF dictionaries (i.e., spectral dictionaries learned from the true source signals); all other model parameters are blindly estimated.
Matlab code for [1] is available here.
The stereo mixtures were created using source signals from the MTG MASS database and room impulse responses from the MIRD database [5]. Note that we evaluate the source separation quality in terms of reconstructed stereo source images, so when listening you can also pay attention to the estimated spatial position of the sources.
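To make the mixing process concrete, here is a minimal MATLAB sketch of how a stereo source image and the mixture can be formed from a mono source and a pair of room impulse responses. The file names and normalization are hypothetical; this is not the script actually used to generate the examples.

```matlab
% Illustrative sketch of stereo mixture creation (hypothetical file names;
% not the actual script used for these examples).
[s, fs] = audioread('source_drums.wav');   % mono source signal
h       = audioread('rir_stereo.wav');     % [L x 2] left/right room impulse responses

% Stereo source image: convolve the source with each channel of the RIR.
img = [conv(s, h(:,1)), conv(s, h(:,2))];

% The stereo mixture is the sum of the stereo images of all sources, e.g.
% x = img_drums + img_guitar1 + img_guitar2 + img_voice;

audiowrite('image_drums.wav', img ./ max(abs(img(:))), fs);
```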
[1] S. Leglaive, R. Badeau, and G. Richard. "Student's t source and mixing models for multichannel audio source separation", in IEEE Transactions on Audio, Speech and Language Processing, vol. 26, no. 6, 2018.
[2] A. Ozerov, E. Vincent, and F. Bimbot. "A general flexible framework for the handling of prior information in audio source separation", in IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 4, 2012.
[3] H. Sawada, H. Kameoka, S. Araki, and N. Ueda. "Multichannel extensions of non-negative matrix factorization with complex-valued data", in IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 5, 2013.
[4] S. Leglaive, R. Badeau, and G. Richard. "Multichannel audio source separation: variational inference of time-frequency sources from time-domain observations", in Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 2017.
[5] E. Hadad, F. Heese, P. Vary, and S. Gannot. "Multichannel audio database in various acoustic environments", in Proc. of the IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), Juan-les-Pins, France, 2014.
| | Drums | Guitar 1 | Guitar 2 | Voice |
|---|---|---|---|---|
| Original source images (stereo) | | | | |
| Proposed - w/o adapted TF window | | | | |
| Proposed - w/ adapted TF window | | | | |
| Ozerov et al. - rank 1 | | | | |
| Ozerov et al. - rank 2 | | | | |
| Sawada et al. - rank 2 | | | | |
| Unconstrained time-domain filters | | | | |
In this second section we present the source separation results for a mixture rendered with three different reverberation times: 160, 360 and 610 ms. The musical excerpt is taken from "Ana" by Vieux Farka Touré. We chose this excerpt because of the impulsiveness of the drums, which makes the reverberation easy to hear. Moreover, some issues with our previous method [4] are particularly audible on this excerpt (see Section 3 below).
a) Reverberation time of 160 ms
Stereo mixture:

| | Drums | Voice | Bass |
|---|---|---|---|
| Original source images (stereo) | | | |
| Proposed - w/o adapted TF window | | | |
| Proposed - w/ adapted TF window | | | |
| Ozerov et al. - rank 1 | | | |
| Ozerov et al. - rank 2 | | | |
| Sawada et al. - rank 2 | | | |
| Unconstrained time-domain filters | | | |
b) Reverberation time of 360 ms
Stereo mixture:

| | Drums | Voice | Bass |
|---|---|---|---|
| Original source images (stereo) | | | |
| Proposed - w/o adapted TF window | | | |
| Proposed - w/ adapted TF window | | | |
| Ozerov et al. - rank 1 | | | |
| Ozerov et al. - rank 2 | | | |
| Sawada et al. - rank 2 | | | |
| Unconstrained time-domain filters | | | |
c) Reverberation time of 610 ms
Stereo mixture:

| | Drums | Voice | Bass |
|---|---|---|---|
| Original source images (stereo) | | | |
| Proposed - w/o adapted TF window | | | |
| Proposed - w/ adapted TF window | | | |
| Ozerov et al. - rank 1 | | | |
| Ozerov et al. - rank 2 | | | |
| Sawada et al. - rank 2 | | | |
| Unconstrained time-domain filters | | | |
For the mixtures with reverberation times of 360 and 610 ms, the results obtained with our previous method [4] are not satisfactory. This is due to the unconstrained estimation of the mixing filters in that method. To illustrate this point, you can listen below to the true stereo mixing filter used for the drums in the previous audio example (reverberation time of 610 ms), along with its estimate obtained with [4] (unconstrained) and with the proposed method (constrained). As can be heard, part of the voice source signal leaks into the unconstrained mixing filter estimate. The proposed method does not suffer from this issue, precisely because probabilistic priors guide the estimation of the mixing filters.
| True mixing filter | Unconstrained estimation | Constrained estimation |
|---|---|---|
| | | |