Description and objectives

DEGREASE (ANR-23-CE23-0009) is a 45-month project (2024/04 - 2028/01) funded by the French National Research Agency (ANR) within the Young Researcher program (JCJC) and coordinated by Simon Leglaive.

Speech enhancement

Figure 1: Illustration of the speech enhancement task.

DEGREASE stands for deep generative and inference models for weakly-supervised speech enhancement. Speech enhancement consists of improving the quality and intelligibility of a speech signal in a degraded recording, for instance due to interfering sound sources and reverberation (see Figure 1). Speech enhancement finds applications in various technologies for human and machine listening (hearing aids, assistive listening, vocal assistants, smartphones, smart homes, etc.).

The conventional fully-supervised approach

Figure 2: The (now) conventional approach to supervised speech enhancement.

In recent years, there has been great progress in speech enhancement thanks to deep learning models trained in a supervised manner. Supervised speech enhancement involves three main ingredients, as illustrated in Figure 2.

Unfortunately, it is very difficult, if not impossible, to acquire labeled noisy speech signals in real-world conditions due to cross-talk between microphones. Therefore, datasets for supervised learning have to be generated artificially, by creating synthetic mixtures of isolated speech and noise signals. Artificially generated training data are, however, inevitably mismatched with real-world noisy speech recordings, which can result in poor speech enhancement performance when the mismatch is severe. Moreover, if the task or the evaluation domain changes, supervised learning requires collecting new data and retraining the model, which is costly in both time and computation. These limitations of supervised speech enhancement contrast with the impressive adaptability of the human auditory system when it comes to perceiving speech in unknown adverse acoustic conditions.
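The synthetic mixtures mentioned above are commonly built by scaling an isolated noise signal so that it reaches a chosen signal-to-noise ratio (SNR) relative to the clean speech, then summing the two. The following is a minimal sketch of that idea, not the project's actual data pipeline: the function name `mix_at_snr`, the 16 kHz sample rate, the 5 dB SNR, and the sine wave standing in for clean speech are all assumptions made for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to reach `snr_db` relative to `speech`, then add them.

    Hypothetical helper for illustration. Returns (mixture, scaled_noise).
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if speech.shape != noise.shape:
        raise ValueError("speech and noise must have the same shape")

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0.0 or noise_power == 0.0:
        raise ValueError("speech and noise must both be non-silent")

    # SNR (dB) = 10 * log10(speech_power / noise_power), solved for the noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


# Toy signals: a sine wave stands in for clean speech (assumption).
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speech = 0.5 * np.sin(2.0 * np.pi * 220.0 * t)
noise = rng.standard_normal(sample_rate)

mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)

# The clean speech is kept as the training label for the synthetic mixture.
measured_snr_db = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))
```

Because the clean speech is known by construction, each mixture comes with its label. However, the additive, anechoic mixing used here ignores reverberation and other real-world effects, which is one source of the mismatch described above.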

DEGREASE

Figure 3: High-level overview of the methodology proposed in DEGREASE.

The scientific ambition of the DEGREASE project is to develop speech enhancement methods that can leverage real unlabeled recordings of noisy and reverberant speech at training time and that can adapt to new acoustic conditions at test time. To reach this objective, we propose a methodology at the crossroads of audio signal processing, probabilistic graphical modeling, and deep learning, which is based on deep generative and inference models specifically designed for the processing of multi-microphone speech signals.

The probabilistic generative modeling approach will allow us to consider the clean speech signals as partially-observed variables during training. Models will thus be learned in a semi-supervised manner at training time, and they will be adapted in an unsupervised manner at test time. Speech enhancement will be achieved by inverting the learned generative model, i.e., performing inference.
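As a rough illustration of what treating clean speech as a partially-observed variable can look like, here is a generic semi-supervised variational objective. It is a standard textbook-style formulation, not necessarily the one used in the project, and all notation is assumed for this sketch: x is a noisy recording, s the clean speech, p_θ the generative model, q_φ the inference model, D_L a set of labeled pairs, and D_U a set of unlabeled recordings.

```latex
\begin{align}
\mathcal{L}(\theta, \phi)
  &= \sum_{(x, s) \in \mathcal{D}_L} \log p_\theta(x, s)
   + \sum_{x \in \mathcal{D}_U} \mathrm{ELBO}(x; \theta, \phi), \\
\mathrm{ELBO}(x; \theta, \phi)
  &= \mathbb{E}_{q_\phi(s \mid x)}\!\left[ \log p_\theta(x, s) - \log q_\phi(s \mid x) \right]
   \le \log p_\theta(x), \\
\hat{s}
  &= \mathbb{E}_{q_\phi(s \mid x)}[s] \approx \mathbb{E}_{p_\theta(s \mid x)}[s].
\end{align}
```

When s is observed, the joint log-likelihood is maximized directly. When it is not, s is averaged out through the inference model, which yields a lower bound on the marginal log-likelihood of x. Under these assumptions, the last line corresponds to enhancement by inference: the clean speech estimate is a posterior mean given the noisy recording.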

The outcomes of the DEGREASE project are expected to help build more reliable speech technologies that can work optimally in diverse and uncontrolled acoustic environments.

People

Simon Leglaive - Principal Investigator

Sofiene Kammoun - PhD Student

Louis Bahrman - Postdoctoral Researcher

Publications

Modeling strategies for speech enhancement in the latent space of a neural audio codec
Sofiene Kammoun, Xavier Alameda-Pineda, Simon Leglaive
arXiv preprint arXiv:2510.26299, 2025
Accepted at IEEE ICASSP 2026
Paper | Webpage

Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge
Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker
Computer Speech & Language, vol. 89, 2025
Paper | GitHub | Data

AnCoGen: Analysis, control and generation of speech with a masked autoencoder
Samir Sadok, Simon Leglaive, Laurent Girin, Gaël Richard, Xavier Alameda-Pineda
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025
Paper | Webpage | Code

Débruitage de parole semi-supervisé par modélisation générative dans un espace de représentation discret des signaux audio
Sofiene Kammoun, Simon Leglaive
XXXe Colloque GRETSI, Strasbourg, France, August 2025