Publications

AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder

Published in IEEE ICASSP, 2025

We present AnCoGen, a new method that uses a masked autoencoder to unify speech signal analysis, control, and generation in a single model.

Recommended citation: Samir Sadok, Simon Leglaive, Laurent Girin, Gaël Richard, Xavier Alameda-Pineda. AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder. IEEE ICASSP, 2025.

A multimodal dynamical variational autoencoder for audiovisual speech representation learning

Published in Neural Networks (Elsevier), 2024

We present a multimodal and dynamical VAE (MDVAE) applied to unsupervised audio-visual speech representation learning.

Recommended citation: Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier. A multimodal dynamical variational autoencoder for audiovisual speech representation learning. Neural Networks (Elsevier), 2024.

A vector quantized masked autoencoder for speech emotion recognition

Published in Workshop ICASSP (SASB), 2023

We combine a VQ-VAE (unsupervised) with a masked autoencoder (MAE, self-supervised) for speech emotion recognition.

Recommended citation: Samir Sadok, Simon Leglaive, Renaud Séguier. A vector quantized masked autoencoder for speech emotion recognition. Workshop ICASSP (SASB), 2023.

Learning and controlling the source-filter representation of speech with a variational autoencoder

Published in Speech Communication, 2023

We show that the source-filter model of speech production naturally emerges in the latent space of an unsupervised VAE, and we propose a weakly supervised method to control the pitch and formant frequencies of speech signals in that latent space.

Recommended citation: Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier. Learning and controlling the source-filter representation of speech with a variational autoencoder. Speech Communication, vol. 148, 2023. https://www-sciencedirect-com.ezproxy.universite-paris-saclay.fr/science/article/pii/S0167639323000304