Publications

A multimodal dynamical variational autoencoder for audiovisual speech representation learning

Published in Neural Networks (Elsevier), 2024

We present a multimodal and dynamical VAE (MDVAE) applied to unsupervised audio-visual speech representation learning.

Recommended citation: Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier. A multimodal dynamical variational autoencoder for audiovisual speech representation learning. Neural Networks (Elsevier), 2024. https://www.sciencedirect.com/science/article/pii/S0893608024000340

A vector quantized masked autoencoder for speech emotion recognition

Published in ICASSP Workshop (SASB), 2023

We combine a vector quantized VAE (VQ-VAE, unsupervised) with a masked autoencoder (MAE, self-supervised) for speech emotion recognition.

Recommended citation: Samir Sadok, Simon Leglaive, Renaud Séguier. A vector quantized masked autoencoder for speech emotion recognition. 2023. https://arxiv.org/pdf/2304.11117.pdf

Learning and controlling the source-filter representation of speech with a variational autoencoder

Published in Speech Communication, 2023

We show that the source-filter model of speech production naturally emerges in the latent space of an unsupervised VAE and we propose a weakly-supervised method to control the pitch and formant frequencies of speech signals in the VAE latent space.

Recommended citation: Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier. Learning and controlling the source-filter representation of speech with a variational autoencoder. Speech Communication, vol. 148, 2023. https://www.sciencedirect.com/science/article/pii/S0167639323000304