A multimodal dynamical variational autoencoder for audiovisual speech representation learning
Published in Neural Networks (Elsevier), 2024
We present a multimodal dynamical VAE (MDVAE) applied to unsupervised audiovisual speech representation learning.
Recommended citation: Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier. A multimodal dynamical variational autoencoder for audiovisual speech representation learning. Neural Networks (Elsevier), 2024. https://www.sciencedirect.com/science/article/pii/S0893608024000340