Speech production is a complex motor process involving several physiological phenomena, such as the neural, nervous and muscular activities that drive our respiratory, laryngeal and articulatory movements. Modeling speech production, in particular the relationship between articulatory gestures (tongue, lips, jaw, velum) and the acoustic realizations of speech, is a challenging and still evolving research question. From an applied point of view, such models could be embedded into assistive devices that restore oral communication when part of the speech production chain is damaged (articulatory synthesis). They could also help rehabilitate speech sound disorders through biofeedback-based therapy (articulatory inversion). From a more fundamental research perspective, such models can also be used to investigate the cognitive mechanisms underlying speech perception and motor control. In this talk, I will present different studies conducted in our group that aim at learning acoustic-articulatory models from real-world data using machine learning (deep learning, but not only). First, I will focus on different attempts to adapt a direct or inverse model, pre-trained on a reference speaker, to any new speaker. Then, I will present recent work on the integration of articulatory priors into the latent space of a variational autoencoder, with potential application to speech enhancement. Finally, I will describe a recent line of research that studies, through modeling and simulation, how a child learns the acoustic-to-articulatory inverse mapping in a self-supervised manner when repeating auditory-only speech stimuli.
For more than half a century, source-filter theory has remained at the heart of the modeling, analysis and synthesis of the human voice and its expressions, such as speech and singing. In this presentation, we will return to this
October 20, 2022 01 h 05 min
For some years, the state of the art in speech synthesis and processing has been dominated by data-driven methods and deep neural networks. The use of ever larger amounts of data allows the exploitation of ever more parameters, leading to e
October 20, 2022 56 min
October 20, 2022 26 min
This talk addresses the prediction of the geometric shape of the vocal tract from a sequence of phonemes. It will begin by presenting the different approaches that have been used in the past, in particular those that rely on the use
October 20, 2022 01 h 09 min