9 papers with code • 0 benchmarks • 0 datasets
Generation of gestures as a sequence of 3D poses
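For concreteness, here is a minimal sketch of that output representation. The skeleton size (15 joints) and frame rate (20 fps) are illustrative assumptions, not taken from any of the papers below:

```python
import numpy as np

NUM_JOINTS = 15  # hypothetical skeleton size
FPS = 20         # hypothetical frame rate

def make_gesture(num_seconds: float) -> np.ndarray:
    """Return a placeholder gesture: a (T, J, 3) array of 3D joint positions."""
    num_frames = int(num_seconds * FPS)
    return np.zeros((num_frames, NUM_JOINTS, 3), dtype=np.float32)

gesture = make_gesture(2.0)
print(gesture.shape)  # (40, 15, 3): 40 frames, 15 joints, (x, y, z) per joint
```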
In this paper, we present an automatic gesture generation model that uses the multimodal context of speech text, audio, and speaker identity to reliably generate gestures.
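To illustrate how such multimodal conditioning might be wired up, here is a hedged PyTorch sketch. The module name, the GRU encoder, and all layer sizes are assumptions for exposition, not the architecture from the paper:

```python
import torch
import torch.nn as nn

class MultimodalGestureGenerator(nn.Module):
    """Illustrative generator: fuse text, audio, and speaker identity,
    then decode a pose sequence. Not a published architecture."""

    def __init__(self, text_dim=300, audio_dim=128, num_speakers=100,
                 speaker_dim=16, hidden_dim=256, pose_dim=45):
        super().__init__()
        self.speaker_embed = nn.Embedding(num_speakers, speaker_dim)
        self.encoder = nn.GRU(text_dim + audio_dim + speaker_dim,
                              hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, pose_dim)  # pose_dim = J * 3

    def forward(self, text_feat, audio_feat, speaker_id):
        # text_feat: (B, T, text_dim), audio_feat: (B, T, audio_dim)
        B, T, _ = text_feat.shape
        spk = self.speaker_embed(speaker_id)     # (B, speaker_dim)
        spk = spk.unsqueeze(1).expand(B, T, -1)  # broadcast over time steps
        fused = torch.cat([text_feat, audio_feat, spk], dim=-1)
        hidden, _ = self.encoder(fused)          # (B, T, hidden_dim)
        return self.decoder(hidden)              # (B, T, pose_dim)
```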
We analyze different representations for the network's input (speech) and output (motion) through both objective and subjective evaluations.
During speech, people spontaneously gesticulate, which plays a key role in conveying information.
We evaluate different representation sizes to find the most effective dimensionality.
To date, end-to-end gesture generation methods have not been evaluated in real-time interaction with users.
We find that DeepNAG outperforms DeepGAN in accuracy, training time (up to 17x faster), and realism, thereby opening the door to a new line of research in generator network design and training for gesture synthesis.
We study relationships between spoken language and co-speech gestures in the context of two key challenges.
A key challenge, called gesture style transfer, is to learn a model that generates these gestures for a speaking agent 'A' in the gesturing style of a target speaker 'B'.
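One simple reading of this task, reusing the hypothetical generator sketched above (again an assumption, not the cited paper's method), is to condition on speaker A's speech features while swapping in speaker B's identity embedding:

```python
import torch

# Illustrative only: reuses the MultimodalGestureGenerator sketched earlier.
model = MultimodalGestureGenerator()

text_a = torch.randn(1, 40, 300)   # speaker A's transcript features (B, T, D)
audio_a = torch.randn(1, 40, 128)  # speaker A's audio features
style_b = torch.tensor([7])        # speaker B's identity index

# Speech content from A, gesturing style from B.
poses_in_b_style = model(text_a, audio_a, style_b)
print(poses_in_b_style.shape)      # torch.Size([1, 40, 45])
```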