Cesar Ilharco, Afsaneh Shirazi, Arjun Gopalan, Arsha Nagrani, Blaz Bratanic, Chris Bregler, Christina Funk, Felipe Ferreira, Gabriel Barcik, Gabriel Ilharco, Georg Osang, Jannis Bulian, Jared Frank, Lucas Smaira, Qin Cao, Ricardo Marino, Roma Patel, Thomas Leung, Vaiva Imbrasaite
How information is created, shared and consumed has changed rapidly in recent decades, in part thanks to new social platforms and technologies on the web.
In this paper, we present a video-based learning framework for animating personalized 3D talking faces from audio.
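Purely as an illustration of what such a framework might compute (the module below is a hypothetical sketch, not the paper's model; all names and dimensions are invented), an audio-to-face regressor can map windows of audio features to per-frame blendshape weights that drive a 3D face rig:

```python
import torch
import torch.nn as nn

class AudioToFace(nn.Module):
    """Hypothetical sketch: map mel-spectrogram frames to per-frame
    blendshape weights that animate a personalized 3D face rig."""
    def __init__(self, n_mels=80, hidden=256, n_blendshapes=52):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_blendshapes)

    def forward(self, mel):                     # mel: (batch, frames, n_mels)
        feats, _ = self.encoder(mel)            # (batch, frames, hidden)
        return torch.sigmoid(self.head(feats))  # weights squashed into [0, 1]

model = AudioToFace()
weights = model(torch.randn(2, 100, 80))        # -> (2, 100, 52)
```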
We propose a self-supervised training strategy that requires only a set of captioned images.
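The abstract does not spell out the training objective, but a common way to learn from captioned images alone is a contrastive (InfoNCE-style) loss that pulls matched image and caption embeddings together within a batch; the sketch below is a generic illustration under that assumption, not the paper's method:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Illustrative InfoNCE loss over a batch of (image, caption) pairs.
    Matched pairs lie on the diagonal of the similarity matrix; all
    other pairs in the batch act as negatives."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (batch, batch)
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```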
In this work, we propose a model that can manipulate individual visual attributes of objects in a real scene, using examples of how the corresponding attribute manipulations affect the output of a simulation.
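One plausible reading of this setup, sketched below with entirely hypothetical names, is a residual editor that takes a real image plus a requested attribute change and is supervised by the simulator's rendering of that same change:

```python
import torch
import torch.nn as nn

class AttributeEditor(nn.Module):
    """Hypothetical sketch: apply a scalar attribute change (e.g. +0.3
    glossiness) to an image, trained against simulator renderings."""
    def __init__(self, attr_dim=1, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + attr_dim, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, img, delta):  # img: (B,3,H,W), delta: (B,attr_dim)
        d = delta[:, :, None, None].expand(-1, -1, *img.shape[2:])
        return img + self.net(torch.cat([img, d], dim=1))  # residual edit

editor = AttributeEditor()
edited = editor(torch.randn(2, 3, 64, 64), torch.tensor([[0.3], [-0.5]]))
```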
Trained on COCO data alone, our final system achieves an average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods.
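For reference, the metric behind these numbers is COCO's OKS-based average precision; the held-out test-dev and test-standard splits are scored on the COCO evaluation server, but the same pipeline can be run locally on val2017 with pycocotools (file paths below are placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Load ground truth and detections (detections in the COCO results format).
coco_gt = COCO('annotations/person_keypoints_val2017.json')
coco_dt = coco_gt.loadRes('my_keypoint_results.json')

# 'keypoints' selects OKS-based matching; AP is averaged over
# OKS thresholds 0.50:0.05:0.95.
evaluator = COCOeval(coco_gt, coco_dt, iouType='keypoints')
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP / AR at the standard thresholds
```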
Our approach unites a discriminative image-based joint detection method with a model-based generative motion tracking algorithm through a combined pose optimization energy.
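While the paper's exact terms are not given here, a combined energy of this kind typically has the following shape (the weights and terms below are illustrative, not the paper's formulation):

$$E(\theta) = \lambda_{2D}\, E_{2D}(\theta) + \lambda_{3D}\, E_{3D}(\theta) + \lambda_{smooth}\, E_{smooth}(\theta),$$

where $\theta$ are the kinematic pose parameters, $E_{2D}$ penalizes the reprojection error between the model's joints and the discriminatively detected 2D joint positions, $E_{3D}$ measures agreement with regressed 3D joint evidence, and $E_{smooth}$ penalizes abrupt pose changes across frames; tracking proceeds by minimizing $E(\theta)$ for each frame, initialized from the previous one.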