1 code implementation • 17 Jun 2024 • Jefferson Hernandez, Ruben Villegas, Vicente Ordonez
Using this instruction set and the existing LLaVA-Finetune instruction set for visual understanding tasks, we produce GenLLaVA, a Generative Large Language and Visual Assistant.
no code implementations • 22 May 2024 • Divya Kothandaraman, Kihyuk Sohn, Ruben Villegas, Paul Voigtlaender, Dinesh Manocha, Mohammad Babaeizadeh
We present a method for multi-concept customization of pretrained text-to-video (T2V) models.
1 code implementation • NeurIPS 2023 • Emanuele Bugliarello, Hernan Moraldo, Ruben Villegas, Mohammad Babaeizadeh, Mohammad Taghi Saffar, Han Zhang, Dumitru Erhan, Vittorio Ferrari, Pieter-Jan Kindermans, Paul Voigtlaender
To fill this gap, we collect comprehensive human annotations on three existing datasets, and introduce StoryBench: a new, challenging multi-task benchmark to reliably evaluate forthcoming text-to-video models.
2 code implementations • 21 Mar 2023 • Jefferson Hernandez, Ruben Villegas, Vicente Ordonez
We show that visual representations learned under ViC-MAE generalize well to both video and image classification tasks.
Ranked #5 on Image Classification on Places365 (using extra training data)
2 code implementations • 5 Oct 2022 • Ruben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi Saffar, Santiago Castro, Julius Kunze, Dumitru Erhan
To the best of our knowledge, this is the first time a paper studies generating videos from time variable prompts.
Ranked #4 on Video Prediction on BAIR Robot Pushing
no code implementations • 14 May 2022 • Yunseok Jang, Ruben Villegas, Jimei Yang, Duygu Ceylan, Xin Sun, Honglak Lee
We test the effectiveness of our representation on the human image harmonization task by predicting shading that is coherent with a given background image.
no code implementations • ICCV 2021 • Ruben Villegas, Duygu Ceylan, Aaron Hertzmann, Jimei Yang, Jun Saito
Self-contacts, such as when hands touch each other or the torso or the head, are important attributes of human body language and dynamics, yet existing methods do not model or preserve these contacts.
no code implementations • ICCV 2021 • Mohamed Hassan, Duygu Ceylan, Ruben Villegas, Jun Saito, Jimei Yang, Yi Zhou, Michael Black
A long-standing goal in computer vision is to capture, model, and realistically synthesize human behavior.
no code implementations • 15 Jul 2021 • Manuel Lagunas, Xin Sun, Jimei Yang, Ruben Villegas, Jianming Zhang, Zhixin Shu, Belen Masia, Diego Gutierrez
We present a single-image data-driven method to automatically relight images with full-body humans in them.
no code implementations • 7 Jun 2021 • Jiaman Li, Ruben Villegas, Duygu Ceylan, Jimei Yang, Zhengfei Kuang, Hao Li, Yajie Zhao
We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation, motion completion from partial observations, and motion synthesis from sparse key-frames.
Ranked #4 on Motion Synthesis on LaFAN1
1 code implementation • ECCV 2020 • Davis Rempe, Leonidas J. Guibas, Aaron Hertzmann, Bryan Russell, Ruben Villegas, Jimei Yang
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles.
no code implementations • NeurIPS 2019 • Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee
Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time.
1 code implementation • NeurIPS 2019 • Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, Honglak Lee
Extracting and predicting object structure and dynamics from videos without supervision is a major challenge in machine learning.
Ranked #11 on Video Prediction on KTH
9 code implementations • 12 Nov 2018 • Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
Planning has been very successful for control tasks with known environment dynamics.
Ranked #2 on Continuous Control on DeepMind Walker Walk (Images)
1 code implementation • ECCV 2018 • Xinchen Yan, Akash Rastogi, Ruben Villegas, Kalyan Sunkavalli, Eli Shechtman, Sunil Hadap, Ersin Yumer, Honglak Lee
Our model jointly learns a feature embedding for motion modes (that the motion sequence can be reconstructed from) and a feature transformation that represents the transition of one motion mode to the next motion mode.
Ranked #8 on Human Pose Forecasting on Human3.6M (ADE metric)
no code implementations • ICML 2018 • Nevan Wichers, Ruben Villegas, Dumitru Erhan, Honglak Lee
Much of recent research has been devoted to video prediction and generation, yet most of the previous works have demonstrated only limited success in generating videos on short-term horizons.
1 code implementation • CVPR 2018 • Ruben Villegas, Jimei Yang, Duygu Ceylan, Honglak Lee
We propose a recurrent neural network architecture with a Forward Kinematics layer and a cycle-consistency-based adversarial training objective for unsupervised motion retargetting.
1 code implementation • 25 Jun 2017 • Ruben Villegas, Jimei Yang, Seunghoon Hong, Xunyu Lin, Honglak Lee
To the best of our knowledge, this is the first end-to-end trainable network architecture with motion and content separation to model the spatiotemporal dynamics for pixel-level future prediction in natural videos.
Ranked #1 on Video Prediction on KTH (Cond metric)
2 code implementations • ICML 2017 • Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee
To avoid inherent compounding errors in recursive pixel-level prediction, we propose to first estimate high-level structure in the input frames, then predict how that structure evolves in the future, and finally by observing a single frame from the past and the predicted high-level structure, we construct the future frames without having to observe any of the pixel-level predictions.
no code implementations • CVPR 2015 • Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, Honglak Lee
Object detection systems based on the deep convolutional neural network (CNN) have recently made groundbreaking advances on several object detection benchmarks.
no code implementations • CVPR 2014 • Afshin Dehghan, Enrique G. Ortiz, Ruben Villegas, Mubarak Shah
Recent years have seen a major push for face recognition technology due to the large expansion of image sharing on social networks.
Ranked #4 on Kinship Verification on KinFaceW-II