no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #3 on Text-to-Video Generation on MSR-VTT
no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama
We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling.
Ranked #1 on Video Prediction on Kinetics-600 12 frames, 64x64
no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.
Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64
1 code implementation • 5 Aug 2023 • Guillermo Carbajal, Patricia Vitoria, José Lezama, Pablo Musé
Then, a second network, trained jointly with the first, unrolls a non-blind deconvolution method using the motion kernel field estimated by the first network.
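The two-stage idea above (a kernel-estimation network followed by an unrolled non-blind deconvolution) can be sketched in miniature. The sketch below is a hand-written stand-in, not the paper's implementation: the kernel is fixed rather than predicted by a network, the signal is 1-D, and the unrolled steps are plain gradient descent on the data-fidelity term.

```python
import numpy as np

def conv(x, k):
    # circular convolution via FFT (a simplifying assumption of this sketch)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, len(x))))

def conv_adjoint(x, k):
    # adjoint of the circular convolution operator
    return np.real(np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(k, len(x)))))

def unrolled_deconv(y, k, steps=200, lr=0.9):
    # Unrolled gradient descent on ||k * x - y||^2, starting from the blurry
    # input y. In the paper the motion kernel field comes from a first network
    # and the unrolled updates are trained jointly; here both are fixed
    # stand-ins chosen for illustration.
    x = y.copy()
    for _ in range(steps):
        residual = conv(x, k) - y
        x = x - lr * conv_adjoint(residual, k)
    return x
```

For a normalized blur kernel the operator norm is at most 1, so a step size below 2 keeps the iteration stable.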
no code implementations • 27 Dec 2022 • Bruno Galerne, Lara Raad, José Lezama, Jean-Michel Morel
Neural style transfer is a deep learning technique that transfers style from a style image to a content image with unprecedented richness, and is particularly impressive when the style image is a painting.
1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.
Ranked #1 on Video Prediction on Something-Something V2
1 code implementation • CVPR 2023 • Kihyuk Sohn, Yuan Hao, José Lezama, Luisa Polania, Huiwen Chang, Han Zhang, Irfan Essa, Lu Jiang
We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens fed to autoregressive or non-autoregressive transformers.
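The non-autoregressive decoding such token-based generative transformers use can be sketched as iterative parallel decoding: start from a fully masked token sequence and commit the highest-confidence predictions over a few steps. This is a toy sketch under stated assumptions — `score_fn` is a hypothetical stand-in for the transformer, not an API from the paper.

```python
import numpy as np

MASK = -1  # sentinel for a not-yet-decoded token position

def parallel_decode(n_tokens, n_steps, score_fn):
    # Non-autoregressive decoding sketch: at each step, `score_fn` proposes a
    # token and a confidence for every position; we commit the most confident
    # proposals at still-masked positions and keep the rest masked.
    tokens = np.full(n_tokens, MASK)
    for step in range(n_steps):
        proposals, conf = score_fn(tokens)
        conf = np.where(tokens == MASK, conf, -np.inf)  # only fill masked slots
        remaining = int((tokens == MASK).sum())
        k = int(np.ceil(remaining / (n_steps - step)))  # linear unmask schedule
        commit = np.argsort(conf)[-k:]
        tokens[commit] = proposals[commit]
    return tokens
```

With this schedule every position is decoded after `n_steps` passes, versus `n_tokens` passes for purely autoregressive, one-token-at-a-time decoding.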
1 code implementation • 26 Sep 2022 • Guillermo Carbajal, Patricia Vitoria, Pablo Musé, José Lezama
Successful training of end-to-end deep networks for real motion deblurring requires datasets of sharp/blurred image pairs that are realistic and diverse enough to achieve generalization to real blurred images.
1 code implementation • 9 Sep 2022 • José Lezama, Huiwen Chang, Lu Jiang, Irfan Essa
Given a masked-and-reconstructed real image, the Token-Critic model is trained to distinguish which visual tokens belong to the original image and which were sampled by the generative transformer.
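The training signal described above can be sketched as follows. This is a toy construction: the "generator samples" filling the masked positions are uniform random tokens rather than a transformer's outputs, and all names are illustrative, not the paper's code.

```python
import numpy as np

def token_critic_example(real_tokens, mask_ratio=0.5, vocab_size=1024, seed=0):
    # Mask a random subset of the token sequence, fill the masked positions
    # with stand-in generator samples (uniform random tokens here), and emit
    # the binary targets the Token-Critic is trained on:
    #   1 = token kept from the original image,
    #   0 = token sampled by the generative transformer.
    rng = np.random.default_rng(seed)
    masked = rng.random(len(real_tokens)) < mask_ratio
    mixed = real_tokens.copy()
    mixed[masked] = rng.integers(0, vocab_size, masked.sum())
    labels = (~masked).astype(np.int64)
    return mixed, labels
```

A critic network would then be trained with binary cross-entropy to predict `labels` from `mixed`, and its per-token scores can guide which tokens to resample during generation.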
1 code implementation • 1 Feb 2021 • Guillermo Carbajal, Patricia Vitoria, Mauricio Delbracio, Pablo Musé, José Lezama
In recent years, the removal of motion blur in photographs has seen impressive progress in the hands of deep learning-based methods, trained to map directly from blurry to sharp images.
no code implementations • ICLR 2019 • Igor M. Quintanilha, Roberto de M. E. Filho, José Lezama, Mauricio Delbracio, Leonardo O. Nunes
The ability to detect when an input sample was not drawn from the training distribution is an important desirable property of deep neural networks.
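A common baseline for this detection task is to threshold the maximum softmax probability of the classifier. The sketch below shows that generic baseline (Hendrycks & Gimpel style), not necessarily the method proposed in this paper.

```python
import numpy as np

def msp_score(logits):
    # Maximum softmax probability: a confident prediction (score near 1)
    # suggests an in-distribution input; a flat, low-confidence prediction
    # suggests the input may be out-of-distribution.
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)
```

In practice one calibrates a threshold on held-out in-distribution data and flags inputs whose score falls below it.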
1 code implementation • ICLR 2019 • José Lezama
A major challenge in learning image representations is disentangling the factors of variation underlying image formation.
no code implementations • 25 May 2018 • José Lezama, Samy Blusseau, Jean-Michel Morel, Gregory Randall, Rafael Grompone von Gioi
Using a computational quantitative version of the non-accidentalness principle, we raise the possibility that the psychophysical and the (older) gestaltist setups, both applicable on dot or Gabor patterns, find a useful complement in a Turing test.
1 code implementation • 5 Dec 2017 • José Lezama, Qiang Qiu, Pablo Musé, Guillermo Sapiro
Deep neural networks trained using a softmax layer at the top and the cross-entropy loss are ubiquitous tools for image classification.
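The standard setup the sentence refers to can be written out in a few lines; a minimal NumPy sketch:

```python
import numpy as np

def softmax(logits):
    # subtract the row-wise max for numerical stability
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # mean negative log-likelihood of the true class under the softmax
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels]).mean()
```

With uniform logits over C classes the loss is log C, the usual sanity check for an untrained classifier.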