no code implementations • 4 Sep 2024 • Vighnesh Birodkar, Gabriel Barcik, James Lyon, Sergey Ioffe, David Minnen, Joshua V. Dillon
Our work combines autoencoder representation learning with diffusion and is, to our knowledge, the first to demonstrate the efficacy of jointly learning a continuous encoder and decoder under a diffusion-based loss.
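Since the abstract centers on jointly learning a continuous encoder and decoder under a diffusion-based loss, here is a minimal sketch of that training setup, assuming a DDPM-style epsilon-prediction objective and toy convolutional modules; the paper's actual architecture, conditioning mechanism, and noise schedule are not specified here and may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_dim, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)  # continuous latent; no quantization

class Denoiser(nn.Module):
    """Decoder realized as a conditional denoiser: predicts the noise in x_t,
    conditioned on the timestep and the encoder's latent."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.up = nn.ConvTranspose2d(latent_dim, 32, 4, stride=4)  # latent -> image resolution
        self.net = nn.Sequential(
            nn.Conv2d(3 + 32 + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x_t, t, z):
        cond = self.up(z)
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, cond, t_map], dim=1))

def joint_diffusion_loss(enc, den, x, T=1000):
    z = enc(x)  # gradients flow through z into the encoder, so it is learned jointly
    t = torch.randint(1, T + 1, (x.size(0),), device=x.device).float() / T
    alpha = (1.0 - t).view(-1, 1, 1, 1)  # toy linear schedule (an assumption)
    eps = torch.randn_like(x)
    x_t = alpha.sqrt() * x + (1.0 - alpha).sqrt() * eps
    return F.mse_loss(den(x_t, t, z), eps)
```

Because the denoising loss is differentiated through z, the encoder's representation is shaped by how useful it is for diffusion-based reconstruction, which is the joint-learning aspect the abstract highlights.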
no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #4 on Text-to-Video Generation on MSR-VTT
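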
2 code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.
Ranked #2 on Video Prediction on Kinetics-600 (12 frames, 64x64)
3 code implementations • 27 Sep 2023 • Fabian Mentzer, David Minnen, Eirikur Agustsson, Michael Tschannen
Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets.
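The quantization step described here is simple enough to sketch directly. The following roughly follows the finite-scalar-quantization recipe (bound each dimension, round to the nearest level, pass gradients straight through); the level counts are illustrative.

```python
import torch

def fsq(z: torch.Tensor, levels=(7, 5, 5, 5)) -> torch.Tensor:
    """Finite scalar quantization of the last dimension of z.

    len(levels) must equal z.shape[-1]; with odd level counts, rounding
    yields exactly L values per dimension (the paper uses a half-shift to
    handle even counts, omitted here for brevity).
    """
    L = torch.tensor(levels, dtype=z.dtype, device=z.device)
    half = (L - 1) / 2
    bounded = torch.tanh(z) * half    # squash dim i into (-(L_i - 1)/2, (L_i - 1)/2)
    quantized = torch.round(bounded)  # snap to the fixed per-dimension levels
    # Straight-through estimator: round in the forward pass, identity gradient.
    return bounded + (quantized - bounded).detach()

z = torch.randn(2, 4, requires_grad=True)
zq = fsq(z)  # implicit codebook size: 7 * 5 * 5 * 5 = 875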
no code implementations • 26 Sep 2023 • David Minnen, Nick Johnston
The rate-distortion performance of neural image compression models has exceeded the state-of-the-art for non-learned codecs, but neural codecs are still far from widespread deployment and adoption.
1 code implementation • CVPR 2023 • Eirikur Agustsson, David Minnen, George Toderici, Fabian Mentzer
By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models.
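As a hedged illustration of the rate-distortion-realism trade-off mentioned above, the sketch below augments a standard rate-distortion loss with a non-saturating GAN term; the weights, the discriminator, and the exact loss form are assumptions, not the paper's objective.

```python
import torch
import torch.nn.functional as F

def rdr_loss(rate_bpp, x, x_hat, disc, lambda_d=0.01, lambda_r=0.005):
    """rate_bpp: estimated bits per pixel from the entropy model;
    disc: a discriminator returning real/fake logits (assumed, not from the paper)."""
    distortion = F.mse_loss(x_hat, x)
    realism = F.softplus(-disc(x_hat)).mean()  # non-saturating generator-side GAN loss
    return rate_bpp.mean() + lambda_d * distortion + lambda_r * realism
```

The discriminator itself would be trained separately with the usual real/fake objective; only the generator-side term appears here.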
1 code implementation • 15 Jun 2022 • Fabian Mentzer, George Toderici, David Minnen, Sung-Jin Hwang, Sergi Caelles, Mario Lucic, Eirikur Agustsson
The resulting video compression transformer outperforms previous methods on standard video compression data sets.
no code implementations • 26 Jul 2021 • Fabian Mentzer, Eirikur Agustsson, Johannes Ballé, David Minnen, Nick Johnston, George Toderici
Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods.
3 code implementations • 17 Jul 2020 • David Minnen, Saurabh Singh
In learning-based approaches to image compression, codecs are developed by optimizing a computational model to minimize a rate-distortion objective.
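For readers unfamiliar with the objective this abstract refers to, here is a minimal sketch of the standard rate-distortion loss used to train learned codecs; the 255² MSE scaling is a common convention for 8-bit images, not something specified by the paper.

```python
import torch
import torch.nn.functional as F

def rd_loss(likelihoods, x, x_hat, lmbda=0.01):
    """likelihoods: per-element probabilities of the quantized latents under
    the learned entropy model; x, x_hat: original and reconstructed images."""
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    bpp = -torch.log2(likelihoods).sum() / num_pixels  # estimated rate, bits per pixel
    distortion = F.mse_loss(x_hat, x)
    return bpp + lmbda * (255.0 ** 2) * distortion     # common scaling for 8-bit images
```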
no code implementations • ICLR 2019 • Johannes Ballé, Nick Johnston, David Minnen
We consider the problem of using variational latent-variable models for data compression.
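For context, the link these papers exploit is that the negative evidence lower bound of a variational latent-variable model decomposes into a rate term and a distortion term; stated informally in our own notation (with a Lagrange multiplier added to trade the two off):

```latex
\mathcal{L}
  \;=\;
  \underbrace{\mathbb{E}_{q(y \mid x)}\!\left[-\log_2 p(y)\right]}_{\text{rate (bits)}}
  \;+\;
  \lambda\,
  \underbrace{\mathbb{E}_{q(y \mid x)}\!\left[-\log p(x \mid y)\right]}_{\text{distortion}}
```

Minimizing this bound therefore trains the model directly for compression.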
3 code implementations • NeurIPS 2018 • David Minnen, Johannes Ballé, George Toderici
While it is well known that autoregressive models come with a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and, together, exploit the probabilistic structure in the latents better than all previous learned models.
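To make the combination of priors concrete, here is a hedged sketch in which a masked convolution over already-decoded latents (the autoregressive part) and the hyper-decoder's output (the hierarchical part) jointly predict the mean and scale of a conditional Gaussian entropy model; channel counts and layer shapes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Causal ("type A") masked conv: each position sees only latents that the
    decoder will already have reconstructed."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones_like(self.weight)
        mask[:, :, kh // 2, kw // 2:] = 0  # center and everything to its right
        mask[:, :, kh // 2 + 1:] = 0       # all rows below the center
        self.register_buffer("mask", mask)

    def forward(self, x):
        return self._conv_forward(x, self.weight * self.mask, self.bias)

class EntropyParameters(nn.Module):
    """Fuses autoregressive context with hyper-decoder output to predict the
    mean and scale of a conditional Gaussian entropy model."""
    def __init__(self, c=192):
        super().__init__()
        self.context = MaskedConv2d(c, 2 * c, 5, padding=2)
        self.fuse = nn.Sequential(
            nn.Conv2d(4 * c, 2 * c, 1), nn.ReLU(),
            nn.Conv2d(2 * c, 2 * c, 1),
        )

    def forward(self, y_hat, hyper_out):
        # y_hat: quantized latents (c channels); hyper_out: 2*c channels
        params = self.fuse(torch.cat([self.context(y_hat), hyper_out], dim=1))
        mean, scale = params.chunk(2, dim=1)
        return mean, F.softplus(scale)  # positive scales
```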
no code implementations • 1 Aug 2018 • Troy Chinen, Johannes Ballé, Chunhui Gu, Sung Jin Hwang, Sergey Ioffe, Nick Johnston, Thomas Leung, David Minnen, Sean O'Malley, Charles Rosenberg, George Toderici
We present a full reference, perceptual image metric based on VGG-16, an artificial neural network trained on object classification.
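A full-reference metric of this kind is straightforward to sketch: compare VGG-16 feature maps of the reference and distorted images at a few layers and average the distances. The layer choice and uniform weighting below are assumptions; the paper's metric is tuned beyond this.

```python
import torch
import torchvision

# Frozen VGG-16 feature extractor (ImageNet weights).
vgg = torchvision.models.vgg16(
    weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1
).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def vgg_distance(ref, dist, layers=(3, 8, 15, 22)):
    """ref, dist: (B, 3, H, W) images, ImageNet-normalized.
    layers: indices of relu1_2, relu2_2, relu3_3, relu4_3 in vgg16.features."""
    score, x, y = 0.0, ref, dist
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i in layers:
            score = score + torch.mean((x - y) ** 2)
        if i == max(layers):
            break
    return score
```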
no code implementations • 31 May 2018 • David Minnen, George Toderici, Saurabh Singh, Sung Jin Hwang, Michele Covell
The leading approach for image compression with artificial neural networks (ANNs) is to learn a nonlinear transform and a fixed entropy model that are optimized for rate-distortion performance.
no code implementations • 7 Feb 2018 • David Minnen, George Toderici, Michele Covell, Troy Chinen, Nick Johnston, Joel Shor, Sung Jin Hwang, Damien Vincent, Saurabh Singh
Deep neural networks represent a powerful class of function approximators that can learn to compress and reconstruct images.
15 code implementations • ICLR 2018 • Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, Nick Johnston
We describe an end-to-end trainable model for image compression based on variational autoencoders.
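A minimal sketch of the variational-autoencoder compression idea as it is typically realized with a scale hyperprior: a hyper-encoder summarizes the latents, a hyper-decoder predicts per-element Gaussian scales, and additive uniform noise serves as a differentiable stand-in for quantization during training. Shapes and layer sizes are illustrative; the bit cost of the hyper-latent under its own prior is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleHyperprior(nn.Module):
    def __init__(self, c=192):
        super().__init__()
        self.hyper_enc = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 5, stride=2, padding=2),
        )
        self.hyper_dec = nn.Sequential(
            nn.ConvTranspose2d(c, c, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1),
        )

    def forward(self, y):
        z = self.hyper_enc(torch.abs(y))
        z_noisy = z + torch.empty_like(z).uniform_(-0.5, 0.5)  # quantization proxy
        sigma = F.softplus(self.hyper_dec(z_noisy)) + 1e-6     # predicted Gaussian scales
        y_noisy = y + torch.empty_like(y).uniform_(-0.5, 0.5)
        # Probability of the (noisy) latent under N(0, sigma), integrated over
        # a unit-width quantization bin; -log2 of this estimates the bit cost.
        normal = torch.distributions.Normal(0.0, sigma)
        likelihood = normal.cdf(y_noisy + 0.5) - normal.cdf(y_noisy - 0.5)
        return y_noisy, likelihood.clamp_min(1e-9)
```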
no code implementations • 18 May 2017 • Michele Covell, Nick Johnston, David Minnen, Sung Jin Hwang, Joel Shor, Saurabh Singh, Damien Vincent, George Toderici
We introduce a multi-pass training method that combines the goals of high-quality reconstruction in areas around stop-code masking as well as in highly detailed areas.
no code implementations • CVPR 2018 • Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, Sung Jin Hwang, Joel Shor, George Toderici
We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000, and JPEG as measured by MS-SSIM.
7 code implementations • CVPR 2017 • George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, Michele Covell
As far as we know, this is the first neural network architecture able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset, with and without the aid of entropy coding.
1 code implementation • 19 Nov 2015 • George Toderici, Sean M. O'Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, Rahul Sukthankar
A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements.