no code implementations • 14 Apr 2025 • Yushu Wu, Yanyu Li, Ivan Skorokhodov, Anil Kag, Willi Menapace, Sharath Girish, Aliaksandr Siarohin, Yanzhi Wang, Sergey Tulyakov
Our AE achieves an ultra-high compression ratio and real-time decoding speed on mobile while outperforming prior art in terms of reconstruction metrics by a large margin.
no code implementations • 5 Feb 2025 • Yunuo Chen, Junli Cao, Anil Kag, Vidit Goel, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren
Furthermore, our model improves the overall quality of video generation by promoting the 3D consistency of moving objects and reducing abrupt changes in shape and motion.
no code implementations • 13 Dec 2024 • Yushu Wu, Zhixing Zhang, Yanyu Li, Yanwu Xu, Anil Kag, Yang Sui, Huseyin Coskun, Ke Ma, Aleksei Lebedev, Ju Hu, Dimitris Metaxas, Yanzhi Wang, Sergey Tulyakov, Jian Ren
We have witnessed the unprecedented success of diffusion-based video generation over the past year.
no code implementations • 12 Dec 2024 • Dongting Hu, Jierun Chen, Xijie Huang, Huseyin Coskun, Arpit Sahni, Aarush Gupta, Anujraaj Goyal, Dishani Lahiri, Rajesh Singh, Yerlan Idelbayev, Junli Cao, Yanyu Li, Kwang-Ting Cheng, S. -H. Gary Chan, Mingming Gong, Sergey Tulyakov, Anil Kag, Yanwu Xu, Jian Ren
For the first time, our model, SnapGen, demonstrates the generation of 1024x1024 px images on a mobile device in around 1.4 seconds.
Ranked #11 on Text-to-Image Generation on GenEval
no code implementations • 7 Nov 2024 • Anil Kag, Huseyin Coskun, Jierun Chen, Junli Cao, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov, Jian Ren
Neural network architecture design requires making many crucial decisions.
no code implementations • 23 Oct 2024 • Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata, Sergey Tulyakov, Jian Ren, Anil Kag
In this work, we investigate a scalable approach for collecting large-scale and fully synthetic datasets for DPO training.
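For context, below is a minimal sketch of the standard DPO objective that such synthetic preference pairs would feed into; the function and variable names are illustrative assumptions, not the paper's training recipe.

```python
# Standard DPO objective (Rafailov et al., 2023), sketched for illustration only.
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """logp_* : log-probabilities of the preferred (w) / rejected (l) samples
    under the trainable model; ref_logp_* : the same under a frozen reference."""
    # Implicit reward margins relative to the reference model.
    margin_w = logp_w - ref_logp_w
    margin_l = logp_l - ref_logp_l
    # The preferred sample should earn a larger implicit reward than the rejected one.
    return -F.logsigmoid(beta * (margin_w - margin_l)).mean()
```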
1 code implementation • 27 Jun 2024 • Junli Cao, Vidit Goel, Chaoyang Wang, Anil Kag, Ju Hu, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren
Our key observation is that nearby points in the scene can share similar representations.
1 code implementation • 6 Jun 2024 • Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren
Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content.
1 code implementation • 6 Jun 2024 • Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren
Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process.
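For reference, the iterative denoising process mentioned above can be sketched as generic DDIM-style sampling; the `model(x, t)` noise predictor, the `alphas_cumprod` schedule, and the shapes are assumptions, and this is the standard many-step sampler rather than the paper's method.

```python
# Generic iterative denoising (DDIM with eta = 0), for illustration only.
import torch

@torch.no_grad()
def sample(model, shape, alphas_cumprod, steps):
    x = torch.randn(shape)                                   # start from pure Gaussian noise
    for i in reversed(range(steps)):
        a_t = alphas_cumprod[i]
        a_prev = alphas_cumprod[i - 1] if i > 0 else torch.tensor(1.0)
        eps = model(x, i)                                    # predicted noise at step i
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()       # estimate of the clean sample
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps   # move one step toward less noise
    return x
```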
no code implementations • CVPR 2024 • Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren
Our findings reveal that, instead of replacing the CLIP text encoder used in Stable Diffusion with other large language models, we can enhance it through our proposed fine-tuning approach, TextCraftor, leading to substantial improvements in quantitative benchmarks and human assessments.
no code implementations • CVPR 2024 • Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov
Since video content is highly redundant, we argue that naively bringing advances from image models to the video generation domain reduces motion fidelity and visual quality, and impairs scalability.
Ranked #1 on Text-to-Video Generation on MSR-VTT
1 code implementation • International Conference on Learning Representations 2023 • Anil Kag, Igor Fedorov, Aditya Gangrade, Paul Whatmough, Venkatesh Saligrama
Training a hybrid learner is difficult since we lack annotations of hard edge-examples.
1 code implementation • International Conference on Learning Representations 2023 • Anil Kag, Durmus Alp Emre Acar, Aditya Gangrade, Venkatesh Saligrama
We propose a novel knowledge distillation (KD) method that selectively instills teacher knowledge into a student model, motivated by situations where the student's capacity is significantly smaller than the teacher's.
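The standard distillation objective that such selective KD methods build on can be sketched as follows; the selective weighting described in the paper is not reproduced here, and the hyperparameters are illustrative.

```python
# Classic knowledge distillation loss (Hinton et al.): soft teacher targets + hard labels.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-smoothed distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: the usual cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```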
1 code implementation • CVPR 2022 • Anil Kag, Venkatesh Saligrama
Convolutional neural networks (CNNs) rely on the depth of the architecture to obtain complex features.
1 code implementation • NeurIPS 2021 • Aditya Gangrade, Anil Kag, Ashok Cutkosky, Venkatesh Saligrama
For example, this may model an adaptive decision to invoke more resources on this instance.
1 code implementation • 29 Sep 2021 • Anil Kag, Igor Fedorov, Aditya Gangrade, Paul Whatmough, Venkatesh Saligrama
The first network is a low-capacity network that can be deployed on an edge device, whereas the second is a high-capacity network deployed in the cloud.
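A hedged sketch of this edge/cloud split, assuming a simple confidence-based router; the routing rule actually learned in the paper may differ.

```python
# Route a single example to the on-device model or defer to the cloud model.
import torch

def predict(x, edge_model, cloud_model, threshold=0.9):
    probs = torch.softmax(edge_model(x), dim=-1)   # x: one example (batch of one)
    conf, pred = probs.max(dim=-1)
    if conf.item() >= threshold:                   # easy example: answer on-device
        return pred
    return cloud_model(x).argmax(dim=-1)           # hard example: defer to the cloud
```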
1 code implementation • International Conference on Machine Learning 2021 • Anil Kag, Venkatesh Saligrama
BPTT updates RNN parameters on an instance by back-propagating the error in time over the entire sequence length, and, as a result, leads to poor trainability due to the well-known gradient explosion/decay phenomena.
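For concreteness, a minimal sketch of full-sequence BPTT on a vanilla RNN: the loss at the end back-propagates through all T recurrent steps, so the gradient involves products of T Jacobians and can decay or explode. Shapes, initialization, and names are illustrative.

```python
# Unroll a vanilla RNN over the whole sequence and back-propagate through time.
import torch

T, d_in, d_h = 100, 8, 16
Wx = torch.randn(d_in, d_h, requires_grad=True)
Wh = torch.randn(d_h, d_h).mul_(0.1).requires_grad_()

x = torch.randn(T, d_in)
h = torch.zeros(d_h)
for t in range(T):                       # forward pass unrolled over all T steps
    h = torch.tanh(x[t] @ Wx + h @ Wh)

loss = h.pow(2).sum()
loss.backward()                          # back-propagation through time
print(Wh.grad.norm())                    # often tiny (decay) or huge (explosion)
```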
1 code implementation • CVPR 2021 • Anil Kag, Venkatesh Saligrama
We propose a learning method that dynamically modifies the time-constants of the continuous-time counterpart of a vanilla RNN.
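A hedged sketch of a recurrent update with input-dependent time constants, in the spirit of the idea above; the exact parameterization here is an assumption, not the paper's.

```python
# Euler step of dh/dt = (target - h) / tau, with a learned, input-dependent step size.
import torch
import torch.nn as nn

class TimeAdaptiveCell(nn.Module):
    def __init__(self, d_in, d_h):
        super().__init__()
        self.inp = nn.Linear(d_in, d_h)
        self.rec = nn.Linear(d_h, d_h, bias=False)
        self.tau_gate = nn.Linear(d_in + d_h, d_h)   # predicts a per-unit step size

    def forward(self, x, h):
        # alpha in (0, 1) plays the role of dt / tau: how far each unit moves toward its target.
        alpha = torch.sigmoid(self.tau_gate(torch.cat([x, h], dim=-1)))
        target = torch.tanh(self.inp(x) + self.rec(h))
        return (1 - alpha) * h + alpha * target
```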
1 code implementation • 15 Oct 2020 • Aditya Gangrade, Anil Kag, Venkatesh Saligrama
We propose a novel method for selective classification (SC), a problem in which a classifier may abstain from predicting some instances, thus trading off accuracy against coverage (the fraction of instances predicted).
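A simple confidence-threshold baseline makes the accuracy-versus-coverage trade-off concrete; the paper's learned selector differs, and the threshold here is illustrative.

```python
# Abstain on low-confidence examples; report the fraction actually predicted.
import torch

def selective_predict(logits, threshold=0.8):
    probs = torch.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    accept = conf >= threshold            # abstain where confidence is below threshold
    coverage = accept.float().mean()      # fraction of instances actually predicted
    return pred, accept, coverage
```

Raising the threshold lowers coverage but typically raises accuracy on the instances that are actually predicted.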
1 code implementation • ICLR 2020 • Anil Kag, Ziming Zhang, Venkatesh Saligrama
Recurrent neural networks (RNNs) are particularly well-suited for modeling long-term dependencies in sequential data, but are notoriously hard to train because the error backpropagated in time either vanishes or explodes at an exponential rate.
no code implementations • 22 Aug 2019 • Anil Kag, Ziming Zhang, Venkatesh Saligrama
Recurrent neural networks (RNNs) are particularly well-suited for modeling long-term dependencies in sequential data, but are notoriously hard to train because the error backpropagated in time either vanishes or explodes at an exponential rate.
no code implementations • 2 Mar 2019 • Ziming Zhang, Anil Kag, Alan Sullivan, Venkatesh Saligrama
We show that such self-feedback helps stabilize the hidden state transitions leading to fast convergence during training while efficiently learning discriminative latent features that result in state-of-the-art results on several benchmark datasets at test-time.
no code implementations • NIPS Workshop CDNNRIA 2018 • Sivaramakrishnan Sankarapandian, Anil Kag, Rachel Manzelli, Brian Kulis
We describe a training strategy that grows the number of units during training, and show on several benchmark datasets that our model yields architectures that are smaller than those obtained when tuning the number of hidden units on a standard fixed architecture.
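An illustrative sketch of widening a layer during training by appending freshly initialized units while keeping the trained ones; this is only a generic growth step, not the paper's specific criterion for when or how many units to add.

```python
# Grow a linear layer by `extra` output units, preserving the already-trained weights.
import torch
import torch.nn as nn

def grow_linear(layer: nn.Linear, extra: int) -> nn.Linear:
    new = nn.Linear(layer.in_features, layer.out_features + extra)
    with torch.no_grad():
        new.weight[: layer.out_features] = layer.weight   # keep the trained units
        new.bias[: layer.out_features] = layer.bias
    return new
```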