Search Results for author: Sai Saketh Rambhatla

Found 15 papers, 3 papers with code

Diffusion Autoencoders are Scalable Image Tokenizers

no code implementations 30 Jan 2025 Yinbo Chen, Rohit Girdhar, Xiaolong Wang, Sai Saketh Rambhatla, Ishan Misra

Our key insight is that a single learning objective, diffusion L2 loss, can be used for training scalable image tokenizers.

Image Generation, Image Reconstruction

Movie Gen: A Cast of Media Foundation Models

2 code implementations 17 Oct 2024 Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, Matthew Yu, Mitesh Kumar Singh, Peizhao Zhang, Peter Vajda, Quentin Duval, Rohit Girdhar, Roshan Sumbaly, Sai Saketh Rambhatla, Sam Tsai, Samaneh Azadi, Samyak Datta, Sanyuan Chen, Sean Bell, Sharadh Ramaswamy, Shelly Sheynin, Siddharth Bhattacharya, Simran Motwani, Tao Xu, Tianhe Li, Tingbo Hou, Wei-Ning Hsu, Xi Yin, Xiaoliang Dai, Yaniv Taigman, Yaqiao Luo, Yen-Cheng Liu, Yi-Chiao Wu, Yue Zhao, Yuval Kirstain, Zecheng He, Zijian He, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu, Arun Mallya, Baishan Guo, Boris Araya, Breena Kerr, Carleigh Wood, Ce Liu, Cen Peng, Dimitry Vengertsev, Edgar Schonfeld, Elliot Blanchard, Felix Juefei-Xu, Fraylie Nord, Jeff Liang, John Hoffman, Jonas Kohler, Kaolin Fire, Karthik Sivakumar, Lawrence Chen, Licheng Yu, Luya Gao, Markos Georgopoulos, Rashel Moritz, Sara K. Sampson, Shikai Li, Simone Parmeggiani, Steve Fine, Tara Fowler, Vladan Petrovic, Yuming Du

Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization, video editing, video-to-audio generation, and text-to-audio generation.

Audio Generation, Video Editing

Trajectory-aligned Space-time Tokens for Few-shot Action Recognition

no code implementations 25 Jul 2024 Pulkit Kumar, Namitha Padmanabhan, Luke Luo, Sai Saketh Rambhatla, Abhinav Shrivastava

We propose a simple yet effective approach for few-shot action recognition, emphasizing the disentanglement of motion and appearance representations.

Disentanglement, Few-Shot action recognition

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

no code implementations 17 Nov 2023 Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra

We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image.

Text-to-Video Generation, Video Generation

SelfEval: Leveraging the discriminative nature of generative models for evaluation

no code implementations 17 Nov 2023 Sai Saketh Rambhatla, Ishan Misra

We present an automated way to evaluate the text alignment of text-to-image generative diffusion models using standard image-text recognition datasets.

Attribute, Visual Reasoning

MOST: Multiple Object localization with Self-supervised Transformers for object discovery

no code implementations ICCV 2023 Sai Saketh Rambhatla, Ishan Misra, Rama Chellappa, Abhinav Shrivastava

In this work, we present Multiple Object localization with Self-supervised Transformers (MOST) that uses features of transformers trained using self-supervised learning to localize multiple objects in real world images.

Object, object-detection

Self-Denoising Neural Networks for Few Shot Learning

no code implementations 26 Oct 2021 Steven Schwarcz, Sai Saketh Rambhatla, Rama Chellappa

This architecture, which we call a Self-Denoising Neural Network (SDNN), can be applied easily to most modern convolutional neural architectures, and can be used as a supplement to many existing few-shot learning techniques.

Action Detection, Denoising

To Boost or not to Boost: On the Limits of Boosted Neural Networks

no code implementations 28 Jul 2021 Sai Saketh Rambhatla, Michael Jones, Rama Chellappa

Boosting is a method for finding a highly accurate hypothesis by linearly combining many "weak" hypotheses, each of which may be only moderately accurate.

Object Recognition

The Pursuit of Knowledge: Discovering and Localizing Novel Categories using Dual Memory

no code implementations ICCV 2021 Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava

We tackle object category discovery, which is the problem of discovering and localizing novel objects in a large unlabeled dataset.

Object

Spatial Priming for Detecting Human-Object Interactions

no code implementations 9 Apr 2020 Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa

The proposed method consists of a layout module which primes a visual module to predict the type of interaction between a human and an object.

Human-Object Interaction Detection, Object

Detecting Human-Object Interactions via Functional Generalization

no code implementations 5 Apr 2019 Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa

We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner.

Human-Object Interaction Detection, Object
