1 code implementation • 22 Mar 2024 • Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
To address this, we propose LLM2LLM, a targeted and iterative data augmentation strategy that uses a teacher LLM to enhance a small seed dataset by generating additional synthetic data that can be used for fine-tuning on a specific task.
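The loop behind this strategy can be sketched in a few lines. Below is a minimal, illustrative version of a targeted iterative-augmentation loop; `finetune`, `is_correct`, and `generate_similar` are hypothetical user-supplied callables standing in for the real training, evaluation, and teacher-LLM machinery, not APIs from the paper's code release.

```python
def llm2llm(seed_data, student, finetune, is_correct, generate_similar, rounds=3):
    """Sketch of a targeted, iterative augmentation loop.

    finetune(student, data) -> student      trains on the current dataset
    is_correct(student, example) -> bool    evaluates one example
    generate_similar(example) -> example    teacher-LLM synthesis of a similar example
    All three are user-supplied callables; the names are illustrative.
    """
    data = list(seed_data)
    for _ in range(rounds):
        student = finetune(student, data)
        # Augmentation is targeted: only seed examples the student currently
        # gets wrong are handed to the teacher for synthesis.
        wrong = [ex for ex in seed_data if not is_correct(student, ex)]
        if not wrong:
            break
        data += [generate_similar(ex) for ex in wrong]
    return student, data
```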
1 code implementation • 4 Mar 2024 • Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam
There are many downstream applications in which global context matters as much as high frequency details, such as in real-world satellite imagery; in such cases researchers have to make the uncomfortable choice of which information to discard.
1 code implementation • CVPR 2024 • Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Mohamed El Amine Seddik, Karttikeya Mangalam, Noel E. O'Connor
Even in the absence of the statistical similarity present in aligned encoders like CLIP, we show that a matching between unaligned encoders can be found without any training.
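A common tool for this kind of training-free comparison is centered kernel alignment (CKA). The sketch below computes linear CKA between feature matrices from two encoders; treating CKA as the similarity measure here is our illustrative assumption rather than a detail stated in the entry above.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two sets of features for the same n inputs.

    X: (n, d1) features from one encoder, Y: (n, d2) from another.
    Returns a scalar in [0, 1]; higher means more similar geometry.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # HSIC-style numerator and normalizers via Frobenius norms of cross-covariances.
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```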
1 code implementation • 8 Jan 2024 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem
We apply a separate coefficient to each of the two types of residual connections and introduce a dynamic training strategy that seamlessly transitions the pretrained model into a reversible network with much higher numerical precision.
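A minimal sketch of what per-connection coefficients could look like, assuming a standard transformer block; the exact placement of the coefficients and the annealing schedule in the paper may differ.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Transformer-style block with one coefficient per residual connection.

    Illustrative sketch: alpha scales the attention-side skip and beta the
    MLP-side skip; scheduling them during training is one way to move a
    pretrained (vanilla) block toward a reversible-friendly form.
    """
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.alpha = nn.Parameter(torch.ones(1))  # attention-residual coefficient
        self.beta = nn.Parameter(torch.ones(1))   # MLP-residual coefficient

    def forward(self, x):                         # x: (batch, seq, dim)
        x = self.alpha * x + self.attn(x, x, x, need_weights=False)[0]
        x = self.beta * x + self.mlp(x)
        return x
```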
no code implementations • 11 Dec 2023 • Neerja Thakkar, Karttikeya Mangalam, Andrea Bajcsy, Jitendra Malik
We formalize the problem of scene-specific adaptive trajectory prediction and propose a new adaptation approach, inspired by prompt tuning, called latent corridors.
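Latent corridors build on the prompt-tuning pattern: freeze a pretrained predictor and learn only a small number of extra input tokens per scene. The sketch below shows that general pattern, not the paper's specific parameterization.

```python
import torch
import torch.nn as nn

class PromptTunedPredictor(nn.Module):
    """Prompt-tuning pattern: freeze a pretrained model and train only a few
    scene-specific tokens prepended to its input. The latent-corridor
    parameterization in the paper differs; this shows only the general recipe.
    """
    def __init__(self, frozen_model, dim, num_prompts=4):
        super().__init__()
        self.model = frozen_model
        for p in self.model.parameters():
            p.requires_grad = False          # only the prompts are trained
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, tokens):               # tokens: (batch, seq, dim)
        b = tokens.shape[0]
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        return self.model(torch.cat([prompts, tokens], dim=1))
```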
1 code implementation • CVPR 2024 • Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A. Efros
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
1 code implementation • NeurIPS 2023 • Karttikeya Mangalam, Raiymbek Akshulakov, Jitendra Malik
We introduce EgoSchema, a very long-form video question-answering dataset and benchmark for evaluating the long-form video understanding capabilities of modern vision and language systems.
1 code implementation • 15 Jun 2023 • Tyler Zhu, Karttikeya Mangalam
We present PaReprop, a fast Parallelized Reversible Backpropagation algorithm that parallelizes the additional activation re-computation overhead of reversible training with the gradient computation itself in the backpropagation phase.
no code implementations • ICCV 2023 • Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer
There has been a longstanding belief that generation can facilitate a true understanding of visual data.
1 code implementation • NeurIPS 2023 • Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.
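A simplified sketch of the big-little idea, assuming HuggingFace-style causal LMs that expose `.logits`: the small model drafts tokens while it is confident, and control falls back to the big model otherwise. The paper's full method also includes a rollback policy for rejected draft tokens, omitted here.

```python
import torch

@torch.no_grad()
def bild_generate(small, big, ids, max_new=64, fallback_p=0.9):
    """Simplified big-little decoding sketch for batch size 1.

    ids: (1, seq) token ids. When the small model's top-token probability
    drops below `fallback_p`, the big model predicts that step instead.
    """
    for _ in range(max_new):
        probs = small(ids).logits[:, -1].softmax(-1)
        p, tok = probs.max(-1)
        if p.item() < fallback_p:                    # fallback to the big model
            tok = big(ids).logits[:, -1].argmax(-1)
        ids = torch.cat([ids, tok.view(1, 1)], dim=1)
    return ids
```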
4 code implementations • CVPR 2022 • Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik
Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters, and accuracy, demonstrating their promise as an efficient backbone for hardware-resource-limited training regimes.
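The memory savings come from reversibility: each block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be stored for backpropagation. A minimal two-stream reversible block, with F and G standing in for the attention and MLP sub-blocks:

```python
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Two-stream reversible block: inputs are exactly reconstructable from
    outputs, so activations can be recomputed instead of cached."""
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)     # recover x2 first,
        x1 = y1 - self.f(x2)     # then x1 (no stored activations needed)
        return x1, x2
```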
no code implementations • CVPR 2023 • Harshayu Girase, Nakul Agarwal, Chiho Choi, Karttikeya Mangalam
We present RAFTformer, a real-time action forecasting transformer for latency aware real-world action forecasting applications.
no code implementations • 20 Dec 2022 • Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein
Are multimodal inputs necessary for grammar induction?
1 code implementation • 25 Nov 2022 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem
Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.
no code implementations • 15 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson
First, as both images and videos contain structured information, we enrich a transformer model with a set of "object tokens" that can be used across images and videos.
Point-of-no-return (PNR) Temporal Localization
no code implementations • 13 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson
We explore a particular instantiation of scene structure, namely a "Hand-Object Graph", consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges.
4 code implementations • 2 Jun 2022 • Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.
Ranked #38 on Speech Recognition on LibriSpeech test-clean
Automatic Speech Recognition (ASR)
1 code implementation • CVPR 2022 • Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer
Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.
Ranked #6 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)
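A minimal sketch of the caching pattern described in this entry, assuming clip-level token tensors; keys and values from past clips are detached so no gradients flow into history. Details such as the paper's memory compression are omitted.

```python
import torch
import torch.nn as nn

class CachedMemoryAttention(nn.Module):
    """Online-video attention sketch: tokens from earlier clips are cached
    (detached) and attended to alongside the current clip."""
    def __init__(self, dim, heads=8, max_mem=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.max_mem = max_mem
        self.memory = []                     # list of (B, N, D) tensors

    def forward(self, x):                    # x: (B, N, D) tokens of current clip
        context = torch.cat(self.memory + [x], dim=1)
        out, _ = self.attn(x, context, context)
        # Cache the current clip without gradients; keep a bounded window.
        self.memory = (self.memory + [x.detach()])[-self.max_mem:]
        return out
```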
1 code implementation • 29 Dec 2021 • Karttikeya Mangalam, Rohin Garg
Generative Adversarial Networks (GANs) are a class of generative models used for various applications, but they have been known to suffer from the mode collapse problem, in which some modes of the target distribution are ignored by the generator.
8 code implementations • CVPR 2022 • Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer
In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.
Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)
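The core operation in the multiscale design is pooling attention, where keys and values are pooled to a shorter sequence before attention so cost scales with the pooled length rather than the full token count. A stripped-down sketch (per-stage strides, query pooling, and relative-position terms from the paper are omitted):

```python
import torch.nn as nn
import torch.nn.functional as F

class PoolingAttention(nn.Module):
    """Sketch of pooling attention: keys/values are average-pooled along the
    sequence dimension before attention; queries stay at full resolution."""
    def __init__(self, dim, heads=8, stride=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.stride = stride

    def forward(self, x):                      # x: (B, N, D)
        kv = F.avg_pool1d(x.transpose(1, 2), self.stride).transpose(1, 2)
        out, _ = self.attn(x, kv, kv)
        return out + x
```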
1 code implementation • CVPR 2022 • Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson
In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.
Ranked #7 on Action Recognition on Diving-48
8 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
no code implementations • ICCV 2021 • Harshayu Girase, Haiming Gang, Srikanth Malla, Jiachen Li, Akira Kanehara, Karttikeya Mangalam, Chiho Choi
We also propose a model that jointly performs trajectory and intention prediction, showing that recurrently reasoning about intention can assist with trajectory prediction.
8 code implementations • ICCV 2021 • Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer
We evaluate this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters.
Ranked #14 on Action Classification on Charades
2 code implementations • ICCV 2021 • Karttikeya Mangalam, Yang An, Harshayu Girase, Jitendra Malik
Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long-term goals, and (b) sources that are unknown to both the agent and the model, such as the intent of other agents and irreducible randomness in decisions.
Ranked #2 on Trajectory Prediction on ETH/UCY
1 code implementation • ECCV 2020 • Zhe Cao, Hang Gao, Karttikeya Mangalam, Qi-Zhi Cai, Minh Vo, Jitendra Malik
Human movement is goal-directed and influenced by the spatial layout of the objects in the scene.
4 code implementations • ECCV 2020 • Karttikeya Mangalam, Harshayu Girase, Shreyas Agarwal, Kuan-Hui Lee, Ehsan Adeli, Jitendra Malik, Adrien Gaidon
In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction.
Ranked #1 on Multi Future Trajectory Prediction on ETH/UCY
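A hedged sketch of endpoint conditioning: sample candidate endpoints first, then decode each full trajectory conditioned on its endpoint. PECNet uses a CVAE for the endpoint distribution; plain Gaussian latent sampling and linear heads are substituted here for brevity.

```python
import torch
import torch.nn as nn

class EndpointConditionedPredictor(nn.Module):
    """Sketch: sample plausible endpoints, then decode trajectories
    conditioned on them. Heads are illustrative stand-ins."""
    def __init__(self, hist_dim, latent_dim=16, horizon=12):
        super().__init__()
        self.endpoint_head = nn.Linear(hist_dim + latent_dim, 2)
        self.traj_head = nn.Linear(hist_dim + 2, horizon * 2)
        self.latent_dim, self.horizon = latent_dim, horizon

    def forward(self, hist_feat, num_samples=20):   # hist_feat: (B, hist_dim)
        b = hist_feat.shape[0]
        z = torch.randn(b, num_samples, self.latent_dim)
        h = hist_feat.unsqueeze(1).expand(-1, num_samples, -1)
        endpoints = self.endpoint_head(torch.cat([h, z], -1))   # (B, S, 2)
        trajs = self.traj_head(torch.cat([h, endpoints], -1))
        return endpoints, trajs.view(b, num_samples, self.horizon, 2)
```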
no code implementations • 4 Nov 2019 • Karttikeya Mangalam, Ehsan Adeli, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles
In contrast to previous work that addresses either pose prediction or trajectory forecasting in isolation, we propose a framework that unifies the two problems and addresses the practically useful task of pedestrian locomotion prediction in the wild.
1 code implementation • ICML 2019 Deep Phenomena Workshop • Karttikeya Mangalam, Vinay Uday Prabhu
In this paper, we empirically investigate the training journey of deep neural networks relative to fully trained shallow machine learning models.
no code implementations • 1 Dec 2018 • Karttikeya Mangalam, Mathieu Salzmann
We study the use of knowledge distillation to compress the U-net architecture.
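For reference, the standard distillation objective in a segmentation setting looks like the following; the hyperparameters and the exact loss used for U-Net compression in the paper are assumptions here.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Standard knowledge-distillation objective for dense prediction:
    per-pixel cross-entropy on labels plus KL between temperature-softened
    teacher and student distributions.

    logits: (B, C, H, W), targets: (B, H, W) integer class maps.
    """
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * T * T   # standard T^2 scaling keeps gradient magnitudes comparable
    return alpha * hard + (1 - alpha) * soft
```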
no code implementations • 12 Dec 2017 • Karttikeya Mangalam, Tanaya Guha
We investigate the effect and usefulness of spontaneity (i.e., whether given speech is spontaneous or not) in the context of speech emotion recognition.
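One plausible way to exploit spontaneity, sketched below with SVMs on precomputed acoustic features: first detect whether an utterance is spontaneous, then route it to an emotion classifier trained on that speech type. The paper's actual models and features may differ.

```python
import numpy as np
from sklearn.svm import SVC

class SpontaneityAwareEmotionRecognizer:
    """Illustrative two-stage pipeline: spontaneity detection, then a
    per-speech-type emotion classifier. X: (n, d) acoustic features,
    spontaneous: (n,) bools, emotions: (n,) labels."""
    def __init__(self):
        self.spont_clf = SVC()
        self.emotion_clf = {True: SVC(), False: SVC()}

    def fit(self, X, spontaneous, emotions):
        X, spontaneous, emotions = map(np.asarray, (X, spontaneous, emotions))
        self.spont_clf.fit(X, spontaneous)
        for flag, clf in self.emotion_clf.items():
            mask = spontaneous == flag
            clf.fit(X[mask], emotions[mask])
        return self

    def predict(self, x):
        x = np.asarray(x).reshape(1, -1)
        flag = bool(self.spont_clf.predict(x)[0])
        return self.emotion_clf[flag].predict(x)[0]
```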
1 code implementation • CVPR 2018 • Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato
We present a new task that predicts future locations of people observed in first-person videos.
no code implementations • 19 May 2017 • Karttikeya Mangalam, K. S. Venkatesh
The results indicate several interesting invariances in the application of the CA, such as invariance to the particular noise realization and to the choice of sub-sampling of pixels used to determine recombination weights.