Search Results for author: Karttikeya Mangalam

Found 35 papers, 21 papers with code

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

1 code implementation • 22 Mar 2024 • Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

LLM2LLM (1) fine-tunes a baseline student LLM on the initial seed data, (2) evaluates and extracts data points that the model gets wrong, and (3) uses a teacher LLM to generate synthetic data based on these incorrect data points, which are then added back into the training data.

Data Augmentation GSM8K +1


xT: Nested Tokenization for Larger Context in Large Images

1 code implementation • 4 Mar 2024 • Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam

Modern computer vision pipelines handle large images in one of two sub-optimal ways: down-sampling or cropping.


Do Vision and Language Encoders Represent the World Similarly?

1 code implementation • 10 Jan 2024 • Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Mohamed El Amine Seddik, Karttikeya Mangalam, Noel E. O'Connor

In the absence of statistical similarity in aligned encoders like CLIP, we show that a possible matching of unaligned encoders exists without any training.

Graph Matching Image Classification +3


Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

1 code implementation • 8 Jan 2024 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem

We use two coefficients on either type of residual connections respectively, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision.

object-detection Small Object Detection +1


Adaptive Human Trajectory Prediction via Latent Corridors

no code implementations • 11 Dec 2023 • Neerja Thakkar, Karttikeya Mangalam, Andrea Bajcsy, Jitendra Malik

We formalize the problem of scene-specific adaptive trajectory prediction and propose a new adaptation approach inspired by prompt tuning called latent corridors.

Trajectory Prediction Zero-shot Generalization


Sequential Modeling Enables Scalable Learning for Large Vision Models

1 code implementation • 1 Dec 2023 • Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.



EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

1 code implementation • NeurIPS 2023 • Karttikeya Mangalam, Raiymbek Akshulakov, Jitendra Malik

We introduce EgoSchema, a very long-form video question-answering dataset and benchmark to evaluate the long video understanding capabilities of modern vision and language systems.

Multiple-choice Question Answering +2


PaReprop: Fast Parallelized Reversible Backpropagation

no code implementations • 15 Jun 2023 • Tyler Zhu, Karttikeya Mangalam

We present PaReprop, a fast Parallelized Reversible Backpropagation algorithm that parallelizes the additional activation re-computation overhead in reversible training with the gradient computation itself in the backpropagation phase.

Benchmarking


Diffusion Models as Masked Autoencoders

no code implementations • ICCV 2023 • Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer

There has been a longstanding belief that generation can facilitate a true understanding of visual data.

Denoising Image Inpainting


Speculative Decoding with Big Little Decoder

1 code implementation • NeurIPS 2023 • Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer

To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.

Machine Translation Text Generation


Reversible Vision Transformers

4 code implementations • CVPR 2022 • Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik

Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware resource limited training regimes.

Image Classification object-detection +2



Latency Matters: Real-Time Action Forecasting Transformer

no code implementations • CVPR 2023 • Harshayu Girase, Nakul Agarwal, Chiho Choi, Karttikeya Mangalam

We present RAFTformer, a real-time action forecasting transformer for latency-aware real-world action forecasting applications.


Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

no code implementations • CVPR 2023 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.

Temporal Action Localization


Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction

no code implementations • 20 Dec 2022 • Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein

Are multimodal inputs necessary for grammar induction?

Constituency Parsing


Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

1 code implementation • 25 Nov 2022 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.

Temporal Action Localization


Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

no code implementations • 15 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos.

Point-of-no-return (PNR) temporal localization Temporal Localization


Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

no code implementations • 13 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

We explore a particular instantiation of scene structure, namely a Hand-Object Graph, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges.

Action Recognition Video Understanding


Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

4 code implementations • 2 Jun 2022 • Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.

Ranked #30 on Speech Recognition on LibriSpeech test-clean

Automatic Speech Recognition Automatic Speech Recognition (ASR)



MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

1 code implementation • CVPR 2022 • Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.

Ranked #3 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)

Action Anticipation Action Classification +2



Overcoming Mode Collapse with Adaptive Multi Adversarial Training

1 code implementation • 29 Dec 2021 • Karttikeya Mangalam, Rohin Garg

Generative Adversarial Networks (GANs) are a class of generative models used for various applications, but they have been known to suffer from the mode collapse problem, in which some modes of the target distribution are ignored by the generator.

Continual Learning


MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

7 code implementations • CVPR 2022 • Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.

Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)

Action Classification Action Recognition +6



Object-Region Video Transformers

1 code implementation • CVPR 2022 • Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.

Ranked #6 on Action Recognition on Diving-48

Action Detection Few-Shot action recognition +3


Ego4D: Around the World in 3,000 Hours of Egocentric Video

6 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics



LOKI: Long Term and Key Intentions for Trajectory Prediction

no code implementations • ICCV 2021 • Harshayu Girase, Haiming Gang, Srikanth Malla, Jiachen Li, Akira Kanehara, Karttikeya Mangalam, Chiho Choi

We also propose a model that jointly performs trajectory and intention prediction, showing that recurrently reasoning about intention can assist with trajectory prediction.

Autonomous Driving Trajectory Prediction


Multiscale Vision Transformers

7 code implementations • ICCV 2021 • Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

Ranked #14 on Action Classification on Charades

Action Classification Action Recognition +2



Mitigating Mode Collapse by Sidestepping Catastrophic Forgetting

no code implementations • 1 Jan 2021 • Karttikeya Mangalam, Rohin Garg, Jathushan Rajasegaran, Taesung Park

Continual Learning


From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

2 code implementations • ICCV 2021 • Karttikeya Mangalam, Yang An, Harshayu Girase, Jitendra Malik

Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long term goals and (b) sources that are unknown to both the agent & the model, such as intent of other agents & irreducible randomness in decisions.

Ranked #3 on Trajectory Prediction on ETH/UCY

Trajectory Forecasting



Long-term Human Motion Prediction with Scene Context

1 code implementation • ECCV 2020 • Zhe Cao, Hang Gao, Karttikeya Mangalam, Qi-Zhi Cai, Minh Vo, Jitendra Malik

Human movement is goal-directed and influenced by the spatial layout of the objects in the scene.

Human motion prediction motion prediction +1



It Is Not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction

3 code implementations • ECCV 2020 • Karttikeya Mangalam, Harshayu Girase, Shreyas Agarwal, Kuan-Hui Lee, Ehsan Adeli, Jitendra Malik, Adrien Gaidon

In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction.

Ranked #1 on Multi Future Trajectory Prediction on ETH/UCY

Autonomous Navigation Multi-future Trajectory Prediction +3



Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision

no code implementations • 4 Nov 2019 • Karttikeya Mangalam, Ehsan Adeli, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles

In contrast to the previous work that aims to solve either the task of pose prediction or trajectory forecasting in isolation, we propose a framework to unify the two problems and address the practically useful task of pedestrian locomotion prediction in the wild.

Human Dynamics Pose Prediction +1


Do deep neural networks learn shallow learnable examples first?

1 code implementation • ICML Workshop Deep_Phenomen 2019 • Karttikeya Mangalam, Vinay Uday Prabhu

In this paper, we empirically investigate the training journey of deep neural networks relative to fully trained shallow machine learning models.


On Compressing U-net Using Knowledge Distillation

no code implementations • 1 Dec 2018 • Karttikeya Mangalam, Mathieu Salzmann

We study the use of knowledge distillation to compress the U-net architecture.

Knowledge Distillation


Learning Spontaneity to Improve Emotion Recognition In Speech

no code implementations • 12 Dec 2017 • Karttikeya Mangalam, Tanaya Guha

We investigate the effect and usefulness of spontaneity (i.e. whether a given speech is spontaneous or not) in speech in the context of emotion recognition.

Speech Emotion Recognition


Future Person Localization in First-Person Videos

1 code implementation • CVPR 2018 • Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato

We present a new task that predicts future locations of people observed in first-person videos.


Bitwise Operations of Cellular Automaton on Gray-scale Images

no code implementations • 19 May 2017 • Karttikeya Mangalam, K. S. Venkatesh

The results indicate several interesting invariances in the application of the CA, such as the particular noise realization and the choice of sub-sampling of pixels to determine recombination weights.

Image Denoising

