Search Results for author: Karttikeya Mangalam

Found 29 papers, 16 papers with code

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

1 code implementation 17 Aug 2023 Karttikeya Mangalam, Raiymbek Akshulakov, Jitendra Malik

We introduce EgoSchema, a very long-form video question-answering dataset and benchmark for evaluating the long video understanding capabilities of modern vision and language systems.

Multiple-choice Question Answering +2

PaReprop: Fast Parallelized Reversible Backpropagation

no code implementations 15 Jun 2023 Tyler Zhu, Karttikeya Mangalam

We present PaReprop, a fast Parallelized Reversible Backpropagation algorithm that parallelizes the additional activation re-computation overhead in reversible training with the gradient computation itself in the backpropagation phase.


Speculative Decoding with Big Little Decoder

1 code implementation 15 Feb 2023 Sehoon Kim, Karttikeya Mangalam, Suhong Moon, John Canny, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer

We propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.

Language Modelling Machine Translation +1

Reversible Vision Transformers

4 code implementations CVPR 2022 Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik

Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware resource limited training regimes.

Image Classification object-detection +2
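
The memory saving described in the entry above rests on reversibility: a layer's inputs can be recomputed from its outputs, so intermediate activations need not be stored for backpropagation. The following is a minimal sketch of a generic two-stream reversible coupling, not the paper's implementation. The functions `f` and `g` are hypothetical stand-ins for a block's sub-layers, and plain Python lists stand in for tensors.

```python
import math


def f(x):
    # Hypothetical stand-in for the first sub-layer of a block.
    return [math.tanh(v) for v in x]


def g(x):
    # Hypothetical stand-in for the second sub-layer of a block.
    return [0.5 * v * v for v in x]


def forward(x1, x2):
    # Two-stream coupling: each stream is updated using only the other one.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def inverse(y1, y2):
    # Undo the updates in reverse order to recover the inputs from the outputs.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2
```

Because `inverse` recovers the inputs up to floating-point rounding, a training loop built on such blocks could in principle discard the inputs after the forward pass and rebuild them during the backward pass, trading extra computation for memory.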

Latency Matters: Real-Time Action Forecasting Transformer

no code implementations CVPR 2023 Harshayu Girase, Nakul Agarwal, Chiho Choi, Karttikeya Mangalam

We present RAFTformer, a real-time action forecasting transformer for latency-aware real-world action forecasting applications.

Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

1 code implementation 25 Nov 2022 Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.

Temporal Action Localization

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

no code implementations 15 Jun 2022 Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos.

Point-of-no-return (PNR) temporal localization Temporal Localization

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

no code implementations 13 Jun 2022 Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

We explore a particular instantiation of scene structure, namely a Hand-Object Graph, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges.

Action Recognition Video Understanding

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

4 code implementations 2 Jun 2022 Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.

Automatic Speech Recognition Automatic Speech Recognition (ASR)

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

1 code implementation CVPR 2022 Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.

Ranked #3 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)

Action Anticipation Action Classification +2

Overcoming Mode Collapse with Adaptive Multi Adversarial Training

1 code implementation 29 Dec 2021 Karttikeya Mangalam, Rohin Garg

Generative Adversarial Networks (GANs) are a class of generative models used for various applications, but they have been known to suffer from the mode collapse problem, in which some modes of the target distribution are ignored by the generator.

Continual Learning

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

6 code implementations CVPR 2022 Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.

Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)

Action Classification Action Recognition +5

Object-Region Video Transformers

1 code implementation CVPR 2022 Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations.

Action Detection Few-Shot action recognition +2

Ego4D: Around the World in 3,000 Hours of Egocentric Video

3 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

LOKI: Long Term and Key Intentions for Trajectory Prediction

no code implementations ICCV 2021 Harshayu Girase, Haiming Gang, Srikanth Malla, Jiachen Li, Akira Kanehara, Karttikeya Mangalam, Chiho Choi

We also propose a model that jointly performs trajectory and intention prediction, showing that recurrently reasoning about intention can assist with trajectory prediction.

Autonomous Driving Trajectory Prediction

Multiscale Vision Transformers

7 code implementations ICCV 2021 Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters.

Action Classification Action Recognition +2

Mitigating Mode Collapse by Sidestepping Catastrophic Forgetting

no code implementations 1 Jan 2021 Karttikeya Mangalam, Rohin Garg, Jathushan Rajasegaran, Taesung Park

Generative Adversarial Networks (GANs) are a class of generative models used for various applications, but they have been known to suffer from the mode collapse problem, in which some modes of the target distribution are ignored by the generator.

Continual Learning

From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

2 code implementations ICCV 2021 Karttikeya Mangalam, Yang An, Harshayu Girase, Jitendra Malik

Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long-term goals, and (b) sources that are unknown to both the agent & the model, such as intent of other agents & irreducible randomness in decisions.

Trajectory Forecasting

Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision

no code implementations 4 Nov 2019 Karttikeya Mangalam, Ehsan Adeli, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles

In contrast to previous work that aims to solve either pose prediction or trajectory forecasting in isolation, we propose a framework to unify the two problems and address the practically useful task of pedestrian locomotion prediction in the wild.

Human Dynamics Pose Prediction +1

Do deep neural networks learn shallow learnable examples first?

1 code implementation ICML Workshop Deep_Phenomen 2019 Karttikeya Mangalam, Vinay Uday Prabhu

In this paper, we empirically investigate the training journey of deep neural networks relative to fully trained shallow machine learning models.

On Compressing U-net Using Knowledge Distillation

no code implementations 1 Dec 2018 Karttikeya Mangalam, Mathieu Salzamann

We study the use of knowledge distillation to compress the U-net architecture.

Knowledge Distillation

Learning Spontaneity to Improve Emotion Recognition In Speech

no code implementations 12 Dec 2017 Karttikeya Mangalam, Tanaya Guha

We investigate the effect and usefulness of spontaneity in speech (i.e., whether a given speech is spontaneous or not) in the context of emotion recognition.

Speech Emotion Recognition

Future Person Localization in First-Person Videos

1 code implementation CVPR 2018 Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato

We present a new task that predicts future locations of people observed in first-person videos.

Bitwise Operations of Cellular Automaton on Gray-scale Images

no code implementations 19 May 2017 Karttikeya Mangalam, K. S. Venkatesh

The results indicate several interesting invariances in the application of the cellular automaton (CA), such as invariance to the particular noise realization and to the choice of sub-sampling of pixels used to determine recombination weights.

Image Denoising
