Search Results for author: Abhinav Shrivastava

Found 100 papers, 46 papers with code

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

1 code implementation • 8 Apr 2024 • Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim

However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding.

Ranked #1 on Video Classification on COIN

Question Answering Video Captioning +4

Measuring Style Similarity in Diffusion Models

1 code implementation • 1 Apr 2024 • Gowthami Somepalli, Anubhav Gupta, Kamal Gupta, Shramay Palta, Micah Goldblum, Jonas Geiping, Abhinav Shrivastava, Tom Goldstein

We also propose a method to extract style descriptors that can be used to attribute style of a generated image to the images used in the training dataset of a text-to-image model.

Attribute

What is Point Supervision Worth in Video Instance Segmentation?

no code implementations • 1 Apr 2024 • Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar

Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos.

Instance Segmentation Object +2

LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors

no code implementations • 21 Mar 2024 • Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava

We present a simple self-supervised method to enhance the performance of ViT features for dense downstream tasks.

Object Discovery

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

no code implementations • 22 Feb 2024 • Yixuan Ren, Yang Zhou, Jimei Yang, Jing Shi, Difan Liu, Feng Liu, Mingi Kwon, Abhinav Shrivastava

With the emergence of text-to-video (T2V) diffusion models, its temporal counterpart, motion customization, has not yet been well investigated.

Video Generation

Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions

no code implementations • 18 Jan 2024 • Namitha Padmanabhan, Matthew Gwilliam, Pulkit Kumar, Shishira R Maiya, Max Ehrlich, Abhinav Shrivastava

We call the aggregate of these contribution maps the Implicit Neural Canvas and we use this concept to demonstrate that the INRs which we study learn to "see" the frames they represent in surprising ways.

Novel View Synthesis Video Compression

Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements

no code implementations • NeurIPS 2023 • Gaurav Shrivastava, Ser-Nam Lim, Abhinav Shrivastava

In this paper, we present a novel robust framework for low-level vision tasks, including denoising, object removal, frame interpolation, and super-resolution, that does not require any external training data corpus.

Denoising Super-Resolution

EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS

no code implementations • 7 Dec 2023 • Sharath Girish, Kamal Gupta, Abhinav Shrivastava

We validate the effectiveness of our approach on a variety of datasets and scenes preserving the visual quality while consuming 10-20x less memory and faster training/inference speed.

Gen2Det: Generate to Detect

no code implementations • 7 Dec 2023 • Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava

In the long-tailed detection setting on LVIS, Gen2Det improves the performance on rare categories by a large margin while also significantly improving the performance on other categories, e.g., we see an improvement of 2.13 Box AP and 1.84 Mask AP over just training on real data on LVIS with Mask R-CNN.

Image Generation Object +2

Multimodality-guided Image Style Transfer using Cross-modal GAN Inversion

no code implementations • 4 Dec 2023 • Hanyu Wang, Pengxiang Wu, Kevin Dela Rosa, Chen Wang, Abhinav Shrivastava

Compared to IIST, such approaches provide more flexibility with text-specified styles, which are useful in scenarios where the style is hard to define with reference images.

Style Transfer

A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval

no code implementations • 30 Nov 2023 • Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran

To provide a more thorough evaluation of the capabilities of long video retrieval systems, we propose a pipeline that leverages state-of-the-art large language models to carefully generate a diverse set of synthetic captions for long videos.

Benchmarking Retrieval +2

Do text-free diffusion models learn discriminative visual representations?

1 code implementation • 29 Nov 2023 • Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Tianyi Zhou, Abhinav Shrivastava

We find that the intermediate feature maps of the U-Net are diverse, discriminative feature representations.

Image Classification object-detection +3

Multi-entity Video Transformers for Fine-Grained Video Representation Learning

1 code implementation • 17 Nov 2023 • Matthew Walmer, Rose Kanjirathinkal, Kai Sheng Tai, Keyur Muzumdar, Taipeng Tian, Abhinav Shrivastava

In this work, we advance the state-of-the-art for this area by re-examining the design of transformer architectures for video representation learning.

Representation Learning

SHACIRA: Scalable HAsh-grid Compression for Implicit Neural Representations

no code implementations • ICCV 2023 • Sharath Girish, Abhinav Shrivastava, Kamal Gupta

Implicit Neural Representations (INR) or neural fields have emerged as a popular framework to encode multimedia signals such as images and radiance fields while retaining high-quality.

Quantization

Chop & Learn: Recognizing and Generating Object-State Compositions

no code implementations • ICCV 2023 • Nirat Saini, Hanyu Wang, Archana Swaminathan, Vinoj Jayasundara, Bo He, Kamal Gupta, Abhinav Shrivastava

Recognizing and generating object-state compositions has been a challenging task, especially when generalizing to unseen compositions.

Action Recognition Image Generation +1

Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization

1 code implementation • 18 Aug 2023 • Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, Abhinav Shrivastava

We show results on both reconstruction (same audio-video inputs) as well as cross (different audio-video inputs) settings on Voxceleb2 and LRW datasets.

Diffusion Models Beat GANs on Image Classification

1 code implementation • 17 Jul 2023 • Soumik Mukhopadhyay, Matthew Gwilliam, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Srinidhi Hegde, Tianyi Zhou, Abhinav Shrivastava

We explore optimal methods for extracting and using these embeddings for classification tasks, demonstrating promising results on the ImageNet classification task.

Classification Denoising +5

SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network

1 code implementation • CVPR 2023 • Chuong Huynh, Yuqian Zhou, Zhe Lin, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Abhinav Shrivastava

In photo editing, it is common practice to remove visual distractions to improve the overall image quality and highlight the primary subject.

Panoptic Segmentation Segmentation

MOST: Multiple Object localization with Self-supervised Transformers for object discovery

no code implementations • ICCV 2023 • Sai Saketh Rambhatla, Ishan Misra, Rama Chellappa, Abhinav Shrivastava

In this work, we present Multiple Object localization with Self-supervised Transformers (MOST) that uses features of transformers trained using self-supervised learning to localize multiple objects in real world images.

Object object-detection +6

HNeRV: A Hybrid Neural Representation for Videos

1 code implementation • CVPR 2023 • Hao Chen, Matt Gwilliam, Ser-Nam Lim, Abhinav Shrivastava

Such embedding largely limits the regression capacity and internal generalization for video interpolation.

Ranked #3 on Video Reconstruction on UVG

Denoising regression +3

ASIC: Aligning Sparse in-the-wild Image Collections

no code implementations • ICCV 2023 • Kamal Gupta, Varun Jampani, Carlos Esteves, Abhinav Shrivastava, Ameesh Makadia, Noah Snavely, Abhishek Kar

We present a self-supervised technique that directly optimizes on a sparse collection of images of a particular object/object category to obtain consistent dense correspondences across the collection.

Object

FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views

no code implementations • CVPR 2023 • Vinoj Jayasundara, Amit Agrawal, Nicolas Heron, Abhinav Shrivastava, Larry S. Davis

We present FlexNeRF, a method for photorealistic free-viewpoint rendering of humans in motion from monocular videos.

Towards Scalable Neural Representation for Diverse Videos

no code implementations • CVPR 2023 • Bo He, Xitong Yang, Hanyu Wang, Zuxuan Wu, Hao Chen, Shuaiyi Huang, Yixuan Ren, Ser-Nam Lim, Abhinav Shrivastava

Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images, and have been recently applied to encode videos (e.g., NeRV, E-NeRV).

Action Recognition Video Compression

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

1 code implementation • CVPR 2023 • Bo He, Jun Wang, JieLin Qiu, Trung Bui, Abhinav Shrivastava, Zhaowen Wang

The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.

Ranked #3 on Supervised Video Summarization on SumMe

Extractive Text Summarization Supervised Video Summarization

COVID-VTS: Fact Extraction and Verification on Short Video Platforms

1 code implementation • 15 Feb 2023 • Fuxiao Liu, Yaser Yacoob, Abhinav Shrivastava

We introduce a new benchmark, COVID-VTS, for fact-checking multi-modal information involving short-duration videos with COVID19-focused information from both the real world and machine generation.

Fact Checking Fact Selection +1

BT^2: Backward-compatible Training with Basis Transformation

1 code implementation • ICCV 2023 • Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim

In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.

NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling

1 code implementation • CVPR 2023 • Shishira R Maiya, Sharath Girish, Max Ehrlich, Hanyu Wang, Kwot Sin Lee, Patrick Poirson, Pengxiang Wu, Chen Wang, Abhinav Shrivastava

This design shares computation within each group, in the spatial and temporal dimensions, resulting in reduced encoding time of the video.

Quantization Video Compression

Teaching Matters: Investigating the Role of Supervision in Vision Transformers

1 code implementation • CVPR 2023 • Matthew Walmer, Saksham Suri, Kamal Gupta, Abhinav Shrivastava

We compare ViTs trained through different methods of supervision, and show that they learn a diverse range of behaviors in terms of their attention, representations, and downstream performance.

CNeRV: Content-adaptive Neural Representation for Visual Data

no code implementations • 18 Nov 2022 • Hao Chen, Matt Gwilliam, Bo He, Ser-Nam Lim, Abhinav Shrivastava

We match the performance of NeRV, a state-of-the-art implicit neural representation, on the reconstruction task for frames seen during training while far surpassing for frames that are skipped during training (unseen images).

Data Compression

$BT^2$: Backward-compatible Training with Basis Transformation

1 code implementation • 8 Nov 2022 • Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim

In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.

Retrieval

Learning Semantic Correspondence with Sparse Annotations

1 code implementation • 15 Aug 2022 • Shuaiyi Huang, Luyu Yang, Bo He, Songyang Zhang, Xuming He, Abhinav Shrivastava

In this paper, we aim to address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations.

Denoising Semantic correspondence

Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning

1 code implementation • CVPR 2022 • Matthew Gwilliam, Abhinav Shrivastava

In this paper, we compare methods using performance-based benchmarks such as linear evaluation, nearest neighbor classification, and clustering for several different datasets, demonstrating the lack of a clear front-runner within the current state-of-the-art.

Benchmarking Clustering +3

Disentangling Visual Embeddings for Attributes and Objects

1 code implementation • CVPR 2022 • Nirat Saini, Khoi Pham, Abhinav Shrivastava

We use visual decomposed features to hallucinate embeddings that are representative for the seen and novel compositions to better regularize the learning of our model.

Attribute Compositional Zero-Shot Learning +2

Neural Space-filling Curves

no code implementations • 18 Apr 2022 • Hanyu Wang, Kamal Gupta, Larry Davis, Abhinav Shrivastava

We present Neural Space-filling Curves (SFCs), a data-driven approach to infer a context-based scan order for a set of images.

Image Compression

LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification

1 code implementation • 6 Apr 2022 • Sharath Girish, Kamal Gupta, Saurabh Singh, Abhinav Shrivastava

We introduce LilNetX, an end-to-end trainable technique for neural networks that enables learning models with specified accuracy-rate-computation trade-off.

Model Compression

ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization

1 code implementation • CVPR 2022 • Bo He, Xitong Yang, Le Kang, Zhiyu Cheng, Xin Zhou, Abhinav Shrivastava

Without the boundary information of action segments, existing methods mostly rely on multiple instance learning (MIL), where the predictions of unlabeled instances (i.e., video snippets) are supervised by classifying labeled bags (i.e., untrimmed videos).

Ranked #5 on Weakly Supervised Action Localization on ActivityNet-1.3

Weakly Supervised Temporal Action Localization

ObjectFormer for Image Manipulation Detection and Localization

no code implementations • CVPR 2022 • Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang

Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection.

Image Manipulation Image Manipulation Detection

One Network Doesn't Rule Them All: Moving Beyond Handcrafted Architectures in Self-Supervised Learning

no code implementations • 15 Mar 2022 • Sharath Girish, Debadeepta Dey, Neel Joshi, Vibhav Vineet, Shital Shah, Caio Cesar Teodoro Mendes, Abhinav Shrivastava, Yale Song

We conduct a large-scale study with over 100 variants of ResNet and MobileNet architectures and evaluate them across 11 downstream scenarios in the SSL setting.

Image Classification Self-Supervised Learning

Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement

no code implementations • 31 Jan 2022 • Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava

Video compression is a central feature of the modern internet powering technologies from social media to video conferencing.

Quantization Video Compression

SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining

no code implementations • ICCV 2023 • Saksham Suri, Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava

On average, we improve by $2.6$, $3.9$, and $9.6$ mAP over previous state-of-the-art methods on three splits of increasing sparsity on COCO.

object-detection Object Detection +2

Dual-Key Multimodal Backdoors for Visual Question Answering

1 code implementation • CVPR 2022 • Matthew Walmer, Karan Sikka, Indranil Sur, Abhinav Shrivastava, Susmit Jha

This is challenging for the attacker as the detector can distort or ignore the visual trigger entirely, which leads to models where backdoors are over-reliant on the language trigger.

Question Answering Visual Question Answering

Burn After Reading: Online Adaptation for Cross-domain Streaming Data

no code implementations • 8 Dec 2021 • Luyu Yang, Mingfei Gao, Zeyuan Chen, Ran Xu, Abhinav Shrivastava, Chetan Ramaiah

In the context of online privacy, many methods propose complex privacy and security preserving measures to protect sensitive data.

Unsupervised Domain Adaptation

PatchGame: Learning to Signal Mid-level Patches in Referential Games

1 code implementation • NeurIPS 2021 • Kamal Gupta, Gowthami Somepalli, Anubhav Gupta, Vinoj Jayasundara, Matthias Zwicker, Abhinav Shrivastava

We study a referential game (a type of signaling game) where two agents communicate with each other via a discrete bottleneck to achieve a common goal.

NeRV: Neural Representations for Videos

3 code implementations • NeurIPS 2021 • Hao Chen, Bo He, Hanyu Wang, Yixuan Ren, Ser-Nam Lim, Abhinav Shrivastava

In contrast, with NeRV, we can use any neural network compression method as a proxy for video compression, and achieve comparable performance to traditional frame-based video compression approaches (H.264, HEVC, etc.).

Ranked #6 on Video Reconstruction on UVG

Denoising Neural Network Compression +3

A Frequency Perspective of Adversarial Robustness

no code implementations • 26 Oct 2021 • Shishira R Maiya, Max Ehrlich, Vatsal Agarwal, Ser-Nam Lim, Tom Goldstein, Abhinav Shrivastava

Our analysis shows that adversarial examples are neither in high-frequency nor in low-frequency components, but are simply dataset dependent.

Adversarial Robustness

HR-RCNN: Hierarchical Relational Reasoning for Object Detection

no code implementations • 26 Oct 2021 • Hao Chen, Abhinav Shrivastava

Incorporating relational reasoning in neural networks for object recognition remains an open problem.

Graph Attention Instance Segmentation +6

Diverse Video Generation using a Gaussian Process Trigger

1 code implementation • ICLR 2021 • Gaurav Shrivastava, Abhinav Shrivastava

Our approach, Diverse Video Generator, uses a Gaussian Process (GP) to learn priors on future states given the past and maintains a probability distribution over possible futures given a particular sample.

Ranked #1 on Video Prediction on KTH (Diversity metric)

Video Generation Video Prediction

Hierarchical Video Prediction Using Relational Layouts for Human-Object Interactions

no code implementations • CVPR 2021 • Navaneeth Bodla, Gaurav Shrivastava, Rama Chellappa, Abhinav Shrivastava

Our work builds on hierarchical video prediction models, which disentangle the video generation process into two stages: predicting a high-level representation, such as pose sequence, and then learning a pose-to-pixels translation model for pixel generation.

Human-Object Interaction Detection Object +4

Learning Graphs for Knowledge Transfer With Limited Labels

no code implementations • CVPR 2021 • Pallabi Ghosh, Nirat Saini, Larry S. Davis, Abhinav Shrivastava

The standard paradigm is to utilize relationships in the input graph to transfer information using GCNs from training to testing nodes in the graph; for example, the semi-supervised, zero-shot, and few-shot learning setups.

Benchmarking Few-Shot action recognition +3

Learning to Predict Visual Attributes in the Wild

no code implementations • CVPR 2021 • Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, Abhinav Shrivastava

In this paper, we introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances.

Attribute Contrastive Learning +2

Rethinking Pseudo Labels for Semi-Supervised Object Detection

no code implementations • 1 Jun 2021 • Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

In this paper, we introduce certainty-aware pseudo labels tailored for object detection, which can effectively estimate the classification and localization quality of derived pseudo labels.

Ranked #8 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)

Classification Image Classification +4

Towards Discovery and Attribution of Open-world GAN Generated Images

1 code implementation • ICCV 2021 • Sharath Girish, Saksham Suri, Saketh Rambhatla, Abhinav Shrivastava

Through extensive experiments, we show that our algorithm discovers unseen GANs with high accuracy and also generalizes to GANs trained on unseen real datasets.

Attribute Clustering +1

The Pursuit of Knowledge: Discovering and Localizing Novel Categories using Dual Memory

no code implementations • ICCV 2021 • Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava

We tackle object category discovery, which is the problem of discovering and localizing novel objects in a large unlabeled dataset.

Object

Learned Spatial Representations for Few-shot Talking-Head Synthesis

no code implementations • ICCV 2021 • Moustafa Meshry, Saksham Suri, Larry S. Davis, Abhinav Shrivastava

In contrast, we propose to factorize the representation of a subject into its spatial and style components.

StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis

no code implementations • CVPR 2021 • Moustafa Meshry, Yixuan Ren, Larry S Davis, Abhinav Shrivastava

Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.

Image Generation Translation

Knowledge Evolution in Neural Networks

1 code implementation • CVPR 2021 • Ahmed Taha, Abhinav Shrivastava, Larry Davis

We evaluate KE using relatively small datasets (e.g., CUB-200) and randomly initialized deep networks.

Metric Learning

SVMax: A Feature Embedding Regularizer

1 code implementation • 4 Mar 2021 • Ahmed Taha, Alex Hanson, Abhinav Shrivastava, Larry Davis

The SVMax regularizer supports both supervised and unsupervised learning.

Retrieval

Deep Video Inpainting Detection

no code implementations • 26 Jan 2021 • Peng Zhou, Ning Yu, Zuxuan Wu, Larry S. Davis, Abhinav Shrivastava, Ser-Nam Lim

This paper studies video inpainting detection, which localizes an inpainted region in a video both spatially and temporally.

Video Inpainting

Multimodal Attention for Layout Synthesis in Diverse Domains

no code implementations • 1 Jan 2021 • Kamal Gupta, Vijay Mahadevan, Alessandro Achille, Justin Lazarow, Larry S. Davis, Abhinav Shrivastava

We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents and 3D objects.

Learning What Not to Model: Gaussian Process Regression with Negative Constraints

no code implementations • 1 Jan 2021 • Gaurav Shrivastava, Harsh Shrivastava, Abhinav Shrivastava

But, what if for an input point '$\bar{\mathbf{x}}$', we want to constrain the GP to avoid a target regression value '$\bar{y}(\bar{\mathbf{x}})$' (a negative datapair)?

Navigate regression

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition

no code implementations • CVPR 2021 • Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

Then, only frames and convolutions that are selected by the selection network are used in the 3D model to generate predictions.

Ranked #11 on Action Recognition on ActivityNet

Action Recognition Policy Gradient Methods +1

GTA: Global Temporal Attention for Video Action Understanding

no code implementations • 15 Dec 2020 • Bo He, Xitong Yang, Zuxuan Wu, Hao Chen, Ser-Nam Lim, Abhinav Shrivastava

To this end, we introduce Global Temporal Attention (GTA), which performs global temporal attention on top of spatial attention in a decoupled manner.

Action Recognition Action Understanding +1

The Lottery Ticket Hypothesis for Object Recognition

1 code implementation • CVPR 2021 • Sharath Girish, Shishira R. Maiya, Kamal Gupta, Hao Chen, Larry Davis, Abhinav Shrivastava

The recently proposed Lottery Ticket Hypothesis (LTH) states that deep neural networks trained on large datasets contain smaller subnetworks that achieve on par performance as the dense networks.

Instance Segmentation Keypoint Estimation +5

Analyzing and Mitigating JPEG Compression Defects in Deep Learning

no code implementations • 17 Nov 2020 • Max Ehrlich, Larry Davis, Ser-Nam Lim, Abhinav Shrivastava

We show that there is a significant penalty on common performance metrics for high compression.

Learning Visual Representations for Transfer Learning by Suppressing Texture

1 code implementation • 3 Nov 2020 • Shlok Mishra, Anshul Shah, Ankan Bansal, Janit Anjaria, Jonghyun Choi, Abhinav Shrivastava, Abhishek Sharma, David Jacobs

Recent literature has shown that features obtained from supervised training of CNNs may over-emphasize texture rather than encoding high-level information.

Ranked #19 on Object Detection on PASCAL VOC 2007

Image Classification object-detection +3

Pose And Joint-Aware Action Recognition

1 code implementation • 16 Oct 2020 • Anshul Shah, Shlok Mishra, Ankan Bansal, Jun-Cheng Chen, Rama Chellappa, Abhinav Shrivastava

Unlike other modalities, constellation of joints and their motion generate models with succinct human motion information for activity recognition.

Ranked #1 on Action Recognition on Mimetics

Action Classification Action Recognition In Videos +5

Improved Modeling of 3D Shapes with Multi-view Depth Maps

1 code implementation • 7 Sep 2020 • Kamal Gupta, Susmija Jabbireddy, Ketul Shah, Abhinav Shrivastava, Matthias Zwicker

Our simple encoder-decoder framework, comprised of a novel identity encoder and class-conditional viewpoint generator, generates 3D consistent depth maps.

Image Generation

All About Knowledge Graphs for Actions

no code implementations • 28 Aug 2020 • Pallabi Ghosh, Nirat Saini, Larry S. Davis, Abhinav Shrivastava

Current action recognition systems require large amounts of training data for recognizing an action.

Ranked #17 on Zero-Shot Action Recognition on Kinetics

Few-Shot action recognition Few Shot Action Recognition +5

Deep Co-Training with Task Decomposition for Semi-Supervised Domain Adaptation

1 code implementation • ICCV 2021 • Luyu Yang, Yan Wang, Mingfei Gao, Abhinav Shrivastava, Kilian Q. Weinberger, Wei-Lun Chao, Ser-Nam Lim

To integrate the strengths of the two classifiers, we apply the well-established co-training framework, in which the two classifiers exchange their high confident predictions to iteratively "teach each other" so that both classifiers can excel in the target domain.

Semi-supervised Domain Adaptation Unsupervised Domain Adaptation

End-to-end Learning of Compressible Features

1 code implementation • 23 Jul 2020 • Saurabh Singh, Sami Abu-El-Haija, Nick Johnston, Johannes Ballé, Abhinav Shrivastava, George Toderici

We propose a learned method that jointly optimizes for compressibility along with the task objective for learning the features.

Quantization

A Generic Visualization Approach for Convolutional Neural Networks

2 code implementations • ECCV 2020 • Ahmed Taha, Xitong Yang, Abhinav Shrivastava, Larry Davis

Compared to classification networks, attention visualization for retrieval networks is hardly studied.

Classification General Classification +2

Curriculum Manager for Source Selection in Multi-Source Domain Adaptation

no code implementations • ECCV 2020 • Luyu Yang, Yogesh Balaji, Ser-Nam Lim, Abhinav Shrivastava

In this paper, we proposed an adversarial agent that learns a dynamic curriculum for source samples, called Curriculum Manager for Source Selection (CMSS).

Multi-Source Unsupervised Domain Adaptation Unsupervised Domain Adaptation

Group Ensemble: Learning an Ensemble of ConvNets in a single ConvNet

1 code implementation • 1 Jul 2020 • Hao Chen, Abhinav Shrivastava

Owing to group convolution and the shared-base, GENet can fully leverage the advantage of explicit ensemble learning while retaining the same computation as a single ConvNet.

Action Recognition Ensemble Learning +2

LayoutTransformer: Layout Generation and Completion with Self-attention

2 code implementations • ICCV 2021 • Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava

Generating a new layout or extending an existing layout requires understanding the relationships between these primitives.

Quantization Guided JPEG Artifact Correction

1 code implementation • ECCV 2020 • Max Ehrlich, Larry Davis, Ser-Nam Lim, Abhinav Shrivastava

The JPEG image compression algorithm is the most popular method of image compression because of its ability for large compression ratios.

Ranked #1 on JPEG Artifact Correction on ICB (Quality 20 Grayscale)

JPEG Artifact Correction Quantization

Spatial Priming for Detecting Human-Object Interactions

no code implementations • 9 Apr 2020 • Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa

The proposed method consists of a layout module which primes a visual module to predict the type of interaction between a human and an object.

Human-Object Interaction Detection Object

PatchVAE: Learning Local Latent Codes for Recognition

1 code implementation • CVPR 2020 • Kamal Gupta, Saurabh Singh, Abhinav Shrivastava

Unsupervised representation learning holds the promise of exploiting large amounts of unlabeled data to learn general representations.

Representation Learning

Hand-Priming in Object Localization for Assistive Egocentric Vision

no code implementations • 28 Feb 2020 • Kyungjun Lee, Abhinav Shrivastava, Hernisa Kacorri

Egocentric vision holds great promises for increasing access to visual information and improving the quality of life for people with visual impairments, with object recognition being one of the daily challenges for this population.

Hand Segmentation Multi-Task Learning +3

Depth Completion Using a View-constrained Deep Prior

no code implementations • 21 Jan 2020 • Pallabi Ghosh, Vibhav Vineet, Larry S. Davis, Abhinav Shrivastava, Sudipta Sinha, Neel Joshi

Given color images and noisy and incomplete target depth maps, we optimize a randomly-initialized CNN model to reconstruct a depth map restored by virtue of using the CNN network structure as a prior combined with a view-constrained photo-consistency loss.

Depth Completion Image Denoising

Paper
Add Code

Style-based Encoder Pre-training for Multi-modal Image Synthesis

no code implementations • 25 Sep 2019 • Moustafa Meshry, Yixuan Ren, Ricardo Martin-Brualla, Larry Davis, Abhinav Shrivastava

Then we train a generator to transform an input image along with a style-code to the output domain.

Image Generation Translation

Paper
Add Code

Scalable Model Compression by Entropy Penalized Reparameterization

no code implementations • ICLR 2020 • Deniz Oktay, Johannes Ballé, Saurabh Singh, Abhinav Shrivastava

We describe a simple and general neural network weight compression approach, in which the network parameters (weights and biases) are represented in a "latent" space, amounting to a reparameterization.

General Classification Model Compression

Paper
Add Code

Render4Completion: Synthesizing Multi-View Depth Maps for 3D Shape Completion

no code implementations • 17 Apr 2019 • Tao Hu, Zhizhong Han, Abhinav Shrivastava, Matthias Zwicker

Unlike image-to-image translation networks that complete each view separately, our novel network, multi-view completion net (MVCN), leverages information from all views of a 3D shape to help complete each single view.

Image-to-Image Translation Translation

Paper
Add Code

EvalNorm: Estimating Batch Normalization Statistics for Evaluation

no code implementations • ICCV 2019 • Saurabh Singh, Abhinav Shrivastava

Batch normalization (BN) has been very effective for deep learning and is widely used.

object-detection Object Detection

Paper
Add Code

Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions

no code implementations • WS 2019 • Peratham Wiriyathammabhum, Abhinav Shrivastava, Vlad I. Morariu, Larry S. Davis

This paper presents a new task, the grounding of spatio-temporal identifying descriptions in videos.

Paper
Add Code

Relational Action Forecasting

no code implementations • CVPR 2019 • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid

This paper focuses on multi-person action forecasting in videos.

Action Classification Action Recognition +1

Paper
Add Code

Detecting Human-Object Interactions via Functional Generalization

no code implementations • 5 Apr 2019 • Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa

We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner.

Human-Object Interaction Detection Object

Paper
Add Code

Unsupervised Data Uncertainty Learning in Visual Retrieval Systems

no code implementations • 7 Feb 2019 • Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Abhinav Shrivastava, Larry Davis

We introduce an unsupervised formulation to estimate heteroscedastic uncertainty in retrieval systems.

Retrieval Video Retrieval

Paper
Add Code

Boosting Standard Classification Architectures Through a Ranking Regularizer

1 code implementation • 24 Jan 2019 • Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Abhinav Shrivastava, Larry Davis

We employ triplet loss as a feature embedding regularizer to boost classification performance.

Classification General Classification

Paper
Code

Generate, Segment and Refine: Towards Generic Manipulation Segmentation

1 code implementation • 24 Nov 2018 • Peng Zhou, Bor-Chun Chen, Xintong Han, Mahyar Najibi, Abhinav Shrivastava, Ser Nam Lim, Larry S. Davis

The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in large quantities of manipulated images being shared on the internet.

Detecting Image Manipulation Image Generation +3

Paper
Code

Actor-Centric Relation Network

1 code implementation • ECCV 2018 • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid

A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action.

Ranked #15 on Action Recognition on AVA v2.1

Action Classification Action Detection +5

3,866

Paper
Code

Tracking Emerges by Colorizing Videos

1 code implementation • ECCV 2018 • Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy

We use large amounts of unlabeled video to learn models for visual tracking without manual human supervision.

Ranked #2 on Skeleton Based Action Recognition on JHMDB Pose Tracking

Colorization Optical Flow Estimation +2

Paper
Code

Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

2 code implementations • ICCV 2017 • Chen Sun, Abhinav Shrivastava, Saurabh Singh, Abhinav Gupta

What will happen if we increase the dataset size by 10x or 100x?

Ranked #2 on Semantic Segmentation on PASCAL VOC 2007

Image Classification object-detection +4

3,048

Paper
Code

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

4 code implementations • CVPR 2017 • Xiaolong Wang, Abhinav Shrivastava, Abhinav Gupta

We propose to learn an adversarial network that generates examples with occlusions and deformations.

Ranked #20 on Object Detection on PASCAL VOC 2007 (using extra training data)

Object object-detection +1

480

Paper
Code

Beyond Skip Connections: Top-Down Modulation for Object Detection

1 code implementation • 20 Dec 2016 • Abhinav Shrivastava, Rahul Sukthankar, Jitendra Malik, Abhinav Gupta

But most of these fine details are lost in the early convolutional layers.

Ranked #203 on Object Detection on COCO test-dev

Object object-detection +1

111

Paper
Code

Cross-stitch Networks for Multi-task Learning

1 code implementation • CVPR 2016 • Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, Martial Hebert

In this paper, we propose a principled approach to learn shared representations in ConvNets using multi-task learning.

Ranked #106 on Semantic Segmentation on NYU Depth v2

Multi-Task Learning Semantic Segmentation

120

Paper
Code

Training Region-based Object Detectors with Online Hard Example Mining

5 code implementations • CVPR 2016 • Abhinav Shrivastava, Abhinav Gupta, Ross Girshick

Our motivation is the same as it has always been -- detection datasets contain an overwhelming number of easy examples and a small number of hard examples.

Ranked #6 on Face Verification on Trillion Pairs Dataset

object-detection Object Detection

421

Paper
Code

Watch and Learn: Semi-Supervised Learning for Object Detectors From Video

no code implementations • CVPR 2015 • Ishan Misra, Abhinav Shrivastava, Martial Hebert

We present a semi-supervised approach that localizes multiple unknown object instances in long videos.

Object object-detection +1

Paper
Add Code

Watch and Learn: Semi-Supervised Learning of Object Detectors from Videos

no code implementations • 21 May 2015 • Ishan Misra, Abhinav Shrivastava, Martial Hebert

We present a semi-supervised approach that localizes multiple unknown object instances in long videos.

Object object-detection +1

Paper
Add Code

Mid-level Elements for Object Detection

no code implementations • 27 Apr 2015 • Aayush Bansal, Abhinav Shrivastava, Carl Doersch, Abhinav Gupta

Building on the success of recent discriminative mid-level elements, we propose a surprisingly simple approach for object detection which performs comparably to the current state-of-the-art approaches on the PASCAL VOC comp-3 detection challenge (no external data).

Object object-detection +1

Paper
Add Code

Enriching Visual Knowledge Bases via Object Discovery and Segmentation

no code implementations • CVPR 2014 • Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta

In this paper, we propose to enrich these knowledge bases by automatically discovering objects and their segmentations from noisy Internet images.

Object Discovery Segmentation

Paper
Add Code
