Search Results for author: Anurag Arnab

Found 38 papers, 22 papers with code

End-to-End Spatio-Temporal Action Localisation with Video Transformers

no code implementations 24 Apr 2023 Alexey Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lučić, Cordelia Schmid, Anurag Arnab

The most performant spatio-temporal action localisation models use external person proposals and complex external memory banks.

VicTR: Video-conditioned Text Representations for Activity Recognition

no code implementations 5 Apr 2023 Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo

All such recipes rely on augmenting visual embeddings with temporal information (i.e., image -> video), often keeping text embeddings unchanged or even being discarded.

Action Classification Activity Recognition +1

CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

1 code implementation 21 Mar 2023 Seokju Cho, Heeseong Shin, Sunghwan Hong, Seungjun An, Seungjun Lee, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim

However, the problem of transferring these capabilities learned from image-level supervision to the pixel-level task of segmentation and addressing arbitrary unseen categories at inference makes this task challenging.

Image Segmentation Open Vocabulary Semantic Segmentation +2

Adaptive Computation with Elastic Input Sequence

1 code implementation 30 Jan 2023 Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You

However, most standard neural networks have the same function type and fixed computation budget on different samples regardless of their nature and difficulty.

Inductive Bias

How Can Objects Help Action Recognition?

no code implementations CVPR 2023 Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid

In this paper, we investigate how we can use knowledge of objects to design better video models, namely to process fewer tokens and to improve recognition accuracy.

Action Recognition

Audiovisual Masked Autoencoders

no code implementations 9 Dec 2022 Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab

Can we leverage the audiovisual information already present in video to improve self-supervised representation learning?

Representation Learning

Token Turing Machines

1 code implementation CVPR 2023 Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab

The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step.

Action Detection Activity Detection

Dynamic Graph Message Passing Networks for Visual Recognition

2 code implementations 20 Sep 2022 Li Zhang, Mohan Chen, Anurag Arnab, Xiangyang Xue, Philip H. S. Torr

A fully-connected graph, such as the self-attention operation in Transformers, is beneficial for such modelling, however, its computational overhead is prohibitive.

Image Classification object-detection +3

M&M Mix: A Multimodal Multiview Transformer Ensemble

no code implementations 20 Jun 2022 Xuehan Xiong, Anurag Arnab, Arsha Nagrani, Cordelia Schmid

This report describes the approach behind our winning solution to the 2022 Epic-Kitchens Action Recognition Challenge.

Ranked #1 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Action Recognition Video Recognition

Multiview Transformers for Video Recognition

1 code implementation CVPR 2022 Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid

Video understanding requires reasoning at multiple spatiotemporal resolutions -- from short fine-grained motions to events taking place over longer durations.

Ranked #2 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Action Classification Action Recognition +1

TokenLearner: Adaptive Space-Time Tokenization for Videos

1 code implementation NeurIPS 2021 Michael Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

In this paper, we introduce a novel visual representation learning which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks.

Representation Learning Video Recognition +1

PolyViT: Co-training Vision Transformers on Images, Videos and Audio

no code implementations 25 Nov 2021 Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters?

Audio Classification

The Efficiency Misnomer

no code implementations ICLR 2022 Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay

We further present suggestions to improve reporting of efficiency metrics.

SCENIC: A JAX Library for Computer Vision Research and Beyond

1 code implementation CVPR 2022 Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay

Scenic is an open-source JAX library with a focus on Transformer-based models for computer vision research and beyond.

Compressive Visual Representations

1 code implementation NeurIPS 2021 Kuang-Huei Lee, Anurag Arnab, Sergio Guadarrama, John Canny, Ian Fischer

We verify this by developing SimCLR and BYOL formulations compatible with the Conditional Entropy Bottleneck (CEB) objective, allowing us to both measure and control the amount of compression in the learned representation, and observe their impact on downstream tasks.

Contrastive Learning Self-Supervised Image Classification

Attention Bottlenecks for Multimodal Fusion

1 code implementation NeurIPS 2021 Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio.

Ranked #1 on Audio Classification on VGGSound (Top 5 Accuracy metric)

Action Classification Action Recognition +2

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

4 code implementations 21 Jun 2021 Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

In this paper, we introduce a novel visual representation learning which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks.

Action Classification Image Classification +3

Unified Graph Structured Models for Video Understanding

no code implementations ICCV 2021 Anurag Arnab, Chen Sun, Cordelia Schmid

Accurate video understanding involves reasoning about the relationships between actors, objects and their environment, often over long temporal intervals.

Action Detection Graph Classification +3

ViViT: A Video Vision Transformer

5 code implementations ICCV 2021 Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid

We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification.

Ranked #8 on Action Classification on Moments in Time (Top 5 Accuracy metric, using extra training data)

Action Classification Action Recognition +4

Dual Graph Convolutional Network for Semantic Segmentation

5 code implementations 13 Sep 2019 Li Zhang, Xiangtai Li, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H. S. Torr

Exploiting long-range contextual information is key for pixel-wise prediction tasks such as semantic segmentation.

Semantic Segmentation

Dynamic Graph Message Passing Networks

1 code implementation CVPR 2020 Li Zhang, Dan Xu, Anurag Arnab, Philip H. S. Torr

We propose a dynamic graph message passing network, that significantly reduces the computational complexity compared to related works modelling a fully-connected graph.

Image Classification object-detection +3

Exploiting temporal context for 3D human pose estimation in the wild

1 code implementation CVPR 2019 Anurag Arnab, Carl Doersch, Andrew Zisserman

We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos.

Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)

3D Pose Estimation Monocular 3D Human Pose Estimation

Weakly- and Semi-Supervised Panoptic Segmentation

1 code implementation ECCV 2018 Qizhu Li, Anurag Arnab, Philip H. S. Torr

We present a weakly supervised model that jointly performs both semantic- and instance-segmentation -- a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotation for these tasks.

Instance Segmentation Panoptic Segmentation +3

On the Robustness of Semantic Segmentation Models to Adversarial Attacks

1 code implementation CVPR 2018 Anurag Arnab, Ondrej Miksik, Philip H. S. Torr

Deep Neural Networks (DNNs) have demonstrated exceptional performance on most recognition tasks such as image classification and segmentation.

General Classification Image Classification +2

Holistic, Instance-Level Human Parsing

1 code implementation 11 Sep 2017 Qizhu Li, Anurag Arnab, Philip H. S. Torr

We address this problem by segmenting the parts of objects at an instance-level, such that each pixel in the image is assigned a part label, as well as the identity of the object it belongs to.

Human Detection Multi-Human Parsing

Pixelwise Instance Segmentation with a Dynamically Instantiated Network

1 code implementation CVPR 2017 Anurag Arnab, Philip H. S. Torr

This subnetwork uses the initial category-level segmentation, along with cues from the output of an object detector, within an end-to-end CRF to predict instances.

Instance Segmentation object-detection +2

A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials

no code implementations 24 Jan 2017 Måns Larsson, Anurag Arnab, Fredrik Kahl, Shuai Zheng, Philip Torr

It is empirically demonstrated that such learned potentials can improve segmentation accuracy and that certain label class interactions are indeed better modelled by a non-Gaussian potential.

Semantic Segmentation Structured Prediction

Bottom-up Instance Segmentation using Deep Higher-Order CRFs

no code implementations 8 Sep 2016 Anurag Arnab, Philip H. S. Torr

Traditional Scene Understanding problems such as Object Detection and Semantic Segmentation have made breakthroughs in recent years due to the adoption of deep learning.

Instance Segmentation object-detection +3

Joint Object-Material Category Segmentation from Audio-Visual Cues

no code implementations 10 Jan 2016 Anurag Arnab, Michael Sapienza, Stuart Golodetz, Julien Valentin, Ondrej Miksik, Shahram Izadi, Philip Torr

It is not always possible to recognise objects and infer material properties for a scene from visual cues alone, since objects can look visually similar whilst being made of very different materials.

Higher Order Conditional Random Fields in Deep Neural Networks

1 code implementation 25 Nov 2015 Anurag Arnab, Sadeep Jayasumana, Shuai Zheng, Philip Torr

Recent deep learning approaches have incorporated CRFs into Convolutional Neural Networks (CNNs), with some even training the CRF end-to-end with the rest of the network.

Semantic Segmentation Superpixels
