Search Results for author: Christoph Feichtenhofer

Found 28 papers, 20 papers with code

VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding

no code implementations · 20 May 2021 · Hu Xu, Gargi Ghosh, Po-Yao Huang, Prahal Arora, Masoumeh Aminzadeh, Christoph Feichtenhofer, Florian Metze, Luke Zettlemoyer

We present a simplified, task-agnostic multi-modal pre-training approach that can accept either video or text input, or both for a variety of end tasks.

Language Modelling · Video Understanding

Multiscale Vision Transformers

2 code implementations · 22 Apr 2021 · Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters.

Action Classification · Action Recognition · +2

Multiview Pseudo-Labeling for Semi-supervised Learning from Video

no code implementations · 1 Apr 2021 · Bo Xiong, Haoqi Fan, Kristen Grauman, Christoph Feichtenhofer

We present a multiview pseudo-labeling approach to video learning, a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video.

Representation Learning · Video Recognition

TrackFormer: Multi-Object Tracking with Transformers

1 code implementation · 7 Jan 2021 · Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, Christoph Feichtenhofer

The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatiotemporal trajectories.

Ranked #3 on Multi-Object Tracking on MOTS20 (using extra training data)

Multi-Object Tracking · Video Understanding

X3D: Expanding Architectures for Efficient Video Recognition

7 code implementations · CVPR 2020 · Christoph Feichtenhofer

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes: space, time, width, and depth.

Action Classification · Feature Selection · +4
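The progressive-expansion idea behind X3D can be sketched in a few lines. This is an illustrative toy, not the paper's method: the axes are from the abstract above, but the expansion factor, the FLOP proxy, and the budget-matching selection rule are simplified stand-ins (X3D actually trains and validates each candidate, then keeps the most accurate one at matched cost).

```python
# Toy sketch of X3D-style expansion: starting from a tiny base config,
# try expanding each axis and keep the candidate closest to a target
# compute budget. Factors and the cost model are illustrative only.

def compute_cost(cfg):
    """Rough per-clip FLOP proxy: frames x res^2 x width x depth."""
    return cfg["frames"] * cfg["res"] ** 2 * cfg["width"] * cfg["depth"]

def expand(cfg, axis, factor=2.0):
    """Return a copy of cfg with one axis scaled up."""
    new = dict(cfg)
    new[axis] = type(cfg[axis])(cfg[axis] * factor)
    return new

# Hypothetical tiny base architecture (image-like: a single frame).
base = {"frames": 1, "res": 112, "width": 1, "depth": 10}
axes = ["frames", "res", "width", "depth"]

# One expansion step: pick the axis whose expansion lands closest to a
# 4x compute budget (a stand-in for "best accuracy at matched cost").
target = 4 * compute_cost(base)
candidates = [expand(base, a) for a in axes]
best = min(candidates, key=lambda c: abs(compute_cost(c) - target))
# Doubling spatial resolution quadruples cost, so "res" wins this step.
```

In the real procedure this step repeats, one axis at a time, until the desired compute regime is reached, yielding the X3D family.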

Feature Pyramid Grids

1 code implementation · 7 Apr 2020 · Kai Chen, Yuhang Cao, Chen Change Loy, Dahua Lin, Christoph Feichtenhofer

Feature pyramid networks have been widely adopted in the object detection literature to improve feature representations for better handling of variations in scale.

Neural Architecture Search · Object Detection · +1

EGO-TOPO: Environment Affordances from Egocentric Video

1 code implementation · CVPR 2020 · Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, Kristen Grauman

We introduce a model for environment affordances that is learned directly from egocentric video.

A Multigrid Method for Efficiently Training Video Models

3 code implementations · CVPR 2020 · Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

Action Detection · Action Recognition · +1
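The core multigrid idea, trading input resolution against batch size while holding per-iteration cost roughly constant, can be sketched as follows. This is a minimal illustration under assumed base values; the paper's actual schedule (nested long and short cycles with specific grid shapes) is richer.

```python
# Minimal sketch of multigrid training: cycle through "grids" that
# shrink the spatio-temporal input and grow the mini-batch so that
# cost per iteration (batch x frames x res^2) stays constant.
# BASE values below are hypothetical, not the paper's settings.

BASE = {"batch": 8, "frames": 32, "res": 224}

def scaled_grid(frame_div, res_div):
    """Shrink the input grid and enlarge the batch to hold cost fixed."""
    return {
        "frames": BASE["frames"] // frame_div,
        "res": BASE["res"] // res_div,
        # Halving resolution frees a 4x cost factor, halving frames 2x.
        "batch": BASE["batch"] * frame_div * res_div ** 2,
    }

def iter_cost(g):
    return g["batch"] * g["frames"] * g["res"] ** 2

# A short cycle stepping from coarse to fine grids.
short_cycle = [scaled_grid(2, 2), scaled_grid(2, 1), scaled_grid(1, 1)]
# Every grid in the cycle matches the base per-iteration cost, so the
# coarse grids see many more clips per iteration for the same compute.
```

The speedup comes from spending most iterations on the cheap coarse grids while periodically returning to the full-resolution grid.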

Learning Temporal Pose Estimation from Sparsely-Labeled Videos

2 code implementations · NeurIPS 2019 · Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani

To reduce the need for dense annotations, we propose a PoseWarper network that leverages training videos with sparse annotations (every k frames) to learn to perform dense temporal pose propagation and estimation.

Ranked #2 on Multi-Person Pose Estimation on PoseTrack2017 (using extra training data)

Multi-Person Pose Estimation · Optical Flow Estimation
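The sparse-labeling setup the abstract describes (annotations on every k-th frame, with pose propagated to the frames in between) is easy to make concrete. This is only an illustration of the data split; PoseWarper's actual frame pairing and feature warping are defined in the paper.

```python
# Toy illustration of sparse video annotation: only every k-th frame
# carries a pose label; the remaining frames are targets for dense
# temporal pose propagation.

def split_frames(num_frames, k):
    """Return (labeled, unlabeled) frame indices with labels every k frames."""
    labeled = list(range(0, num_frames, k))
    unlabeled = [i for i in range(num_frames) if i % k != 0]
    return labeled, unlabeled

labeled, unlabeled = split_frames(10, 3)
# labeled -> [0, 3, 6, 9]; each unlabeled frame can borrow pose
# information propagated from a nearby labeled frame.
```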

Grounded Human-Object Interaction Hotspots from Video (Extended Abstract)

no code implementations · 3 Jun 2019 · Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements.

Human-Object Interaction Detection · Semantic Segmentation

Modeling Human Motion with Quaternion-based Neural Networks

1 code implementation · 21 Jan 2019 · Dario Pavllo, Christoph Feichtenhofer, Michael Auli, David Grangier

Previous work on predicting or generating 3D human pose sequences regresses either joint rotations or joint positions.

Learning Discriminative Motion Features Through Detection

no code implementations · 11 Dec 2018 · Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani

Our network learns to spatially sample features from Frame B in order to maximize pose detection accuracy in Frame A.

Fine-grained Action Recognition · Pose Estimation

Grounded Human-Object Interaction Hotspots from Video

1 code implementation · ICCV 2019 · Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements.

Human-Object Interaction Detection · Object Recognition · +1

What have we learned from deep representations for action recognition?

no code implementations · CVPR 2018 · Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes, Andrew Zisserman

In this paper, we shed light on deep spatiotemporal representations by visualizing what two-stream models have learned in order to recognize actions in video.

Action Recognition

Detect to Track and Track to Detect

3 code implementations · ICCV 2017 · Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman

Recent approaches for high accuracy detection and tracking of object categories in video consist of complex multistage solutions that become more cumbersome each year.

Object Detection

Temporal Residual Networks for Dynamic Scene Recognition

1 code implementation · CVPR 2017 · Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes

Finally, our temporal ResNet boosts recognition performance and establishes a new state-of-the-art on dynamic scene recognition, as well as on the complementary task of action recognition.

Action Recognition · Scene Recognition

Spatiotemporal Multiplier Networks for Video Action Recognition

1 code implementation · CVPR 2017 · Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes

This paper presents a general ConvNet architecture for video action recognition based on multiplicative interactions of spacetime features.

Action Recognition · General Classification

Convolutional Two-Stream Network Fusion for Video Action Recognition

1 code implementation · CVPR 2016 · Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman

Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information.

Ranked #49 on Action Recognition on HMDB-51 (using extra training data)

Action Recognition · Action Recognition In Videos · +1

Dynamically Encoded Actions Based on Spacetime Saliency

no code implementations · CVPR 2015 · Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes

By using the resulting definition of saliency during feature pooling we show that action recognition performance achieves state-of-the-art levels on three widely considered action recognition datasets.

Action Recognition
