Search Results for author: Du Tran

Found 30 papers, 11 papers with code

Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

no code implementations • NeurIPS 2012 • Du Tran, Junsong Yuan

The mapping between a video and a spatio-temporal action trajectory is learned.

EXMOVES: Classifier-based Features for Scalable Action Recognition

no code implementations • 20 Dec 2013 • Du Tran, Lorenzo Torresani

We show the generality of our approach by building our mid-level descriptors from two different low-level feature representations.

Action Recognition, General Classification (+1 more)

Learning Spatiotemporal Features with 3D Convolutional Networks

28 code implementations • ICCV 2015 • Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large-scale supervised video dataset.

Ranked #8 on Action Recognition on Sports-1M

Action Recognition In Videos, Dynamic Facial Expression Recognition

Deep End2End Voxel2Voxel Prediction

no code implementations • 20 Nov 2015 • Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

Over the last few years, deep learning methods have emerged as one of the most prominent approaches for video analysis.

Neural Architecture Search, Optical Flow Estimation (+3 more)

VideoMCC: a New Benchmark for Video Comprehension

no code implementations • 23 Jun 2016 • Du Tran, Maksim Bolonkin, Manohar Paluri, Lorenzo Torresani

Language has been exploited to sidestep the problem of defining video categories, by formulating video understanding as the task of captioning or description.

Multiple-choice, Video Description (+1 more)

Transformation-Based Models of Video Sequences

no code implementations • 29 Jan 2017 • Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

In this work we propose a simple unsupervised approach for next frame prediction in video.

ConvNet Architecture Search for Spatiotemporal Feature Learning

1 code implementation • 16 Aug 2017 • Du Tran, Jamie Ray, Zheng Shou, Shih-Fu Chang, Manohar Paluri

Learning image representations with ConvNets by pre-training on ImageNet has proven useful across many visual understanding tasks including object detection, semantic segmentation, and image captioning.

Ranked #71 on Action Recognition on HMDB-51

Action Classification, Action Recognition (+5 more)

A Closer Look at Spatiotemporal Convolutions for Action Recognition

20 code implementations • CVPR 2018 • Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.

Ranked #3 on Action Recognition on Sports-1M

Action Classification, Action Recognition (+1 more)

Detect-and-Track: Efficient Pose Estimation in Videos

1 code implementation • CVPR 2018 • Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran

This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video.

Ranked #8 on Pose Tracking on PoseTrack2017 (using extra training data)

Human Detection, Keypoint Estimation (+4 more)

Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

no code implementations • NeurIPS 2018 • Bruno Korbar, Du Tran, Lorenzo Torresani

There is a natural correlation between the visual and auditive elements of a video.

Ranked #7 on Self-Supervised Audio Classification on ESC-50

Audio Classification, Self-Supervised Action Recognition (+2 more)

Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset

no code implementations • ECCV 2018 • Jamie Ray, Heng Wang, Du Tran, Yufei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri

The videos retrieved by the search engines are then verified for correctness by human annotators.

Action Recognition, Temporal Action Localization (+1 more)

Learning Discriminative Motion Features Through Detection

no code implementations • 11 Dec 2018 • Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani

Our network learns to spatially sample features from Frame B in order to maximize pose detection accuracy in Frame A.

Fine-grained Action Recognition, Pose Estimation (+1 more)

DistInit: Learning Video Representations Without a Single Labeled Video

no code implementations • ICCV 2019 • Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan

In this work, we propose an alternative approach to learning video representations that requires no semantically labeled videos and instead leverages the years of effort in collecting and labeling large and clean still-image datasets.

Ranked #72 on Action Recognition on HMDB-51 (using extra training data)

Action Recognition, Temporal Action Localization (+1 more)

Video Classification with Channel-Separated Convolutional Networks

7 code implementations • ICCV 2019 • Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli

It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.

Ranked #1 on Action Recognition on Sports-1M

Action Classification, Action Recognition (+3 more)

SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition

no code implementations • ICCV 2019 • Bruno Korbar, Du Tran, Lorenzo Torresani

We demonstrate that the computational cost of action recognition on untrimmed videos can be dramatically reduced by invoking recognition only on these most salient clips.

Ranked #1 on Action Recognition on miniSports

Action Recognition, Temporal Action Localization

Large-scale weakly-supervised pre-training for video action recognition

3 code implementations • CVPR 2019 • Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan

Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?

Ranked #2 on Egocentric Activity Recognition on EPIC-KITCHENS-55 (Actions Top-1 (S2) metric)

Action Classification, Action Recognition (+3 more)

What Makes Training Multi-Modal Classification Networks Hard?

3 code implementations • CVPR 2020 • Weiyao Wang, Du Tran, Matt Feiszli

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart.

Ranked #1 on Action Recognition In Videos on miniSports

Action Classification, Action Recognition In Videos (+4 more)

Learning Temporal Pose Estimation from Sparsely-Labeled Videos

3 code implementations • NeurIPS 2019 • Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani

To reduce the need for dense annotations, we propose a PoseWarper network that leverages training videos with sparse annotations (every k frames) to learn to perform dense temporal pose propagation and estimation.

Ranked #2 on Multi-Person Pose Estimation on PoseTrack2018 (using extra training data)

Multi-Person Pose Estimation, Optical Flow Estimation

Video Modeling with Correlation Networks

no code implementations • CVPR 2020 • Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli

Motion is a salient cue to recognize actions in video.

Ranked #108 on Action Classification on Kinetics-400

Action Classification, Action Recognition (+1 more)

UniDual: A Unified Model for Image and Video Understanding

no code implementations • 10 Jun 2019 • Yufei Wang, Du Tran, Lorenzo Torresani

It consists of a shared 2D spatial convolution followed by two parallel point-wise convolutional layers, one devoted to images and the other one used for videos.

Multi-Task Learning, Video Understanding

FASTER Recurrent Networks for Efficient Video Classification

no code implementations • 10 Jun 2019 • Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang

FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities.

Ranked #26 on Action Recognition on UCF101

Action Classification, Action Recognition (+3 more)

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

1 code implementation • NeurIPS 2020 • Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, Du Tran

To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.

Ranked #2 on Self-Supervised Action Recognition on UCF101 (finetuned)

Audio Classification, Clustering (+5 more)

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

1 code implementation • 15 Dec 2020 • Tarun Kalluri, Deepak Pathak, Manmohan Chandraker, Du Tran

A majority of methods for video frame interpolation compute bidirectional optical flow between adjacent frames of a video, followed by a suitable warping algorithm to generate the output frames.

Ranked #2 on Video Frame Interpolation on GoPro

Action Recognition, Motion Magnification (+2 more)

Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation

no code implementations • ICCV 2021 • Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran

Current state-of-the-art object detection and segmentation methods work well under the closed-world assumption.

Object, Object Detection (+6 more)

Long-Short Temporal Contrastive Learning of Video Transformers

no code implementations • CVPR 2022 • Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani

Our approach, named Long-Short Temporal Contrastive Learning (LSTCL), enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent.

Action Recognition, Contrastive Learning (+1 more)

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

1 code implementation • CVPR 2022 • Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran

From the learned pairwise affinity (PA), we construct a large set of pseudo-ground-truth instance masks; combining these with human-annotated instance masks, we train Generic Grouping Networks (GGNs) and significantly outperform the state of the art on open-world instance segmentation on various benchmarks, including COCO, LVIS, ADE20K, and UVO.

Open-World Instance Segmentation, Semantic Segmentation

Relational Space-Time Query in Long-Form Videos

no code implementations • CVPR 2023 • Xitong Yang, Fu-Jen Chu, Matt Feiszli, Raghav Goyal, Lorenzo Torresani, Du Tran

In this paper, we propose to study these problems in a joint framework for long video understanding.

Video Understanding

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

no code implementations • 16 Feb 2023 • Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran

Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.

Action Detection, Sentence (+2 more)

Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision

no code implementations • 9 Mar 2023 • Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran

Many top-down architectures for instance segmentation achieve significant success when trained and tested on a pre-defined closed-world taxonomy.

Open-World Instance Segmentation, Segmentation (+1 more)

Learning Space-Time Semantic Correspondences

no code implementations • 16 Jun 2023 • Du Tran, Jitendra Malik

We propose a new task of space-time semantic correspondence prediction in videos.

Imitation Learning, Semantic correspondence (+1 more)
