no code implementations • 25 May 2017 • Matt Feiszli
As any generative model induces a probability density on its output domain, we propose studying this density directly.
no code implementations • ECCV 2018 • Jamie Ray, Heng Wang, Du Tran, YuFei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri
The videos retrieved by the search engines are then verified for correctness by human annotators.
7 code implementations • ICCV 2019 • Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli
It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.
Ranked #1 on Action Recognition on Sports-1M
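The computation question posed above can be made concrete by counting weights: a standard 3D convolution mixes all input channels, while a group convolution with G groups splits the channels into G independent sets, shrinking the weight tensor by a factor of G. A minimal sketch in plain Python, with hypothetical layer sizes (not taken from the paper):

```python
def conv3d_params(c_in, c_out, kt, kh, kw, groups=1):
    """Weight count of a 3D convolution (bias ignored).

    With groups > 1, each output channel only sees c_in // groups
    input channels, so the weight count drops by a factor of G.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * (c_in // groups) * kt * kh * kw

# Hypothetical layer: 256 -> 256 channels, 3x3x3 spatio-temporal kernel.
dense = conv3d_params(256, 256, 3, 3, 3)                   # standard conv
grouped = conv3d_params(256, 256, 3, 3, 3, groups=8)       # 8 groups
depthwise = conv3d_params(256, 256, 3, 3, 3, groups=256)   # channel-separated

print(dense, grouped, depthwise)  # 1769472 221184 6912
```

Channel-separated designs pair such depthwise 3D convolutions with 1x1x1 convolutions to restore channel mixing at low cost.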
3 code implementations • CVPR 2019 • Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan
Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?
Ranked #2 on Egocentric Activity Recognition on EPIC-KITCHENS-55 (Actions Top-1 (S2) metric)
3 code implementations • CVPR 2020 • Weiyao Wang, Du Tran, Matt Feiszli
Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart.
Ranked #1 on Action Recognition In Videos on miniSports
no code implementations • CVPR 2020 • Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli
Motion is a salient cue to recognize actions in video.
Ranked #107 on Action Classification on Kinetics-400
no code implementations • 10 Jun 2019 • Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang
FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities.
Ranked #26 on Action Recognition on UCF101
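One way to picture this clip-level scheme: run an expensive model on a sparse subset of clips, a cheap model on the rest, and combine the clip predictions into one video-level score. The sketch below uses NumPy with stand-in scorers and plain averaging; the actual FASTER framework learns the aggregation with a recurrent model, so everything here is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def expensive_model(clip):
    # Stand-in for a heavy spatio-temporal network (hypothetical).
    return clip.mean(axis=(0, 1))          # -> per-class logits

def cheap_model(clip):
    # Stand-in for a lightweight network exploiting clip redundancy.
    return clip[::4].mean(axis=(0, 1))     # subsampled frames

def aggregate_clips(clips, period=4):
    """Apply the expensive model to every `period`-th clip, the cheap
    model elsewhere, then average all clip logits into a video score."""
    logits = [expensive_model(c) if i % period == 0 else cheap_model(c)
              for i, c in enumerate(clips)]
    return np.stack(logits).mean(axis=0)

# 12 clips of 8 frames x 16 features, 10 hypothetical classes.
clips = [rng.normal(size=(8, 16, 10)) for _ in range(12)]
video_score = aggregate_clips(clips)
print(video_score.shape)  # (10,)
```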
no code implementations • 19 Jul 2019 • Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani
However, it has been observed that in current video datasets many action classes can be recognized from a single frame, without any temporal information.
1 code implementation • CVPR 2020 • Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram
Our key idea is to decorrelate feature representations of a category from its co-occurring context.
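A simple instantiation of "decorrelating" two feature sets is to penalize their cross-correlation: minimizing the loss below pushes category features toward statistical independence from co-occurring context features. A NumPy sketch; the split into category and context features is assumed here and is not the paper's exact mechanism:

```python
import numpy as np

def cross_correlation_loss(f_cat, f_ctx, eps=1e-8):
    """Mean squared cross-correlation between two feature batches.

    f_cat: (N, D1) category features, f_ctx: (N, D2) context features.
    Each column is standardized, then all D1*D2 correlations are penalized.
    """
    a = (f_cat - f_cat.mean(0)) / (f_cat.std(0) + eps)
    b = (f_ctx - f_ctx.mean(0)) / (f_ctx.std(0) + eps)
    corr = a.T @ b / len(a)          # (D1, D2) correlation matrix
    return float((corr ** 2).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(512, 8))
loss_indep = cross_correlation_loss(x, rng.normal(size=(512, 8)))
loss_dep = cross_correlation_loss(x, x + 0.01 * rng.normal(size=(512, 8)))
print(loss_indep, loss_dep)   # strongly correlated features score much higher
```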
1 code implementation • ECCV 2020 • Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou
To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action.
Ranked #5 on Weakly Supervised Action Localization on BEOID
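Single-frame supervision can be represented as (video, timestamp, label) triples; a trivial baseline then expands each labeled frame into a short pseudo-segment for training a localizer. A pure-Python sketch in which the window size is a made-up hyperparameter; SF-Net itself mines pseudo action and background frames rather than using fixed windows:

```python
from dataclasses import dataclass

@dataclass
class FrameLabel:
    video_id: str
    t: float        # annotated timestamp (seconds)
    action: str

def expand_to_segments(labels, half_window=1.5, duration=None):
    """Turn each single labeled frame into a [t-w, t+w] pseudo-segment,
    clipped to the video duration when it is known."""
    segs = []
    for lab in labels:
        start = max(0.0, lab.t - half_window)
        end = lab.t + half_window
        if duration is not None:
            end = min(end, duration)
        segs.append((lab.video_id, start, end, lab.action))
    return segs

labels = [FrameLabel("v1", 4.2, "open door"), FrameLabel("v1", 0.5, "pick up")]
print(expand_to_segments(labels, duration=30.0))
```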
no code implementations • CVPR 2021 • Zhicheng Yan, Xiaoliang Dai, Peizhao Zhang, Yuandong Tian, Bichen Wu, Matt Feiszli
Furthermore, to search quickly in the multivariate space, we propose a coarse-to-fine strategy that begins with a factorized distribution, which can reduce the number of architecture parameters by over an order of magnitude.
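The saving from factorization is easy to see: a joint categorical distribution over k search variables with n choices each needs n**k probabilities, while a product of independent per-variable categoricals needs only k*n. A worked sketch with a hypothetical search space (six variables, five choices each; not the paper's actual space):

```python
import math

def joint_params(choices_per_var):
    # Full joint categorical: one probability per combination of choices.
    return math.prod(choices_per_var)

def factorized_params(choices_per_var):
    # Independent categoricals: one small probability table per variable.
    return sum(choices_per_var)

space = [5] * 6   # hypothetical: 6 search variables, 5 options each
print(joint_params(space), factorized_params(space))  # 15625 30
```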
2 code implementations • ICCV 2021 • Mike Zheng Shou, Stan Weixian Lei, Weiyao Wang, Deepti Ghadiyaram, Matt Feiszli
This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
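Evaluation for taxonomy-free boundaries is typically done by matching predicted and ground-truth boundary timestamps within a tolerance window and reporting F1. Below is one plausible greedy matcher; the absolute tolerance and one-to-one matching rule are assumptions for illustration, not necessarily the benchmark's official protocol (which uses a relative tolerance):

```python
def boundary_f1(pred, gt, tol=0.25):
    """Greedy one-to-one matching of boundary timestamps within `tol` seconds."""
    unmatched = list(gt)
    tp = 0
    for p in sorted(pred):
        hit = next((g for g in unmatched if abs(g - p) <= tol), None)
        if hit is not None:
            unmatched.remove(hit)   # each ground truth matched at most once
            tp += 1
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gt) if gt else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Three of four ground-truth boundaries recovered within tolerance.
print(boundary_f1(pred=[1.0, 3.1, 7.0], gt=[1.1, 3.0, 5.0, 7.2]))
```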
no code implementations • ICCV 2021 • Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran
Current state-of-the-art object detection and segmentation methods work well under the closed-world assumption.
no code implementations • ICCV 2021 • Xinyu Gong, Heng Wang, Zheng Shou, Matt Feiszli, Zhangyang Wang, Zhicheng Yan
We design a multivariate search space, including 6 search variables to capture a wide variety of choices in designing two-stream models.
1 code implementation • 18 Nov 2021 • Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer
We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.
1 code implementation • 1 Apr 2022 • Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou
In this paper, we introduce a new dataset called Kinetic-GEB+.
Ranked #1 on Boundary Captioning on Kinetics-GEB+
1 code implementation • CVPR 2022 • Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran
From pairwise affinities (PA) we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks, we train GGNs that significantly outperform the SOTA on open-world instance segmentation across benchmarks including COCO, LVIS, ADE20K, and UVO.
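The grouping step can be illustrated on an affinity graph: threshold pairwise affinities between pixels and take connected components as pseudo instance masks. A union-find sketch with synthetic affinity values; the actual GGN pipeline learns the affinities and uses a more sophisticated grouping than a hard threshold:

```python
def connected_groups(n_pixels, edges, thresh=0.5):
    """Union-find grouping: `edges` is a list of (i, j, affinity) between
    pixel indices; pairs with affinity > thresh are merged into one group."""
    parent = list(range(n_pixels))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j, a in edges:
        if a > thresh:
            parent[find(i)] = find(j)

    roots = [find(i) for i in range(n_pixels)]
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]

# Synthetic 1x6 "image": strong affinity within {0,1,2} and {4,5}, weak elsewhere.
edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.1), (3, 4, 0.2), (4, 5, 0.95)]
print(connected_groups(6, edges))  # [0, 0, 0, 1, 2, 2]
```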
no code implementations • CVPR 2023 • Xitong Yang, Fu-Jen Chu, Matt Feiszli, Raghav Goyal, Lorenzo Torresani, Du Tran
In this paper, we propose to study these problems in a joint framework for long video understanding.
no code implementations • ICCV 2023 • Pierre Gleize, Weiyao Wang, Matt Feiszli
Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry.
no code implementations • ICCV 2023 • Peri Akiva, Jing Huang, Kevin J Liang, Rama Kovvuri, Xingyu Chen, Matt Feiszli, Kristin Dana, Tal Hassner
Understanding the visual world from the perspective of humans (egocentric) has been a long-standing challenge in computer vision.
no code implementations • 16 Feb 2023 • Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran
Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.
1 code implementation • 12 Apr 2023 • Pierre Gleize, Weiyao Wang, Matt Feiszli
Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry.
no code implementations • 29 Aug 2023 • Tim Meinhardt, Matt Feiszli, Yuchen Fan, Laura Leal-Taixe, Rakesh Ranjan
Until recently, the Video Instance Segmentation (VIS) community operated under the common belief that offline methods are generally superior to frame-by-frame online processing.
Ranked #5 on Video Instance Segmentation on YouTube-VIS validation (using extra training data)
no code implementations • 17 Jan 2024 • Weiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli
Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images.