no code implementations • 23 Jan 2025 • Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli
Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives.
1 code implementation • 3 Jan 2025 • Wenyan Cong, Kevin Wang, Jiahui Lei, Colton Stearns, Yuanhao Cai, Dilin Wang, Rakesh Ranjan, Matt Feiszli, Leonidas Guibas, Zhangyang Wang, Weiyao Wang, Zhiwen Fan
Efficiently reconstructing accurate 3D models from monocular video is a key challenge in computer vision, critical for advancing applications in virtual reality, robotics, and scene understanding.
no code implementations • 9 Oct 2024 • Yunzhi Lin, Yipu Zhao, Fu-Jen Chu, Xingyu Chen, Weiyao Wang, Hao Tang, Patricio A. Vela, Matt Feiszli, Kevin Liang
To address the challenge of short-term object pose tracking in dynamic environments with monocular RGB input, we introduce a large-scale synthetic dataset OmniPose6D, crafted to mirror the diversity of real-world conditions.
no code implementations • 16 Aug 2024 • Hao Tang, Weiyao Wang, Pierre Gleize, Matt Feiszli
Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution.
no code implementations • 12 Jul 2024 • Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, Matt Feiszli, James M. Rehg
While substantial progress has been made in 2D object part segmentation, the 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect.
no code implementations • CVPR 2024 • Weiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli
Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images.
no code implementations • CVPR 2024 • YuHan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, Lorenzo Torresani, Effrosyni Mavroudi
In contrast we propose ROSA a weakly-supervised pixel-level grounding framework learning alignments between referred objects and segmentation mask proposals.
no code implementations • 29 Aug 2023 • Tim Meinhardt, Matt Feiszli, Yuchen Fan, Laura Leal-Taixe, Rakesh Ranjan
Until recently, the Video Instance Segmentation (VIS) community operated under the common belief that offline methods are generally superior to a frame by frame online processing.
Ranked #10 on
Video Instance Segmentation
on YouTube-VIS 2021
(using extra training data)
2 code implementations • 12 Apr 2023 • Pierre Gleize, Weiyao Wang, Matt Feiszli
Keypoint detection & descriptors are foundational tech-nologies for computer vision tasks like image matching, 3D reconstruction and visual odometry.
no code implementations • 16 Feb 2023 • Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran
Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.
no code implementations • ICCV 2023 • Pierre Gleize, Weiyao Wang, Matt Feiszli
Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry.
no code implementations • CVPR 2023 • Xitong Yang, Fu-Jen Chu, Matt Feiszli, Raghav Goyal, Lorenzo Torresani, Du Tran
In this paper, we propose to study these problems in a joint framework for long video understanding.
no code implementations • ICCV 2023 • Peri Akiva, Jing Huang, Kevin J Liang, Rama Kovvuri, Xingyu Chen, Matt Feiszli, Kristin Dana, Tal Hassner
Understanding the visual world from the perspective of humans (egocentric) has been a long-standing challenge in computer vision.
1 code implementation • CVPR 2022 • Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran
From PA we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks we train GGNs and significantly outperform the SOTA on open-world instance segmentation on various benchmarks including COCO, LVIS, ADE20K, and UVO.
1 code implementation • 1 Apr 2022 • Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou
In this paper, we introduce a new dataset called Kinetic-GEB+.
Ranked #1 on
Boundary Captioning
on Kinetics-GEB+
1 code implementation • 18 Nov 2021 • Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer
We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.
no code implementations • ICCV 2021 • Xinyu Gong, Heng Wang, Zheng Shou, Matt Feiszli, Zhangyang Wang, Zhicheng Yan
We design a multivariate search space, including 6 search variables to capture a wide variety of choices in designing two-stream models.
no code implementations • ICCV 2021 • Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran
Current state-of-the-art object detection and segmentation methods work well under the closed-world assumption.
2 code implementations • ICCV 2021 • Mike Zheng Shou, Stan Weixian Lei, Weiyao Wang, Deepti Ghadiyaram, Matt Feiszli
This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
no code implementations • CVPR 2021 • Zhicheng Yan, Xiaoliang Dai, Peizhao Zhang, Yuandong Tian, Bichen Wu, Matt Feiszli
Furthermore, to search fast in the multi-variate space, we propose a coarse-to-fine strategy by using a factorized distribution at the beginning which can reduce the number of architecture parameters by over an order of magnitude.
1 code implementation • ECCV 2020 • Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou
To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action.
Ranked #5 on
Weakly Supervised Action Localization
on BEOID
1 code implementation • CVPR 2020 • Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram
Our key idea is to decorrelate feature representations of a category from its co-occurring context.
no code implementations • 19 Jul 2019 • Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani
However, in current video datasets it has been observed that action classes can often be recognized without any temporal information from a single frame of video.
no code implementations • 10 Jun 2019 • Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang
FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities.
Ranked #28 on
Action Recognition
on UCF101
no code implementations • CVPR 2020 • Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli
Motion is a salient cue to recognize actions in video.
Ranked #112 on
Action Classification
on Kinetics-400
3 code implementations • CVPR 2020 • Wei-Yao Wang, Du Tran, Matt Feiszli
Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart.
Ranked #1 on
Action Recognition In Videos
on miniSports
3 code implementations • CVPR 2019 • Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan
Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?
Ranked #2 on
Egocentric Activity Recognition
on EPIC-KITCHENS-55
(Actions Top-1 (S2) metric)
7 code implementations • ICCV 2019 • Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli
It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.
Ranked #1 on
Action Recognition
on Sports-1M
no code implementations • ECCV 2018 • Jamie Ray, Heng Wang, Du Tran, YuFei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri
The videos retrieved by the search engines are then veried for correctness by human annotators.
no code implementations • 25 May 2017 • Matt Feiszli
As any generative model induces a probability density on its output domain, we propose studying this density directly.