Search Results for author: Matt Feiszli

Found 30 papers, 11 papers with code

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

no code implementations 23 Jan 2025 Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli

Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives.

3D Reconstruction Camera Pose Estimation +2

VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment

1 code implementation 3 Jan 2025 Wenyan Cong, Kevin Wang, Jiahui Lei, Colton Stearns, Yuanhao Cai, Dilin Wang, Rakesh Ranjan, Matt Feiszli, Leonidas Guibas, Zhangyang Wang, Weiyao Wang, Zhiwen Fan

Efficiently reconstructing accurate 3D models from monocular video is a key challenge in computer vision, critical for advancing applications in virtual reality, robotics, and scene understanding.

Computational Efficiency Scene Understanding

OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB

no code implementations 9 Oct 2024 Yunzhi Lin, Yipu Zhao, Fu-Jen Chu, Xingyu Chen, Weiyao Wang, Hao Tang, Patricio A. Vela, Matt Feiszli, Kevin Liang

To address the challenge of short-term object pose tracking in dynamic environments with monocular RGB input, we introduce a large-scale synthetic dataset OmniPose6D, crafted to mirror the diversity of real-world conditions.

Benchmarking Diversity +2

ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation

no code implementations 16 Aug 2024 Hao Tang, Weiyao Wang, Pierre Gleize, Matt Feiszli

Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution.

Camera Pose Estimation Pose Estimation

3x2: 3D Object Part Segmentation by 2D Semantic Correspondences

no code implementations 12 Jul 2024 Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, Matt Feiszli, James M. Rehg

While substantial progress has been made in 2D object part segmentation, the 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect.

Object Segmentation

Learning to Segment Referred Objects from Narrated Egocentric Videos

no code implementations CVPR 2024 YuHan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, Lorenzo Torresani, Effrosyni Mavroudi

In contrast, we propose ROSA, a weakly-supervised pixel-level grounding framework that learns alignments between referred objects and segmentation mask proposals.

Object Segmentation +3

NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation

no code implementations 29 Aug 2023 Tim Meinhardt, Matt Feiszli, Yuchen Fan, Laura Leal-Taixe, Rakesh Ranjan

Until recently, the Video Instance Segmentation (VIS) community operated under the common belief that offline methods are generally superior to frame-by-frame online processing.

Ranked #10 on Video Instance Segmentation on YouTube-VIS 2021 (using extra training data)

Instance Segmentation Segmentation +2

SiLK -- Simple Learned Keypoints

2 code implementations 12 Apr 2023 Pierre Gleize, Weiyao Wang, Matt Feiszli

Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry.

3D Reconstruction Camera Pose Estimation +5

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

no code implementations 16 Feb 2023 Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran

Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.

Action Detection Sentence +2

SiLK: Simple Learned Keypoints

no code implementations ICCV 2023 Pierre Gleize, Weiyao Wang, Matt Feiszli

Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry.

3D Reconstruction Camera Pose Estimation +5

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

1 code implementation CVPR 2022 Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran

From PA we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks we train GGNs and significantly outperform the SOTA on open-world instance segmentation on various benchmarks including COCO, LVIS, ADE20K, and UVO.

Diversity Open-World Instance Segmentation +1

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation 18 Nov 2021 Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Deep Learning Self-Supervised Learning +1

Generic Event Boundary Detection: A Benchmark for Event Segmentation

2 code implementations ICCV 2021 Mike Zheng Shou, Stan Weixian Lei, Weiyao Wang, Deepti Ghadiyaram, Matt Feiszli

This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.

Action Detection Boundary Detection +3

FP-NAS: Fast Probabilistic Neural Architecture Search

no code implementations CVPR 2021 Zhicheng Yan, Xiaoliang Dai, Peizhao Zhang, Yuandong Tian, Bichen Wu, Matt Feiszli

Furthermore, to search fast in the multi-variate space, we propose a coarse-to-fine strategy by using a factorized distribution at the beginning which can reduce the number of architecture parameters by over an order of magnitude.

Neural Architecture Search

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

no code implementations 19 Jul 2019 Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

However, in current video datasets it has been observed that action classes can often be recognized without any temporal information from a single frame of video.

Benchmarking Motion Estimation +1

FASTER Recurrent Networks for Efficient Video Classification

no code implementations 10 Jun 2019 Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang

FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities.

Action Classification Action Recognition +3

What Makes Training Multi-Modal Classification Networks Hard?

3 code implementations CVPR 2020 Wei-Yao Wang, Du Tran, Matt Feiszli

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart.

Action Classification Action Recognition In Videos +4

Large-scale weakly-supervised pre-training for video action recognition

3 code implementations CVPR 2019 Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan

Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?

Ranked #2 on Egocentric Activity Recognition on EPIC-KITCHENS-55 (Actions Top-1 (S2) metric)

Action Classification Action Recognition +3

Video Classification with Channel-Separated Convolutional Networks

7 code implementations ICCV 2019 Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli

It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.

Action Classification Action Recognition +3

Latent Geometry and Memorization in Generative Models

no code implementations 25 May 2017 Matt Feiszli

As any generative model induces a probability density on its output domain, we propose studying this density directly.

Memorization
