Search Results for author: Matt Feiszli

Found 21 papers, 10 papers with code

SiLK -- Simple Learned Keypoints

1 code implementation · 12 Apr 2023 · Pierre Gleize, Weiyao Wang, Matt Feiszli

Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction, and visual odometry.

3D Reconstruction · Homography Estimation · +3

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

no code implementations · 16 Feb 2023 · Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran

Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.

Action Detection · Video Grounding · +1

EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset

no code implementations · 9 Jan 2023 · Hao Tang, Kevin Liang, Matt Feiszli, Weiyao Wang

We thus introduce EgoTracks, a new dataset for long-term egocentric visual object tracking.

Visual Object Tracking

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

1 code implementation · CVPR 2022 · Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran

From PA (Pairwise Affinity) we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks, we train GGNs and significantly outperform the SOTA on open-world instance segmentation on various benchmarks including COCO, LVIS, ADE20K, and UVO.

Instance Segmentation · Semantic Segmentation

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation · 18 Nov 2021 · Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Self-Supervised Learning · Video Understanding

Generic Event Boundary Detection: A Benchmark for Event Segmentation

2 code implementations · ICCV 2021 · Mike Zheng Shou, Stan Weixian Lei, Weiyao Wang, Deepti Ghadiyaram, Matt Feiszli

This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.

Action Detection · Boundary Detection · +3
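The task above can be made concrete with a toy sketch. This is not the paper's detector: the frame "features", the cosine-similarity cue, and the threshold are all invented here purely to illustrate what "taxonomy-free event boundaries" means, i.e. marking a boundary wherever consecutive frames stop resembling each other.

```python
def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def detect_boundaries(frames, threshold=0.5):
    # Flag a boundary before frame t when its similarity
    # to frame t-1 drops below the threshold.
    return [t for t in range(1, len(frames))
            if cosine(frames[t - 1], frames[t]) < threshold]

# Toy "video": three similar frames, then an abrupt change of content.
frames = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
print(detect_boundaries(frames))  # -> [3]
```

The real benchmark scores predicted boundaries against human annotations; this sketch only shows the shape of the problem.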

FP-NAS: Fast Probabilistic Neural Architecture Search

no code implementations · CVPR 2021 · Zhicheng Yan, Xiaoliang Dai, Peizhao Zhang, Yuandong Tian, Bichen Wu, Matt Feiszli

Furthermore, to search fast in the multi-variate space, we propose a coarse-to-fine strategy by using a factorized distribution at the beginning which can reduce the number of architecture parameters by over an order of magnitude.

Neural Architecture Search
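Why a factorized distribution shrinks the parameter count can be seen with simple arithmetic. The numbers below (10 layers, 5 candidate ops) are hypothetical, not FP-NAS's actual search space: a joint distribution needs one probability per whole architecture, while a factorized one needs only one categorical per layer.

```python
def joint_params(num_layers, num_ops):
    # Joint distribution: one probability per complete architecture.
    return num_ops ** num_layers

def factorized_params(num_layers, num_ops):
    # Factorized distribution: one categorical (num_ops entries) per layer.
    return num_layers * num_ops

L, K = 10, 5  # hypothetical search space: 10 layers, 5 candidate ops each
print(joint_params(L, K), factorized_params(L, K))  # -> 9765625 50
```

The gap grows exponentially with depth, which is why starting coarse (factorized) and refining later is attractive.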

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

no code implementations · 19 Jul 2019 · Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

However, it has been observed that in current video datasets, action classes can often be recognized from a single frame, without any temporal information.

Benchmarking · Motion Estimation · +1

FASTER Recurrent Networks for Efficient Video Classification

no code implementations · 10 Jun 2019 · Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang

FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities.

Action Classification · Action Recognition · +3
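The "aggregate predictions from models of different complexities" idea can be sketched in a few lines. FASTER actually learns the aggregation with a recurrent network; the fixed schedule and plain averaging below, along with the toy models, are stand-ins for illustration only.

```python
def faster_style_aggregate(clips, heavy, light, period=4):
    # Run the expensive model on every `period`-th clip, a cheap one on
    # the rest, then average per-clip class scores across the video.
    preds = [heavy(c) if i % period == 0 else light(c)
             for i, c in enumerate(clips)]
    n_classes = len(preds[0])
    return [sum(p[k] for p in preds) / len(preds) for k in range(n_classes)]

# Toy models over 2 classes: "heavy" is confident, "light" is cheap but noisy.
heavy = lambda clip: [0.9, 0.1]
light = lambda clip: [0.6, 0.4]
scores = faster_style_aggregate(range(8), heavy, light)
print(scores)  # class 0 wins despite most clips using the cheap model
```

The point of the design is that neighboring clips are redundant, so paying full compute on only a fraction of them sacrifices little accuracy.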

What Makes Training Multi-Modal Classification Networks Hard?

3 code implementations · CVPR 2020 · Weiyao Wang, Du Tran, Matt Feiszli

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart.

Action Classification · Action Recognition In Videos · +4
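One way the paper addresses the puzzle above is to train per-modality heads alongside the fused head and blend their losses. The paper estimates the blending weights from each head's overfitting behavior; the fixed weights and loss values below are purely illustrative placeholders.

```python
def blended_loss(losses, weights):
    # Weighted sum of per-head training losses (e.g. audio-only,
    # video-only, and the fused audio-visual head).
    assert set(losses) == set(weights)
    return sum(weights[k] * losses[k] for k in losses)

# Hypothetical per-head losses and hand-picked weights for illustration.
losses = {"audio": 1.2, "video": 0.8, "fused": 0.5}
weights = {"audio": 0.2, "video": 0.3, "fused": 0.5}
print(blended_loss(losses, weights))
```

Keeping the single-modality heads in the objective stops the fused branch from overfitting to the easier modality, which is the failure mode the abstract alludes to.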

Large-scale weakly-supervised pre-training for video action recognition

3 code implementations · CVPR 2019 · Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan

Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?

Ranked #2 on Egocentric Activity Recognition on EPIC-KITCHENS-55 (Actions Top-1 (S2) metric)

Action Classification · Action Recognition · +3

Video Classification with Channel-Separated Convolutional Networks

6 code implementations · ICCV 2019 · Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli

It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.

Action Classification · Action Recognition · +3
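The computational appeal of group convolution in question 1) is simple weight-count arithmetic. The sketch below computes the parameter count of a 3D convolution for a dense baseline versus the fully channel-separated (depthwise, groups = channels) extreme; the channel sizes are an arbitrary example, not a configuration from the paper.

```python
def conv3d_params(c_in, c_out, k=3, groups=1):
    # Weight count of a k x k x k 3D convolution with grouping (bias omitted):
    # each output channel only sees c_in / groups input channels.
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k ** 3

dense = conv3d_params(64, 64)                 # standard 3x3x3 convolution
depthwise = conv3d_params(64, 64, groups=64)  # fully channel-separated
print(dense, depthwise, dense // depthwise)   # -> 110592 1728 64
```

Channel-separated networks pair such cheap spatiotemporal convolutions with 1x1x1 convolutions that mix channels, trading a small accuracy cost for a large compute saving.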

Latent Geometry and Memorization in Generative Models

no code implementations · 25 May 2017 · Matt Feiszli

As any generative model induces a probability density on its output domain, we propose studying this density directly.
