Search Results for author: Joseph Tighe

Found 21 papers, 4 papers with code

Multi-Object Tracking with Hallucinated and Unlabeled Videos

no code implementations19 Aug 2021 Daniel McKee, Bing Shuai, Andrew Berneshawi, Manchen Wang, Davide Modolo, Svetlana Lazebnik, Joseph Tighe

Next, to tackle harder tracking cases, we mine hard examples across an unlabeled pool of real videos with a tracker trained on our hallucinated video data.

Multi-Object Tracking
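
The mining step described above can be sketched as a simple scoring loop. A minimal sketch, assuming the tracker exposes per-track association confidences; the interface (callable tracker, `.scores` attribute) is hypothetical, not the authors' actual code:

```python
def mine_hard_examples(tracker, unlabeled_videos, keep_ratio=0.1):
    """Keep the clips the current tracker handles worst (hypothetical API)."""
    scored = []
    for video in unlabeled_videos:
        tracks = tracker(video)  # assumed: returns tracks with a .scores list
        confidences = [s for t in tracks for s in t.scores]
        # Low mean association confidence ~ harder tracking case.
        mean_conf = sum(confidences) / max(len(confidences), 1)
        scored.append((mean_conf, video))
    scored.sort(key=lambda pair: pair[0])           # hardest (lowest) first
    n_keep = max(1, int(keep_ratio * len(scored)))
    return [video for _, video in scored[:n_keep]]
```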

Single View Physical Distance Estimation using Human Pose

no code implementations ICCV 2021 Xiaohan Fei, Henry Wang, Xiangyu Zeng, Lin Lee Cheong, Meng Wang, Joseph Tighe

We propose a fully automated system that simultaneously estimates the camera intrinsics, the ground plane, and physical distances between people from a single RGB image or video captured by a camera viewing a 3-D scene from a fixed vantage point.
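
Once the intrinsics and ground plane are estimated, the measurement step the abstract implies reduces to ray-plane intersection. A minimal geometric sketch, assuming a pinhole camera with intrinsics K and a ground plane n . X = d in camera coordinates; the paper's estimation pipeline is not reproduced:

```python
import numpy as np

def pixel_to_ground(pixel, K, n, d):
    """Intersect the camera ray through `pixel` with the ground plane n.X = d."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    t = d / (n @ ray)          # ray-plane intersection parameter
    return t * ray             # 3-D point in camera coordinates

def physical_distance(foot_a, foot_b, K, n, d):
    pa = pixel_to_ground(foot_a, K, n, d)
    pb = pixel_to_ground(foot_b, K, n, d)
    return float(np.linalg.norm(pa - pb))

# Toy values, purely illustrative.
K = np.array([[1000., 0., 640.], [0., 1000., 360.], [0., 0., 1.]])
n = np.array([0., -1., 0.2]); n = n / np.linalg.norm(n)
print(physical_distance((500, 600), (800, 620), K, n, d=1.6))
```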

MoDist: Motion Distillation for Self-supervised Video Representation Learning

no code implementations17 Jun 2021 Fanyi Xiao, Joseph Tighe, Davide Modolo

We present MoDist as a novel method to explicitly distill motion information into self-supervised video representations.

Action Detection Action Recognition +2
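
The core idea, distilling motion information into an RGB encoder, can be illustrated with a generic distillation objective. A hedged sketch using a cosine loss against a frozen motion (e.g. optical-flow) encoder; MoDist's actual objective and architecture are not reproduced here:

```python
import torch
import torch.nn.functional as F

def motion_distillation_loss(rgb_features, motion_features):
    """Pull RGB clip embeddings toward the (frozen) motion embeddings."""
    rgb = F.normalize(rgb_features, dim=-1)
    mot = F.normalize(motion_features.detach(), dim=-1)  # teacher is frozen
    return (1.0 - (rgb * mot).sum(dim=-1)).mean()

rgb_feat = torch.randn(8, 512, requires_grad=True)   # student output
motion_feat = torch.randn(8, 512)                    # teacher output
loss = motion_distillation_loss(rgb_feat, motion_feat)
loss.backward()
```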

VidTr: Video Transformer Without Convolutions

no code implementations ICCV 2021 Yanyi Zhang, Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, Ivan Marsic, Joseph Tighe

We first introduce the vanilla video transformer and show that the transformer module is able to perform spatio-temporal modeling from raw pixels, but with heavy memory usage.

Action Classification Action Recognition
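
The "vanilla video transformer" baseline mentioned in the abstract can be sketched as below; attention cost grows with the square of the token count, which is the memory issue the paper then addresses. Dimensions are illustrative, not VidTr's configuration:

```python
import torch
import torch.nn as nn

class VanillaVideoTransformer(nn.Module):
    def __init__(self, dim=256, patch=16, frames=8, size=112, classes=400):
        super().__init__()
        # Embed each frame's 16x16 patches as spatio-temporal tokens.
        self.embed = nn.Conv3d(3, dim, kernel_size=(1, patch, patch),
                               stride=(1, patch, patch))
        n_tokens = frames * (size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, classes)

    def forward(self, video):                    # (B, 3, T, H, W)
        tokens = self.embed(video).flatten(2).transpose(1, 2)
        tokens = self.encoder(tokens + self.pos)
        return self.head(tokens.mean(dim=1))     # mean-pool, then classify

logits = VanillaVideoTransformer()(torch.randn(2, 3, 8, 112, 112))
```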

TubeR: Tube-Transformer for Action Detection

no code implementations 2 Apr 2021 Jiaojiao Zhao, Xinyu Li, Chunhui Liu, Bing Shuai, Hao Chen, Cees G. M. Snoek, Joseph Tighe

In this paper, we propose TubeR: the first transformer-based network for end-to-end action detection, with an encoder and decoder optimized for modeling action tubes with variable lengths and aspect ratios.

Action Detection Video Understanding
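
The encoder-decoder design for tubes is reminiscent of DETR-style detection. A rough, assumption-laden sketch in which learned tube queries attend over encoded video features and are decoded into per-frame boxes plus an action class; this is an analogy, not TubeR's published model:

```python
import torch
import torch.nn as nn

class TubeDecoderSketch(nn.Module):
    def __init__(self, dim=256, n_queries=15, frames=8, classes=80):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.box_head = nn.Linear(dim, frames * 4)   # one box per frame
        self.cls_head = nn.Linear(dim, classes)

    def forward(self, memory):          # (B, tokens, dim) encoder output
        q = self.queries.unsqueeze(0).expand(memory.size(0), -1, -1)
        h = self.decoder(q, memory)     # tube queries attend over the video
        boxes = self.box_head(h).sigmoid()  # normalized per-frame boxes
        return boxes, self.cls_head(h)

boxes, logits = TubeDecoderSketch()(torch.randn(2, 392, 256))
```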

Selective Feature Compression for Efficient Activity Recognition Inference

no code implementations ICCV 2021 Chunhui Liu, Xinyu Li, Hao Chen, Davide Modolo, Joseph Tighe

In this work, we focus on improving the inference efficiency of current action recognition backbones on trimmed videos, and illustrate that one action model can still cover the informative region by dropping non-informative features.

Action Recognition
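
The "dropping non-informative features" idea can be illustrated with a simple top-k selection on mid-network features. This activation-magnitude heuristic is only a stand-in; the paper's actual compression module differs:

```python
import torch

def compress_features(features, keep_ratio=0.25):
    """features: (B, tokens, C) mid-network activations; keep the most active."""
    scores = features.norm(dim=-1)                        # (B, tokens)
    k = max(1, int(keep_ratio * features.size(1)))
    idx = scores.topk(k, dim=1).indices                   # top-k token indices
    idx = idx.unsqueeze(-1).expand(-1, -1, features.size(-1))
    return features.gather(1, idx)                        # (B, k, C)

kept = compress_features(torch.randn(2, 392, 256))
print(kept.shape)  # torch.Size([2, 98, 256]): 4x fewer tokens downstream
```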

NUTA: Non-uniform Temporal Aggregation for Action Recognition

no code implementations15 Dec 2020 Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe

In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video.

Action Recognition

Exploiting weakly supervised visual patterns to learn from partial annotations

no code implementations NeurIPS 2020 Kaustav Kundu, Joseph Tighe

Ignoring these un-annotated labels results in a loss of supervisory signal, which reduces the performance of classification models.

General Classification Panoptic Segmentation
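
To make "ignoring un-annotated labels" concrete: in multi-label training with partial annotations, the baseline masks unknown entries out of the loss, so those classes contribute no gradient at all. A minimal sketch of that baseline (which the paper improves on by inferring supervision for the masked entries):

```python
import torch
import torch.nn.functional as F

def partial_bce(logits, targets, annotated_mask):
    """targets/annotated_mask: (B, classes); mask = 1 where the label is known."""
    per_label = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    # Un-annotated entries (mask = 0) are dropped from the average.
    return (per_label * annotated_mask).sum() / annotated_mask.sum().clamp(min=1)

logits = torch.randn(4, 10)
targets = torch.randint(0, 2, (4, 10)).float()
mask = torch.randint(0, 2, (4, 10)).float()   # 0 marks un-annotated labels
loss = partial_bce(logits, targets, mask)
```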

Directional Temporal Modeling for Action Recognition

no code implementations ECCV 2020 Xinyu Li, Bing Shuai, Joseph Tighe

Many current activity recognition models use 3D convolutional neural networks (e.g., I3D, I3D-NL) to generate local spatial-temporal features.

Action Recognition

Multi-Object Tracking with Siamese Track-RCNN

no code implementations16 Apr 2020 Bing Shuai, Andrew G. Berneshawi, Davide Modolo, Joseph Tighe

Multi-object tracking systems often consist of a combination of a detector, a short term linker, a re-identification feature extractor and a solver that takes the output from these separate components and makes a final prediction.

Multi-Object Tracking
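
The multi-component pipeline the abstract contrasts with Track-RCNN's unified design can be wired up schematically as below; all four callables are hypothetical placeholders, not any real tracker's API:

```python
def classic_mot_pipeline(frames, detect, link, embed, solve):
    """Detector -> re-ID embedder -> short-term linker -> global solver."""
    tracklets = []
    for t, frame in enumerate(frames):
        detections = detect(frame)                   # per-frame boxes
        for det in detections:
            det["emb"] = embed(frame, det["box"])    # re-ID feature
            det["t"] = t
        tracklets = link(tracklets, detections)      # short-term association
    return solve(tracklets)                          # final track assembly
```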

Combining detection and tracking for human pose estimation in videos

no code implementations CVPR 2020 Manchen Wang, Joseph Tighe, Davide Modolo

Our approach consists of three components: (i) a Clip Tracking Network that performs body joint detection and tracking simultaneously on small video clips; (ii) a Video Tracking Pipeline that merges the fixed-length tracklets produced by the Clip Tracking Network to arbitrary length tracks; and (iii) a Spatial-Temporal Merging procedure that refines the joint locations based on spatial and temporal smoothing terms.

Pose Estimation Pose Tracking
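
Component (iii) can be illustrated with a toy temporal smoother over per-joint trajectories. A simple moving average stands in for the paper's spatial-temporal merging procedure, which this does not reproduce:

```python
import numpy as np

def smooth_joint_track(joints, window=5):
    """joints: (T, J, 2) pixel coordinates for one person track."""
    pad = window // 2
    padded = np.pad(joints, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    out = np.empty_like(joints, dtype=float)
    for j in range(joints.shape[1]):          # each body joint
        for c in range(2):                    # x and y coordinates
            out[:, j, c] = np.convolve(padded[:, j, c], kernel, mode="valid")
    return out

smoothed = smooth_joint_track(np.random.rand(30, 17, 2))
```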

Understanding the impact of mistakes on background regions in crowd counting

no code implementations30 Mar 2020 Davide Modolo, Bing Shuai, Rahul Rama Varior, Joseph Tighe

Our results show that (i) mistakes on background are substantial and they are responsible for 18-49% of the total error, (ii) models do not generalize well to different kinds of backgrounds and perform poorly on completely background images, and (iii) models make many more mistakes than those captured by the standard Mean Absolute Error (MAE) metric, as counting on background compensates considerably for misses on foreground.

Crowd Counting
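
Finding (iii), that counting on background compensates for misses on foreground, is easy to verify with a worked toy example (numbers are made up, not from the paper):

```python
gt_fg, pred_fg = 100.0, 85.0    # 15 people missed on foreground
pred_bg = 12.0                  # 12 spurious "people" counted on background

mae = abs((pred_fg + pred_bg) - gt_fg)          # standard metric: 3.0
true_error = abs(pred_fg - gt_fg) + pred_bg     # decomposed error: 27.0
print(mae, true_error)  # MAE reports 3 where 27 counts are actually wrong
```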

Action recognition with spatial-temporal discriminative filter banks

no code implementations ICCV 2019 Brais Martinez, Davide Modolo, Yuanjun Xiong, Joseph Tighe

In this work we focus on how to improve the representation capacity of the network, but rather than altering the backbone, we focus on improving the last layers of the network, where changes have low impact in terms of computational cost.

Ranked #15 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Action Recognition
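
The "improve the last layers" idea can be sketched as a bank of 1x1 convolutions acting as class-specific local detectors on the final feature map, whose max responses supplement the usual global pooling. Sizes are illustrative and this is only a guess at the structure, not the paper's exact head:

```python
import torch
import torch.nn as nn

class DiscriminativeFilterBankHead(nn.Module):
    def __init__(self, channels=2048, classes=174, filters_per_class=2):
        super().__init__()
        self.bank = nn.Conv3d(channels, classes * filters_per_class, 1)
        self.classes, self.k = classes, filters_per_class

    def forward(self, feats):                     # (B, C, T, H, W)
        resp = self.bank(feats)                   # local filter responses
        resp = resp.flatten(2).max(dim=2).values  # strongest local evidence
        return resp.view(-1, self.classes, self.k).mean(dim=2)

logits = DiscriminativeFilterBankHead()(torch.randn(2, 2048, 4, 7, 7))
```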

Multi-Scale Attention Network for Crowd Counting

no code implementations17 Jan 2019 Rahul Rama Varior, Bing Shuai, Joseph Tighe, Davide Modolo

In crowd counting datasets, people appear at different scales, depending on their distance from the camera.

Crowd Counting Hierarchical structure

Scene Parsing with Object Instances and Occlusion Ordering

no code implementations CVPR 2014 Joseph Tighe, Marc Niethammer, Svetlana Lazebnik

This work proposes a method to interpret a scene by assigning a semantic label at every pixel and inferring the spatial extent of individual object instances together with their occlusion relationships.

Scene Parsing

Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

no code implementations CVPR 2013 Joseph Tighe, Svetlana Lazebnik

This paper presents a system for image parsing, or labeling each pixel in an image with its semantic category, aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled.
