Search Results for author: Joseph Tighe

Found 34 papers, 8 papers with code

Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

no code implementations • CVPR 2013 • Joseph Tighe, Svetlana Lazebnik

This paper presents a system for image parsing, or labeling each pixel in an image with its semantic category, aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled.

Object

Scene Parsing with Object Instances and Occlusion Ordering

no code implementations • CVPR 2014 • Joseph Tighe, Marc Niethammer, Svetlana Lazebnik

This work proposes a method to interpret a scene by assigning a semantic label at every pixel and inferring the spatial extent of individual object instances together with their occlusion relationships.

Object Scene Parsing +1

Multi-Scale Attention Network for Crowd Counting

no code implementations • 17 Jan 2019 • Rahul Rama Varior, Bing Shuai, Joseph Tighe, Davide Modolo

In crowd counting datasets, people appear at different scales, depending on their distance from the camera.

Crowd Counting

Action recognition with spatial-temporal discriminative filter banks

no code implementations • ICCV 2019 • Brais Martinez, Davide Modolo, Yuanjun Xiong, Joseph Tighe

In this work we focus on how to improve the representation capacity of the network, but rather than altering the backbone, we focus on improving the last layers of the network, where changes have low impact in terms of computational cost.

Ranked #36 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Action Recognition +1

Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

1 code implementation • CVPR 2020 • Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka

Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features.

Ranked #4 on Zero-Shot Action Recognition on ActivityNet

Benchmarking General Classification +3

Combining detection and tracking for human pose estimation in videos

no code implementations • CVPR 2020 • Manchen Wang, Joseph Tighe, Davide Modolo

Our approach consists of three components: (i) a Clip Tracking Network that performs body joint detection and tracking simultaneously on small video clips; (ii) a Video Tracking Pipeline that merges the fixed-length tracklets produced by the Clip Tracking Network to arbitrary length tracks; and (iii) a Spatial-Temporal Merging procedure that refines the joint locations based on spatial and temporal smoothing terms.

Ranked #1 on Pose Tracking on PoseTrack2017

Pose Estimation Pose Tracking

Understanding the impact of mistakes on background regions in crowd counting

no code implementations • 30 Mar 2020 • Davide Modolo, Bing Shuai, Rahul Rama Varior, Joseph Tighe

Our results show that (i) mistakes on background are substantial and they are responsible for 18-49% of the total error, (ii) models do not generalize well to different kinds of backgrounds and perform poorly on completely background images, and (iii) models make many more mistakes than those captured by the standard Mean Absolute Error (MAE) metric, as counting on background compensates considerably for misses on foreground.

Crowd Counting

Multi-Object Tracking with Siamese Track-RCNN

no code implementations • 16 Apr 2020 • Bing Shuai, Andrew G. Berneshawi, Davide Modolo, Joseph Tighe

Multi-object tracking systems often consist of a combination of a detector, a short term linker, a re-identification feature extractor and a solver that takes the output from these separate components and makes a final prediction.

Multi-Object Tracking Object

Directional Temporal Modeling for Action Recognition

no code implementations • ECCV 2020 • Xinyu Li, Bing Shuai, Joseph Tighe

Many current activity recognition models use 3D convolutional neural networks (e.g., I3D, I3D-NL) to generate local spatial-temporal features.

Action Recognition

Exploiting weakly supervised visual patterns to learn from partial annotations

no code implementations • NeurIPS 2020 • Kaustav Kundu, Joseph Tighe

Ignoring these un-annotated labels results in a loss of supervisory signal, which reduces the performance of the classification models.

General Classification Panoptic Segmentation

A Comprehensive Study of Deep Video Action Recognition

1 code implementation • 11 Dec 2020 • Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li

Video action recognition is one of the representative tasks for video understanding.

Action Recognition Temporal Action Localization +1

NUTA: Non-uniform Temporal Aggregation for Action Recognition

no code implementations • 15 Dec 2020 • Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe

In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video.

Action Recognition

Selective Feature Compression for Efficient Activity Recognition Inference

no code implementations • ICCV 2021 • Chunhui Liu, Xinyu Li, Hao Chen, Davide Modolo, Joseph Tighe

In this work, we focus on improving the inference efficiency of current action recognition backbones on trimmed videos, and illustrate that one action model can also cover the informative region by dropping non-informative features.

Action Recognition Feature Compression

TubeR: Tubelet Transformer for Video Action Detection

1 code implementation • CVPR 2022 • Jiaojiao Zhao, Yanyi Zhang, Xinyu Li, Hao Chen, Shuai Bing, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide Modolo, Ivan Marsic, Cees G. M. Snoek, Joseph Tighe

We propose TubeR: a simple solution for spatio-temporal video action detection.

Action Classification Action Detection +2

VidTr: Video Transformer Without Convolutions

no code implementations • ICCV 2021 • Yanyi Zhang, Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, Ivan Marsic, Joseph Tighe

We first introduce the vanilla video transformer and show that the transformer module is able to perform spatio-temporal modeling from raw pixels, but with heavy memory usage.

Ranked #15 on Action Classification on Charades

Action Classification Action Recognition +1

SiamMOT: Siamese Multi-Object Tracking

1 code implementation • CVPR 2021 • Bing Shuai, Andrew Berneshawi, Xinyu Li, Davide Modolo, Joseph Tighe

In this paper, we focus on improving online multi-object tracking (MOT).

Multi-Object Tracking Object +1

SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation

no code implementations • 29 May 2021 • Zhe Wang, Hao Chen, Xinyu Li, Chunhui Liu, Yuanjun Xiong, Joseph Tighe, Charless Fowlkes

However, it is quite expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset.

Action Parsing Action Segmentation +2

MaCLR: Motion-aware Contrastive Learning of Representations for Videos

1 code implementation • 17 Jun 2021 • Fanyi Xiao, Joseph Tighe, Davide Modolo

We present MaCLR, a novel method to explicitly perform cross-modal self-supervised video representation learning from visual and motion modalities.

Action Detection Action Recognition +2

Single View Physical Distance Estimation using Human Pose

no code implementations • ICCV 2021 • Xiaohan Fei, Henry Wang, Xiangyu Zeng, Lin Lee Cheong, Meng Wang, Joseph Tighe

We propose a fully automated system that simultaneously estimates the camera intrinsics, the ground plane, and physical distances between people from a single RGB image or video captured by a camera viewing a 3-D scene from a fixed vantage point.

Camera Calibration

Video Contrastive Learning with Global Context

1 code implementation • 5 Aug 2021 • Haofei Kuang, Yi Zhu, Zhi Zhang, Xinyu Li, Joseph Tighe, Sören Schwertfeger, Cyrill Stachniss, Mu Li

Our formulation is able to capture global context in a video and is thus robust to temporal content change.

Action Classification Action Localization +4

Multi-Object Tracking with Hallucinated and Unlabeled Videos

no code implementations • 19 Aug 2021 • Daniel McKee, Bing Shuai, Andrew Berneshawi, Manchen Wang, Davide Modolo, Svetlana Lazebnik, Joseph Tighe

Next, to tackle harder tracking cases, we mine hard examples across an unlabeled pool of real videos with a tracker trained on our hallucinated video data.

Multi-Object Tracking Object

Id-Free Person Similarity Learning

no code implementations • CVPR 2022 • Bing Shuai, Xinyu Li, Kaustav Kundu, Joseph Tighe

In this work, we explore training such a model using only person box annotations, thus removing the need to manually label a training dataset with additional person identity annotations, which are expensive to collect.

Contrastive Learning Human Detection +2

Transfer of Representations to Video Label Propagation: Implementation Factors Matter

no code implementations • 10 Mar 2022 • Daniel McKee, Zitong Zhan, Bing Shuai, Davide Modolo, Joseph Tighe, Svetlana Lazebnik

This work studies feature representations for dense label propagation in video, with a focus on recently proposed methods that learn video correspondence using self-supervised signals such as colorization or temporal cycle consistency.

Colorization

What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions

no code implementations • CVPR 2022 • A S M Iftekhar, Hao Chen, Kaustav Kundu, Xinyu Li, Joseph Tighe, Davide Modolo

We propose a novel one-stage Transformer-based semantic and spatial refined transformer (SSRT) to solve the Human-Object Interaction detection task, which requires localizing humans and objects and predicting their interactions.

Human-Object Interaction Detection Object

Hierarchical Self-supervised Representation Learning for Movie Understanding

no code implementations • CVPR 2022 • Fanyi Xiao, Kaustav Kundu, Joseph Tighe, Davide Modolo

Most self-supervised video representation learning approaches focus on action recognition.

Action Recognition Contrastive Learning +1

SCVRL: Shuffled Contrastive Video Representation Learning

no code implementations • 24 May 2022 • Michael Dorkenwald, Fanyi Xiao, Biagio Brattoli, Joseph Tighe, Davide Modolo

We propose SCVRL, a novel contrastive-based framework for self-supervised learning for videos.

Contrastive Learning Representation Learning +1

An In-depth Study of Stochastic Backpropagation

1 code implementation • 30 Sep 2022 • Jun Fang, Mingze Xu, Hao Chen, Bing Shuai, Zhuowen Tu, Joseph Tighe

In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks.

Image Classification object-detection +1

Large Scale Real-World Multi-Person Tracking

1 code implementation • ECCV 2022 • Bing Shuai, Alessandro Bergamo, Uta Buechler, Andrew Berneshawi, Alyssa Boden, Joseph Tighe

This paper presents a new large scale multi-person tracking dataset, PersonPath22, which is over an order of magnitude larger than currently available high quality multi-object tracking datasets such as MOT17, HiEve, and MOT20.

Ranked #1 on Multi-Object Tracking on PersonPath22

Multi-Object Tracking

SkeleTR: Towards Skeleton-based Action Recognition in the Wild

no code implementations • ICCV 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo

It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in the wild.

Ranked #2 on Human Interaction Recognition on NTU RGB+D

Action Classification Action Recognition +3

Learning for Transductive Threshold Calibration in Open-World Recognition

no code implementations • 19 May 2023 • Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, Ying Nian Wu, Joseph Tighe, Yifan Xing, Stefano Soatto

In deep metric learning for visual recognition, the calibration of distance thresholds is crucial for achieving desired model performance in the true positive rates (TPR) or true negative rates (TNR).

Metric Learning Open Set Learning

ScaleDet: A Scalable Multi-Dataset Object Detector

no code implementations • CVPR 2023 • Yanbei Chen, Manchen Wang, Abhay Mittal, Zhenlin Xu, Paolo Favaro, Joseph Tighe, Davide Modolo

Our results show that ScaleDet achieves compelling model performance with an mAP of 50.7 on LVIS, 58.8 on COCO, 46.8 on Objects365, 76.2 on OpenImages, and 71.8 on ODinW, surpassing state-of-the-art detectors with the same backbone.

Ranked #1 on Object Detection on OpenImages-v6 (using extra training data)

Object object-detection +1

Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity

no code implementations • 28 Jun 2023 • Zhenlin Xu, Yi Zhu, Tiffany Deng, Abhay Mittal, Yanbei Chen, Manchen Wang, Paolo Favaro, Joseph Tighe, Davide Modolo

This paper introduces innovative benchmarks to evaluate Vision-Language Models (VLMs) in real-world zero-shot recognition tasks, focusing on the granularity and specificity of prompting text.

Benchmarking Specificity +1

SkeleTR: Towards Skeleton-based Action Recognition in the Wild

no code implementations • 20 Sep 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo

Action Classification Action Recognition +2

Early Action Recognition with Action Prototypes

no code implementations • 11 Dec 2023 • Guglielmo Camporese, Alessandro Bergamo, Xunyu Lin, Joseph Tighe, Davide Modolo

For example, on early recognition observing only the first 10% of each video, our method improves the SOTA by +2.23 Top-1 accuracy on Something-Something-v2, +3.55 on UCF-101, +3.68 on SSsub21, and +5.03 on EPIC-Kitchens-55, where prior work used either multi-modal inputs (e.g., optical flow) or batched inference.

Action Recognition Optical Flow Estimation
