1 code implementation • CVPR 2024 • Zijia Lu, Bing Shuai, Yanbei Chen, Zhenlin Xu, Davide Modolo
In this paper, we propose a novel concept of path consistency to learn robust object matching without using manual object identity supervision.
no code implementations • 20 Sep 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios.
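A minimal sketch of the graph-convolution-then-Transformer design described above, assuming a PyTorch setting; the adjacency, layer sizes, and the `SkeletonActionModel` name are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SkeletonActionModel(nn.Module):
    """Sketch: per-person graph convolutions, then a Transformer over people."""
    def __init__(self, num_joints=17, in_dim=3, feat_dim=128, num_classes=60):
        super().__init__()
        # Placeholder joint adjacency (identity); a real skeleton graph would
        # encode bone connectivity.
        self.register_buffer("adj", torch.eye(num_joints))
        self.gcn = nn.Linear(in_dim, feat_dim)          # one-layer graph-conv stand-in
        encoder_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4,
                                                   batch_first=True)
        self.person_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        # x: (batch, persons, frames, joints, in_dim)
        h = torch.einsum("ij,bptjc->bptic", self.adj, x)  # aggregate over joints
        h = torch.relu(self.gcn(h))                       # intra-person dynamics
        h = h.mean(dim=(2, 3))                            # pool frames + joints -> (b, p, feat)
        h = self.person_encoder(h)                        # person-person interactions
        return self.classifier(h.mean(dim=1))             # clip-level action logits

logits = SkeletonActionModel()(torch.randn(2, 3, 16, 17, 3))
print(logits.shape)  # torch.Size([2, 60])
```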
1 code implementation • ICCV 2023 • Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.
no code implementations • ICCV 2023 • Najmeh Sadoughi, Xinyu Li, Avijit Vajpayee, David Fan, Bing Shuai, Hector Santos-Villalobos, Vimal Bhat, Rohith MV
Previous research has studied the task of segmenting cinematic videos into scenes and into narrative acts.
no code implementations • ICCV 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in the wild.
Ranked #3 on Human Interaction Recognition on NTU RGB+D
1 code implementation • ECCV 2022 • Bing Shuai, Alessandro Bergamo, Uta Buechler, Andrew Berneshawi, Alyssa Boden, Joseph Tighe
This paper presents a new large-scale multi-person tracking dataset, PersonPath22, which is over an order of magnitude larger than currently available high-quality multi-object tracking datasets such as MOT17, HiEve, and MOT20.
Ranked #1 on Multi-Object Tracking on PersonPath22
1 code implementation • 30 Sep 2022 • Jun Fang, Mingze Xu, Hao Chen, Bing Shuai, Zhuowen Tu, Joseph Tighe
In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks.
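A rough sketch of the memory-saving idea behind SBP as described here (caching activations, and hence allowing gradients, for only a random subset of the data); the `sbp_forward` name, the batch-level subsetting, and the keep ratio are assumptions for illustration, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

def sbp_forward(module, x, keep_ratio=0.5):
    """Sketch: run a random subset of the batch with gradients enabled and the
    rest under no_grad, so cached activations are only kept for the subset."""
    n_keep = max(1, int(keep_ratio * x.shape[0]))
    perm = torch.randperm(x.shape[0], device=x.device)
    keep, drop = perm[:n_keep], perm[n_keep:]
    out_keep = module(x[keep])                   # gradient path kept
    with torch.no_grad():
        out_drop = module(x[drop])               # no activations cached
    out = torch.cat([out_keep, out_drop], dim=0)
    inverse = torch.empty_like(perm)
    inverse[perm] = torch.arange(x.shape[0], device=x.device)
    return out[inverse]                          # restore original batch order

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 10)
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
feats = sbp_forward(backbone, images, keep_ratio=0.5)
loss = nn.functional.cross_entropy(head(feats), labels)
loss.backward()   # backbone gradients come only from the kept subset
```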
no code implementations • 10 Mar 2022 • Daniel McKee, Zitong Zhan, Bing Shuai, Davide Modolo, Joseph Tighe, Svetlana Lazebnik
This work studies feature representations for dense label propagation in video, with a focus on recently proposed methods that learn video correspondence using self-supervised signals such as colorization or temporal cycle consistency.
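A small sketch of how such learned features are typically evaluated for dense label propagation: labels from a reference frame are copied to a target frame using a softmax affinity between per-pixel features. Function names and the temperature value are illustrative, not any specific paper's code.

```python
import torch
import torch.nn.functional as F

def propagate_labels(feat_ref, feat_tgt, labels_ref, temperature=0.07):
    """Propagate per-pixel labels from a reference frame to a target frame
    using softmax affinity over learned features (illustrative sketch).

    feat_ref, feat_tgt: (C, H, W) feature maps; labels_ref: (K, H, W) soft labels.
    """
    c, h, w = feat_ref.shape
    ref = F.normalize(feat_ref.reshape(c, -1), dim=0)              # (C, HW_ref)
    tgt = F.normalize(feat_tgt.reshape(c, -1), dim=0)              # (C, HW_tgt)
    affinity = torch.softmax(tgt.T @ ref / temperature, dim=-1)    # (HW_tgt, HW_ref)
    labels = labels_ref.reshape(labels_ref.shape[0], -1)           # (K, HW_ref)
    propagated = labels @ affinity.T                               # (K, HW_tgt)
    return propagated.reshape(-1, h, w)

feat_t, feat_t1 = torch.randn(64, 32, 32), torch.randn(64, 32, 32)
masks_t = F.one_hot(torch.randint(0, 5, (32, 32)), 5).permute(2, 0, 1).float()
print(propagate_labels(feat_t, feat_t1, masks_t).shape)  # torch.Size([5, 32, 32])
```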
no code implementations • CVPR 2022 • Bing Shuai, Xinyu Li, Kaustav Kundu, Joseph Tighe
In this work, we explore training such a model using only person box annotations, removing the need to manually label the training dataset with additional person identity annotations, which are expensive to collect.
no code implementations • 19 Aug 2021 • Daniel McKee, Bing Shuai, Andrew Berneshawi, Manchen Wang, Davide Modolo, Svetlana Lazebnik, Joseph Tighe
Next, to tackle harder tracking cases, we mine hard examples across an unlabeled pool of real videos with a tracker trained on our hallucinated video data.
1 code implementation • CVPR 2021 • Bing Shuai, Andrew Berneshawi, Xinyu Li, Davide Modolo, Joseph Tighe
In this paper, we focus on improving online multi-object tracking (MOT).
no code implementations • ICCV 2021 • Yanyi Zhang, Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, Ivan Marsic, Joseph Tighe
We first introduce the vanilla video transformer and show that the transformer module is able to perform spatio-temporal modeling from raw pixels, but with heavy memory usage.
Ranked #15 on Action Classification on Charades
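For context, a bare-bones video transformer of the kind referred to as "vanilla" above: space-time patches are flattened into a single token sequence and fed to a standard Transformer encoder, which is why attention memory grows quickly with clip length and resolution. The patch size and dimensions below are illustrative.

```python
import torch
import torch.nn as nn

class VanillaVideoTransformer(nn.Module):
    """Flatten space-time patches into one token sequence (memory-heavy)."""
    def __init__(self, patch=16, frames=8, size=64, dim=128, num_classes=400):
        super().__init__()
        self.patchify = nn.Conv3d(3, dim, kernel_size=(1, patch, patch),
                                  stride=(1, patch, patch))
        num_tokens = frames * (size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, num_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):                       # video: (B, 3, T, H, W)
        tokens = self.patchify(video)               # (B, dim, T, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, T*(H/p)*(W/p), dim)
        # Full joint space-time attention: cost is quadratic in the token count.
        out = self.encoder(tokens + self.pos)
        return self.head(out.mean(dim=1))

clip = torch.randn(2, 3, 8, 64, 64)
print(VanillaVideoTransformer()(clip).shape)  # torch.Size([2, 400])
```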
no code implementations • 15 Dec 2020 • Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe
In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video.
no code implementations • ECCV 2020 • Xinyu Li, Bing Shuai, Joseph Tighe
Many current activity recognition models use 3D convolutional neural networks (e.g., I3D, I3D-NL) to generate local spatial-temporal features.
no code implementations • 16 Apr 2020 • Bing Shuai, Andrew G. Berneshawi, Davide Modolo, Joseph Tighe
Multi-object tracking systems often consist of a combination of a detector, a short-term linker, a re-identification feature extractor, and a solver that takes the output from these separate components and makes a final prediction.
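To make that pipeline concrete, here is a toy schematic of the four components and how their outputs feed a final solver; every function, threshold, and data structure here is a purely illustrative placeholder, not the paper's code.

```python
import numpy as np

def detect(frame):
    """Placeholder detector: returns (box, reid_embedding) pairs for one frame."""
    return [(np.array([10.0, 10.0, 60.0, 160.0]), np.random.randn(128))]

def link_short_term(tracks, detections, iou_thresh=0.5):
    """Placeholder short-term linker: greedily extend tracks by box IoU."""
    def iou(a, b):
        x1, y1 = np.maximum(a[:2], b[:2])
        x2, y2 = np.minimum(a[2:], b[2:])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)
    for box, emb in detections:
        match = next((t for t in tracks if iou(t["box"], box) > iou_thresh), None)
        if match is None:
            tracks.append({"box": box, "embs": [emb]})
        else:
            match["box"] = box
            match["embs"].append(emb)
    return tracks

def solve(tracks, sim_thresh=0.7):
    """Placeholder solver: merge tracklets whose mean re-id embeddings are similar."""
    ids = list(range(len(tracks)))
    means = [np.mean(t["embs"], axis=0) for t in tracks]
    for i in range(len(tracks)):
        for j in range(i):
            a, b = means[i], means[j]
            if a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9) > sim_thresh:
                ids[i] = ids[j]
    return ids

tracks = []
for frame in range(3):                       # toy 3-frame video
    tracks = link_short_term(tracks, detect(frame))
print(solve(tracks))                         # final identity per tracklet
```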
no code implementations • 30 Mar 2020 • Davide Modolo, Bing Shuai, Rahul Rama Varior, Joseph Tighe
Our results show that (i) mistakes on background are substantial, responsible for 18-49% of the total error, (ii) models do not generalize well to different kinds of backgrounds and perform poorly on images that contain only background, and (iii) models make many more mistakes than the standard Mean Absolute Error (MAE) metric captures, as counting on background compensates considerably for misses on foreground.
1 code implementation • CVPR 2019 • Henghui Ding, Xudong Jiang, Bing Shuai, Ai Qun Liu, Gang Wang
In this way, the proposed network aggregates the context information of a pixel from its semantic-correlated region instead of a predefined fixed region.
Ranked #14 on Semantic Segmentation on COCO-Stuff test
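The aggregation idea can be pictured with a simple attention-style operation in which each pixel pools context weighted by its feature similarity to other pixels, rather than over a fixed window; this is only an illustration of the idea, not the paper's exact module, and the temperature is a made-up value.

```python
import torch
import torch.nn.functional as F

def semantic_correlated_pooling(features, temperature=0.1):
    """Each pixel aggregates context weighted by feature similarity (sketch).

    features: (B, C, H, W) -> returns (B, C, H, W) context features.
    """
    b, c, h, w = features.shape
    flat = F.normalize(features.flatten(2), dim=1)                # (B, C, HW)
    weights = torch.softmax(flat.transpose(1, 2) @ flat / temperature,
                            dim=-1)                               # (B, HW, HW)
    context = weights @ features.flatten(2).transpose(1, 2)       # (B, HW, C)
    return context.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(1, 64, 16, 16)
print(semantic_correlated_pooling(x).shape)  # torch.Size([1, 64, 16, 16])
```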
1 code implementation • journal 2019 • Bing Shuai, Henghui Ding, Ting Liu, Gang Wang, Xudong Jiang
Furthermore, we introduce a “dense skip” architecture to retain a rich set of low-level information from the pre-trained CNN, which is essential to improve the low-level parsing performance.
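A minimal sketch of a "dense skip" decoder in the spirit described above: upsampled features from every backbone stage are concatenated so low-level detail is retained alongside high-level semantics. The backbone, channel sizes, and class count are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSkipDecoder(nn.Module):
    """Fuse features from every backbone stage (sketch of a dense-skip design)."""
    def __init__(self, stage_channels=(16, 32, 64), num_classes=21):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU())
            for c_in, c_out in zip((3,) + stage_channels[:-1], stage_channels)
        ])
        self.classifier = nn.Conv2d(sum(stage_channels), num_classes, 1)

    def forward(self, x):
        size, skips, h = x.shape[-2:], [], x
        for stage in self.stages:
            h = stage(h)
            # Dense skip: keep every stage's output, upsampled to input resolution.
            skips.append(F.interpolate(h, size=size, mode="bilinear",
                                       align_corners=False))
        return self.classifier(torch.cat(skips, dim=1))

print(DenseSkipDecoder()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 21, 64, 64])
```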
no code implementations • 17 Jan 2019 • Rahul Rama Varior, Bing Shuai, Joseph Tighe, Davide Modolo
In crowd counting datasets, people appear at different scales, depending on their distance from the camera.
no code implementations • 19 Oct 2018 • Jiafeng Xie, Bing Shuai, Jian-Fang Hu, Jingyang Lin, Wei-Shi Zheng
Recently, segmentation neural networks have improved significantly, demonstrating very promising accuracies on public benchmarks.
1 code implementation • CVPR 2018 • Henghui Ding, Xudong Jiang, Bing Shuai, Ai Qun Liu, Gang Wang
In this paper, we first propose a novel context contrasted local feature that not only leverages the informative context but also spotlights the local information in contrast to the context.
Ranked #17 on Semantic Segmentation on COCO-Stuff test
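The "context contrasted local" idea can be sketched as the difference between a local convolution and a large-receptive-field (dilated) context convolution, so the local response stands out against its surroundings; the kernel sizes and dilation below are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class ContextContrastedLocal(nn.Module):
    """Sketch: local feature minus dilated context feature."""
    def __init__(self, in_ch=64, out_ch=64, dilation=4):
        super().__init__()
        self.local = nn.Conv2d(in_ch, out_ch, 3, padding=1)          # small receptive field
        self.context = nn.Conv2d(in_ch, out_ch, 3, padding=dilation,
                                 dilation=dilation)                  # large receptive field

    def forward(self, x):
        # The local response is "contrasted" against its surrounding context.
        return torch.relu(self.local(x) - self.context(x))

feats = torch.randn(1, 64, 32, 32)
print(ContextContrastedLocal()(feats).shape)  # torch.Size([1, 64, 32, 32])
```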
no code implementations • 13 Mar 2018 • Abrar H. Abdulnabi, Bing Shuai, Zhen Zuo, Lap-Pui Chau, Gang Wang
This paper proposes a new method called Multimodal RNNs for RGB-D scene semantic segmentation.
no code implementations • CVPR 2017 • Abrar H. Abdulnabi, Bing Shuai, Stefan Winkler, Gang Wang
Scene labeling can be seen as a sequence-to-sequence prediction task (pixels to labels), and it is important to leverage relevant context to enhance the performance of pixel classification.
no code implementations • CVPR 2017 • Ping Hu, Bing Shuai, Jun Liu, Gang Wang
Our method drives the network to learn a level set function for salient objects so that it can output more accurate boundaries and compact saliency maps.
no code implementations • 28 Nov 2016 • Bing Shuai, Ting Liu, Gang Wang
In addition, dense skip connections are added so that the context network can be effectively optimized.
no code implementations • European Conference on Computer Vision 2016 • Rahul Rama Varior, Bing Shuai, Jiwen Lu, Dong Xu, Gang Wang
Matching pedestrians across multiple camera views, known as human re-identification (re-id), is a challenging problem in visual surveillance.
no code implementations • 15 May 2016 • Bing Wang, Li Wang, Bing Shuai, Zhen Zuo, Ting Liu, Kap Luk Chan, Gang Wang
Then the Siamese CNN and temporally constrained metrics are jointly learned online to construct the appearance-based tracklet affinity models.
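A minimal Siamese-appearance sketch: one shared CNN embeds two detection crops, and their similarity serves as an appearance affinity between tracklets. The architecture, loss, and crop sizes are placeholders, and the temporally constrained metric part of the paper is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseAppearance(nn.Module):
    """Shared-weight CNN; affinity is cosine similarity of the two embeddings."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim))

    def forward(self, crop_a, crop_b):
        za = F.normalize(self.backbone(crop_a), dim=1)   # same weights for both branches
        zb = F.normalize(self.backbone(crop_b), dim=1)
        return (za * zb).sum(dim=1)                      # appearance affinity in [-1, 1]

model = SiameseAppearance()
a, b = torch.randn(4, 3, 64, 32), torch.randn(4, 3, 64, 32)
affinity = model(a, b)
# Illustrative target: 1 for same identity pairs, 0 for different-identity pairs.
loss = F.binary_cross_entropy_with_logits(affinity, torch.ones(4))
loss.backward()
```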
no code implementations • 20 Apr 2016 • Bing Shuai, Zhen Zuo, Gang Wang, Bing Wang
The local class beliefs of pixels are output by the CNN-Ensemble.
no code implementations • 22 Dec 2015 • Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen
In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing.
no code implementations • 13 Sep 2015 • Zhen Zuo, Bing Shuai, Gang Wang, Xiao Liu, Xingxing Wang, Bing Wang
In this manuscript, we integrate CNNs with HRNNs, and develop end-to-end convolutional hierarchical recurrent neural networks (C-HRNNs).
no code implementations • CVPR 2016 • Bing Shuai, Zhen Zuo, Gang Wang, Bing Wang
In image labeling, local representations for image units are usually generated from their surrounding image patches, thus long-range contextual information is not effectively encoded.
Ranked #19 on Semantic Segmentation on COCO-Stuff test
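One way to picture how recurrent propagation injects long-range context into per-unit representations: a directed sweep over the feature grid in which each cell's hidden state depends on its left and top neighbours. This is a simplified single-direction sweep for illustration, not the paper's full multi-direction DAG-RNN.

```python
import torch
import torch.nn as nn

class DirectedSweepRNN(nn.Module):
    """Single top-left-to-bottom-right sweep over a feature grid (sketch)."""
    def __init__(self, in_dim=32, hidden=32):
        super().__init__()
        self.input = nn.Linear(in_dim, hidden)
        self.left = nn.Linear(hidden, hidden, bias=False)
        self.top = nn.Linear(hidden, hidden, bias=False)

    def forward(self, feats):                 # feats: (H, W, in_dim)
        h, w, _ = feats.shape
        rows = []
        for i in range(h):
            row = []
            for j in range(w):
                context = 0.0
                if i > 0:
                    context = context + self.top(rows[i - 1][j])
                if j > 0:
                    context = context + self.left(row[j - 1])
                # Each unit's state carries information from all units above
                # and to the left of it, i.e. long-range context.
                row.append(torch.tanh(self.input(feats[i, j]) + context))
            rows.append(row)
        return torch.stack([torch.stack(r) for r in rows])   # (H, W, hidden)

grid = torch.randn(8, 8, 32)
print(DirectedSweepRNN()(grid).shape)  # torch.Size([8, 8, 32])
```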
no code implementations • 21 Aug 2015 • Zhen Zuo, Gang Wang, Bing Shuai, Lifan Zhao, Qingxiong Yang
In order to encode the class correlation and class specific information in image representation, we propose a new local feature learning approach named Deep Discriminative and Shareable Feature Learning (DDSFL).
no code implementations • CVPR 2015 • Bing Shuai, Gang Wang, Zhen Zuo, Bing Wang, Lifan Zhao
We adopt Convolutional Neural Networks (CNN) as our parametric model to learn discriminative features and classifiers for local patch classification.
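A tiny example of the patch-classification setup described above: a small CNN maps a local image patch to class scores for its center pixel. The patch size, depth, and class count are placeholders.

```python
import torch
import torch.nn as nn

# Sketch: classify the center pixel of a local image patch with a small CNN.
patch_classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 33),            # e.g. a 33-class label set (illustrative)
)

patches = torch.randn(16, 3, 32, 32)          # a batch of 32x32 local patches
labels = torch.randint(0, 33, (16,))          # label of each patch's center pixel
loss = nn.functional.cross_entropy(patch_classifier(patches), labels)
loss.backward()
```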