no code implementations • 27 Sep 2022 • Rahul Duggal, Hao Zhou, Shuo Yang, Jun Fang, Yuanjun Xiong, Wei Xia
With the shift towards on-device deep learning, ensuring consistent behavior of an AI service across diverse compute platforms becomes tremendously important.
1 code implementation • 20 Sep 2022 • Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin
Deep learning models have achieved excellent recognition results on large-scale video benchmarks.
no code implementations • 12 May 2022 • Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto
We present a method to train a classification system that achieves paragon performance in both error rate and negative flip rate (NFR), at the inference cost of a single model.
1 code implementation • CVPR 2022 • Feng Cheng, Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Li, Wei Xia
We propose a memory efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos.
no code implementations • CVPR 2022 • Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto
We propose an online tracking algorithm that performs object detection and data association under a common framework and is capable of linking objects after a long time span.
no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.
1 code implementation • NeurIPS 2021 • Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto
We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.
Ranked #2 on Online Action Detection on TVSeries
no code implementations • 6 Jul 2021 • Wei Li, Yuanjun Xiong, Shuo Yang, Mingze Xu, Yongxin Wang, Wei Xia
We design a new instance-to-track matching objective to learn an appearance embedding that compares a candidate detection to the embeddings of the tracks persisted in the tracker.
2 code implementations • ICCV 2021 • Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto
Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
no code implementations • 8 Jun 2021 • Siqi Deng, Yuanjun Xiong, Meng Wang, Wei Xia, Stefano Soatto
The common implementation of face recognition systems as a cascade of a detection stage and a recognition or verification stage can cause problems beyond failures of the detector.
no code implementations • 29 May 2021 • Zhe Wang, Hao Chen, Xinyu Li, Chunhui Liu, Yuanjun Xiong, Joseph Tighe, Charless Fowlkes
However, it is quite expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset.
no code implementations • CVPR 2021 • Rahul Duggal, Hao Zhou, Shuo Yang, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto
Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.
no code implementations • ACL 2021 • Yuqing Xie, Yi-An Lai, Yuanjun Xiong, Yi Zhang, Stefano Soatto
The behavior of deep neural networks can be inconsistent across different model versions.
no code implementations • CVPR 2022 • Jiaojiao Zhao, Yanyi Zhang, Xinyu Li, Hao Chen, Shuai Bing, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide Modolo, Ivan Marsic, Cees G. M. Snoek, Joseph Tighe
We propose TubeR: a simple solution for spatio-temporal video action detection.
1 code implementation • 25 Feb 2021 • Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu, Shuo Yang, Yuanjun Xiong, Wei Xia, Yan Xu, Man Luo, Jian Liu, Jianshu Li, Zhijun Chen, Mingyu Guo, Hui Li, Junfu Liu, Pengfei Gao, Tianqi Hong, Hao Han, Shijie Liu, Xinhua Chen, Di Qiu, Cheng Zhen, Dashuang Liang, Yufeng Jin, Zhanlong Hao
It is the largest face anti-spoofing dataset in terms of both the amount of data and the number of subjects.
2 code implementations • 18 Feb 2021 • Liming Jiang, Zhengkui Guo, Wayne Wu, Zhaoyang Liu, Ziwei Liu, Chen Change Loy, Shuo Yang, Yuanjun Xiong, Wei Xia, Baoying Chen, Peiyu Zhuang, Sili Li, Shen Chen, Taiping Yao, Shouhong Ding, Jilin Li, Feiyue Huang, Liujuan Cao, Rongrong Ji, Changlei Lu, Ganchao Tan
This paper reports methods and results in the DeeperForensics Challenge 2020 on real-world face forgery detection.
1 code implementation • ICCV 2021 • Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, Wei Xia
We propose a new method to detect deepfake images that uses the inconsistency of source features within forged images as a cue.
1 code implementation • 11 Dec 2020 • Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
Video action recognition is one of the representative tasks for video understanding.
no code implementations • CVPR 2021 • Sijie Yan, Yuanjun Xiong, Kaustav Kundu, Shuo Yang, Siqi Deng, Meng Wang, Wei Xia, Stefano Soatto
Reducing inconsistencies in the behavior of different versions of an AI system can be as important in practice as reducing its overall error.
1 code implementation • 30 Oct 2020 • Wei Li, Yuanjun Xiong, Shuo Yang, Siqi Deng, Wei Xia
We combine this scheme with SSD detectors by proposing a novel tracking anchor assignment module.
no code implementations • 6 Oct 2020 • Bowen Zhang, Hao Chen, Meng Wang, Yuanjun Xiong
We formulate the problem of online temporal action detection in live streaming videos, acknowledging an important property of live streaming: there is normally a broadcast delay between the latest captured frame and the frame actually viewed by the audience.
no code implementations • 3 Oct 2020 • Yifan Xing, Yuanjun Xiong, Wei Xia
Data augmentation has been highly effective in narrowing the data gap and reducing the cost for human annotation, especially for tasks where ground truth labels are difficult and expensive to acquire.
1 code implementation • ECCV 2020 • Guha Balakrishnan, Yuanjun Xiong, Wei Xia, Pietro Perona
To address this problem, we develop an experimental method for measuring the algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest (e.g., gender and skin tone) in order to reveal causal links between attribute variation and performance change.
no code implementations • 11 Jun 2020 • Xiang Xu, Yuanjun Xiong, Wei Xia
In this paper, we focus on improving the online face liveness detection system to enhance the security of the downstream face recognition system.
1 code implementation • ECCV 2020 • Jingbo Wang, Sijie Yan, Yuanjun Xiong, Dahua Lin
We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D poses.
Ranked #15 on 3D Human Pose Estimation on Human3.6M
3 code implementations • ECCV 2020 • Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin
Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.
Ranked #4 on Action Recognition on UCF101 (using extra training data)
3 code implementations • CVPR 2020 • Yantao Shen, Yuanjun Xiong, Wei Xia, Stefano Soatto
Backward compatibility is critical to quickly deploy new embedding models that leverage ever-growing large-scale training datasets and improvements in deep learning architectures and training methods.
no code implementations • ICCV 2019 • Sijie Yan, Zhizhong Li, Yuanjun Xiong, Huahan Yan
It captures the temporal structure at multiple scales through the GP prior and temporal convolutions, and establishes the spatial connection between the latent vectors and the skeleton graphs via a novel graph-refining scheme.
Ranked #2 on Human action generation on NTU RGB+D
no code implementations • ICCV 2019 • Brais Martinez, Davide Modolo, Yuanjun Xiong, Joseph Tighe
In this work, we focus on improving the representation capacity of the network; rather than altering the backbone, we improve the last layers of the network, where changes have a low computational cost.
Ranked #31 on Action Recognition on Something-Something V1 (using extra training data)
no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou
This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.
no code implementations • NeurIPS 2018 • Yue Zhao, Yuanjun Xiong, Dahua Lin
How to leverage the temporal dimension is a key question in video analysis.
1 code implementation • 14 Jun 2018 • Qingqiu Huang, Yuanjun Xiong, Yu Xiong, Yuqi Zhang, Dahua Lin
Experiments on this dataset showed that the proposed method can substantially reduce the training time while obtaining highly effective features and coherent temporal structures.
no code implementations • CVPR 2018 • Yue Zhao, Yuanjun Xiong, Dahua Lin
Despite the remarkable progress in action recognition over the past several years, existing methods remain limited in efficiency and effectiveness.
4 code implementations • CVPR 2018 • Zhirong Wu, Yuanjun Xiong, Stella X. Yu, Dahua Lin
Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.
Ranked #38 on Semi-Supervised Image Classification on ImageNet - 1% labeled data (Top 5 Accuracy metric)
14 code implementations • 5 May 2018 • Zhirong Wu, Yuanjun Xiong, Stella Yu, Dahua Lin
Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.
Ranked #13 on Contrastive Learning on ImageNet-1k
1 code implementation • CVPR 2018 • Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, Dahua Lin
High-performance object detection relies on expensive convolutional networks to compute features, which often poses significant challenges for applications, e.g., those that require detecting objects from video streams in real time.
23 code implementations • 23 Jan 2018 • Sijie Yan, Yuanjun Xiong, Dahua Lin
Dynamics of human body skeletons convey significant information for human action recognition.
no code implementations • 9 Jun 2017 • Shuo Yang, Yuanjun Xiong, Chen Change Loy, Xiaoou Tang
Specifically, our method achieves 76.4 average precision on the challenging WIDER FACE dataset and a 96% recall rate on the FDDB dataset at 7 frames per second (fps) for a 900 × 1300 input image.
9 code implementations • 8 May 2017 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool
Furthermore, based on temporal segment networks, we won the video classification track of the ActivityNet Challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.
Ranked #19 on Action Classification on Moments in Time (Top 5 Accuracy metric)
6 code implementations • ICCV 2017 • Yue Zhao, Yuanjun Xiong, Li-Min Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin
Detecting actions in untrimmed videos is an important yet challenging task.
Ranked #6 on Action Recognition on THUMOS’14
2 code implementations • CVPR 2017 • Limin Wang, Yuanjun Xiong, Dahua Lin, Luc van Gool
We exploit the learned models for action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet.
Ranked #3 on Action Classification on THUMOS’14
Weakly Supervised Action Localization
Weakly-Supervised Action Recognition
1 code implementation • 8 Mar 2017 • Yuanjun Xiong, Yue Zhao, Li-Min Wang, Dahua Lin, Xiaoou Tang
Detecting activities in untrimmed videos is an important but challenging task.
Ranked #22 on Temporal Action Localization on ActivityNet-1.3
2 code implementations • 4 Oct 2016 • Limin Wang, Sheng Guo, Weilin Huang, Yuanjun Xiong, Yu Qiao
Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partly due to recent large-scale scene datasets such as Places and Places2.
1 code implementation • 2 Aug 2016 • Yuanjun Xiong, Li-Min Wang, Zhe Wang, Bo-Wen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc van Gool, Xiaoou Tang
This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016.
19 code implementations • 2 Aug 2016 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool
The other contribution is our study of a series of good practices for learning ConvNets on video data with the help of the temporal segment network.
Ranked #3 on Multimodal Activity Recognition on EV-Action
5 code implementations • 8 Jul 2015 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao
However, for action recognition in videos, the advantage of deep convolutional networks is less evident.
Ranked #63 on Action Recognition on UCF101
no code implementations • CVPR 2015 • Yuanjun Xiong, Kai Zhu, Dahua Lin, Xiaoou Tang
A considerable portion of web images capture events that occur in our personal lives or social activities.
no code implementations • NeurIPS 2014 • Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang
Selecting a small informative subset from a given dataset, also called column sampling, has drawn much attention in machine learning.
no code implementations • 11 Sep 2014 • Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang
In the proposed deep architecture, a new deformation-constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraints and penalties.