Search Results for author: Yuanjun Xiong

Found 51 papers, 29 papers with code

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

1 code implementation • 20 Mar 2024 • Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and 2 object detection datasets under the zero-shot recognition setting.

Contrastive Learning Fine-Grained Visual Recognition +3

Paper
Code

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

1 code implementation • 6 Dec 2023 • Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents.

3D Generation

482

Paper
Code

Towards Regression-Free Neural Networks for Diverse Compute Platforms

no code implementations • 27 Sep 2022 • Rahul Duggal, Hao Zhou, Shuo Yang, Jun Fang, Yuanjun Xiong, Wei Xia

With the shift towards on-device deep learning, ensuring a consistent behavior of an AI service across diverse compute platforms becomes tremendously important.

Neural Architecture Search regression

Paper

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks

1 code implementation • 20 Sep 2022 • Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin

Deep learning models have achieved excellent recognition results on large-scale video benchmarks.

Action Recognition

Paper
Code

ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training

no code implementations • 12 May 2022 • Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto

We present a method to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model.

Paper

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

1 code implementation • CVPR 2022 • Feng Cheng, Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Li, Wei Xia

We propose a memory efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos.

Action Detection Action Recognition

Paper
Code

MeMOT: Multi-Object Tracking with Memory

no code implementations • CVPR 2022 • Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto

We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span.

Multi-Object Tracking Object +2

Paper

Contrastive Neighborhood Alignment

no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto

We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.

Paper

Long Short-Term Transformer for Online Action Detection

2 code implementations • NeurIPS 2021 • Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto

We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.

Ranked #3 on Online Action Detection on TVSeries

Online Action Detection Playing the Game of 2048

119

Paper
Code

Semi-TCL: Semi-Supervised Track Contrastive Representation Learning

no code implementations • 6 Jul 2021 • Wei Li, Yuanjun Xiong, Shuo Yang, Mingze Xu, Yongxin Wang, Wei Xia

We design a new instance-to-track matching objective to learn appearance embedding that compares a candidate detection to the embedding of the tracks persisted in the tracker.

Multiple Object Tracking Object +1

Paper

Learning Hierarchical Graph Neural Networks for Image Clustering

2 code implementations • ICCV 2021 • Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto

Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.

Clustering Face Clustering

12,977

Paper
Code

Harnessing Unrecognizable Faces for Improving Face Recognition

no code implementations • 8 Jun 2021 • Siqi Deng, Yuanjun Xiong, Meng Wang, Wei Xia, Stefano Soatto

The common implementation of face recognition systems as a cascade of a detection stage and a recognition or verification stage can cause problems beyond failures of the detector.

Face Recognition Quantization

Paper

SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation

no code implementations • 29 May 2021 • Zhe Wang, Hao Chen, Xinyu Li, Chunhui Liu, Yuanjun Xiong, Joseph Tighe, Charless Fowlkes

However, it is quite expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset.

Action Parsing Action Segmentation +2

Paper

Compatibility-aware Heterogeneous Visual Search

no code implementations • CVPR 2021 • Rahul Duggal, Hao Zhou, Shuo Yang, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto

Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.

Neural Architecture Search Retrieval

Paper

Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates

no code implementations • ACL 2021 • Yuqing Xie, Yi-An Lai, Yuanjun Xiong, Yi Zhang, Stefano Soatto

Behavior of deep neural networks can be inconsistent between different versions.

Knowledge Distillation negative flip rate +1

Paper

TubeR: Tubelet Transformer for Video Action Detection

1 code implementation • CVPR 2022 • Jiaojiao Zhao, Yanyi Zhang, Xinyu Li, Hao Chen, Shuai Bing, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide Modolo, Ivan Marsic, Cees G. M. Snoek, Joseph Tighe

We propose TubeR: a simple solution for spatio-temporal video action detection.

Action Classification Action Detection +2

Paper
Code

CelebA-Spoof Challenge 2020 on Face Anti-Spoofing: Methods and Results

1 code implementation • 25 Feb 2021 • Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu, Shuo Yang, Yuanjun Xiong, Wei Xia, Yan Xu, Man Luo, Jian Liu, Jianshu Li, Zhijun Chen, Mingyu Guo, Hui Li, Junfu Liu, Pengfei Gao, Tianqi Hong, Hao Han, Shijie Liu, Xinhua Chen, Di Qiu, Cheng Zhen, Dashuang Liang, Yufeng Jin, Zhanlong Hao

It is the largest face anti-spoofing dataset in terms of both the amount of data and the number of subjects.

Face Anti-Spoofing valid

512

Paper
Code

DeeperForensics Challenge 2020 on Real-World Face Forgery Detection: Methods and Results

2 code implementations • 18 Feb 2021 • Liming Jiang, Zhengkui Guo, Wayne Wu, Zhaoyang Liu, Ziwei Liu, Chen Change Loy, Shuo Yang, Yuanjun Xiong, Wei Xia, Baoying Chen, Peiyu Zhuang, Sili Li, Shen Chen, Taiping Yao, Shouhong Ding, Jilin Li, Feiyue Huang, Liujuan Cao, Rongrong Ji, Changlei Lu, Ganchao Tan

This paper reports methods and results in the DeeperForensics Challenge 2020 on real-world face forgery detection.

valid

522

Paper
Code

Learning Self-Consistency for Deepfake Detection

1 code implementation • ICCV 2021 • Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, Wei Xia

We propose a new method to detect deepfake images using the cue of the source feature inconsistency within the forged images.

DeepFake Detection Face Swapping +2

Paper
Code

A Comprehensive Study of Deep Video Action Recognition

1 code implementation • 11 Dec 2020 • Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li

Video action recognition is one of the representative tasks for video understanding.

Action Recognition Temporal Action Localization +1

553

Paper
Code

Positive-Congruent Training: Towards Regression-Free Model Updates

no code implementations • CVPR 2021 • Sijie Yan, Yuanjun Xiong, Kaustav Kundu, Shuo Yang, Siqi Deng, Meng Wang, Wei Xia, Stefano Soatto

Reducing inconsistencies in the behavior of different versions of an AI system can be as important in practice as reducing its overall error.

Image Classification regression

Paper

SMOT: Single-Shot Multi Object Tracking

1 code implementation • 30 Oct 2020 • Wei Li, Yuanjun Xiong, Shuo Yang, Siqi Deng, Wei Xia

We combine this scheme with SSD detectors by proposing a novel tracking anchor assignment module.

Multi-Object Tracking Object

5,750

Paper
Code

Online Action Detection in Streaming Videos with Time Buffers

no code implementations • 6 Oct 2020 • BoWen Zhang, Hao Chen, Meng Wang, Yuanjun Xiong

We formulate the problem of online temporal action detection in live streaming videos, acknowledging one important property of such videos: there is normally a broadcast delay between the latest captured frame and the actual frame viewed by the audience.

Online Action Detection

Paper

3D-Aided Data Augmentation for Robust Face Understanding

no code implementations • 3 Oct 2020 • Yifan Xing, Yuanjun Xiong, Wei Xia

Data augmentation has been highly effective in narrowing the data gap and reducing the cost for human annotation, especially for tasks where ground truth labels are difficult and expensive to acquire.

3D Face Modelling Data Augmentation +1

Paper

Towards causal benchmarking of bias in face analysis algorithms

1 code implementation • ECCV 2020 • Guha Balakrishnan, Yuanjun Xiong, Wei Xia, Pietro Perona

To address this problem, we develop an experimental method for measuring algorithmic bias of face analysis algorithms, which manipulates directly the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change.

Attribute Benchmarking +2

Paper
Code

On Improving Temporal Consistency for Online Face Liveness Detection

no code implementations • 11 Jun 2020 • Xiang Xu, Yuanjun Xiong, Wei Xia

In this paper, we focus on improving the online face liveness detection system to enhance the security of the downstream face recognition system.

Ranked #4 on Face Anti-Spoofing on SiW (Protocol 3)

Face Anti-Spoofing Face Recognition

Paper

Motion Guided 3D Pose Estimation from Videos

1 code implementation • ECCV 2020 • Jingbo Wang, Sijie Yan, Yuanjun Xiong, Dahua Lin

We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D pose.

Ranked #19 on 3D Human Pose Estimation on Human3.6M

3D Pose Estimation Monocular 3D Human Pose Estimation

Paper
Code

Omni-sourced Webly-supervised Learning for Video Recognition

3 code implementations • ECCV 2020 • Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin

Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.

Ranked #5 on Action Recognition on UCF101 (using extra training data)

Action Classification Action Recognition +1

3,866

Paper
Code

Towards Backward-Compatible Representation Learning

3 code implementations • CVPR 2020 • Yantao Shen, Yuanjun Xiong, Wei Xia, Stefano Soatto

Backward compatibility is critical to quickly deploy new embedding models that leverage ever-growing large-scale training datasets and improvements in deep learning architectures and training methods.

Face Recognition Representation Learning

Paper
Code

Convolutional Sequence Generation for Skeleton-Based Action Synthesis

no code implementations • ICCV 2019 • Sijie Yan, Zhizhong Li, Yuanjun Xiong, Huahan Yan

It captures the temporal structure at multiple scales through the GP prior and the temporal convolutions; and establishes the spatial connection between the latent vectors and the skeleton graphs via a novel graph refining scheme.

Ranked #2 on Human action generation on NTU RGB+D

Human action generation

Paper

Action recognition with spatial-temporal discriminative filter banks

no code implementations • ICCV 2019 • Brais Martinez, Davide Modolo, Yuanjun Xiong, Joseph Tighe

In this work we focus on how to improve the representation capacity of the network, but rather than altering the backbone, we focus on improving the last layers of the network, where changes have low impact in terms of computational cost.

Ranked #36 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Action Recognition +1

Paper

WIDER Face and Pedestrian Challenge 2018: Methods and Results

no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou

This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.

Face Detection Pedestrian Detection +2

Paper

Trajectory Convolution for Action Recognition

no code implementations • NeurIPS 2018 • Yue Zhao, Yuanjun Xiong, Dahua Lin

How to leverage the temporal dimension is a key question in video analysis.

Action Recognition Temporal Action Localization

Paper

From Trailers to Storylines: An Efficient Way to Learn from Movies

1 code implementation • 14 Jun 2018 • Qingqiu Huang, Yuanjun Xiong, Yu Xiong, Yuqi Zhang, Dahua Lin

Experiments on this dataset showed that the proposed method can substantially reduce the training time while obtaining highly effective features and coherent temporal structures.

Paper
Code

Recognize Actions by Disentangling Components of Dynamics

no code implementations • CVPR 2018 • Yue Zhao, Yuanjun Xiong, Dahua Lin

Despite the remarkable progress in action recognition over the past several years, existing methods remain limited in efficiency and effectiveness.

Action Recognition Optical Flow Estimation +2

Paper

Unsupervised Feature Learning via Non-Parametric Instance Discrimination

4 code implementations • CVPR 2018 • Zhirong Wu, Yuanjun Xiong, Stella X. Yu, Dahua Lin

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.

Ranked #40 on Semi-Supervised Image Classification on ImageNet - 1% labeled data (Top 5 Accuracy metric)

General Classification object-detection +4

3,078

Paper
Code

Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination

14 code implementations • 5 May 2018 • Zhirong Wu, Yuanjun Xiong, Stella Yu, Dahua Lin

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.

Ranked #13 on Contrastive Learning on imagenet-1k

Contrastive Learning General Classification +3

3,227

Paper
Code

Optimizing Video Object Detection via a Scale-Time Lattice

1 code implementation • CVPR 2018 • Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, Dahua Lin

High-performance object detection relies on expensive convolutional networks to compute features, often leading to significant challenges in applications, e.g., those that require detecting objects from video streams in real time.

Object object-detection +1

449

Paper
Code

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

24 code implementations • 23 Jan 2018 • Sijie Yan, Yuanjun Xiong, Dahua Lin

Dynamics of human body skeletons convey significant information for human action recognition.

Ranked #2 on Skeleton Based Action Recognition on Varying-view RGB-D Action-Skeleton

3D Human Pose Estimation Action Recognition +3

2,849

Paper
Code

Face Detection through Scale-Friendly Deep Convolutional Networks

no code implementations • 9 Jun 2017 • Shuo Yang, Yuanjun Xiong, Chen Change Loy, Xiaoou Tang

Specifically, our method achieves 76.4 average precision on the challenging WIDER FACE dataset and a 96% recall rate on the FDDB dataset at 7 frames per second (fps) for a 900 × 1300 input image.

Face Detection

Paper

Temporal Segment Networks for Action Recognition in Videos

11 code implementations • 8 May 2017 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool

Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

Ranked #5 on Video Classification on COIN

Action Classification Action Recognition In Videos +3

3,866

Paper
Code

Temporal Action Detection with Structured Segment Networks

6 code implementations • ICCV 2017 • Yue Zhao, Yuanjun Xiong, Li-Min Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin

Detecting actions in untrimmed videos is an important yet challenging task.

Ranked #6 on Action Recognition on THUMOS’14

Action Detection Action Recognition +1

3,866

Paper
Code

UntrimmedNets for Weakly Supervised Action Recognition and Detection

2 code implementations • CVPR 2017 • Limin Wang, Yuanjun Xiong, Dahua Lin, Luc van Gool

We exploit the learned models for action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet.

Ranked #3 on Action Classification on ActivityNet-1.2

Weakly Supervised Action Localization Weakly-Supervised Action Recognition

163

Paper
Code

A Pursuit of Temporal Accuracy in General Activity Detection

1 code implementation • 8 Mar 2017 • Yuanjun Xiong, Yue Zhao, Li-Min Wang, Dahua Lin, Xiaoou Tang

Detecting activities in untrimmed videos is an important but challenging task.

Ranked #29 on Temporal Action Localization on ActivityNet-1.3

Action Detection Activity Detection +2

641

Paper
Code

Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs

2 code implementations • 4 Oct 2016 • Limin Wang, Sheng Guo, Weilin Huang, Yuanjun Xiong, Yu Qiao

Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partially due to recent large-scale scene datasets such as Places and Places2.

General Classification Scene Classification +1

548

Paper
Code

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

19 code implementations • 2 Aug 2016 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool

The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of the temporal segment network.

Ranked #3 on Multimodal Activity Recognition on EV-Action

Action Classification Action Recognition In Videos +2

3,866

Paper
Code

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

1 code implementation • 2 Aug 2016 • Yuanjun Xiong, Li-Min Wang, Zhe Wang, Bo-Wen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc van Gool, Xiaoou Tang

This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016.

General Classification Video Classification

250

Paper
Code

Towards Good Practices for Very Deep Two-Stream ConvNets

5 code implementations • 8 Jul 2015 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao

However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.

Ranked #66 on Action Recognition on UCF101

Action Recognition In Videos Computational Efficiency +3

553

Paper
Code

Recognize Complex Events From Static Images by Fusing Deep Channels

no code implementations • CVPR 2015 • Yuanjun Xiong, Kai Zhu, Dahua Lin, Xiaoou Tang

A considerable portion of web images capture events that occur in our personal lives or social activities.

Paper

Zeta Hull Pursuits: Learning Nonconvex Data Hulls

no code implementations • NeurIPS 2014 • Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang

Selecting a small informative subset from a given dataset, also called column sampling, has drawn much attention in machine learning.

Image Classification

Paper

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

no code implementations • 11 Sep 2014 • Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.

Object object-detection +1

Paper
