Search Results for author: Yuanjun Xiong

Found 56 papers, 32 papers with code

RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians

no code implementations 17 Jun 2024 Bingling Li, Shengyi Chen, Luchao Wang, Kaimin Liao, Sijie Yan, Yuanjun Xiong

In this work, we explore the possibility of training high-parameter 3D Gaussian splatting (3DGS) models on large-scale, high-resolution datasets.

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

1 code implementation 17 Jun 2024 Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang

Generating natural and meaningful responses to communicate with multi-modal human inputs is a fundamental capability of Large Vision-Language Models (LVLMs).

Visual Question Answering

Image and Video Tokenization with Binary Spherical Quantization

2 code implementations 11 Jun 2024 Yue Zhao, Yuanjun Xiong, Philipp Krähenbühl

The resulting BSQ-ViT achieves state-of-the-art visual reconstruction quality on image and video reconstruction benchmarks with 2.4$\times$ throughput compared to the best prior methods.

Decoder Image Generation +3

Bootstrap3D: Improving 3D Content Creation with Synthetic Data

no code implementations 31 May 2024 Zeyi Sun, Tong Wu, Pan Zhang, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Leveraging this pipeline, we have generated 1 million high-quality synthetic multi-view images with dense descriptive captions to address the shortage of high-quality 3D data.

Denoising Descriptive

A Full-duplex Speech Dialogue Scheme Based On Large Language Models

no code implementations 29 May 2024 Peng Wang, Songshuo Lu, Yaohua Tang, Sijie Yan, Yuanjun Xiong, Wei Xia

The perception and motor function modules operate simultaneously, allowing the system to simultaneously speak and listen to the user.

Language Modelling Large Language Model

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

2 code implementations 20 Mar 2024 Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and the 2 object detection datasets under the zero-shot recognition setting.

Contrastive Learning Fine-Grained Visual Recognition +3

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

1 code implementation CVPR 2024 Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents.

3D Generation

Towards Regression-Free Neural Networks for Diverse Compute Platforms

no code implementations 27 Sep 2022 Rahul Duggal, Hao Zhou, Shuo Yang, Jun Fang, Yuanjun Xiong, Wei Xia

With the shift towards on-device deep learning, ensuring a consistent behavior of an AI service across diverse compute platforms becomes tremendously important.

Neural Architecture Search regression

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks

1 code implementation 20 Sep 2022 Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin

Deep learning models have achieved excellent recognition results on large-scale video benchmarks.

Action Recognition

ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training

1 code implementation 12 May 2022 Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto

Based on the observation, we present a method, called Ensemble Logit Difference Inhibition (ELODI), to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model.

Classification Image Classification

MeMOT: Multi-Object Tracking with Memory

no code implementations CVPR 2022 Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto

We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span.

Multi-Object Tracking Object +2

Contrastive Neighborhood Alignment

no code implementations 6 Jan 2022 Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto

We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.

Long Short-Term Transformer for Online Action Detection

2 code implementations NeurIPS 2021 Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto

We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.

Decoder Online Action Detection +1

Semi-TCL: Semi-Supervised Track Contrastive Representation Learning

no code implementations 6 Jul 2021 Wei Li, Yuanjun Xiong, Shuo Yang, Mingze Xu, Yongxin Wang, Wei Xia

We design a new instance-to-track matching objective to learn appearance embedding that compares a candidate detection to the embedding of the tracks persisted in the tracker.

Multiple Object Tracking Object +1

Learning Hierarchical Graph Neural Networks for Image Clustering

2 code implementations ICCV 2021 Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto

Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.

Clustering Face Clustering +1

Harnessing Unrecognizable Faces for Improving Face Recognition

no code implementations 8 Jun 2021 Siqi Deng, Yuanjun Xiong, Meng Wang, Wei Xia, Stefano Soatto

The common implementation of face recognition systems as a cascade of a detection stage and a recognition or verification stage can cause problems beyond failures of the detector.

Face Recognition Quantization

SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation

no code implementations 29 May 2021 Zhe Wang, Hao Chen, Xinyu Li, Chunhui Liu, Yuanjun Xiong, Joseph Tighe, Charless Fowlkes

However, it is quite expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset.

Action Parsing Action Segmentation +2

Compatibility-aware Heterogeneous Visual Search

no code implementations CVPR 2021 Rahul Duggal, Hao Zhou, Shuo Yang, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto

Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.

Neural Architecture Search Retrieval

Learning Self-Consistency for Deepfake Detection

1 code implementation ICCV 2021 Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, Wei Xia

We propose a new method to detect deepfake images using the cue of the source feature inconsistency within the forged images.

DeepFake Detection Face Swapping +2

Positive-Congruent Training: Towards Regression-Free Model Updates

no code implementations CVPR 2021 Sijie Yan, Yuanjun Xiong, Kaustav Kundu, Shuo Yang, Siqi Deng, Meng Wang, Wei Xia, Stefano Soatto

Reducing inconsistencies in the behavior of different versions of an AI system can be as important in practice as reducing its overall error.

Image Classification regression

SMOT: Single-Shot Multi Object Tracking

1 code implementation 30 Oct 2020 Wei Li, Yuanjun Xiong, Shuo Yang, Siqi Deng, Wei Xia

We combine this scheme with SSD detectors by proposing a novel tracking anchor assignment module.

Multi-Object Tracking Object

Online Action Detection in Streaming Videos with Time Buffers

no code implementations 6 Oct 2020 BoWen Zhang, Hao Chen, Meng Wang, Yuanjun Xiong

We formulate the problem of online temporal action detection in live streaming videos, acknowledging one important property of live streaming videos that there is normally a broadcast delay between the latest captured frame and the actual frame viewed by the audience.

Online Action Detection

3D-Aided Data Augmentation for Robust Face Understanding

no code implementations 3 Oct 2020 Yifan Xing, Yuanjun Xiong, Wei Xia

Data augmentation has been highly effective in narrowing the data gap and reducing the cost for human annotation, especially for tasks where ground truth labels are difficult and expensive to acquire.

3D Face Modelling Data Augmentation +1

Towards causal benchmarking of bias in face analysis algorithms

1 code implementation ECCV 2020 Guha Balakrishnan, Yuanjun Xiong, Wei Xia, Pietro Perona

To address this problem we develop an experimental method for measuring algorithmic bias of face analysis algorithms, which manipulates directly the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change.

Attribute Benchmarking +2

On Improving Temporal Consistency for Online Face Liveness Detection

no code implementations 11 Jun 2020 Xiang Xu, Yuanjun Xiong, Wei Xia

In this paper, we focus on improving the online face liveness detection system to enhance the security of the downstream face recognition system.

Face Anti-Spoofing Face Recognition

Omni-sourced Webly-supervised Learning for Video Recognition

3 code implementations ECCV 2020 Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin

Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.

Ranked #5 on Action Recognition on UCF101 (using extra training data)

Action Classification Action Recognition +1

Towards Backward-Compatible Representation Learning

3 code implementations CVPR 2020 Yantao Shen, Yuanjun Xiong, Wei Xia, Stefano Soatto

Backward compatibility is critical to quickly deploy new embedding models that leverage ever-growing large-scale training datasets and improvements in deep learning architectures and training methods.

Face Recognition Representation Learning

Convolutional Sequence Generation for Skeleton-Based Action Synthesis

no code implementations ICCV 2019 Sijie Yan, Zhizhong Li, Yuanjun Xiong, Huahan Yan

It captures the temporal structure at multiple scales through the GP prior and the temporal convolutions; and establishes the spatial connection between the latent vectors and the skeleton graphs via a novel graph refining scheme.

Human action generation

Action recognition with spatial-temporal discriminative filter banks

no code implementations ICCV 2019 Brais Martinez, Davide Modolo, Yuanjun Xiong, Joseph Tighe

In this work we focus on how to improve the representation capacity of the network, but rather than altering the backbone, we focus on improving the last layers of the network, where changes have low impact in terms of computational cost.

Ranked #36 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Action Recognition +1

From Trailers to Storylines: An Efficient Way to Learn from Movies

1 code implementation 14 Jun 2018 Qingqiu Huang, Yuanjun Xiong, Yu Xiong, Yuqi Zhang, Dahua Lin

Experiments on this dataset showed that the proposed method can substantially reduce the training time while obtaining highly effective features and coherent temporal structures.

Recognize Actions by Disentangling Components of Dynamics

no code implementations CVPR 2018 Yue Zhao, Yuanjun Xiong, Dahua Lin

Despite the remarkable progress in action recognition over the past several years, existing methods remain limited in efficiency and effectiveness.

Action Recognition Optical Flow Estimation +2

Unsupervised Feature Learning via Non-Parametric Instance Discrimination

4 code implementations CVPR 2018 Zhirong Wu, Yuanjun Xiong, Stella X. Yu, Dahua Lin

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.

General Classification object-detection +4

Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination

14 code implementations 5 May 2018 Zhirong Wu, Yuanjun Xiong, Stella Yu, Dahua Lin

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.

Contrastive Learning General Classification +3

Optimizing Video Object Detection via a Scale-Time Lattice

1 code implementation CVPR 2018 Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, Dahua Lin

High-performance object detection relies on expensive convolutional networks to compute features, often leading to significant challenges in applications, e.g., those that require detecting objects from video streams in real time.

Object object-detection +1

Face Detection through Scale-Friendly Deep Convolutional Networks

no code implementations 9 Jun 2017 Shuo Yang, Yuanjun Xiong, Chen Change Loy, Xiaoou Tang

Specifically, our method achieves 76.4 average precision on the challenging WIDER FACE dataset and 96% recall rate on the FDDB dataset with 7 frames per second (fps) for 900 * 1300 input image.

Face Detection

Temporal Segment Networks for Action Recognition in Videos

11 code implementations 8 May 2017 Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool

Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

Action Classification Action Recognition In Videos +3

Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs

2 code implementations 4 Oct 2016 Limin Wang, Sheng Guo, Weilin Huang, Yuanjun Xiong, Yu Qiao

Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partially due to these recent large-scale scene datasets, such as the Places and Places2.

General Classification Scene Classification +1

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

19 code implementations 2 Aug 2016 Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool

The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of temporal segment network.

Action Classification Action Recognition In Videos +2

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

1 code implementation 2 Aug 2016 Yuanjun Xiong, Li-Min Wang, Zhe Wang, Bo-Wen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc van Gool, Xiaoou Tang

This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016.

General Classification Video Classification

Towards Good Practices for Very Deep Two-Stream ConvNets

5 code implementations 8 Jul 2015 Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao

However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.

Action Recognition In Videos Computational Efficiency +3

Recognize Complex Events From Static Images by Fusing Deep Channels

no code implementations CVPR 2015 Yuanjun Xiong, Kai Zhu, Dahua Lin, Xiaoou Tang

A considerable portion of web images capture events that occur in our personal lives or social activities.

Zeta Hull Pursuits: Learning Nonconvex Data Hulls

no code implementations NeurIPS 2014 Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang

Selecting a small informative subset from a given dataset, also called column sampling, has drawn much attention in machine learning.

Image Classification

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

no code implementations 11 Sep 2014 Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.

Diversity Object +2
