Search Results for author: Rui Qian

Found 23 papers, 17 papers with code

Controllable Augmentations for Video Representation Learning

no code implementations 30 Mar 2022 Rui Qian, Weiyao Lin, John See, Dian Li

The major reason is that the positive pairs, i.e., different clips sampled from the same video, have a limited temporal receptive field and usually share a similar background but differ in motion.

Action Recognition Contrastive Learning +2

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation

1 code implementation CVPR 2022 Xian Liu, Qianyi Wu, Hang Zhou, Yinghao Xu, Rui Qian, Xinyi Lin, Xiaowei Zhou, Wayne Wu, Bo Dai, Bolei Zhou

To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations.

Contrastive Learning Gesture Generation

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

1 code implementation 13 Feb 2022 Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou

Specifically, we observe that the previous practice of learning only a single audio representation is insufficient due to the additive nature of audio signals.

Class-aware Sounding Objects Localization via Audiovisual Correspondence

1 code implementation 22 Dec 2021 Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

To address this problem, we propose a two-stage step-by-step learning framework to localize and recognize sounding objects in complex audiovisual scenarios using only the correspondence between audio and vision.

object-detection Object Detection +1

Exploring Temporal Granularity in Self-Supervised Video Representation Learning

no code implementations 8 Dec 2021 Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.

Representation Learning Self-Supervised Learning
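The abstract above describes a two-part objective: a fine-grained term aligning corresponding temporal embeddings of a short and a long clip, and a persistent term pulling together their global embeddings. A minimal numpy sketch of that combination follows; the function name, the mean-pooled global embedding, and the unit loss weights are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def cosine(a, b, axis=-1, eps=1e-8):
    """Cosine similarity along the last axis."""
    a = a / (np.linalg.norm(a, axis=axis, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=axis, keepdims=True) + eps)
    return np.sum(a * b, axis=axis)

def temporal_granularity_loss(short_emb, long_emb, offset, w_fine=1.0, w_persist=1.0):
    """Combine a fine-grained per-timestep term with a persistent global term.

    short_emb: (T_s, D) temporal embeddings of the short clip
    long_emb:  (T_l, D) temporal embeddings of the long clip
    offset:    index where the short clip starts inside the long clip
    """
    T_s = short_emb.shape[0]
    # Fine-grained: maximize similarity of corresponding timesteps.
    aligned = long_emb[offset:offset + T_s]
    fine = -np.mean(cosine(short_emb, aligned))
    # Persistent: pull together the two clips' global (mean-pooled) embeddings.
    persist = -cosine(short_emb.mean(axis=0), long_emb.mean(axis=0))
    return w_fine * fine + w_persist * persist
```

With identical clips and zero offset both terms reach their minimum, so the loss bottoms out at -(w_fine + w_persist).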

Motion-aware Contrastive Video Representation Learning via Foreground-background Merging

1 code implementation CVPR 2022 Shuangrui Ding, Maomao Li, Tianyu Yang, Rui Qian, Haohang Xu, Qingyi Chen, Jue Wang, Hongkai Xiong

To alleviate such bias, we propose Foreground-background Merging (FAME) to deliberately compose the moving foreground region of the selected video onto the static background of others.

Action Recognition Contrastive Learning +1
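The composition step FAME describes, pasting one clip's moving foreground onto another clip's background, reduces to a masked blend once a foreground mask is available. A minimal numpy sketch under that assumption (the function name and the mask's provenance are hypothetical; the paper's own mask estimation is not reproduced here):

```python
import numpy as np

def fame_merge(video_a, video_b, fg_mask):
    """Compose the moving foreground of video_a onto the background of video_b.

    video_a, video_b: (T, H, W, C) clips with matching shapes
    fg_mask:          (T, H, W) binary mask, 1 where video_a is foreground
    """
    m = fg_mask[..., None].astype(video_a.dtype)  # broadcast over channels
    return m * video_a + (1.0 - m) * video_b
```

The merged clip keeps video_a's motion while the contrastive model can no longer rely on video_a's background as a shortcut.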

Revisiting 3D ResNets for Video Recognition

1 code implementation 3 Sep 2021 Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello

Recent work by Bello shows that training and scaling strategies may matter more than model architecture for visual recognition.

Action Classification Contrastive Learning +1

TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition

no code implementations 10 Jul 2021 Shuyuan Li, Huabin Liu, Rui Qian, Yuxi Li, John See, Mengjuan Fei, Xiaoyuan Yu, Weiyao Lin

The first stage locates the action by learning a temporal affine transform, which warps each video feature to its action duration while discarding action-irrelevant features (e.g., background).

Few Shot Action Recognition Metric Learning
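A temporal affine transform of the kind described above can be sketched as resampling a feature sequence along time with a learned scale and shift. The numpy version below uses linear interpolation and clamps out-of-range samples; the function name, sampling-grid convention, and clamping are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def temporal_affine_warp(feat, scale, shift):
    """Warp a (T, D) feature sequence along time with an affine transform.

    Sampling grid: t_src = scale * t_norm + shift, with t_norm in [0, 1];
    out-of-range samples are clamped, and features are linearly interpolated.
    """
    T = feat.shape[0]
    t_norm = np.linspace(0.0, 1.0, T)
    src = np.clip(scale * t_norm + shift, 0.0, 1.0) * (T - 1)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    w = (src - lo)[:, None]
    return (1 - w) * feat[lo] + w * feat[hi]
```

With scale=1 and shift=0 the warp is the identity; scale<1 zooms into the start of the clip, which is how a learned (scale, shift) pair can crop features to the action duration.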

3D Object Detection for Autonomous Driving: A Survey

1 code implementation 21 Jun 2021 Rui Qian, Xin Lai, Xirong Li

Autonomous driving is regarded as one of the most promising remedies for shielding human beings from severe crashes.

3D Object Detection Autonomous Driving +2

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

2 code implementations NeurIPS 2021 Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

Ranked #6 on Action Classification on Moments in Time (using extra training data)

Action Classification Action Recognition +7

BADet: Boundary-Aware 3D Object Detection from Point Clouds

1 code implementation 21 Apr 2021 Rui Qian, Xin Lai, Xirong Li

Specifically, instead of refining each proposal independently as previous works do, we represent each proposal as a node for graph construction within a given cut-off threshold, associating proposals in the form of a local neighborhood graph, with the boundary correlations of an object being explicitly exploited.

3D Object Detection graph construction +2
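The graph construction described above, connecting proposals whose centers fall within a cut-off threshold, can be sketched in a few lines of numpy. The function name and the use of Euclidean distance between proposal centers are assumptions for illustration; BADet's full refinement on top of this graph is not reproduced:

```python
import numpy as np

def neighborhood_graph(centers, cutoff):
    """Connect proposals whose centers lie within a cut-off distance.

    centers: (N, 3) proposal centers
    Returns a boolean (N, N) adjacency matrix without self-loops,
    i.e. the local neighborhood graph over proposals.
    """
    diff = centers[:, None, :] - centers[None, :, :]   # pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)               # pairwise distances
    adj = dist < cutoff
    np.fill_diagonal(adj, False)                       # drop self-loops
    return adj
```

A graph neural network can then aggregate features along these edges so that overlapping proposals on the same object inform each other's boundaries.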

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

1 code implementation NeurIPS 2020 Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou

First, we propose to learn robust object representations by aggregating the candidate sound localization results in the single source scenes.

Object Localization

Finding Action Tubes with a Sparse-to-Dense Framework

no code implementations 30 Aug 2020 Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Li-Min Wang, Shugong Xu

The task of spatial-temporal action detection has attracted increasing attention among researchers.

Action Detection

Spatiotemporal Contrastive Video Representation Learning

3 code implementations CVPR 2021 Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.

Ranked #1 on Self-Supervised Action Recognition on Kinetics-400 (using extra training data)

Contrastive Learning Data Augmentation +4
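The pull-together/push-away objective described above is the standard InfoNCE loss: each clip's positive is the other augmented clip from the same video, and clips from other videos in the batch act as negatives. A minimal numpy sketch (the function name and temperature value are illustrative defaults, not taken from the paper):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE over a batch: clip i of view 1 should match clip i of view 2.

    z1, z2: (N, D) embeddings of two augmented clips per video.
    Returns the mean cross-entropy of picking the matching clip.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # diagonal = positive pairs
```

When matched pairs are the most similar entries in each row, the loss is near zero; mismatched pairings drive it up, which is what pushes clips from different videos apart.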

Multiple Sound Sources Localization from Coarse to Fine

1 code implementation ECCV 2020 Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin

How to visually localize multiple sound sources in unconstrained videos is a formidable problem, especially in the absence of pairwise sound-object annotations.

Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events

no code implementations 9 May 2020 Weiyao Lin, Huabin Liu, Shizhan Liu, Yuxi Li, Rui Qian, Tao Wang, Ning Xu, Hongkai Xiong, Guo-Jun Qi, Nicu Sebe

We demonstrate that the proposed method is able to boost the performance of existing pose estimation pipelines on our HiEve dataset.

Pose Estimation

ATRW: A Benchmark for Amur Tiger Re-identification in the Wild

1 code implementation 13 Jun 2019 Shuyuan Li, Jianguo Li, Hanlin Tang, Rui Qian, Weiyao Lin

This paper tries to fill the gap by introducing a novel large-scale dataset, the Amur Tiger Re-identification in the Wild (ATRW) dataset.

Computer Vision

Attentive Generative Adversarial Network for Raindrop Removal from a Single Image

3 code implementations CVPR 2018 Rui Qian, Robby T. Tan, Wenhan Yang, Jiajun Su, Jiaying Liu

This injection of visual attention to both generative and discriminative networks is the main contribution of this paper.

Rain Removal