Search Results for author: Yu Qiao

Found 410 papers, 245 papers with code

Motionlets: Mid-level 3D Parts for Human Motion Recognition

no code implementations • CVPR 2013 • Li-Min Wang, Yu Qiao, Xiaoou Tang

We postulate three key properties of motionlet for action recognition: high motion saliency, multiple scale representation, and representative-discriminative ability.

Action Recognition Temporal Action Localization

Paper
Add Code

A Study on Unsupervised Dictionary Learning and Feature Encoding for Action Classification

no code implementations • 2 Sep 2013 • Xiaojiang Peng, Qiang Peng, Yu Qiao, Junzhou Chen, Mehtab Afzal

Many efforts have been devoted to developing alternatives to traditional vector quantization in the image domain, such as sparse coding and soft-assignment.

Action Classification Dictionary Learning +2

Paper
Add Code

Bag of Visual Words and Fusion Methods for Action Recognition: Comprehensive Study and Good Practice

no code implementations • 18 May 2014 • Xiaojiang Peng, Li-Min Wang, Xingxing Wang, Yu Qiao

Many efforts have been made in each step independently in different scenarios and their effect on action recognition is still unknown.

Action Recognition In Videos Temporal Action Localization

Paper
Add Code

Multi-View Super Vector for Action Recognition

no code implementations • CVPR 2014 • Zhuowei Cai, Li-Min Wang, Xiaojiang Peng, Yu Qiao

A kernel average is then applied to these components to produce the recognition result.

Action Recognition Temporal Action Localization

Paper
Add Code

Object-Scene Convolutional Neural Networks for Event Recognition in Images

no code implementations • 2 May 2015 • Limin Wang, Zhe Wang, Wenbin Du, Yu Qiao

Meanwhile, we investigate different network architectures for OS-CNN design, and adapt the deep (AlexNet) and very-deep (GoogLeNet) networks to the task of event recognition.

Paper
Add Code

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

1 code implementation • CVPR 2015 • Limin Wang, Yu Qiao, Xiaoou Tang

Visual features are of vital importance for human action understanding in videos.

Ranked #59 on Action Recognition on HMDB-51

Action Recognition Action Understanding +1

Paper
Code

Boosting Optical Character Recognition: A Super-Resolution Approach

no code implementations • 7 Jun 2015 • Chao Dong, Ximei Zhu, Yubin Deng, Chen Change Loy, Yu Qiao

Text image super-resolution is a challenging yet open research problem in the computer vision community.

Image Super-Resolution Optical Character Recognition +1

Paper
Add Code

Reading Scene Text in Deep Convolutional Sequences

1 code implementation • 14 Jun 2015 • Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, Xiaoou Tang

We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem.

Scene Text Recognition

Paper
Code

Towards Good Practices for Very Deep Two-Stream ConvNets

5 code implementations • 8 Jul 2015 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao

However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.

Ranked #66 on Action Recognition on UCF101

Action Recognition In Videos Computational Efficiency +3

554

Paper
Code

Local Color Contrastive Descriptor for Image Classification

no code implementations • 3 Aug 2015 • Sheng Guo, Weilin Huang, Yu Qiao

Our descriptor enriches local image representation with both color and contrast information.

Classification General Classification +2

Paper
Add Code

Places205-VGGNet Models for Scene Recognition

2 code implementations • 7 Aug 2015 • Limin Wang, Sheng Guo, Weilin Huang, Yu Qiao

We verify the performance of trained Places205-VGGNet models on three datasets: MIT67, SUN397, and Places205.

Computational Efficiency Object Recognition +1

Paper
Code

Local Multi-Grouped Binary Descriptor with Ring-based Pooling Configuration and Optimization

no code implementations • 22 Sep 2015 • Yongqiang Gao, Weilin Huang, Yu Qiao

The performance of RMGD was evaluated on a number of publicly available benchmarks, where the RMGD outperforms the state-of-the-art binary descriptors significantly.

Paper
Add Code

Text-Attentional Convolutional Neural Networks for Scene Text Detection

no code implementations • 12 Oct 2015 • Tong He, Weilin Huang, Yu Qiao, Jian Yao

The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components.

Multi-Task Learning Scene Text Detection +3

Paper
Add Code

Better Exploiting OS-CNNs for Better Event Recognition in Images

no code implementations • 14 Oct 2015 • Limin Wang, Zhe Wang, Sheng Guo, Yu Qiao

Event recognition from still images is one of the most important problems for image understanding.

Object Object Recognition +1

Paper
Add Code

Locally-Supervised Deep Hybrid Model for Scene Recognition

no code implementations • 27 Jan 2016 • Sheng Guo, Weilin Huang, Li-Min Wang, Yu Qiao

Secondly, we propose a new Local Convolutional Supervision (LCS) layer to enhance the local structure of the image by directly propagating the label information to the convolutional layers.

General Classification Image Classification +1

Paper
Add Code

Text-attentional convolutional neural network for scene text detection

no code implementations • IEEE Trans. on Image Processing, 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao

Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images.

Multi-Task Learning Scene Text Detection +3

Paper
Add Code

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network

1 code implementation • 31 Mar 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao

We propose a novel Cascaded Convolutional Text Network (CCTN) that joins two customized convolutional networks for coarse-to-fine text localization.

Scene Text Detection Text Detection

Paper
Code

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks

42 code implementations • 11 Apr 2016 • Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, Yu Qiao

Face detection and alignment in unconstrained environments are challenging due to various poses, illuminations and occlusions.

Ranked #27 on Face Detection on WIDER Face (Easy)

Face Alignment Face Detection

9,996

Paper
Code

Actionness Estimation Using Hybrid Fully Convolutional Networks

no code implementations • CVPR 2016 • Limin Wang, Yu Qiao, Xiaoou Tang, Luc van Gool

Actionness was introduced to quantify the likelihood of containing a generic action instance at a specific location.

Ranked #11 on Action Detection on J-HMDB

Action Detection Action Recognition +1

Paper
Add Code

Real-time Action Recognition with Enhanced Motion Vector CNNs

1 code implementation • CVPR 2016 • Bowen Zhang, Li-Min Wang, Zhe Wang, Yu Qiao, Hanli Wang

The deep two-stream architecture exhibited excellent performance on video based action recognition.

Ranked #74 on Action Recognition on UCF101

Action Recognition Optical Flow Estimation +1

550

Paper
Code

Latent Factor Guided Convolutional Neural Networks for Age-Invariant Face Recognition

no code implementations • CVPR 2016 • Yandong Wen, Zhifeng Li, Yu Qiao

In order to address this problem, we propose a novel deep face recognition framework to learn the age-invariant deep face features through a carefully designed CNN model.

Ranked #7 on Age-Invariant Face Recognition on CACDVS

Age-Invariant Face Recognition MORPH

Paper
Add Code

A Key Volume Mining Deep Framework for Action Recognition

no code implementations • CVPR 2016 • Wangjiang Zhu, Jie Hu, Gang Sun, Xudong Cao, Yu Qiao

Training with a large proportion of irrelevant volumes will hurt performance.

Action Recognition In Videos Temporal Action Localization

Paper
Add Code

DeepWriter: A Multi-Stream Deep CNN for Text-independent Writer Identification

no code implementations • 21 Jun 2016 • Linjie Xing, Yu Qiao

The main contributions are: 1) we design and optimize a multi-stream structure for the writer identification task; 2) we introduce data augmentation learning to enhance the performance of DeepWriter; 3) we introduce a patch scanning strategy to handle text images of different lengths.

Data Augmentation Sentence +1

Paper
Add Code

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

1 code implementation • 2 Aug 2016 • Yuanjun Xiong, Li-Min Wang, Zhe Wang, Bo-Wen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc van Gool, Xiaoou Tang

This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016.

General Classification Video Classification

251

Paper
Code

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

19 code implementations • 2 Aug 2016 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool

The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of temporal segment network.

Ranked #3 on Multimodal Activity Recognition on EV-Action

Action Classification Action Recognition In Videos +2

3,888

Paper
Code

Transferring Object-Scene Convolutional Neural Networks for Event Recognition in Still Images

no code implementations • 1 Sep 2016 • Limin Wang, Zhe Wang, Yu Qiao, Luc van Gool

These newly designed transferring techniques exploit multi-task learning frameworks to incorporate extra knowledge from other networks and additional datasets into the training procedure of event CNNs.

Multi-Task Learning

Paper
Add Code

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition

1 code implementation • 1 Sep 2016 • Zhe Wang, Li-Min Wang, Yali Wang, Bo-Wen Zhang, Yu Qiao

In this paper, we propose a hybrid representation, which leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schema for image recognition, with a focus on scene recognition.

Scene Recognition

Paper
Code

Detecting Text in Natural Image with Connectionist Text Proposal Network

27 code implementations • 12 Sep 2016 • Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao

We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images.

Scene Text Detection

3,414

Paper
Code

A Discriminative Feature Learning Approach for Deep Face Recognition

1 code implementation • ECCV 2016 • Yandong Wen, Kaipeng Zhang, Zhifeng Li, Yu Qiao

In most of the available CNNs, the softmax loss function is used as the supervision signal to train the deep model.

Face Recognition Face Verification

941

Paper
Code

Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs

2 code implementations • 4 Oct 2016 • Limin Wang, Sheng Guo, Weilin Huang, Yuanjun Xiong, Yu Qiao

Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partially due to these recent large-scale scene datasets, such as the Places and Places2.

General Classification Scene Classification +1

550

Paper
Code

Range Loss for Deep Face Recognition with Long-tail

2 code implementations • 28 Nov 2016 • Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao

Convolutional neural networks have achieved great improvement on face recognition in recent years because of their extraordinary ability to learn discriminative features of people with different identities.

Face Recognition

Paper
Code

Temporal Segment Networks for Action Recognition in Videos

11 code implementations • 8 May 2017 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool

Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

Ranked #5 on Video Classification on COIN

Action Classification Action Recognition In Videos +3

3,888

Paper
Code

Single Shot Text Detector with Regional Attention

1 code implementation • ICCV 2017 • Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li

Our text detector achieves an F-measure of 77% on the ICDAR 2015 benchmark, advancing the state-of-the-art results in [18, 28].

Ranked #4 on Scene Text Detection on COCO-Text

Scene Text Detection

212

Paper
Code

Deep Embedding Convolutional Neural Network for Synthesizing CT Image from T1-Weighted MR Image

no code implementations • 7 Sep 2017 • Lei Xiang, Qian Wang, Xiyao Jin, Dong Nie, Yu Qiao, Dinggang Shen

After repeating this embedding procedure several times in the network, we can eventually synthesize a final CT image at the end of the DECNN.

Computed Tomography (CT) Image Generation

Paper
Add Code

Detecting Faces Using Inside Cascaded Contextual CNN

no code implementations • ICCV 2017 • Kaipeng Zhang, Zhanpeng Zhang, Hao Wang, Zhifeng Li, Yu Qiao, Wei Liu

Deep Convolutional Neural Networks (CNNs) achieve substantial improvements in face detection in the wild.

Face Detection

Paper
Add Code

Range Loss for Deep Face Recognition With Long-Tailed Training Data

no code implementations • ICCV 2017 • Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao

Unlike these works, this paper investigates how long-tailed data impact the training of face CNNs and develops a novel loss function, called range loss, to effectively utilize the tailed data in the training process.

Face Recognition

Paper
Add Code

RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos

1 code implementation • ICCV 2017 • Wenbin Du, Yali Wang, Yu Qiao

Firstly, unlike previous works on pose-related action recognition, our RPAN is an end-to-end recurrent network which can exploit important spatial-temporal evolutions of human pose to assist action recognition in a unified framework.

Ranked #5 on Skeleton Based Action Recognition on J-HMDB

Action Recognition In Videos Pose Estimation +1

Paper
Code

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward

6 code implementations • 29 Dec 2017 • Kaiyang Zhou, Yu Qiao, Tao Xiang

Video summarization aims to facilitate large-scale video browsing by producing short, concise summaries that are diverse and representative of original videos.

Ranked #7 on Unsupervised Video Summarization on TvSum

Decision Making reinforcement-learning +3

455

Paper
Code

FOTS: Fast Oriented Text Spotting with a Unified Network

7 code implementations • CVPR 2018 • Xuebo Liu, Ding Liang, Shi Yan, Dagui Chen, Yu Qiao, Junjie Yan

Incidental scene text spotting is considered one of the most difficult and valuable challenges in the document analysis community.

Ranked #4 on Scene Text Detection on ICDAR 2017 MLT

Scene Text Detection Scene Text Recognition +2

635

Paper
Code

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering

1 code implementation • 24 Jan 2018 • Zhe Wang, Xiaoyi Liu, Liangjian Chen, Li-Min Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes

Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision.

Multiple-choice POS +3

Paper
Code

LSTD: A Low-Shot Transfer Detector for Object Detection

1 code implementation • 5 Mar 2018 • Hao Chen, Yali Wang, Guoyou Wang, Yu Qiao

Second, we introduce a novel regularized transfer learning framework for low-shot detection, where the transfer knowledge (TK) and background depression (BD) regularizations are proposed to leverage object knowledge respectively from source and target domains, in order to further enhance fine-tuning with a few target images.

Ranked #22 on Few-Shot Object Detection on MS-COCO (30-shot)

Few-Shot Object Detection Object +2

Paper
Code

An end-to-end TextSpotter with Explicit Alignment and Attention

2 code implementations • CVPR 2018 • Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun

This allows the two tasks to work collaboratively by sharing convolutional features, which is critical for identifying challenging text instances.

Text Detection

323

Paper
Code

SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters

1 code implementation • ECCV 2018 • Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, Yu Qiao

Deep neural networks have enjoyed remarkable success in various vision tasks; however, it remains challenging to apply CNNs to domains lacking a regular underlying structure, such as 3D point clouds.

Ranked #6 on 3D Part Segmentation on IntrA

3D Part Segmentation 3D Point Cloud Classification

Paper
Code

Boosting up Scene Text Detectors with Guided CNN

no code implementations • 10 May 2018 • Xiaoyu Yue, Zhanghui Kuang, Zhaoyang Zhang, Zhenfang Chen, Pan He, Yu Qiao, Wei Zhang

Deep CNNs have achieved great success in text detection.

Text Detection

Paper
Add Code

Knowledge-based Fully Convolutional Network and Its Application in Segmentation of Lung CT Images

no code implementations • 22 May 2018 • Tao Yu, Yu Qiao, Huan Long

A variety of deep neural networks have been applied in medical image segmentation and achieve good performance.

Image Segmentation Medical Image Segmentation +2

Paper
Add Code

Temporal Hallucinating for Action Recognition With Few Still Images

no code implementations • CVPR 2018 • Yali Wang, Lei Zhou, Yu Qiao

To mimic this capacity, we propose a novel Hybrid Video Memory (HVM) machine, which can hallucinate temporal features of still images from video memory, in order to boost action recognition with few still images.

Action Recognition In Still Images Domain Adaptation

Paper
Add Code

Prostate Segmentation using 2D Bridged U-net

no code implementations • 12 Jul 2018 • Wanli Chen, Yue Zhang, Junjun He, Yu Qiao, Yi-fan Chen, Hongjian Shi, Xiaoying Tang

To address the aforementioned three problems, we propose and validate a deeper network that can fit medical image datasets that are usually small in the sample size.

Image Segmentation Medical Image Segmentation +2

Paper
Add Code

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

45 code implementations • 1 Sep 2018 • Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang

To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN).

Ranked #2 on Face Hallucination on FFHQ 512 x 512 - 16x upscaling

Face Hallucination Generative Adversarial Network +2

15,711

Paper
Code

Find and Focus: Retrieve and Localize Video Events with Natural Language Queries

no code implementations • ECCV 2018 • Dian Shao, Yu Xiong, Yue Zhao, Qingqiu Huang, Yu Qiao, Dahua Lin

The thriving of video sharing services brings new challenges to video retrieval, e.g., the rapid growth in video duration and content diversity.

Natural Language Queries Retrieval +2

Paper
Add Code

PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report

no code implementations • 3 Oct 2018 • Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X. Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang, Jiewen Ran, Chen Xing, Xingguang Zhou, Pengfei Zhu, Mingrui Geng, Yawei Li, Eirikur Agustsson, Shuhang Gu, Luc van Gool, Etienne de Stoutz, Nikolay Kobyshev, Kehui Nie, Yan Zhao, Gen Li, Tong Tong, Qinquan Gao, Liu Hanwen, Pablo Navarrete Michelini, Zhu Dan, Hu Fengshuo, Zheng Hui, Xiumei Wang, Lirui Deng, Rang Meng, Jinghui Qin, Yukai Shi, Wushao Wen, Liang Lin, Ruicheng Feng, Shixiang Wu, Chao Dong, Yu Qiao, Subeesh Vasu, Nimisha Thekke Madam, Praveen Kandula, A. N. Rajagopalan, Jie Liu, Cheolkon Jung

This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones.

Image Enhancement Image Super-Resolution

Paper
Add Code

Super-Identity Convolutional Neural Network for Face Hallucination

no code implementations • ECCV 2018 • Kaipeng Zhang, Zhanpeng Zhang, Chia-Wen Cheng, Winston H. Hsu, Yu Qiao, Wei Liu, Tong Zhang

Face hallucination is a generative task to super-resolve the facial image with low resolution while human perception of face heavily relies on identity information.

Face Generation Face Hallucination +1

Paper
Add Code

Modulating Image Restoration with Continual Levels via Adaptive Feature Modification Layers

1 code implementation • CVPR 2019 • Jingwen He, Chao Dong, Yu Qiao

In image restoration tasks, like denoising and super resolution, continual modulation of restoration levels is of great importance for real-world applications, but most existing deep-learning-based image restoration methods fail to provide it.

Ranked #2 on Color Image Denoising on CBSD68 sigma75

Image Denoising Image Restoration +1

184

Paper
Code

AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations

3 code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li

Our results show that training deep neural networks with the AdaCos loss is stable and able to achieve high face recognition accuracy.

Ranked #6 on Face Verification on MegaFace

Face Recognition Face Verification

207

Paper
Code

P2SGrad: Refined Gradients for Optimizing Deep Face Models

no code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li

Cosine-based softmax losses significantly improve the performance of deep face recognition networks.

Face Recognition

Paper
Add Code

Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

1 code implementation • 10 May 2019 • Kai Wang, Xiaojiang Peng, Jianfei Yang, Debin Meng, Yu Qiao

Extensive experiments show that our RAN and region biased loss largely improve the performance of FER with occlusion and variant pose.

Ranked #2 on Facial Expression Recognition (FER) on SFEW

Facial Expression Recognition Facial Expression Recognition (FER)

250

Paper
Code

Suppressing Model Overfitting for Image Super-Resolution Networks

no code implementations • 11 Jun 2019 • Ruicheng Feng, Jinjin Gu, Yu Qiao, Chao Dong

Large deep networks have demonstrated competitive performance in single image super-resolution (SISR), with a huge volume of data involved.

Image Super-Resolution Memorization

Paper
Add Code

Frame attention networks for facial expression recognition in videos

2 code implementations • 29 Jun 2019 • Debin Meng, Xiaojiang Peng, Kai Wang, Yu Qiao

The feature embedding module is a deep Convolutional Neural Network (CNN) which embeds face images into feature vectors.

Ranked #3 on Facial Expression Recognition (FER) on CK+ (Accuracy (7 emotion) metric)

Facial Expression Recognition Facial Expression Recognition (FER)

323

Paper
Code

Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression

no code implementations • 8 Jul 2019 • Kai Wang, Jianfei Yang, Da Guo, Kaipeng Zhang, Xiaojiang Peng, Yu Qiao

Based on our winning solution from last year, we mainly explore head features and body features with a bootstrap strategy and two novel loss functions in this paper.

regression

Paper
Add Code

Product Image Recognition with Guidance Learning and Noisy Supervision

no code implementations • 26 Jul 2019 • Qing Li, Xiaojiang Peng, Liangliang Cao, Wenbin Du, Hao Xing, Yu Qiao

Instead of collecting product images by labor- and time-intensive image capturing, we take advantage of the web and download images from the reviews of several e-commerce websites, where the images are casually captured by consumers.

Paper
Add Code

RankSRGAN: Generative Adversarial Networks with Ranker for Image Super-Resolution

2 code implementations • ICCV 2019 • Wenlong Zhang, Yihao Liu, Chao Dong, Yu Qiao

To address the problem, we propose Super-Resolution Generative Adversarial Networks with Ranker (RankSRGAN) to optimize generator in the direction of perceptual metrics.

Ranked #1 on Image Super-Resolution on PIRM-test

Image Super-Resolution

269

Paper
Code

Understanding Vocabulary Growth Through An Adaptive Language Learning System

no code implementations • WS 2019 • Elma Kerz, Andreas Burgdorf, Daniel Wiechmann, Stefan Meeger, Yu Qiao, Christian Kohlschein, Tobias Meisen

Paper
Add Code

Learning Category Correlations for Multi-label Image Recognition with Graph Networks

no code implementations • 28 Sep 2019 • Qing Li, Xiaojiang Peng, Yu Qiao, Qiang Peng

In this paper, instead of using a pre-defined graph which is inflexible and may be sub-optimal for multi-label classification, we propose the A-GCN, which leverages the popular Graph Convolutional Networks with an Adaptive label correlation graph to model label dependencies.

Multi-Label Classification Word Embeddings

Paper
Add Code

Interactive Multi-Dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration

1 code implementation • ECCV 2020 • Jingwen He, Chao Dong, Yu Qiao

To make a step forward, this paper presents a new problem setup, called multi-dimension (MD) modulation, which aims at modulating output effects across multiple degradation types and levels.

Image Restoration

Paper
Code

Geometry Sharing Network for 3D Point Cloud Classification and Segmentation

1 code implementation • 23 Dec 2019 • Mingye Xu, Zhipeng Zhou, Yu Qiao

Specifically, GS-Net consists of Geometry Similarity Connection (GSC) modules which exploit Eigen-Graph to group distant points with similar and relevant geometric information, and aggregate features from nearest neighbors in both Euclidean space and Eigenvalue space.

Ranked #7 on 3D Point Cloud Classification on IntrA

3D Point Cloud Classification Classification +3

Paper
Code

Pose-Assisted Multi-Camera Collaboration for Active Object Tracking

no code implementations • 15 Jan 2020 • Jing Li, Jing Xu, Fangwei Zhong, Xiangyu Kong, Yu Qiao, Yizhou Wang

In the system, each camera is equipped with two controllers and a switcher: The vision-based controller tracks targets based on observed images.

Object Object Tracking

Paper
Add Code

FD-GAN: Generative Adversarial Networks with Fusion-discriminator for Single Image Dehazing

no code implementations • 20 Jan 2020 • Yu Dong, Yihao Liu, He Zhang, Shifeng Chen, Yu Qiao

With the proposed Fusion-discriminator, which takes frequency information as additional priors, our model can generate more natural and realistic dehazed images with less color distortion and fewer artifacts.

Image Dehazing Single Image Dehazing

Paper
Add Code

A Comprehensive Study on Temporal Modeling for Online Action Detection

1 code implementation • 21 Jan 2020 • Wen Wang, Xiaojiang Peng, Yu Qiao, Jian Cheng

Online action detection (OAD) is a practical yet challenging task, which has attracted increasing attention in recent years.

Online Action Detection

Paper
Code

Progressive Object Transfer Detection

no code implementations • 12 Feb 2020 • Hao Chen, Yali Wang, Guoyou Wang, Xiang Bai, Yu Qiao

Inspired by this procedure of learning to detect, we propose a novel Progressive Object Transfer Detection (POTD) framework.

Object object-detection +1

Paper
Add Code

Learning Attentive Pairwise Interaction for Fine-Grained Classification

1 code implementation • 24 Feb 2020 • Peiqin Zhuang, Yali Wang, Yu Qiao

These distinct gate vectors inherit mutual context on semantic differences, which allow API-Net to attentively capture contrastive clues by pairwise interaction between two images.

Ranked #12 on Fine-Grained Image Classification on Stanford Dogs

Classification Fine-Grained Image Classification +1

124

Paper
Code

Suppressing Uncertainties for Large-Scale Facial Expression Recognition

2 code implementations • CVPR 2020 • Kai Wang, Xiaojiang Peng, Jianfei Yang, Shijian Lu, Yu Qiao

Annotating a qualitative large-scale facial expression dataset is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectiveness of annotators.

Facial Expression Recognition Facial Expression Recognition (FER)

404

Paper
Code

Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units

no code implementations • 26 Feb 2020 • Zhanzhan Cheng, Yunlu Xu, Mingjian Cheng, Yu Qiao, ShiLiang Pu, Yi Niu, Fei Wu

Recurrent neural networks (RNNs) have been widely studied in sequence learning tasks, while the mainstream models (e.g., LSTM and GRU) rely on the gating mechanism (in control of how information flows between hidden states).

Language Modelling Scene Text Recognition

Paper
Add Code

TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation

no code implementations • 7 Mar 2020 • Wen Wang, Xiaojiang Peng, Yanzhou Su, Yu Qiao, Jian Cheng

Video action anticipation aims to predict future action categories from observed frames.

Action Anticipation

Paper
Add Code

Context-Transformer: Tackling Object Confusion for Few-Shot Detection

1 code implementation • 16 Mar 2020 • Ze Yang, Yali Wang, Xianyu Chen, Jianzhuang Liu, Yu Qiao

Few-shot object detection is a challenging but realistic scenario, where only a few annotated training images are available for training detectors.

Few-Shot Learning Few-Shot Object Detection +3

102

Paper
Code

Domain Adaptive Ensemble Learning

1 code implementation • 16 Mar 2020 • Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

Each such classifier is an expert to its own domain and a non-expert to others.

Domain Generalization Ensemble Learning +3

1,082

Paper
Code

Learning to Predict Context-adaptive Convolution for Semantic Segmentation

no code implementations • ECCV 2020 • Jianbo Liu, Junjun He, Jimmy S. Ren, Yu Qiao, Hongsheng Li

Long-range contextual information is essential for achieving high-performance semantic segmentation.

Segmentation Semantic Segmentation

Paper
Add Code

Understanding the Dynamics of Second Language Writing through Keystroke Logging and Complexity Contours

no code implementations • LREC 2020 • Elma Kerz, Fabio Pruneri, Daniel Wiechmann, Yu Qiao, Marcus Ströbel

The purpose of this paper is twofold: [1] to introduce, to our knowledge, the largest available resource of keystroke logging (KSL) data generated by Etherpad (https://etherpad.org/), an open-source, web-based collaborative real-time editor, that captures the dynamics of second language (L2) production and [2] to relate the behavioral data from KSL to indices of syntactic and lexical complexity of the texts produced obtained from a tool that implements a sliding window approach capturing the progression of complexity within a text.

valid

Paper
Add Code

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification

no code implementations • CVPR 2020 • Shijie Yu, Shihua Li, Dapeng Chen, Rui Zhao, Junjie Yan, Yu Qiao

To address the clothes changing person re-id problem, we construct a novel large-scale re-id benchmark named ClOthes ChAnging Person Set (COCAS), which provides multiple images of the same identity with different clothes.

Person Re-Identification

Paper
Add Code

Attention-Guided Hierarchical Structure Aggregation for Image Matting

1 code implementation • CVPR 2020 • Yu Qiao, Yuhao Liu, Xin Yang, Dongsheng Zhou, Mingliang Xu, Qiang Zhang, Xiaopeng Wei

In this paper, we propose an end-to-end Hierarchical Attention Matting Network (HAttMatting), which can predict the better structure of alpha mattes from single RGB images without additional input.

Ranked #6 on Image Matting on P3M-10k

Image Matting SSIM

Paper
Code

SmallBigNet: Integrating Core and Contextual Views for Video Classification

1 code implementation • CVPR 2020 • Xianhang Li, Yali Wang, Zhipeng Zhou, Yu Qiao

Our SmallBig network outperforms a number of recent state-of-the-art approaches, in terms of accuracy and/or efficiency.

Classification General Classification +1

Paper
Code

Becoming Linguistically Mature: Modeling English and German Children's Writing Development Across School Grades

no code implementations • WS 2020 • Elma Kerz, Yu Qiao, Daniel Wiechmann, Marcus Ströbel

In this paper we employ a novel approach to advancing our understanding of the development of writing in English and German children across school grades using classification tasks.

General Classification

Paper
Add Code

Visual Compositional Learning for Human-Object Interaction Detection

4 code implementations • ECCV 2020 • Zhi Hou, Xiaojiang Peng, Yu Qiao, DaCheng Tao

The integration of decomposition and composition enables VCL to share object and verb features among different HOI samples and images, and to generate new interaction samples and new types of HOI, and thus largely alleviates the long-tail distribution problem and benefits low-shot or zero-shot HOI detection.

Ranked #3 on Affordance Recognition on HICO-DET(Unknown Concepts)

Affordance Recognition Object

Paper
Code

Exploring Multi-Scale Feature Propagation and Communication for Image Super Resolution

no code implementations • 1 Aug 2020 • Ruicheng Feng, Weipeng Guan, Yu Qiao, Chao Dong

Multi-scale techniques have achieved great success in a wide range of computer vision tasks.

Image Super-Resolution

Paper
Add Code

Enhanced Quadratic Video Interpolation

2 code implementations • 10 Sep 2020 • Yihao Liu, Liangbin Xie, Li Si-Yao, Wenxiu Sun, Yu Qiao, Chao Dong

In this work, we further improve the performance of QVI from three facets and propose an enhanced quadratic video interpolation (EQVI) model.

Super-Resolution Video Frame Interpolation

309

Paper
Code

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation

1 code implementation • 15 Sep 2020 • Haisheng Su, Weihao Gan, Wei Wu, Yu Qiao, Junjie Yan

In this paper, we present BSN++, a new framework which exploits complementary boundary regressor and relation modeling for temporal proposal generation.

Ranked #6 on Temporal Action Proposal Generation on ActivityNet-1.3

Relation Temporal Action Proposal Generation

Paper
Code

AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results

3 code implementations • 15 Sep 2020 • Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Maitreya Suin, Kuldeep Purohit, A. N. Rajagopalan, Xiaochuan Li, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Abdul Muqeet, Jiwon Hwang, Subin Yang, JungHeum Kang, Sung-Ho Bae, Yongwoo Kim, Geun-Woo Jeon, Jun-Ho Choi, Jun-Hyuk Kim, Jong-Seok Lee, Steven Marty, Eric Marty, Dongliang Xiong, Siang Chen, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Haicheng Wang, Vineeth Bhaskara, Alex Levinshtein, Stavros Tsogkas, Allan Jepson, Xiangzhen Kong, Tongtong Zhao, Shanshan Zhao, Hrishikesh P. S, Densen Puthussery, Jiji C. V, Nan Nan, Shuai Liu, Jie Cai, Zibo Meng, Jiaming Ding, Chiu Man Ho, Xuehui Wang, Qiong Yan, Yuzhi Zhao, Long Chen, Jiangtao Zhang, Xiaotong Luo, Liang Chen, Yanyun Qu, Long Sun, Wenhao Wang, Zhenbing Liu, Rushi Lan, Rao Muhammad Umer, Christian Micheloni

This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results.

Image Super-Resolution

2,713

Paper
Code

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition

no code implementations • 15 Sep 2020 • Haisheng Su, Jing Su, Dongliang Wang, Weihao Gan, Wei Wu, Mengmeng Wang, Junjie Yan, Yu Qiao

Second, the parameter frequency distribution is further adopted to guide the student network to learn the appearance modeling process from the teacher.

Action Recognition Knowledge Distillation +1

Paper
Add Code

Conditional Sequential Modulation for Efficient Global Image Retouching

1 code implementation • ECCV 2020 • Jingwen He, Yihao Liu, Yu Qiao, Chao Dong

The base network acts like an MLP that processes each pixel independently and the condition network extracts the global features of the input image to generate a condition vector.

Image Retouching Photo Retouching

128

Paper
Code

Efficient Image Super-Resolution Using Pixel Attention

1 code implementation • 2 Oct 2020 • Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong

Pixel attention (PA) is similar to channel attention and spatial attention in formulation.

Image Super-Resolution

309

Paper
Code

Suppressing Mislabeled Data via Grouping and Self-Attention

1 code implementation • ECCV 2020 • Xiaojiang Peng, Kai Wang, Zhaoyang Zeng, Qing Li, Jianfei Yang, Yu Qiao

Specifically, this plug-and-play AFM first leverages a group-to-attend module to construct groups and assign attention weights for group-wise samples, and then uses a mixup module with the attention weights to interpolate massive noisy-suppressed samples.

Image Classification

Paper
Code

Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition

1 code implementation • ECCV 2020 • Jin Ye, Junjun He, Xiaojiang Peng, Wenhao Wu, Yu Qiao

To this end, we propose an Attention-Driven Dynamic Graph Convolutional Network (ADD-GCN) to dynamically generate a specific graph for each image.

Ranked #22 on Multi-Label Classification on MS-COCO

Multi-Label Classification

122

Paper
Code

Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud

3 code implementations • 20 Dec 2020 • Mutian Xu, Junhao Zhang, Zhipeng Zhou, Mingye Xu, Xiaojuan Qi, Yu Qiao

GDANet introduces Geometry-Disentangle Module to dynamically disentangle point clouds into the contour and flat part of 3D objects, respectively denoted by sharp and gentle variation components.

Ranked #1 on Point Cloud Segmentation on PointCloud-C

3D Object Classification 3D Part Segmentation +2

Paper
Code

Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition

no code implementations • 27 Dec 2020 • Hengshun Zhou, Debin Meng, Yuanyuan Zhang, Xiaojiang Peng, Jun Du, Kai Wang, Yu Qiao

The audio-video based emotion recognition aims to classify a given video into basic emotions.

Ranked #1 on Facial Expression Recognition (FER) on Acted Facial Expressions In The Wild (AFEW)

Facial Expression Recognition (FER) Video Emotion Recognition

Paper
Add Code

Tripartite Information Mining and Integration for Image Matting

1 code implementation • ICCV 2021 • Yuhao Liu, Jiake Xie, Xiao Shi, Yu Qiao, Yujie Huang, Yong Tang, Xin Yang

Regarding the nature of image matting, most research has focused on solutions for transition regions.

2k Image Matting

Paper
Code

Multi-scale Information Assembly for Image Matting

no code implementations • 7 Jan 2021 • Yu Qiao, Yuhao Liu, Qiang Zhu, Xin Yang, Yuxin Wang, Qiang Zhang, Xiaopeng Wei

Image matting is a long-standing problem in computer graphics and vision, mostly identified as the accurate estimation of the foreground in input images.

Image Matting

Paper
Add Code

Domain Generalization: A Survey

2 code implementations • 3 Mar 2021 • Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, Chen Change Loy

Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce.

Action Recognition Data Augmentation +8

1,082

Paper
Code

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

3 code implementations • CVPR 2021 • Xiangtao Kong, Hengyuan Zhao, Yu Qiao, Chao Dong

On this basis, we propose a new solution pipeline -- ClassSR that combines classification and SR in a unified framework.

2k 8k +3

350

Paper
Code

Unsupervised Person Re-Identification with Multi-Label Learning Guided Self-Paced Clustering

no code implementations • 8 Mar 2021 • Qing Li, Xiaojiang Peng, Yu Qiao, Qi Hao

The multi-label learning module leverages a memory feature bank and assigns each image with a multi-label vector based on the similarities between the image and feature bank.

Clustering Multi-Label Learning +2

Paper
Add Code

Detecting Human-Object Interaction via Fabricated Compositional Learning

1 code implementation • CVPR 2021 • Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, DaCheng Tao

With the proposed object fabricator, we are able to generate large-scale HOI samples for rare and unseen categories to alleviate the open long-tailed issues in HOI detection.

Ranked #4 on Affordance Recognition on HICO-DET

Affordance Recognition Object +1

Paper
Code

PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos

no code implementations • 16 Mar 2021 • Tianyu Luan, Yali Wang, Junhao Zhang, Zhe Wang, Zhipeng Zhou, Yu Qiao

By coupling advanced 3D pose estimators and HMR in a serial or parallel manner, these two frameworks can effectively correct human mesh with guidance of a concise pose calibration module.

Ranked #4 on 3D Human Pose Estimation on Surreal

3D Human Pose Estimation Human Mesh Recovery

Paper
Add Code

Investigate Indistinguishable Points in Semantic Segmentation of 3D Point Cloud

1 code implementation • 18 Mar 2021 • Mingye Xu, Zhipeng Zhou, Junhao Zhang, Yu Qiao

This paper investigates the indistinguishable points (difficult to predict label) in semantic segmentation for large-scale 3D point clouds.

3D Semantic Segmentation Segmentation

Paper
Code

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

1 code implementation • CVPR 2021 • Zhiwu Qing, Haisheng Su, Weihao Gan, Dongliang Wang, Wei Wu, Xiang Wang, Yu Qiao, Junjie Yan, Changxin Gao, Nong Sang

In this paper, we propose Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary as well as progressive boundary refinement.

Ranked #9 on Temporal Action Localization on ActivityNet-1.3

Action Detection Retrieval +2

Paper
Code

Smart Scribbles for Image Mating

no code implementations • 31 Mar 2021 • Xin Yang, Yu Qiao, Shaozhe Chen, Shengfeng He, BaoCai Yin, Qiang Zhang, Xiaopeng Wei, Rynson W. H. Lau

Image matting is an ill-posed problem that usually requires additional user input, such as trimaps or scribbles.

Image Matting

Paper
Add Code

Domain Generalization with MixStyle

3 code implementations • ICLR 2021 • Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

Our method, termed MixStyle, is motivated by the observation that visual domain is closely related to image style (e.g., photo vs. sketch images).

Ranked #57 on Domain Generalization on PACS

Domain Generalization Retrieval

3,144

Paper
Code

Affordance Transfer Learning for Human-Object Interaction Detection

2 code implementations • CVPR 2021 • Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, DaCheng Tao

The proposed method can thus be used to 1) improve the performance of HOI detection, especially for the HOIs with unseen objects; and 2) infer the affordances of novel objects.

Ranked #2 on Affordance Recognition on HICO-DET(Unknown Concepts)

Affordance Detection Affordance Recognition +4

Paper
Code

Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation

1 code implementation • 12 Apr 2021 • Hongbin Xu, Zhipeng Zhou, Yu Qiao, Wenxiong Kang, Qiuxia Wu

Recent studies have witnessed that self-supervised methods based on view synthesis obtain clear progress on multi-view stereo (MVS).

Data Augmentation

150

Paper
Code

Very Lightweight Photo Retouching Network with Conditional Sequential Modulation

no code implementations • 13 Apr 2021 • Yihao Liu, Jingwen He, Xiangyu Chen, Zhengwen Zhang, Hengyuan Zhao, Chao Dong, Yu Qiao

In practice, photo retouching can be accomplished by a series of image processing operations.

Image Retouching Photo Retouching

Paper
Add Code

The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech

no code implementations • 17 Apr 2021 • Yu Qiao, Wei Zhou, Elma Kerz, Ralf Schlüter

In recent years, automated approaches to assessing linguistic complexity in second language (L2) writing have made significant progress in gauging learner performance, predicting human ratings of the quality of learner productions, and benchmarking L2 development.

Benchmarking

Paper
Add Code

NTIRE 2021 Challenge on Perceptual Image Quality Assessment

no code implementations • 7 May 2021 • Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, SungJun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, ZiRui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, Yifan Chen, Yujiu Yang, Yang Li, Tao Zhang, Longtao Feng, Yiting Liao, Junlin Li, William Thong, Jose Costa Pereira, Ales Leonardis, Steven McDonagh, Kele Xu, Lehan Yang, Hengxing Cai, Pengfei Sun, Seyed Mehdi Ayyoubzadeh, Ali Royat, Sid Ahmed Fezza, Dounia Hammou, Wassim Hamidouche, Sewoong Ahn, Gwangjin Yoon, Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa

This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021.

Image Quality Assessment Image Restoration

Paper
Add Code

Neighbourhood-guided Feature Reconstruction for Occluded Person Re-Identification

no code implementations • 16 May 2021 • Shijie Yu, Dapeng Chen, Rui Zhao, Haobin Chen, Yu Qiao

Person images captured by surveillance cameras are often occluded by various obstacles, which lead to defective feature representation and harm person re-identification (Re-ID) performance.

Person Re-Identification

Paper
Add Code

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization

1 code implementation • 24 May 2021 • Yi Liu, LiMin Wang, Yali Wang, Xiao Ma, Yu Qiao

Temporal action localization (TAL) is an important and challenging problem in video understanding.

Fine-Grained Action Detection Temporal Localization +2

Paper
Code

Multiple Domain Experts Collaborative Learning: Multi-Source Domain Generalization For Person Re-Identification

no code implementations • 26 May 2021 • Shijie Yu, Feng Zhu, Dapeng Chen, Rui Zhao, Haobin Chen, Shixiang Tang, Jinguo Zhu, Yu Qiao

In UDCL, a universal expert supervises the learning of domain experts and continuously gathers knowledge from all domain experts.

Domain Generalization Meta-Learning +1

Paper
Add Code

HDRUNet: Single Image HDR Reconstruction with Denoising and Dequantization

1 code implementation • 27 May 2021 • Xiangyu Chen, Yihao Liu, Zhengwen Zhang, Yu Qiao, Chao Dong

In this work, we propose a novel learning-based approach using a spatially dynamic encoder-decoder network, HDRUNet, to learn an end-to-end mapping for single image HDR reconstruction with denoising and dequantization.

Ranked #2 on Inverse-Tone-Mapping on MSU HDR Video Reconstruction Benchmark

Denoising HDR Reconstruction +2

146

Paper
Code

TSI: Temporal Saliency Integration for Video Action Recognition

no code implementations • 2 Jun 2021 • Haisheng Su, Jinyuan Feng, Dongliang Wang, Weihao Gan, Wei Wu, Yu Qiao

Specifically, SME aims to highlight the motion-sensitive area through local-global motion modeling, where the saliency alignment and pyramidal feature difference are conducted successively between neighboring frames to capture motion dynamics with less noise caused by misaligned background.

Action Recognition Temporal Action Localization

Paper
Add Code

CT-Net: Channel Tensorization Network for Video Classification

1 code implementation • ICLR 2021 • Kunchang Li, Xianhang Li, Yali Wang, Jun Wang, Yu Qiao

It can learn to exploit spatial, temporal and channel attention in a high-dimensional manner, to improve the cooperative power of all the feature dimensions in our CT-Module.

Ranked #18 on Action Recognition on Something-Something V1

Action Classification Classification +1

Paper
Code

Scalable Transformers for Neural Machine Translation

no code implementations • 4 Jun 2021 • Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

In this paper, we propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales and have shared parameters.

Machine Translation NMT +1

Paper
Add Code

Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification

1 code implementation • CVPR 2021 • Xiao Zhang, Yixiao Ge, Yu Qiao, Hongsheng Li

Unsupervised object re-identification aims to learn discriminative representations for object retrieval without any annotations.

Clustering Pseudo Label +1

Paper
Code

Alzheimer's Disease Detection from Spontaneous Speech through Combining Linguistic Complexity and (Dis)Fluency Features with Pretrained Language Models

no code implementations • 16 Jun 2021 • Yu Qiao, Xuefeng Yin, Daniel Wiechmann, Elma Kerz

In this paper, we combined linguistic complexity and (dis)fluency features with pretrained language models for the task of Alzheimer's disease detection of the 2021 ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech) challenge.

Alzheimer's Disease Detection

Paper
Add Code

Prior-Induced Information Alignment for Image Matting

no code implementations • 28 Jun 2021 • Yuhao Liu, Jiake Xie, Yu Qiao, Yong Tang, Xin Yang

Image matting is an ill-posed problem that aims to estimate the opacity of foreground pixels in an image.

Image Matting

Paper
Add Code

MixStyle Neural Networks for Domain Generalization and Adaptation

2 code implementations • 5 Jul 2021 • Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

MixStyle is easy to implement with a few lines of code, does not require modification to training objectives, and can fit a variety of learning paradigms including supervised domain generalization, semi-supervised domain generalization, and unsupervised domain adaptation.

Data Augmentation Domain Generalization +6

1,082

Paper
Code

Blind Image Super-Resolution: A Survey and Beyond

no code implementations • 7 Jul 2021 • Anran Liu, Yihao Liu, Jinjin Gu, Yu Qiao, Chao Dong

This paper serves as a systematic review on recent progress in blind image SR, and proposes a taxonomy to categorize existing methods into three different classes according to their ways of degradation modelling and the data used for solving the SR model.

Image Super-Resolution

Paper
Add Code

RankSRGAN: Super Resolution Generative Adversarial Networks with Learning to Rank

no code implementations • 20 Jul 2021 • Wenlong Zhang, Yihao Liu, Chao Dong, Yu Qiao

To address the problem, we propose Super-Resolution Generative Adversarial Networks with Ranker (RankSRGAN) to optimize generator in the direction of different perceptual metrics.

Image Super-Resolution Learning-To-Rank

Paper
Add Code

Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021

no code implementations • 27 Jul 2021 • Haisheng Su, Peiqin Zhuang, Yukun Li, Dongliang Wang, Weihao Gan, Wei Wu, Yu Qiao

This technical report presents an overview of our solution used in the submission to 2021 HACS Temporal Action Localization Challenge on both Supervised Learning Track and Weakly-Supervised Learning Track.

Transfer Learning Weakly-supervised Learning +2

Paper
Add Code

Discovering Distinctive "Semantics" in Super-Resolution Networks

no code implementations • 1 Aug 2021 • Yihao Liu, Anran Liu, Jinjin Gu, Zhipeng Zhang, Wenhao Wu, Yu Qiao, Chao Dong

We show that a well-trained deep SR network is naturally a good descriptor of degradation information.

Dimensionality Reduction Image Super-Resolution

Paper
Add Code

A New Journey from SDRTV to HDRTV

1 code implementation • ICCV 2021 • Xiangyu Chen, Zhengwen Zhang, Jimmy S. Ren, Lynhoo Tian, Yu Qiao, Chao Dong

However, most available resources are still in standard dynamic range (SDR).

Ranked #1 on Inverse-Tone-Mapping on MSU HDR Video Reconstruction Benchmark

Inverse-Tone-Mapping

120

Paper
Code

Digging into Uncertainty in Self-supervised Multi-view Stereo

1 code implementation • ICCV 2021 • Hongbin Xu, Zhipeng Zhou, Yali Wang, Wenxiong Kang, Baigui Sun, Hao Li, Yu Qiao

Specifically, the limitations can be categorized into two types: ambiguous supervision in foreground and invalid supervision in background.

Image Reconstruction Self-Supervised Learning

Paper
Code

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos

no code implementations • 15 Sep 2021 • Junhao Zhang, Yali Wang, Zhipeng Zhou, Tianyu Luan, Zhe Wang, Yu Qiao

Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos.

Ranked #10 on 3D Human Pose Estimation on HumanEva-I

3D Human Pose Estimation 3D Pose Estimation

Paper
Add Code

A Novel Hybrid Convolutional Neural Network for Accurate Organ Segmentation in 3D Head and Neck CT Images

no code implementations • 26 Sep 2021 • Zijie Chen, Cheng Li, Junjun He, Jin Ye, Diping Song, Shanshan Wang, Lixu Gu, Yu Qiao

An essential step of RT planning is the accurate segmentation of various organs-at-risks (OARs) in HaN CT images.

Organ Segmentation Segmentation

Paper
Add Code

Group Shift Pointwise Convolution for Volumetric Medical Image Segmentation

no code implementations • 26 Sep 2021 • Junjun He, Jin Ye, Cheng Li, Diping Song, Wanli Chen, Shanshan Wang, Lixu Gu, Yu Qiao

Recent studies have witnessed the effectiveness of 3D convolutions on segmenting volumetric medical images.

Image Segmentation Semantic Segmentation +1

Paper
Add Code

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning

3 code implementations • ICLR 2022 • Kunchang Li, Yali Wang, Gao Peng, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.8% and 71.4% top-1 accuracy respectively.

Ranked #8 on Action Recognition on Something-Something V1

Action Classification Action Recognition +1

2,991

Paper
Code

Self-Slimming Vision Transformer

no code implementations • 29 Sep 2021 • Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu

Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation.

Knowledge Distillation

Paper
Add Code

Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning

1 code implementation • 9 Oct 2021 • Yihao Liu, Hengyuan Zhao, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Yu Qiao, Chao Dong

We address this problem from a new perspective, by jointly considering colorization and temporal consistency in a unified framework.

Colorization Image Colorization

Paper
Code

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

2 code implementations • 9 Oct 2021 • Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning.

Prompt Engineering Representation Learning

389

Paper
Code

Estimating IRI based on pavement distress type, density, and severity: Insights from machine learning techniques

no code implementations • 11 Oct 2021 • Yu Qiao, Sikai Chen, Majed Alinizzi, Miltos Alamaniotis, Samuel Labi

However, it is costly to measure IRI, and for this reason, certain road classes are excluded from IRI measurements at a network level.

Paper
Add Code

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

1 code implementation • 6 Nov 2021 • Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

To further enhance CLIP's few-shot capability, CLIP-Adapter proposed to fine-tune a lightweight residual feature adapter and significantly improves the performance for few-shot classification.

Language Modelling Transfer Learning

470

Paper
Code

Prediction of Listener Perception of Argumentative Speech in a Crowdsourced Dataset Using (Psycho-)Linguistic and Fluency Features

no code implementations • 13 Nov 2021 • Yu Qiao, Sourabh Zanwar, Rishab Bhattacharyya, Daniel Wiechmann, Wei Zhou, Elma Kerz, Ralf Schlüter

One of the key communicative competencies is the ability to maintain fluency in monologic speech and the ability to produce sophisticated language to argue a position convincingly.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

INTERN: A New Learning Paradigm Towards General Vision

no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.

Paper
Add Code

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning

2 code implementations • 24 Nov 2021 • David Junhao Zhang, Kunchang Li, Yali Wang, Yunpeng Chen, Shashwat Chandra, Yu Qiao, Luoqi Liu, Mike Zheng Shou

With such multi-dimension and multi-scale factorization, our MorphMLP block can achieve a great accuracy-computation balance.

Ranked #38 on Action Recognition on Something-Something V2 (using extra training data)

Action Recognition Image Classification +3

165

Paper
Code

Self-slimmed Vision Transformer

1 code implementation • 24 Nov 2021 • Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu

Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation.

Knowledge Distillation

Paper
Code

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

1 code implementation • 26 Nov 2021 • Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao

Deep learning-based models encounter challenges when processing long-tailed data in the real world.

Ranked #2 on Long-tail Learning on iNaturalist 2018 (using extra training data)

Image Classification Long-tail Learning +1

Paper
Code

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

1 code implementation • 29 Nov 2021 • Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.

Ranked #4 on Long-tail Learning on Places-LT (using extra training data)

Contrastive Learning Language Modelling +3

Paper
Code

PointCLIP: Point Cloud Understanding by CLIP

2 code implementations • CVPR 2022 • Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li

On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.

Ranked #3 on 3D Open-Vocabulary Instance Segmentation on STPLS3D

3D Open-Vocabulary Instance Segmentation Few-Shot Learning +6

291

Paper
Code

CPRAL: Collaborative Panoptic-Regional Active Learning for Semantic Segmentation

no code implementations • 11 Dec 2021 • Yu Qiao, Jincheng Zhu, Chengjiang Long, Zeyao Zhang, Yuxin Wang, Zhenjun Du, Xin Yang

Acquiring the most representative examples via active learning (AL) can benefit many data-dependent computer vision tasks by minimizing efforts of image-level or pixel-wise annotations.

Active Learning Semantic Segmentation

Paper
Add Code

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results

2 code implementations • 22 Dec 2021 • Liang Pan, Tong Wu, Zhongang Cai, Ziwei Liu, Xumin Yu, Yongming Rao, Jiwen Lu, Jie Zhou, Mingye Xu, Xiaoyuan Luo, Kexue Fu, Peng Gao, Manning Wang, Yali Wang, Yu Qiao, Junsheng Zhou, Xin Wen, Peng Xiang, Yu-Shen Liu, Zhizhong Han, Yuanjie Yan, Junyi An, Lifa Zhu, Changwei Lin, Dongrui Liu, Xin Li, Francisco Gómez-Fernández, Qinlong Wang, Yang Yang

Based on the MVP dataset, this paper reports methods and results in the Multi-View Partial Point Cloud Challenge 2021 on Completion and Registration.

3D Reconstruction Point Cloud Completion +2

153

Paper
Code

Reflash Dropout in Image Super-Resolution

no code implementations • CVPR 2022 • Xiangtao Kong, Xina Liu, Jinjin Gu, Yu Qiao, Chao Dong

Dropout is designed to relieve the overfitting problem in high-level vision tasks but is rarely applied in low-level vision tasks, like image super-resolution (SR).

Common Sense Reasoning Image Super-Resolution +1

Paper
Add Code

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

2 code implementations • 12 Jan 2022 • Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.9% and 71.2% top-1 accuracy respectively.

Representation Learning

2,991

Paper
Code

CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning

no code implementations • 20 Jan 2022 • Mingye Xu, Yali Wang, Zhipeng Zhou, Hongbin Xu, Yu Qiao

To fill this gap, we propose a generic Contour-Perturbed Reconstruction Network (CP-Net), which can effectively guide self-supervised reconstruction to learn semantic content in the point cloud, and thus promote discriminative power of point cloud representation.

Point cloud reconstruction Self-Supervised Learning

Paper
Add Code

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

7 code implementations • 24 Jan 2022 • Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

Different from the typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local and global token affinity respectively in shallow and deep layers, allowing it to tackle both redundancy and dependency for efficient and effective representation learning.

Ranked #153 on Image Classification on ImageNet

Image Classification object-detection +5

778

Paper
Code

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning

no code implementations • 9 Feb 2022 • Kexue Fu, Peng Gao, Renrui Zhang, Hongsheng Li, Yu Qiao, Manning Wang

In particular, we develop a variant of ViT for 3D point cloud feature extraction, which also achieves comparable results with existing backbones when combined with our framework, and visualization of the attention maps shows that our model does understand the point cloud by combining the global shape information and multiple local structural information, which is consistent with the inspiration of our representation learning method.

Contrastive Learning Knowledge Distillation +1

Paper
Add Code

Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for Visual Discrimination

no code implementations • 21 Feb 2022 • Qingsong Zhao, Yi Wang, Zhipeng Zhou, Duoqian Miao, LiMin Wang, Yu Qiao, Cairong Zhao

Flattening is essential in computer vision by converting multi-dimensional feature maps or images into one-dimensional vectors.

Image Classification Representation Learning +1

Paper
Add Code

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy

2 code implementations • 15 Mar 2022 • Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu

This work thus proposes a novel active learning framework for realistic dataset annotation.

Ranked #1 on Image Classification on Food-101 (using extra training data)

Active Learning Classification +3

161

Paper
Code

Measuring the Impact of (Psycho-)Linguistic and Readability Features and Their Spill Over Effects on the Prediction of Eye Movement Patterns

no code implementations • ACL 2022 • Daniel Wiechmann, Yu Qiao, Elma Kerz, Justus Mattern

There is a growing interest in the combined use of NLP and machine learning methods to predict gaze patterns during naturalistic reading.

Paper
Add Code

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

no code implementations • 16 Mar 2022 • Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns the universal and generalizable representation for various tasks transferring.

object-detection Object Detection +3

Paper
Add Code

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

2 code implementations • 21 Mar 2022 • Li Chen, Chonghao Sima, Yang Li, Zehan Zheng, Jiajie Xu, Xiangwei Geng, Hongyang Li, Conghui He, Jianping Shi, Yu Qiao, Junchi Yan

Methods for 3D lane detection have been recently proposed to address the issue of inaccurate lane layouts in many autonomous driving scenarios (uphill/downhill, bump, etc.).

Ranked #5 on 3D Lane Detection on Apollo Synthetic 3D Lane

3D Lane Detection Autonomous Driving +1

468

Paper
Code

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection

1 code implementation • ICCV 2023 • Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Xuanzhuo Xu, Ziteng Cui, Yu Qiao, Peng Gao, Hongsheng Li

In this paper, we introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.

Ranked #9 on 3D Object Detection From Monocular Images on KITTI-360

3D Object Detection From Monocular Images Autonomous Driving +3

311

Paper
Code

POS-BERT: Point Cloud One-Stage BERT Pre-Training

1 code implementation • 3 Apr 2022 • Kexue Fu, Peng Gao, Shaolei Liu, Renrui Zhang, Yu Qiao, Manning Wang

We propose to use the dynamically updated momentum encoder as the tokenizer, which is updated and outputs the dynamic supervision signal along with the training process.

Contrastive Learning Language Modelling +3

Paper
Code

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition

no code implementations • CVPR 2022 • Mingfei Han, David Junhao Zhang, Yali Wang, Rui Yan, Lina Yao, Xiaojun Chang, Yu Qiao

Learning spatial-temporal relation among multiple actors is crucial for group activity recognition.

Group Activity Recognition

Paper
Add Code

Pushing on Personality Detection from Verbal Behavior: A Transformer Meets Text Contours of Psycholinguistic Features

no code implementations • WASSA (ACL) 2022 • Elma Kerz, Yu Qiao, Sourabh Zanwar, Daniel Wiechmann

Research at the intersection of personality psychology, computer science, and linguistics has recently focused increasingly on modeling and predicting personality from language use.

Language Modelling

Paper
Add Code

Cross Domain Object Detection by Target-Perceived Dual Branch Distillation

1 code implementation • CVPR 2022 • Mengzhe He, Yali Wang, Jiaxi Wu, Yiru Wang, Hanqing Li, Bo Li, Weihao Gan, Wei Wu, Yu Qiao

It can adaptively enhance source detector to perceive objects in a target image, by leveraging target proposal contexts from iterative cross-attention.

Object object-detection +1

Paper
Code

ConvMAE: Masked Convolution Meets Masked Autoencoders

4 code implementations • 8 May 2022 • Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao

Masked auto-encoding for feature pretraining and multi-scale hybrid convolution-transformer architectures can further unleash the potentials of ViT, leading to state-of-the-art performances on image classification, detection and semantic segmentation.

Computational Efficiency Image Classification +2

455

Paper
Code

Activating More Pixels in Image Super-Resolution Transformer

2 code implementations • CVPR 2023 • Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong

In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement.

Ranked #1 on Image Super-Resolution on Set5 - 2x upscaling

Image Super-Resolution

1,076

Paper
Code

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations • 11 May 2022 • Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00 dB on the DIV2K validation set.

Image Super-Resolution

117

Paper
Code

Blueprint Separable Residual Network for Efficient Image Super-Resolution

1 code implementation • 12 May 2022 • Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Jinjin Gu, Yu Qiao, Chao Dong

One is the use of blueprint separable convolution (BSConv), which takes the place of the redundant convolution operation.

Image Super-Resolution

155

Paper
Code

Evaluating the Generalization Ability of Super-Resolution Networks

no code implementations • 14 May 2022 • Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong

However, research on the generalization ability of Super-Resolution (SR) networks is currently absent.

Super-Resolution

Paper
Add Code

Vision Transformer Adapter for Dense Predictions

1 code implementation • 17 May 2022 • Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao

This work investigates a simple yet powerful dense prediction task adapter for Vision Transformer (ViT).

Ranked #4 on Semantic Segmentation on PASCAL Context

Instance Segmentation Panoptic Segmentation +1

1,118

Paper
Code

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training

3 code implementations • 28 May 2022 • Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao

By fine-tuning on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% over the second-best, and largely benefits few-shot classification, part segmentation and 3D object detection with the hierarchical pre-training scheme.

Ranked #4 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Object Detection 3D Point Cloud Linear Classification +5

198

Paper
Code

You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction

1 code implementation • 30 May 2022 • Ziteng Cui, Kunchang Li, Lin Gu, Shenghan Su, Peng Gao, Zhengkai Jiang, Yu Qiao, Tatsuya Harada

Challenging illumination conditions (low-light, under-exposure and over-exposure) in the real world not only cast an unpleasant visual appearance but also taint the computer vision tasks.

Ranked #2 on Image Enhancement on Exposure-Errors

Low-Light Image Enhancement object-detection +2

421

Paper
Code

Siamese Image Modeling for Self-Supervised Vision Representation Learning

2 code implementations • CVPR 2023 • Chenxin Tao, Xizhou Zhu, Weijie Su, Gao Huang, Bin Li, Jie Zhou, Yu Qiao, Xiaogang Wang, Jifeng Dai

Driven by these analyses, we propose Siamese Image Modeling (SiameseIM), which predicts the dense representations of an augmented view, based on another masked view from the same image but with different augmentations.

Representation Learning Self-Supervised Learning +1

Paper
Code

Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot

no code implementations • 16 Jun 2022 • Li Chen, Tutian Tang, Zhitian Cai, Yang Li, Penghao Wu, Hongyang Li, Jianping Shi, Junchi Yan, Yu Qiao

Equipped with a wide span of sensors, predominant autonomous driving solutions are becoming more modular-oriented for safe system design.

Autonomous Driving

Paper
Add Code

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

1 code implementation • 16 Jun 2022 • Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, Yu Qiao

The two branches are connected so that the control branch receives corresponding guidance from the trajectory branch at each time step.

Ranked #3 on Autonomous Driving on CARLA Leaderboard

Autonomous Driving CARLA longest6 +1

Paper
Code

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm

no code implementations • 12 Jul 2022 • Mingye Xu, Yali Wang, Yihao Liu, Tong He, Yu Qiao

Inspired by prompting approaches from NLP, we creatively reinterpret point cloud generation and refinement as the prompting and predicting stages, respectively.

Point Cloud Completion

Paper
Add Code

HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints

no code implementations • 16 Jul 2022 • Wei Wu, Junlin He, Yu Qiao, Guoheng Fu, Li Liu, Jin Yu

In-memory approximate nearest neighbor search (ANNS) algorithms have achieved great success for fast high-recall query processing, but are extremely inefficient when handling hybrid queries with unstructured (i.e., feature vectors) and structured (i.e., related attributes) constraints.

Attribute

Paper
Add Code

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

3 code implementations • 19 Jul 2022 • Renrui Zhang, Zhang Wei, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

On top of that, the performance of Tip-Adapter can be further boosted to be state-of-the-art on ImageNet by fine-tuning the cache model for 10× fewer epochs than existing methods, which is both effective and efficient.

Retrieval Transfer Learning

470

Paper
Code

GenText: Unsupervised Artistic Text Generation via Decoupled Font and Texture Manipulation

no code implementations • 20 Jul 2022 • Qirui Huang, Bin Fu, Aozhong Zhang, Yu Qiao

Specifically, our current work incorporates three different stages, stylization, destylization, and font transfer, respectively, into a unified platform with a single powerful encoder network and two separate style generator networks, one for font transfer, the other for stylization and destylization.

Style Transfer Text Style Transfer

Paper
Add Code

Vision-Centric BEV Perception: A Survey

1 code implementation • 4 Aug 2022 • Yuexin Ma, Tai Wang, Xuyang Bai, Huitong Yang, Yuenan Hou, Yaming Wang, Yu Qiao, Ruigang Yang, Dinesh Manocha, Xinge Zhu

In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant interest from both industry and academia due to its inherent advantages, such as providing an intuitive representation of the world and being conducive to data fusion.

636

Paper
Code

Frozen CLIP Models are Efficient Video Learners

2 code implementations • 6 Aug 2022 • Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li

Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos.

Ranked #26 on Action Classification on Kinetics-400 (using extra training data)

Action Classification Video Recognition

155

Paper
Code

Recurrent Bilinear Optimization for Binary Neural Networks

2 code implementations • 4 Sep 2022 • Sheng Xu, Yanjing Li, Tiancheng Wang, Teli Ma, Baochang Zhang, Peng Gao, Yu Qiao, Jinhu Lv, Guodong Guo

To address this issue, Recurrent Bilinear Optimization is proposed to improve the learning process of BNNs (RBONNs) by associating the intrinsic bilinear variables in the back propagation process.

object-detection Object Detection

Paper
Code

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

2 code implementations • 12 Sep 2022 • Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao

As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.

Autonomous Driving

2,870

Paper
Code

Collaboration of Pre-trained Models Makes Better Few-shot Learner

no code implementations • 25 Sep 2022 • Renrui Zhang, Bohao Li, Wei Zhang, Hao Dong, Hongsheng Li, Peng Gao, Yu Qiao

In this paper, we propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning.

Few-Shot Learning Representation Learning

Paper
Add Code

Low-Resolution Action Recognition for Tiny Actions Challenge

no code implementations • 28 Sep 2022 • BoYu Chen, Yu Qiao, Yali Wang

Second, these activities are naturally distributed in a long-tailed way.

Action Recognition Super-Resolution

Paper
Add Code

Efficient Image Super-Resolution using Vast-Receptive-Field Attention

1 code implementation • 12 Oct 2022 • Lin Zhou, Haoming Cai, Jinjin Gu, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Yu Qiao, Chao Dong

In this work, we design an efficient SR network by improving the attention mechanism.

Image Super-Resolution

Paper
Code

Hierarchical and Progressive Image Matting

no code implementations • 13 Oct 2022 • Yu Qiao, Yuhao Liu, Ziqi Wei, Yuxin Wang, Qiang Cai, Guofeng Zhang, Xin Yang

In this paper, we propose an end-to-end Hierarchical and Progressive Attention Matting Network (HAttMatting++), which can better predict the opacity of the foreground from single RGB images without additional input.

Image Matting SSIM

Paper
Add Code

Wider and Higher: Intensive Integration and Global Foreground Perception for Image Matting

no code implementations • 13 Oct 2022 • Yu Qiao, Ziqi Wei, Yuhao Liu, Yuxin Wang, Dongsheng Zhou, Qiang Zhang, Xin Yang

This paper reviews recent deep-learning-based matting research and conceives our wider and higher motivation for image matting.

Image Matting

Paper
Add Code

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection

no code implementations • 20 Oct 2022 • Yi Liu, Xuan Zhang, Ying Li, Guixin Liang, Yabing Jiang, Lixia Qiu, Haiping Tang, Fei Xie, Wei Yao, Yi Dai, Yu Qiao, Yali Wang

For this reason, we propose to advance research areas of video understanding, with a shift from traditional action recognition to industrial anomaly analysis.

Temporal Defect Localization Video Defect Classification

Paper
Add Code

PalGAN: Image Colorization with Palette Generative Adversarial Networks

1 code implementation • 20 Oct 2022 • Yi Wang, Menghan Xia, Lu Qi, Jing Shao, Yu Qiao

Multimodal ambiguity and color bleeding remain challenging in colorization.

Colorization Image Colorization

Paper
Code

Demystify Transformers & Convolutions in Modern Image Deep Networks

1 code implementation • 10 Nov 2022 • Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai

Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs.

Image Deep Networks Spatial Token Mixer

Paper
Code

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

2 code implementations • CVPR 2023 • Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state.

Ranked #1 on Instance Segmentation on COCO test-dev (AP50 metric, using extra training data)

Classification Image Classification +3

2,310

Paper
Code

Stare at What You See: Masked Image Modeling without Reconstruction

no code implementations • CVPR 2023 • Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo

However, unlike the low-level features such as pixel values, we argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises one question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?

Paper
Add Code

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

2 code implementations • CVPR 2023 • Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai

In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance.

Language Modelling Multi-Task Learning

2,310

Paper
Code

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

3 code implementations • 17 Nov 2022 • Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, LiMin Wang, Yu Qiao

UniFormer has successfully alleviated this issue, by unifying convolution and self-attention as a relation aggregator in the transformer format.

Video Understanding

3,888

Paper
Code

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

1 code implementation • CVPR 2023 • Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai

It has been proved that combining multiple pre-training strategies and data from various modalities/sources can greatly boost the training of large-scale models.

Ranked #2 on Semantic Segmentation on ADE20K (using extra training data)

Image Classification Long-tailed Object Detection +3

Paper
Code

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

2 code implementations • 17 Nov 2022 • Guo Chen, Sen Xing, Zhe Chen, Yi Wang, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei Huang, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, LiMin Wang, Yu Qiao

In this report, we present our champion solutions to five tracks of the Ego4D challenge.

Ranked #1 on State Change Object Detection on Ego4D

Future Hand Prediction Moment Queries +7

Paper
Code

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

2 code implementations • CVPR 2023 • Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.

Ranked #5 on 3D Object Detection on Rope3D

3D Object Detection

2,870

Paper
Code

ResFormer: Scaling ViTs with Multi-Resolution Training

1 code implementation • CVPR 2023 • Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang

We introduce ResFormer, a framework that is built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testing resolutions.

Action Recognition Image Classification +4

Paper
Code

Improving Training and Inference of Face Recognition Models via Random Temperature Scaling

no code implementations • 2 Dec 2022 • Lei Shang, Mouxiao Huang, Wu Shi, Yuchen Liu, Yang Liu, Fei Wang, Baigui Sun, Xuansong Xie, Yu Qiao

Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples.

Face Recognition Out of Distribution (OOD) Detection

Paper
Add Code

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

no code implementations • 4 Dec 2022 • Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao, Xiaoou Tang, DaCheng Tao

This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.

Ranked #1 on Common Sense Reasoning on ReCoRD

Common Sense Reasoning coreference-resolution +5

Paper
Add Code

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

1 code implementation • 6 Dec 2022 • Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

Ranked #1 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Contrastive Learning +8

921

Paper
Code

Diff-Font: Diffusion Model for Robust One-Shot Font Generation

1 code implementation • 12 Dec 2022 • Haibin He, Xinyuan Chen, Chaoyue Wang, Juhua Liu, Bo Du, DaCheng Tao, Yu Qiao

Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and the completion of each generated character.

Font Generation

Paper
Code

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

2 code implementations • CVPR 2023 • Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, Hongsheng Li

Pre-training on large amounts of image data has become the de facto approach for robust 2D representations.

Ranked #2 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Point Cloud Linear Classification Few-Shot 3D Point Cloud Classification

198

Paper
Code

(Psycho-)Linguistic Features Meet Transformer Models for Improved Explainable and Controllable Text Simplification

no code implementations • 19 Dec 2022 • Yu Qiao, Xiaofei Li, Daniel Wiechmann, Elma Kerz

State-of-the-art text simplification (TS) systems adopt end-to-end neural network models to directly generate the simplified version of the input text, and usually function as a blackbox.

Text Simplification

Paper
Add Code

MANTIS at TSAR-2022 Shared Task: Improved Unsupervised Lexical Simplification with Pretrained Encoders

no code implementations • 19 Dec 2022 • Xiaofei Li, Daniel Wiechmann, Yu Qiao, Elma Kerz

In this paper we present our contribution to the TSAR-2022 Shared Task on Lexical Simplification of the EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability.

Language Modelling Lexical Simplification +4

Paper
Add Code
