Search Results for author: Ping Luo

Found 270 papers, 155 papers with code

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

23 code implementations • NeurIPS 2021 • Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.

Ranked #1 on Semantic Segmentation on COCO-Stuff full

C++ code Semantic Segmentation +1

124,457


PVT v2: Improved Baselines with Pyramid Vision Transformer

16 code implementations • 25 Jun 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

We hope this work will facilitate state-of-the-art Transformer research in computer vision.

Ranked #23 on Object Detection on COCO-O

Image Classification Object Detection +1

29,648


DaViT: Dual Attention Vision Transformers

3 code implementations • 7 Apr 2022 • Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan

We show that these two self-attentions complement each other: (i) since each channel token contains an abstract representation of the entire image, the channel attention naturally captures global interactions and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained interactions across spatial locations, which in turn helps the global information modeling in channel attention.

Ranked #1 on Instance Segmentation on Object Detection on COCO minival

Computational Efficiency Image Classification +4

29,648


Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

6 code implementations • CVPR 2021 • Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei LI, Zehuan Yuan, Changhu Wang, Ping Luo

In our method, however, a fixed sparse set of learned object proposals, with a total length of $N$, is provided to the object recognition head to perform classification and localization.

Ranked #5 on 2D Object Detection on CeyMo

Object object-detection +2

27,693


Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

9 code implementations • ICCV 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.

Ranked #5 on Semantic Segmentation on SynPASS

Image Classification Instance Segmentation +3

27,693


ByteTrack: Multi-Object Tracking by Associating Every Detection Box

10 code implementations • arXiv 2021 • Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang

ByteTrack also achieves state-of-the-art performance on MOT20, HiEve and BDD100K tracking benchmarks.

Ranked #1 on Multiple Object Tracking on BDD100K val

Multi-Object Tracking Multiple Object Tracking +1

12,026


DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

3 code implementations • CVPR 2022 • Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo

A typical pipeline for multi-object tracking (MOT) is to use a detector for object localization, followed by re-identification (re-ID) for object association.

Multi-Object Tracking Object +3

12,022


Whole-Body Human Pose Estimation in the Wild

2 code implementations • ECCV 2020 • Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo

This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.

Ranked #8 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation Facial Landmark Detection +2

4,957


ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

4 code implementations • CVPR 2021 • Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Human pose estimation has achieved significant progress in recent years.

Ranked #23 on Pose Estimation on COCO test-dev (using extra training data)

Neural Architecture Search Pose Estimation

4,957


Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net

24 code implementations • ECCV 2018 • Xingang Pan, Ping Luo, Jianping Shi, Xiaoou Tang

IBN-Net carefully integrates Instance Normalization (IN) and Batch Normalization (BN) as building blocks, and can be wrapped into many advanced deep networks to improve their performances.

Ranked #3 on All-day Semantic Segmentation on All-day CityScapes

All-day Semantic Segmentation Domain Generalization +2

3,946


RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations • 12 Apr 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of the network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

3,132


InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

2 code implementations • 9 May 2023 • Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.

Language Modelling

3,114


VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

2 code implementations • NeurIPS 2023 • Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie zhou, Yu Qiao, Jifeng Dai

We hope this model can set a new baseline for generalist vision and language models.

Language Modelling Large Language Model

3,114


Context Autoencoder for Self-Supervised Representation Learning

6 code implementations • 7 Feb 2022 • Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

Pretraining involves two tasks: masked representation prediction, which predicts the representations of the masked patches, and masked patch reconstruction, which reconstructs the masked patches.

Ranked #14 on Self-Supervised Image Classification on ImageNet (finetuned)

Instance Segmentation object-detection +5

3,078


Bridging Video-text Retrieval with Multiple Choice Questions

2 code implementations • CVPR 2022 • Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, XiaoHu Qie, Ping Luo

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.

Ranked #8 on Zero-Shot Video Retrieval on MSVD

Action Recognition Multiple-choice +8

2,968


VideoChat: Chat-Centric Video Understanding

1 code implementation • 10 May 2023 • Kunchang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, LiMin Wang, Yu Qiao

In this paper, we make an initial attempt at developing an end-to-end chat-centric video understanding system, coined VideoChat.

Ranked #6 on Video Question Answering on MVBench

Video-based Generative Performance Benchmarking (Consistency) Video-based Generative Performance Benchmarking (Contextual Understanding) +5

2,645


MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

1 code implementation • 28 Nov 2023 • Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, LiMin Wang, Yu Qiao

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.

Ranked #1 on Zero-Shot Video Question Answer on STAR Benchmark

Fairness Multiple-choice +8

2,645


DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images

5 code implementations • CVPR 2019 • Yuying Ge, Ruimao Zhang, Lingyun Wu, Xiaogang Wang, Xiaoou Tang, Ping Luo

A strong baseline is proposed, called Match R-CNN, which builds upon Mask R-CNN to solve the above four tasks in an end-to-end manner.

Pose Estimation Retrieval +1

2,153


PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

2 code implementations • 30 Sep 2023 • Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li

We hope PIXART-α will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.

Image Generation Language Modelling

2,103


PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

1 code implementation • 10 Jan 2024 • Junsong Chen, Yue Wu, Simian Luo, Enze Xie, Sayak Paul, Ping Luo, Hang Zhao, Zhenguo Li

As a state-of-the-art, open-source image generation model, PIXART-δ offers a promising alternative to the Stable Diffusion family of models, contributing significantly to text-to-image synthesis.

Image Generation

2,103


MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

7 code implementations • CVPR 2020 • Cheng-Han Lee, Ziwei Liu, Lingyun Wu, Ping Luo

To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation.

Attribute Image Manipulation

2,001


DiffusionDet: Diffusion Model for Object Detection

3 code implementations • ICCV 2023 • Shoufa Chen, Peize Sun, Yibing Song, Ping Luo

We propose DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes.

Denoising Object +2

1,977


Universal Instance Perception as Object Discovery and Retrieval

1 code implementation • CVPR 2023 • Bin Yan, Yi Jiang, Jiannan Wu, Dong Wang, Ping Luo, Zehuan Yuan, Huchuan Lu

All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks.

Ranked #1 on Referring Expression Segmentation on RefCoCo val (using extra training data)

Described Object Detection Generalized Referring Expression Comprehension +15

1,435


MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

1 code implementation • 8 May 2023 • Tao Gong, Chengqi Lyu, Shilong Zhang, Yudong Wang, Miao Zheng, Qian Zhao, Kuikun Liu, Wenwei Zhang, Ping Luo, Kai Chen

To further enhance the ability to chat with humans of the MultiModal-GPT, we utilize language-only instruction-following data to train the MultiModal-GPT jointly.

Instruction Following Language Modelling

1,397


Segmenting Transparent Object in the Wild with Transformer

2 code implementations • 21 Jan 2021 • Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo

This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset.

Ranked #3 on Semantic Segmentation on Trans10K

Object Segmentation +2

1,183


CycleMLP: A MLP-like Architecture for Dense Prediction

8 code implementations • ICLR 2022 • Shoufa Chen, Enze Xie, Chongjian Ge, Runjian Chen, Ding Liang, Ping Luo

We build a family of models which surpass existing MLPs and even state-of-the-art Transformer-based models, e.g., Swin Transformer, while using fewer parameters and FLOPs.

Ranked #15 on Semantic Segmentation on DensePASS

Image Classification Instance Segmentation +4

1,183


MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

1 code implementation • 6 Dec 2023 • Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan

Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion.

Object Video Generation

1,036


Spatial As Deep: Spatial CNN for Traffic Scene Understanding

8 code implementations • 17 Dec 2017 • Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, Xiaoou Tang

Although CNN has shown strong capability to extract semantics from raw pixels, its capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored.

Ranked #50 on Lane Detection on CULane

Lane Detection Scene Understanding

1,022


Towards Grand Unification of Object Tracking

1 code implementation • 14 Jul 2022 • Bin Yan, Yi Jiang, Peize Sun, Dong Wang, Zehuan Yuan, Ping Luo, Huchuan Lu

We present a unified method, termed Unicorn, that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters.

Ranked #2 on Multi-Object Tracking and Segmentation on BDD100K val

Multi-Object Tracking Multi-Object Tracking and Segmentation +3

942


InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

1 code implementation • 13 Jul 2023 • Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Conghui He, Ping Luo, Ziwei Liu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, we utilize a multi-scale approach to generate video-related descriptions.

Action Recognition Contrastive Learning +7

895


Harvest Video Foundation Models via Efficient Post-Pretraining

1 code implementation • 30 Oct 2023 • Yizhuo Li, Kunchang Li, Yinan He, Yi Wang, Yali Wang, LiMin Wang, Yu Qiao, Ping Luo

Building video-language foundation models is costly and difficult due to the redundant nature of video data and the lack of high-quality video-language datasets.

Question Answering Text Retrieval +2

895


PolarMask: Single Shot Instance Segmentation with Polar Representation

2 code implementations • CVPR 2020 • Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo

In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.

Ranked #100 on Instance Segmentation on COCO test-dev

Distance regression Instance Segmentation +4

869


PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

1 code implementation • 5 May 2021 • Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, Ping Luo

Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation in the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotated text detection and cell segmentation.

Ranked #81 on Instance Segmentation on COCO test-dev (using extra training data)

Cell Segmentation Instance Segmentation +5

869


Differentiable Learning-to-Normalize via Switchable Normalization

3 code implementations • ICLR 2019 • Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, Jingyu Li

We hope SN will help ease the use of normalization techniques in deep learning and improve the understanding of them.

864


Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

1 code implementation • 20 Jul 2018 • Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang

Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.

Lip Reading Retrieval +2

812


Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content

3 code implementations • 12 Mar 2020 • Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, WangMeng Zuo, Ping Luo

First, a semantic layout generation module utilizes semantic segmentation of the reference image to progressively predict the desired semantic layout after try-on.

Ranked #4 on Virtual Try-on on VITON (IS metric)

Semantic Segmentation Virtual Try-on

797


ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

1 code implementation • 23 Aug 2022 • Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.

Ranked #2 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation Neural Architecture Search +1

705


InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

2 code implementations • 21 Dec 2023 • Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT-full (using extra training data)

Image Retrieval Image-to-Text Retrieval +10

659


What Makes for End-to-End Object Detection?

1 code implementation • 10 Dec 2020 • Peize Sun, Yi Jiang, Enze Xie, Wenqi Shao, Zehuan Yuan, Changhu Wang, Ping Luo

We identify that classification cost in matching cost is the main ingredient: (1) previous detectors only consider location cost, (2) by additionally introducing classification cost, previous detectors immediately produce one-to-one prediction during inference.

General Classification Object +2

641


DriveLM: Driving with Graph Visual Question Answering

1 code implementation • 21 Dec 2023 • Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Ping Luo, Andreas Geiger, Hongyang Li

The experiments demonstrate that Graph VQA provides a simple, principled framework for reasoning about a driving scene, and DriveLM-Data provides a challenging benchmark for this task.

Autonomous Driving Question Answering +1

617


MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning

1 code implementation • 24 Nov 2022 • Yao Lai, Yao Mu, Ping Luo

Firstly, MaskPlace recasts placement as a problem of learning pixel-level visual representation to comprehensively describe millions of modules on a chip, enabling placement in a high-resolution canvas and a large action space.

Layout Design Representation Learning +1

615


Video Understanding with Large Language Models: A Survey

1 code implementation • 29 Dec 2023 • Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.

Video Understanding

614


TransTrack: Multiple Object Tracking with Transformer

2 code implementations • 31 Dec 2020 • Peize Sun, Jinkun Cao, Yi Jiang, Rufeng Zhang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo

In this work, we propose TransTrack, a simple but efficient scheme to solve the multiple object tracking problems.

Ranked #6 on Multi-Object Tracking on SportsMOT (using extra training data)

Multi-Object Tracking Multiple Object Tracking with Transformer +3

608


Do 2D GANs Know 3D Shape? Unsupervised 3D shape reconstruction from 2D Image GANs

1 code implementation • ICLR 2021 • Xingang Pan, Bo Dai, Ziwei Liu, Chen Change Loy, Ping Luo

Through our investigation, we found that such a pre-trained GAN indeed contains rich 3D knowledge and thus can be used to recover 3D shape from a single 2D image in an unsupervised manner.

3D Shape Reconstruction Object

570


SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving

1 code implementation • 19 Sep 2023 • Xiangchao Yan, Runjian Chen, Bo Zhang, Jiakang Yuan, Xinyu Cai, Botian Shi, Wenqi Shao, Junchi Yan, Ping Luo, Yu Qiao

Our contributions are threefold: (1) Occupancy prediction is shown to be promising for learning general representations, which is demonstrated by extensive experiments on plenty of datasets and tasks.

3D Object Detection Autonomous Driving +3

564


OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

2 code implementations • 25 Aug 2023 • Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo

LWC modulates the extreme values of weights by optimizing the clipping threshold.

Common Sense Reasoning Computational Efficiency +3

547


Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception

1 code implementation • 19 Jan 2023 • Bin Huang, Yangguang Li, Enze Xie, Feng Liang, Luya Wang, Mingzhu Shen, Fenggang Liu, Tianqi Wang, Ping Luo, Jing Shao

Recently, the pure camera-based Bird's-Eye-View (BEV) perception removes expensive Lidar sensors, making it a feasible solution for economical autonomous driving.

Autonomous Driving Data Augmentation

525


Parser-Free Virtual Try-on via Distilling Appearance Flows

2 code implementations • CVPR 2021 • Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo

A recent pioneering work employed knowledge distillation to reduce the dependency on human parsing, where the try-on images produced by a parser-based method are used as supervision to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.

Ranked #1 on Virtual Try-on on MPV

Human Parsing Knowledge Distillation +1

522


Scene as Occupancy

2 code implementations • ICCV 2023 • Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li

Human drivers can easily describe a complex traffic scene using their visual system.

Motion Planning

482


OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

1 code implementation • NeurIPS 2023 • Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, Feng Wen, Hang Xu, Ping Luo, Junchi Yan, Wei zhang, Hongyang Li

Accurately depicting the complex traffic scene is a vital component for autonomous vehicles to execute correct judgments.

3D Lane Detection

482


Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation

1 code implementation • ECCV 2020 • Xingang Pan, Xiaohang Zhan, Bo Dai, Dahua Lin, Chen Change Loy, Ping Luo

Learning a good image prior is a long-term goal for image restoration and manipulation.

Generative Adversarial Network Image Manipulation +2

474


GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

2 code implementations • 7 Jul 2023 • Shilong Zhang, Peize Sun, Shoufa Chen, Min Xiao, Wenqi Shao, Wenwei Zhang, Yu Liu, Kai Chen, Ping Luo

Before sending to LLM, the reference is replaced by RoI features and interleaved with language embeddings as a sequence.

Ranked #1 on Visual Question Answering (VQA) on VCR (Q-AR) test

Attribute Common Sense Reasoning +4

448


FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

2 code implementations • 3 Nov 2021 • Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu

We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).

Ranked #2 on Scene Text Detection on MSRA-TD500

Image Classification Scene Text Detection +1

433


LLaMA Pro: Progressive LLaMA with Block Expansion

1 code implementation • 4 Jan 2024 • Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, Ying Shan

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA.

Instruction Following Math

381


End-to-End Autonomous Driving through V2X Cooperation

2 code implementations • 31 Mar 2024 • Haibao Yu, Wenxian Yang, Jiaru Zhong, Zhenwei Yang, Siqi Fan, Ping Luo, Zaiqing Nie

Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving.

Autonomous Driving

370


Going Denser with Open-Vocabulary Part Segmentation

2 code implementations • ICCV 2023 • Peize Sun, Shoufa Chen, Chenchen Zhu, Fanyi Xiao, Ping Luo, Saining Xie, Zhicheng Yan

In this paper, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.

Object object-detection +3

360


LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models

1 code implementation • 15 Jun 2023 • Peng Xu, Wenqi Shao, Kaipeng Zhang, Peng Gao, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo

Large Vision-Language Models (LVLMs) have recently played a dominant role in multimodal vision-language learning.

Hallucination Image Captioning +3

359


Tiny LVLM-eHub: Early Multimodal Experiments with Bard

1 code implementation • 7 Aug 2023 • Wenqi Shao, Yutao Hu, Peng Gao, Meng Lei, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo

Secondly, it conducts an in-depth analysis of LVLMs' predictions using the ChatGPT Ensemble Evaluation (CEE), which leads to a robust and accurate evaluation and exhibits improved alignment with human evaluation compared to the word matching approach.

Hallucination Visual Reasoning

359


Generalized Predictive Model for Autonomous Driving

1 code implementation • 14 Mar 2024 • Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li

In this paper, we introduce the first large-scale video prediction model in the autonomous driving discipline.

Autonomous Driving Video Prediction

357


Fashion Landmark Detection in the Wild

4 code implementations • 10 Aug 2016 • Ziwei Liu, Sijie Yan, Ping Luo, Xiaogang Wang, Xiaoou Tang

Fashion landmarks are also compared to clothing bounding boxes and human joints in two applications, fashion attribute prediction and clothes retrieval, showing that fashion landmarks are a more discriminative representation for understanding fashion images.

Attribute Pose Estimation +1

348


A Survey of Reasoning with Foundation Models

1 code implementation • 17 Dec 2023 • Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.

Medical Diagnosis

338


RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs

1 code implementation • CVPR 2022 • Zhouxia Wang, Jiawei Zhang, Runjian Chen, Wenping Wang, Ping Luo

Blind face restoration aims to recover a high-quality face image from unknown degradations.

Blind Face Restoration Face Reconstruction +1

314


Learning Depth-Guided Convolutions for Monocular 3D Object Detection

2 code implementations • CVPR 2020 • Mingyu Ding, Yuqi Huo, Hongwei Yi, Zhe Wang, Jianping Shi, Zhiwu Lu, Ping Luo

3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information.

Ranked #17 on Vehicle Pose Estimation on KITTI Cars Hard

Monocular 3D Object Detection Object +2

309


Language as Queries for Referring Video Object Segmentation

1 code implementation • CVPR 2022 • Jiannan Wu, Yi Jiang, Peize Sun, Zehuan Yuan, Ping Luo

Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred by a language expression in all video frames.

Ranked #3 on Referring Expression Segmentation on A2D Sentences (using extra training data)

Object Object Tracking +5

308


Video Object Segmentation with Re-identification

3 code implementations • 1 Aug 2017 • Xiaoxiao Li, Yuankai Qi, Zhe Wang, Kai Chen, Ziwei Liu, Jianping Shi, Ping Luo, Xiaoou Tang, Chen Change Loy

Specifically, our Video Object Segmentation with Re-identification (VS-ReID) model includes a mask propagation module and a ReID module.

Object Segmentation +4

289


AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

2 code implementations • 26 May 2022 • Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo

To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks efficiently.

Action Recognition Video Recognition

289


DetCo: Unsupervised Contrastive Learning for Object Detection

2 code implementations • ICCV 2021 • Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Peize Sun, Zhenguo Li, Ping Luo

Unlike most recent methods that focused on improving accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches to learn discriminative representations for object detection.

Contrastive Learning Image Classification +2

264


Dense Distinct Query for End-to-End Object Detection

1 code implementation • CVPR 2023 • Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Chengqi Lyu, Wenwei Zhang, Ping Luo, Kai Chen

Concretely, we first lay dense queries like traditional detectors and then select distinct ones for one-to-one assignments.

Ranked #3 on Object Detection on CrowdHuman (full body)

Object object-detection +1

236


Graph-based Topology Reasoning for Driving Scenes

1 code implementation • 11 Apr 2023 • Tianyu Li, Li Chen, Huijie Wang, Yang Li, Jiazhi Yang, Xiangwei Geng, Shengyin Jiang, Yuting Wang, Hang Xu, Chunjing Xu, Junchi Yan, Ping Luo, Hongyang Li

Understanding the road genome is essential to realize autonomous driving.

Ranked #5 on 3D Lane Detection on OpenLane-V2 val

3D Lane Detection Autonomous Driving +1

229

Paper
Code

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

2 code implementations • 25 Dec 2023 • Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo

We evaluate our unified models on various benchmarks.

Ranked #7 on Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation)

Image Segmentation Object +5

218

Paper
Code

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

1 code implementation • CVPR 2022 • Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo, Tong Lu

Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner.

Ranked #4 on Panoptic Segmentation on COCO test-dev

Instance Segmentation Panoptic Segmentation +1

195

Paper
Code

End-to-End Dense Video Captioning with Parallel Decoding

2 code implementations • ICCV 2021 • Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo

Dense video captioning aims to generate multiple associated captions with their temporal locations from the video.

Ranked #5 on Dense Video Captioning on YouCook2

Caption Generation Dense Video Captioning

188

Paper
Code

Pose for Everything: Towards Category-Agnostic Pose Estimation

1 code implementation • 21 Jul 2022 • Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.

Ranked #4 on 2D Pose Estimation on MP-100

Category-Agnostic Pose Estimation Pose Estimation

183

Paper
Code

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

1 code implementation • CVPR 2022 • Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Vision transformers have achieved great successes in many computer vision tasks.

Ranked #4 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation 3D Human Pose Estimation +1

178

Paper
Code

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

1 code implementation • 22 Mar 2021 • Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo

(1) We divide the input image into small patches and adopt TIN, successfully transferring image style at arbitrarily high resolution.

Style Transfer

176

Paper
Code

Learning Object-Language Alignments for Open-Vocabulary Object Detection

1 code implementation • 27 Nov 2022 • Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai

In this paper, we propose a novel open-vocabulary object detection framework directly learning from image-text pair data.

Object object-detection +3

168

Paper
Code

3D Human Mesh Regression with Dense Correspondence

3 code implementations • CVPR 2020 • Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang

This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of 3D mesh).

Ranked #1 on 3D Human Reconstruction on Surreal

3D Human Pose Estimation 3D Human Reconstruction +1

164

Paper
Code

VDT: General-purpose Video Diffusion Transformers via Mask Modeling

1 code implementation • 22 May 2023 • Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding

We also propose a unified spatial-temporal mask modeling mechanism, seamlessly integrated with the model, to cater to diverse video generation scenarios.

Autonomous Driving Video Generation +1

146

Paper
Code

DDP: Diffusion Model for Dense Visual Prediction

1 code implementation • ICCV 2023 • Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.

Ranked #2 on Monocular Depth Estimation on SUN-RGBD

Denoising Monocular Depth Estimation +2

145

Paper
Code

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations

1 code implementation • 24 Jan 2022 • Yuanfeng Ji, Lu Zhang, Jiaxiang Wu, Bingzhe Wu, Long-Kai Huang, Tingyang Xu, Yu Rong, Lanqing Li, Jie Ren, Ding Xue, Houtim Lai, Shaoyong Xu, Jing Feng, Wei Liu, Ping Luo, Shuigeng Zhou, Junzhou Huang, Peilin Zhao, Yatao Bian

AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient.

Benchmarking Drug Discovery +1

143

Paper
Code

Switchable Whitening for Deep Representation Learning

1 code implementation • ICCV 2019 • Xingang Pan, Xiaohang Zhan, Jianping Shi, Xiaoou Tang, Ping Luo

Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods.

Ranked #6 on Robust Object Detection on DWD

Domain Adaptation Image Classification +4

139

Paper
Code

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

1 code implementation • CVPR 2021 • Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo

Last, we proposed an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space, and finds optimal architectures given various tasks and computation resources.

Image Classification Neural Architecture Search +3

138

Paper
Code

Domain-Adaptive Few-Shot Learning

1 code implementation • 19 Mar 2020 • An Zhao, Mingyu Ding, Zhiwu Lu, Tao Xiang, Yulei Niu, Jiechao Guan, Ji-Rong Wen, Ping Luo

Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples.

Domain Adaptation Few-Shot Learning

136

Paper
Code

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval

1 code implementation • 26 Apr 2022 • Yuying Ge, Yixiao Ge, Xihui Liu, Alex Jinpeng Wang, Jianping Wu, Ying Shan, XiaoHu Qie, Ping Luo

Dominant pre-training work for video-text retrieval mainly adopts the "dual-encoder" architecture to enable efficient retrieval, where two separate encoders are used to contrast global video and text representations, but ignores detailed local semantics.

Ranked #7 on Zero-Shot Video Retrieval on MSVD

Action Recognition Retrieval +6

129

Paper
Code

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

1 code implementation • 11 Oct 2021 • Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.

Image Classification object-detection +3

116

Paper
Code

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

1 code implementation • NeurIPS 2021 • Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

Image Classification object-detection +3

116

Paper
Code

V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting

1 code implementation • CVPR 2023 • Haibao Yu, Wenxian Yang, Hongzhi Ruan, Zhenwei Yang, Yingjuan Tang, Xu Gao, Xin Hao, Yifeng Shi, Yifeng Pan, Ning Sun, Juan Song, Jirui Yuan, Ping Luo, Zaiqing Nie

Utilizing infrastructure and vehicle-side information to track and forecast the behaviors of surrounding traffic participants can significantly improve decision-making and safety in autonomous driving.

Autonomous Driving Decision Making +1

113

Paper
Code

Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade

1 code implementation • CVPR 2017 • Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang

Third, in comparison to MC, LC is an end-to-end trainable framework, allowing joint learning of all sub-models.

Ranked #22 on Semantic Segmentation on PASCAL VOC 2012 test

Semantic Segmentation

108

Paper
Code

Multi-Compound Transformer for Accurate Biomedical Image Segmentation

1 code implementation • 28 Jun 2021 • Yuanfeng Ji, Ruimao Zhang, Huijie Wang, Zhen Li, Lingyun Wu, Shaoting Zhang, Ping Luo

The recent vision transformer (i.e., for image classification) learns non-local attentive interaction of different patch tokens.

Image Classification Image Segmentation +2

107

Paper
Code

RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs

1 code implementation • 14 Aug 2023 • Zhouxia Wang, Jiawei Zhang, Tianshui Chen, Wenping Wang, Ping Luo

In this work, we propose RestoreFormer++, which on the one hand introduces fully-spatial attention mechanisms to model the contextual information and the interplay with the priors, and on the other hand, explores an extending degrading model to help generate more realistic degraded face images to alleviate the synthetic-to-real-world gap.

Blind Face Restoration

107

Paper
Code

Disentangled Cycle Consistency for Highly-realistic Virtual Try-On

1 code implementation • CVPR 2021 • Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo

To this end, DCTON can be naturally trained in a self-supervised manner following cycle consistency learning.

Virtual Try-on

104

Paper
Code

Advancing Vision Transformers with Group-Mix Attention

1 code implementation • 26 Nov 2023 • Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo

The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value.

Image Classification object-detection +2

102

Paper
Code

End-to-End Video Text Spotting with Transformer

1 code implementation • 20 Mar 2022 • Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo

Recent video text spotting methods usually require the three-staged pipeline, i.e., detecting text in individual images, recognizing localized text, tracking text streams with post-processing to generate final results.

Text Detection Text Spotting

Paper
Code

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

1 code implementation • 22 Jul 2022 • Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo

Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.

3D Interacting Hand Pose Estimation Hand Pose Estimation

Paper
Code

Segmenting Transparent Objects in the Wild

1 code implementation • ECCV 2020 • Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo

To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with carefully manual annotations, which are 10 times larger than the existing datasets.

Ranked #4 on Semantic Segmentation on Trans10K

Segmentation Semantic Segmentation +1

Paper
Code

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

1 code implementation • 16 Sep 2019 • Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo

Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images.

Scene Text Recognition Super-Resolution

Paper
Code

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

1 code implementation • 19 Apr 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo

These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.

3D Object Detection Autonomous Driving +3

Paper
Code

Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks

2 code implementations • 7 Aug 2017 • Sijie Yan, Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang

This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are not provided in either training or testing.

Paper
Code

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

1 code implementation • ICCV 2023 • Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo

Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens.

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)

Efficient ViTs

Paper
Code

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Sentence +2

Paper
Code

SSN: Learning Sparse Switchable Normalization via SparsestMax

1 code implementation • CVPR 2019 • Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo

Unlike $\ell_1$ and $\ell_0$ constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.

Paper
Code

MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

1 code implementation • 30 Aug 2023 • Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine de Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen, Heinrich Mächler, Jan Stefan Kirschke, Ezequiel de la Rosa, Patrick Ferdinand Christ, Hongwei Bran Li, David G. Ellis, Michele R. Aizenberg, Sergios Gatidis, Thomas Küstner, Nadya Shusharina, Nicholas Heller, Vincent Andrearczyk, Adrien Depeursinge, Mathieu Hatt, Anjany Sekuboyina, Maximilian Löffler, Hans Liebl, Reuben Dorent, Tom Vercauteren, Jonathan Shapey, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Achraf Ben-Hamadou, Ahmed Rekik, Sergi Pujades, Edmond Boyer, Federico Bolelli, Costantino Grana, Luca Lumetti, Hamidreza Salehi, Jun Ma, Yao Zhang, Ramtin Gharleghi, Susann Beier, Arcot Sowmya, Eduardo A. Garza-Villarreal, Thania Balducci, Diego Angeles-Valdez, Roberto Souza, Leticia Rittner, Richard Frayne, Yuanfeng Ji, Vincenzo Ferrari, Soumick Chatterjee, Florian Dubost, Stefanie Schreiber, Hendrik Mattern, Oliver Speck, Daniel Haehn, Christoph John, Andreas Nürnberger, João Pedrosa, Carlos Ferreira, Guilherme Aresta, António Cunha, Aurélio Campilho, Yannick Suter, Jose Garcia, Alain Lalande, Vicky Vandenbossche, Aline Van Oevelen, Kate Duquesne, Hamza Mekhzoum, Jef Vandemeulebroucke, Emmanuel Audenaert, Claudia Krebs, Timo Van Leeuwen, Evie Vereecke, Hauke Heidemeyer, Rainer Röhrig, Frank Hölzle, Vahid Badeli, Kathrin Krieger, Matthias Gunzer, Jianxu Chen, Timo van Meegdenburg, Amin Dada, Miriam Balzer, Jana Fragemann, Frederic Jonske, Moritz Rempe, Stanislav Malorodov, Fin H. Bahnsen, Constantin Seibold, Alexander Jaus, Zdravko Marinov, Paul F. Jaeger, Rainer Stiefelhagen, Ana Sofia Santos, Mariana Lindo, André Ferreira, Victor Alves, Michael Kamp, Amr Abourayya, Felix Nensa, Fabian Hörst, Alexander Brehmer, Lukas Heine, Yannik Hanusrichter, Martin Weßling, Marcel Dudda, Lars E. Podleska, Matthias A. Fink, Julius Keyl, Konstantinos Tserpes, Moon-Sung Kim, Shireen Elhabian, Hans Lamecker, Dženan Zukić, Beatriz Paniagua, Christian Wachinger, Martin Urschler, Luc Duong, Jakob Wasserthal, Peter F. Hoyer, Oliver Basu, Thomas Maal, Max J. H. Witjes, Gregor Schiele, Ti-chiun Chang, Seyed-Ahmad Ahmadi, Ping Luo, Bjoern Menze, Mauricio Reyes, Thomas M. Deserno, Christos Davatzikos, Behrus Puladi, Pascal Fua, Alan L. Yuille, Jens Kleesiek, Jan Egger

For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems.

Anatomy Mixed Reality

Paper
Code

Deep Learning Face Attributes in the Wild

2 code implementations • ICCV 2015 • Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang

LNet is pre-trained by massive general object categories for face localization, while ANet is pre-trained by massive face identities for attribute prediction.

Ranked #6 on Facial Attribute Classification on LFWA

Attribute Facial Attribute Classification

Paper
Code

Large-batch Optimization for Dense Visual Predictions

1 code implementation • 20 Oct 2022 • Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo

To address this challenge, we propose a simple yet effective algorithm, named Adaptive Gradient Variance Modulator (AGVM), which can train dense visual predictors with very large batch size, enabling several benefits more appealing than prior arts.

Instance Segmentation object-detection +3

Paper
Code

Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction

1 code implementation • 19 Mar 2023 • Haibao Yu, Yingjuan Tang, Enze Xie, Jilei Mao, Jirui Yuan, Ping Luo, Zaiqing Nie

Cooperatively utilizing both ego-vehicle and infrastructure sensor data can significantly enhance autonomous driving perception abilities.

3D Object Detection Autonomous Driving +1

Paper
Code

Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

1 code implementation • NeurIPS 2023 • Haibao Yu, Yingjuan Tang, Enze Xie, Jilei Mao, Ping Luo, Zaiqing Nie

To address these issues in vehicle-infrastructure cooperative 3D (VIC3D) object detection, we propose the Feature Flow Net (FFNet), a novel cooperative detection framework.

3D Object Detection Autonomous Driving +1

Paper
Code

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

1 code implementation • 4 Jan 2024 • Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo

Charts play a vital role in data visualization, understanding data patterns, and informed decision-making.

Data Visualization Decision Making +2

Paper
Code

STAR: A Structure-Aware Lightweight Transformer for Real-Time Image Enhancement

1 code implementation • ICCV 2021 • Zhaoyang Zhang, Yitong Jiang, Jun Jiang, Xiaogang Wang, Ping Luo, Jinwei Gu

STAR is a general architecture that can be easily adapted to different image enhancement tasks.

Color Constancy Image Enhancement +3

Paper
Code

RelativeNAS: Relative Neural Architecture Search via Slow-Fast Learning

2 code implementations • 14 Sep 2020 • Hao Tan, Ran Cheng, Shihua Huang, Cheng He, Changxiao Qiu, Fan Yang, Ping Luo

Despite the remarkable successes of Convolutional Neural Networks (CNNs) in computer vision, it is time-consuming and error-prone to manually design a CNN.

Keypoint Detection Neural Architecture Search +3

Paper
Code

Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation

1 code implementation • 8 Jul 2021 • Lingyun Wu, Zhiqiang Hu, Yuanfeng Ji, Ping Luo, Shaoting Zhang

For example, STFT improves the still image baseline FCOS by 10.6% and 20.6% on the comprehensive F1-score of the polyp localization task in CVC-Clinic and ASUMayo datasets, respectively, and outperforms the state-of-the-art video-based method by 3.6% and 8.0%, respectively.

Paper
Code

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

1 code implementation • CVPR 2023 • Ziyun Zeng, Yuying Ge, Xihui Liu, Bin Chen, Ping Luo, Shu-Tao Xia, Yixiao Ge

Pre-training on large-scale video data has become a common recipe for learning transferable spatiotemporal representations in recent years.

Descriptive Representation Learning +1

Paper
Code

Learning Versatile Neural Architectures by Propagating Network Codes

1 code implementation • ICLR 2022 • Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo

(4) Thorough studies of NCP on inter-, cross-, and intra-tasks highlight the importance of cross-task neural architecture design, i.e., multitask neural architectures and architecture transferring between different tasks.

Image Segmentation Neural Architecture Search +2

Paper
Code

AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners

1 code implementation • 3 Feb 2023 • Zhixuan Liang, Yao Mu, Mingyu Ding, Fei Ni, Masayoshi Tomizuka, Ping Luo

For example, AdaptDiffuser not only outperforms the previous art Diffuser by 20.8% on Maze2D and 7.5% on MuJoCo locomotion, but also adapts better to new tasks, e.g., KUKA pick-and-place, by 27.9% without requiring additional expert data.

Paper
Code

An Empirical Investigation of Representation Learning for Imitation

2 code implementations • 16 May 2022 • Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.

Image Classification Imitation Learning +1

Paper
Code

Bringing Events Into Video Deblurring With Non-Consecutively Blurry Frames

1 code implementation • ICCV 2021 • Wei Shang, Dongwei Ren, Dongqing Zou, Jimmy S. Ren, Ping Luo, WangMeng Zuo

EFM can also be easily incorporated into existing deblurring networks, making event-driven deblurring task benefit from state-of-the-art deblurring methods.

Deblurring

Paper
Code

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory

1 code implementation • 21 Apr 2020 • Wenjie Li, Zhaoyang Zhang, Xinjiang Wang, Ping Luo

Although adaptive optimization algorithms such as Adam show fast convergence in many machine learning tasks, this paper identifies a problem of Adam by analyzing its performance in a simple non-convex synthetic problem, showing that Adam's fast convergence would possibly lead the algorithm to local minimums.

Paper
Code

Polygon-free: Unconstrained Scene Text Detection with Box Annotations

1 code implementation • 26 Nov 2020 • Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo

For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% of fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.

Scene Text Detection Text Detection

Paper
Code

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention

1 code implementation • CVPR 2023 • Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them.

Paper
Code

Rethinking Resolution in the Context of Efficient Video Recognition

1 code implementation • 26 Sep 2022 • Chuofan Ma, Qiushan Guo, Yi Jiang, Zehuan Yuan, Ping Luo, Xiaojuan Qi

Our key finding is that the major cause of degradation is not information loss in the down-sampling process, but rather the mismatch between network architecture and input scale.

Knowledge Distillation Video Recognition

Paper
Code

Accelerating Vision-Language Pretraining with Free Language Modeling

1 code implementation • CVPR 2023 • Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, XiaoHu Qie, Ping Luo

FLM successfully frees the prediction rate from the tie-up with the corruption rate while allowing the corruption spans to be customized for each token to be predicted.

Language Modelling Masked Language Modeling

Paper
Code

$π$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

1 code implementation • 27 Apr 2023 • Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ying Shan, Ping Luo

Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks.

Multi-Task Learning

Paper
Code

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

1 code implementation • CVPR 2021 • Jiahang Wang, Sheng Jin, Wentao Liu, Weizhong Liu, Chen Qian, Ping Luo

However, unlike human vision that is robust to various data corruptions such as blur and pixelation, current pose estimators are easily confused by these corruptions.

Knowledge Distillation Pose Estimation

Paper
Code

Beyond One-to-One: Rethinking the Referring Image Segmentation

1 code implementation • ICCV 2023 • Yutao Hu, Qixiong Wang, Wenqi Shao, Enze Xie, Zhenguo Li, Jungong Han, Ping Luo

In this paper, we address this issue from two perspectives.

Image Segmentation Semantic Segmentation +1

Paper
Code

EGC: Image Generation and Classification via a Diffusion Energy-Based Model

1 code implementation • ICCV 2023 • Qiushan Guo, Chuofan Ma, Yi Jiang, Zehuan Yuan, Yizhou Yu, Ping Luo

Learning image classification and image generation using the same set of network parameters is a challenging problem.

Denoising Image Classification +1

Paper
Code

FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation

1 code implementation • 15 Feb 2021 • Chaofan Tao, Rui Lin, Quan Chen, Zhaoyang Zhang, Ping Luo, Ngai Wong

Prior arts often discretize the network weights by carefully tuning hyper-parameters of quantization (e.g., non-uniform stepsize and layer-wise bitwidths), which are complicated and sub-optimal because the full-precision and low-precision models have a large discrepancy.

Neural Network Compression Quantization

Paper
Code

Dynamic Token Normalization Improves Vision Transformers

1 code implementation • ICLR 2022 • Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo

It is difficult for Transformers to capture inductive bias such as the positional context in an image with LN.

Inductive Bias ListOps +2

Paper
Code

MLLMs-Augmented Visual-Language Representation Learning

1 code implementation • 30 Nov 2023 • Yanqing Liu, Kai Wang, Wenqi Shao, Ping Luo, Yu Qiao, Mike Zheng Shou, Kaipeng Zhang, Yang You

Visual-language pre-training has achieved remarkable success in many multi-modal tasks, largely attributed to the availability of large-scale image-text datasets.

Representation Learning Retrieval +1

Paper
Code

Foundation Model is Efficient Multimodal Multitask Model Selector

1 code implementation • NeurIPS 2023 • Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo

This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predicting their performance on each multi-modal task without fine-tuning them, such as image recognition, referring, captioning, visual question answering, and text question answering.

Model Selection Question Answering +1

Paper
Code

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

1 code implementation • 17 Jun 2022 • Yao Mu, Shoufa Chen, Mingyu Ding, Jianyu Chen, Runjian Chen, Ping Luo

In visual control, learning transferable state representation that can transfer between different control tasks is important to reduce the training sample size.

Transfer Learning

Paper
Code

PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation

1 code implementation • 16 Aug 2022 • Wentao Jiang, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Si Liu

Human pose estimation aims to accurately estimate a wide variety of human poses.

Data Augmentation Pose Estimation

Paper
Code

CO^3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving

1 code implementation • 8 Jun 2022 • Runjian Chen, Yao Mu, Runsen Xu, Wenqi Shao, Chenhan Jiang, Hang Xu, Zhenguo Li, Ping Luo

In this paper, we propose CO^3, namely Cooperative Contrastive Learning and Contextual Shape Prediction, to learn 3D representation for outdoor-scene point clouds in an unsupervised manner.

Autonomous Driving Contrastive Learning +1

Paper
Code

Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos

1 code implementation • 11 Mar 2023 • Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ran Cheng, Ping Luo

Our framework is easily extensible to tasks covering visually-grounded language understanding and generation.

Ranked #1 on Natural Language Moment Retrieval on ActivityNet Captions

Dense Video Captioning Natural Language Moment Retrieval +2

Paper
Code

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

1 code implementation • 16 Jun 2022 • Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan, Ping Luo

Constrained by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods.

Image Segmentation Medical Image Segmentation +3

Paper
Code

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space

1 code implementation • 7 Jul 2022 • Wenqi Shao, Xun Zhao, Yixiao Ge, Zhaoyang Zhang, Lei Yang, Xiaogang Wang, Ying Shan, Ping Luo

It is challenging because the ground-truth model ranking for each task can only be generated by fine-tuning the pre-trained models on the target dataset, which is brute-force and computationally expensive.

Ranked #2 on Transferability on classification benchmark

Transferability

Paper
Code

Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning

1 code implementation • 9 Oct 2022 • Yao Mu, Yuzheng Zhuang, Fei Ni, Bin Wang, Jianyu Chen, Jianye Hao, Ping Luo

This paper addresses such a challenge by Decomposed Mutual INformation Optimization (DOMINO) for context learning, which explicitly learns a disentangled context to maximize the mutual information between the context and historical trajectories, while minimizing the state transition prediction error.

Decision Making Meta Reinforcement Learning +2

Paper
Code

Channel Equilibrium Networks for Learning Deep Representation

1 code implementation • ICML 2020 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo

Unlike prior arts that simply removed the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block, termed Channel Equilibrium (CE) block, which enables channels at the same layer to contribute equally to the learned representation.

Paper
Code

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

1 code implementation • 17 Jun 2022 • Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, Chengguo Yin, Ping Luo

Existing vision-language pre-training (VLP) methods primarily rely on paired image-text datasets, which are either annotated with enormous human labor or crawled from the internet and then cleaned with elaborate data cleaning techniques.

Contrastive Learning Data Augmentation +2

Paper
Code

Learning a Reinforced Agent for Flexible Exposure Bracketing Selection

1 code implementation • CVPR 2020 • Zhouxia Wang, Jiawei Zhang, Mude Lin, Jiong Wang, Ping Luo, Jimmy Ren

Automatically selecting exposure bracketing (images exposed differently) is important to obtain a high dynamic range image by using multi-exposure fusion.

Paper
Code

Real-time Controllable Denoising for Image and Video

1 code implementation • CVPR 2023 • Zhaoyang Zhang, Yitong Jiang, Wenqi Shao, Xiaogang Wang, Ping Luo, Kaimo Lin, Jinwei Gu

Controllable image denoising aims to generate clean samples with human perceptual priors and balance sharpness and smoothness.

Image Denoising Video Denoising

Paper
Code

Cached Transformers: Improving Transformers with Differentiable Memory Cache

1 code implementation • 20 Dec 2023 • Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo

This work introduces a new Transformer model called Cached Transformer, which uses Gated Recurrent Cached (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens.

Image Classification Instance Segmentation +6

Paper
Code

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

1 code implementation • 31 Mar 2024 • Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji

Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research.

Language Modelling Large Language Model

Paper
Code

Towards High-Quality Temporal Action Detection with Sparse Proposals

1 code implementation • 18 Sep 2021 • Jiannan Wu, Peize Sun, Shoufa Chen, Jiewen Yang, Zihao Qi, Lan Ma, Ping Luo

Towards high-quality temporal action detection, we introduce Sparse Proposals to interact with the hierarchical features.

Action Detection Avg +2

Paper
Code

Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning

1 code implementation • 21 Feb 2022 • Zhecheng Yuan, Guozheng Ma, Yao Mu, Bo Xia, Bo Yuan, Xueqian Wang, Ping Luo, Huazhe Xu

One of the key challenges in visual Reinforcement Learning (RL) is to learn policies that can generalize to unseen environments.

Data Augmentation Reinforcement Learning (RL)

Paper
Code

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification

3 code implementations • CVPR 2015 • Linjie Yang, Ping Luo, Chen Change Loy, Xiaoou Tang

Updated on 24/09/2015: This update provides preliminary experiment results for fine-grained classification on the surveillance data of CompCars.

Ranked #5 on Fine-Grained Image Classification on CompCars

Fine-Grained Image Classification General Classification

Paper
Code

Webly Supervised Image Classification with Self-Contained Confidence

4 code implementations • ECCV 2020 • Jingkang Yang, Litong Feng, Weirong Chen, Xiaopeng Yan, Huabin Zheng, Ping Luo, Wayne Zhang

Therefore, a simple yet effective WSL framework is proposed.

Ranked #7 on Image Classification on WebVision-1000

Classification General Classification +2

Paper
Code

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

1 code implementation • 18 Feb 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question-answering.

Question Answering Text Summarization

Paper
Code

On Batch Adaptive Training for Deep Learning: Lower Loss and Larger Step Size

1 code implementation • ICLR 2018 • Runyao Chen, Kun Wu, Ping Luo

Mini-batch gradient descent and its variants are commonly used in deep learning.

Image Classification

Paper
Code

Multi-Level Contrastive Learning for Dense Prediction Task

1 code implementation • 4 Apr 2023 • Qiushan Guo, Yizhou Yu, Yi Jiang, Jiannan Wu, Zehuan Yuan, Ping Luo

We extend our pretext task to supervised pre-training, which achieves a similar performance to self-supervised learning.

Contrastive Learning Self-Supervised Learning

Paper
Code

Exploiting Context Information for Generic Event Boundary Captioning

1 code implementation • 3 Jul 2022 • Jinrui Zhang, Teng Wang, Feng Zheng, Ran Cheng, Ping Luo

Previous methods only process the information of a single boundary at a time, which lacks utilization of video context information.

Boundary Captioning

Paper
Code

Real-time End-to-End Video Text Spotter with Contrastive Representation Learning

1 code implementation • 18 Jul 2022 • Weijia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo

Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (i.e., text detection, tracking, and recognition) in a real-time end-to-end trainable framework.

Contrastive Learning Representation Learning +2

Paper
Code

Understanding Self-Supervised Pretraining with Part-Aware Representation Learning

1 code implementation • 27 Jan 2023 • Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang, Jingdong Wang

The study is mainly motivated by the observation that random views, used in contrastive learning, and random masked (visible) patches, used in masked image modeling, are often about object parts.

Contrastive Learning Object +1

Paper
Code

WIDER FACE: A Face Detection Benchmark

1 code implementation • CVPR 2016 • Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang

Face detection is one of the most studied topics in the computer vision community.

Ranked #34 on Face Detection on WIDER Face (Medium)

Face Detection

Paper
Code

Zero-shot Generative Linguistic Steganography

1 code implementation • 16 Mar 2024 • Ke Lin, Yiyang Luo, Zijian Zhang, Ping Luo

Generative linguistic steganography attempts to hide secret messages into covertext.

In-Context Learning Linguistic steganography

Paper
Code

Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches

no code implementations • 9 Feb 2018 • Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin

As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches, by normalizing the distribution of the internal representation for each hidden layer.

Image Classification

Paper
Add Code

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

no code implementations • 2 Dec 2017 • Xiaohang Zhan, Ziwei Liu, Ping Luo, Xiaoou Tang, Chen Change Loy

The key to this new form of learning is to design a proxy task (e.g., image colorization), from which a discriminative loss can be formulated on unlabeled data.

Colorization Image Colorization +3

Paper
Add Code

Tree-Structured Neural Machine for Linguistics-Aware Sentence Generation

no code implementations • 30 Apr 2017 • Ganbin Zhou, Ping Luo, Rongyu Cao, Yijun Xiao, Fen Lin, Bo Chen, Qing He

Then, with a proposed tree-structured search method, the model is able to generate the most probable responses in the form of dependency trees, which are finally flattened into sequences as the system output.

Sentence

Paper
Add Code

From Facial Expression Recognition to Interpersonal Relation Prediction

no code implementations • 21 Sep 2016 • Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang

Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that is capable of learning from rich auxiliary attributes such as gender, age, and head pose, beyond just facial expression data.

Attribute Facial Expression Recognition +2

Paper
Add Code

Faceness-Net: Face Detection through Deep Facial Part Responses

no code implementations • 29 Jan 2017 • Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang

We propose a deep convolutional neural network (CNN) for face detection leveraging on facial attributes based supervision.

Face Detection

Paper
Add Code

Deep Learning Markov Random Field for Semantic Segmentation

no code implementations • 23 Jun 2016 • Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang

Semantic segmentation tasks can be well modeled by Markov Random Field (MRF).

Segmentation Semantic Segmentation +2

Paper
Add Code

Semantic Image Segmentation via Deep Parsing Network

no code implementations • ICCV 2015 • Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang

This paper addresses semantic image segmentation by incorporating rich information into Markov Random Field (MRF), including high-order relations and mixture of label contexts.

Ranked #89 on Semantic Segmentation on Cityscapes test

Image Segmentation Semantic Segmentation

Paper
Add Code

From Facial Parts Responses to Face Detection: A Deep Learning Approach

1 code implementation • ICCV 2015 • Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang

In this paper, we propose a novel deep convolutional network (DCN) that achieves outstanding performance on FDDB, PASCAL Face, and AFW.

Face Detection

Paper
Code

Learning Social Relation Traits from Face Images

no code implementations • ICCV 2015 • Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang

Social relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people.

Attribute Relation

Paper
Add Code

Learning Deep Representation for Face Alignment with Auxiliary Attributes

no code implementations • 18 Aug 2014 • Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang

In this study, we show that the landmark detection or face alignment task is not a single and independent problem.

Ranked #13 on Unsupervised Facial Landmark Detection on MAFL

Attribute Face Alignment

Paper
Add Code

DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

no code implementations • CVPR 2015 • Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, Xiaoou Tang

In this paper, we propose deformable deep convolutional neural networks for generic object detection.

Object object-detection +1

Paper
Add Code

Learning to Recognize Pedestrian Attribute

no code implementations • 5 Jan 2015 • Yubin Deng, Ping Luo, Chen Change Loy, Xiaoou Tang

Learning to recognize pedestrian attributes at far distance is a challenging problem in visual surveillance since face and body close-shots are hardly available; instead, only far-view image frames of pedestrians are given.

Attribute Informativeness

Paper
Add Code

Clothing Co-Parsing by Joint Image Segmentation and Labeling

no code implementations • CVPR 2014 • Wei Yang, Ping Luo, Liang Lin

This paper aims at developing an integrated system of clothing co-parsing, in order to jointly parse a set of clothing images (unsegmented but annotated with tags) into semantic configurations.

Image Segmentation Semantic Segmentation

Paper
Add Code

Pedestrian Detection aided by Deep Learning Semantic Tasks

no code implementations • CVPR 2015 • Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang

Rather than expensively annotating scene attributes, we transfer attributes information from existing scene segmentation datasets to the pedestrian dataset, by proposing a novel deep model to learn high-level features from multiple tasks and multiple data sources.

Ranked #30 on Pedestrian Detection on Caltech

Pedestrian Detection Scene Segmentation

Paper
Add Code

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

no code implementations • 11 Sep 2014 • Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.

Object object-detection +1

Paper
Add Code

Deep Learning Multi-View Representation for Face Recognition

no code implementations • 26 Jun 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Intriguingly, even without accessing 3D data, humans can not only recognize face identity, but can also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.

Face Recognition

Paper
Add Code

Recover Canonical-View Faces in the Wild with Deep Neural Networks

no code implementations • 14 Apr 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Face images in the wild undergo large intra-personal variations, such as poses, illuminations, occlusions, and low resolutions, which cause great challenges to face-related applications.

Face Reconstruction Face Verification

Paper
Add Code

SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification

no code implementations • 16 Jul 2018 • Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang

To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).

Video-Based Person Re-Identification

Paper
Add Code

Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos

no code implementations • 15 Aug 2018 • Zhaoyang Zhang, Zhanghui Kuang, Ping Luo, Litong Feng, Wei Zhang

Secondly, TSD significantly reduces the computations to run video action recognition with compressed frames on the cloud, while maintaining high recognition accuracies.

Action Recognition In Videos Temporal Action Localization

Paper
Add Code

Hierarchical Neural Network for Extracting Knowledgeable Snippets and Documents

no code implementations • 22 Aug 2018 • Ganbin Zhou, Rongyu Cao, Xiang Ao, Ping Luo, Fen Lin, Leyu Lin, Qing He

Additionally, a "low-level sharing, high-level splitting" structure of CNN is designed to handle the documents from different content domains.

Paper
Add Code

Towards Understanding Regularization in Batch Normalization

1 code implementation • ICLR 2019 • Ping Luo, Xinjiang Wang, Wenqi Shao, Zhanglin Peng

Batch Normalization (BN) improves both convergence and generalization in training neural networks.

Paper
Code

Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?

no code implementations • 19 Nov 2018 • Ping Luo, Zhanglin Peng, Jiamin Ren, Ruimao Zhang

Our results suggest that (1) using distinct normalizers improves both learning and generalization of a ConvNet; (2) the choices of normalizers are more related to depth and batch size, but less relevant to parameter initialization, learning rate decay, and solver; (3) different tasks and datasets have different behaviors when learning to select normalizers.

Paper
Add Code

FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis

no code implementations • 4 Dec 2018 • Yujun Shen, Bolei Zhou, Ping Luo, Xiaoou Tang

In the second stage, they compete in the image domain to render photo-realistic images that contain high diversity but preserve identity.

Face Generation Vocal Bursts Valence Prediction

Paper
Add Code

Kalman Normalization: Normalizing Internal Representations Across Network Layers

no code implementations • NeurIPS 2018 • Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin

In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly under the context of micro-batches.

object-detection Object Detection

Paper
Add Code

Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations

no code implementations • NeurIPS 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Face Recognition

Paper
Add Code

FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis

no code implementations • CVPR 2018 • Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang

Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photo-realistic quality.

Face Generation

Paper
Add Code

Learning Deep Architectures via Generalized Whitened Neural Networks

no code implementations • ICML 2017 • Ping Luo

Whitened Neural Network (WNN) is a recent advanced deep architecture, which improves convergence and generalization of canonical neural networks by whitening their internal hidden representation.

Computational Efficiency

Paper
Add Code

Switchable Deep Network for Pedestrian Detection

no code implementations • CVPR 2014 • Ping Luo, Yonglong Tian, Xiaogang Wang, Xiaoou Tang

In this paper, we propose a Switchable Deep Network (SDN) for pedestrian detection.

Pedestrian Detection

Paper
Add Code

DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations

no code implementations • CVPR 2016 • Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang

To demonstrate the advantages of DeepFashion, we propose a new deep model, namely FashionNet, which learns clothing features by jointly predicting clothing attributes and landmarks.

Retrieval

Paper
Add Code

Learning Object Interactions and Descriptions for Semantic Image Segmentation

no code implementations • CVPR 2017 • Guangrun Wang, Ping Luo, Liang Lin, Xiaogang Wang

This work significantly increases segmentation accuracy of CNNs by learning from an Image Descriptions in the Wild (IDW) dataset.

Image Captioning Image Segmentation +3

Paper
Add Code

Deep Learning Strong Parts for Pedestrian Detection

no code implementations • ICCV 2015 • Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang

Third, each part detector in DeepParts is a strong detector that can detect a pedestrian by observing only a part of a proposal.

Occlusion Handling Pedestrian Detection

Paper
Add Code

Deep Dual Learning for Semantic Image Segmentation

no code implementations • ICCV 2017 • Ping Luo, Guangrun Wang, Liang Lin, Xiaogang Wang

The estimated labelmaps that capture accurate object classes and boundaries are used as ground truths in training to boost performance.

Image Segmentation Semantic Segmentation

Paper
Add Code

WIDER Face and Pedestrian Challenge 2018: Methods and Results

no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou

This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.

Face Detection Pedestrian Detection +2

Paper
Add Code

Atom Responding Machine for Dialog Generation

no code implementations • 14 May 2019 • Ganbin Zhou, Ping Luo, Jingwu Chen, Fen Lin, Leyu Lin, Qing He

To enrich the generated responses, ARM introduces a large number of molecule-mechanisms as various responding styles, which are conducted by taking different combinations from a few atom-mechanisms.

Paper
Add Code

Switchable Normalization for Learning-to-Normalize Deep Representation

no code implementations • 22 Jul 2019 • Ping Luo, Ruimao Zhang, Jiamin Ren, Zhanglin Peng, Jingyu Li

Analyses of SN are also presented to answer the following three questions: (a) Is it useful to allow each normalization layer to select its own normalizer?

Paper
Add Code

Deep Self-Learning From Noisy Labels

no code implementations • ICCV 2019 • Jiangfan Han, Ping Luo, Xiaogang Wang

Unlike previous works, which are constrained by many conditions that make them infeasible for real noisy cases, this work presents a novel deep self-learning framework to train a robust network on real noisy datasets without extra supervision.

Learning with noisy labels Self-Learning

Paper
Add Code

Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once

no code implementations • ICCV 2019 • Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dong-Dong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang

Recently, generation-based methods have received much attention since they directly use feed-forward networks to generate the adversarial samples, which avoids the time-consuming iterative attacking procedure in optimization-based and gradient-based methods.

Classification General Classification

Paper
Add Code

Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks

no code implementations • ICCV 2019 • Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo

ResNeXt, still suffers from the sub-optimal performance due to manually defining the number of groups as a constant over all of the layers.

Paper
Add Code

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid

no code implementations • ICCV 2019 • Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang

To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales.

Ranked #4 on Image Retrieval on DeepFashion - Consumer-to-shop (Rank-1 metric)

Image Retrieval Retrieval

Paper
Add Code

Scale Calibrated Training: Improving Generalization of Deep Networks via Scale-Specific Normalization

no code implementations • 31 Aug 2019 • Zhuoran Yu, Aojun Zhou, Yukun Ma, Yudian Li, Xiaohan Zhang, Ping Luo

Experimental results show that SCT improves the accuracy of a single ResNet-50 on ImageNet by 1.7% and 11.5% when testing on image sizes of 224 and 128, respectively.

Data Augmentation Image Classification +1

Paper
Add Code

PDA: Progressive Data Augmentation for General Robustness of Deep Neural Networks

no code implementations • 11 Sep 2019 • Hang Yu, Aishan Liu, Xianglong Liu, Gengchao Li, Ping Luo, Ran Cheng, Jichen Yang, Chongzhi Zhang

In other words, DNNs trained with PDA are more robust against both adversarial attacks and common corruptions than the recent state-of-the-art methods.

Data Augmentation

Paper
Add Code

Vision-Infused Deep Audio Inpainting

no code implementations • ICCV 2019 • Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang

Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts.

Audio inpainting Image Inpainting

Paper
Add Code

Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow

no code implementations • 28 Nov 2019 • Mingyu Ding, Zhe Wang, Bolei Zhou, Jianping Shi, Zhiwu Lu, Ping Luo

Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference.

Optical Flow Estimation Segmentation +3

Paper
Add Code

How Does BN Increase Collapsed Neural Network Filters?

no code implementations • 30 Jan 2020 • Sheng Zhou, Xinjiang Wang, Ping Luo, Litong Feng, Wenjie Li, Wei Zhang

This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result.

object-detection Object Detection

Paper
Add Code

Exemplar Normalization for Learning Deep Representation

no code implementations • CVPR 2020 • Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo

This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.

Semantic Segmentation

Paper
Add Code

Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning

no code implementations • 24 Apr 2020 • Zhongzhan Huang, Wenqi Shao, Xinjiang Wang, Liang Lin, Ping Luo

Channel pruning is a popular technique for compressing convolutional neural networks (CNNs), where various pruning criteria have been proposed to remove the redundant filters.

Paper
Add Code

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

no code implementations • ECCV 2020 • Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo

The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.

Ranked #3 on Keypoint Detection on OCHuman

2D Human Pose Estimation Clustering +4

Paper
Add Code
