Search Results for author: Lei Zhang

Found 577 papers, 254 papers with code

CvT: Introducing Convolutions to Vision Transformers

14 code implementations • ICCV 2021 • Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang

We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

Ranked #3 on Image Classification on Flowers-102 (using extra training data)

Image Classification

124,984

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

7 code implementations • 9 Mar 2023 • Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.

Ranked #1 on Zero-Shot Object Detection on MSCOCO

Referring Expression Referring Expression Comprehension +2

124,984

Dynamic Head: Unifying Object Detection Heads with Attentions

3 code implementations • CVPR 2021 • Xiyang Dai, Yinpeng Chen, Bin Xiao, Dongdong Chen, Mengchen Liu, Lu Yuan, Lei Zhang

In this paper, we present a novel dynamic head framework to unify object detection heads with attentions.

Ranked #3 on Object Detection on COCO 2017 val (AP75 metric)

Object object-detection +1

27,790

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

11 code implementations • 27 Jul 2016 • Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, Jianfeng Gao

In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and link them to corresponding entity keys in a knowledge base.

Face Recognition Image Captioning

21,265

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

15 code implementations • 7 Mar 2022 • Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum

Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.

Ranked #1 on Real-Time Object Detection on COCO 2017 val

Real-Time Object Detection

13,472

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

1 code implementation • 25 Jan 2024 • Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).

Segmentation

13,472

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

20 code implementations • CVPR 2020 • Bowen Cheng, Bin Xiao, Jingdong Wang, Honghui Shi, Thomas S. Huang, Lei Zhang

HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes.

Ranked #2 on Pose Estimation on UAV-Human

2D Human Pose Estimation Multi-Person Pose Estimation +2

12,059

Lite-HRNet: A Lightweight High-Resolution Network

15 code implementations • CVPR 2021 • Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang

We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.

Ranked #33 on Pose Estimation on COCO test-dev (using extra training data)

Pose Estimation Real-Time Semantic Segmentation +1

12,059

VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results

1 code implementation • International Conference on Computer Vision Workshops 2019 • Dawei Du, Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Lin, QinGhua Hu, Tao Peng, Jiayu Zheng, Xinyao Wang, Yue Zhang, Liefeng Bo, Hailin Shi, Rui Zhu, Aashish Kumar, Aijin Li, Almaz Zinollayev, Anuar Askergaliyev, Arne Schumann, Binjie Mao, Byeongwon Lee, Chang Liu, Changrui Chen, Chunhong Pan, Chunlei Huo, Da Yu, Dechun Cong, Dening Zeng, Dheeraj Reddy Pailla, Di Li, Dong Wang, Donghyeon Cho, Dongyu Zhang, Furui Bai, George Jose, Guangyu Gao, Guizhong Liu, Haitao Xiong, Hao Qi, Haoran Wang, Heqian Qiu, Hongliang Li, Huchuan Lu, Ildoo Kim, Jaekyum Kim, Jane Shen, Jihoon Lee, Jing Ge, Jingjing Xu, Jingkai Zhou, Jonas Meier, Jun Won Choi, Junhao Hu, Junyi Zhang, Junying Huang, Kaiqi Huang, Keyang Wang, Lars Sommer, Lei Jin, Lei Zhang

Results of 33 object detection algorithms are presented.

Object object-detection +1

12,059

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

9 code implementations • CVPR 2023 • Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum

In this paper we present Mask DINO, a unified object detection and segmentation framework.

Ranked #1 on Panoptic Segmentation on COCO test-dev

Image Segmentation Instance Segmentation +3

12,059

Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases

1 code implementation • 26 Mar 2023 • Yunjie Ji, Yong Deng, Yan Gong, Yiping Peng, Qiang Niu, Lei Zhang, Baochang Ma, Xiangang Li

However current research rarely studies the impact of different amounts of instruction data on model performance, especially in the real-world use cases.

Math

7,541

WantWords: An Open-source Online Reverse Dictionary System

1 code implementation • EMNLP 2020 • Fanchao Qi, Lei Zhang, Yanhui Yang, Zhiyuan Liu, Maosong Sun

A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions.

Reverse Dictionary

6,950

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

65 code implementations • CVPR 2018 • Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.

Ranked #29 on Visual Question Answering (VQA) on VQA v2 test-std

Image Captioning Visual Question Answering

5,415

Large-Scale Intelligent Microservices

1 code implementation • 17 Sep 2020 • Mark Hamilton, Nick Gonsalves, Christina Lee, Anand Raman, Brendan Walsh, Siddhartha Prasad, Dalitso Banda, Lucy Zhang, Mei Gao, Lei Zhang, William T. Freeman

Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with its own restrictive syntax.

Anomaly Detection

4,967

Osprey: Pixel Understanding with Visual Instruction Tuning

2 code implementations • 15 Dec 2023 • Yuqian Yuan, Wentong Li, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu

In this paper, we propose Osprey, a mask-text instruction tuning approach, to extend MLLMs by incorporating fine-grained mask regions into language instruction, aiming at achieving pixel-wise visual understanding.

Language Modelling

3,357

AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results

3 code implementations • 15 Sep 2020 • Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Maitreya Suin, Kuldeep Purohit, A. N. Rajagopalan, Xiaochuan Li, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Abdul Muqeet, Jiwon Hwang, Subin Yang, JungHeum Kang, Sung-Ho Bae, Yongwoo Kim, Geun-Woo Jeon, Jun-Ho Choi, Jun-Hyuk Kim, Jong-Seok Lee, Steven Marty, Eric Marty, Dongliang Xiong, Siang Chen, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Haicheng Wang, Vineeth Bhaskara, Alex Levinshtein, Stavros Tsogkas, Allan Jepson, Xiangzhen Kong, Tongtong Zhao, Shanshan Zhao, Hrishikesh P. S, Densen Puthussery, Jiji C. V, Nan Nan, Shuai Liu, Jie Cai, Zibo Meng, Jiaming Ding, Chiu Man Ho, Xuehui Wang, Qiong Yan, Yuzhi Zhao, Long Chen, Jiangtao Zhang, Xiaotong Luo, Liang Chen, Yanyun Qu, Long Sun, Wenhao Wang, Zhenbing Liu, Rushi Lan, Rao Muhammad Umer, Christian Micheloni

This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results.

Image Super-Resolution

2,713

Gradient Centralization: A New Optimization Technique for Deep Neural Networks

7 code implementations • ECCV 2020 • Hongwei Yong, Jianqiang Huang, Xian-Sheng Hua, Lei Zhang

It has been shown that using the first and second order statistics (e.g., mean and variance) to perform Z-score standardization on network activations or weight vectors, such as batch normalization (BN) and weight standardization (WS), can improve the training performance.

Fine-Grained Image Classification General Classification

2,643

Tag2Text: Guiding Vision-Language Model via Image Tagging

2 code implementations • 10 Mar 2023 • Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Lei Zhang

This paper presents Tag2Text, a vision language pre-training (VLP) framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features.

Language Modelling TAG

2,423

Recognize Anything: A Strong Image Tagging Model

2 code implementations • 6 Jun 2023 • Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang

We are releasing RAM at https://recognize-anything.github.io/ to foster the advancement of large models in computer vision.

Semantic Parsing

2,423

Open-Set Image Tagging with Multi-Grained Text Supervision

2 code implementations • 23 Oct 2023 • Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang

Specifically, for predefined commonly used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet.

Human-Object Interaction Detection Open Set Learning +1

2,423

GAN Prior Embedded Network for Blind Face Restoration in the Wild

3 code implementations • CVPR 2021 • Tao Yang, Peiran Ren, Xuansong Xie, Lei Zhang

The proposed GAN prior embedded network (GPEN) is easy-to-implement, and it can generate visually photo-realistic results.

Ranked #1 on Blind Face Restoration on CelebA-HQ

Blind Face Restoration Generative Adversarial Network +1

2,295

Mutual Consistency Learning for Semi-supervised Medical Image Segmentation

2 code implementations • 21 Sep 2021 • Yicheng Wu, ZongYuan Ge, Donghao Zhang, Minfeng Xu, Lei Zhang, Yong Xia, Jianfei Cai

In this paper, we propose a novel mutual consistency network (MC-Net+) to effectively exploit the unlabeled data for semi-supervised medical image segmentation.

Image Segmentation Segmentation +2

1,993

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

16 code implementations • CVPR 2022 • Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang

Our method is universal and can be easily plugged into any DETR-like methods by adding dozens of lines of code to achieve a remarkable improvement.

Object Detection

1,978

Grounded Language-Image Pre-training

2 code implementations • CVPR 2022 • Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao

The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.

Ranked #1 on 2D Object Detection on RF100

Described Object Detection Few-Shot Object Detection +1

1,959

Semantic-SAM: Segment and Recognize Anything at Any Granularity

1 code implementation • 10 Jul 2023 • Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao

In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity.

Image Segmentation Segmentation +1

1,917

Visual In-Context Prompting

3 code implementations • 22 Nov 2023 • Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Segmentation Visual Prompting

1,917

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

1 code implementation • 21 Mar 2024 • Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang

Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning.

Contrastive Learning Descriptive +3

1,855

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

7 code implementations • ICLR 2022 • Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR.

Ranked #11 on 2D Object Detection on SARDet-100K

Object Detection

1,820

detrex: Benchmarking Detection Transformers

1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

1,820

Are Transformers Effective for Time Series Forecasting?

4 code implementations • 26 May 2022 • Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu

Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task.

Ranked #1 on Time Series Forecasting on ETTh1 (96) Univariate

Anomaly Detection Temporal Relation Extraction +2

1,783

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

20 code implementations • 13 Aug 2016 • Kai Zhang, WangMeng Zuo, Yunjin Chen, Deyu Meng, Lei Zhang

Discriminative model learning for image denoising has recently attracted considerable attention due to its favorable denoising performance.

Ranked #4 on JPEG Artifact Correction on LIVE1 (Quality 40 Grayscale)

Color Image Denoising Image Deblocking +3

1,382

A Simple Framework for Open-Vocabulary Segmentation and Detection

2 code implementations • ICCV 2023 • Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.

Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)

Instance Segmentation Panoptic Segmentation +2

1,246

Unified Vision-Language Pre-Training for Image Captioning and VQA

3 code implementations • 24 Sep 2019 • Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao

The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models.

Ranked #1 on Image Captioning on Flickr30k Captions test

Image Captioning Question Answering +2

1,202

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

4 code implementations • ECCV 2020 • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)

Image Captioning Image Retrieval +3

1,202

Accelerating Dataset Distillation via Model Augmentation

2 code implementations • CVPR 2023 • Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu

Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but efficient synthetic training datasets from large ones.

1,164

VinVL: Revisiting Visual Representations in Vision-Language Models

7 code implementations • CVPR 2021 • Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao

In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model, OSCAR (Li et al., 2020), and utilize an improved approach, OSCAR+, to pre-train the VL model and fine-tune it on a wide range of downstream VL tasks.

Ranked #2 on Image-text matching on CommercialAdsDataset

Image Captioning Image-text matching +4

1,027

Blind Face Restoration via Deep Multi-scale Component Dictionaries

1 code implementation • ECCV 2020 • Xiaoming Li, Chaofeng Chen, Shangchen Zhou, Xianhui Lin, WangMeng Zuo, Lei Zhang

Next, with the degraded input, we match and select the most similar component features from their corresponding dictionaries and transfer the high-quality details to the input via the proposed dictionary feature transfer (DFT) block.

Ranked #32 on Video Super-Resolution on MSU Video Super Resolution Benchmark: Detail Restoration

Blind Face Restoration Video Super-Resolution

908

Deep Plug-and-Play Super-Resolution for Arbitrary Blur Kernels

1 code implementation • CVPR 2019 • Kai Zhang, WangMeng Zuo, Lei Zhang

In this paper, we propose a principled formulation and framework by extending bicubic degradation based deep SISR with the help of plug-and-play framework to handle LR images with arbitrary blur kernels.

Deblurring Image Restoration +1

833

Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization

1 code implementation • 28 Aug 2023 • Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, Lei Zhang

Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks.

Image Enhancement Image Generation +3

790

Learning Image-adaptive 3D Lookup Tables for High Performance Photo Enhancement in Real-time

1 code implementation • 30 Sep 2020 • Hui Zeng, Jianrui Cai, Lida Li, Zisheng Cao, Lei Zhang

The small CNN works on the down-sampled version of the input image to predict content-dependent weights to fuse the multiple basis 3D LUTs into an image-adaptive one, which is employed to transform the color and tone of source images efficiently.

Ranked #5 on Image Enhancement on MIT-Adobe 5k (SSIM on proRGB metric)

4k Image Enhancement +2

735

A Strong and Reproducible Object Detector with Only Public Datasets

2 code implementations • 25 Apr 2023 • Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.

Ranked #5 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

649

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

1 code implementation • 9 Nov 2023 • Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li

LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models.

Ranked #1 on LMM real-life tasks on Leaderboard

Instruction Following LLM real-life tasks +3

621

Plug-and-Play Image Restoration with Deep Denoiser Prior

4 code implementations • 31 Aug 2020 • Kai Zhang, Yawei Li, WangMeng Zuo, Lei Zhang, Luc van Gool, Radu Timofte

Recent works on plug-and-play image restoration have shown that a denoiser can implicitly serve as the image prior for model-based methods to solve many inverse problems.

Deblurring Demosaicking +1

598

Learning Deep CNN Denoiser Prior for Image Restoration

2 code implementations • CVPR 2017 • Kai Zhang, WangMeng Zuo, Shuhang Gu, Lei Zhang

Recent works have revealed that, with the aid of variable splitting techniques, a denoiser prior can be plugged in as a modular part of model-based optimization methods to solve other inverse problems (e.g., deblurring).

Ranked #1 on Color Image Denoising on BSD68 sigma5

Color Image Denoising Deblurring +2

578

One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer

1 code implementation • CVPR 2023 • Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li

It is challenging to perform this task with a single network due to resolution issues, i.e., the face and hands are usually located in extremely small regions.

Ranked #3 on 3D Human Pose Estimation on UBody

3D Human Pose Estimation 3D Human Reconstruction +1

568

Toward Convolutional Blind Denoising of Real Photographs

3 code implementations • CVPR 2019 • Shi Guo, Zifei Yan, Kai Zhang, WangMeng Zuo, Lei Zhang

While deep convolutional neural networks (CNNs) have achieved impressive success in image denoising with additive white Gaussian noise (AWGN), their performance remains limited on real-world noisy photographs.

Ranked #4 on Denoising on Darmstadt Noise Dataset

Image Denoising Noise Estimation

492

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

1 code implementation • ICCV 2023 • Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, WangMeng Zuo

In addition to the unprecedented ability in imaginary creation, large text-to-image models are expected to take customized concepts in image generation.

Text-to-Image Generation

479

Image-Adaptive YOLO for Object Detection in Adverse Weather Conditions

1 code implementation • 15 Dec 2021 • Wenyu Liu, Gaofeng Ren, Runsheng Yu, Shi Guo, Jianke Zhu, Lei Zhang

Though deep learning-based object detection methods have achieved promising results on the conventional datasets, it is still challenging to locate objects from the low-quality images captured in adverse weather conditions.

Image Enhancement object-detection +1

467

Improving Nighttime Driving-Scene Segmentation via Dual Image-adaptive Learnable Filters

2 code implementations • 4 Jul 2022 • Wenyu Liu, Wentong Li, Jianke Zhu, Miaomiao Cui, Xuansong Xie, Lei Zhang

With DIAL-Filters, we design both unsupervised and supervised frameworks for nighttime driving-scene segmentation, which can be trained in an end-to-end manner.

Autonomous Driving Scene Segmentation +1

467

FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising

7 code implementations • 11 Oct 2017 • Kai Zhang, WangMeng Zuo, Lei Zhang

Due to the fast inference and good performance, discriminative learning methods have been widely studied in image denoising.

Ranked #1 on Grayscale Image Denoising on BSD68 sigma75

Color Image Denoising Image Denoising

441

Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset

1 code implementation • NeurIPS 2023 • Jing Lin, Ailing Zeng, Shunlin Lu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang

In this paper, we present Motion-X, a large-scale 3D expressive whole-body motion dataset.

Human Mesh Recovery text annotation

427

Learning a Single Convolutional Super-Resolution Network for Multiple Degradations

1 code implementation • CVPR 2018 • Kai Zhang, WangMeng Zuo, Lei Zhang

Recent years have witnessed the unprecedented success of deep convolutional neural networks (CNNs) in single image super-resolution (SISR).

Ranked #27 on Video Super-Resolution on MSU Video Super Resolution Benchmark: Detail Restoration

Image Super-Resolution Video Super-Resolution

420

AIM 2019 Challenge on Constrained Super-Resolution: Methods and Results

2 code implementations • 4 Nov 2019 • Kai Zhang, Shuhang Gu, Radu Timofte, Zheng Hui, Xiumei Wang, Xinbo Gao, Dongliang Xiong, Shuai Liu, Ruipeng Gang, Nan Nan, Chenghua Li, Xueyi Zou, Ning Kang, Zhan Wang, Hang Xu, Chaofeng Wang, Zheng Li, Lin-Lin Wang, Jun Shi, Wenyu Sun, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Yazhe Niu, Peijin Zhuo, Xiangzhen Kong, Long Sun, Wenhao Wang

The challenge had 3 tracks.

Image Super-Resolution

416

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

1 code implementation • CVPR 2021 • Jie Liang, Hui Zeng, Lei Zhang

Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or suffer from long inference time due to their heavy computational burden on the convolution of high-resolution feature maps.

Ranked #1 on Photo Retouching on MIT-Adobe 5k (1080p)

4k Attribute +4

407

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

3 code implementations • ICCV 2021 • Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao

This paper presents a new Vision Transformer (ViT) architecture, Multi-Scale Vision Longformer, which significantly enhances the ViT of Dosovitskiy et al. (2020) for encoding high-resolution images using two techniques.

Ranked #45 on Instance Segmentation on COCO minival

Image Classification Instance Segmentation +2

403

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

2 code implementations • 3 Dec 2022 • Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang

In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention.

Ranked #1 on Box-supervised Instance Segmentation on PASCAL VOC 2012 val

Box-supervised Instance Segmentation Segmentation

399

Query2Label: A Simple Transformer Way to Multi-Label Classification

2 code implementations • 22 Jul 2021 • Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu

The use of Transformer is rooted in the need of extracting local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image.

Ranked #1 on Multi-Label Classification on PASCAL VOC 2012

Classification Multi-Label Classification

381

Image Scene Graph Generation (SGG) Benchmark

1 code implementation • 27 Jul 2021 • Xiaotian Han, Jianwei Yang, Houdong Hu, Lei Zhang, Jianfeng Gao, Pengchuan Zhang

There is a surge of interest in image scene graph generation (object, attribute and relationship detection) due to the need of building fine-grained image understanding models that go beyond object detection.

Attribute Graph Generation +6

375

Progressive Semantic-Aware Style Transformation for Blind Face Restoration

1 code implementation • CVPR 2021 • Chaofeng Chen, Xiaoming Li, Lingbo Yang, Xianhui Lin, Lei Zhang, Kwan-Yee K. Wong

Compared with previous networks, the proposed PSFR-GAN makes full use of the semantic (parsing maps) and pixel (LQ images) space information from different scales of input pairs.

Ranked #4 on Blind Face Restoration on CelebA-Test

Blind Face Restoration Face Parsing +2

366

Improving the Stability of Diffusion Models for Content Consistent Super-Resolution

1 code implementation • 30 Dec 2023 • Lingchen Sun, Rongyuan Wu, Zhengqiang Zhang, Hongwei Yong, Lei Zhang

To improve the stability of diffusion prior-based SR, we propose to employ the diffusion models to refine image structures, while employing the generative adversarial training to enhance image fine details.

Image Super-Resolution Text-to-Image Generation

329

MicroNet: Improving Image Recognition with Extremely Low FLOPs

1 code implementation • ICCV 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g., 5M FLOPs on ImageNet classification).

328

Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection

1 code implementation • 18 Jan 2019 • Fan Yang, Lei Zhang, Sijia Yu, Danil Prokhorov, Xue Mei, Haibin Ling

To demonstrate the superiority and generality of the proposed method, we evaluate it on five crack datasets and compare it with state-of-the-art crack detection, edge detection, and semantic segmentation methods.

Edge Detection Semantic Segmentation

313

OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering

1 code implementation • CVPR 2023 • Zhiyuan Ma, Xiangyu Zhu, GuoJun Qi, Zhen Lei, Lei Zhang

In this paper, we propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars by a generalized controllable tri-plane rendering solution so that each personalized avatar can be constructed from only one portrait as the reference.

285

Object-driven Text-to-Image Synthesis via Adversarial Training

1 code implementation • CVPR 2019 • Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, Jianfeng Gao

In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow object-centered text-to-image synthesis for complex scenes.

Image Generation Object

283

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

1 code implementation • 27 Nov 2023 • Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, Lei Zhang

First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation.

Image Super-Resolution

283

Semi-supervised Left Atrium Segmentation with Mutual Consistency Training

3 code implementations • 4 Mar 2021 • Yicheng Wu, Minfeng Xu, ZongYuan Ge, Jianfei Cai, Lei Zhang

Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions.

Image Segmentation Left Atrium Segmentation +4

272

Paper
Code

Deep CNNs Meet Global Covariance Pooling: Better Representation and Generalization

3 code implementations • 15 Apr 2019 • Qilong Wang, Jiangtao Xie, WangMeng Zuo, Lei Zhang, Peihua Li

The proposed methods are highly modular, readily plugged into existing deep CNNs.

Ranked #1 on Image Classification on iNaturalist (Top 3 Error metric)

Fine-Grained Visual Recognition General Classification +3

268

Paper
Code

Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution

2 code implementations • CVPR 2022 • Jie Liang, Hui Zeng, Lei Zhang

In this paper, we demonstrate that it is possible to train a GAN-based SISR model which can stably generate perceptually realistic details while inhibiting visual artifacts.

Image Super-Resolution

250

Paper
Code

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation

1 code implementation • ICCV 2023 • Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, Qiang Xu

While such a plug-and-play approach is appealing, the inevitable and uncertain conflicts between the original images produced from the frozen SD branch and the given condition incur significant challenges for the learnable branch, which essentially conducts image feature editing for condition enforcement.

Denoising Image Generation

249

Paper
Code

PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency

1 code implementation • CVPR 2021 • Jie Liang, Hui Zeng, Miaomiao Cui, Xuansong Xie, Lei Zhang

HRP requires that more attention should be paid to human regions, while GLC requires that a group of portrait photos should be retouched to a consistent tone.

Photo Retouching

242

Paper
Code

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation • 5 Dec 2023 • Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

234

Paper
Code

UniPose: Detecting Any Keypoints

1 code implementation • 12 Oct 2023 • Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang

This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation.

Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)

2D Human Pose Estimation 2D Pose Estimation +4

233

Paper
Code

Real-world Noisy Image Denoising: A New Benchmark

2 code implementations • 7 Apr 2018 • Jun Xu, Hui Li, Zhetong Liang, David Zhang, Lei Zhang

In order to promote the study on this problem while implementing the concurrent real-world image denoising datasets, we construct a new benchmark dataset which contains comprehensive real-world noisy images of different natural scenes.

Image Denoising

231

Paper
Code

Unsupervised Pre-training for Person Re-identification

1 code implementation • CVPR 2021 • Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen

In this paper, we present a large-scale unlabeled person re-identification (Re-ID) dataset, "LUPerson", and make the first attempt at performing unsupervised pre-training to improve the generalization ability of the learned person Re-ID feature representation.

Ranked #1 on Person Re-Identification on Market-1501 (using extra training data)

Data Augmentation Person Re-Identification +1

217

Paper
Code

Large-Scale Pre-training for Person Re-identification with Noisy Labels

2 code implementations • CVPR 2022 • Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen

Since these ID labels automatically derived from tracklets inevitably contain noise, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.

Ranked #7 on Person Re-Identification on CUHK03

Contrastive Learning Multi-Object Tracking +3

217

Paper
Code

Variational Denoising Network: Toward Blind Noise Modeling and Removal

2 code implementations • NeurIPS 2019 • Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng

On one hand, like other data-driven deep learning methods, our method, namely the variational denoising network (VDN), can perform denoising efficiently due to its explicit form of posterior expression.

Ranked #10 on Image Denoising on DND

Image Denoising Noise Estimation +1

211

Paper
Code

Deep Variational Network Toward Blind Image Restoration

2 code implementations • 25 Aug 2020 • Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong

In this proposed model, a pixel-wise non-i.i.d.

Image Deblocking Image Denoising +3

211

Paper
Code

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

1 code implementation • CVPR 2023 • Xuan Ju, Ailing Zeng, Jianan Wang, Qiang Xu, Lei Zhang

Humans have long been recorded in a variety of forms since antiquity.

3D Human Pose Estimation Human Detection +1

189

Paper
Code

Efficient Long-Range Attention Network for Image Super-resolution

1 code implementation • 13 Mar 2022 • Xindong Zhang, Hui Zeng, Shi Guo, Lei Zhang

A highly efficient long-range attention block (ELAB) is then built by simply cascading two shift-conv with a GMSA module, which is further accelerated by using a shared attention mechanism.

Ranked #11 on Image Super-Resolution on Manga109 - 4x upscaling

Image Super-Resolution

188

Paper
Code

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds

1 code implementation • CVPR 2022 • Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang

VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel by two cross-attentions and models features in a hidden space induced by a group of latent codes.

3D Object Detection object-detection

183

Paper
Code

A PID Controller Approach for Stochastic Optimization of Deep Networks

3 code implementations • CVPR 2018 • Wangpeng An, Haoqian Wang, Qingyun Sun, Jun Xu, Qionghai Dai, Lei Zhang

We first reveal the intrinsic connections between SGD-Momentum and PID based controller, then present the optimization algorithm which exploits the past, current, and change of gradients to update the network parameters.

Stochastic Optimization

182

Paper
Code

Box-supervised Instance Segmentation with Level Set Evolution

1 code implementation • 19 Jul 2022 • Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Xiansheng Hua, Lei Zhang

A simple mask supervised SOLOv2 model is adapted to predict the instance-aware mask map as the level set for each instance.

Box-supervised Instance Segmentation Segmentation

181

Paper
Code

Detection Transformer with Stable Matching

1 code implementation • ICCV 2023 • Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang

We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.

Position

177

Paper
Code

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

1 code implementation • 13 Mar 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.

object-detection Object Detection

176

Paper
Code

Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation

2 code implementations • ECCV 2020 • Zongsheng Yue, Qian Zhao, Lei Zhang, Deyu Meng

Specifically, we approximate the joint distribution with two different factorized forms, which can be formulated as a denoiser mapping the noisy image to the clean one and a generator mapping the clean image to the noisy one.

Ranked #2 on Noise Estimation on SIDD

Image Denoising Noise Estimation

170

Paper
Code

Bridging the Gap between Spatial and Spectral Domains: A Unified Framework for Graph Neural Networks

1 code implementation • 21 Jul 2021 • Zhiqian Chen, Fanglan Chen, Lei Zhang, Taoran Ji, Kaiqun Fu, Liang Zhao, Feng Chen, Lingfei Wu, Charu Aggarwal, Chang-Tien Lu

Deep learning's performance has been extensively recognized recently.

Image Classification Natural Language Understanding +1

165

Paper
Code

A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution

1 code implementation • CVPR 2022 • Jianqi Ma, Zhetong Liang, Lei Zhang

The semantics of the text are firstly extracted by a text recognition module as text prior information.

Image Super-Resolution SSIM

156

Paper
Code

Suppress and Balance: A Simple Gated Network for Salient Object Detection

3 code implementations • ECCV 2020 • Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Lei Zhang

With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.

Ranked #15 on Dichotomous Image Segmentation on DIS-TE4

Dichotomous Image Segmentation object-detection +1

155

Paper
Code

Towards Diverse Binary Segmentation via A Simple yet General Gated Network

1 code implementation • 18 Mar 2023 • Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Lei Zhang

They ignore two key problems when the encoder exchanges information with the decoder: one is the lack of an interference control mechanism between them; the other is the failure to consider the disparity of the contributions from different encoder levels.

Segmentation Semantic Segmentation

155

Paper
Code

Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization

1 code implementation • CVPR 2022 • Yabin Zhang, Minghan Li, Ruihuang Li, Kui Jia, Lei Zhang

In this work, we propose, for the first time to the best of our knowledge, to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which can be implemented by applying Exact Histogram Matching (EHM) in the image feature space.

Domain Generalization Style Transfer

154

Paper
Code

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking

1 code implementation • CVPR 2018 • Feng Li, Cheng Tian, WangMeng Zuo, Lei Zhang, Ming-Hsuan Yang

Compared with SRDCF, STRCF with hand-crafted features provides a 5 times speedup and achieves a gain of 5.4% and 3.6% AUC score on OTB-2015 and Temple-Color, respectively.

Ranked #9 on Visual Object Tracking on VOT2017/18

Visual Object Tracking Visual Tracking

147

Paper
Code

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

3 code implementations • 3 Feb 2023 • Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang

This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information.

Ranked #2 on 2D Human Pose Estimation on Human-Art

2D Human Pose Estimation Human Detection +3

138

Paper
Code

A Dual Weighting Label Assignment Scheme for Object Detection

1 code implementation • CVPR 2022 • Shuai Li, Chenhang He, Ruihuang Li, Lei Zhang

Existing LA methods mostly focus on the design of pos weighting function, while the neg weight is directly derived from the pos weight.

Object object-detection +2

136

Paper
Code

Text Prior Guided Scene Text Image Super-resolution

1 code implementation • 29 Jun 2021 • Jianqi Ma, Shi Guo, Lei Zhang

Our experiments on the benchmark TextZoom dataset show that TPGSR can not only effectively improve the visual quality of scene text images, but also significantly improve the text recognition accuracy over existing STISR methods.

Image Super-Resolution

126

Paper
Code

Deep Convolutional Dictionary Learning for Image Denoising

1 code implementation • CVPR 2021 • Hongyi Zheng, Hongwei Yong, Lei Zhang

Inspired by the great success of deep neural networks (DNNs), many unfolding methods have been proposed to integrate traditional image modeling techniques, such as dictionary learning (DicL) and sparse coding, into DNNs for image restoration.

Dictionary Learning Image Denoising +2

124

Paper
Code

Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution

1 code implementation • 27 Mar 2022 • Jie Liang, Hui Zeng, Lei Zhang

Specifically, a tiny regression network is employed to predict the degradation parameters of the input image, while several convolutional experts with the same topology are jointly optimized to specify the network parameters via a non-linear mixture of experts.

Image Super-Resolution

122

Paper
Code

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

1 code implementation • 28 Feb 2024 • Minghan Li, Shuai Li, Xindong Zhang, Lei Zhang

Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.

Ranked #2 on Video Semantic Segmentation on VSPW (using extra training data)

Referring Expression Segmentation Referring Video Object Segmentation +6

119

Paper
Code

A Probabilistic Quality Representation Approach to Deep Blind Image Quality Prediction

1 code implementation • 28 Aug 2017 • Hui Zeng, Lei Zhang, Alan C. Bovik

Recognizing this, we propose a new representation of perceptual image quality, called probabilistic quality representation (PQR), to describe the image subjective score distribution, whereby a more robust loss function can be employed to train a deep BIQA model.

Blind Image Quality Assessment regression

114

Paper
Code

Learning Dual Memory Dictionaries for Blind Face Restoration

1 code implementation • 15 Oct 2022 • Xiaoming Li, Shiguang Zhang, Shangchen Zhou, Lei Zhang, WangMeng Zuo

Generally, it is a challenging and intractable task to improve the photo-realistic performance of blind restoration and adaptively handle the generic and specific restoration scenarios with a single unified model.

Blind Face Restoration

112

Paper
Code

Grid Anchor based Image Cropping: A New Benchmark and An Efficient Model

1 code implementation • 18 Sep 2019 • Hui Zeng, Lida Li, Zisheng Cao, Lei Zhang

The employed evaluation metrics such as intersection-over-union cannot reliably reflect the real performance of a cropping model, either.

Image Cropping

111

Paper
Code

Reliable and Efficient Image Cropping: A Grid Anchor based Approach

1 code implementation • CVPR 2019 • Hui Zeng, Lida Li, Zisheng Cao, Lei Zhang

Consequently, a grid anchor based cropping benchmark is constructed, where all crops of each image are annotated and more reliable evaluation metrics are defined.

Image Cropping

110

Paper
Code

Real-World Video Super-Resolution: A Benchmark Dataset and a Decomposition Based Learning Scheme

1 code implementation • ICCV 2021 • Xi Yang, Wangmeng Xiang, Hui Zeng, Lei Zhang

Existing VSR methods are mostly trained and evaluated on synthetic datasets, where the LR videos are uniformly downsampled from their high-resolution (HR) counterparts by some simple operators (e.g., bicubic downsampling).

Video Super-Resolution

108

Paper
Code

Human Guided Ground-truth Generation for Realistic Image Super-resolution

1 code implementation • CVPR 2023 • Du Chen, Jie Liang, Xindong Zhang, Ming Liu, Hui Zeng, Lei Zhang

A human guided GT image dataset with both positive and negative samples is then constructed, and a loss function is proposed to train the Real-ISR models.

Image Enhancement Image Super-Resolution

108

Paper
Code

Multi-channel Reverse Dictionary Model

1 code implementation • 18 Dec 2019 • Lei Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun

A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description.

Reverse Dictionary Sentence

106

Paper
Code

MP-Former: Mask-Piloted Transformer for Image Segmentation

1 code implementation • CVPR 2023 • Hao Zhang, Feng Li, Huaizhe Xu, Shijia Huang, Shilong Liu, Lionel M. Ni, Lei Zhang

We present a mask-piloted Transformer which improves masked-attention in Mask2Former for image segmentation.

Image Segmentation Segmentation +1

106

Paper
Code

From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution

1 code implementation • 3 Oct 2022 • Xiaoming Li, Chaofeng Chen, Xianhui Lin, WangMeng Zuo, Lei Zhang

Notably, LQ face images, which may have the same degradation process as natural images, can be robustly restored with photo-realistic textures by exploiting their strong structural priors.

Image Generation Image Super-Resolution

105

Paper
Code

Deep Adaptive Inference Networks for Single Image Super-Resolution

1 code implementation • 8 Apr 2020 • Ming Liu, Zhilu Zhang, Liya Hou, WangMeng Zuo, Lei Zhang

Nonetheless, a content- and resource-adaptive model is preferable, and it is encouraging to apply simpler and more efficient networks to easier regions with fewer details and to scenarios with restricted efficiency constraints.

Image Super-Resolution

Paper
Code

Dense Learning based Semi-Supervised Object Detection

1 code implementation • CVPR 2022 • Binghui Chen, Pengyu Li, Xiang Chen, Biao Wang, Lei Zhang, Xian-Sheng Hua

Semi-supervised object detection (SSOD) aims to facilitate the training and deployment of object detectors with the help of a large amount of unlabeled data.

Object object-detection +2

Paper
Code

HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset

1 code implementation • ICCV 2021 • GuanYing Chen, Chaofeng Chen, Shi Guo, Zhetong Liang, Kwan-Yee K. Wong, Lei Zhang

Secondly, we conduct more sophisticated alignment and temporal fusion in the feature space of the coarse HDR video to produce better reconstruction.

HDR Reconstruction Optical Flow Estimation +1

Paper
Code

MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval

1 code implementation • 14 Jul 2020 • Mark Hamilton, Stephanie Fu, Mindren Lu, Johnny Bui, Darius Bopp, Zhenbang Chen, Felix Tran, Margaret Wang, Marina Rogers, Lei Zhang, Chris Hoder, William T. Freeman

We introduce MosAIc, an interactive web app that allows users to find pairs of semantically related artworks that span different cultures, media, and millennia.

Cultural Vocal Bursts Intensity Prediction Image Retrieval +2

Paper
Code

CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise

3 code implementations • CVPR 2018 • Kuang-Huei Lee, Xiaodong He, Lei Zhang, Linjun Yang

We demonstrate the effectiveness of the proposed algorithm on both the label noise detection task and the task of image classification on noisy data, using several large-scale datasets.

Ranked #2 on Image Classification on Food-101N (using extra training data)

Classification General Classification +2

Paper
Code

Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel

1 code implementation • CVPR 2022 • Zongsheng Yue, Qian Zhao, Jianwen Xie, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong

To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel.

Image Super-Resolution

Paper
Code

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

1 code implementation • NeurIPS 2021 • Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

For example, our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture), improves 0.28% top-1 accuracy, and meanwhile enjoys 49.32% FLOPs and 4.40% running time savings.

Ranked #20 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Efficient ViTs

Paper
Code

SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

1 code implementation • ICCV 2021 • Jiapeng Tang, Jiabao Lei, Dan Xu, Feiying Ma, Kui Jia, Lei Zhang

To this end, we propose to learn implicit surface reconstruction by sign-agnostic optimization of convolutional occupancy networks, to simultaneously achieve advanced scalability to large-scale scenes, generality to novel shapes, and applicability to raw scans in a unified framework.

Surface Reconstruction

Paper
Code

Unsupervised Multi-Class Domain Adaptation: Theory, Algorithms, and Practice

2 code implementations • 20 Feb 2020 • Yabin Zhang, Bin Deng, Hui Tang, Lei Zhang, Kui Jia

By using MCSD as a measure of domain distance, we develop a new domain adaptation bound for multi-class UDA; its data-dependent, probably approximately correct bound is also developed that naturally suggests adversarial learning objectives to align conditional feature distributions across source and target domains.

Domain Adaptation Multi-class Classification

Paper
Code

Generative Action Description Prompts for Skeleton-based Action Recognition

3 code implementations • ICCV 2023 • Wangmeng Xiang, Chao Li, Yuxuan Zhou, Biao Wang, Lei Zhang

More specifically, we employ a pre-trained large-scale language model as the knowledge engine to automatically generate text descriptions for body parts movements of actions, and propose a multi-modal training scheme by utilizing the text encoder to generate feature vectors for different body parts and supervise the skeleton encoder for action representation learning.

Ranked #5 on Skeleton Based Action Recognition on N-UCLA

Action Recognition Language Modelling +2

Paper
Code

AutoLoc: Weakly-supervised Temporal Action Localization

1 code implementation • 22 Jul 2018 • Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang

In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.

Weakly-supervised Temporal Action Localization Weakly Supervised Temporal Action Localization

Paper
Code

A Benchmark for Edge-Preserving Image Smoothing

1 code implementation • 2 Apr 2019 • Feida Zhu, Zhetong Liang, Xixi Jia, Lei Zhang, Yizhou Yu

This benchmark includes an image dataset with groundtruth image smoothing results as well as baseline algorithms that can generate competitive edge-preserving smoothing results for a wide range of image contents.

image smoothing

Paper
Code

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

1 code implementation • CVPR 2021 • Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo

Due to this aligned representation learning, even pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4%, compared with a non-TAP baseline.

Caption Generation Language Modelling +5

Paper
Code

A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection

1 code implementation • ECCV 2020 • Xiaoqi Zhao, Lihe Zhang, Youwei Pang, Huchuan Lu, Lei Zhang

In this work, we design a single stream network to directly use the depth map to guide early fusion and middle fusion between RGB and depth, which saves the feature encoder of the depth stream and achieves a lightweight and real-time model.

Ranked #15 on Thermal Image Segmentation on RGB-T-Glass-Segmentation

object-detection RGB-D Salient Object Detection +3

Paper
Code

Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation

1 code implementation • CVPR 2022 • Ruihuang Li, Shuai Li, Chenhang He, Yabin Zhang, Xu Jia, Lei Zhang

One popular solution to this challenging task is self-training, which selects high-scoring predictions on target samples as pseudo labels for training.

Ranked #9 on Image-to-Image Translation on SYNTHIA-to-Cityscapes

Segmentation Semantic Segmentation +1

Paper
Code

Learning Domain Adaptive Object Detection with Probabilistic Teacher

2 code implementations • 13 Jun 2022 • Meilin Chen, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, ShiLiang Pu

In addition, we conduct anchor adaptation in parallel with localization adaptation, since anchor can be regarded as a learnable parameter.

Object object-detection +1

Paper
Code

Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport

1 code implementation • ICCV 2023 • Wentong Li, Yuqian Yuan, Song Wang, Jianke Zhu, Jianshu Li, Jian Liu, Lei Zhang

Weakly-supervised image segmentation has recently attracted increasing research attention, aiming to avoid expensive pixel-wise labeling.

Image Segmentation Panoptic Segmentation

Paper
Code

MomentDiff: Generative Video Moment Retrieval from Random to Real

1 code implementation • NeurIPS 2023 • Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang

Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description.

Moment Retrieval Retrieval

Paper
Code

Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks

2 code implementations • 20 Dec 2017 • Tianshui Chen, Liang Lin, WangMeng Zuo, Xiaonan Luo, Lei Zhang

In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural networks for joint training.

Classification General Classification +1

Paper
Code

Neural Interactive Keypoint Detection

1 code implementation • ICCV 2023 • Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang

Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.

Keypoint Detection

Paper
Code

A Benchmark for Chinese-English Scene Text Image Super-resolution

1 code implementation • ICCV 2023 • Jianqi Ma, Zhetong Liang, Wangmeng Xiang, Xi Yang, Lei Zhang

Scene Text Image Super-resolution (STISR) aims to recover high-resolution (HR) scene text images with visually pleasant and readable text content from the given low-resolution (LR) input.

Image Super-Resolution

Paper
Code

Towards Effective Multiple-in-One Image Restoration: A Sequential and Prompt Learning Strategy

1 code implementation • 7 Jan 2024 • Xiangtao Kong, Chao Dong, Lei Zhang

While single task image restoration (IR) has achieved significant successes, it remains a challenging issue to train a single model which can tackle multiple IR tasks.

Image Restoration

Paper
Code

Sharpness-Aware Gradient Matching for Domain Generalization

1 code implementation • CVPR 2023 • Pengfei Wang, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

In this paper, we present two conditions to ensure that the model could converge to a flat minimum with a small loss, and present an algorithm, named Sharpness-Aware Gradient Matching (SAGM), to meet the two conditions for improving model generalization capability.

Domain Generalization

Paper
Code

MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks

1 code implementation • 28 Apr 2023 • Lei Zhang, Yuge Zhang, Kan Ren, Dongsheng Li, Yuqing Yang

In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches.

AutoML

Paper
Code

When Unsupervised Domain Adaptation Meets Tensor Representations

1 code implementation • ICCV 2017 • Hao Lu, Lei Zhang, Zhiguo Cao, Wei Wei, Ke Xian, Chunhua Shen, Anton Van Den Hengel

Domain adaptation (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another.

Unsupervised Domain Adaptation

Paper
Code

Label-efficient Segmentation via Affinity Propagation

1 code implementation • NeurIPS 2023 • Wentong Li, Yuqian Yuan, Song Wang, Wenyu Liu, Dongqi Tang, Jian Liu, Jianke Zhu, Lei Zhang

In this work, we formulate affinity modeling as an affinity propagation process, and propose local and global pairwise affinity terms to generate accurate soft pseudo labels.

Box-supervised Instance Segmentation Segmentation +2

Paper
Code

MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences

1 code implementation • CVPR 2023 • Chenhang He, Ruihuang Li, Yabin Zhang, Shuai Li, Lei Zhang

Current top-performing multi-frame detectors mostly follow a Detect-and-Fuse framework, which extracts features from each frame of the sequence and fuses them to detect the objects in the current frame.

3D Object Detection Autonomous Driving +1

Paper
Code

Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution

1 code implementation • 1 Dec 2023 • Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang

To ensure the content consistency among adjacent frames, we exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss, ensuring that the generated HR video maintains a coherent and continuous visual flow.

Image Restoration Video Super-Resolution

Paper
Code

PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation

1 code implementation • 23 Jan 2024 • Zhaozhi Xie, Bochen Guan, Weihao Jiang, Muyang Yi, Yue Ding, Hongtao Lu, Lei Zhang

In this paper, we introduce a novel prompt-driven adapter into SAM, namely Prompt Adapter Segment Anything Model (PA-SAM), aiming to enhance the segmentation mask quality of the original SAM.

Image Segmentation Segmentation +1

Paper
Code

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

1 code implementation • 28 Nov 2022 • Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang

As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.

Ranked #7 on Referring Expression Comprehension on RefCOCO

object-detection Object Detection +4

Paper
Code

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

1 code implementation • 27 Jul 2022 • Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Xian-Sheng Hua, Lei Zhang

For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers on video data will bring heavy computation and memory burdens due to the largely increased number of patches and the quadratic complexity of self-attention computation.

Ranked #9 on Action Recognition on Diving-48

Action Classification Action Recognition

Paper
Code

Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset

1 code implementation • CVPR 2023 • Shuaizheng Liu, Xindong Zhang, Lingchen Sun, Zhetong Liang, Hui Zeng, Lei Zhang

In this work, we develop, for the first time to the best of our knowledge, an HDR image dataset using mobile phone cameras, namely the Mobile-HDR dataset.

Denoising

Paper
Code

Neural Architecture Search With Representation Mutual Information

1 code implementation • CVPR 2022 • Xiawu Zheng, Xiang Fei, Lei Zhang, Chenglin Wu, Fei Chao, Jianzhuang Liu, Wei Zeng, Yonghong Tian, Rongrong Ji

Building upon RMI, we further propose a new search algorithm termed RMI-NAS, supported by a theorem that guarantees the global optimality of the searched architecture.

Neural Architecture Search

Paper
Code

WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection

1 code implementation • 11 Sep 2019 • Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang

We study weakly-supervised object detection (WSOD), which plays a vital role in relieving human involvement from object-level annotations.

Object object-detection +3

Paper
Code

Conditional Directed Graph Convolution for 3D Human Pose Estimation

1 code implementation • 16 Jul 2021 • WenBo Hu, Changgong Zhang, Fangneng Zhan, Lei Zhang, Tien-Tsin Wong

Based on this representation, we further propose a spatial-temporal conditional directed graph convolution to leverage varying non-local dependence for different poses by conditioning the graph topology on input poses.

Ranked #15 on 3D Human Pose Estimation on MPI-INF-3DHP

3D Human Pose Estimation

Paper
Code

A Differentiable Two-stage Alignment Scheme for Burst Image Reconstruction with Large Shift

1 code implementation • CVPR 2022 • Shi Guo, Xi Yang, Jianqi Ma, Gaofeng Ren, Lei Zhang

Denoising and demosaicking are two essential steps to reconstruct a clean full-color image from the raw data.

Ranked #1 on Joint Demosaicing and Denoising on 5 frames with input 2160x3840

4k Burst Image Reconstruction +4

Paper
Code

One Shot Learning as Instruction Data Prospector for Large Language Models

1 code implementation • 16 Dec 2023 • Yunshui Li, Binyuan Hui, Xiaobo Xia, Jiaxi Yang, Min Yang, Lei Zhang, Shuzheng Si, Junhao Liu, Tongliang Liu, Fei Huang, Yongbin Li

Nuggets assesses the potential of individual instruction examples to act as effective one shot examples, thereby identifying those that can significantly enhance diverse task performance.

One-Shot Learning

Paper
Code

Learning Symmetry Consistent Deep CNNs for Face Completion

1 code implementation • 19 Dec 2018 • Xiaoming Li, Ming Liu, Jieru Zhu, WangMeng Zuo, Meng Wang, Guosheng Hu, Lei Zhang

As for missing pixels on both of half-faces, we present a generative reconstruction subnet together with a perceptual symmetry loss to enforce symmetry consistency of recovered structures.

Ranked #1 on Facial Inpainting on VggFace2

Face Recognition Facial Inpainting

Paper
Code

One-to-Few Label Assignment for End-to-End Dense Detection

1 code implementation • CVPR 2023 • Shuai Li, Minghan Li, Ruihuang Li, Chenhang He, Lei Zhang

The positive and negative weights of these soft anchors are dynamically adjusted during training so that they can contribute more to "representation learning" in the early training stage, and contribute more to "duplicated prediction removal" in the later stage.

Representation Learning

Paper
Code

DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

1 code implementation • CVPR 2023 • Yihao Chen, Xianbiao Qi, Jianan Wang, Lei Zhang

In this way, we can reduce the GPU memory consumption of contrastive loss computation from $\mathcal{O}(B^2)$ to $\mathcal{O}(\frac{B^2}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training.

Contrastive Learning

Paper
Code
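
To make the $\mathcal{O}(B^2)$ versus $\mathcal{O}(\frac{B^2}{N})$ claim above concrete, the following is a minimal arithmetic sketch. It only counts the entries of a $B \times B$ similarity matrix and of an even $\frac{1}{N}$ share of it; it is not the DisCo-CLIP implementation. The function name, the example batch size, the GPU count, and the 4-byte float32 entry size are all assumptions chosen for illustration.

```python
from typing import Tuple


def similarity_entries(batch_size: int, num_gpus: int) -> Tuple[int, int]:
    """Count similarity-matrix entries: full B x B versus a 1/N share of it.

    Hypothetical helper for illustration only; it assumes the B x B matrix
    is split evenly across num_gpus devices.
    """
    full = batch_size * batch_size   # B^2 entries if one device holds everything
    per_gpu = full // num_gpus       # B^2 / N entries if the work is divided
    return full, per_gpu


# Illustrative values (assumptions, not figures from the paper).
B, N = 32768, 64
full, per_gpu = similarity_entries(B, N)

BYTES_PER_FLOAT32 = 4  # assumed storage format
print(f"full matrix : {full * BYTES_PER_FLOAT32 / 2**20:.0f} MiB")
print(f"per-GPU part: {per_gpu * BYTES_PER_FLOAT32 / 2**20:.0f} MiB")
```

Under these assumed values the per-device share is $N$ times smaller than the full matrix, which is the scaling the abstract describes.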

Attention Diversification for Domain Generalization

1 code implementation • 9 Oct 2022 • Rang Meng, Xianfeng Li, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, ShiLiang Pu

Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization are collaborated to reassign appropriate attention to diverse task-related features.

Domain Generalization

Paper
Code

Simultaneous Fidelity and Regularization Learning for Image Restoration

1 code implementation • 12 Apr 2018 • Dongwei Ren, WangMeng Zuo, David Zhang, Lei Zhang, Ming-Hsuan Yang

For blind deconvolution, as estimation error of blur kernel is usually introduced, the subsequent non-blind deconvolution process does not restore the latent image well.

Denoising Image Deconvolution +1

Paper
Code

Joint Denoising and Demosaicking with Green Channel Prior for Real-world Burst Images

1 code implementation • 25 Jan 2021 • Shi Guo, Zhetong Liang, Lei Zhang

Considering the fact that the green channel has twice the sampling rate and better quality than the red and blue channels in CFA raw data, we propose to use this green channel prior (GCP) to build a GCP-Net for the JDD-B task.

Demosaicking Denoising +1

Paper
Code

Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer

1 code implementation • ECCV 2020 • Yuanyi Zhong, Jian-Feng Wang, Jian Peng, Lei Zhang

In this paper, we propose an effective knowledge transfer framework to boost the weakly supervised object detection accuracy with the help of an external fully-annotated source dataset, whose categories may not overlap with the target domain.

Object object-detection +2

Paper
Code

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model

1 code implementation • 20 May 2023 • Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang

Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.

Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)

Human-Object Interaction Detection Zero-Shot Human-Object Interaction Detection

Paper
Code

SEED: Self-supervised Distillation For Visual Representation

1 code implementation • ICLR 2021 • Zhiyuan Fang, JianFeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu

This paper is concerned with self-supervised learning for small models.

Knowledge Distillation Self-Supervised Learning +1

Paper
Code

Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

1 code implementation • CVPR 2021 • Minghan Li, Shuai Li, Lida Li, Lei Zhang

To further explore temporal correlation among video frames, we aggregate a temporal fusion module to infer instance masks from each frame to its adjacent frames, which helps our framework to handle challenging videos such as motion blur, partial occlusion and unusual object-to-camera poses.

Ranked #24 on Video Instance Segmentation on YouTube-VIS 2021

Instance Segmentation Segmentation +3

Paper
Code

ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

1 code implementation • 1 Jan 2024 • Chenhang He, Ruihuang Li, Guowen Zhang, Lei Zhang

Window-based transformers have demonstrated strong ability in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner.

Blocking

Paper
Code

SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation

1 code implementation • CVPR 2023 • Ruihuang Li, Chenhang He, Yabin Zhang, Shuai Li, Liyi Chen, Lei Zhang

Weakly supervised instance segmentation using only bounding box annotations has recently attracted much research attention.

Box-supervised Instance Segmentation Segmentation +2

Paper
Code

TMP: Temporal Motion Propagation for Online Video Super-Resolution

1 code implementation • 15 Dec 2023 • Zhengqiang Zhang, Ruihuang Li, Shi Guo, Yang Cao, Lei Zhang

Online video super-resolution (online-VSR) highly relies on an effective alignment module to aggregate temporal information, while the strict latency requirement makes accurate and efficient alignment very challenging.

Video Super-Resolution

Paper
Code

PMFSNet: Polarized Multi-scale Feature Self-attention Network For Lightweight Medical Image Segmentation

1 code implementation • 15 Jan 2024 • Jiahui Zhong, Wenhong Tian, Yuanlun Xie, Zhijia Liu, Jie Ou, Taoran Tian, Lei Zhang

In this work, we propose PMFSNet, a novel medical imaging segmentation model that effectively balances global and local feature processing while avoiding the computational redundancy typical in larger models.

Image Segmentation Medical Image Segmentation +2

Paper
Code

REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning

1 code implementation • IJCNLP 2019 • Ming Jiang, Junjie Hu, Qiuyuan Huang, Lei Zhang, Jana Diesner, Jianfeng Gao

In this study, we present a fine-grained evaluation method REO for automatically measuring the performance of image captioning systems.

Image Captioning

Paper
Code

TIGEr: Text-to-Image Grounding for Image Caption Evaluation

1 code implementation • IJCNLP 2019 • Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao

This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems.

Image Captioning Text Matching

Paper
Code

Unsupervised Adaptation Learning for Hyperspectral Imagery Super-Resolution

1 code implementation • CVPR 2020 • Lei Zhang, Jiangtao Nie, Wei Wei, Yanning Zhang, Shengcai Liao, Ling Shao

Following this idea, we develop a two-stage SR network that leverages two consecutive modules: a fusion module and an adaptation module, to recover the latent HSI in a coarse-to-fine scheme.

Super-Resolution

Paper
Code

Glocal Energy-based Learning for Few-Shot Open-Set Recognition

1 code implementation • CVPR 2023 • Haoyu Wang, Guansong Pang, Peng Wang, Lei Zhang, Wei Wei, Yanning Zhang

Few-shot open-set recognition (FSOR) is a challenging task of great practical value.

Open Set Learning

Paper
Code

CORE: Cooperative Reconstruction for Multi-Agent Perception

1 code implementation • ICCV 2023 • Binglu Wang, Lei Zhang, Zhaozhong Wang, Yongqiang Zhao, Tianfei Zhou

This paper presents CORE, a conceptually simple, effective and communication-efficient model for multi-agent cooperative perception.

3D Object Detection object-detection +1

Paper
Code

LipsFormer: Introducing Lipschitz Continuity to Vision Transformers

1 code implementation • 19 Apr 2023 • Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang

In contrast to previous practical tricks that address training instability by learning rate warmup, layer normalization, attention formulation, and weight initialization, we show that Lipschitz continuity is a more essential property to ensure training stability.

Paper
Code

Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction

1 code implementation • CVPR 2021 • Jiapeng Tang, Dan Xu, Kui Jia, Lei Zhang

This paper focuses on the task of 4D shape reconstruction from a sequence of point clouds.

4D reconstruction

Paper
Code

SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images

1 code implementation • 25 Apr 2022 • Zhishe Wang, Yanlin Chen, Wenyu Shao, Hui Li, Lei Zhang

The existing deep learning fusion methods mainly concentrate on convolutional neural networks, and few attempts are made with transformers.

Computational Efficiency

Paper
Code

Hashing-based Non-Maximum Suppression for Crowded Object Detection

1 code implementation • 22 May 2020 • Jianfeng Wang, Xi Yin, Lijuan Wang, Lei Zhang

Considering the intersection-over-union (IoU) as the metric, we propose a simple yet effective hashing algorithm, named IoUHash, which guarantees that the boxes within the same cell are close enough by a lower IoU bound.

object-detection Object Detection +1

Paper
Code

A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment

1 code implementation • 16 Mar 2024 • Tianhe Wu, Kede Ma, Jie Liang, Yujiu Yang, Lei Zhang

While Multimodal Large Language Models (MLLMs) have experienced significant advancement in visual understanding and reasoning, their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored.

Image Quality Assessment

Paper
Code

Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation

1 code implementation • ICCV 2023 • Yichen Yuan, Yifan Wang, Lijun Wang, Xiaoqi Zhao, Huchuan Lu, Yu Wang, Weibo Su, Lei Zhang

Recent leading zero-shot video object segmentation (ZVOS) works are devoted to integrating appearance and motion information by elaborately designing feature fusion modules and identically applying them in multiple feature stages.

Semantic Segmentation Video Object Segmentation +2

Paper
Code

Masked Surfel Prediction for Self-Supervised Point Cloud Learning

1 code implementation • 7 Jul 2022 • Yabin Zhang, Jiehong Lin, Chenhang He, Yongwei Chen, Kui Jia, Lei Zhang

In this work, we make the first attempt, to the best of our knowledge, to consider the local geometry information explicitly into the masked auto-encoding, and propose a novel Masked Surfel Prediction (MaskSurf) method.

Point cloud reconstruction Self-Supervised Learning

Paper
Code

Spatial-Temporal Federated Learning for Lifelong Person Re-identification on Distributed Edges

1 code implementation • 24 Jul 2022 • Lei Zhang, Guanyu Gao, Huaizheng Zhang

Then, the learnt knowledge from edge clients will be aggregated by a centralized parameter server, where the knowledge will be selectively and attentively distilled along the spatial and temporal dimensions with carefully designed mechanisms.

Continual Learning Federated Learning +2

Paper
Code

Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains

1 code implementation • CVPR 2023 • Mingjun Xu, Lingyun Qin, WeiJie Chen, ShiLiang Pu, Lei Zhang

In this work, we present an idea to remove non-causal factors from common features by multi-view adversarial training on source domains, because we observe that such insignificant non-causal factors may still be significant in other latent spaces (views) due to the multi-mode structure of data.

Domain Generalization object-detection +1

Paper
Code

Directional Deep Embedding and Appearance Learning for Fast Video Object Segmentation

1 code implementation • 17 Feb 2020 • Yingjie Yin, De Xu, Xingang Wang, Lei Zhang

We propose a directional deep embedding and appearance learning (DDEAL) method, which is free of the online fine-tuning process, for fast VOS.

One-shot visual object segmentation Segmentation +2

Paper
Code

Unfolded Deep Kernel Estimation for Blind Image Super-resolution

1 code implementation • 10 Mar 2022 • Hongyi Zheng, Hongwei Yong, Lei Zhang

Nonetheless, the existing deep unfolding methods cannot explicitly solve the data term of the unfolding objective function, limiting their capability in blur kernel estimation.

Image Super-Resolution

Paper
Code

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

1 code implementation • 3 Apr 2018 • Dianqi Li, Qiuyuan Huang, Xiaodong He, Lei Zhang, Ming-Ting Sun

By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human languages, and generates more discriminative captions.

Generative Adversarial Network

Paper
Code

Virtual Fully-Connected Layer: Training a Large-Scale Face Recognition Dataset With Limited Computational Resources

1 code implementation • CVPR 2021 • Pengyu Li, Biao Wang, Lei Zhang

This is because the classification paradigm needs to train a fully connected layer as the category classifier, and its parameters will be in the hundreds of millions if the training dataset contains millions of identities.

Face Recognition Metric Learning

Paper
Code
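
The parameter-count argument in the abstract above can be checked with simple arithmetic. The sketch below assumes a plain fully connected classifier with one weight vector per identity and no bias; the function name, the 512-dimensional embedding, and the identity counts are illustrative assumptions, not values taken from the paper.

```python
def fc_classifier_params(num_identities: int, embedding_dim: int) -> int:
    """Weights in a bias-free fully connected classifier layer.

    Hypothetical helper: one embedding_dim-sized weight vector per identity.
    """
    return num_identities * embedding_dim


# Assumed 512-dimensional face embedding and illustrative identity counts.
for identities in (100_000, 1_000_000, 10_000_000):
    params = fc_classifier_params(identities, 512)
    print(f"{identities:>10,} identities -> {params:,} parameters")
```

With one million identities and the assumed 512-dimensional embedding, the layer alone holds 512 million weights, consistent with the "hundreds of millions" order of magnitude stated in the abstract.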

Beyond a Video Frame Interpolator: A Space Decoupled Learning Approach to Continuous Image Transition

1 code implementation • 18 Mar 2022 • Tao Yang, Peiran Ren, Xuansong Xie, Xiansheng Hua, Lei Zhang

Most of the existing deep learning based VFI methods adopt off-the-shelf optical flow algorithms to estimate the bidirectional flows and interpolate the missing frames accordingly.

Image Generation Image Morphing +3

Paper
Code

Adaptive Network Combination for Single-Image Reflection Removal: A Domain Generalization Perspective

1 code implementation • 4 Apr 2022 • Ming Liu, Jianan Pan, Zifei Yan, WangMeng Zuo, Lei Zhang

Meanwhile, diverse testing sets are also provided with different types of reflection and scenes.

Domain Generalization Reflection Removal

Paper
Code

Revisiting Prototypical Network for Cross Domain Few-Shot Learning

1 code implementation • CVPR 2023 • Fei Zhou, Peng Wang, Lei Zhang, Wei Wei, Yanning Zhang

Prototypical Network is a popular few-shot solver that aims at establishing a feature metric generalizable to novel few-shot classification (FSC) tasks using deep neural networks.

cross-domain few-shot learning Knowledge Distillation

Paper
Code

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

1 code implementation • 20 Mar 2024 • Wenqiao Zhang, Tianwei Lin, Jiang Liu, Fangxun Shu, Haoyuan Li, Lei Zhang, He Wanggui, Hao Zhou, Zheqi Lv, Hao Jiang, Juncheng Li, Siliang Tang, Yueting Zhuang

Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks.

Ranked #72 on Visual Question Answering on MM-Vet

Visual Question Answering

Paper
Code

Probability Weighted Compact Feature for Domain Adaptive Retrieval

1 code implementation • CVPR 2020 • Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou

Most of the existing image retrieval methods only focus on single-domain retrieval, which assumes that the distributions of retrieval databases and queries are similar.

Image Retrieval Quantization +1

Paper
Code

Label Propagation with Augmented Anchors: A Simple Semi-Supervised Learning baseline for Unsupervised Domain Adaptation

1 code implementation • ECCV 2020 • Yabin Zhang, Bin Deng, Kui Jia, Lei Zhang

To make the proposed A$^2$LP useful for UDA, we propose empirical schemes to generate such virtual instances.

Unsupervised Domain Adaptation

Paper
Code

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting

1 code implementation • ICCV 2021 • Binghui Chen, Zhaoyi Yan, Ke Li, Pengyu Li, Biao Wang, WangMeng Zuo, Lei Zhang

In crowd counting, because labelling is laborious, collecting a new large-scale dataset with plentiful images and large diversity in density, scene, etc. is perceived as intractable.

Crowd Counting

Paper
Code

Benchmark Dataset and Effective Inter-Frame Alignment for Real-World Video Super-Resolution

1 code implementation • 10 Dec 2022 • Ruohao Wang, Xiaohui Liu, Zhilu Zhang, Xiaohe Wu, Chun-Mei Feng, Lei Zhang, WangMeng Zuo

On the other hand, alignment algorithms in existing VSR methods perform poorly for real-world videos, leading to unsatisfactory results.

Optical Flow Estimation Video Super-Resolution

Paper
Code

IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting

1 code implementation • 18 Mar 2024 • Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang

Our research addresses the shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC).

Paper
Code

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

1 code implementation • 26 Mar 2024 • Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang

In this paper, we introduce a versatile adaptation approach that can effectively work under all three settings.

Paper
Code

DR-Unet104 for Multimodal MRI brain tumor segmentation

1 code implementation • 4 Nov 2020 • Jordan Colman, Lei Zhang, Wenting Duan, Xujiong Ye

We verified the effect of introducing the regularisation of dropout with a small rate (e.g. 0.2) on the architecture, and found a dropout of 0.2 improved the overall performance compared to no dropout, or a dropout of 0.5.

3D Architecture Brain Tumor Segmentation +3

Paper
Code

FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation

1 code implementation • ICCV 2023 • Liyi Chen, Chenyang Lei, Ruihuang Li, Shuai Li, Zhaoxiang Zhang, Lei Zhang

Without introducing any external supervision and human priors, the proposed FPR effectively suppresses wrong activations from the background objects.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Paper
Code

Multi-adversarial Faster-RCNN for Unrestricted Object Detection

1 code implementation • ICCV 2019 • Zhenwei He, Lei Zhang

Conventional object detection methods essentially suppose that the training and testing data are collected from a restricted target domain with expensive labeling cost.

Domain Adaptation Object +2

Paper
Code

Semi-Supervised Domain Generalization with Evolving Intermediate Domain

1 code implementation • 19 Nov 2021 • Luojun Lin, Han Xie, Zhishu Sun, WeiJie Chen, Wenxi Liu, Yuanlong Yu, Lei Zhang

From this perspective, we introduce a novel paradigm of DG, termed as Semi-Supervised Domain Generalization (SSDG), to explore how the labeled and unlabeled source domains can interact, and establish two settings, including the close-set and open-set SSDG.

Domain Generalization Semi-Supervised Domain Generalization

Paper
Code

A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration

1 code implementation • 21 Jul 2022 • Ming Liu, Yuxiang Wei, Xiaohe Wu, WangMeng Zuo, Lei Zhang

Generative adversarial networks (GANs) have drawn enormous attention due to the simple yet effective training mechanism and superior image generation quality.

Image Generation Image Restoration

Paper
Code

Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis

1 code implementation • CVPR 2023 • Yuxiang Wei, Zhilong Ji, Xiaohe Wu, Jinfeng Bai, Lei Zhang, WangMeng Zuo

Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from the input semantic map.

Image Generation Object

Paper
Code

Parameter Exchange for Robust Dynamic Domain Generalization

1 code implementation • 23 Nov 2023 • Luojun Lin, Zhifeng Shen, Zhishu Sun, Yuanlong Yu, Lei Zhang, WeiJie Chen

The parameters of dynamic networks can be decoupled into a static and a dynamic component, which are designed to learn domain-invariant and domain-specific features, respectively.

Disentanglement Domain Generalization

Paper
Code

Self-Supervised Video Desmoking for Laparoscopic Surgery

1 code implementation • 17 Mar 2024 • Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, WangMeng Zuo

On the other hand, in order to enhance the desmoking performance, we further feed the valuable information from PS frame into models, where a masking strategy and a regularization term are presented to avoid trivial solutions.

Paper
Code

Towards Efficient Data Free Black-Box Adversarial Attack

1 code implementation • CVPR 2022 • Jie Zhang, Bo Li, Jianghe Xu, Shuang Wu, Shouhong Ding, Lei Zhang, Chao Wu

The proposed method can efficiently imitate the target model through a small number of queries and achieve high attack success rate.

Adversarial Attack

Paper
Code

Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle

1 code implementation • ICCV 2023 • Song Guo, Lei Zhang, Xiawu Zheng, Yan Wang, Yuchao Li, Fei Chao, Chenglin Wu, Shengchuan Zhang, Rongrong Ji

In this paper, we try to solve this problem by introducing a principled and unified framework based on Information Bottleneck (IB) theory, which further guides us to an automatic pruning approach.

Network Pruning

Paper
Code

Remove Cosine Window from Correlation Filter-based Visual Trackers: When and How

1 code implementation • 16 May 2019 • Feng Li, Xiaohe Wu, WangMeng Zuo, David Zhang, Lei Zhang

Therefore, in this paper we investigate the feasibility of removing the cosine window from CF trackers with spatial regularization.

Paper
Code

MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos

1 code implementation • CVPR 2023 • Minghan Li, Shuai Li, Wangmeng Xiang, Lei Zhang

The proposed MDQE is the first VIS method with per-clip input that achieves state-of-the-art results on challenging videos and competitive performance on simple videos.

Ranked #13 on Video Instance Segmentation on YouTube-VIS 2021

Instance Segmentation Semantic Segmentation +1

Paper
Code

Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition

1 code implementation • 28 Oct 2023 • Shuoyuan Wang, Jindong Wang, Huajun Xi, Bob Zhang, Lei Zhang, Hongxin Wei

However, the high computational cost of optimization-based TTA algorithms makes it intractable to run on resource-constrained edge devices.

Computational Efficiency Human Activity Recognition +2

Paper
Code

Toward Accurate and Temporally Consistent Video Restoration from Raw Data

1 code implementation • 25 Dec 2023 • Shi Guo, Jianqi Ma, Xi Yang, Zhengqiang Zhang, Lei Zhang

Extensive experiments demonstrate the leading VJDD performance of our method in terms of restoration accuracy, perceptual quality and temporal consistency.

Demosaicking Denoising +2

Paper
Code

Solving Rubik's Cube with a Robot Hand

2 code implementations • 16 Oct 2019 • OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang

We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot.

Meta-Learning Rubik's Cube

Paper
Code
