Search Results for author: Lei Zhang

Found 577 papers, 254 papers with code

CvT: Introducing Convolutions to Vision Transformers

14 code implementations ICCV 2021 Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang

We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

Ranked #3 on Image Classification on Flowers-102 (using extra training data)

Image Classification

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

7 code implementations 9 Mar 2023 Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.

Referring Expression Referring Expression Comprehension +2

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

11 code implementations 27 Jul 2016 Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, Jianfeng Gao

In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and linking them to corresponding entity keys in a knowledge base.

Face Recognition Image Captioning

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

15 code implementations 7 Mar 2022 Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum

Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.

Real-Time Object Detection

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

1 code implementation 25 Jan 2024 Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).

Segmentation

Lite-HRNet: A Lightweight High-Resolution Network

15 code implementations CVPR 2021 Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang

We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.

Ranked #33 on Pose Estimation on COCO test-dev (using extra training data)

Pose Estimation Real-Time Semantic Segmentation +1

Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases

1 code implementation 26 Mar 2023 Yunjie Ji, Yong Deng, Yan Gong, Yiping Peng, Qiang Niu, Lei Zhang, Baochang Ma, Xiangang Li

However, current research rarely studies the impact of different amounts of instruction data on model performance, especially in real-world use cases.

Math

WantWords: An Open-source Online Reverse Dictionary System

1 code implementation EMNLP 2020 Fanchao Qi, Lei Zhang, Yanhui Yang, Zhiyuan Liu, Maosong Sun

A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions.

Reverse Dictionary

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

65 code implementations CVPR 2018 Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.

Image Captioning Visual Question Answering

Large-Scale Intelligent Microservices

1 code implementation 17 Sep 2020 Mark Hamilton, Nick Gonsalves, Christina Lee, Anand Raman, Brendan Walsh, Siddhartha Prasad, Dalitso Banda, Lucy Zhang, Mei Gao, Lei Zhang, William T. Freeman

Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with its own restrictive syntax.

Anomaly Detection

Osprey: Pixel Understanding with Visual Instruction Tuning

2 code implementations 15 Dec 2023 Yuqian Yuan, Wentong Li, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu

In this paper, we propose Osprey, a mask-text instruction tuning approach, to extend MLLMs by incorporating fine-grained mask regions into language instruction, aiming at achieving pixel-wise visual understanding.

Language Modelling

Gradient Centralization: A New Optimization Technique for Deep Neural Networks

7 code implementations ECCV 2020 Hongwei Yong, Jianqiang Huang, Xian-Sheng Hua, Lei Zhang

It has been shown that using the first and second order statistics (e.g., mean and variance) to perform Z-score standardization on network activations or weight vectors, such as batch normalization (BN) and weight standardization (WS), can improve the training performance.

Fine-Grained Image Classification General Classification

Tag2Text: Guiding Vision-Language Model via Image Tagging

2 code implementations 10 Mar 2023 Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Lei Zhang

This paper presents Tag2Text, a vision language pre-training (VLP) framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features.

Language Modelling TAG

Recognize Anything: A Strong Image Tagging Model

2 code implementations 6 Jun 2023 Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang

We are releasing the RAM at https://recognize-anything.github.io/ to foster the advancements of large models in computer vision.

Semantic Parsing

Open-Set Image Tagging with Multi-Grained Text Supervision

2 code implementations 23 Oct 2023 Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang

Specifically, for predefined commonly used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet.

Human-Object Interaction Detection Open Set Learning +1

Mutual Consistency Learning for Semi-supervised Medical Image Segmentation

2 code implementations 21 Sep 2021 Yicheng Wu, ZongYuan Ge, Donghao Zhang, Minfeng Xu, Lei Zhang, Yong Xia, Jianfei Cai

In this paper, we propose a novel mutual consistency network (MC-Net+) to effectively exploit the unlabeled data for semi-supervised medical image segmentation.

Image Segmentation Segmentation +2

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

16 code implementations CVPR 2022 Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang

Our method is universal and can be easily plugged into any DETR-like method by adding dozens of lines of code to achieve a remarkable improvement.

Object Detection

Grounded Language-Image Pre-training

2 code implementations CVPR 2022 Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao

The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.

Described Object Detection Few-Shot Object Detection +1

Semantic-SAM: Segment and Recognize Anything at Any Granularity

1 code implementation 10 Jul 2023 Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao

In this paper, we introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.

Image Segmentation Segmentation +1

Visual In-Context Prompting

3 code implementations 22 Nov 2023 Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Segmentation Visual Prompting

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

1 code implementation 21 Mar 2024 Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang

Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning.

Contrastive Learning Descriptive +3

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

7 code implementations ICLR 2022 Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR.

Object Detection

detrex: Benchmarking Detection Transformers

1 code implementation 12 Jun 2023 Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

Are Transformers Effective for Time Series Forecasting?

4 code implementations 26 May 2022 Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu

Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task.

Anomaly Detection Temporal Relation Extraction +2

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

20 code implementations 13 Aug 2016 Kai Zhang, WangMeng Zuo, Yunjin Chen, Deyu Meng, Lei Zhang

Discriminative model learning for image denoising has recently attracted considerable attention due to its favorable denoising performance.

Color Image Denoising Image Deblocking +3

A Simple Framework for Open-Vocabulary Segmentation and Detection

2 code implementations ICCV 2023 Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.

Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)

Instance Segmentation Panoptic Segmentation +2

Unified Vision-Language Pre-Training for Image Captioning and VQA

3 code implementations 24 Sep 2019 Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao

The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models.

Image Captioning Question Answering +2

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

4 code implementations ECCV 2020 Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)

Image Captioning Image Retrieval +3

Accelerating Dataset Distillation via Model Augmentation

2 code implementations CVPR 2023 Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu

Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but efficient synthetic training datasets from large ones.

VinVL: Revisiting Visual Representations in Vision-Language Models

7 code implementations CVPR 2021 Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao

In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model, Oscar (Li et al., 2020), and utilize an improved approach, Oscar+, to pre-train the VL model and fine-tune it on a wide range of downstream VL tasks.

Image Captioning Image-text matching +4

Blind Face Restoration via Deep Multi-scale Component Dictionaries

1 code implementation ECCV 2020 Xiaoming Li, Chaofeng Chen, Shangchen Zhou, Xianhui Lin, WangMeng Zuo, Lei Zhang

Next, with the degraded input, we match and select the most similar component features from their corresponding dictionaries and transfer the high-quality details to the input via the proposed dictionary feature transfer (DFT) block.

Blind Face Restoration Video Super-Resolution

Deep Plug-and-Play Super-Resolution for Arbitrary Blur Kernels

1 code implementation CVPR 2019 Kai Zhang, WangMeng Zuo, Lei Zhang

In this paper, we propose a principled formulation and framework by extending bicubic degradation based deep SISR with the help of plug-and-play framework to handle LR images with arbitrary blur kernels.

Deblurring Image Restoration +1

Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization

1 code implementation 28 Aug 2023 Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, Lei Zhang

Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks.

Image Enhancement Image Generation +3

Learning Image-adaptive 3D Lookup Tables for High Performance Photo Enhancement in Real-time

1 code implementation 30 Sep 2020 Hui Zeng, Jianrui Cai, Lida Li, Zisheng Cao, Lei Zhang

The small CNN works on the down-sampled version of the input image to predict content-dependent weights to fuse the multiple basis 3D LUTs into an image-adaptive one, which is employed to transform the color and tone of source images efficiently.

Ranked #5 on Image Enhancement on MIT-Adobe 5k (SSIM on proRGB metric)

4k Image Enhancement +2

A Strong and Reproducible Object Detector with Only Public Datasets

2 code implementations 25 Apr 2023 Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.

Ranked #5 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

Plug-and-Play Image Restoration with Deep Denoiser Prior

4 code implementations 31 Aug 2020 Kai Zhang, Yawei Li, WangMeng Zuo, Lei Zhang, Luc van Gool, Radu Timofte

Recent works on plug-and-play image restoration have shown that a denoiser can implicitly serve as the image prior for model-based methods to solve many inverse problems.

Deblurring Demosaicking +1

Learning Deep CNN Denoiser Prior for Image Restoration

2 code implementations CVPR 2017 Kai Zhang, WangMeng Zuo, Shuhang Gu, Lei Zhang

Recent works have revealed that, with the aid of variable splitting techniques, denoiser prior can be plugged in as a modular part of model-based optimization methods to solve other inverse problems (e.g., deblurring).

Color Image Denoising Deblurring +2

One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer

1 code implementation CVPR 2023 Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li

It is challenging to perform this task with a single network due to resolution issues, i.e., the face and hands are usually located in extremely small regions.

3D Human Pose Estimation 3D Human Reconstruction +1

Toward Convolutional Blind Denoising of Real Photographs

3 code implementations CVPR 2019 Shi Guo, Zifei Yan, Kai Zhang, WangMeng Zuo, Lei Zhang

While deep convolutional neural networks (CNNs) have achieved impressive success in image denoising with additive white Gaussian noise (AWGN), their performance remains limited on real-world noisy photographs.

Image Denoising Noise Estimation

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

1 code implementation ICCV 2023 Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, WangMeng Zuo

In addition to the unprecedented ability in imaginary creation, large text-to-image models are expected to take customized concepts in image generation.

Text-to-Image Generation

Image-Adaptive YOLO for Object Detection in Adverse Weather Conditions

1 code implementation 15 Dec 2021 Wenyu Liu, Gaofeng Ren, Runsheng Yu, Shi Guo, Jianke Zhu, Lei Zhang

Though deep learning-based object detection methods have achieved promising results on the conventional datasets, it is still challenging to locate objects from the low-quality images captured in adverse weather conditions.

Image Enhancement object-detection +1

Improving Nighttime Driving-Scene Segmentation via Dual Image-adaptive Learnable Filters

2 code implementations 4 Jul 2022 Wenyu Liu, Wentong Li, Jianke Zhu, Miaomiao Cui, Xuansong Xie, Lei Zhang

With DIAL-Filters, we design both unsupervised and supervised frameworks for nighttime driving-scene segmentation, which can be trained in an end-to-end manner.

Autonomous Driving Scene Segmentation +1

FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising

7 code implementations 11 Oct 2017 Kai Zhang, WangMeng Zuo, Lei Zhang

Due to the fast inference and good performance, discriminative learning methods have been widely studied in image denoising.

Color Image Denoising Image Denoising

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

1 code implementation CVPR 2021 Jie Liang, Hui Zeng, Lei Zhang

Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or suffer from long inference time due to their heavy computational burden on the convolution of high-resolution feature maps.

4k Attribute +4

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

3 code implementations ICCV 2021 Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao

This paper presents a new Vision Transformer (ViT) architecture, Multi-Scale Vision Longformer, which significantly enhances the ViT of Dosovitskiy et al. (2020) for encoding high-resolution images using two techniques.

Image Classification Instance Segmentation +2

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

2 code implementations 3 Dec 2022 Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang

In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention.

Box-supervised Instance Segmentation Segmentation

Query2Label: A Simple Transformer Way to Multi-Label Classification

2 code implementations 22 Jul 2021 Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu

The use of Transformer is rooted in the need of extracting local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image.

Classification Multi-Label Classification

Image Scene Graph Generation (SGG) Benchmark

1 code implementation 27 Jul 2021 Xiaotian Han, Jianwei Yang, Houdong Hu, Lei Zhang, Jianfeng Gao, Pengchuan Zhang

There is a surge of interest in image scene graph generation (object, attribute and relationship detection) due to the need of building fine-grained image understanding models that go beyond object detection.

Attribute Graph Generation +6

Progressive Semantic-Aware Style Transformation for Blind Face Restoration

1 code implementation CVPR 2021 Chaofeng Chen, Xiaoming Li, Lingbo Yang, Xianhui Lin, Lei Zhang, Kwan-Yee K. Wong

Compared with previous networks, the proposed PSFR-GAN makes full use of the semantic (parsing maps) and pixel (LQ images) space information from different scales of input pairs.

Blind Face Restoration Face Parsing +2

Improving the Stability of Diffusion Models for Content Consistent Super-Resolution

1 code implementation 30 Dec 2023 Lingchen Sun, Rongyuan Wu, Zhengqiang Zhang, Hongwei Yong, Lei Zhang

To improve the stability of diffusion prior-based SR, we propose to employ the diffusion models to refine image structures, while employing the generative adversarial training to enhance image fine details.

Image Super-Resolution Text-to-Image Generation

MicroNet: Improving Image Recognition with Extremely Low FLOPs

1 code implementation ICCV 2021 Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g., 5M FLOPs on ImageNet classification).

Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection

1 code implementation 18 Jan 2019 Fan Yang, Lei Zhang, Sijia Yu, Danil Prokhorov, Xue Mei, Haibin Ling

To demonstrate the superiority and generality of the proposed method, we evaluate it on five crack datasets and compare it with state-of-the-art crack detection, edge detection, and semantic segmentation methods.

Edge Detection Semantic Segmentation

OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering

1 code implementation CVPR 2023 Zhiyuan Ma, Xiangyu Zhu, GuoJun Qi, Zhen Lei, Lei Zhang

In this paper, we propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars by a generalized controllable tri-plane rendering solution so that each personalized avatar can be constructed from only one portrait as the reference.

Object-driven Text-to-Image Synthesis via Adversarial Training

1 code implementation CVPR 2019 Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, Jianfeng Gao

In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow object-centered text-to-image synthesis for complex scenes.

Image Generation Object

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

1 code implementation 27 Nov 2023 Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, Lei Zhang

First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation.

Image Super-Resolution

Semi-supervised Left Atrium Segmentation with Mutual Consistency Training

3 code implementations 4 Mar 2021 Yicheng Wu, Minfeng Xu, ZongYuan Ge, Jianfei Cai, Lei Zhang

Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions.

Image Segmentation Left Atrium Segmentation +4

Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution

2 code implementations CVPR 2022 Jie Liang, Hui Zeng, Lei Zhang

In this paper, we demonstrate that it is possible to train a GAN-based SISR model which can stably generate perceptually realistic details while inhibiting visual artifacts.

Image Super-Resolution

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation

1 code implementation ICCV 2023 Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, Qiang Xu

While such a plug-and-play approach is appealing, the inevitable and uncertain conflicts between the original images produced from the frozen SD branch and the given condition incur significant challenges for the learnable branch, which essentially conducts image feature editing for condition enforcement.

Denoising Image Generation

PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency

1 code implementation CVPR 2021 Jie Liang, Hui Zeng, Miaomiao Cui, Xuansong Xie, Lei Zhang

HRP requires that more attention should be paid to human regions, while GLC requires that a group of portrait photos should be retouched to a consistent tone.

Photo Retouching

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation 5 Dec 2023 Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

UniPose: Detecting Any Keypoints

1 code implementation 12 Oct 2023 Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang

This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation.

Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)

2D Human Pose Estimation 2D Pose Estimation +4

Real-world Noisy Image Denoising: A New Benchmark

2 code implementations 7 Apr 2018 Jun Xu, Hui Li, Zhetong Liang, David Zhang, Lei Zhang

In order to promote the study on this problem while implementing the concurrent real-world image denoising datasets, we construct a new benchmark dataset which contains comprehensive real-world noisy images of different natural scenes.

Image Denoising

Unsupervised Pre-training for Person Re-identification

1 code implementation CVPR 2021 Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen

In this paper, we present a large scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.

Ranked #1 on Person Re-Identification on Market-1501 (using extra training data)

Data Augmentation Person Re-Identification +1

Large-Scale Pre-training for Person Re-identification with Noisy Labels

2 code implementations CVPR 2022 Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen

Since these ID labels automatically derived from tracklets inevitably contain noise, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.

Contrastive Learning Multi-Object Tracking +3

Variational Denoising Network: Toward Blind Noise Modeling and Removal

2 code implementations NeurIPS 2019 Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng

On the one hand, like other data-driven deep learning methods, our method, namely variational denoising network (VDN), can perform denoising efficiently due to its explicit form of posterior expression.

Image Denoising Noise Estimation +1

Efficient Long-Range Attention Network for Image Super-resolution

1 code implementation 13 Mar 2022 Xindong Zhang, Hui Zeng, Shi Guo, Lei Zhang

A highly efficient long-range attention block (ELAB) is then built by simply cascading two shift-conv with a GMSA module, which is further accelerated by using a shared attention mechanism.

Image Super-Resolution

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds

1 code implementation CVPR 2022 Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang

VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel by two cross-attentions and models features in a hidden space induced by a group of latent codes.

3D Object Detection object-detection

A PID Controller Approach for Stochastic Optimization of Deep Networks

3 code implementations CVPR 2018 Wangpeng An, Haoqian Wang, Qingyun Sun, Jun Xu, Qionghai Dai, Lei Zhang

We first reveal the intrinsic connections between SGD-Momentum and PID based controller, then present the optimization algorithm which exploits the past, current, and change of gradients to update the network parameters.

Stochastic Optimization

Box-supervised Instance Segmentation with Level Set Evolution

1 code implementation 19 Jul 2022 Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Xiansheng Hua, Lei Zhang

A simple mask supervised SOLOv2 model is adapted to predict the instance-aware mask map as the level set for each instance.

Box-supervised Instance Segmentation Segmentation

Detection Transformer with Stable Matching

1 code implementation ICCV 2023 Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang

We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.

Position

Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation

2 code implementations ECCV 2020 Zongsheng Yue, Qian Zhao, Lei Zhang, Deyu Meng

Specifically, we approximate the joint distribution with two different factorized forms, which can be formulated as a denoiser mapping the noisy image to the clean one and a generator mapping the clean image to the noisy one.

Image Denoising Noise Estimation

Towards Diverse Binary Segmentation via A Simple yet General Gated Network

1 code implementation 18 Mar 2023 Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Lei Zhang

They ignore two key problems when the encoder exchanges information with the decoder: one is the lack of an interference control mechanism between them; the other is the failure to consider the disparity in the contributions from different encoder levels.

Segmentation Semantic Segmentation

Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization

1 code implementation CVPR 2022 Yabin Zhang, Minghan Li, Ruihuang Li, Kui Jia, Lei Zhang

In this work, to the best of our knowledge for the first time, we propose to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which could be implemented by applying the Exact Histogram Matching (EHM) in the image feature space.

Domain Generalization Style Transfer
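
The EFDM entry above notes that exact eCDF matching could be implemented with Exact Histogram Matching. The sketch below illustrates only the general sort-based idea on two equal-length 1-D feature vectors. The function name `exact_histogram_match` is hypothetical, ties are broken by index order as a simplifying assumption, and this is not the authors' implementation (it does not handle batching, channels, or gradients).

```python
def exact_histogram_match(content, style):
    """Return `content` with its values replaced by those of `style`,
    rank for rank, so the output has exactly the style's empirical CDF
    while keeping the rank order of the content features.

    Assumption: both inputs are 1-D sequences of equal length.
    """
    if len(content) != len(style):
        raise ValueError("content and style must have the same length")

    # Indices of content values in ascending order (stable, so ties keep index order).
    content_order = sorted(range(len(content)), key=lambda i: content[i])
    # Style values in ascending order.
    style_sorted = sorted(style)

    matched = [0.0] * len(content)
    for rank, idx in enumerate(content_order):
        # The content element with this rank receives the style value of the same rank.
        matched[idx] = style_sorted[rank]
    return matched


if __name__ == "__main__":
    # Content ranks are (largest, smallest, middle), so the style values
    # 10, 20, 30 are placed as [30, 10, 20].
    print(exact_histogram_match([3, 1, 2], [10, 30, 20]))  # prints [30, 10, 20]
```

Because the output is a permutation of the style values, its sorted values (and hence its empirical CDF) coincide with the style's exactly, rather than matching only the mean and variance.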

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking

1 code implementation CVPR 2018 Feng Li, Cheng Tian, WangMeng Zuo, Lei Zhang, Ming-Hsuan Yang

Compared with SRDCF, STRCF with hand-crafted features provides a 5 times speedup and achieves a gain of 5.4% and 3.6% AUC score on OTB-2015 and Temple-Color, respectively.

Visual Object Tracking Visual Tracking

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

3 code implementations 3 Feb 2023 Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang

This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information.

2D Human Pose Estimation Human Detection +3

A Dual Weighting Label Assignment Scheme for Object Detection

1 code implementation CVPR 2022 Shuai Li, Chenhang He, Ruihuang Li, Lei Zhang

Existing LA methods mostly focus on the design of pos weighting function, while the neg weight is directly derived from the pos weight.

Object object-detection +2

Text Prior Guided Scene Text Image Super-resolution

1 code implementation 29 Jun 2021 Jianqi Ma, Shi Guo, Lei Zhang

Our experiments on the benchmark TextZoom dataset show that TPGSR can not only effectively improve the visual quality of scene text images, but also significantly improve the text recognition accuracy over existing STISR methods.

Image Super-Resolution

Deep Convolutional Dictionary Learning for Image Denoising

1 code implementation CVPR 2021 Hongyi Zheng, Hongwei Yong, Lei Zhang

Inspired by the great success of deep neural networks (DNNs), many unfolding methods have been proposed to integrate traditional image modeling techniques, such as dictionary learning (DicL) and sparse coding, into DNNs for image restoration.

Dictionary Learning Image Denoising +2

Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution

1 code implementation 27 Mar 2022 Jie Liang, Hui Zeng, Lei Zhang

Specifically, a tiny regression network is employed to predict the degradation parameters of the input image, while several convolutional experts with the same topology are jointly optimized to specify the network parameters via a non-linear mixture of experts.

Image Super-Resolution

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

1 code implementation 28 Feb 2024 Minghan Li, Shuai Li, Xindong Zhang, Lei Zhang

Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.

Ranked #2 on Video Semantic Segmentation on VSPW (using extra training data)

Referring Expression Segmentation Referring Video Object Segmentation +6

A Probabilistic Quality Representation Approach to Deep Blind Image Quality Prediction

1 code implementation 28 Aug 2017 Hui Zeng, Lei Zhang, Alan C. Bovik

Recognizing this, we propose a new representation of perceptual image quality, called probabilistic quality representation (PQR), to describe the image subjective score distribution, whereby a more robust loss function can be employed to train a deep BIQA model.

Blind Image Quality Assessment regression

Learning Dual Memory Dictionaries for Blind Face Restoration

1 code implementation 15 Oct 2022 Xiaoming Li, Shiguang Zhang, Shangchen Zhou, Lei Zhang, WangMeng Zuo

Generally, it is a challenging and intractable task to improve the photo-realistic performance of blind restoration and adaptively handle the generic and specific restoration scenarios with a single unified model.

Blind Face Restoration

Grid Anchor based Image Cropping: A New Benchmark and An Efficient Model

1 code implementation 18 Sep 2019 Hui Zeng, Lida Li, Zisheng Cao, Lei Zhang

The employed evaluation metrics such as intersection-over-union cannot reliably reflect the real performance of a cropping model, either.

Image Cropping

Reliable and Efficient Image Cropping: A Grid Anchor based Approach

1 code implementation CVPR 2019 Hui Zeng, Lida Li, Zisheng Cao, Lei Zhang

Consequently, a grid anchor based cropping benchmark is constructed, where all crops of each image are annotated and more reliable evaluation metrics are defined.

Image Cropping

Real-World Video Super-Resolution: A Benchmark Dataset and a Decomposition Based Learning Scheme

1 code implementation ICCV 2021 Xi Yang, Wangmeng Xiang, Hui Zeng, Lei Zhang

Existing VSR methods are mostly trained and evaluated on synthetic datasets, where the LR videos are uniformly downsampled from their high-resolution (HR) counterparts by some simple operators (e.g., bicubic downsampling).

Video Super-Resolution

Human Guided Ground-truth Generation for Realistic Image Super-resolution

1 code implementation CVPR 2023 Du Chen, Jie Liang, Xindong Zhang, Ming Liu, Hui Zeng, Lei Zhang

A human guided GT image dataset with both positive and negative samples is then constructed, and a loss function is proposed to train the Real-ISR models.

Image Enhancement Image Super-Resolution

Multi-channel Reverse Dictionary Model

1 code implementation 18 Dec 2019 Lei Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun

A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description.

Reverse Dictionary Sentence

From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution

1 code implementation 3 Oct 2022 Xiaoming Li, Chaofeng Chen, Xianhui Lin, WangMeng Zuo, Lei Zhang

Notably, LQ face images, which may have the same degradation process as natural images, can be robustly restored with photo-realistic textures by exploiting their strong structural priors.

Image Generation Image Super-Resolution

Deep Adaptive Inference Networks for Single Image Super-Resolution

1 code implementation 8 Apr 2020 Ming Liu, Zhilu Zhang, Liya Hou, WangMeng Zuo, Lei Zhang

Nonetheless, content and resource adaptive model is more preferred, and it is encouraging to apply simpler and efficient networks to the easier regions with less details and the scenarios with restricted efficiency constraints.

Image Super-Resolution

Dense Learning based Semi-Supervised Object Detection

1 code implementation CVPR 2022 Binghui Chen, Pengyu Li, Xiang Chen, Biao Wang, Lei Zhang, Xian-Sheng Hua

Semi-supervised object detection (SSOD) aims to facilitate the training and deployment of object detectors with the help of a large amount of unlabeled data.

Object object-detection +2

CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise

3 code implementations CVPR 2018 Kuang-Huei Lee, Xiaodong He, Lei Zhang, Linjun Yang

We demonstrate the effectiveness of the proposed algorithm on both of the label noise detection task and the image classification on noisy data task on several large-scale datasets.

Ranked #2 on Image Classification on Food-101N (using extra training data)

Classification General Classification +2

Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel

1 code implementation CVPR 2022 Zongsheng Yue, Qian Zhao, Jianwen Xie, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong

To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel.

Image Super-Resolution

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

1 code implementation NeurIPS 2021 Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

For example, our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture), improves 0.28% top-1 accuracy, and meanwhile enjoys 49.32% FLOPs and 4.40% running time savings.

Efficient ViTs

SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

1 code implementation ICCV 2021 Jiapeng Tang, Jiabao Lei, Dan Xu, Feiying Ma, Kui Jia, Lei Zhang

To this end, we propose to learn implicit surface reconstruction by sign-agnostic optimization of convolutional occupancy networks, to simultaneously achieve advanced scalability to large-scale scenes, generality to novel shapes, and applicability to raw scans in a unified framework.

Surface Reconstruction

Unsupervised Multi-Class Domain Adaptation: Theory, Algorithms, and Practice

2 code implementations 20 Feb 2020 Yabin Zhang, Bin Deng, Hui Tang, Lei Zhang, Kui Jia

By using MCSD as a measure of domain distance, we develop a new domain adaptation bound for multi-class UDA; its data-dependent, probably approximately correct bound is also developed that naturally suggests adversarial learning objectives to align conditional feature distributions across source and target domains.

Domain Adaptation Multi-class Classification

Generative Action Description Prompts for Skeleton-based Action Recognition

3 code implementations ICCV 2023 Wangmeng Xiang, Chao Li, Yuxuan Zhou, Biao Wang, Lei Zhang

More specifically, we employ a pre-trained large-scale language model as the knowledge engine to automatically generate text descriptions for body parts movements of actions, and propose a multi-modal training scheme by utilizing the text encoder to generate feature vectors for different body parts and supervise the skeleton encoder for action representation learning.

Action Recognition Language Modelling +2

AutoLoc: Weakly-supervised Temporal Action Localization

1 code implementation 22 Jul 2018 Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang

In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.

Weakly-supervised Temporal Action Localization Weakly Supervised Temporal Action Localization

A Benchmark for Edge-Preserving Image Smoothing

1 code implementation 2 Apr 2019 Feida Zhu, Zhetong Liang, Xixi Jia, Lei Zhang, Yizhou Yu

This benchmark includes an image dataset with groundtruth image smoothing results as well as baseline algorithms that can generate competitive edge-preserving smoothing results for a wide range of image contents.

image smoothing

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

1 code implementation CVPR 2021 Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo

Due to this aligned representation learning, even pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4%, compared with a non-TAP baseline.

Caption Generation Language Modelling +5

A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection

1 code implementation ECCV 2020 Xiaoqi Zhao, Lihe Zhang, Youwei Pang, Huchuan Lu, Lei Zhang

In this work, we design a single stream network to directly use the depth map to guide early fusion and middle fusion between RGB and depth, which saves the feature encoder of the depth stream and achieves a lightweight and real-time model.

object-detection RGB-D Salient Object Detection +3

Learning Domain Adaptive Object Detection with Probabilistic Teacher

2 code implementations 13 Jun 2022 Meilin Chen, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, ShiLiang Pu

In addition, we conduct anchor adaptation in parallel with localization adaptation, since anchor can be regarded as a learnable parameter.

Object object-detection +1

Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport

1 code implementation ICCV 2023 Wentong Li, Yuqian Yuan, Song Wang, Jianke Zhu, Jianshu Li, Jian Liu, Lei Zhang

Weakly-supervised image segmentation has recently attracted increasing research attentions, aiming to avoid the expensive pixel-wise labeling.

Image Segmentation Panoptic Segmentation

MomentDiff: Generative Video Moment Retrieval from Random to Real

1 code implementation NeurIPS 2023 Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang

Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description.

Moment Retrieval Retrieval

Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks

2 code implementations 20 Dec 2017 Tianshui Chen, Liang Lin, WangMeng Zuo, Xiaonan Luo, Lei Zhang

In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural networks for joint training.

Classification General Classification +1

Neural Interactive Keypoint Detection

1 code implementation ICCV 2023 Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang

Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.

Keypoint Detection

A Benchmark for Chinese-English Scene Text Image Super-resolution

1 code implementation ICCV 2023 Jianqi Ma, Zhetong Liang, Wangmeng Xiang, Xi Yang, Lei Zhang

Scene Text Image Super-resolution (STISR) aims to recover high-resolution (HR) scene text images with visually pleasant and readable text content from the given low-resolution (LR) input.

Image Super-Resolution

Towards Effective Multiple-in-One Image Restoration: A Sequential and Prompt Learning Strategy

1 code implementation 7 Jan 2024 Xiangtao Kong, Chao Dong, Lei Zhang

While single task image restoration (IR) has achieved significant successes, it remains a challenging issue to train a single model which can tackle multiple IR tasks.

Image Restoration

Sharpness-Aware Gradient Matching for Domain Generalization

1 code implementation CVPR 2023 Pengfei Wang, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

In this paper, we present two conditions to ensure that the model could converge to a flat minimum with a small loss, and present an algorithm, named Sharpness-Aware Gradient Matching (SAGM), to meet the two conditions for improving model generalization capability.

Domain Generalization

MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks

1 code implementation 28 Apr 2023 Lei Zhang, Yuge Zhang, Kan Ren, Dongsheng Li, Yuqing Yang

In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches.

AutoML

When Unsupervised Domain Adaptation Meets Tensor Representations

1 code implementation ICCV 2017 Hao Lu, Lei Zhang, Zhiguo Cao, Wei Wei, Ke Xian, Chunhua Shen, Anton Van Den Hengel

Domain adaption (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another.

Unsupervised Domain Adaptation

Label-efficient Segmentation via Affinity Propagation

1 code implementation NeurIPS 2023 Wentong Li, Yuqian Yuan, Song Wang, Wenyu Liu, Dongqi Tang, Jian Liu, Jianke Zhu, Lei Zhang

In this work, we formulate the affinity modeling as an affinity propagation process, and propose a local and a global pairwise affinity terms to generate accurate soft pseudo labels.

Box-supervised Instance Segmentation Segmentation +2

MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences

1 code implementation CVPR 2023 Chenhang He, Ruihuang Li, Yabin Zhang, Shuai Li, Lei Zhang

Current top-performing multi-frame detectors mostly follow a Detect-and-Fuse framework, which extracts features from each frame of the sequence and fuses them to detect the objects in the current frame.

3D Object Detection Autonomous Driving +1

Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution

1 code implementation 1 Dec 2023 Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang

To ensure the content consistency among adjacent frames, we exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss, ensuring that the generated HR video maintains a coherent and continuous visual flow.

Image Restoration Video Super-Resolution

PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation

1 code implementation 23 Jan 2024 Zhaozhi Xie, Bochen Guan, Weihao Jiang, Muyang Yi, Yue Ding, Hongtao Lu, Lei Zhang

In this paper, we introduce a novel prompt-driven adapter into SAM, namely Prompt Adapter Segment Anything Model (PA-SAM), aiming to enhance the segmentation mask quality of the original SAM.

Image Segmentation Segmentation +1

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

1 code implementation 28 Nov 2022 Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang

As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.

object-detection Object Detection +4

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

1 code implementation 27 Jul 2022 Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Xian-Sheng Hua, Lei Zhang

For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers on video data will bring heavy computation and memory burdens due to the largely increased number of patches and the quadratic complexity of self-attention computation.

Action Classification Action Recognition

Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset

1 code implementation CVPR 2023 Shuaizheng Liu, Xindong Zhang, Lingchen Sun, Zhetong Liang, Hui Zeng, Lei Zhang

In this work, we develop, for the first time to our best knowledge, an HDR image dataset by using mobile phone cameras, namely Mobile-HDR dataset.

Denoising

Neural Architecture Search With Representation Mutual Information

1 code implementation CVPR 2022 Xiawu Zheng, Xiang Fei, Lei Zhang, Chenglin Wu, Fei Chao, Jianzhuang Liu, Wei Zeng, Yonghong Tian, Rongrong Ji

Building upon RMI, we further propose a new search algorithm termed RMI-NAS, facilitating with a theorem to guarantee the global optimal of the searched architecture.

Neural Architecture Search

WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection

1 code implementation 11 Sep 2019 Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang

We study on weakly-supervised object detection (WSOD) which plays a vital role in relieving human involvement from object-level annotations.

Object object-detection +3

Conditional Directed Graph Convolution for 3D Human Pose Estimation

1 code implementation 16 Jul 2021 WenBo Hu, Changgong Zhang, Fangneng Zhan, Lei Zhang, Tien-Tsin Wong

Based on this representation, we further propose a spatial-temporal conditional directed graph convolution to leverage varying non-local dependence for different poses by conditioning the graph topology on input poses.

3D Human Pose Estimation

One Shot Learning as Instruction Data Prospector for Large Language Models

1 code implementation 16 Dec 2023 Yunshui Li, Binyuan Hui, Xiaobo Xia, Jiaxi Yang, Min Yang, Lei Zhang, Shuzheng Si, Junhao Liu, Tongliang Liu, Fei Huang, Yongbin Li

Nuggets assesses the potential of individual instruction examples to act as effective one shot examples, thereby identifying those that can significantly enhance diverse task performance.

One-Shot Learning

Learning Symmetry Consistent Deep CNNs for Face Completion

1 code implementation 19 Dec 2018 Xiaoming Li, Ming Liu, Jieru Zhu, WangMeng Zuo, Meng Wang, Guosheng Hu, Lei Zhang

As for missing pixels on both of half-faces, we present a generative reconstruction subnet together with a perceptual symmetry loss to enforce symmetry consistency of recovered structures.

Face Recognition Facial Inpainting

One-to-Few Label Assignment for End-to-End Dense Detection

1 code implementation CVPR 2023 Shuai Li, Minghan Li, Ruihuang Li, Chenhang He, Lei Zhang

The positive and negative weights of these soft anchors are dynamically adjusted during training so that they can contribute more to "representation learning" in the early training stage, and contribute more to "duplicated prediction removal" in the later stage.

Representation Learning

DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

1 code implementation CVPR 2023 Yihao Chen, Xianbiao Qi, Jianan Wang, Lei Zhang

In this way, we can reduce the GPU memory consumption of contrastive loss computation from $\mathcal{O}(B^2)$ to $\mathcal{O}(\frac{B^2}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training.

Contrastive Learning
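
To give a feel for the scaling quoted in the entry above, the following back-of-the-envelope sketch counts the entries of a B x B similarity matrix and of a (B / N) x B slice per GPU. It is only an illustration under stated assumptions: the helper name `similarity_entries` is hypothetical, the batch size is assumed divisible by the number of GPUs, values are assumed to be 4-byte floats, and all other memory (activations, gradients, constants hidden by the big-O notation) is ignored.

```python
def similarity_entries(batch_size, num_gpus=1):
    """Entries of the similarity matrix held by one GPU.

    Assumption for this sketch: each GPU holds a (B / N) x B slice,
    so batch_size must be divisible by num_gpus.
    """
    if num_gpus < 1 or batch_size % num_gpus != 0:
        raise ValueError("batch_size must be divisible by num_gpus")
    return (batch_size // num_gpus) * batch_size


BYTES_PER_FLOAT32 = 4  # assumed storage precision

full = similarity_entries(32768)         # whole B x B matrix on one GPU
sliced = similarity_entries(32768, 64)   # per-GPU slice with N = 64

print(full * BYTES_PER_FLOAT32 / 2**30, "GiB for the full matrix")
print(sliced * BYTES_PER_FLOAT32 / 2**20, "MiB per GPU slice")
```

The example values B = 32768 and N = 64 are arbitrary choices for the illustration and are not taken from the paper.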

Attention Diversification for Domain Generalization

1 code implementation 9 Oct 2022 Rang Meng, Xianfeng Li, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, ShiLiang Pu

Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization are collaborated to reassign appropriate attention to diverse task-related features.

Domain Generalization

Simultaneous Fidelity and Regularization Learning for Image Restoration

1 code implementation 12 Apr 2018 Dongwei Ren, WangMeng Zuo, David Zhang, Lei Zhang, Ming-Hsuan Yang

For blind deconvolution, as estimation error of blur kernel is usually introduced, the subsequent non-blind deconvolution process does not restore the latent image well.

Denoising Image Deconvolution +1

Joint Denoising and Demosaicking with Green Channel Prior for Real-world Burst Images

1 code implementation 25 Jan 2021 Shi Guo, Zhetong Liang, Lei Zhang

Considering the fact that the green channel has twice the sampling rate and better quality than the red and blue channels in CFA raw data, we propose to use this green channel prior (GCP) to build a GCP-Net for the JDD-B task.

Demosaicking Denoising +1
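
The remark that green is sampled twice as often as red or blue can be checked with a small sketch. It assumes the common RGGB Bayer layout (the entry above only says "CFA", so the specific layout is an assumption), and the helper name `bayer_channel_counts` is hypothetical.

```python
import numpy as np


def bayer_channel_counts(height, width):
    """Count R, G and B sample sites in an RGGB Bayer mosaic.

    Assumption for this sketch: height and width are even, and the
    2 x 2 tile is [[R, G], [G, B]].
    """
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    tile = np.array([["R", "G"], ["G", "B"]])
    mosaic = np.tile(tile, (height // 2, width // 2))
    return {channel: int((mosaic == channel).sum()) for channel in "RGB"}


counts = bayer_channel_counts(4, 6)
print(counts)
```

For any even-sized mosaic the green count is exactly twice the red count and twice the blue count, since each 2 x 2 tile contains two green sites.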

Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer

1 code implementation ECCV 2020 Yuanyi Zhong, Jian-Feng Wang, Jian Peng, Lei Zhang

In this paper, we propose an effective knowledge transfer framework to boost the weakly supervised object detection accuracy with the help of an external fully-annotated source dataset, whose categories may not overlap with the target domain.

Object object-detection +2

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model

1 code implementation 20 May 2023 Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang

Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.

Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)

Human-Object Interaction Detection Zero-Shot Human-Object Interaction Detection

Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

1 code implementation CVPR 2021 Minghan Li, Shuai Li, Lida Li, Lei Zhang

To further explore temporal correlation among video frames, we aggregate a temporal fusion module to infer instance masks from each frame to its adjacent frames, which helps our framework to handle challenging videos such as motion blur, partial occlusion and unusual object-to-camera poses.

Instance Segmentation Segmentation +3

ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

1 code implementation 1 Jan 2024 Chenhang He, Ruihuang Li, Guowen Zhang, Lei Zhang

Window-based transformers have demonstrated strong ability in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner.

Blocking

TMP: Temporal Motion Propagation for Online Video Super-Resolution

1 code implementation 15 Dec 2023 Zhengqiang Zhang, Ruihuang Li, Shi Guo, Yang Cao, Lei Zhang

Online video super-resolution (online-VSR) highly relies on an effective alignment module to aggregate temporal information, while the strict latency requirement makes accurate and efficient alignment very challenging.

Video Super-Resolution

PMFSNet: Polarized Multi-scale Feature Self-attention Network For Lightweight Medical Image Segmentation

1 code implementation 15 Jan 2024 Jiahui Zhong, Wenhong Tian, Yuanlun Xie, Zhijia Liu, Jie Ou, Taoran Tian, Lei Zhang

In this work, we propose PMFSNet, a novel medical imaging segmentation model that effectively balances global and local feature processing while avoiding the computational redundancy typical in larger models.

Image Segmentation Medical Image Segmentation +2

REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning

1 code implementation IJCNLP 2019 Ming Jiang, Junjie Hu, Qiuyuan Huang, Lei Zhang, Jana Diesner, Jianfeng Gao

In this study, we present a fine-grained evaluation method REO for automatically measuring the performance of image captioning systems.

Image Captioning

Unsupervised Adaptation Learning for Hyperspectral Imagery Super-Resolution

1 code implementation CVPR 2020 Lei Zhang, Jiangtao Nie, Wei Wei, Yanning Zhang, Shengcai Liao, Ling Shao

Following this idea, we develop a two-stage SR network that leverages two consecutive modules: a fusion module and an adaptation module, to recover the latent HSI in a coarse-to-fine scheme.

Super-Resolution

CORE: Cooperative Reconstruction for Multi-Agent Perception

1 code implementation ICCV 2023 Binglu Wang, Lei Zhang, Zhaozhong Wang, Yongqiang Zhao, Tianfei Zhou

This paper presents CORE, a conceptually simple, effective and communication-efficient model for multi-agent cooperative perception.

3D Object Detection object-detection +1

LipsFormer: Introducing Lipschitz Continuity to Vision Transformers

1 code implementation 19 Apr 2023 Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang

In contrast to previous practical tricks that address training instability by learning rate warmup, layer normalization, attention formulation, and weight initialization, we show that Lipschitz continuity is a more essential property to ensure training stability.

SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images

1 code implementation 25 Apr 2022 Zhishe Wang, Yanlin Chen, Wenyu Shao, Hui Li, Lei Zhang

The existing deep learning fusion methods mainly concentrate on the convolutional neural networks, and few attempts are made with transformer.

Computational Efficiency

Hashing-based Non-Maximum Suppression for Crowded Object Detection

1 code implementation 22 May 2020 Jianfeng Wang, Xi Yin, Lijuan Wang, Lei Zhang

Considering the intersection-over-union (IoU) as the metric, we propose a simple yet effective hashing algorithm, named IoUHash, which guarantees that the boxes within the same cell are close enough by a lower IoU bound.

object-detection Object Detection +1
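
For readers unfamiliar with the metric named in the entry above, here is a minimal sketch of intersection-over-union for two axis-aligned boxes. It shows only the standard IoU computation, not the IoUHash algorithm itself; the `(x1, y1, x2, y2)` box format and the function name `iou` are assumptions made for the illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Assumption for this sketch: boxes are (x1, y1, x2, y2) with
    x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Two 2 x 2 boxes offset by one unit in each direction overlap in a 1 x 1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.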

A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment

1 code implementation 16 Mar 2024 Tianhe Wu, Kede Ma, Jie Liang, Yujiu Yang, Lei Zhang

While Multimodal Large Language Models (MLLMs) have experienced significant advancement on visual understanding and reasoning, their potentials to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored.

Image Quality Assessment

Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation

1 code implementation ICCV 2023 Yichen Yuan, Yifan Wang, Lijun Wang, Xiaoqi Zhao, Huchuan Lu, Yu Wang, Weibo Su, Lei Zhang

Recent leading zero-shot video object segmentation (ZVOS) works devote to integrating appearance and motion information by elaborately designing feature fusion modules and identically applying them in multiple feature stages.

Semantic Segmentation Video Object Segmentation +2

Masked Surfel Prediction for Self-Supervised Point Cloud Learning

1 code implementation 7 Jul 2022 Yabin Zhang, Jiehong Lin, Chenhang He, Yongwei Chen, Kui Jia, Lei Zhang

In this work, we make the first attempt, to the best of our knowledge, to consider the local geometry information explicitly into the masked auto-encoding, and propose a novel Masked Surfel Prediction (MaskSurf) method.

Point cloud reconstruction Self-Supervised Learning

Spatial-Temporal Federated Learning for Lifelong Person Re-identification on Distributed Edges

1 code implementation 24 Jul 2022 Lei Zhang, Guanyu Gao, Huaizheng Zhang

Then, the learnt knowledge from edge clients will be aggregated by centralized parameter server, where the knowledge will be selectively and attentively distilled from spatial- and temporal-dimension with carefully designed mechanisms.

Continual Learning Federated Learning +2

Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains

1 code implementation CVPR 2023 Mingjun Xu, Lingyun Qin, WeiJie Chen, ShiLiang Pu, Lei Zhang

In this work, we present an idea to remove non-causal factors from common features by multi-view adversarial training on source domains, because we observe that such insignificant non-causal factors may still be significant in other latent spaces (views) due to the multi-mode structure of data.

Domain Generalization object-detection +1

Directional Deep Embedding and Appearance Learning for Fast Video Object Segmentation

1 code implementation 17 Feb 2020 Yingjie Yin, De Xu, Xingang Wang, Lei Zhang

We propose a directional deep embedding and appearance learning (DDEAL) method, which is free of the online fine-tuning process, for fast VOS.

One-shot visual object segmentation Segmentation +2

Unfolded Deep Kernel Estimation for Blind Image Super-resolution

1 code implementation 10 Mar 2022 Hongyi Zheng, Hongwei Yong, Lei Zhang

Nonetheless, the existing deep unfolding methods cannot explicitly solve the data term of the unfolding objective function, limiting their capability in blur kernel estimation.

Image Super-Resolution

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

1 code implementation 3 Apr 2018 Dianqi Li, Qiuyuan Huang, Xiaodong He, Lei Zhang, Ming-Ting Sun

By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human languages, and generates more discriminative captions.

Generative Adversarial Network

Virtual Fully-Connected Layer: Training a Large-Scale Face Recognition Dataset With Limited Computational Resources

1 code implementation CVPR 2021 Pengyu Li, Biao Wang, Lei Zhang

This is because the classification paradigm needs to train a fully connected layer as the category classifier, and its parameters will be in the hundreds of millions if the training dataset contains millions of identities.

Face Recognition Metric Learning

Beyond a Video Frame Interpolator: A Space Decoupled Learning Approach to Continuous Image Transition

1 code implementation 18 Mar 2022 Tao Yang, Peiran Ren, Xuansong Xie, Xiansheng Hua, Lei Zhang

Most of the existing deep learning based VFI methods adopt off-the-shelf optical flow algorithms to estimate the bidirectional flows and interpolate the missing frames accordingly.

Image Generation Image Morphing +3

Revisiting Prototypical Network for Cross Domain Few-Shot Learning

1 code implementation CVPR 2023 Fei Zhou, Peng Wang, Lei Zhang, Wei Wei, Yanning Zhang

Prototypical Network is a popular few-shot solver that aims at establishing a feature metric generalizable to novel few-shot classification (FSC) tasks using deep neural networks.

cross-domain few-shot learning Knowledge Distillation

Probability Weighted Compact Feature for Domain Adaptive Retrieval

1 code implementation CVPR 2020 Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou

Most of the existing image retrieval methods only focus on single-domain retrieval, which assumes that the distributions of retrieval databases and queries are similar.

Image Retrieval Quantization +1

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting

1 code implementation ICCV 2021 Binghui Chen, Zhaoyi Yan, Ke Li, Pengyu Li, Biao Wang, WangMeng Zuo, Lei Zhang

In crowd counting, because labelling is laborious, collecting a new large-scale dataset with plentiful images and large diversity in density, scene, etc. is perceived as intractable.

Crowd Counting

Benchmark Dataset and Effective Inter-Frame Alignment for Real-World Video Super-Resolution

1 code implementation 10 Dec 2022 Ruohao Wang, Xiaohui Liu, Zhilu Zhang, Xiaohe Wu, Chun-Mei Feng, Lei Zhang, WangMeng Zuo

On the other hand, alignment algorithms in existing VSR methods perform poorly for real-world videos, leading to unsatisfactory results.

Optical Flow Estimation Video Super-Resolution

IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting

1 code implementation 18 Mar 2024 Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang

Our research addresses the shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC).

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

1 code implementation 26 Mar 2024 Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang

In this paper, we introduce a versatile adaptation approach that can effectively work under all three settings.

DR-Unet104 for Multimodal MRI brain tumor segmentation

1 code implementation 4 Nov 2020 Jordan Colman, Lei Zhang, Wenting Duan, Xujiong Ye

We verified the effect of introducing the regularisation of dropout with small rate (e.g. 0.2) on the architecture, and found a dropout of 0.2 improved the overall performance compared to no dropout, or a dropout of 0.5.

3D Architecture Brain Tumor Segmentation +3

Multi-adversarial Faster-RCNN for Unrestricted Object Detection

1 code implementation ICCV 2019 Zhenwei He, Lei Zhang

Conventional object detection methods essentially suppose that the training and testing data are collected from a restricted target domain with expensive labeling cost.

Domain Adaptation Object +2

Semi-Supervised Domain Generalization with Evolving Intermediate Domain

1 code implementation 19 Nov 2021 Luojun Lin, Han Xie, Zhishu Sun, WeiJie Chen, Wenxi Liu, Yuanlong Yu, Lei Zhang

From this perspective, we introduce a novel paradigm of DG, termed as Semi-Supervised Domain Generalization (SSDG), to explore how the labeled and unlabeled source domains can interact, and establish two settings, including the close-set and open-set SSDG.

Domain Generalization Semi-Supervised Domain Generalization

A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration

1 code implementation 21 Jul 2022 Ming Liu, Yuxiang Wei, Xiaohe Wu, WangMeng Zuo, Lei Zhang

Generative adversarial networks (GANs) have drawn enormous attention due to the simple yet effective training mechanism and superior image generation quality.

Image Generation Image Restoration

Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis

1 code implementation CVPR 2023 Yuxiang Wei, Zhilong Ji, Xiaohe Wu, Jinfeng Bai, Lei Zhang, WangMeng Zuo

Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from the input semantic map.

Image Generation Object

Parameter Exchange for Robust Dynamic Domain Generalization

1 code implementation 23 Nov 2023 Luojun Lin, Zhifeng Shen, Zhishu Sun, Yuanlong Yu, Lei Zhang, WeiJie Chen

The parameters of dynamic networks can be decoupled into a static and a dynamic component, which are designed to learn domain-invariant and domain-specific features, respectively.

Disentanglement Domain Generalization

Self-Supervised Video Desmoking for Laparoscopic Surgery

1 code implementation 17 Mar 2024 Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, WangMeng Zuo

On the other hand, in order to enhance the desmoking performance, we further feed the valuable information from PS frame into models, where a masking strategy and a regularization term are presented to avoid trivial solutions.

Towards Efficient Data Free Black-Box Adversarial Attack

1 code implementation CVPR 2022 Jie Zhang, Bo Li, Jianghe Xu, Shuang Wu, Shouhong Ding, Lei Zhang, Chao Wu

The proposed method can efficiently imitate the target model through a small number of queries and achieve high attack success rate.

Adversarial Attack

Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle

1 code implementation ICCV 2023 Song Guo, Lei Zhang, Xiawu Zheng, Yan Wang, Yuchao Li, Fei Chao, Chenglin Wu, Shengchuan Zhang, Rongrong Ji

In this paper, we try to solve this problem by introducing a principled and unified framework based on Information Bottleneck (IB) theory, which further guides us to an automatic pruning approach.

Network Pruning

Remove Cosine Window from Correlation Filter-based Visual Trackers: When and How

1 code implementation 16 May 2019 Feng Li, Xiaohe Wu, WangMeng Zuo, David Zhang, Lei Zhang

Therefore, in this paper we investigate the feasibility of removing the cosine window from CF trackers with spatial regularization.

MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos

1 code implementation CVPR 2023 Minghan Li, Shuai Li, Wangmeng Xiang, Lei Zhang

The proposed MDQE is the first VIS method with per-clip input that achieves state-of-the-art results on challenging videos and competitive performance on simple videos.

Instance Segmentation Semantic Segmentation +1

Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition

1 code implementation 28 Oct 2023 Shuoyuan Wang, Jindong Wang, Huajun Xi, Bob Zhang, Lei Zhang, Hongxin Wei

However, the high computational cost of optimization-based TTA algorithms makes it intractable to run on resource-constrained edge devices.

Computational Efficiency Human Activity Recognition +2

Toward Accurate and Temporally Consistent Video Restoration from Raw Data

1 code implementation 25 Dec 2023 Shi Guo, Jianqi Ma, Xi Yang, Zhengqiang Zhang, Lei Zhang

Extensive experiments demonstrate the leading VJDD performance of our method in terms of restoration accuracy, perceptual quality and temporal consistency.

Demosaicking Denoising +2
