14 code implementations • ICCV 2021 • Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang
We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
Ranked #3 on Image Classification on Flowers-102 (using extra training data)
7 code implementations • 9 Mar 2023 • Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang
To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.
Ranked #1 on Zero-Shot Object Detection on MSCOCO
3 code implementations • CVPR 2021 • Xiyang Dai, Yinpeng Chen, Bin Xiao, Dongdong Chen, Mengchen Liu, Lu Yuan, Lei Zhang
In this paper, we present a novel dynamic head framework to unify object detection heads with attentions.
Ranked #3 on Object Detection on COCO 2017 val (AP75 metric)
11 code implementations • 27 Jul 2016 • Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, Jianfeng Gao
In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and linking them to corresponding entity keys in a knowledge base.
15 code implementations • 7 Mar 2022 • Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum
Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.
Ranked #1 on Real-Time Object Detection on COCO 2017 val
1 code implementation • 25 Jan 2024 • Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang
We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).
20 code implementations • CVPR 2020 • Bowen Cheng, Bin Xiao, Jingdong Wang, Honghui Shi, Thomas S. Huang, Lei Zhang
HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes.
Ranked #2 on Pose Estimation on UAV-Human
15 code implementations • CVPR 2021 • Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang
We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.
Ranked #33 on Pose Estimation on COCO test-dev (using extra training data)
1 code implementation • International Conference on Computer Vision Workshops 2019 • Dawei Du, Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Lin, QinGhua Hu, Tao Peng, Jiayu Zheng, Xinyao Wang, Yue Zhang, Liefeng Bo, Hailin Shi, Rui Zhu, Aashish Kumar, Aijin Li, Almaz Zinollayev, Anuar Askergaliyev, Arne Schumann, Binjie Mao, Byeongwon Lee, Chang Liu, Changrui Chen, Chunhong Pan, Chunlei Huo, Da Yu, Dechun Cong, Dening Zeng, Dheeraj Reddy Pailla, Di Li, Dong Wang, Donghyeon Cho, Dongyu Zhang, Furui Bai, George Jose, Guangyu Gao, Guizhong Liu, Haitao Xiong, Hao Qi, Haoran Wang, Heqian Qiu, Hongliang Li, Huchuan Lu, Ildoo Kim, Jaekyum Kim, Jane Shen, Jihoon Lee, Jing Ge, Jingjing Xu, Jingkai Zhou, Jonas Meier, Jun Won Choi, Junhao Hu, Junyi Zhang, Junying Huang, Kaiqi Huang, Keyang Wang, Lars Sommer, Lei Jin, Lei Zhang
Results of 33 object detection algorithms are presented.
9 code implementations • CVPR 2023 • Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum
In this paper we present Mask DINO, a unified object detection and segmentation framework.
Ranked #1 on Panoptic Segmentation on COCO test-dev
1 code implementation • 26 Mar 2023 • Yunjie Ji, Yong Deng, Yan Gong, Yiping Peng, Qiang Niu, Lei Zhang, Baochang Ma, Xiangang Li
However, current research rarely studies the impact of different amounts of instruction data on model performance, especially in real-world use cases.
1 code implementation • EMNLP 2020 • Fanchao Qi, Lei Zhang, Yanhui Yang, Zhiyuan Liu, Maosong Sun
A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions.
65 code implementations • CVPR 2018 • Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.
Ranked #29 on Visual Question Answering (VQA) on VQA v2 test-std
1 code implementation • 17 Sep 2020 • Mark Hamilton, Nick Gonsalves, Christina Lee, Anand Raman, Brendan Walsh, Siddhartha Prasad, Dalitso Banda, Lucy Zhang, Mei Gao, Lei Zhang, William T. Freeman
Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with its own restrictive syntax.
2 code implementations • 15 Dec 2023 • Yuqian Yuan, Wentong Li, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu
In this paper, we propose Osprey, a mask-text instruction tuning approach, to extend MLLMs by incorporating fine-grained mask regions into language instruction, aiming at achieving pixel-wise visual understanding.
3 code implementations • 15 Sep 2020 • Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Maitreya Suin, Kuldeep Purohit, A. N. Rajagopalan, Xiaochuan Li, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Abdul Muqeet, Jiwon Hwang, Subin Yang, JungHeum Kang, Sung-Ho Bae, Yongwoo Kim, Geun-Woo Jeon, Jun-Ho Choi, Jun-Hyuk Kim, Jong-Seok Lee, Steven Marty, Eric Marty, Dongliang Xiong, Siang Chen, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Haicheng Wang, Vineeth Bhaskara, Alex Levinshtein, Stavros Tsogkas, Allan Jepson, Xiangzhen Kong, Tongtong Zhao, Shanshan Zhao, Hrishikesh P. S, Densen Puthussery, Jiji C. V, Nan Nan, Shuai Liu, Jie Cai, Zibo Meng, Jiaming Ding, Chiu Man Ho, Xuehui Wang, Qiong Yan, Yuzhi Zhao, Long Chen, Jiangtao Zhang, Xiaotong Luo, Liang Chen, Yanyun Qu, Long Sun, Wenhao Wang, Zhenbing Liu, Rushi Lan, Rao Muhammad Umer, Christian Micheloni
This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results.
7 code implementations • ECCV 2020 • Hongwei Yong, Jianqiang Huang, Xian-Sheng Hua, Lei Zhang
It has been shown that using the first and second order statistics (e.g., mean and variance) to perform Z-score standardization on network activations or weight vectors, such as batch normalization (BN) and weight standardization (WS), can improve the training performance.
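The Z-score standardization mentioned in this entry can be illustrated with a minimal sketch. It standardizes a single vector to zero mean and unit variance, which is the basic operation shared by BN (applied to activations) and WS (applied to weights). The function name `zscore_standardize` and the `eps` value are assumptions made for illustration; this is not the method proposed in the paper.

```python
import math


def zscore_standardize(values, eps=1e-5):
    """Return a copy of `values` with zero mean and (approximately) unit variance.

    `eps` is a small constant (assumed value) that avoids division by zero
    when all entries are equal.
    """
    n = len(values)
    mean = sum(values) / n
    # Population variance (divide by n), as is usual in normalization layers.
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Hypothetical weight vector used only to exercise the function.
weights = [0.5, -1.0, 2.0, 0.1]
standardized = zscore_standardize(weights)
```

The same function could be applied per output channel of a weight tensor (as in WS) or per feature across a batch (as in BN); only the axis over which the statistics are computed changes.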
2 code implementations • 10 Mar 2023 • Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Lei Zhang
This paper presents Tag2Text, a vision language pre-training (VLP) framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features.
2 code implementations • 6 Jun 2023 • Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang
We are releasing the RAM at https://recognize-anything.github.io/ to foster the advancements of large models in computer vision.
2 code implementations • 23 Oct 2023 • Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang
Specifically, for predefined commonly used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet.
3 code implementations • CVPR 2021 • Tao Yang, Peiran Ren, Xuansong Xie, Lei Zhang
The proposed GAN prior embedded network (GPEN) is easy-to-implement, and it can generate visually photo-realistic results.
Ranked #1 on Blind Face Restoration on CelebA-HQ
2 code implementations • 21 Sep 2021 • Yicheng Wu, ZongYuan Ge, Donghao Zhang, Minfeng Xu, Lei Zhang, Yong Xia, Jianfei Cai
In this paper, we propose a novel mutual consistency network (MC-Net+) to effectively exploit the unlabeled data for semi-supervised medical image segmentation.
16 code implementations • CVPR 2022 • Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang
Our method is universal and can be easily plugged into any DETR-like methods by adding dozens of lines of code to achieve a remarkable improvement.
2 code implementations • CVPR 2022 • Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao
The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.
Ranked #1 on 2D Object Detection on RF100
1 code implementation • 10 Jul 2023 • Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao
In this paper, we introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.
3 code implementations • 22 Nov 2023 • Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao
In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.
1 code implementation • 21 Mar 2024 • Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang
Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning.
7 code implementations • ICLR 2022 • Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang
We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR.
Ranked #11 on 2D Object Detection on SARDet-100K
1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang
To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.
4 code implementations • 26 May 2022 • Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu
Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task.
Ranked #1 on Time Series Forecasting on ETTh1 (96) Univariate
20 code implementations • 13 Aug 2016 • Kai Zhang, WangMeng Zuo, Yunjin Chen, Deyu Meng, Lei Zhang
Discriminative model learning for image denoising has recently been attracting considerable attention due to its favorable denoising performance.
2 code implementations • ICCV 2023 • Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang
We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.
Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)
3 code implementations • 24 Sep 2019 • Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models.
Ranked #1 on Image Captioning on Flickr30k Captions test
4 code implementations • ECCV 2020 • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao
Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.
Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)
2 code implementations • CVPR 2023 • Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu
Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but efficient synthetic training datasets from large ones.
7 code implementations • CVPR 2021 • Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao
In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model OSCAR (Li et al., 2020), and utilize an improved approach OSCAR+ to pre-train the VL model and fine-tune it on a wide range of downstream VL tasks.
Ranked #2 on Image-text matching on CommercialAdsDataset
1 code implementation • ECCV 2020 • Xiaoming Li, Chaofeng Chen, Shangchen Zhou, Xianhui Lin, WangMeng Zuo, Lei Zhang
Next, with the degraded input, we match and select the most similar component features from their corresponding dictionaries and transfer the high-quality details to the input via the proposed dictionary feature transfer (DFT) block.
1 code implementation • CVPR 2019 • Kai Zhang, WangMeng Zuo, Lei Zhang
In this paper, we propose a principled formulation and framework by extending bicubic degradation based deep SISR with the help of plug-and-play framework to handle LR images with arbitrary blur kernels.
1 code implementation • 28 Aug 2023 • Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, Lei Zhang
Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks.
1 code implementation • 30 Sep 2020 • Hui Zeng, Jianrui Cai, Lida Li, Zisheng Cao, Lei Zhang
The small CNN works on the down-sampled version of the input image to predict content-dependent weights to fuse the multiple basis 3D LUTs into an image-adaptive one, which is employed to transform the color and tone of source images efficiently.
Ranked #5 on Image Enhancement on MIT-Adobe 5k (SSIM on proRGB metric)
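The fusion step described in this entry, where predicted weights combine several basis 3D LUTs into one image-adaptive LUT, can be sketched as a weighted sum over LUT entries. This is a toy illustration under stated assumptions: each LUT is a flattened list of RGB output triples, the weights are given directly rather than predicted by a CNN, and the names `fuse_luts`, `identity` and `warm` are hypothetical.

```python
def fuse_luts(basis_luts, weights):
    """Fuse basis LUTs into one LUT by a per-entry weighted sum.

    basis_luts: list of LUTs, each a list of (r, g, b) output triples
                (a flattened 3D grid; all LUTs must have the same length).
    weights:    one scalar weight per basis LUT (assumed to come from a
                small weight-predicting network in the described method).
    """
    if len(basis_luts) != len(weights):
        raise ValueError("need exactly one weight per basis LUT")
    n_entries = len(basis_luts[0])
    if any(len(lut) != n_entries for lut in basis_luts):
        raise ValueError("all basis LUTs must have the same number of entries")

    fused = []
    for e in range(n_entries):
        rgb = tuple(
            sum(w * lut[e][c] for lut, w in zip(basis_luts, weights))
            for c in range(3)
        )
        fused.append(rgb)
    return fused


# Two tiny hypothetical LUTs with two entries each (real LUTs are far larger).
identity = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
warm = [(0.2, 0.0, 0.0), (1.0, 0.8, 0.6)]
fused = fuse_luts([identity, warm], [0.5, 0.5])
```

Because the fusion is a cheap elementwise operation, the per-image cost is dominated by predicting the weights and applying the final LUT, which matches the efficiency argument in the abstract.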
2 code implementations • 25 Apr 2023 • Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang
This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.
Ranked #5 on Object Detection on COCO minival (using extra training data)
1 code implementation • 9 Nov 2023 • Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li
LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models.
Ranked #1 on LMM real-life tasks on Leaderboard
4 code implementations • 31 Aug 2020 • Kai Zhang, Yawei Li, WangMeng Zuo, Lei Zhang, Luc van Gool, Radu Timofte
Recent works on plug-and-play image restoration have shown that a denoiser can implicitly serve as the image prior for model-based methods to solve many inverse problems.
2 code implementations • CVPR 2017 • Kai Zhang, WangMeng Zuo, Shuhang Gu, Lei Zhang
Recent works have revealed that, with the aid of variable splitting techniques, denoiser prior can be plugged in as a modular part of model-based optimization methods to solve other inverse problems (e.g., deblurring).
Ranked #1 on Color Image Denoising on BSD68 sigma5
1 code implementation • CVPR 2023 • Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li
It is challenging to perform this task with a single network due to resolution issues, i.e., the face and hands are usually located in extremely small regions.
Ranked #3 on 3D Human Pose Estimation on UBody
3 code implementations • CVPR 2019 • Shi Guo, Zifei Yan, Kai Zhang, WangMeng Zuo, Lei Zhang
While deep convolutional neural networks (CNNs) have achieved impressive success in image denoising with additive white Gaussian noise (AWGN), their performance remains limited on real-world noisy photographs.
Ranked #4 on Denoising on Darmstadt Noise Dataset
1 code implementation • ICCV 2023 • Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, WangMeng Zuo
In addition to the unprecedented ability in imaginary creation, large text-to-image models are expected to take customized concepts in image generation.
1 code implementation • 15 Dec 2021 • Wenyu Liu, Gaofeng Ren, Runsheng Yu, Shi Guo, Jianke Zhu, Lei Zhang
Though deep learning-based object detection methods have achieved promising results on the conventional datasets, it is still challenging to locate objects from the low-quality images captured in adverse weather conditions.
2 code implementations • 4 Jul 2022 • Wenyu Liu, Wentong Li, Jianke Zhu, Miaomiao Cui, Xuansong Xie, Lei Zhang
With DIAL-Filters, we design both unsupervised and supervised frameworks for nighttime driving-scene segmentation, which can be trained in an end-to-end manner.
7 code implementations • 11 Oct 2017 • Kai Zhang, WangMeng Zuo, Lei Zhang
Due to the fast inference and good performance, discriminative learning methods have been widely studied in image denoising.
Ranked #1 on Grayscale Image Denoising on BSD68 sigma75
1 code implementation • NeurIPS 2023 • Jing Lin, Ailing Zeng, Shunlin Lu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang
In this paper, we present Motion-X, a large-scale 3D expressive whole-body motion dataset.
1 code implementation • CVPR 2018 • Kai Zhang, WangMeng Zuo, Lei Zhang
Recent years have witnessed the unprecedented success of deep convolutional neural networks (CNNs) in single image super-resolution (SISR).
2 code implementations • 4 Nov 2019 • Kai Zhang, Shuhang Gu, Radu Timofte, Zheng Hui, Xiumei Wang, Xinbo Gao, Dongliang Xiong, Shuai Liu, Ruipeng Gang, Nan Nan, Chenghua Li, Xueyi Zou, Ning Kang, Zhan Wang, Hang Xu, Chaofeng Wang, Zheng Li, Lin-Lin Wang, Jun Shi, Wenyu Sun, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Yazhe Niu, Peijin Zhuo, Xiangzhen Kong, Long Sun, Wenhao Wang
The challenge had 3 tracks.
1 code implementation • CVPR 2021 • Jie Liang, Hui Zeng, Lei Zhang
Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or suffer from long inference time due to their heavy computational burden on the convolution of high-resolution feature maps.
Ranked #1 on Photo Retouching on MIT-Adobe 5k (1080p)
3 code implementations • ICCV 2021 • Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao
This paper presents a new Vision Transformer (ViT) architecture, Multi-Scale Vision Longformer, which significantly enhances the ViT of Dosovitskiy et al. (2020) for encoding high-resolution images using two techniques.
Ranked #45 on Instance Segmentation on COCO minival
2 code implementations • 3 Dec 2022 • Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang
In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention.
2 code implementations • 22 Jul 2021 • Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu
The use of Transformer is rooted in the need of extracting local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image.
Ranked #1 on Multi-Label Classification on PASCAL VOC 2012
1 code implementation • 27 Jul 2021 • Xiaotian Han, Jianwei Yang, Houdong Hu, Lei Zhang, Jianfeng Gao, Pengchuan Zhang
There is a surge of interest in image scene graph generation (object, attribute and relationship detection) due to the need of building fine-grained image understanding models that go beyond object detection.
1 code implementation • CVPR 2021 • Chaofeng Chen, Xiaoming Li, Lingbo Yang, Xianhui Lin, Lei Zhang, Kwan-Yee K. Wong
Compared with previous networks, the proposed PSFR-GAN makes full use of the semantic (parsing maps) and pixel (LQ images) space information from different scales of input pairs.
Ranked #4 on Blind Face Restoration on CelebA-Test
1 code implementation • 30 Dec 2023 • Lingchen Sun, Rongyuan Wu, Zhengqiang Zhang, Hongwei Yong, Lei Zhang
To improve the stability of diffusion prior-based SR, we propose to employ the diffusion models to refine image structures, while employing the generative adversarial training to enhance image fine details.
1 code implementation • ICCV 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos
This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g., 5M FLOPs on ImageNet classification).
1 code implementation • 18 Jan 2019 • Fan Yang, Lei Zhang, Sijia Yu, Danil Prokhorov, Xue Mei, Haibin Ling
To demonstrate the superiority and generality of the proposed method, we evaluate it on five crack datasets and compare it with state-of-the-art crack detection, edge detection, and semantic segmentation methods.
1 code implementation • CVPR 2023 • Zhiyuan Ma, Xiangyu Zhu, GuoJun Qi, Zhen Lei, Lei Zhang
In this paper, we propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars by a generalized controllable tri-plane rendering solution so that each personalized avatar can be constructed from only one portrait as the reference.
1 code implementation • CVPR 2019 • Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, Jianfeng Gao
In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow object-centered text-to-image synthesis for complex scenes.
1 code implementation • 27 Nov 2023 • Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, Lei Zhang
First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation.
3 code implementations • 4 Mar 2021 • Yicheng Wu, Minfeng Xu, ZongYuan Ge, Jianfei Cai, Lei Zhang
Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions.
3 code implementations • 15 Apr 2019 • Qilong Wang, Jiangtao Xie, WangMeng Zuo, Lei Zhang, Peihua Li
The proposed methods are highly modular, readily plugged into existing deep CNNs.
Ranked #1 on Image Classification on iNaturalist (Top 3 Error metric)
2 code implementations • CVPR 2022 • Jie Liang, Hui Zeng, Lei Zhang
In this paper, we demonstrate that it is possible to train a GAN-based SISR model which can stably generate perceptually realistic details while inhibiting visual artifacts.
1 code implementation • ICCV 2023 • Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, Qiang Xu
While such a plug-and-play approach is appealing, the inevitable and uncertain conflicts between the original images produced from the frozen SD branch and the given condition incur significant challenges for the learnable branch, which essentially conducts image feature editing for condition enforcement.
1 code implementation • CVPR 2021 • Jie Liang, Hui Zeng, Miaomiao Cui, Xuansong Xie, Lei Zhang
HRP requires that more attention should be paid to human regions, while GLC requires that a group of portrait photos should be retouched to a consistent tone.
1 code implementation • 5 Dec 2023 • Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang
To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.
1 code implementation • 12 Oct 2023 • Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang
This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation.
Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)
2 code implementations • 7 Apr 2018 • Jun Xu, Hui Li, Zhetong Liang, David Zhang, Lei Zhang
In order to promote the study on this problem while implementing the concurrent real-world image denoising datasets, we construct a new benchmark dataset which contains comprehensive real-world noisy images of different natural scenes.
1 code implementation • CVPR 2021 • Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen
In this paper, we present a large scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.
Ranked #1 on Person Re-Identification on Market-1501 (using extra training data)
2 code implementations • CVPR 2022 • Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen
Since these ID labels automatically derived from tracklets inevitably contain noise, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.
Ranked #7 on Person Re-Identification on CUHK03
2 code implementations • NeurIPS 2019 • Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng
On one hand, like other data-driven deep learning methods, our method, namely variational denoising network (VDN), can perform denoising efficiently due to its explicit form of posterior expression.
Ranked #10 on Image Denoising on DND
2 code implementations • 25 Aug 2020 • Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong
In this proposed model, a pixel-wise non-i.i.d.
1 code implementation • CVPR 2023 • Xuan Ju, Ailing Zeng, Jianan Wang, Qiang Xu, Lei Zhang
Humans have long been recorded in a variety of forms since antiquity.
1 code implementation • 13 Mar 2022 • Xindong Zhang, Hui Zeng, Shi Guo, Lei Zhang
A highly efficient long-range attention block (ELAB) is then built by simply cascading two shift-conv with a GMSA module, which is further accelerated by using a shared attention mechanism.
Ranked #11 on Image Super-Resolution on Manga109 - 4x upscaling
1 code implementation • CVPR 2022 • Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang
VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel by two cross-attentions and models features in a hidden space induced by a group of latent codes.
3 code implementations • CVPR 2018 • Wangpeng An, Haoqian Wang, Qingyun Sun, Jun Xu, Qionghai Dai, Lei Zhang
We first reveal the intrinsic connections between SGD-Momentum and PID based controller, then present the optimization algorithm which exploits the past, current, and change of gradients to update the network parameters.
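The idea of using the past, current, and change of gradients can be sketched as a PID-style update on a single scalar parameter. This is a hedged illustration, not the paper's update rule: the gains `kp`, `ki`, `kd`, the learning rate, and the function names are all assumptions chosen so that the toy problem converges.

```python
def pid_step(param, grad, state, lr=0.1, kp=1.0, ki=0.5, kd=0.1):
    """One PID-style parameter update (illustrative gains, not from the paper).

    P term: the current gradient.
    I term: the running sum of past gradients (stored in state["integral"]).
    D term: the change of the gradient since the previous step.
    """
    state["integral"] += grad
    derivative = grad - state["prev_grad"]
    state["prev_grad"] = grad
    update = kp * grad + ki * state["integral"] + kd * derivative
    return param - lr * update


def minimize_quadratic(steps=200, start=5.0):
    """Minimize f(x) = x^2 with the PID-style update; the gradient is 2x."""
    x = start
    state = {"integral": 0.0, "prev_grad": 0.0}
    for _ in range(steps):
        x = pid_step(x, 2.0 * x, state)
    return x


final_x = minimize_quadratic()
```

On this toy quadratic the iterate oscillates around zero with shrinking amplitude, which shows the interplay of the three terms: the integral term accelerates progress but overshoots, and the derivative term damps the oscillation.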
1 code implementation • 19 Jul 2022 • Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Xiansheng Hua, Lei Zhang
A simple mask supervised SOLOv2 model is adapted to predict the instance-aware mask map as the level set for each instance.
1 code implementation • ICCV 2023 • Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang
We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.
1 code implementation • 13 Mar 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni
Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.
2 code implementations • ECCV 2020 • Zongsheng Yue, Qian Zhao, Lei Zhang, Deyu Meng
Specifically, we approximate the joint distribution with two different factorized forms, which can be formulated as a denoiser mapping the noisy image to the clean one and a generator mapping the clean image to the noisy one.
Ranked #2 on Noise Estimation on SIDD
1 code implementation • 21 Jul 2021 • Zhiqian Chen, Fanglan Chen, Lei Zhang, Taoran Ji, Kaiqun Fu, Liang Zhao, Feng Chen, Lingfei Wu, Charu Aggarwal, Chang-Tien Lu
Deep learning's performance has been extensively recognized recently.
1 code implementation • CVPR 2022 • Jianqi Ma, Zhetong Liang, Lei Zhang
The semantics of the text are firstly extracted by a text recognition module as text prior information.
3 code implementations • ECCV 2020 • Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Lei Zhang
With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.
Ranked #15 on Dichotomous Image Segmentation on DIS-TE4
1 code implementation • 18 Mar 2023 • Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Lei Zhang
They ignore two key problems when the encoder exchanges information with the decoder: one is the lack of an interference control mechanism between them, and the other is the failure to consider the disparity of the contributions from different encoder levels.
1 code implementation • CVPR 2022 • Yabin Zhang, Minghan Li, Ruihuang Li, Kui Jia, Lei Zhang
In this work, we, for the first time to our best knowledge, propose to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which could be implemented by applying the Exact Histogram Matching (EHM) in the image feature space.
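Exact histogram matching between two equally sized sets of values can be sketched with sorting: the output keeps the rank order of the source but takes its values from the target, so the two empirical CDFs coincide exactly. The sketch below assumes one-dimensional feature vectors of equal length and uses a sort-based approach; the function name and the example values are hypothetical, and this is not claimed to be the authors' implementation.

```python
def exact_histogram_match(source, target):
    """Return values with the rank order of `source` and the histogram of `target`.

    The element of `source` with the k-th smallest value is replaced by the
    k-th smallest value of `target`, so the result has exactly the same
    empirical distribution as `target`.
    """
    if len(source) != len(target):
        raise ValueError("this sketch assumes equally sized inputs")
    # Indices of `source` ordered from smallest to largest value.
    order = sorted(range(len(source)), key=lambda i: source[i])
    sorted_target = sorted(target)
    matched = [0.0] * len(source)
    for rank, idx in enumerate(order):
        matched[idx] = sorted_target[rank]
    return matched


# Hypothetical content and style feature values.
content = [0.9, 0.1, 0.5, 0.3]
style = [10.0, 40.0, 20.0, 30.0]
matched = exact_histogram_match(content, style)
```

Unlike matching only the mean and variance, this transfers all order statistics of the target, which is the property the entry attributes to matching eCDFs exactly.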
1 code implementation • CVPR 2018 • Feng Li, Cheng Tian, WangMeng Zuo, Lei Zhang, Ming-Hsuan Yang
Compared with SRDCF, STRCF with hand-crafted features provides a 5 times speedup and achieves a gain of 5.4% and 3.6% AUC score on OTB-2015 and Temple-Color, respectively.
Ranked #9 on Visual Object Tracking on VOT2017/18
3 code implementations • 3 Feb 2023 • Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang
This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information.
Ranked #2 on 2D Human Pose Estimation on Human-Art
1 code implementation • CVPR 2022 • Shuai Li, Chenhang He, Ruihuang Li, Lei Zhang
Existing LA methods mostly focus on the design of pos weighting function, while the neg weight is directly derived from the pos weight.
1 code implementation • 29 Jun 2021 • Jianqi Ma, Shi Guo, Lei Zhang
Our experiments on the benchmark TextZoom dataset show that TPGSR can not only effectively improve the visual quality of scene text images, but also significantly improve the text recognition accuracy over existing STISR methods.
1 code implementation • CVPR 2021 • Hongyi Zheng, Hongwei Yong, Lei Zhang
Inspired by the great success of deep neural networks (DNNs), many unfolding methods have been proposed to integrate traditional image modeling techniques, such as dictionary learning (DicL) and sparse coding, into DNNs for image restoration.
1 code implementation • 27 Mar 2022 • Jie Liang, Hui Zeng, Lei Zhang
Specifically, a tiny regression network is employed to predict the degradation parameters of the input image, while several convolutional experts with the same topology are jointly optimized to specify the network parameters via a non-linear mixture of experts.
1 code implementation • 28 Feb 2024 • Minghan Li, Shuai Li, Xindong Zhang, Lei Zhang
Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.
Ranked #2 on Video Semantic Segmentation on VSPW (using extra training data)
1 code implementation • 28 Aug 2017 • Hui Zeng, Lei Zhang, Alan C. Bovik
Recognizing this, we propose a new representation of perceptual image quality, called probabilistic quality representation (PQR), to describe the image subjective score distribution, whereby a more robust loss function can be employed to train a deep BIQA model.
1 code implementation • 15 Oct 2022 • Xiaoming Li, Shiguang Zhang, Shangchen Zhou, Lei Zhang, WangMeng Zuo
Generally, it is a challenging and intractable task to improve the photo-realistic performance of blind restoration and adaptively handle the generic and specific restoration scenarios with a single unified model.
1 code implementation • 18 Sep 2019 • Hui Zeng, Lida Li, Zisheng Cao, Lei Zhang
The employed evaluation metrics such as intersection-over-union cannot reliably reflect the real performance of a cropping model, either.
1 code implementation • CVPR 2019 • Hui Zeng, Lida Li, Zisheng Cao, Lei Zhang
Consequently, a grid anchor based cropping benchmark is constructed, where all crops of each image are annotated and more reliable evaluation metrics are defined.
1 code implementation • ICCV 2021 • Xi Yang, Wangmeng Xiang, Hui Zeng, Lei Zhang
Existing VSR methods are mostly trained and evaluated on synthetic datasets, where the LR videos are uniformly downsampled from their high-resolution (HR) counterparts by some simple operators (e.g., bicubic downsampling).
1 code implementation • CVPR 2023 • Du Chen, Jie Liang, Xindong Zhang, Ming Liu, Hui Zeng, Lei Zhang
A human guided GT image dataset with both positive and negative samples is then constructed, and a loss function is proposed to train the Real-ISR models.
1 code implementation • 18 Dec 2019 • Lei Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun
A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description.
1 code implementation • CVPR 2023 • Hao Zhang, Feng Li, Huaizhe Xu, Shijia Huang, Shilong Liu, Lionel M. Ni, Lei Zhang
We present a mask-piloted Transformer which improves masked-attention in Mask2Former for image segmentation.
1 code implementation • 3 Oct 2022 • Xiaoming Li, Chaofeng Chen, Xianhui Lin, WangMeng Zuo, Lei Zhang
Notably, LQ face images, which may have the same degradation process as natural images, can be robustly restored with photo-realistic textures by exploiting their strong structural priors.
1 code implementation • 8 Apr 2020 • Ming Liu, Zhilu Zhang, Liya Hou, WangMeng Zuo, Lei Zhang
Nonetheless, a content- and resource-adaptive model is preferable, and it is encouraging to apply simpler and more efficient networks to easier regions with fewer details and to scenarios with restricted efficiency constraints.
1 code implementation • CVPR 2022 • Binghui Chen, Pengyu Li, Xiang Chen, Biao Wang, Lei Zhang, Xian-Sheng Hua
Semi-supervised object detection (SSOD) aims to facilitate the training and deployment of object detectors with the help of a large amount of unlabeled data.
1 code implementation • ICCV 2021 • GuanYing Chen, Chaofeng Chen, Shi Guo, Zhetong Liang, Kwan-Yee K. Wong, Lei Zhang
Secondly, we conduct more sophisticated alignment and temporal fusion in the feature space of the coarse HDR video to produce better reconstruction.
1 code implementation • 14 Jul 2020 • Mark Hamilton, Stephanie Fu, Mindren Lu, Johnny Bui, Darius Bopp, Zhenbang Chen, Felix Tran, Margaret Wang, Marina Rogers, Lei Zhang, Chris Hoder, William T. Freeman
We introduce MosAIc, an interactive web app that allows users to find pairs of semantically related artworks that span different cultures, media, and millennia.
3 code implementations • CVPR 2018 • Kuang-Huei Lee, Xiaodong He, Lei Zhang, Linjun Yang
We demonstrate the effectiveness of the proposed algorithm on both the label noise detection task and the task of image classification on noisy data, using several large-scale datasets.
Ranked #2 on Image Classification on Food-101N (using extra training data)
1 code implementation • CVPR 2022 • Zongsheng Yue, Qian Zhao, Jianwen Xie, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong
To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel.
1 code implementation • NeurIPS 2021 • Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang
For example, our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture) improves top-1 accuracy by 0.28% while saving 49.32% of FLOPs and 4.40% of running time.
Ranked #20 on Efficient ViTs on ImageNet-1K (with DeiT-T)
1 code implementation • ICCV 2021 • Jiapeng Tang, Jiabao Lei, Dan Xu, Feiying Ma, Kui Jia, Lei Zhang
To this end, we propose to learn implicit surface reconstruction by sign-agnostic optimization of convolutional occupancy networks, to simultaneously achieve advanced scalability to large-scale scenes, generality to novel shapes, and applicability to raw scans in a unified framework.
2 code implementations • 20 Feb 2020 • Yabin Zhang, Bin Deng, Hui Tang, Lei Zhang, Kui Jia
By using MCSD as a measure of domain distance, we develop a new domain adaptation bound for multi-class UDA; a data-dependent, probably approximately correct (PAC) bound is also developed, which naturally suggests adversarial learning objectives to align conditional feature distributions across source and target domains.
3 code implementations • ICCV 2023 • Wangmeng Xiang, Chao Li, Yuxuan Zhou, Biao Wang, Lei Zhang
More specifically, we employ a pre-trained large-scale language model as the knowledge engine to automatically generate text descriptions of the body part movements of actions, and propose a multi-modal training scheme that uses the text encoder to generate feature vectors for different body parts and to supervise the skeleton encoder for action representation learning.
Ranked #5 on Skeleton Based Action Recognition on N-UCLA
1 code implementation • 22 Jul 2018 • Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang
In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.
1 code implementation • 2 Apr 2019 • Feida Zhu, Zhetong Liang, Xixi Jia, Lei Zhang, Yizhou Yu
This benchmark includes an image dataset with groundtruth image smoothing results as well as baseline algorithms that can generate competitive edge-preserving smoothing results for a wide range of image contents.
1 code implementation • CVPR 2021 • Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo
Due to this aligned representation learning, even pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4%, compared with a non-TAP baseline.
1 code implementation • ECCV 2020 • Xiaoqi Zhao, Lihe Zhang, Youwei Pang, Huchuan Lu, Lei Zhang
In this work, we design a single stream network to directly use the depth map to guide early fusion and middle fusion between RGB and depth, which saves the feature encoder of the depth stream and achieves a lightweight and real-time model.
Ranked #15 on Thermal Image Segmentation on RGB-T-Glass-Segmentation
1 code implementation • CVPR 2022 • Ruihuang Li, Shuai Li, Chenhang He, Yabin Zhang, Xu Jia, Lei Zhang
One popular solution to this challenging task is self-training, which selects high-scoring predictions on target samples as pseudo labels for training.
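The selection step of self-training described above can be sketched as follows. This is a minimal illustration, assuming each target sample comes with a dictionary of class probabilities; the function name `select_pseudo_labels` and the default threshold of 0.9 are hypothetical choices, not values from the paper.

```python
def select_pseudo_labels(predictions, threshold=0.9):
    """Keep only confident predictions as pseudo labels.

    predictions: list of dicts mapping class name -> probability,
                 one dict per target-domain sample (assumed input format).
    Returns a list of (sample_index, label) pairs.
    """
    selected = []
    for idx, probs in enumerate(predictions):
        # Most likely class for this sample.
        label = max(probs, key=probs.get)
        # Accept it as a pseudo label only if the model is confident enough.
        if probs[label] >= threshold:
            selected.append((idx, label))
    return selected


preds = [
    {"road": 0.95, "car": 0.05},
    {"road": 0.60, "car": 0.40},
    {"road": 0.08, "car": 0.92},
]
print(select_pseudo_labels(preds))  # [(0, 'road'), (2, 'car')]
```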
Ranked #9 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
2 code implementations • 13 Jun 2022 • Meilin Chen, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, ShiLiang Pu
In addition, we conduct anchor adaptation in parallel with localization adaptation, since anchor can be regarded as a learnable parameter.
1 code implementation • ICCV 2023 • Wentong Li, Yuqian Yuan, Song Wang, Jianke Zhu, Jianshu Li, Jian Liu, Lei Zhang
Weakly-supervised image segmentation has recently attracted increasing research attention, aiming to avoid expensive pixel-wise labeling.
1 code implementation • NeurIPS 2023 • Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang
Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description.
2 code implementations • 20 Dec 2017 • Tianshui Chen, Liang Lin, WangMeng Zuo, Xiaonan Luo, Lei Zhang
In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural networks for joint training.
1 code implementation • ICCV 2023 • Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang
Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.
1 code implementation • ICCV 2023 • Jianqi Ma, Zhetong Liang, Wangmeng Xiang, Xi Yang, Lei Zhang
Scene Text Image Super-resolution (STISR) aims to recover high-resolution (HR) scene text images with visually pleasant and readable text content from the given low-resolution (LR) input.
1 code implementation • 7 Jan 2024 • Xiangtao Kong, Chao Dong, Lei Zhang
While single task image restoration (IR) has achieved significant successes, it remains a challenging issue to train a single model which can tackle multiple IR tasks.
1 code implementation • CVPR 2023 • Pengfei Wang, Zhaoxiang Zhang, Zhen Lei, Lei Zhang
In this paper, we present two conditions to ensure that the model could converge to a flat minimum with a small loss, and present an algorithm, named Sharpness-Aware Gradient Matching (SAGM), to meet the two conditions for improving model generalization capability.
1 code implementation • 28 Apr 2023 • Lei Zhang, Yuge Zhang, Kan Ren, Dongsheng Li, Yuqing Yang
In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches.
1 code implementation • ICCV 2017 • Hao Lu, Lei Zhang, Zhiguo Cao, Wei Wei, Ke Xian, Chunhua Shen, Anton Van Den Hengel
Domain adaptation (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another.
1 code implementation • NeurIPS 2023 • Wentong Li, Yuqian Yuan, Song Wang, Wenyu Liu, Dongqi Tang, Jian Liu, Jianke Zhu, Lei Zhang
In this work, we formulate affinity modeling as an affinity propagation process, and propose local and global pairwise affinity terms to generate accurate soft pseudo labels.
1 code implementation • CVPR 2023 • Chenhang He, Ruihuang Li, Yabin Zhang, Shuai Li, Lei Zhang
Current top-performing multi-frame detectors mostly follow a Detect-and-Fuse framework, which extracts features from each frame of the sequence and fuses them to detect the objects in the current frame.
1 code implementation • 1 Dec 2023 • Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang
To ensure the content consistency among adjacent frames, we exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss, ensuring that the generated HR video maintains a coherent and continuous visual flow.
1 code implementation • 23 Jan 2024 • Zhaozhi Xie, Bochen Guan, Weihao Jiang, Muyang Yi, Yue Ding, Hongtao Lu, Lei Zhang
In this paper, we introduce a novel prompt-driven adapter into SAM, namely Prompt Adapter Segment Anything Model (PA-SAM), aiming to enhance the segmentation mask quality of the original SAM.
1 code implementation • 28 Nov 2022 • Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang
As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.
Ranked #7 on Referring Expression Comprehension on RefCOCO
1 code implementation • 27 Jul 2022 • Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Xian-Sheng Hua, Lei Zhang
For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers on video data will bring heavy computation and memory burdens due to the largely increased number of patches and the quadratic complexity of self-attention computation.
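The growth in cost can be made concrete with a short calculation. The sketch below counts tokens and query-key pairs for full self-attention, assuming non-overlapping square patches and that all frames attend to each other jointly; the helper name `attention_pairs` and the example sizes (224x224 input, 16x16 patches, 8 frames) are illustrative assumptions, not settings from the paper.

```python
def attention_pairs(height, width, patch, frames=1):
    """Return (number of tokens, number of query-key pairs) for full self-attention.

    Assumes non-overlapping patch x patch tokens and joint attention over all frames.
    """
    tokens = frames * (height // patch) * (width // patch)
    return tokens, tokens * tokens


image_tokens, image_pairs = attention_pairs(224, 224, 16)
video_tokens, video_pairs = attention_pairs(224, 224, 16, frames=8)
print(image_tokens, image_pairs)  # 196 38416
print(video_tokens, video_pairs)  # 1568 2458624
print(video_pairs // image_pairs)  # 64
```

With 8 times as many tokens, the attention matrix is 64 times larger, which is the quadratic effect the sentence above refers to.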
Ranked #9 on Action Recognition on Diving-48
1 code implementation • CVPR 2023 • Shuaizheng Liu, Xindong Zhang, Lingchen Sun, Zhetong Liang, Hui Zeng, Lei Zhang
In this work, we develop, for the first time to the best of our knowledge, an HDR image dataset captured with mobile phone cameras, namely the Mobile-HDR dataset.
1 code implementation • CVPR 2022 • Xiawu Zheng, Xiang Fei, Lei Zhang, Chenglin Wu, Fei Chao, Jianzhuang Liu, Wei Zeng, Yonghong Tian, Rongrong Ji
Building upon RMI, we further propose a new search algorithm termed RMI-NAS, supported by a theorem that guarantees the global optimum of the searched architecture.
1 code implementation • 11 Sep 2019 • Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang
We study weakly-supervised object detection (WSOD), which plays a vital role in relieving human involvement in object-level annotation.
1 code implementation • 16 Jul 2021 • WenBo Hu, Changgong Zhang, Fangneng Zhan, Lei Zhang, Tien-Tsin Wong
Based on this representation, we further propose a spatial-temporal conditional directed graph convolution to leverage varying non-local dependence for different poses by conditioning the graph topology on input poses.
Ranked #15 on 3D Human Pose Estimation on MPI-INF-3DHP
1 code implementation • CVPR 2022 • Shi Guo, Xi Yang, Jianqi Ma, Gaofeng Ren, Lei Zhang
Denoising and demosaicking are two essential steps to reconstruct a clean full-color image from the raw data.
1 code implementation • 16 Dec 2023 • Yunshui Li, Binyuan Hui, Xiaobo Xia, Jiaxi Yang, Min Yang, Lei Zhang, Shuzheng Si, Junhao Liu, Tongliang Liu, Fei Huang, Yongbin Li
Nuggets assesses the potential of individual instruction examples to act as effective one shot examples, thereby identifying those that can significantly enhance diverse task performance.
1 code implementation • 19 Dec 2018 • Xiaoming Li, Ming Liu, Jieru Zhu, WangMeng Zuo, Meng Wang, Guosheng Hu, Lei Zhang
As for missing pixels on both of half-faces, we present a generative reconstruction subnet together with a perceptual symmetry loss to enforce symmetry consistency of recovered structures.
Ranked #1 on Facial Inpainting on VggFace2
1 code implementation • CVPR 2023 • Shuai Li, Minghan Li, Ruihuang Li, Chenhang He, Lei Zhang
The positive and negative weights of these soft anchors are dynamically adjusted during training so that they can contribute more to "representation learning" in the early training stage, and contribute more to "duplicated prediction removal" in the later stage.
1 code implementation • CVPR 2023 • Yihao Chen, Xianbiao Qi, Jianan Wang, Lei Zhang
In this way, we can reduce the GPU memory consumption of contrastive loss computation from $O(B^2)$ to $O(\frac{B^2}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training.
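A back-of-the-envelope calculation shows what this reduction means in practice. The sketch below counts similarity-matrix entries, assuming the B x B matrix is split evenly by rows across N GPUs; that splitting scheme, the helper name `similarity_entries_per_gpu`, and the example values B = 32768 and N = 64 are assumptions for illustration, since the snippet does not describe how the decomposition is done.

```python
def similarity_entries_per_gpu(batch_size, num_gpus):
    """Entries of the B x B similarity matrix held by one GPU.

    Assumption: rows are divided evenly, so each GPU holds (B / N) rows of B entries.
    """
    if batch_size % num_gpus != 0:
        raise ValueError("batch_size must be divisible by num_gpus")
    rows_per_gpu = batch_size // num_gpus
    return rows_per_gpu * batch_size


B, N = 32768, 64
print(B * B)                             # 1073741824 entries if one GPU holds everything
print(similarity_entries_per_gpu(B, N))  # 16777216 entries per GPU, i.e. B^2 / N
```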
1 code implementation • 9 Oct 2022 • Rang Meng, Xianfeng Li, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, ShiLiang Pu
Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization are collaborated to reassign appropriate attention to diverse task-related features.
1 code implementation • 12 Apr 2018 • Dongwei Ren, WangMeng Zuo, David Zhang, Lei Zhang, Ming-Hsuan Yang
For blind deconvolution, as estimation error of blur kernel is usually introduced, the subsequent non-blind deconvolution process does not restore the latent image well.
1 code implementation • 25 Jan 2021 • Shi Guo, Zhetong Liang, Lei Zhang
Considering the fact that the green channel has twice the sampling rate and better quality than the red and blue channels in CFA raw data, we propose to use this green channel prior (GCP) to build a GCP-Net for the JDD-B task.
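The "twice the sampling rate" observation can be checked by counting the samples per color in a Bayer mosaic. The sketch below assumes an RGGB layout (rows alternating "RG" and "GB"); the helper name `bayer_channel_counts` is hypothetical. Other Bayer arrangements shift the pattern but keep the same 1:2:1 ratio.

```python
from collections import Counter


def bayer_channel_counts(height, width, pattern=("RG", "GB")):
    """Count how many pixels of each color a Bayer CFA samples.

    pattern: the 2x2 tile, given as two row strings (RGGB assumed by default).
    """
    counts = Counter()
    for r in range(height):
        for c in range(width):
            # The color at (r, c) repeats with period 2 in both directions.
            counts[pattern[r % 2][c % 2]] += 1
    return counts


counts = bayer_channel_counts(4, 4)
print(counts["R"], counts["G"], counts["B"])  # 4 8 4
```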
1 code implementation • ECCV 2020 • Yuanyi Zhong, Jian-Feng Wang, Jian Peng, Lei Zhang
In this paper, we propose an effective knowledge transfer framework to boost the weakly supervised object detection accuracy with the help of an external fully-annotated source dataset, whose categories may not overlap with the target domain.
1 code implementation • 20 May 2023 • Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang
Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.
Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)
1 code implementation • ICLR 2021 • Zhiyuan Fang, JianFeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu
This paper is concerned with self-supervised learning for small models.
1 code implementation • CVPR 2021 • Minghan Li, Shuai Li, Lida Li, Lei Zhang
To further explore temporal correlation among video frames, we aggregate a temporal fusion module to infer instance masks from each frame to its adjacent frames, which helps our framework to handle challenging videos such as motion blur, partial occlusion and unusual object-to-camera poses.
Ranked #24 on Video Instance Segmentation on YouTube-VIS 2021
1 code implementation • 1 Jan 2024 • Chenhang He, Ruihuang Li, Guowen Zhang, Lei Zhang
Window-based transformers have demonstrated strong ability in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner.
1 code implementation • CVPR 2023 • Ruihuang Li, Chenhang He, Yabin Zhang, Shuai Li, Liyi Chen, Lei Zhang
Weakly supervised instance segmentation using only bounding box annotations has recently attracted much research attention.
1 code implementation • 15 Dec 2023 • Zhengqiang Zhang, Ruihuang Li, Shi Guo, Yang Cao, Lei Zhang
Online video super-resolution (online-VSR) highly relies on an effective alignment module to aggregate temporal information, while the strict latency requirement makes accurate and efficient alignment very challenging.
1 code implementation • 15 Jan 2024 • Jiahui Zhong, Wenhong Tian, Yuanlun Xie, Zhijia Liu, Jie Ou, Taoran Tian, Lei Zhang
In this work, we propose PMFSNet, a novel medical imaging segmentation model that effectively balances global and local feature processing while avoiding the computational redundancy typical in larger models.
1 code implementation • IJCNLP 2019 • Ming Jiang, Junjie Hu, Qiuyuan Huang, Lei Zhang, Jana Diesner, Jianfeng Gao
In this study, we present a fine-grained evaluation method REO for automatically measuring the performance of image captioning systems.
1 code implementation • IJCNLP 2019 • Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao
This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems.
1 code implementation • CVPR 2020 • Lei Zhang, Jiangtao Nie, Wei Wei, Yanning Zhang, Shengcai Liao, Ling Shao
Following this idea, we develop a two-stage SR network that leverages two consecutive modules: a fusion module and an adaptation module, to recover the latent HSI in a coarse-to-fine scheme.
1 code implementation • CVPR 2023 • Haoyu Wang, Guansong Pang, Peng Wang, Lei Zhang, Wei Wei, Yanning Zhang
Few-shot open-set recognition (FSOR) is a challenging task of great practical value.
1 code implementation • ICCV 2023 • Binglu Wang, Lei Zhang, Zhaozhong Wang, Yongqiang Zhao, Tianfei Zhou
This paper presents CORE, a conceptually simple, effective and communication-efficient model for multi-agent cooperative perception.
1 code implementation • 19 Apr 2023 • Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang
In contrast to previous practical tricks that address training instability by learning rate warmup, layer normalization, attention formulation, and weight initialization, we show that Lipschitz continuity is a more essential property to ensure training stability.
1 code implementation • CVPR 2021 • Jiapeng Tang, Dan Xu, Kui Jia, Lei Zhang
This paper focuses on the task of 4D shape reconstruction from a sequence of point clouds.
1 code implementation • 25 Apr 2022 • Zhishe Wang, Yanlin Chen, Wenyu Shao, Hui Li, Lei Zhang
Existing deep learning fusion methods mainly concentrate on convolutional neural networks, and few attempts have been made with transformers.
1 code implementation • 22 May 2020 • Jianfeng Wang, Xi Yin, Lijuan Wang, Lei Zhang
Considering the intersection-over-union (IoU) as the metric, we propose a simple yet effective hashing algorithm, named IoUHash, which guarantees that the boxes within the same cell are close enough by a lower IoU bound.
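For reference, the IoU metric that the hashing algorithm is built around can be computed as in the sketch below. It covers only the metric for axis-aligned boxes in (x1, y1, x2, y2) format, not the IoUHash algorithm itself, whose details are not given in this snippet; the function name `iou` is an illustrative choice.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0
```

Two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving an IoU of 1/7.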
1 code implementation • 16 Mar 2024 • Tianhe Wu, Kede Ma, Jie Liang, Yujiu Yang, Lei Zhang
While Multimodal Large Language Models (MLLMs) have made significant advances in visual understanding and reasoning, their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored.
1 code implementation • ICCV 2023 • Yichen Yuan, Yifan Wang, Lijun Wang, Xiaoqi Zhao, Huchuan Lu, Yu Wang, Weibo Su, Lei Zhang
Recent leading zero-shot video object segmentation (ZVOS) works are devoted to integrating appearance and motion information by elaborately designing feature fusion modules and applying them identically in multiple feature stages.
1 code implementation • 7 Jul 2022 • Yabin Zhang, Jiehong Lin, Chenhang He, Yongwei Chen, Kui Jia, Lei Zhang
In this work, we make the first attempt, to the best of our knowledge, to explicitly incorporate local geometry information into masked auto-encoding, and propose a novel Masked Surfel Prediction (MaskSurf) method.
1 code implementation • 24 Jul 2022 • Lei Zhang, Guanyu Gao, Huaizheng Zhang
Then, the knowledge learnt by edge clients is aggregated by a centralized parameter server, where it is selectively and attentively distilled along the spatial and temporal dimensions with carefully designed mechanisms.
1 code implementation • CVPR 2023 • Mingjun Xu, Lingyun Qin, WeiJie Chen, ShiLiang Pu, Lei Zhang
In this work, we present an idea to remove non-causal factors from common features by multi-view adversarial training on source domains, because we observe that such insignificant non-causal factors may still be significant in other latent spaces (views) due to the multi-mode structure of data.
1 code implementation • 17 Feb 2020 • Yingjie Yin, De Xu, Xingang Wang, Lei Zhang
We propose a directional deep embedding and appearance learning (DDEAL) method, which is free of the online fine-tuning process, for fast VOS.
1 code implementation • 10 Mar 2022 • Hongyi Zheng, Hongwei Yong, Lei Zhang
Nonetheless, the existing deep unfolding methods cannot explicitly solve the data term of the unfolding objective function, limiting their capability in blur kernel estimation.
1 code implementation • 3 Apr 2018 • Dianqi Li, Qiuyuan Huang, Xiaodong He, Lei Zhang, Ming-Ting Sun
By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human languages, and generates more discriminative captions.
1 code implementation • CVPR 2021 • Pengyu Li, Biao Wang, Lei Zhang
This is because the classification paradigm needs to train a fully connected layer as the category classifier, and its parameters will be in the hundreds of millions if the training dataset contains millions of identities.
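The parameter count mentioned above follows from simple arithmetic: a fully connected classifier stores one weight per (identity, feature dimension) pair. The sketch below assumes a 512-dimensional embedding and ignores bias terms; both the dimension and the helper name `classifier_weight_count` are illustrative assumptions, not figures from the paper.

```python
def classifier_weight_count(num_identities, embedding_dim=512):
    """Number of weights in a fully connected layer mapping embeddings to identity logits.

    embedding_dim=512 is an assumed value; bias terms are ignored.
    """
    return num_identities * embedding_dim


print(classifier_weight_count(1_000_000))  # 512000000
```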
1 code implementation • 18 Mar 2022 • Tao Yang, Peiran Ren, Xuansong Xie, Xiansheng Hua, Lei Zhang
Most of the existing deep learning based VFI methods adopt off-the-shelf optical flow algorithms to estimate the bidirectional flows and interpolate the missing frames accordingly.
1 code implementation • 4 Apr 2022 • Ming Liu, Jianan Pan, Zifei Yan, WangMeng Zuo, Lei Zhang
Meanwhile, diverse testing sets are also provided with different types of reflection and scenes.
1 code implementation • CVPR 2023 • Fei Zhou, Peng Wang, Lei Zhang, Wei Wei, Yanning Zhang
Prototypical Network is a popular few-shot solver that aims at establishing a feature metric generalizable to novel few-shot classification (FSC) tasks using deep neural networks.
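The basic mechanics of a prototypical classifier can be sketched in a few lines: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The sketch below assumes the embeddings have already been produced by some feature extractor and uses squared Euclidean distance; the function names `class_prototypes` and `classify` are hypothetical, and the example omits the learned network entirely.

```python
def class_prototypes(support):
    """Compute one prototype per class as the mean of its support embeddings.

    support: dict mapping label -> list of embedding vectors (lists of floats).
    """
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return protos


def classify(query, protos):
    """Return the label whose prototype is closest to the query embedding."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda label: sq_dist(query, protos[label]))


support = {
    "cat": [[0.0, 0.0], [2.0, 0.0]],
    "dog": [[10.0, 10.0], [12.0, 10.0]],
}
protos = class_prototypes(support)
print(protos["cat"])                # [1.0, 0.0]
print(classify([2.0, 1.0], protos))  # cat
```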
1 code implementation • 20 Mar 2024 • Wenqiao Zhang, Tianwei Lin, Jiang Liu, Fangxun Shu, Haoyuan Li, Lei Zhang, He Wanggui, Hao Zhou, Zheqi Lv, Hao Jiang, Juncheng Li, Siliang Tang, Yueting Zhuang
Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks.
Ranked #72 on Visual Question Answering on MM-Vet
1 code implementation • CVPR 2020 • Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou
Most of the existing image retrieval methods only focus on single-domain retrieval, which assumes that the distributions of retrieval databases and queries are similar.
1 code implementation • ECCV 2020 • Yabin Zhang, Bin Deng, Kui Jia, Lei Zhang
To make the proposed A$^2$LP useful for UDA, we propose empirical schemes to generate such virtual instances.
1 code implementation • ICCV 2021 • Binghui Chen, Zhaoyi Yan, Ke Li, Pengyu Li, Biao Wang, WangMeng Zuo, Lei Zhang
In crowd counting, because labelling is laborious, it is considered intractable to collect a new large-scale dataset with plentiful images and large diversity in density, scene, etc.
1 code implementation • 10 Dec 2022 • Ruohao Wang, Xiaohui Liu, Zhilu Zhang, Xiaohe Wu, Chun-Mei Feng, Lei Zhang, WangMeng Zuo
On the other hand, alignment algorithms in existing VSR methods perform poorly for real-world videos, leading to unsatisfactory results.
1 code implementation • 18 Mar 2024 • Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang
Our research addresses the shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC).
1 code implementation • 26 Mar 2024 • Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang
In this paper, we introduce a versatile adaptation approach that can effectively work under all three settings.
1 code implementation • 4 Nov 2020 • Jordan Colman, Lei Zhang, Wenting Duan, Xujiong Ye
We verified the effect of introducing dropout regularisation with a small rate (e.g., 0.2) into the architecture, and found that a dropout rate of 0.2 improved the overall performance compared to no dropout or a dropout rate of 0.5.
1 code implementation • ICCV 2023 • Liyi Chen, Chenyang Lei, Ruihuang Li, Shuai Li, Zhaoxiang Zhang, Lei Zhang
Without introducing any external supervision and human priors, the proposed FPR effectively suppresses wrong activations from the background objects.
1 code implementation • ICCV 2019 • Zhenwei He, Lei Zhang
Conventional object detection methods essentially suppose that the training and testing data are collected from a restricted target domain with expensive labeling cost.
1 code implementation • 19 Nov 2021 • Luojun Lin, Han Xie, Zhishu Sun, WeiJie Chen, Wenxi Liu, Yuanlong Yu, Lei Zhang
From this perspective, we introduce a novel paradigm of DG, termed Semi-Supervised Domain Generalization (SSDG), to explore how the labeled and unlabeled source domains can interact, and establish two settings, including the close-set and open-set SSDG.
1 code implementation • 21 Jul 2022 • Ming Liu, Yuxiang Wei, Xiaohe Wu, WangMeng Zuo, Lei Zhang
Generative adversarial networks (GANs) have drawn enormous attention due to the simple yet effective training mechanism and superior image generation quality.
1 code implementation • CVPR 2023 • Yuxiang Wei, Zhilong Ji, Xiaohe Wu, Jinfeng Bai, Lei Zhang, WangMeng Zuo
Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from an input semantic map.
1 code implementation • 23 Nov 2023 • Luojun Lin, Zhifeng Shen, Zhishu Sun, Yuanlong Yu, Lei Zhang, WeiJie Chen
The parameters of dynamic networks can be decoupled into a static and a dynamic component, which are designed to learn domain-invariant and domain-specific features, respectively.
1 code implementation • 17 Mar 2024 • Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, WangMeng Zuo
On the other hand, to enhance desmoking performance, we further feed the valuable information from the PS frame into the models, where a masking strategy and a regularization term are presented to avoid trivial solutions.
1 code implementation • CVPR 2022 • Jie Zhang, Bo Li, Jianghe Xu, Shuang Wu, Shouhong Ding, Lei Zhang, Chao Wu
The proposed method can efficiently imitate the target model through a small number of queries and achieve a high attack success rate.
1 code implementation • ICCV 2023 • Song Guo, Lei Zhang, Xiawu Zheng, Yan Wang, Yuchao Li, Fei Chao, Chenglin Wu, Shengchuan Zhang, Rongrong Ji
In this paper, we try to solve this problem by introducing a principled and unified framework based on Information Bottleneck (IB) theory, which further guides us to an automatic pruning approach.
1 code implementation • 16 May 2019 • Feng Li, Xiaohe Wu, WangMeng Zuo, David Zhang, Lei Zhang
Therefore, in this paper we investigate the feasibility of removing the cosine window from CF trackers with spatial regularization.
1 code implementation • CVPR 2023 • Minghan Li, Shuai Li, Wangmeng Xiang, Lei Zhang
The proposed MDQE is the first VIS method with per-clip input that achieves state-of-the-art results on challenging videos and competitive performance on simple videos.
Ranked #13 on Video Instance Segmentation on YouTube-VIS 2021
1 code implementation • 28 Oct 2023 • Shuoyuan Wang, Jindong Wang, Huajun Xi, Bob Zhang, Lei Zhang, Hongxin Wei
However, the high computational cost of optimization-based TTA algorithms makes it intractable to run on resource-constrained edge devices.
1 code implementation • 25 Dec 2023 • Shi Guo, Jianqi Ma, Xi Yang, Zhengqiang Zhang, Lei Zhang
Extensive experiments demonstrate the leading VJDD performance of our method in terms of restoration accuracy, perceptual quality and temporal consistency.
2 code implementations • 16 Oct 2019 • OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang
We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot.