Search Results for author: Lei Zhang

Found 584 papers, 257 papers with code

Momentum Batch Normalization for Deep Learning with Small Batch Size

no code implementations • ECCV 2020 • Hongwei Yong, Jianqiang Huang, Deyu Meng, Xian-Sheng Hua, Lei Zhang

To make a deeper understanding of BN, in this work we prove that BN actually introduces a certain level of noise into the sample mean and variance during the training process, while the noise level depends only on the batch size.

LST-Net: Learning a Convolutional Neural Network with a Learnable Sparse Transform

no code implementations • ECCV 2020 • Lida Li, Kun Wang, Shuai Li, Xiangchu Feng, Lei Zhang

The 2D convolutional (Conv2d) layer is the fundamental element to a deep convolutional neural network (CNN).

A Decoupled Learning Scheme for Real-world Burst Denoising from Raw Images

no code implementations • ECCV 2020 • Zhetong Liang, Shi Guo, Hong Gu, Huaqi Zhang, Lei Zhang

On one hand, most of the models are trained on video sequences with synthetic noise.

Denoising

PCLMix: Weakly Supervised Medical Image Segmentation via Pixel-Level Contrastive Learning and Dynamic Mix Augmentation

1 code implementation • 10 May 2024 • Yu Lei, Haolun Luo, Lituan Wang, Zhenwei Zhang, Lei Zhang

In weakly supervised medical image segmentation, the absence of structural priors and the discreteness of class feature distribution present a challenge, i.e., how to accurately propagate supervision signals from local to global regions without excessively spreading them to other irrelevant regions?

MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation

1 code implementation • 9 May 2024 • Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang, WangMeng Zuo

In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both faithful identity fidelity and flexible editability.

Text-to-Image Generation

Vision Mamba: A Comprehensive Survey and Taxonomy

1 code implementation • 7 May 2024 • Xiao Liu, Chenxu Zhang, Lei Zhang

In the field of deep learning, state space models are used to process sequence data, such as time series analysis, natural language processing (NLP) and video understanding.

Time Series Analysis Video Understanding

Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models

1 code implementation • 7 May 2024 • Zhixuan Chu, Lei Zhang, Yichen Sun, Siqiao Xue, Zhibo Wang, Zhan Qin, Kui Ren

Leveraging the state-of-the-art keyframe extraction techniques and multimodal large language models, SoraDetector first evaluates the consistency between extracted video content summary and textual prompts, then constructs static and dynamic knowledge graphs (KGs) from frames to detect hallucination both in single frames and across frames.

Hallucination Knowledge Graphs

Three-Dimension Collision-Free Trajectory Planning of UAVs Based on ADS-B Information in Low-Altitude Urban Airspace

no code implementations • 29 Apr 2024 • Chao Dong, Yifan Zhang, Ziye Jia, Yiyang Liao, Lei Zhang, Qihui Wu

Consequently, we leverage ADS-B for surveillance and information broadcasting, and divide the aerial airspace into multiple sub-airspaces to improve flight safety in UAV trajectory planning.

Trajectory Planning

Joint ADS-B in 5G for Hierarchical Aerial Networks: Performance Analysis and Optimization

no code implementations • 29 Apr 2024 • Ziye Jia, Yiyang Liao, Chao Dong, Lijun He, Qihui Wu, Lei Zhang

Specifically, a hierarchical structure is proposed, in which the high-altitude central UAV is equipped with ADS-B and the low-altitude central UAV utilizes 5G modules to transmit flight information.

Edge-computing

DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction

no code implementations • 25 Apr 2024 • Jiamin Wu, Kenkun Liu, Han Gao, Xiaoke Jiang, Lei Zhang

By harnessing the benefits of 3D Gaussians, our approach offers an efficient and accurate solution for 3D reconstruction from single-view images.

3D Object Reconstruction 3D Reconstruction +2

Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping

no code implementations • 12 Apr 2024 • Lei Zhang, Kaixin Bai, Guowen Huang, Zhaopeng Chen, Jianwei Zhang

The integration of optimization method and generative models has significantly advanced dexterous manipulation techniques for five-fingered hand grasping.

Grasp Generation

Responsible Visual Editing

1 code implementation • 8 Apr 2024 • Minheng Ni, Yeli Shen, Lei Zhang, WangMeng Zuo

To mitigate the negative implications of harmful images on research, we create a transparent and public dataset, AltBear, which expresses harmful information using teddy bears instead of humans.

Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks

no code implementations • 4 Apr 2024 • Lei Zhang, YuHang Zhou, Yi Yang, Xinbo Gao

Despite providing high-performance solutions for computer vision tasks, the deep neural network (DNN) model has been proved to be extremely vulnerable to adversarial attacks.

Adversarial Defense Adversarial Robustness +1

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

1 code implementation • 26 Mar 2024 • Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang

In this paper, we introduce a versatile adaptation approach that can effectively work under all three settings.

An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification

1 code implementation • 22 Mar 2024 • Lei Zhang, Xiaowei Fu, Fuxiang Huang, Yi Yang, Xinbo Gao

Person re-identification (ReID) has made great strides thanks to the data-driven deep learning techniques.

Person Re-Identification

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

1 code implementation • 21 Mar 2024 • Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang

Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning.

Contrastive Learning Descriptive +3

1,887

MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation

1 code implementation • 21 Mar 2024 • Longzheng Wang, Xiaohan Xu, Lei Zhang, Jiarui Lu, Yongxiu Xu, Hongbo Xu, Minghao Tang, Chuang Zhang

Automatic detection of multimodal misinformation has gained widespread attention recently.

Data Augmentation Decision Making +5

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

1 code implementation • 20 Mar 2024 • Wenqiao Zhang, Tianwei Lin, Jiang Liu, Fangxun Shu, Haoyuan Li, Lei Zhang, He Wanggui, Hao Zhou, Zheqi Lv, Hao Jiang, Juncheng Li, Siliang Tang, Yueting Zhuang

Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks.

Ranked #77 on Visual Question Answering on MM-Vet

Visual Question Answering

Compress3D: a Compressed Latent Space for 3D Generation from a Single Image

no code implementations • 20 Mar 2024 • BoWen Zhang, Tianyu Yang, Yu Li, Lei Zhang, Xi Zhao

In this paper, we present a triplane autoencoder, which encodes 3D models into a compact triplane latent space to effectively compress both the 3D geometry and texture information.

3D Generation

TAPTR: Tracking Any Point with Transformers as Detection

no code implementations • 19 Mar 2024 • Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang

Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP.

object-detection Object Detection +2

Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM

no code implementations • 18 Mar 2024 • Linyu Tang, Lei Zhang

Current defense strategies usually train DNNs for a specific adversarial attack method and can achieve good robustness in defense against this type of adversarial attack.

Adversarial Attack Adversarial Defense +1

IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting

1 code implementation • 18 Mar 2024 • Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang

Our research addresses the shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC).

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

no code implementations • 18 Mar 2024 • Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Kam-Fai Wong, Lei Zhang

To this end, this paper proposes a novel text-guided video inpainting model that achieves better consistency, controllability and compatibility.

Image Inpainting Video Alignment +2

Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

1 code implementation • 18 Mar 2024 • Kaijie Ren, Lei Zhang

To address this issue, we propose a novel Implicit Discriminative Knowledge Learning (IDKL) network to uncover and leverage the implicit discriminative information contained within the modality-specific.

Person Re-Identification

Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models

1 code implementation • 17 Mar 2024 • Ruibin Li, Ruihuang Li, Song Guo, Lei Zhang

Text-driven diffusion models have significantly advanced the image editing performance by using text prompts as inputs.

Image Generation

Self-Supervised Video Desmoking for Laparoscopic Surgery

1 code implementation • 17 Mar 2024 • Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, WangMeng Zuo

On the other hand, in order to enhance the desmoking performance, we further feed the valuable information from PS frame into models, where a masking strategy and a regularization term are presented to avoid trivial solutions.

A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment

1 code implementation • 16 Mar 2024 • Tianhe Wu, Kede Ma, Jie Liang, Yujiu Yang, Lei Zhang

While Multimodal Large Language Models (MLLMs) have experienced significant advancement on visual understanding and reasoning, their potentials to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored.

Image Quality Assessment

Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription

no code implementations • 16 Mar 2024 • Hongxiang Zhao, Xili Dai, Jianan Wang, Shengbang Tong, Jingyuan Zhang, Weida Wang, Lei Zhang, Yi Ma

This consequently limits the performance of downstream tasks, such as image-to-multiview generation and 3D reconstruction.

3D Reconstruction Novel View Synthesis

Vosh: Voxel-Mesh Hybrid Representation for Real-Time View Synthesis

no code implementations • 11 Mar 2024 • Chenhao Zhang, Yongyang Zhou, Lei Zhang

The neural radiance field (NeRF) has emerged as a prominent methodology for synthesizing realistic images of novel views.

Parameterized quantum comb and simpler circuits for reversing unknown qubit-unitary operations

no code implementations • 6 Mar 2024 • Yin Mo, Lei Zhang, Yu-Ao Chen, Yingjian Liu, Tengxiang Lin, Xin Wang

Quantum comb is an essential tool for characterizing complex quantum protocols in quantum information processing.

Quantum Machine Learning

SGD with Partial Hessian for Deep Neural Networks Optimization

1 code implementation • 5 Mar 2024 • Ying Sun, Hongwei Yong, Lei Zhang

Compared with first-order optimizers, it adopts a certain amount of information from the Hessian matrix to assist optimization, while compared with the existing second-order optimizers, it keeps the good generalization performance of first-order optimizers.

Image Classification Second-order methods

DragTex: Generative Point-Based Texture Editing on 3D Mesh

no code implementations • 4 Mar 2024 • Yudi Zhang, Qi Xu, Lei Zhang

Creating 3D textured meshes using generative artificial intelligence has garnered significant attention recently.

Decoder Texture Synthesis

Learning Causal Features for Incremental Object Detection

no code implementations • 1 Mar 2024 • Zhenwei He, Lei Zhang

Keywords: object detection, incremental learning, causal feature.

Incremental Learning Object +2

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

1 code implementation • 28 Feb 2024 • Minghan Li, Shuai Li, Xindong Zhang, Lei Zhang

Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.

Ranked #2 on Video Semantic Segmentation on VSPW (using extra training data)

Decoder Referring Expression Segmentation +7

131

Infrared and visible Image Fusion with Language-driven Loss in CLIP Embedding Space

1 code implementation • 26 Feb 2024 • Yuhao Wang, Lingjuan Miao, Zhiqiang Zhou, Lei Zhang, Yajun Qiao

A language-driven fusion model is then constructed in the embedding space, by establishing the relationship among the embedded vectors to represent the fusion objective and input image modalities.

Infrared And Visible Image Fusion

ConSept: Continual Semantic Segmentation via Adapter-based Vision Transformer

no code implementations • 26 Feb 2024 • Bowen Dong, Guanglei Yang, WangMeng Zuo, Lei Zhang

Empirical investigations on the adaptation of existing frameworks to vanilla ViT reveal that incorporating visual adapters into ViTs or fine-tuning ViTs with distillation terms is advantageous for enhancing the segmentation capability of novel classes.

Continual Semantic Segmentation Segmentation +1

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

no code implementations • 22 Feb 2024 • Hao Li, Mengqi Huang, Lei Zhang, Bo Hu, Yi Liu, Zhendong Mao

GAN-based image attribute editing firstly leverages GAN Inversion to project real images into the latent space of GAN and then manipulates corresponding latent codes.

Attribute

A Collision-Aware Cable Grasping Method in Cluttered Environment

no code implementations • 22 Feb 2024 • Lei Zhang, Kaixin Bai, Qiang Li, Zhaopeng Chen, Jianwei Zhang

We introduce a Cable Grasping-Convolutional Neural Network designed to facilitate robust cable grasping in cluttered environments.

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models

no code implementations • 20 Feb 2024 • Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W. Huang, Chenghua Lin, Wenhu Chen, Jie Fu

The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following.

Instruction Following

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

1 code implementation • 19 Feb 2024 • Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao, Yiyu Shi, Rongrong Ji

For instance, on the Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a perplexity of 16.88, surpassing the state-of-the-art DSnoT with a perplexity of 75.14.

Invariance-powered Trustworthy Defense via Remove Then Restore

no code implementations • 1 Feb 2024 • Xiaowei Fu, YuHang Zhou, Lina Ma, Lei Zhang

Based on this finding, a Pixel Surgery and Semantic Regeneration (PSSR) model following the targeted therapy mechanism is developed, which has three merits: 1) To remove the salient attack, a score-based Pixel Surgery module is proposed, which retains the trivial attack as a kind of invariance information.

LanDA: Language-Guided Multi-Source Domain Adaptation

no code implementations • 25 Jan 2024 • Zhenbin Wang, Lei Zhang, Lituan Wang, Minjuan Zhu

Multi-Source Domain Adaptation (MSDA) aims to mitigate changes in data distribution when transferring knowledge from multiple labeled source domains to an unlabeled target domain.

Domain Adaptation

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

1 code implementation • 25 Jan 2024 • Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).

Segmentation

13,621

PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation

1 code implementation • 23 Jan 2024 • Zhaozhi Xie, Bochen Guan, Weihao Jiang, Muyang Yi, Yue Ding, Hongtao Lu, Lei Zhang

In this paper, we introduce a novel prompt-driven adapter into SAM, namely Prompt Adapter Segment Anything Model (PA-SAM), aiming to enhance the segmentation mask quality of the original SAM.

Decoder Image Segmentation +2

CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

no code implementations • 23 Jan 2024 • Songyun Qu, Shixin Zhao, Bing Li, Yintao He, Xuyi Cai, Lei Zhang, Ying Wang

Based on the proposed abstraction, CIM-MLC can compile tasks onto a wide range of CIM accelerators having different devices, architectures, and programming interfaces.

Scheduling

Symbol as Points: Panoptic Symbol Spotting via Point-based Representation

1 code implementation • 19 Jan 2024 • Wenlong Liu, Tianyu Yang, YuHan Wang, QiZhi Yu, Lei Zhang

Finally, we propose a KNN interpolation mechanism for the mask attention module of the spotting head to better handle primitive mask downsampling, which is primitive-level in contrast to pixel-level for the image.

Point Cloud Segmentation Vector Graphics

PMFSNet: Polarized Multi-scale Feature Self-attention Network For Lightweight Medical Image Segmentation

1 code implementation • 15 Jan 2024 • Jiahui Zhong, Wenhong Tian, Yuanlun Xie, Zhijia Liu, Jie Ou, Taoran Tian, Lei Zhang

In this work, we propose PMFSNet, a novel medical imaging segmentation model that effectively balances global and local feature processing while avoiding the computational redundancy typical in larger models.

Image Segmentation Medical Image Segmentation +2

Towards Effective Multiple-in-One Image Restoration: A Sequential and Prompt Learning Strategy

1 code implementation • 7 Jan 2024 • Xiangtao Kong, Chao Dong, Lei Zhang

While single task image restoration (IR) has achieved significant successes, it remains a challenging issue to train a single model which can tackle multiple IR tasks.

Image Restoration

Pontryagin Neural Operator for Solving Parametric General-Sum Differential Games

no code implementations • 3 Jan 2024 • Lei Zhang, Mukesh Ghimire, Zhe Xu, Wenlong Zhang, Yi Ren

To address these challenges, we propose in this paper a Pontryagin-mode neural operator that outperforms existing state-of-the-art (SOTA) on safety performance across games with parametric state constraints.

ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

1 code implementation • 1 Jan 2024 • Chenhang He, Ruihuang Li, Guowen Zhang, Lei Zhang

Window-based transformers have demonstrated strong ability in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner.

Blocking

Improving the Stability of Diffusion Models for Content Consistent Super-Resolution

1 code implementation • 30 Dec 2023 • Lingchen Sun, Rongyuan Wu, Zhengqiang Zhang, Hongwei Yong, Lei Zhang

To improve the stability of diffusion prior-based SR, we propose to employ the diffusion models to refine image structures, while employing the generative adversarial training to enhance image fine details.

Decoder Image Super-Resolution +1

344

A Closed-Loop Multi-perspective Visual Servoing Approach with Reinforcement Learning

no code implementations • 25 Dec 2023 • Lei Zhang, Jiacheng Pei, Kaixin Bai, Zhaopeng Chen, Jianwei Zhang

Traditional visual servoing methods suffer from serving between scenes from multiple perspectives, which humans can complete with visual signals alone.

OpenAI Gym reinforcement-learning

Toward Accurate and Temporally Consistent Video Restoration from Raw Data

1 code implementation • 25 Dec 2023 • Shi Guo, Jianqi Ma, Xi Yang, Zhengqiang Zhang, Lei Zhang

Extensive experiments demonstrate the leading VJDD performance of our method in terms of restoration accuracy, perceptual quality and temporal consistency.

Demosaicking Denoising +2

Perception-Distortion Balanced Super-Resolution: A Multi-Objective Optimization Perspective

1 code implementation • 24 Dec 2023 • Lingchen Sun, Jie Liang, Shuaizheng Liu, Hongwei Yong, Lei Zhang

High perceptual quality and low distortion degree are two important goals in image restoration tasks such as super-resolution (SR).

Image Restoration Super-Resolution

One Shot Learning as Instruction Data Prospector for Large Language Models

1 code implementation • 16 Dec 2023 • Yunshui Li, Binyuan Hui, Xiaobo Xia, Jiaxi Yang, Min Yang, Lei Zhang, Shuzheng Si, Junhao Liu, Tongliang Liu, Fei Huang, Yongbin Li

Nuggets assesses the potential of individual instruction examples to act as effective one shot examples, thereby identifying those that can significantly enhance diverse task performance.

One-Shot Learning

Marathon: A Race Through the Realm of Long Context with Large Language Models

no code implementations • 15 Dec 2023 • Lei Zhang, Yunshui Li, Ziqiang Liu, Jiaxi Yang, Junhao Liu, Min Yang

Although there are currently many benchmarks available for evaluating the long context understanding and reasoning capability of large language models, with the expansion of the context window in these models, the existing long context benchmarks are no longer sufficient for evaluating the long context understanding and reasoning capability of large language models.

Long-Context Understanding Multiple-choice

TMP: Temporal Motion Propagation for Online Video Super-Resolution

1 code implementation • 15 Dec 2023 • Zhengqiang Zhang, Ruihuang Li, Shi Guo, Yang Cao, Lei Zhang

Online video super-resolution (online-VSR) highly relies on an effective alignment module to aggregate temporal information, while the strict latency requirement makes accurate and efficient alignment very challenging.

Video Super-Resolution

Osprey: Pixel Understanding with Visual Instruction Tuning

2 code implementations • 15 Dec 2023 • Yuqian Yuan, Wentong Li, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu

In this paper, we propose Osprey, a mask-text instruction tuning approach, to extend MLLMs by incorporating fine-grained mask regions into language instruction, aiming at achieving pixel-wise visual understanding.

Language Modelling

3,400

Stable Score Distillation for High-Quality 3D Generation

no code implementations • 14 Dec 2023 • Boshi Tang, Jianan Wang, Zhiyong Wu, Lei Zhang

Although Score Distillation Sampling (SDS) has exhibited remarkable performance in conditional 3D content generation, a comprehensive understanding of its formulation is still lacking, hindering the development of 3D generation.

3D Generation

Vision-Language Models as a Source of Rewards

no code implementations • 14 Dec 2023 • Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, Luyu Wang, Lei Zhang

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning.

reinforcement-learning

Guided Image Restoration via Simultaneous Feature and Image Guided Fusion

no code implementations • 14 Dec 2023 • Xinyi Liu, Qian Zhao, Jie Liang, Hui Zeng, Deyu Meng, Lei Zhang

Currently, joint image filtering-inspired deep learning-based methods represent the state-of-the-art for GIR tasks.

Depth Map Super-Resolution Image Restoration

Compress & Align: Curating Image-Text Data with Human Knowledge

no code implementations • 11 Dec 2023 • Lei Zhang, Fangxun Shu, Sucheng Ren, Bingchen Zhao, Hao Jiang, Cihang Xie

The massive growth of image-text data through web crawling inherently presents the challenge of variability in data quality.

Image Captioning Text Retrieval

RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning

no code implementations • ICCV 2023 • Jiashuo Fan, Yaoyuan Liang, Leyao Liu, ShaoLun Huang, Lei Zhang

We evaluate our approach on two datasets and show that our proposed RCA-NOC approach outperforms state-of-the-art methods by a large margin, demonstrating its effectiveness in improving vision-language representation for novel object captioning.

Contrastive Learning Object +1

Dynamic Weighted Combiner for Mixed-Modal Image Retrieval

1 code implementation • 11 Dec 2023 • Fuxiang Huang, Lei Zhang, Xiaowei Fu, Suqi Song

First, we propose an Editable Modality De-equalizer (EMD) by taking into account the contribution disparity between modalities, containing two modality feature editors and an adaptive weighted combiner.

Image Retrieval Retrieval

Audio-Visual LLM for Video Understanding

no code implementations • 11 Dec 2023 • Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie

This paper presents Audio-Visual LLM, a Multimodal Large Language Model that takes both visual and auditory inputs for holistic video understanding.

AudioCaps Language Modelling +2

OpenSD: Unified Open-Vocabulary Segmentation and Detection

no code implementations • 10 Dec 2023 • Shuai Li, Minghan Li, Pengfei Wang, Lei Zhang

To address these challenges, we present a universal transformer-based framework, abbreviated as OpenSD, which utilizes the same architecture and network parameters to handle open-vocabulary segmentation and detection tasks.

Ranked #8 on Zero Shot Segmentation on Segmentation in the Wild

Decoder Segmentation +1

PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction

no code implementations • 7 Dec 2023 • Yinhuai Wang, Jing Lin, Ailing Zeng, Zhengyi Luo, Jian Zhang, Lei Zhang

To make up for the lack of dynamic HOI scenarios in this area, we introduce the BallPlay dataset that contains eight whole-body basketball skills.

Human-Object Interaction Detection Object

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation • 5 Dec 2023 • Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

Decoder

254

Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution

1 code implementation • 1 Dec 2023 • Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang

To ensure the content consistency among adjacent frames, we exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss, ensuring that the generated HR video maintains a coherent and continuous visual flow.

Decoder Image Restoration +1

Value Approximation for Two-Player General-Sum Differential Games with State Constraints

1 code implementation • 28 Nov 2023 • Lei Zhang, Mukesh Ghimire, Wenlong Zhang, Zhe Xu, Yi Ren

Solving Hamilton-Jacobi-Isaacs (HJI) PDEs numerically enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD).

Physics-informed machine learning

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

1 code implementation • 27 Nov 2023 • Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, Lei Zhang

First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation.

Image Super-Resolution

296

RIDE: Real-time Intrusion Detection via Explainable Machine Learning Implemented in a Memristor Hardware Architecture

no code implementations • 27 Nov 2023 • Jingdi Chen, Lei Zhang, Joseph Riem, Gina Adam, Nathaniel D. Bastian, Tian Lan

Deep Learning (DL) based methods have shown great promise in network intrusion detection by identifying malicious network traffic behavior patterns with high accuracy, but their applications to real-time, packet-level detections in high-speed communication networks are challenging due to the high computation time and resource requirements of Deep Neural Networks (DNNs), as well as lack of explainability.

Network Intrusion Detection

E-CORE: Emotion Correlation Enhanced Empathetic Dialogue Generation

no code implementations • 25 Nov 2023 • Fengyi Fu, Lei Zhang, Quan Wang, Zhendong Mao

Then we propose an emotion correlation enhanced decoder, with a novel correlation-aware aggregation and soft/hard strategy, respectively improving the emotion perception and response generation.

Decoder Dialogue Generation +1

Parameter Exchange for Robust Dynamic Domain Generalization

1 code implementation • 23 Nov 2023 • Luojun Lin, Zhifeng Shen, Zhishu Sun, Yuanlong Yu, Lei Zhang, WeiJie Chen

The parameters of dynamic networks can be decoupled into a static and a dynamic component, which are designed to learn domain-invariant and domain-specific features, respectively.

Disentanglement Domain Generalization

T-Rex: Counting by Visual Prompting

no code implementations • 22 Nov 2023 • Qing Jiang, Feng Li, Tianhe Ren, Shilong Liu, Zhaoyang Zeng, Kent Yu, Lei Zhang

Guided by the visual feedback from T-Rex, users can also interactively refine the counting results by prompting on missing or falsely-detected objects.

Object Object Counting +4

Visual In-Context Prompting

3 code implementations • 22 Nov 2023 • Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Decoder Segmentation +1

1,952

AcademicGPT: Empowering Academic Research

no code implementations • 21 Nov 2023 • Shufa Wei, Xiaolong Xu, Xianbiao Qi, Xi Yin, Jun Xia, Jingyi Ren, Peijun Tang, Yuxiang Zhong, Yihao Chen, Xiaoqin Ren, Yuxin Liang, Liankai Huang, Kai Xie, Weikang Gui, Wei Tan, Shuanglong Sun, Yongquan Hu, Qinxian Liu, Nanjin Li, Chihao Dai, Lihua Wang, Xiaohui Liu, Lei Zhang, Yutao Xie

Our training corpus mainly consists of academic papers, theses, content from some academic domains, high-quality Chinese data, and others.

General Knowledge Question Answering

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

1 code implementation • 9 Nov 2023 • Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li

LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models.

Ranked #1 on LMM real-life tasks on Leaderboard

Instruction Following LLM real-life tasks +3

634

MatNexus: A Comprehensive Text Mining and Analysis Suite for Materials Discovery

no code implementations • 7 Nov 2023 • Lei Zhang, Markus Stricker

MatNexus is a specialized software for the automated collection, processing, and analysis of text from scientific articles.

Retrieval Word Embeddings

Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition

1 code implementation • 28 Oct 2023 • Shuoyuan Wang, Jindong Wang, Huajun Xi, Bob Zhang, Lei Zhang, Hongxin Wei

However, the high computational cost of optimization-based TTA algorithms makes them intractable to run on resource-constrained edge devices.

Computational Efficiency Human Activity Recognition +2

A Wireless AI-Generated Content (AIGC) Provisioning Framework Empowered by Semantic Communication

no code implementations • 26 Oct 2023 • Runze Cheng, Yao Sun, Dusit Niyato, Lan Zhang, Lei Zhang, Muhammad Ali Imran

Generative AI applications have recently been catering to a vast user base by creating diverse and high-quality AI-generated content (AIGC).

Decoder

Adapt Anything: Tailor Any Image Classifiers across Domains And Categories Using Text-to-Image Diffusion Models

no code implementations • 25 Oct 2023 • WeiJie Chen, Haoyu Wang, Shicai Yang, Lei Zhang, Wei Wei, Yanning Zhang, Luojun Lin, Di Xie, Yueting Zhuang

Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as the corresponding unlabeled target data.

Domain Adaptation Image Classification

Open-Set Image Tagging with Multi-Grained Text Supervision

2 code implementations • 23 Oct 2023 • Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang

Specifically, for predefined commonly used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet.

Human-Object Interaction Detection Open Set Learning +1

2,458

HumanTOMATO: Text-aligned Whole-body Motion Generation

no code implementations • 19 Oct 2023 • Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum

This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

no code implementations • 18 Oct 2023 • Xinhua Cheng, Tianyu Yang, Jianan Wang, Yu Li, Lei Zhang, Jian Zhang, Li Yuan

Recent text-to-3D generation methods achieve impressive 3D content creation capacity thanks to the advances in image diffusion models and optimizing strategies.

3D Generation Text to 3D

Label-efficient Segmentation via Affinity Propagation

1 code implementation • NeurIPS 2023 • Wentong Li, Yuqian Yuan, Song Wang, Wenyu Liu, Dongqi Tang, Jian Liu, Jianke Zhu, Lei Zhang

In this work, we formulate the affinity modeling as an affinity propagation process, and propose local and global pairwise affinity terms to generate accurate soft pseudo labels.

Box-supervised Instance Segmentation Segmentation +2

TOSS: High-quality Text-guided Novel View Synthesis from a Single Image

no code implementations • 16 Oct 2023 • Yukai Shi, Jianan Wang, He Cao, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum

In this paper, we present TOSS, which introduces text to the task of novel view synthesis (NVS) from just a single RGB image.

Image-to-Image Translation Novel View Synthesis

Transport-Hub-Aware Spatial-Temporal Adaptive Graph Transformer for Traffic Flow Prediction

1 code implementation • 12 Oct 2023 • Xiao Xu, Lei Zhang, Bailong Liu, Zhizhen Liang, Xuefei Zhang

Finally, we design an extra spatial-temporal knowledge distillation module for incremental learning of traffic flow prediction tasks.

Incremental Learning Knowledge Distillation

UniPose: Detecting Any Keypoints

1 code implementation • 12 Oct 2023 • Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang

This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation.

Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)

2D Human Pose Estimation 2D Pose Estimation +4

239

Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval

no code implementations • 12 Oct 2023 • Pandeng Li, Hongtao Xie, Jiannan Ge, Lei Zhang, Shaobo Min, Yongdong Zhang

Hence, we address this problem by decomposing video information into reconstruction-dependent and semantic-dependent information, which disentangles the semantic extraction from reconstruction constraint.

Retrieval Semantic Retrieval +3

Consistent123: Improve Consistency for One Image to 3D Object Synthesis

no code implementations • 12 Oct 2023 • Haohan Weng, Tianyu Yang, Jianan Wang, Yu Li, Tong Zhang, C. L. Philip Chen, Lei Zhang

Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability.

3D Generation 3D Reconstruction +3

X-Transfer: A Transfer Learning-Based Framework for GAN-Generated Fake Image Detection

no code implementations • 7 Oct 2023 • Lei Zhang, Hao Chen, Shu Hu, Bin Zhu, Ching Sheng Lin, Xi Wu, Jinrong Hu, Xin Wang

Generative adversarial networks (GANs) have remarkably advanced in diverse domains, especially image generation and editing.

Fake Image Detection Image Generation +1

Enhancing Accuracy in Deep Learning Using Random Matrix Theory

no code implementations • 4 Oct 2023 • Leonid Berlyand, Etienne Sandier, Yitzchak Shmalo, Lei Zhang

We explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning, that is, reducing the number of DNN parameters (weights).

CPPF: A contextual and post-processing-free model for automatic speech recognition

no code implementations • 14 Sep 2023 • Lei Zhang, Zhengkun Tian, Xiang Chen, Jiaming Sun, Hongyu Xiang, Ke Ding, Guanglu Wan

To address this issue, we draw inspiration from the multifaceted capabilities of LLMs and Whisper, and focus on integrating multiple ASR text processing tasks related to speech recognition into the ASR model.

Automatic Speech Recognition speech-recognition +1

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

no code implementations • 5 Sep 2023 • Bojia Zi, Xianbiao Qi, Lingzhi Wang, Jianan Wang, Kam-Fai Wong, Lei Zhang

In this paper, we present Delta-LoRA, which is a novel parameter-efficient approach to fine-tune large language models (LLMs).

Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization

1 code implementation • 28 Aug 2023 • Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, Lei Zhang

Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks.

Image Enhancement Image Generation +3

795

Neural Interactive Keypoint Detection

1 code implementation • ICCV 2023 • Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang

Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.

Decoder Keypoint Detection

Uncovering User Interest from Biased and Noised Watch Time in Video Recommendation

1 code implementation • 16 Aug 2023 • Haiyuan Zhao, Lei Zhang, Jun Xu, Guohao Cai, Zhenhua Dong, Ji-Rong Wen

In video recommendation, watch time is commonly adopted as an indicator of user interest.

Exploring Winograd Convolution for Cost-effective Neural Network Fault Tolerance

no code implementations • 16 Aug 2023 • Xinghua Xue, Cheng Liu, Bo Liu, Haitong Huang, Ying Wang, Tao Luo, Lei Zhang, Huawei Li, Xiaowei Li

When it is applied to fault-tolerant neural networks enhanced with fault-aware retraining and constrained activation functions, the resulting model accuracy generally shows significant improvement in the presence of various faults.

Computational Efficiency

Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation

1 code implementation • ICCV 2023 • Yichen Yuan, Yifan Wang, Lijun Wang, Xiaoqi Zhao, Huchuan Lu, Yu Wang, Weibo Su, Lei Zhang

Recent leading zero-shot video object segmentation (ZVOS) works are devoted to integrating appearance and motion information by elaborately designing feature fusion modules and applying them identically across multiple feature stages.

Semantic Segmentation Video Object Segmentation +2

A Benchmark for Chinese-English Scene Text Image Super-resolution

1 code implementation • ICCV 2023 • jianqi ma, Zhetong Liang, Wangmeng Xiang, Xi Yang, Lei Zhang

Scene Text Image Super-resolution (STISR) aims to recover high-resolution (HR) scene text images with visually pleasant and readable text content from the given low-resolution (LR) input.

Image Super-Resolution

Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport

1 code implementation • ICCV 2023 • Wentong Li, Yuqian Yuan, Song Wang, Jianke Zhu, Jianshu Li, Jian Liu, Lei Zhang

Weakly-supervised image segmentation has recently attracted increasing research attention, aiming to avoid the expensive pixel-wise labeling.

Image Segmentation Panoptic Segmentation

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting

no code implementations • ICCV 2023 • Hongyang Li, Hao Zhang, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang

Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem.

3D Object Detection object-detection

CORE: Cooperative Reconstruction for Multi-Agent Perception

1 code implementation • ICCV 2023 • Binglu Wang, Lei Zhang, Zhaozhong Wang, Yongqiang Zhao, Tianfei Zhou

This paper presents CORE, a conceptually simple, effective and communication-efficient model for multi-agent cooperative perception.

3D Object Detection object-detection +1

Neural Quantile Optimization for Edge-Cloud Computing

no code implementations • 11 Jul 2023 • Bin Du, He Zhang, Xiangle Cheng, Lei Zhang

The network structure reflects the edge-cloud computing topology and is trained to minimize the expectation of the cost function for unconstrained continuous optimization problems.

Cloud Computing

Semantic-SAM: Segment and Recognize Anything at Any Granularity

1 code implementation • 10 Jul 2023 • Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao

In this paper, we introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.

Image Segmentation Segmentation +1

1,952

Lightweight Improved Residual Network for Efficient Inverse Tone Mapping

1 code implementation • 8 Jul 2023 • Liqi Xue, Tianyi Xu, Yongbao Song, Yan Liu, Lei Zhang, XianTong Zhen, Jun Xu

But the majority of media images on the internet remain in 8-bit standard dynamic range (SDR) format.

Image Reconstruction inverse tone mapping +2

MomentDiff: Generative Video Moment Retrieval from Random to Real

1 code implementation • NeurIPS 2023 • Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang

Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description.

Moment Retrieval Retrieval

Impact of UAVs Equipped with ADS-B on the Civil Aviation Monitoring System

no code implementations • 4 Jul 2023 • Yiyang Liao, Lei Zhang, Ziye Jia, Chao Dong, Yifan Zhang, Qihui Wu, Huiling Hu, Bin Wang

However, due to the limited frequency of the ADS-B technique, UAVs equipped with ADS-B devices cause packet loss for both UAVs and civil aviation.

Blocking Position

Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset

1 code implementation • NeurIPS 2023 • Jing Lin, Ailing Zeng, Shunlin Lu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang

In this paper, we present Motion-X, a large-scale 3D expressive whole-body motion dataset.

Human Mesh Recovery text annotation

445

Steganographic Capacity of Deep Learning Models

no code implementations • 25 Jun 2023 • Lei Zhang, Dong Li, Olha Jurečková, Mark Stamp

We find that the steganographic capacity of the learning models tested is surprisingly high, and that in each case, there is a clear threshold after which model performance rapidly degrades.

Malware Classification

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation

no code implementations • 21 Jun 2023 • Yukun Huang, Jianan Wang, Yukai Shi, Boshi Tang, Xianbiao Qi, Lei Zhang

Text-to-image diffusion models pre-trained on billions of image-text pairs have recently enabled 3D content creation by optimizing a randomly initialized differentiable 3D representation with score distillation.

3D Generation Text to 3D

Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant

no code implementations • 15 Jun 2023 • Xianbiao Qi, Jianan Wang, Lei Zhang

This article provides a comprehensive understanding of optimization in deep learning, with a primary focus on the challenges of gradient vanishing and gradient exploding, which normally lead to diminished model representational ability and training instability, respectively.

detrex: Benchmarking Detection Transformers

1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

1,834

Recognize Anything: A Strong Image Tagging Model

2 code implementations • 6 Jun 2023 • Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang

We are releasing the RAM at https://recognize-anything.github.io/ to foster the advancements of large models in computer vision.

Semantic Parsing

2,458

Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning

1 code implementation • 6 Jun 2023 • Peggy Tang, Junbin Gao, Lei Zhang, Zhiyong Wang

Recently, compressive text summarisation offers a balance between the conciseness issue of extractive summarisation and the factual hallucination issue of abstractive summarisation.

Hallucination reinforcement-learning

Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis

1 code implementation • CVPR 2023 • Yuxiang Wei, Zhilong Ji, Xiaohe Wu, Jinfeng Bai, Lei Zhang, WangMeng Zuo

Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from the input semantic map.

Image Generation Object

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model

1 code implementation • 20 May 2023 • Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang

Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.

Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)

Human-Object Interaction Detection Zero-Shot Human-Object Interaction Detection

Cognition Guided Human-Object Relationship Detection

no code implementations • journal 2023 • Zhitao Zeng, Pengwen Dai, Xuan Zhang, Lei Zhang, Xiaochun Cao

Human-object relationship detection reveals the fine-grained relationship between humans and objects, helping the comprehensive understanding of videos.

Decoder Human-Object Relationship Detection +2

MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks

1 code implementation • 28 Apr 2023 • Lei Zhang, Yuge Zhang, Kan Ren, Dongsheng Li, Yuqing Yang

In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches.

AutoML Code Generation

A marker-less human motion analysis system for motion-based biomarker discovery in knee disorders

no code implementations • 26 Apr 2023 • Kai Armstrong, Lei Zhang, Yan Wen, Alexander P. Willmott, Paul Lee, Xujioing Ye

In recent years, the NHS has had increasing difficulty seeing all low-risk patients, including but not limited to suspected osteoarthritis (OA) patients.

A Strong and Reproducible Object Detector with Only Public Datasets

3 code implementations • 25 Apr 2023 • Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.

Ranked #5 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

651

Glocal Energy-based Learning for Few-Shot Open-Set Recognition

1 code implementation • CVPR 2023 • Haoyu Wang, Guansong Pang, Peng Wang, Lei Zhang, Wei Wei, Yanning Zhang

Few-shot open-set recognition (FSOR) is a challenging task of great practical value.

Open Set Learning

LipsFormer: Introducing Lipschitz Continuity to Vision Transformers

1 code implementation • 19 Apr 2023 • Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang

In contrast to previous practical tricks that address training instability by learning rate warmup, layer normalization, attention formulation, and weight initialization, we show that Lipschitz continuity is a more essential property to ensure training stability.

DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

1 code implementation • CVPR 2023 • Yihao Chen, Xianbiao Qi, Jianan Wang, Lei Zhang

In this way, we can reduce the GPU memory consumption of contrastive loss computation from $\mathcal{O}(B^2)$ to $\mathcal{O}(\frac{B^2}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training.

Contrastive Learning

Language Guided Local Infiltration for Interactive Image Retrieval

no code implementations • 16 Apr 2023 • Fuxiang Huang, Lei Zhang

Interactive Image Retrieval (IIR) aims to retrieve images that are generally similar to the reference image but under the requested text modification.

Image Retrieval Retrieval +1

Detection Transformer with Stable Matching

2 code implementations • ICCV 2023 • Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang

We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.

Decoder Position

183

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation

1 code implementation • ICCV 2023 • Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, Qiang Xu

While such a plug-and-play approach is appealing, the inevitable and uncertain conflicts between the original images produced from the frozen SD branch and the given condition incur significant challenges for the learnable branch, which essentially conducts image feature editing for condition enforcement.

Denoising Image Generation

250

Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains

1 code implementation • CVPR 2023 • Mingjun Xu, Lingyun Qin, WeiJie Chen, ShiLiang Pu, Lei Zhang

In this work, we present an idea to remove non-causal factors from common features by multi-view adversarial training on source domains, because we observe that such insignificant non-causal factors may still be significant in other latent spaces (views) due to the multi-mode structure of data.

Domain Generalization object-detection +1

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer

no code implementations • 31 Mar 2023 • Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He

Unsupervised Domain Adaptation (UDA) can effectively address domain gap issues in real-world image Super-Resolution (SR) by accessing both the source and target data.

Image Super-Resolution Source-Free Domain Adaptation +1

One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer

1 code implementation • CVPR 2023 • Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li

It is challenging to perform this task with a single network due to resolution issues, i.e., the face and hands are usually located in extremely small regions.

Ranked #3 on 3D Human Pose Estimation on UBody

3D Human Pose Estimation 3D Human Reconstruction +2

573

OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering

1 code implementation • CVPR 2023 • Zhiyuan Ma, Xiangyu Zhu, GuoJun Qi, Zhen Lei, Lei Zhang

In this paper, we propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars by a generalized controllable tri-plane rendering solution so that each personalized avatar can be constructed from only one portrait as the reference.

293

BoxVIS: Video Instance Segmentation with Box Annotations

1 code implementation • 26 Mar 2023 • Minghan Li, Lei Zhang

As a result, the amount of pixel-wise annotations in existing video instance segmentation (VIS) datasets is small, limiting the generalization capability of trained VIS models.

Ranked #16 on Video Instance Segmentation on YouTube-VIS 2021

Instance Segmentation Semantic Segmentation +2

Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases

1 code implementation • 26 Mar 2023 • Yunjie Ji, Yong Deng, Yan Gong, Yiping Peng, Qiang Niu, Lei Zhang, Baochang Ma, Xiangang Li

However, current research rarely studies the impact of different amounts of instruction data on model performance, especially in real-world use cases.

Math

7,579

MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos

1 code implementation • CVPR 2023 • Minghan Li, Shuai Li, Wangmeng Xiang, Lei Zhang

The proposed MDQE is the first VIS method with per-clip input that achieves state-of-the-art results on challenging videos and competitive performance on simple videos.

Ranked #13 on Video Instance Segmentation on YouTube-VIS 2021

Instance Segmentation Semantic Segmentation +1

Human Guided Ground-truth Generation for Realistic Image Super-resolution

1 code implementation • CVPR 2023 • Du Chen, Jie Liang, Xindong Zhang, Ming Liu, Hui Zeng, Lei Zhang

A human guided GT image dataset with both positive and negative samples is then constructed, and a loss function is proposed to train the Real-ISR models.

Image Enhancement Image Super-Resolution

110

One-to-Few Label Assignment for End-to-End Dense Detection

1 code implementation • CVPR 2023 • Shuai Li, Minghan Li, Ruihuang Li, Chenhang He, Lei Zhang

The positive and negative weights of these soft anchors are dynamically adjusted during training so that they can contribute more to "representation learning" in the early training stage, and contribute more to "duplicated prediction removal" in the later stage.

Decoder Representation Learning

Sharpness-Aware Gradient Matching for Domain Generalization

1 code implementation • CVPR 2023 • Pengfei Wang, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

In this paper, we present two conditions to ensure that the model could converge to a flat minimum with a small loss, and present an algorithm, named Sharpness-Aware Gradient Matching (SAGM), to meet the two conditions for improving model generalization capability.

Domain Generalization

Towards Diverse Binary Segmentation via A Simple yet General Gated Network

1 code implementation • 18 Mar 2023 • Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Lei Zhang

They ignore two key problems when the encoder exchanges information with the decoder: one is the lack of an interference control mechanism between them, and the other is the failure to consider the disparity of the contributions from different encoder levels.

Decoder Segmentation +1

159

MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences

1 code implementation • CVPR 2023 • Chenhang He, Ruihuang Li, Yabin Zhang, Shuai Li, Lei Zhang

Current top-performing multi-frame detectors mostly follow a Detect-and-Fuse framework, which extracts features from each frame of the sequence and fuses them to detect the objects in the current frame.

3D Object Detection Autonomous Driving +1

SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation

1 code implementation • CVPR 2023 • Ruihuang Li, Chenhang He, Yabin Zhang, Shuai Li, Liyi Chen, Lei Zhang

Weakly supervised instance segmentation using only bounding box annotations has recently attracted much research attention.

Box-supervised Instance Segmentation Segmentation +2

DynaMask: Dynamic Mask Selection for Instance Segmentation

1 code implementation • CVPR 2023 • Ruihuang Li, Chenhang He, Shuai Li, Yabin Zhang, Lei Zhang

The representative instance segmentation methods mostly segment different object instances with a mask of the fixed resolution, e.g., a 28×28 grid.

Instance Segmentation Segmentation +1

A Simple Framework for Open-Vocabulary Segmentation and Detection

2 code implementations • ICCV 2023 • Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.

Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)

Instance Segmentation Panoptic Segmentation +2

1,251

MP-Former: Mask-Piloted Transformer for Image Segmentation

1 code implementation • CVPR 2023 • Hao Zhang, Feng Li, Huaizhe xu, Shijia Huang, Shilong Liu, Lionel M. Ni, Lei Zhang

We present a mask-piloted Transformer which improves masked-attention in Mask2Former for image segmentation.

Decoder Image Segmentation +2

107

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

1 code implementation • 13 Mar 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.

object-detection Object Detection

176

Synthesizing Realistic Image Restoration Training Pairs: A Diffusion Approach

no code implementations • 13 Mar 2023 • Tao Yang, Peiran Ren, Xuansong Xie, Lei Zhang

In supervised image restoration tasks, one key issue is how to obtain the aligned high-quality (HQ) and low-quality (LQ) training image pairs.

Denoising Image Restoration +1

Tag2Text: Guiding Vision-Language Model via Image Tagging

2 code implementations • 10 Mar 2023 • Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Lei Zhang

This paper presents Tag2Text, a vision language pre-training (VLP) framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features.

Language Modelling TAG

2,458

Paper
Code

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

7 code implementations • 9 Mar 2023 • Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.

Ranked #1 on Zero-Shot Object Detection on MSCOCO

Decoder Referring Expression +3

125,862

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

1 code implementation • CVPR 2023 • Xuan Ju, Ailing Zeng, Jianan Wang, Qiang Xu, Lei Zhang

Humans have long been recorded in a variety of forms since antiquity.

3D Human Pose Estimation Human Detection +1

193

Spatial-Frequency Attention for Image Denoising

no code implementations • 27 Feb 2023 • Shi Guo, Hongwei Yong, Xindong Zhang, jianqi ma, Lei Zhang

In this paper, we propose the spatial-frequency attention network (SFANet) to enhance the network's ability to exploit long-range dependencies.

Image Denoising

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

1 code implementation • ICCV 2023 • Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, WangMeng Zuo

In addition to the unprecedented ability in imaginary creation, large text-to-image models are expected to take customized concepts in image generation.

Text-to-Image Generation

482

Introducing Depth into Transformer-based 3D Object Detection

no code implementations • 25 Feb 2023 • Hao Zhang, Hongyang Li, Ailing Zeng, Feng Li, Shilong Liu, Xingyu Liao, Lei Zhang

To address the second issue, we introduce an auxiliary learning task called Depth-aware Negative Suppression loss.

3D Object Detection Auxiliary Learning +3

Towards a Sustainable Internet-of-Underwater-Things based on AUVs, SWIPT, and Reinforcement Learning

no code implementations • 21 Feb 2023 • Kenechi G. Omeke, Michael Mollel, Syed T. Shah, Lei Zhang, Qammer H. Abbasi, Muhammad Ali Imran

In this paper, we propose a sustainable scheme to improve the throughput and lifetime of underwater networks, enabling them to potentially operate indefinitely.

Decision Making Reinforcement Learning (RL)

Paper
Add Code

Variation Enhanced Attacks Against RRAM-based Neuromorphic Computing System

no code implementations • 20 Feb 2023 • Hao Lv, Bing Li, Lei Zhang, Cheng Liu, Ying Wang

The RRAM-based neuromorphic computing system has attracted explosive interest for its superior data processing capability and energy efficiency compared with traditional architectures, and is thus widely used in many data-centric applications.

Adversarial Attack

Paper
Add Code

Dual Graph Multitask Framework for Imbalanced Delivery Time Estimation

no code implementations • 15 Feb 2023 • Lei Zhang, Mingliang Wang, Xin Zhou, Xingyu Wu, Yiming Cao, Yonghui Xu, Lizhen Cui, Zhiqi Shen

To address the issue, we propose a novel Dual Graph Multitask framework for imbalanced Delivery Time Estimation (DGM-DTE).

Paper
Add Code

DRGCN: Dynamic Evolving Initial Residual for Deep Graph Convolutional Networks

1 code implementation • 10 Feb 2023 • Lei Zhang, Xiaodong Yan, Jianshan He, Ruopeng Li, Wei Chu

Our experimental results show that our model effectively relieves the problem of over-smoothing in deep GCNs and outperforms the state-of-the-art (SOTA) methods on various benchmark datasets.

Paper
Code

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

3 code implementations • 3 Feb 2023 • Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang

This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, which unifies the contextual learning between human-level (global) and keypoint-level (local) information.

Ranked #2 on 2D Human Pose Estimation on Human-Art

2D Human Pose Estimation Decoder +4

139

Paper
Code

Adversarial Style Augmentation for Domain Generalization

no code implementations • 30 Jan 2023 • Yabin Zhang, Bin Deng, Ruihuang Li, Kui Jia, Lei Zhang

By updating the model against the adversarial statistics perturbation during training, we allow the model to explore the worst-case domain and hence improve its generalization performance.

domain classification Domain Generalization +1

Paper
Add Code

Towards Precise Model-free Robotic Grasping with Sim-to-Real Transfer Learning

no code implementations • 28 Jan 2023 • Lei Zhang, Kaixin Bai, Zhaopeng Chen, Yunlei Shi, Jianwei Zhang

In physical robotic experiments, our grasping framework grasped single known objects and novel complex-shaped household objects with a success rate of 90.91%.

Data Augmentation Robotic Grasping +1

Paper
Add Code

Towards Accurate Acne Detection via Decoupled Sequential Detection Head

no code implementations • 28 Jan 2023 • Xin Wei, Lei Zhang, Jianwei Zhang, Junyou Wang, Wenjie Liu, Jiaqi Li, Xian Jiang

In addition, we build a high-quality acne detection dataset named ACNE-DET to verify the effectiveness of DSDH.

Paper
Add Code

Human-Timescale Adaptation in an Open-Ended Task Space

no code implementations • 18 Jan 2023 • Adaptive Agent Team, Jakob Bauer, Kate Baumli, Satinder Baveja, Feryal Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang, Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw, Jack Parker-Holder, Shreya Pathak, Nicolas Perez-Nieves, Nemanja Rakicevic, Tim Rocktäschel, Yannick Schroecker, Jakub Sygnowski, Karl Tuyls, Sarah York, Alexander Zacherl, Lei Zhang

Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL).

In-Context Learning Meta Reinforcement Learning +3

Paper
Add Code

A General Regret Bound of Preconditioned Gradient Method for DNN Training

1 code implementation • CVPR 2023 • Hongwei Yong, Ying Sun, Lei Zhang

Though the full-matrix preconditioned gradient methods theoretically have a lower regret bound, they are impractical for training DNNs because of their high complexity.

Image Classification object-detection +1

Paper
Code

FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation

1 code implementation • ICCV 2023 • Liyi Chen, Chenyang Lei, Ruihuang Li, Shuai Li, Zhaoxiang Zhang, Lei Zhang

Without introducing any external supervision and human priors, the proposed FPR effectively suppresses wrong activations from the background objects.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Paper
Code

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

no code implementations • CVPR 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.

object-detection Object Detection

Paper
Add Code

Revisiting Prototypical Network for Cross Domain Few-Shot Learning

1 code implementation • CVPR 2023 • Fei Zhou, Peng Wang, Lei Zhang, Wei Wei, Yanning Zhang

Prototypical Network is a popular few-shot solver that aims at establishing a feature metric generalizable to novel few-shot classification (FSC) tasks using deep neural networks.

cross-domain few-shot learning Knowledge Distillation

Paper
Code

Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle

1 code implementation • ICCV 2023 • Song Guo, Lei Zhang, Xiawu Zheng, Yan Wang, Yuchao Li, Fei Chao, Chenglin Wu, Shengchuan Zhang, Rongrong Ji

In this paper, we try to solve this problem by introducing a principled and unified framework based on Information Bottleneck (IB) theory, which further guides us to an automatic pruning approach.

Network Pruning

Paper
Code

Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset

1 code implementation • CVPR 2023 • Shuaizheng Liu, Xindong Zhang, Lingchen Sun, Zhetong Liang, Hui Zeng, Lei Zhang

In this work, we develop, for the first time to the best of our knowledge, an HDR image dataset using mobile phone cameras, namely the Mobile-HDR dataset.

Denoising

Paper
Code

Towards Fairness-aware Adversarial Network Pruning

no code implementations • ICCV 2023 • Lei Zhang, Zhibo Wang, Xiaowei Dong, Yunhe Feng, Xiaoyi Pang, Zhifei Zhang, Kui Ren

Network pruning aims to compress models while minimizing loss in accuracy.

Fairness Network Pruning

Paper
Add Code

Exploring Vision Transformers as Diffusion Learners

no code implementations • 28 Dec 2022 • He Cao, Jianan Wang, Tianhe Ren, Xianbiao Qi, Yihao Chen, Yuan Yao, Lei Zhang

We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND).

Decoder

Paper
Add Code

Accelerating Dataset Distillation via Model Augmentation

2 code implementations • CVPR 2023 • Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu

Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but efficient synthetic training datasets from large ones.

1,185

Paper
Code

Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection

no code implementations • International Journal of Computer Vision 2022 • Zhenwei He, Lei Zhang, Xinbo Gao, David Zhang

Our proposed MAF has two distinct contributions: (1) The Hierarchical Domain Feature Alignment (HDFA) module is introduced to minimize the image-level domain disparity, where Scale Reduction Module (SRM) reduces the feature map size without information loss and increases the training efficiency.

Domain Adaptation Knowledge Distillation +2

Paper
Add Code

Benchmark Dataset and Effective Inter-Frame Alignment for Real-World Video Super-Resolution

1 code implementation • 10 Dec 2022 • Ruohao Wang, Xiaohui Liu, Zhilu Zhang, Xiaohe Wu, Chun-Mei Feng, Lei Zhang, WangMeng Zuo

On the other hand, alignment algorithms in existing VSR methods perform poorly for real-world videos, leading to unsatisfactory results.

Optical Flow Estimation Video Super-Resolution

Paper
Code

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

2 code implementations • 3 Dec 2022 • Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang

In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention.

Ranked #1 on Box-supervised Instance Segmentation on PASCAL VOC 2012 val

Box-supervised Instance Segmentation Decoder +1

400

Paper
Code

Inconsistency Ranking-based Noisy Label Detection for High-quality Data

1 code implementation • 1 Dec 2022 • Ruibin Yuan, Hanzhi Yin, Yi Wang, Yifan He, Yushi Ye, Lei Zhang, Zhizheng Wu

We apply this technique to the automatic speaker verification (ASV) task as a proof of concept.

Metric Learning Speaker Recognition +1

Paper
Code

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

1 code implementation • 28 Nov 2022 • Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang

As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.

Ranked #7 on Referring Expression Comprehension on RefCOCO

object-detection Object Detection +4

Paper
Code

Parameter-Efficient Transformer with Hybrid Axial-Attention for Medical Image Segmentation

no code implementations • 17 Nov 2022 • Yiyue Hu, Lei Zhang, Nan Mu, Lei Liu

To this end, we propose a parameter-efficient transformer to explore intrinsic inductive bias via position information for medical image segmentation.

feature selection Image Segmentation +4

Paper
Add Code

A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation

no code implementations • 15 Nov 2022 • Shijia Huang, Feng Li, Hao Zhang, Shilong Liu, Lei Zhang, LiWei Wang

Our mutual supervision contains two directions.

Reference Expression Generation Referring Expression +2

Paper
Add Code

Point-DAE: Denoising Autoencoders for Self-supervised Point Cloud Learning

1 code implementation • 13 Nov 2022 • Yabin Zhang, Jiehong Lin, Ruihuang Li, Kui Jia, Lei Zhang

We also validate the effectiveness of affine transformation corruption with the Transformer backbones, where we decompose the reconstruction of the complete point cloud into the reconstructions of detailed local patches and rough global shape, alleviating the position leakage problem in the reconstruction.

3D Object Detection Decoder +3

Paper
Code

Rethinking the transfer learning for FCN based polyp segmentation in colonoscopy

1 code implementation • 4 Nov 2022 • Yan Wen, Lei Zhang, Xiangli Meng, Xujiong Ye

Besides the complex nature of colonoscopy frames with intrinsic frame formation artefacts such as light reflections and the diversity of polyp types/shapes, the publicly available polyp segmentation training datasets are limited, small and imbalanced.

Segmentation Transfer Learning

Paper
Code

Mining Word Boundaries in Speech as Naturally Annotated Word Segmentation Data

no code implementations • 31 Oct 2022 • Lei Zhang, Zhenghua Li, Shilin Zhou, Chen Gong, Zhefeng Wang, Baoxing Huai, Min Zhang

Inspired by early research on exploring naturally annotated data for Chinese word segmentation (CWS), and also by recent research on integration of speech and text processing, this work for the first time proposes to mine word boundaries from parallel speech/text data.

Chinese Word Segmentation

Paper
Add Code

Mitigating spectral bias for the multiscale operator learning with hierarchical attention

no code implementations • 19 Oct 2022 • Xinliang Liu, Bo Xu, Lei Zhang

Neural operators have emerged as a powerful tool for learning the mapping between infinite-dimensional parameter and solution spaces of partial differential equations (PDEs).

Operator learning

Paper
Add Code

Motion correction in MRI using deep learning and a novel hybrid loss function

1 code implementation • 19 Oct 2022 • Lei Zhang, Xiaoke Wang, Michael Rawson, Radu Balan, Edward H. Herskovits, Elias Melhem, Linda Chang, Ze Wang, Thomas Ernst

Evaluation used simulated T1 and T2-weighted axial, coronal, and sagittal images unseen during training, as well as T1-weighted images with motion artifacts from real scans.

SSIM

Paper
Code

TLDW: Extreme Multimodal Summarisation of News Videos

no code implementations • 16 Oct 2022 • Peggy Tang, Kun Hu, Lei Zhang, Jiebo Luo, Zhiyong Wang

Multimodal summarisation with multimodal output is drawing increasing attention due to the rapid growth of multimedia data.

Sentence

Paper
Add Code

Learning Dual Memory Dictionaries for Blind Face Restoration

1 code implementation • 15 Oct 2022 • Xiaoming Li, Shiguang Zhang, Shangchen Zhou, Lei Zhang, WangMeng Zuo

Generally, it is a challenging and intractable task to improve the photo-realistic performance of blind restoration and adaptively handle the generic and specific restoration scenarios with a single unified model.

Blind Face Restoration

118

Paper
Code

Attention Diversification for Domain Generalization

1 code implementation • 9 Oct 2022 • Rang Meng, Xianfeng Li, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, ShiLiang Pu

Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization are collaborated to reassign appropriate attention to diverse task-related features.

Domain Generalization

Paper
Code

From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution

1 code implementation • 3 Oct 2022 • Xiaoming Li, Chaofeng Chen, Xianhui Lin, WangMeng Zuo, Lei Zhang

Notably, LQ face images, which may have the same degradation process as natural images, can be robustly restored with photo-realistic textures by exploiting their strong structural priors.

Image Generation Image Super-Resolution

105

Paper
Code

Skin Lesion Recognition with Class-Hierarchy Regularized Hyperbolic Embeddings

no code implementations • 13 Sep 2022 • Zhen Yu, Toan Nguyen, Yaniv Gal, Lie Ju, Shekhar S. Chandra, Lei Zhang, Paul Bonnington, Victoria Mar, Zhiyong Wang, ZongYuan Ge

Accordingly, the learned prototypes preserve the semantic class relations in the embedding space and we can predict the label of an image by assigning its feature to the nearest hyperbolic class prototype.

Paper
Add Code

Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision

no code implementations • 6 Sep 2022 • Lei Zhang, Heung-Yeung Shum

This paper revisits the principle of uniform convergence in statistical learning, discusses how it acts as the foundation behind machine learning, and attempts to gain a better understanding of the essential problem that current deep learning algorithms are solving.

Representation Learning

Paper
Add Code

Recurrent LSTM-based UAV Trajectory Prediction with ADS-B Information

no code implementations • 1 Sep 2022 • Yifan Zhang, Ziye Jia, Chao Dong, Yuntian Liu, Lei Zhang, Qihui Wu

It is noted that the recurrent neural network (RNN) is available for the UAV trajectory prediction, in which the long short-term memory (LSTM) is specialized in dealing with the time-series data.

Time Series Analysis Trajectory Prediction

Paper
Add Code

Generative Action Description Prompts for Skeleton-based Action Recognition

3 code implementations • ICCV 2023 • Wangmeng Xiang, Chao Li, Yuxuan Zhou, Biao Wang, Lei Zhang

More specifically, we employ a pre-trained large-scale language model as the knowledge engine to automatically generate text descriptions for body parts movements of actions, and propose a multi-modal training scheme by utilizing the text encoder to generate feature vectors for different body parts and supervise the skeleton encoder for action representation learning.

Ranked #5 on Skeleton Based Action Recognition on N-UCLA

Action Recognition Language Modelling +2

Paper
Code

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

1 code implementation • 27 Jul 2022 • Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Xian-Sheng Hua, Lei Zhang

For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers on video data will bring heavy computation and memory burdens due to the largely increased number of patches and the quadratic complexity of self-attention computation.

Ranked #9 on Action Recognition on Diving-48

Action Classification Action Recognition

Paper
Code

Spatial-Temporal Federated Learning for Lifelong Person Re-identification on Distributed Edges

1 code implementation • 24 Jul 2022 • Lei Zhang, Guanyu Gao, Huaizheng Zhang

Then, the knowledge learnt from edge clients will be aggregated by a centralized parameter server, where the knowledge will be selectively and attentively distilled along the spatial and temporal dimensions with carefully designed mechanisms.

Continual Learning Federated Learning +2

Paper
Code

Auto Machine Learning for Medical Image Analysis by Unifying the Search on Data Augmentation and Neural Architecture

no code implementations • 21 Jul 2022 • Jianwei Zhang, Dong Li, Lituan Wang, Lei Zhang

To address the problem, an improved augmentation search strategy, named Augmented Density Matching, was proposed by randomly sampling policies from a prior distribution for training.

AutoML Data Augmentation

Paper
Add Code

A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration

1 code implementation • 21 Jul 2022 • Ming Liu, Yuxiang Wei, Xiaohe Wu, WangMeng Zuo, Lei Zhang

Generative adversarial networks (GANs) have drawn enormous attention due to the simple yet effective training mechanism and superior image generation quality.

Image Generation Image Restoration

Paper
Code

Box-supervised Instance Segmentation with Level Set Evolution

1 code implementation • 19 Jul 2022 • Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Xiansheng Hua, Lei Zhang

A simple mask supervised SOLOv2 model is adapted to predict the instance-aware mask map as the level set for each instance.

Box-supervised Instance Segmentation Segmentation

182

Paper
Code

Mind the Gap: Polishing Pseudo labels for Accurate Semi-supervised Object Detection

1 code implementation • 17 Jul 2022 • Lei Zhang, Yuxuan Sun, Wei Wei

Instead of directly exploiting the pseudo labels produced by the teacher detector, we take the first attempt at reducing their deviation from ground truth using dual polishing learning, where two differently structured polishing networks are elaborately developed and trained using synthesized paired pseudo labels and the corresponding ground truth for categories and bounding boxes on the given annotated objects, respectively.

Ranked #10 on Semi-Supervised Object Detection on COCO 5% labeled data

object-detection Object Detection +2

Paper
Code

MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources

1 code implementation • 15 Jul 2022 • Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang

These algorithms are usually achieved by mapping the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), which is called MISO.

Paper
Code

E2FIF: Push the limit of Binarized Deep Imagery Super-resolution using End-to-end Full-precision Information Flow

1 code implementation • 14 Jul 2022 • Zhiqiang Lang, Chongxing Song, Lei Zhang, Wei Wei

Binary neural network (BNN) provides a promising solution to deploy parameter-intensive deep single image super-resolution (SISR) models onto real devices with limited storage and computational resources.

Image Super-Resolution

Paper
Code

Domain Gap Estimation for Source Free Unsupervised Domain Adaptation with Many Classifiers

no code implementations • 12 Jul 2022 • Ziyang Zong, Jun He, Lei Zhang, Hai Huan

However, for source-free UDA, the source domain data cannot be accessed during adaptation, which poses a great challenge for measuring the domain gap.

Unsupervised Domain Adaptation

Paper
Add Code

Learning High-quality Proposals for Acne Detection

1 code implementation • 8 Jul 2022 • Jianwei Zhang, Lei Zhang, Junyou Wang, Xin Wei, Jiaqi Li, Xian Jiang, Dan Du

Acne detection is crucial for interpretative diagnosis and precise treatment of skin disease.

Classification Region Proposal +2

Paper
Code
