Search Results for author: Zhen Li

Found 196 papers, 92 papers with code

Don’t Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

1 code implementation NAACL 2022 Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu

Such a training objective is sub-optimal when the target sequence is not perfect, e.g., when the target sequence is corrupted with noise, or when only weak sequence supervision is available.

Machine Translation Style Transfer +2

Reciprocal Learning of Knowledge Retriever and Response Ranker for Knowledge-Grounded Conversations

no code implementations COLING 2022 Jiazhan Feng, Chongyang Tao, Zhen Li, Chang Liu, Tao Shen, Dongyan Zhao

In this paper, we propose a reciprocal learning approach to jointly optimize a knowledge retriever and a response ranker for knowledge-grounded response retrieval without ground-truth knowledge labels.

Retrieval

ETSM: Automating Dissection Trajectory Suggestion and Confidence Map-Based Safety Margin Prediction for Robot-assisted Endoscopic Submucosal Dissection

no code implementations 28 Nov 2024 Mengya Xu, Wenjin Mo, Guankun Wang, Huxin Gao, An Wang, Long Bai, Chaoyang Lyu, Xiaoxiao Yang, Zhen Li, Hongliang Ren

Robot-assisted Endoscopic Submucosal Dissection (ESD) improves the surgical procedure by providing a more comprehensive view through advanced robotic instruments and bimanual operation, thereby enhancing dissection efficiency and accuracy.

Decision Making regression +1

PDZSeg: Adapting the Foundation Model for Dissection Zone Segmentation with Visual Prompts in Robot-assisted Endoscopic Submucosal Dissection

no code implementations 27 Nov 2024 Mengya Xu, Wenjin Mo, Guankun Wang, Huxin Gao, An Wang, Zhen Li, Xiaoxiao Yang, Hongliang Ren

Conclusion: The PDZSeg model effectively utilizes visual prompts to enhance segmentation performance and user experience, supported by the novel ESD-DZSeg dataset as a benchmark for dissection zone segmentation in ESD.

Segmentation

D$^2$-World: An Efficient World Model through Decoupled Dynamic Flow

1 code implementation 26 Nov 2024 Haiming Zhang, Xu Yan, Ying Xue, Zixuan Guo, Shuguang Cui, Zhen Li, Bingbing Liu

This technical report summarizes the second-place solution for the Predictive World Model Challenge held at the CVPR-2024 Workshop on Foundation Models for Autonomous Systems.

Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence

no code implementations 25 Nov 2024 Yuncheng Jiang, Chun-Mei Feng, Jinke Ren, Jun Wei, Zixun Zhang, Yiwen Hu, Yunbi Liu, Rui Sun, Xuemei Tang, Juan Du, Xiang Wan, Yong Xu, Bo Du, Xin Gao, Guangyu Wang, Shaohua Zhou, Shuguang Cui, Rick Siow Mong Goh, Yong Liu, Zhen Li

Notably, UltraFedFM surpasses the diagnostic accuracy of mid-level ultrasonographers and matches the performance of expert-level sonographers in the joint diagnosis of 8 common systemic diseases.

Federated Learning Lesion Segmentation +1

VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

no code implementations 22 Nov 2024 Haiming Zhang, Wending Zhou, Yiyao Zhu, Xu Yan, Jiantao Gao, Dongfeng Bai, Yingjie Cai, Bingbing Liu, Shuguang Cui, Zhen Li

This paper introduces VisionPAD, a novel self-supervised pre-training paradigm designed for vision-centric algorithms in autonomous driving.

3D Object Detection Autonomous Driving +2

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection

1 code implementation 10 Oct 2024 Guankun Wang, Han Xiao, Huxin Gao, Renrui Zhang, Long Bai, Xiaoxiao Yang, Zhen Li, Hongsheng Li, Hongliang Ren

In this paper, we design a hierarchical decomposition of ESD motion granularity and introduce a multi-level surgical motion dataset (CoPESD) for training LVLMs as the robotic Co-Pilot of Endoscopic Submucosal Dissection.

Instruction Following

Emu3: Next-Token Prediction is All You Need

2 code implementations 27 Sep 2024 Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan Sun, Yufeng Cui, Jinsheng Wang, Fan Zhang, Yueze Wang, Zhen Li, Qiying Yu, Yingli Zhao, Yulong Ao, Xuebin Min, Tao Li, Boya Wu, Bo Zhao, BoWen Zhang, Liangdong Wang, Guang Liu, Zheqi He, Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang

While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches (e.g., CLIP combined with LLMs).

Visual Question Answering

A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

no code implementations 5 Sep 2024 Zhen Li, Weikai Yang, Jun Yuan, Jing Wu, Changjian Chen, Yao Ming, Fan Yang, HUI ZHANG, Shixia Liu

To ensure the inclusion of anomalous rules, we develop an anomaly-biased model reduction method to prioritize these rules at each hierarchical level.

Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection

no code implementations 26 Aug 2024 Yuncheng Jiang, Zixun Zhang, Jun Wei, Chun-Mei Feng, Guanbin Li, Xiang Wan, Shuguang Cui, Zhen Li

The teacher network aims at extracting temporal contexts from multiple frames and transferring them to the student network, and the student network is an image-based model dedicated to fast prediction in inference.

Knowledge Distillation Lesion Detection

Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development

no code implementations 19 Aug 2024 Yuncheng Jiang, Yiwen Hu, Zixun Zhang, Jun Wei, Chun-Mei Feng, Xuemei Tang, Xiang Wan, Yong Liu, Shuguang Cui, Zhen Li

Based on this dataset, we further introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).

Segmentation

Improve ROI with Causal Learning and Conformal Prediction

no code implementations 1 Jul 2024 Meng Ai, Zhuo Chen, Jibin Wang, Jing Shang, Tao Tao, Zhen Li

However, the DRP model confronts issues like covariate shift and insufficient training data, hindering its real-world effectiveness.

Conformal Prediction Decision Making +1

DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation

1 code implementation 23 Jun 2024 Yueru Luo, Shuguang Cui, Zhen Li

3) A 3D dual-view deformable attention mechanism, which aggregates discriminative features from both PV and BEV spaces into queries for accurate 3D lane detection.

3D Lane Detection Autonomous Driving

GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

1 code implementation 16 Jun 2024 Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content, especially dynamic and sequential content.

Fully Test-Time Adaptation for Monocular 3D Object Detection

no code implementations 30 May 2024 Hongbin Lin, Yifan Zhang, Shuaicheng Niu, Shuguang Cui, Zhen Li

However, applying this paradigm in Mono 3Det poses significant challenges due to OOD test data causing a remarkable decline in object detection scores.

Monocular 3D Object Detection object-detection +1

SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain

no code implementations 27 May 2024 Butian Xiong, Xiaoyu Ye, Tze Ho Elden Tse, Kai Han, Shuguang Cui, Zhen Li

We provide extensive experiments on publicly available large-scale scene reconstruction datasets with highly accurate point clouds as ground truth and our novel dataset.

3D geometry

SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge

no code implementations 23 May 2024 Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu, Ping Luo, Yu Qiao, Kaipeng Zhang

In this paper, we propose a plug-and-play framework, dubbed SearchLVLMs, for augmenting existing LVLMs in handling visual question answering (VQA) about up-to-date knowledge.

Question Answering RAG +1

SEGAN: semi-supervised learning approach for missing data imputation

1 code implementation 21 May 2024 Xiaohua Pan, Weifeng Wu, Peiran Liu, Zhen Li, Peng Lu, Peijian Cao, Jianfeng Zhang, Xianfei Qiu, Yangyang Wu

In addition, the SE-GAN model introduces a missing hint matrix to allow the discriminator to more effectively distinguish between known data and data filled by the generator.

Imputation

Generalize Polyp Segmentation via Inpainting across Diverse Backgrounds and Pseudo-Mask Refinement

1 code implementation 21 May 2024 Jiajian Ma, Fangqi Lu, Silin Huang, Song Wu, Zhen Li

Inpainting lesions within different normal backgrounds is a potential method of addressing the generalization problem, which is crucial for polyp segmentation models.

Data Augmentation

Magnetic-Guided Flexible Origami Robot toward Long-Term Phototherapy of H. pylori in the Stomach

no code implementations 12 May 2024 Sishen Yuan, Baijia Liang, Po Wa Wong, Mingjing Xu, Chi Hsuan Li, Zhen Li, Hongliang Ren

Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population.

Instance-free Text to Point Cloud Localization with Relative Position Awareness

no code implementations 27 Apr 2024 Lichao Wang, Zhihao Yuan, Jinke Ren, Shuguang Cui, Zhen Li

In this paper, we address two key limitations of existing approaches: 1) their reliance on ground-truth instances as input; and 2) their neglect of the relative positions among potential instances.

Position Visual Place Recognition

Extracting Clean and Balanced Subset for Noisy Long-tailed Classification

no code implementations 10 Apr 2024 Zhuo Li, He Zhao, Zhen Li, Tongliang Liu, Dandan Guo, Xiang Wan

To solve the joint issue of long-tailed distribution and label noise, most previous works usually aim to design a noise detector to distinguish the noisy and clean samples.

Pseudo Label

VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs

no code implementations 9 Apr 2024 Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Shaoling Dong, Xing Zhou, Wenbin Jiang

Automatically generating UI code from webpage design visions can significantly alleviate the burden of developers, enabling beginner developers or designers to directly generate Web pages from design diagrams.

Code Generation

GauU-Scene V2: Assessing the Reliability of Image-Based Metrics with Expansive Lidar Image Dataset Using 3DGS and NeRF

no code implementations 7 Apr 2024 Butian Xiong, Nanjun Zheng, Junhua Liu, Zhen Li

The experimental results on our multimodal dataset highlight the unreliability of current image-based metrics and reveal significant drawbacks in geometric reconstruction using the current Gaussian Splatting-based method, further illustrating the necessity of our dataset for assessing geometry reconstruction tasks.

SSIM

Random Walk in Random Permutation Set Theory

no code implementations 5 Apr 2024 Jiefeng Zhou, Zhen Li, Yong Deng

Random walk is an explainable approach for modeling natural processes at the molecular level.

Learnable WSN Deployment of Evidential Collaborative Sensing Model

no code implementations 23 Mar 2024 Ruijie Liu, Tianxiang Zhan, Zhen Li, Yong Deng

A learnable sensor deployment network (LSDNet), considering both sensor contribution and detection capability, is proposed for achieving the optimal deployment of WSNs.

Bridging scales in multiscale bubble growth dynamics with correlated fluctuations using neural operator learning

no code implementations 20 Mar 2024 Minglei Lu, Chensen Lin, Martian Maxey, George Karniadakis, Zhen Li

In order to bridge the gap between microscale stochastic fluid models and continuum-based fluid models for bubble dynamics, we develop a composite neural operator model to unify the analysis of nonlinear bubble dynamics across microscale and macroscale regimes by integrating a many-body dissipative particle dynamics (mDPD) model with a continuum-based Rayleigh-Plesset (RP) model through a novel neural network architecture, which consists of a deep operator network for learning the mean behavior of bubble growth subject to pressure variations and a long short-term memory network for learning the statistical features of correlated fluctuations in microscale bubble dynamics.

Operator learning

Limit of the Maximum Random Permutation Set Entropy

no code implementations 10 Mar 2024 Jiefeng Zhou, Zhen Li, Kang Hao Cheong, Yong Deng

In this paper, a new concept, the envelope of entropy function, is defined.

Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

no code implementations 29 Feb 2024 Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Zichao Yang, Eric P. Xing, Zhiting Hu

The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images.

Decoder Denoising

Random Graph Set and Evidence Pattern Reasoning Model

no code implementations 20 Feb 2024 Tianxiang Zhan, Zhen Li, Yong Deng

Therefore, Random Graph Sets (RGS) were proposed to model complex relationships and represent more event types.

Decision Making

GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting

no code implementations 25 Jan 2024 Butian Xiong, Zhuo Li, Zhen Li

We introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset.

3D Reconstruction

Leveraging Large Language Models for NLG Evaluation: Advances and Challenges

1 code implementation 13 Jan 2024 Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, Shuai Ma

In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance.

nlg evaluation Specificity +1

VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

1 code implementation 19 Dec 2023 Chun-Mei Feng, Yang Bai, Tao Luo, Zhen Li, Salman Khan, WangMeng Zuo, Xinxing Xu, Rick Siow Mong Goh, Yong Liu

By feeding the retrieved image and question to the VQA model, one can find the images inconsistent with the relative caption when the answer given by VQA is inconsistent with the answer in the QA pair.

Image Retrieval Question Answering +2

CrossBind: Collaborative Cross-Modal Identification of Protein Nucleic-Acid-Binding Residues

1 code implementation 19 Dec 2023 Linglin Jing, Sheng Xu, Yifan Wang, Yuzhe Zhou, Tao Shen, Zhigang Ji, Hui Fang, Zhen Li, Siqi Sun

Accurate identification of protein nucleic-acid-binding residues poses a significant challenge with important implications for various biological processes and drug design.

Contrastive Learning Protein Language Model

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

no code implementations 19 Dec 2023 Haiming Zhang, Xu Yan, Dongfeng Bai, Jiantao Gao, Pan Wang, Bingbing Liu, Shuguang Cui, Zhen Li

3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.

Knowledge Distillation

GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance

no code implementations 12 Dec 2023 Haiming Zhang, Zhihao Yuan, Chaoda Zheng, Xu Yan, Baoyuan Wang, Guanbin Li, Song Wu, Shuguang Cui, Zhen Li

Our proposed GSmoothFace model mainly consists of the Audio to Expression Prediction (A2EP) module and the Target Adaptive Face Translation (TAFT) module.

Face Model Talking Face Generation

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

1 code implementation CVPR 2024 Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li

Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules.

3D visual grounding Object

ScribblePolyp: Scribble-Supervised Polyp Segmentation through Dual Consistency Alignment

no code implementations 9 Nov 2023 Zixun Zhang, Yuncheng Jiang, Jun Wei, Hannah Cui, Zhen Li

The second branch leverages affinity propagation to refine predictions into a soft version, extending additional supervision to unlabeled pixels.

Variational Quantum Linear Solver-based Combination Rules in Dempster–Shafer Theory

1 code implementation journal 2023 Hao Luo, Qianli Zhou, Zhen Li, Yong Deng

Dempster–Shafer Theory (DST), as a method of handling uncertain information, is widely used in decision-making and information fusion.

SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

1 code implementation ICCV 2023 Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang

In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance.

3D Object Detection object-detection

ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models

1 code implementation 3 Sep 2023 Yuhao Du, Yuncheng Jiang, Shuangyi Tan, Xusheng Wu, Qi Dou, Zhen Li, Guanbin Li, Xiang Wan

Colonoscopy analysis, particularly automatic polyp segmentation and detection, is essential for assisting clinical diagnosis and treatment.

Segmentation

LATR: 3D Lane Detection from Monocular Images with Transformer

1 code implementation ICCV 2023 Yueru Luo, Chaoda Zheng, Xu Yan, Tang Kun, Chao Zheng, Shuguang Cui, Zhen Li

On the one hand, each query is generated based on 2D lane-aware features and adopts a hybrid embedding to enhance lane information.

3D Lane Detection Autonomous Driving

SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training

1 code implementation ICCV 2023 Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, Liang Lin

Moreover, these methods ignore how to utilize the fine-grained dependencies among different skeleton joints to pre-train an efficient skeleton sequence learning model that can generalize well across different datasets.

Action Recognition Decoder +3

YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection

1 code implementation 6 Jun 2023 Yuncheng Jiang, Zixun Zhang, Ruimao Zhang, Guanbin Li, Shuguang Cui, Zhen Li

YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations.

Contrastive Learning

Differential Convolutional Fuzzy Time Series Forecasting

no code implementations 15 May 2023 Tianxiang Zhan, Yuanpeng He, Yong Deng, Zhen Li

Thanks to the learning ability of the neural network, the length of the fuzzy rules established in FTSF is expanded to an arbitrary length that experts cannot handle with an expert system.

Time Series Time Series Forecasting

Hierarchical Weight Averaging for Deep Neural Networks

no code implementations 23 Apr 2023 Xiaozhe Gu, Zixun Zhang, Yuncheng Jiang, Tao Luo, Ruimao Zhang, Shuguang Cui, Zhen Li

Despite the simplicity, stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs).

Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation

no code implementations 20 Apr 2023 Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Zhen Li

Existing techniques often attempt to transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model using elaborate techniques, which often requires transcription as extra input during training.

Knowledge Distillation Machine Translation +3

Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains

no code implementations CVPR 2023 Jie Yang, Chaoqun Wang, Zhen Li, Junle Wang, Ruimao Zhang

This paper presents Scalable Semantic Transfer (SST), a novel training paradigm, to explore how to leverage the mutual benefits of the data from different label domains (i.e., various levels of label granularity) to train a powerful human parsing network.

Human Parsing Representation Learning

Fair-CDA: Continuous and Directional Augmentation for Group Fairness

no code implementations 1 Apr 2023 Rui Sun, Fengwei Zhou, Zhenhua Dong, Chuanlong Xie, Lanqing Hong, Jiawei Li, Rui Zhang, Zhen Li, Zhenguo Li

By adjusting the perturbation strength in the direction of the paths, our proposed augmentation is controllable and auditable.

Data Augmentation Disentanglement +1

Deep neural operator for learning transient response of interpenetrating phase composites subject to dynamic loading

no code implementations 30 Mar 2023 Minglei Lu, Ali Mohammadi, Zhaoxu Meng, Xuhui Meng, Gang Li, Zhen Li

After offline training, the DNO model can act as a surrogate for physics-based FEA to predict the transient mechanical response, in terms of reaction force and stress distribution, of the IPCs to various strain loads in one second at an accuracy of 98%.

Incremental Learning

An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

1 code implementation 21 Mar 2023 Chaoda Zheng, Xu Yan, Haiming Zhang, Baoyuan Wang, Shenghui Cheng, Shuguang Cui, Zhen Li

Due to the motion-centric nature, our method shows its impressive generalizability with limited training labels and provides good differentiability for end-to-end cycle training.

3D Single Object Tracking Autonomous Driving +3

SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution

1 code implementation ICCV 2023 Yupeng Zhou, Zhen Li, Chun-Le Guo, Li Liu, Ming-Ming Cheng, Qibin Hou

Without any bells and whistles, we show that our SRFormer achieves a 33.86 dB PSNR score on the Urban100 dataset, which is 0.46 dB higher than that of SwinIR but uses fewer parameters and computations.

Image Super-Resolution

Adaptive Context Selection for Polyp Segmentation

1 code implementation 12 Jan 2023 Ruifei Zhang, Guanbin Li, Zhen Li, Shuguang Cui, Dahong Qian, Yizhou Yu

To tackle these issues, we propose an adaptive context selection based encoder-decoder framework which is composed of a Local Context Attention (LCA) module, a Global Context Module (GCM), and an Adaptive Selection Module (ASM).

Decoder Segmentation

Benchmarking the Robustness of LiDAR Semantic Segmentation Models

1 code implementation 3 Jan 2023 Xu Yan, Chaoda Zheng, Ying Xue, Zhen Li, Shuguang Cui, Dengxin Dai

In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions.

Autonomous Driving Benchmarking +3

Learning Transformation-Predictive Representations for Detection and Description of Local Features

no code implementations CVPR 2023 ZiHao Wang, Chunxu Wu, Yifei Yang, Zhen Li

The task of key-points detection and description is to estimate the stable location and discriminative representation of local features, which is essential for image matching.

Contrastive Learning

RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels

no code implementations ICCV 2023 Ziyi Zhang, Weikai Chen, Chaowei Fang, Zhen Li, Lechao Chen, Liang Lin, Guanbin Li

Confidence-wise, we propose a novel sample selection strategy based on confidence representation voting instead of the widely-used small-loss criterion.

Learning with noisy labels Representation Learning +1

BEV@DC: Bird's-Eye View Assisted Training for Depth Completion

no code implementations CVPR 2023 Wending Zhou, Xu Yan, Yinghong Liao, Yuankai Lin, Jin Huang, Gangming Zhao, Shuguang Cui, Zhen Li

In practice, the proposed BEV@DC model comprehensively takes advantage of LiDARs with rich geometric details in training, employing an enhanced depth completion manner in inference, which takes only images (RGB and depth) as input.

Autonomous Driving Depth Completion

Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language

1 code implementation CVPR 2023 Chuanhao Li, Zhen Li, Chenchen Jing, Yunde Jia, Yuwei Wu

Compositional generalization is critical to simulate the compositional capability of humans, and has received much attention in the vision-and-language (V&L) community.

Question Answering Self-Supervised Learning +2

DOSnet as a Non-Black-Box PDE Solver: When Deep Learning Meets Operator Splitting

no code implementations 11 Dec 2022 Yuan Lan, Zhen Li, Jie Sun, Yang Xiang

Deep neural networks (DNNs) recently emerged as a promising tool for analyzing and solving complex differential equations arising in science and engineering applications.

BoxPolyp: Boost Generalized Polyp Segmentation Using Extra Coarse Bounding Box Annotations

1 code implementation 7 Dec 2022 Jun Wei, Yiwen Hu, Guanbin Li, Shuguang Cui, S Kevin Zhou, Zhen Li

In practice, box annotations are applied to alleviate the over-fitting issue of previous polyp segmentation models, which generate fine-grained polyp area through the iterative boosted segmentation model.

Segmentation

Geometry-Aware Network for Domain Adaptive Semantic Segmentation

no code implementations 2 Dec 2022 Yinghong Liao, Wending Zhou, Xu Yan, Shuguang Cui, Yizhou Yu, Zhen Li

Moreover, to improve the 2D classifier in the target domain, we perform domain-invariant geometric adaptation from source to target and unify the 2D semantic and 3D geometric segmentation results in two domains.

Depth Estimation Depth Prediction +4

Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes

1 code implementation CVPR 2023 Zhen Li, Lingli Wang, Mofang Cheng, Cihui Pan, Jiaqi Yang

We present an efficient multi-view inverse rendering method for large-scale real-world indoor scenes that reconstructs global illumination and physically-reasonable SVBRDFs.

Inverse Rendering Mixed Reality +3

Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning

2 code implementations 12 Nov 2022 Ziyi Zhang, Weikai Chen, Hui Cheng, Zhen Li, Siyuan Li, Liang Lin, Guanbin Li

We investigate a practical domain adaptation task, called source-free domain adaptation (SFUDA), where the source-pretrained model is adapted to the target domain without access to the source data.

Contrastive Learning Source-Free Domain Adaptation

Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis

2 code implementations 9 Oct 2022 Xu Yan, Heshen Zhan, Chaoda Zheng, Jiantao Gao, Ruimao Zhang, Shuguang Cui, Zhen Li

Specifically, this paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy, which utilizes view-images, i.e., rendered or projected 2D images of the 3D object, to boost point cloud analysis.

3D Point Cloud Classification Knowledge Distillation +1

Low Error-Rate Approximate Multiplier Design for DNNs with Hardware-Driven Co-Optimization

no code implementations 8 Oct 2022 Yao Lu, Jide Zhang, Su Zheng, Zhen Li, Lingli Wang

In this paper, two approximate 3*3 multipliers are proposed and the synthesis results of the ASAP-7nm process library justify that they can reduce the area by 31.38% and 36.17%, and the power consumption by 36.73% and 35.66% compared with the exact multiplier, respectively.

Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation

no code implementations 7 Oct 2022 Liping Tang, Zhen Li, ZhiQuan Luo, Helen Meng

Further experiments on the downstream task of Cross-Lingual Natural Language Inference show that the proposed model achieves significant performance improvement for distant language pairs in downstream tasks compared to state-of-the-art adversarial and non-adversarial models.

Cross-Lingual Natural Language Inference

Designing An Illumination-Aware Network for Deep Image Relighting

1 code implementation 21 Jul 2022 Zuo-Liang Zhu, Zhen Li, Rui-Xun Zhang, Chun-Le Guo, Ming-Ming Cheng

Lighting is a determining factor in photography that affects the style, expression of emotion, and even quality of images.

Image Relighting Image-to-Image Translation

2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

1 code implementation 10 Jul 2022 Xu Yan, Jiantao Gao, Chaoda Zheng, Chao Zheng, Ruimao Zhang, Shenghui Cui, Zhen Li

As camera and LiDAR sensors capture complementary information used in autonomous driving, great efforts have been made to develop semantic segmentation algorithms through multi-modality data fusion.

Autonomous Driving Knowledge Distillation +3

Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases

no code implementations 5 Jul 2022 Zhihao Yuan, Xu Yan, Zhuo Li, Xuhao Li, Yao Guo, Shuguang Cui, Zhen Li

Recent progress in 3D scene understanding has explored visual grounding (3DVG) to localize a target object through a language description.

Object Representation Learning +3

Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency

1 code implementation 23 Jun 2022 Weijie Ma, Ye Zhu, Ruimao Zhang, Jie Yang, Yiwen Hu, Zhen Li, Li Xiang

By aligning the class tokens and spatial attention maps of paired NBI and WL images at different levels, the Transformer achieves the ability to keep both global and local representation consistency for the above two modalities.

Classification Image Classification

A Unified Understanding of Deep NLP Models for Text Classification

no code implementations 19 Jun 2022 Zhen Li, Xiting Wang, Weikai Yang, Jing Wu, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun, HUI ZHANG, Shixia Liu

The rapid development of deep natural language processing (NLP) models for text classification has led to an urgent need for a unified understanding of these models proposed individually.

text-classification Text Classification

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

2 code implementations 16 Jun 2022 Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan, Ping Luo

Constrained by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods.

Image Segmentation Medical Image Segmentation +3

Universality and approximation bounds for echo state networks with random weights

no code implementations 12 Jun 2022 Zhen Li, Yunfei Yang

We study the uniform approximation of echo state networks with randomly generated internal weights.

Contextual Bandits with Knapsacks for a Conversion Model

no code implementations 1 Jun 2022 Zhen Li, Gilles Stoltz

At each round, given the stochastic i.i.d. context $\mathbf{x}_t$ and the arm picked $a_t$ (corresponding, e.g., to a discount level), a customer conversion may be obtained, in which case a reward $r(a_t,\mathbf{x}_t)$ is gained and vector costs $c(a_t,\mathbf{x}_t)$ are suffered (corresponding, e.g., to losses of earnings).

Multi-Armed Bandits

6GAN: IPv6 Multi-Pattern Target Generation via Generative Adversarial Nets with Reinforcement Learning

1 code implementation 21 Apr 2022 Tianyu Cui, Gaopeng Gou, Gang Xiong, Chang Liu, Peipei Fu, Zhen Li

6GAN forces multiple generators to train with a multi-class discriminator and an alias detector to generate non-aliased active targets with different addressing pattern types.

Decision Making reinforcement-learning +2

Towards An End-to-End Framework for Flow-Guided Video Inpainting

2 code implementations CVPR 2022 Zhen Li, Cheng-Ze Lu, Jianhua Qin, Chun-Le Guo, Ming-Ming Cheng

Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories.

Hallucination Optical Flow Estimation +2

Graph Enhanced Contrastive Learning for Radiology Findings Summarization

1 code implementation ACL 2022 Jinpeng Hu, Zhuo Li, Zhihong Chen, Zhen Li, Xiang Wan, Tsung-Hui Chang

To address the limitation, we propose a unified framework for exploiting both extra knowledge and the original findings in an integrated way so that the critical information (i.e., key words and their relations) can be extracted in an appropriate way to facilitate impression generation.

Contrastive Learning

ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification

1 code implementation 13 Feb 2022 Xinjie Lin, Gang Xiong, Gaopeng Gou, Zhen Li, Junzheng Shi, Jing Yu

In this paper, we propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representation from large-scale unlabeled data.

Classification Management +1

RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

1 code implementation 12 Feb 2022 Zhen Li, Guenevere Chen, Chen Chen, Yayi Zou, Shouhuai Xu

Recent studies show that current source code authorship attribution methods can be compromised by attackers exploiting adversarial examples and coding style manipulation.

Authorship Attribution Bug fixing +2

HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks

2 code implementations 20 Jan 2022 Su Zheng, Zhen Li, Yao Lu, Jingbo Gao, Jide Zhang, Lingli Wang

We propose an optimization method for the automatic design of approximate multipliers, which minimizes the average error according to the operand distributions.

Quantization Vocal Bursts Intensity Prediction

Heuristic Search for Rank Aggregation with Application to Label Ranking

no code implementations 11 Jan 2022 Yangming Zhou, Jin-Kao Hao, Zhen Li, Fred Glover

Rank aggregation aims to combine the preference rankings of a number of alternatives from different voters into a single consensus ranking.

PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images

1 code implementation CVPR 2022 Zhen Li, Lingli Wang, Xiang Huang, Cihui Pan, Jiaqi Yang

In this paper, we present PhyIR, a neural inverse rendering method with a more complete SVBRDF representation and a physics-based in-network rendering layer, which can handle complex material and incorporate physical constraints by re-rendering realistic and detailed specular reflectance.

Inverse Rendering

Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation

1 code implementation 22 Dec 2021 Xu Yan, Zhihao Yuan, Yuhao Du, Yinghong Liao, Yao Guo, Zhen Li, Shuguang Cui

To tackle this problem, we propose CLEVR3D, a large-scale VQA-3D dataset consisting of 171K questions from 8,771 3D scenes.

Common Sense Reasoning Question Answering +3

CO2Sum: Contrastive Learning for Factual-Consistent Abstractive Summarization

no code implementations 2 Dec 2021 Wei Liu, Huanqin Wu, Wenjing Mu, Zhen Li, Tao Chen, Dan Nie

We propose CO2Sum (Contrastive for Consistency), a contrastive learning scheme that can be easily applied on sequence-to-sequence models for factual-consistent abstractive summarization, proving that the model can be fact-aware without modifying the architecture.

Abstractive Text Summarization Contrastive Learning +1

Active Learning for Event Extraction with Memory-based Loss Prediction Model

no code implementations 26 Nov 2021 Shirong Shen, Zhen Li, Guilin Qi

During the selection process, we use an internal-external sample loss ranking method to evaluate the sample importance by using local information.

Active Learning Event Extraction

Rapid Assessments of Light-Duty Gasoline Vehicle Emissions Using On-Road Remote Sensing and Machine Learning

no code implementations 1 Oct 2021 Yan Xia, Linhui Jiang, Lu Wang, Xue Chen, Jianjie Ye, Tangyan Hou, Liqiang Wang, Yibo Zhang, Mengying Li, Zhen Li, Zhe Song, Yaping Jiang, Weiping Liu, Pengfei Li, Daniel Rosenfeld, John H. Seinfeld, Shaocai Yu

Our results show that the ORRS measurements, assisted by the machine-learning-based ensemble model developed here, can realize day-to-day supervision of on-road vehicle-specific emissions.

Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds

2 code implementations ICCV 2021 Chaoda Zheng, Xu Yan, Jiantao Gao, Weibing Zhao, Wei Zhang, Zhen Li, Shuguang Cui

Current 3D single object tracking approaches track the target based on a feature comparison between the target template and the search area.

3D Single Object Tracking Object +1

Colorectal Polyp Classification from White-light Colonoscopy Images via Domain Alignment

no code implementations 5 Aug 2021 Qin Wang, Hui Che, Weizhen Ding, Li Xiang, Guanbin Li, Zhen Li, Shuguang Cui

Thus, we propose a novel framework based on a teacher-student architecture for the accurate colorectal polyp classification (CPC) through directly using white-light (WL) colonoscopy images in the examination.

Contrastive Learning

Towards Making Deep Learning-based Vulnerability Detectors Robust

1 code implementation 2 Aug 2021 Zhen Li, Jing Tang, Deqing Zou, Qian Chen, Shouhuai Xu, Chao Zhang, Yichen Li, Hai Jin

Automatically detecting software vulnerabilities in source code is an important problem that has attracted much attention.

Deep Learning

Shallow Feature Matters for Weakly Supervised Object Localization

1 code implementation CVPR 2021 Jun Wei, Qin Wang, Zhen Li, Sheng Wang, S. Kevin Zhou, Shuguang Cui

In practice, our SPOL model first generates the CAMs through a novel element-wise multiplication of shallow and deep feature maps, which filters the background noise and generates sharper boundaries robustly.

Object Pseudo Label +1

Inverse Problem of Nonlinear Schrödinger Equation as Learning of Convolutional Neural Network

no code implementations 19 Jul 2021 Yiran Wang, Zhen Li

In this work, we use an explainable convolutional neural network (NLS-Net) to solve an inverse problem of the nonlinear Schrödinger equation, which is widely used in fiber-optic communications.

Deep Learning

Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

1 code implementation 29 Jun 2021 Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu

Such training objective is sub-optimal when the target sequence is not perfect, e.g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available.

Machine Translation Style Transfer +3

Multi-Compound Transformer for Accurate Biomedical Image Segmentation

1 code implementation 28 Jun 2021 Yuanfeng Ji, Ruimao Zhang, Huijie Wang, Zhen Li, Lingyun Wu, Shaoting Zhang, Ping Luo

The recent vision transformer (i.e., for image classification) learns non-local attentive interaction of different patch tokens.

Image Classification Image Segmentation +2

An error analysis of generative adversarial networks for learning distributions

no code implementations 27 May 2021 Jian Huang, Yuling Jiao, Zhen Li, Shiao Liu, Yang Wang, Yunfei Yang

This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples.

CARLS: Cross-platform Asynchronous Representation Learning System

1 code implementation 26 May 2021 Chun-Ta Lu, Yun Zeng, Da-Cheng Juan, Yicheng Fan, Zhe Li, Jan Dlabal, Yi-Ting Chen, Arjun Gopalan, Allan Heydon, Chun-Sung Ferng, Reah Miyara, Ariel Fuxman, Futang Peng, Zhen Li, Tom Duerig, Andrew Tomkins

In this work, we propose CARLS, a novel framework for augmenting the capacity of existing deep learning frameworks by enabling multiple components -- model trainers, knowledge makers and knowledge banks -- to concertedly work together in an asynchronous fashion across hardware platforms.

Representation Learning

Combining Supervised and Un-supervised Learning for Automatic Citrus Segmentation

no code implementations 4 May 2021 Heqing Huang, Tongbin Huang, Zhen Li, Zhiwei Wei, Shilei Lv

Compared with most of the existing citrus segmentation methods, our method uses a small amount of supervised data and a large number of unsupervised data, while learning the pixel level location information and the temporal information of citrus changes to enhance the segmentation effect.

Image Segmentation Segmentation +1

PointLIE: Locally Invertible Embedding for Point Cloud Sampling and Recovery

1 code implementation 30 Apr 2021 Weibing Zhao, Xu Yan, Jiantao Gao, Ruimao Zhang, Jiayan Zhang, Zhen Li, Song Wu, Shuguang Cui

In this paper, we address a fundamental problem in PCSR: How to downsample the dense point cloud with arbitrary scales while preserving the local topology of discarding points in a case-agnostic manner (i.e., without additional storage for point relationship)?

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

1 code implementation CVPR 2021 Gang Xu, Jun Xu, Zhen Li, Liang Wang, Xing Sun, Ming-Ming Cheng

To well exploit the temporal information, we propose a Locally-temporal Feature Comparison (LFC) module, along with the Bi-directional Deformable ConvLSTM, to extract short-term and long-term motion cues in videos.

Space-time Video Super-resolution Video Super-Resolution

Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud

1 code implementation ICCV 2021 Mingtao Feng, Zhen Li, Qi Li, Liang Zhang, Xiangdong Zhang, Guangming Zhu, Hui Zhang, Yaonan Wang, Ajmal Mian

There are three main challenges in 3D object grounding: to find the main focus in the complex and diverse description; to understand the point cloud scene; and to locate the target object.

Object

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

5 code implementations 11 Feb 2021 Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, YunHsuan Sung, Zhen Li, Tom Duerig

In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset.

Ranked #1 on Image Classification on VTAB-1k (using extra training data)

Cross-Modal Retrieval Fine-Grained Image Classification +6

On the capacity of deep generative networks for approximating distributions

no code implementations 29 Jan 2021 Yunfei Yang, Zhen Li, Yang Wang

Furthermore, it is shown that the approximation error in Wasserstein distance grows at most linearly on the ambient dimension and that the approximation order only depends on the intrinsic dimension of the target distribution.

GAKP: GRU Association and Kalman Prediction for Multiple Object Tracking

no code implementations 28 Dec 2020 Zhen Li, Sunzeng Cai, Xiaoyi Wang, Zhe Liu, Nian Xue

Multiple Object Tracking (MOT) has been a useful yet challenging task in many real-world applications such as video surveillance, intelligent retail, and smart city.

Multiple Object Tracking

Operator learning for predicting multiscale bubble growth dynamics

no code implementations 23 Dec 2020 Chensen Lin, Zhen Li, Lu Lu, Shengze Cai, Martin Maxey, George Em Karniadakis

Simulating and predicting multiscale problems that couple multiple physics and dynamics across many orders of spatiotemporal scales is a great challenge that has not been investigated systematically by deep neural networks (DNNs).

Computational Physics

Formal modeling and performance evaluation for hybrid systems: a probabilistic hybrid process algebra-based approach

no code implementations 23 Dec 2020 Fujun Wang, Zining Cao, Lixing Tan, Zhen Li

After that, we present a performance evaluation language, CTRML, to reason over probabilistic systems, which extends the results to real numbers.

Formal Languages and Automata Theory F.4.3