Search Results for author: Peng Wang

Found 426 papers, 167 papers with code

PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting

1 code implementation Findings (NAACL) 2022 Zhen Zhang, Wei Zhu, Jinfan Zhang, Peng Wang, Rize Jin, Tae-Sun Chung

In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work along with popular model compression methods.
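As a rough illustration of the patient-and-confident exiting idea, the sketch below checks, layer by layer, whether a per-layer classifier is confident and whether consecutive confident layers agree; the per-layer classifier interface and thresholds are assumptions, not PCEE-BERT's exact formulation.

```python
import torch.nn.functional as F

def early_exit_predict(layer_logits, conf_threshold=0.9, patience=2):
    """Exit once `patience` consecutive layers are confident and agree on the label.

    layer_logits: list of [num_classes] tensors, one per transformer layer,
    produced by hypothetical per-layer classifiers (an assumed interface).
    """
    streak, last_pred = 0, None
    for depth, logits in enumerate(layer_logits, start=1):
        probs = F.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= conf_threshold:
            # Confident layer: extend the streak only if it agrees with the previous prediction.
            streak = streak + 1 if pred.item() == last_pred else 1
        else:
            streak = 0
        last_pred = pred.item()
        if streak >= patience:
            return last_pred, depth        # early exit at this layer
    return last_pred, len(layer_logits)    # fall back to the final layer
```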

Model Compression

A Nearly-Linear Time Algorithm for Exact Community Recovery in Stochastic Block Model

no code implementations ICML 2020 Peng Wang, Zirui Zhou, Anthony Man-Cho So

In this paper, we focus on the problem of exactly recovering the communities in a binary symmetric SBM, where a graph of $n$ vertices is partitioned into two equal-sized communities and the vertices are connected with probability $p = \alpha\log(n)/n$ within communities and $q = \beta\log(n)/n$ across communities for some $\alpha>\beta>0$.
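The sampling model in that sentence is easy to reproduce; here is a small NumPy sketch that draws one such graph (parameter values in the usage line are illustrative only).

```python
import numpy as np

def binary_symmetric_sbm(n, alpha, beta, seed=0):
    """Sample a binary symmetric SBM: two equal-sized communities, edge
    probabilities p = alpha*log(n)/n inside and q = beta*log(n)/n across
    communities (alpha > beta > 0). Assumes n is even."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([1, -1], n // 2)            # community assignments
    p = alpha * np.log(n) / n
    q = beta * np.log(n) / n
    same = np.equal.outer(labels, labels)          # True where both endpoints share a community
    probs = np.where(same, p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(int)              # symmetric adjacency, no self-loops
    return A, labels

# Illustrative parameters; alpha should be sufficiently larger than beta for exact recovery.
A, labels = binary_symmetric_sbm(n=1000, alpha=9.0, beta=1.0)
```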

Stochastic Block Model

Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection

no code implementations26 Jun 2025 Li Fan, Peng Wang, Jing Yang, Cong Shen

However, prior ICL-based Transformer models rely on deep architectures with many layers to achieve satisfactory performance, resulting in substantial storage and computational costs.

Computational Efficiency In-Context Learning

A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization

no code implementations25 Jun 2025 Po Chen, Rujun Jiang, Peng Wang

In this work, we aim to fill this gap by conducting a comprehensive study of the loss landscape of the regularized DMF problem.

Analyzing the Impact of Strategic Bidding on the Reserve Capacity via a Bi-Level Model

no code implementations25 Jun 2025 Yun Xu, Yunxiao Bai, Yunyong Zhang, Peng Wang, Xuelin Wang, Jiqun Guo, Kaijun Xie, Rusheng Zhao

The growing integration of renewable energy sources necessitates adequate reserve capacity to maintain power balance.

Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning

no code implementations9 Jun 2025 Haizhao Jing, Haokui Zhang, Zhenhao Shang, Rong Xiao, Peng Wang, Yanning Zhang

Specifically, inspired by large language models (LLMs), we propose a language embedding framework where both neural architectures and hardware platform specifications are projected into a unified semantic space through tokenization and LLM processing, enabling zero-shot prediction across different hardware platforms for the first time.

Attribute Graph Representation Learning

SeedEdit 3.0: Fast and High-Quality Generative Image Editing

no code implementations5 Jun 2025 Peng Wang, Yichun Shi, Xiaochen Lian, Zhonghua Zhai, Xin Xia, Xuefeng Xiao, Weilin Huang, Jianchao Yang

We introduce SeedEdit 3.0, a companion to our T2I model Seedream 3.0, which significantly improves over our previous SeedEdit versions in both edit-instruction following and image content (e.g., ID/IP) preservation on real image inputs.

Instruction Following

ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions

no code implementations3 Jun 2025 Di Chang, Mingdeng Cao, Yichun Shi, Bo Liu, Shengqu Cai, Shijie Zhou, Weilin Huang, Gordon Wetzstein, Mohammad Soleymani, Peng Wang

To address this gap, we introduce ByteMorph, a comprehensive framework for instruction-based image editing with an emphasis on non-rigid motions.

Benchmarking Diversity

Understanding Generalization in Diffusion Models via Probability Flow Distance

no code implementations26 May 2025 Huijie Zhang, Zijian Huang, Siyi Chen, Jinfan Zhou, Zekai Zhang, Peng Wang, Qing Qu

Diffusion models have emerged as a powerful class of generative models, capable of producing high-quality samples that generalize beyond the training data.

Memorization

Reward Is Enough: LLMs Are In-Context Reinforcement Learners

no code implementations21 May 2025 Kefan Song, Amir Moeini, Peng Wang, Lei Gong, Rohan Chandra, Yanjun Qi, Shangtong Zhang

At the next round, we prompt the LLM again with the same task and a context consisting of all previous responses and rewards.
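A schematic of that prompting loop, with `query_llm` and `get_reward` as hypothetical stand-ins for the model call and the environment's scalar feedback (not the paper's actual interface):

```python
def in_context_rl(task_description, query_llm, get_reward, rounds=5):
    """Repeatedly prompt the LLM with the task plus all previous (response, reward) pairs."""
    history = []
    for _ in range(rounds):
        context = "\n".join(
            f"Attempt {i + 1}: {resp}\nReward: {r}" for i, (resp, r) in enumerate(history)
        )
        prompt = f"{task_description}\n{context}\nNext attempt:"
        response = query_llm(prompt)        # hypothetical LLM call
        reward = get_reward(response)       # hypothetical scalar feedback
        history.append((response, reward))  # the growing context is the only "learning" signal
    return history
```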

Large Language Model Reinforcement Learning (RL) +1

X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System

1 code implementation21 May 2025 Peng Wang, Ruihan Tao, Qiguang Chen, Mengkang Hu, Libo Qin

To fill this gap, we introduce X-WebAgentBench, a novel multilingual agent benchmark in an interactive web environment, which evaluates the planning and interaction performance of language agents across multiple languages, thereby contributing to the advancement of global agent intelligence.

Language Modeling Language Modelling +1

Drug classification based on X-ray spectroscopy combined with machine learning

no code implementations4 May 2025 Yongming Li, Peng Wang, Bangdong Han

X-ray absorption spectroscopy, a non-destructive detection technique, offers advantages such as ease of operation, penetrative observation, and strong substance differentiation capabilities, making it well-suited for application in the field of drug detection and identification.

Cross-Frequency Collaborative Training Network and Dataset for Semi-supervised First Molar Root Canal Segmentation

no code implementations16 Apr 2025 Zhenhuan Zhou, Yuchen Zhang, Along He, Peng Wang, Xueshuo Xie, Tao Li

Additionally, to alleviate the workload of manual annotation for dentists and fully leverage the unlabeled data, we designed a Cross-Frequency Collaborative training semi-supervised learning (SSL) Network called CFC-Net.

Diagnostic Image Segmentation +3

Channel-Adaptive Robust Resource Allocation for Highly Reliable IRS-Assisted V2X Communications

no code implementations16 Apr 2025 Peng Wang, Weihua Wu

We formulate a joint optimization problem for vehicular transmit power, Multi-User Detection (MUD) matrices, V2V link spectrum reuse, and IRS reflection coefficients in IRS-aided V2X communication with imperfect CSI.

Self-Learning

SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model

no code implementations14 Apr 2025 Zongcan Ding, Haodong Zhang, Peng Wu, Guansong Pang, Zhiwei Yang, Peng Wang, Yanning Zhang

Extensive experiments on four benchmarks demonstrate that SlowFastVAD effectively combines the strengths of both fast and slow detectors, and achieves remarkable detection accuracy and interpretability with significantly reduced computational overhead, making it well-suited for real-world VAD applications with high reliability requirements.

Anomaly Detection Domain Adaptation +6

A Survey of Scaling in Large Language Model Reasoning

no code implementations2 Apr 2025 Zihan Chen, Song Wang, Zhen Tan, Xingbo Fu, Zhenyu Lei, Peng Wang, Huan Liu, Cong Shen, Jundong Li

The rapid advancements in large language models (LLMs) have significantly enhanced their reasoning capabilities, driven by various strategies such as multi-agent collaboration.

Language Modeling Language Modelling +2

An Overview of Low-Rank Structures in the Training and Adaptation of Large Models

no code implementations25 Mar 2025 Laura Balzano, Tianjiao Ding, Benjamin D. Haeffele, Soo Min Kwon, Qing Qu, Peng Wang, Zhangyang Wang, Can Yaras

In this paper, we present a comprehensive review of recent advances in exploiting low-rank structures for deep learning and shed light on their mathematical foundations.

Self-Supervised Learning

A Shared Low-Rank Adaptation Approach to Personalized RLHF

no code implementations24 Mar 2025 Renpu Liu, Peng Wang, Donghao Li, Cong Shen, Jing Yang

Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal technique for aligning artificial intelligence systems with human values, achieving remarkable success in fine-tuning large language models.

MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning

no code implementations24 Mar 2025 Dawei Yan, Yang Li, Qing-Guo Chen, Weihua Luo, Peng Wang, Haokui Zhang, Chunhua Shen

Compared to single-turn dialogue, multi-turn dialogue involving multiple images better aligns with the needs of real-world human-AI interactions.

Diagnostic Language Modeling +2

Unlocking Generalization Power in LiDAR Point Cloud Registration

1 code implementation CVPR 2025 Zhenxuan Zeng, Qiao Wu, Xiyu Zhang, Lin Yuanbo Wu, Pei An, Jiaqi Yang, Ji Wang, Peng Wang

In real-world environments, a LiDAR point cloud registration method with robust generalization capabilities (across varying distances and datasets) is crucial for ensuring safety in autonomous driving and other LiDAR-based applications.

Autonomous Driving Point Cloud Registration

Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

1 code implementation10 Mar 2025 Lixue Gong, Xiaoxia Hou, Fanshi Li, Liang Li, Xiaochen Lian, Fei Liu, Liyang Liu, Wei Liu, Wei Lu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Linjie Yang, Zhonghua Zhai, Xinyu Zhang, Qi Zhang, Yuwei Zhang, Shijia Zhao, Jianchao Yang, Weilin Huang

To address these limitations, we present Seedream 2.0, a native Chinese-English bilingual image generation foundation model that excels across diverse dimensions; it adeptly manages text prompts in both Chinese and English, supporting bilingual image generation and text rendering.

Image Description Image Generation +2

ColorDynamic: Generalizable, Scalable, Real-time, End-to-end Local Planner for Unstructured and Dynamic Environments

1 code implementation27 Feb 2025 Jinghao Xin, Zhichao Liang, Zihuan Zhang, Peng Wang, Ning li

Deep Reinforcement Learning (DRL) has demonstrated potential in addressing robotic local planning problems, yet its efficacy remains constrained in highly unstructured and dynamic environments.

Data Augmentation Deep Reinforcement Learning

MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue

no code implementations27 Feb 2025 Yujia Chen, Changsong Li, Yiming Wang, Qingqing Xiao, Nan Zhang, Zifan Kong, Peng Wang, Binyu Yan

To fill this gap, we propose the MIND (Multi-agent INner Dialogue), a novel paradigm that provides more immersive psychological healing environments.

New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration

1 code implementation27 Feb 2025 Xuzheng Yang, Junzhuo Liu, Peng Wang, Guoqing Wang, Yang Yang, Heng Tao Shen

To address fine-grained compositional REC, we propose novel methods based on a Specialist-MLLM collaboration framework, leveraging the complementary strengths of them: Specialist Models handle simpler tasks efficiently, while MLLMs are better suited for complex reasoning.

Image Comprehension Referring Expression +1

Brain-inspired analogical mixture prototypes for few-shot class-incremental learning

no code implementations26 Feb 2025 Wanyi Li, Wei Wei, Yongkang Luo, Peng Wang

Inspired by the brain's mechanisms for categorization and analogical learning, we propose a novel approach called Brain-inspired Analogical Mixture Prototypes (BAMP).

class-incremental learning Few-Shot Class-Incremental Learning +1

Constructing a Norm for Children's Scientific Drawing: Distribution Features Based on Semantic Similarity of Large Language Models

no code implementations21 Feb 2025 Yi Zhang, Fan Wei, Jingyi Li, Yan Wang, Yanyan Yu, Jianli Chen, Zipo Cai, Xinyu Liu, Wei Wang, Peng Wang, Zhong Wang

The use of children's drawings to examine their conceptual understanding has been proven to be an effective method, but there are two major problems with previous research: 1.

Large Language Model Semantic Similarity +1

Error Bound Analysis for the Regularized Loss of Deep Linear Neural Networks

no code implementations16 Feb 2025 Po Chen, Rujun Jiang, Peng Wang

The optimization foundations of deep linear networks have received significant attention lately.

Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling

no code implementations9 Feb 2025 Xiao Li, Zekai Zhang, Xiang Li, Siyi Chen, Zhihui Zhu, Peng Wang, Qing Qu

Diffusion models, though originally designed for generative tasks, have demonstrated impressive self-supervised representation learning capabilities.

Denoising Representation Learning

Analyzing the Role of the DSO in Electricity Trading of VPPs via a Stackelberg Game Model

1 code implementation13 Jan 2025 Peng Wang, Xi Zhang, Luis Badesa

In order to study the role of the DSO as a stakeholder, a Stackelberg game is represented via a bi-level model: the DSO maximizes profits at the upper level, while the VPPs minimize operating costs at the lower level.
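In generic form (the notation here is illustrative rather than the paper's), such a Stackelberg interaction is a bi-level program with the DSO as leader and the VPPs as followers:

```latex
\begin{aligned}
\max_{x \in \mathcal{X}} \quad & \text{Profit}_{\mathrm{DSO}}\bigl(x, y_1^\star, \dots, y_K^\star\bigr) \\
\text{s.t.} \quad & y_k^\star \in \arg\min_{y_k \in \mathcal{Y}_k(x)} \; \text{Cost}_{\mathrm{VPP}_k}(y_k; x), \qquad k = 1, \dots, K.
\end{aligned}
```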

Bilevel Optimization

SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild

no code implementations6 Jan 2025 Jiawei Liu, Yuanzhi Zhu, Feiyu Gao, Zhibo Yang, Peng Wang, Junyang Lin, Xinggang Wang, Wenyu Liu

The text in natural scene images needs to meet the following four key criteria: (1) Fidelity: the generated text should appear as realistic as a photograph and be completely accurate, with no errors in any of the strokes.

Attribute Optical Character Recognition +4

A New Underdetermined Framework for Sparse Estimation of Fault Location for Transmission Lines Using Limited Current Measurements

no code implementations6 Jan 2025 Guangxiao Zhang, Gaoxi Xiao, Xinghua Liu, Yan Xu, Peng Wang

This letter proposes an alternative underdetermined framework for fault location that utilizes current measurements along with the branch-bus matrix, providing another option besides the traditional voltage-based methods.

Understanding How Nonlinear Layers Create Linearly Separable Features for Low-Dimensional Data

no code implementations4 Jan 2025 Alec S. Xu, Can Yaras, Peng Wang, Qing Qu

In this work, we address this gap by examining the linear separation capabilities of shallow nonlinear networks.

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

1 code implementation CVPR 2025 Wei Suo, Lijun Zhang, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yanning Zhang

Large Vision-Language Models (LVLMs) have obtained impressive performance in visual content understanding and multi-modal reasoning.

Hallucination

Dual Diffusion for Unified Image Generation and Understanding

1 code implementation CVPR 2025 Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang

Diffusion models have gained tremendous success in text-to-image generation, yet still lag behind with visual understanding tasks, an area dominated by autoregressive vision-language models.

Language Modeling Language Modelling +4

LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context

1 code implementation23 Dec 2024 Kai Ruan, Xuan Wang, Jixiang Hong, Peng Wang, Yang Liu, Hao Sun

While Large Language Models (LLMs) have demonstrated remarkable capabilities in scientific tasks, existing evaluation frameworks primarily assess their performance using rich contextual inputs, overlooking their ability to generate novel ideas from minimal information.

TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use

1 code implementation20 Dec 2024 Junjie Ye, Yilong Wu, Sixian Li, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Peng Wang, Zhongchao shi, Jianping Fan, Zhengyin Du

Large language models (LLMs) achieve remarkable advancements by leveraging tools to interact with external environments, a critical step toward generalized AI.

Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

no code implementations10 Dec 2024 Can Yaras, Siyi Chen, Peng Wang, Qing Qu

Models such as Contrastive Language-Image Pretraining (CLIP) are designed to bridge different modalities, such as images and text, by learning a shared representation space through contrastive learning.
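For reference, the contrastive objective behind CLIP-style training can be sketched as a symmetric InfoNCE loss over a batch of paired image/text embeddings (a generic sketch, not the paper's code):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (image, text) embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # [B, B] similarity matrix
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    # Matched pairs sit on the diagonal; pull them together, push the rest apart.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```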

Contrastive Learning Image-text Retrieval +3

Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models

no code implementations9 Dec 2024 Wei Suo, Ji Ma, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yanning Zhang

Although Large Vision-Language Models (LVLMs) have achieved impressive results, their high computational cost poses a significant barrier to wider application.

All Self-Supervised Learning

Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud

no code implementations6 Dec 2024 Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang

Specializing LLMs in various domain-specific tasks has emerged as a critical step towards achieving high performance.

Data Augmentation

Sustainable Self-evolution Adversarial Training

no code implementations3 Dec 2024 Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang

With the wide application of deep neural network models in various computer vision tasks, there has been a proliferation of adversarial example generation strategies aimed at deeply exploring model security.

Adversarial Defense Continual Learning

Multi-Functional RIS Integrated Sensing and Communications for 6G Networks

no code implementations2 Dec 2024 Dongsheng Han, Peng Wang, Wanli Ni, Wen Wang, Ailing Zheng, Dusit Niyato, Naofal Al-Dhahir

We propose a MF-RIS-enabled multi-user and multi-target ISAC system, and formulate an optimization problem to maximize the signal-to-interference-plus-noise ratio (SINR) of sensing targets.

ISAC

SuperGaussians: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors

1 code implementation28 Nov 2024 Rui Xu, Wenyue Chen, Jiepeng Wang, YuAn Liu, Peng Wang, Lin Gao, Shiqing Xin, Taku Komura, Xin Li, Wenping Wang

However, the current Gaussian primitives only have a single view-dependent color and an opacity to represent the appearance and geometry of the scene, resulting in a non-compact representation.

Novel View Synthesis

A Unified Analysis for Finite Weight Averaging

no code implementations20 Nov 2024 Peng Wang, Li Shen, Zerui Tao, Yan Sun, Guodong Zheng, DaCheng Tao

In this work, we first generalize SGD and LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization.
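The averaging step itself is simple; a minimal sketch that averages the parameters of the last k saved checkpoints (the state-dict checkpoint format and floating-point parameters are assumptions):

```python
def finite_weight_average(checkpoints, k):
    """Average the parameters of the last k checkpoints (each a dict of tensors/arrays)."""
    tail = checkpoints[-k:]
    averaged = {}
    for name in tail[0]:
        averaged[name] = sum(ckpt[name] for ckpt in tail) / len(tail)
    return averaged

# Usage (hypothetical): model.load_state_dict(finite_weight_average(saved_state_dicts, k=5))
```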

Mathematical Induction

Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution

1 code implementation19 Nov 2024 Yang Zou, Zhixin Chen, Zhipeng Zhang, Xingyuan Li, Long Ma, JinYuan Liu, Peng Wang, Yanning Zhang

In this work, we emphasize the infrared spectral distribution fidelity and propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.

Image Enhancement Image Super-Resolution +2

MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion

no code implementations18 Nov 2024 Dongseok Shim, Yichun Shi, Kejie Li, H. Jin Kim, Peng Wang

Recent advancements in text-to-3D generation, building on the success of high-performance text-to-image generative models, have made it possible to create imaginative and richly textured 3D objects from textual descriptions.

3D Generation Text to 3D

MDHP-Net: Detecting an Emerging Time-exciting Threat in IVN

no code implementations15 Nov 2024 Qi Liu, Yanchen Liu, Ruifeng Li, Chenhong Cao, Yufeng Li, Xingyu Li, Peng Wang, Runhan Feng, Shiyang Bu

We systematically analyze the characteristics of the threat: dynamism, time-exciting impact, and low prior knowledge dependency.

Diagnostic

MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation

1 code implementation13 Nov 2024 Peng Wang, Lingzhe Zhao, Yin Zhang, Shiyu Zhao, Peidong Liu

In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showcasing superior performance across a range of datasets, including synthetic and real datasets featuring sharp images as well as those affected by motion blur, highlighting the versatility and robustness of our approach.

3DGS Camera Localization +2

World Models: The Safety Perspective

no code implementations12 Nov 2024 Zifan Zeng, Chongzhe Zhang, Feng Liu, Joseph Sifakis, Qunli Zhang, Shiming Liu, Peng Wang

With the proliferation of the Large Language Model (LLM), the concept of World Models (WM) has recently attracted a great deal of attention in the AI research community, especially in the context of AI agents.

AI Agent Language Modeling +1

SeedEdit: Align Image Re-Generation to Image Editing

no code implementations11 Nov 2024 Yichun Shi, Peng Wang, Weilin Huang

We introduce SeedEdit, a diffusion model that is able to revise a given image with any text prompt.

Image Reconstruction

Smart-LLaMA: Two-Stage Post-Training of Large Language Models for Smart Contract Vulnerability Detection and Explanation

no code implementations9 Nov 2024 Lei Yu, Shiqi Chen, Hang Yuan, Peng Wang, Zhirong Huang, Jingyuan Zhang, Chenjie Shen, Fengjun Zhang, Li Yang, Jiajia Ma

Existing smart contract vulnerability detection methods face three main issues: (1) Insufficient quality of datasets, lacking detailed explanations and precise vulnerability locations.

Vulnerability Detection

Dynamic Textual Prompt For Rehearsal-free Lifelong Person Re-identification

no code implementations9 Nov 2024 Hongyu Chen, Bingliang Jiao, Wenxuan Wang, Peng Wang

By leveraging this shared textual space as an anchor, we can prompt the ReID model to embed images from various domains into a unified semantic space, thereby alleviating catastrophic forgetting caused by domain shifts.

Knowledge Distillation Person Re-Identification

Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning

no code implementations3 Nov 2024 Fei Zhou, Peng Wang, Lei Zhang, Zhenghua Chen, Wei Wei, Chen Ding, Guosheng Lin, Yanning Zhang

Meta-learning offers a promising avenue for few-shot learning (FSL), enabling models to glean a generalizable feature embedding through episodic training on synthetic FSL tasks in a source domain.

cross-domain few-shot learning

Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

no code implementations30 Oct 2024 Wei Dong, Yuan Sun, Yiting Yang, Xing Zhang, Zhijun Lin, Qingsen Yan, Haokui Zhang, Peng Wang, Yang Yang, HengTao Shen

A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix.
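For context, the common low-rank adaptation strategy mentioned here can be sketched as a frozen linear layer plus a trainable low-rank update (a generic LoRA-style sketch; the paper's Householder-based alternative replaces this construction):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update: W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # freeze the pre-trained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as identity adapter
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```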

parameter-efficient fine-tuning

Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information

1 code implementation6 Oct 2024 Yongheng Zhang, Qiguang Chen, Jingxuan Zhou, Peng Wang, Jiasheng Si, Jin Wang, Wenpeng Lu, Libo Qin

To address these challenges, we propose Wrong-of-Thought (WoT), which includes two core modules: (1) Multi-Perspective Verification: A multi-perspective verification method for accurately refining the reasoning process and result, and (2) Wrong Information Utilization: Utilizing wrong information to alert LLMs and reduce the probability of LLMs making the same mistakes.

60 Data Points are Sufficient to Fine-Tune LLMs for Question-Answering

no code implementations24 Sep 2024 Junjie Ye, Yuming Yang, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao shi, Jianping Fan

Large language models (LLMs) encode extensive world knowledge through pre-training on massive datasets, which can then be fine-tuned for the question-answering (QA) task.

Question Answering World Knowledge

FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension

1 code implementation23 Sep 2024 Junzhuo Liu, Xuzheng Yang, Weiwei Li, Peng Wang

Referring Expression Comprehension (REC) is a crucial cross-modal task that objectively evaluates the capabilities of language understanding, image comprehension, and language-to-image grounding.

Image Comprehension Referring Expression +2

VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer

no code implementations18 Sep 2024 Humen Zhong, Zhibo Yang, Zhaohai Li, Peng Wang, Jun Tang, Wenqing Cheng, Cong Yao

Text recognition is an inherent integration of vision and language, encompassing the visual texture in stroke patterns and the semantic context among the character sequences.

Decoder Scene Text Recognition

VSFormer: Mining Correlations in Flexible View Set for Multi-view 3D Shape Understanding

1 code implementation14 Sep 2024 Hongyu Sun, Yongcai Wang, Peng Wang, Haoran Deng, Xudong Cai, Deying Li

In particular, we propose to incorporate different views of a 3D shape into a permutation-invariant set, referred to as a View Set, which removes rigid relation assumptions and facilitates adequate information exchange and fusion among views.

MCDGLN: Masked Connection-based Dynamic Graph Learning Network for Autism Spectrum Disorder

no code implementations10 Sep 2024 Peng Wang, Xin Wen, Ruochen Cao, Chengxin Gao, Yanrong Hao, Rui Cao

We then employ a specialized weighted edge aggregation (WEA) module, which uses cross convolution with a channel-wise element-wise convolutional kernel, to integrate dynamic functional connectivity and to isolate task-relevant connections.

Functional Connectivity Graph Learning

Deep Learning for Video Anomaly Detection: A Review

no code implementations9 Sep 2024 Peng Wu, Chengyu Pan, Yuting Yan, Guansong Pang, Peng Wang, Yanning Zhang

Video anomaly detection (VAD) aims to discover behaviors or events deviating from the normality in videos.

Anomaly Detection Deep Learning +1

Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing

2 code implementations4 Sep 2024 Siyi Chen, Huijie Zhang, Minzhe Guo, Yifu Lu, Peng Wang, Qing Qu

In this work, we improve the understanding of their semantic spaces from intriguing observations: among a certain range of noise levels, (1) the learned posterior mean predictor (PMP) in the diffusion model is locally linear, and (2) the singular vectors of its Jacobian lie in low-dimensional semantic subspaces.
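The semantic subspaces in observation (2) come from the singular vectors of the PMP's Jacobian; below is a toy sketch of that computation with a stand-in denoiser (the real PMP would be derived from the trained diffusion model's score network):

```python
import torch

def pmp_jacobian_svd(denoiser, x):
    """Compute the Jacobian of a (toy) posterior mean predictor at x and its SVD.

    `denoiser` maps a flattened input of size D to a denoised estimate of size D.
    """
    J = torch.autograd.functional.jacobian(denoiser, x)   # [D, D] local linearization
    U, S, Vh = torch.linalg.svd(J)
    return U, S, Vh

# Toy stand-in for the PMP: a small MLP on flattened inputs (illustrative only).
denoiser = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.Tanh(), torch.nn.Linear(128, 64))
x0 = torch.randn(64)
U, S, Vh = pmp_jacobian_svd(denoiser, x0)
# Leading right singular vectors (rows of Vh) span the dominant local semantic directions.
```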

Image Generation

Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering

1 code implementation4 Sep 2024 Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu

Remarkably, these models can achieve this even with a small number of training samples despite a large image dimension, circumventing the curse of dimensionality.

Clustering Denoising

Platypus: A Generalized Specialist Model for Reading Text in Various Forms

1 code implementation27 Aug 2024 Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang, Cong Yao

Recently, generalist models (such as GPT-4V), trained on tremendous amounts of data in a unified way, have shown enormous potential for reading text in various scenarios, but with the drawbacks of limited accuracy and low efficiency.

Handwritten Text Recognition Scene Text Recognition

Sequential-Scanning Dual-Energy CT Imaging Using High Temporal Resolution Image Reconstruction and Error-Compensated Material Basis Image Generation

no code implementations27 Aug 2024 Qiaoxin Li, Ruifeng Chen, Peng Wang, Guotao Quan, Yanfeng Du, Dong Liang, Yinsheng Li

As existing material basis image reconstruction approaches assume that the data sets acquired at two tube potentials are temporally consistent, the violation of this assumption results in inaccurate quantification of material concentration.

Image Generation Image Reconstruction +1

Enhancing Adaptive Deep Networks for Image Classification via Uncertainty-aware Decision Fusion

1 code implementation25 Aug 2024 Xu Zhang, Zhipeng Xie, Haiyang Yu, Qitong Wang, Peng Wang, Wei Wang

Based on this observation, we introduce the Collaborative Decision Making (CDM) module, which fuses the multiple classifier heads to enhance the inference performance of adaptive deep networks.
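The excerpt does not spell out the CDM formulation; as a stand-in, the sketch below fuses several classifier heads by down-weighting heads with high predictive entropy, which conveys the general uncertainty-aware-fusion idea rather than the paper's exact module.

```python
import torch
import torch.nn.functional as F

def fuse_heads_by_uncertainty(head_logits):
    """Fuse predictions from several classifier heads, weighting each head by its
    (inverse) predictive entropy. Generic sketch, not the paper's CDM formulation."""
    probs = [F.softmax(logits, dim=-1) for logits in head_logits]        # each [B, C]
    entropies = torch.stack(
        [-(p * p.clamp_min(1e-12).log()).sum(dim=-1) for p in probs], dim=0
    )                                                                    # [H, B]
    weights = F.softmax(-entropies, dim=0)                               # lower entropy -> larger weight
    fused = sum(w.unsqueeze(-1) * p for w, p in zip(weights, probs))     # [B, C]
    return fused
```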

Decision Making image-classification +1

Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts

no code implementations12 Aug 2024 Peng Wu, Xuerong Zhou, Guansong Pang, Zhiwei Yang, Qingsen Yan, Peng Wang, Yanning Zhang

Existing works typically involve extracting global features from full-resolution video frames and training frame-level classifiers to detect anomalies in the temporal dimension.

Anomaly Detection Event Detection +3

Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process

no code implementations4 Aug 2024 Peng Wang, Xiaobin Wang, Chao Lou, Shengyu Mao, Pengjun Xie, Yong Jiang

In-context learning (ICL) is a few-shot learning paradigm that involves learning mappings through input-output pairs and appropriately applying them to new instances.

Diversity Few-Shot Learning +3

A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap

1 code implementation31 Jul 2024 Lijun Zhang, Wei Suo, Peng Wang, Yanning Zhang

On one hand, considering the crucial role of human-object pairs information in HOI tasks, the feature alignment module aligns the human-object pairs by aggregating instance information.

Human-Object Interaction Detection Image Reconstruction +3

Diffusion Models as Optimizers for Efficient Planning in Offline RL

1 code implementation23 Jul 2024 Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, HengTao Shen

To evaluate the effectiveness and efficiency of the Trajectory Diffuser, we conduct experiments on the D4RL benchmarks.

D4RL Decision Making +3

Visual Text Generation in the Wild

1 code implementation19 Jul 2024 Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang

However, it is still challenging to render high-quality text images in real-world scenarios, as three critical criteria should be satisfied: (1) Fidelity: the generated text images should be photo-realistic and the contents are expected to be the same as specified in the given conditions; (2) Reasonability: the regions and contents of the generated text should cohere with the scene; (3) Utility: the generated text images can facilitate related tasks (e.g., text detection and recognition).

Language Modelling Large Language Model +3

Visual Prompt Selection for In-Context Learning Segmentation

1 code implementation14 Jul 2024 Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang, Yanning Zhang

As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level.

Diversity Image Segmentation +3

CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

no code implementations CVPR 2025 Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, Peng Wang

This paper introduces Camera-free Diffusion (CamFreeDiff) model for 360-degree image outpainting from a single camera-free image and text description.

Image Outpainting

Fast and Continual Knowledge Graph Embedding via Incremental LoRA

1 code implementation8 Jul 2024 Jiajun Liu, Wenjun Ke, Peng Wang, Jiahao Wang, Jinhua Gao, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji, Yining Li

To address this issue, we propose a fast CKGE framework incorporating an incremental low-rank adapter mechanism to efficiently acquire new knowledge while preserving old knowledge.

Knowledge Graph Embedding Knowledge Graphs +1

Quantum Dynamics of Machine Learning

no code implementations7 Jul 2024 Peng Wang, Maimaitiniyazi Maimaitiabudula

This equation reformulates the iterative process of machine learning into a time-dependent partial differential equation with a clear mathematical structure, offering a theoretical framework for investigating machine learning iterations through quantum and mathematical theories.

CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

no code implementations2 Jul 2024 Song Wang, Peng Wang, Tong Zhou, Yushun Dong, Zhen Tan, Jundong Li

To address these limitations, we collect a variety of datasets designed for the bias evaluation of LLMs, and further propose CEB, a Compositional Evaluation Benchmark that covers different types of bias across different social groups and tasks.

Fairness

BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream

1 code implementation2 Jul 2024 Wenpu Li, Pian Wan, Peng Wang, Jinghang Li, Yi Zhou, Peidong Liu

Our method can jointly learn both the implicit neural scene representation and recover the camera motion by minimizing the differences between the synthesized data and the real measurements without pre-computed camera poses from COLMAP.

NeRF

LLM Granularity for On-the-Fly Robot Control

no code implementations20 Jun 2024 Peng Wang, Mattia Robbiani, Zhihao Guo

Assistive robots have attracted significant attention due to their potential to enhance the quality of life for vulnerable individuals like the elderly.

Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning

no code implementations18 Jun 2024 Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, Peng Wang

In this model, we have designed a series of modality-specific prompts, which could enable our model to adapt to and make use of the specific information inherent in different modality inputs, thereby reducing the interference caused by the modality gap and achieving better identification.

Person Re-Identification Prompt Learning

Autoregressive Pretraining with Mamba in Vision

1 code implementation11 Jun 2024 Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.

Mamba

Symmetric Matrix Completion with ReLU Sampling

no code implementations9 Jun 2024 Huikang Liu, Peng Wang, Longxiu Huang, Qing Qu, Laura Balzano

We study the problem of symmetric positive semi-definite low-rank matrix completion (MC) with deterministic entry-dependent sampling.

Low-Rank Matrix Completion

Visual Prompt Tuning in Null Space for Continual Learning

1 code implementation9 Jun 2024 Yue Lu, Shizhou Zhang, De Cheng, Yinghui Xing, Nannan Wang, Peng Wang, Yanning Zhang

Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL), by selecting and updating relevant prompts in the vision-transformer models.

Continual Learning Visual Prompt Tuning

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

1 code implementation6 Jun 2024 Can Yaras, Peng Wang, Laura Balzano, Qing Qu

In practice, we demonstrate the effectiveness of this approach for deep low-rank matrix completion as well as fine-tuning language models.

Language Modelling Low-Rank Matrix Completion

A Global Geometric Analysis of Maximal Coding Rate Reduction

no code implementations4 Jun 2024 Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma

The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures.

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

no code implementations1 Jun 2024 Qingming Liu, YuAn Liu, Jiepeng Wang, Xianqiang Lyv, Peng Wang, Wenping Wang, Junhui Hou

In this paper, we propose MoDGS, a new pipeline to render novel views of dynamic scenes from a casually captured monocular video.

Depth Estimation NeRF

A Full-duplex Speech Dialogue Scheme Based On Large Language Models

no code implementations29 May 2024 Peng Wang, Songshuo Lu, Yaohua Tang, Sijie Yan, Wei Xia, Yuanjun Xiong

It is based on a large language model (LLM) carefully aligned to be aware of a perception module, a motor function module, and the concept of a simple finite state machine (called neural FSM) with two states.
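The two-state neural FSM can be pictured as a speak/listen state machine driven by control signals emitted by the LLM; `llm_step` and the control-signal names below are hypothetical, not the paper's interface.

```python
from enum import Enum

class DialogueState(Enum):
    LISTEN = "listen"
    SPEAK = "speak"

def run_full_duplex(llm_step, audio_events, max_steps=100):
    """Drive a two-state FSM from model control signals.

    llm_step(state, event) -> (text_chunk, control) is a hypothetical interface;
    `control` is one of "START_SPEAKING", "STOP_SPEAKING", or None.
    """
    state = DialogueState.LISTEN
    for _, event in zip(range(max_steps), audio_events):
        text, control = llm_step(state, event)
        if state is DialogueState.LISTEN and control == "START_SPEAKING":
            state = DialogueState.SPEAK          # the model decides to take the turn
        elif state is DialogueState.SPEAK and control == "STOP_SPEAKING":
            state = DialogueState.LISTEN         # the model yields the turn (e.g., on barge-in)
        if state is DialogueState.SPEAK and text:
            yield text                           # stream speech output while in the SPEAK state
```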

Language Modeling Language Modelling +1

RaFe: Ranking Feedback Improves Query Rewriting for RAG

no code implementations23 May 2024 Shengyu Mao, Yong Jiang, Boli Chen, Xiao Li, Peng Wang, Xinyu Wang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang

As Large Language Models (LLMs) and Retrieval Augmentation Generation (RAG) techniques have evolved, query rewriting has been widely incorporated into the RAG system for downstream tasks like open-domain QA.

RAG Retrieval

WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

2 code implementations23 May 2024 Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen

In WISE, we design a dual parametric memory scheme, which consists of the main memory for the pretrained knowledge and a side memory for the edited knowledge.

Hallucination Model Editing +2

Multi-Objective Optimization-Based Waveform Design for Multi-User and Multi-Target MIMO-ISAC Systems

no code implementations22 May 2024 Peng Wang, Dongsheng Han, Yashuai Cao, Wanli Ni, Dusit Niyato

In this paper, we investigate the waveform design problem in a downlink multi-user and multi-target ISAC system under different C&S performance preferences.

Integrated sensing and communication ISAC

Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning

no code implementations22 May 2024 Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang

In addition, by incorporating curriculum planning, our approach systematically escalates the difficulty levels of tasks, progressively enhancing the student LLM's capabilities.

Code Generation Instruction Following +1

C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning

no code implementations21 May 2024 Ji Ma, Wei Suo, Peng Wang, Yanning Zhang

Vision-Language Instruction Tuning (VLIT) is a critical training phase for Large Vision-Language Models (LVLMs).

Contrastive Learning

Learning Social Graph for Inactive User Recommendation

1 code implementation8 May 2024 Nian Liu, Shen Fan, Ting Bai, Peng Wang, Mingwei Sun, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Chuan Shi

In this paper, we propose a novel social recommendation method called LSIR (Learning Social Graph for Inactive User Recommendation) that learns an optimal social graph structure for social recommendation, especially for inactive users.

Graph structure learning Recommendation Systems

Towards Continual Knowledge Graph Embedding via Incremental Distillation

1 code implementation7 May 2024 Jiajun Liu, Wenjun Ke, Peng Wang, Ziyu Shang, Jinhua Gao, Guozheng Li, Ke Ji, Yanhe Liu

On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs.

Knowledge Graph Embedding

Depth Priors in Removal Neural Radiance Fields

no code implementations1 May 2024 Zhihao Guo, Peng Wang

This paper proposes a new pipeline that leverages SpinNeRF and monocular depth estimation models like ZoeDepth to enhance NeRF's performance in complex object removal with improved efficiency.

3D Reconstruction Monocular Depth Estimation +3

Dual-Modal Prompting for Sketch-Based Image Retrieval

no code implementations29 Apr 2024 Liying Gao, Bingliang Jiao, Peng Wang, Shizhou Zhang, Hanwang Zhang, Yanning Zhang

In this study, we aim to tackle two major challenges of this task simultaneously: i) zero-shot, dealing with unseen categories, and ii) fine-grained, referring to intra-category instance-level retrieval.

Retrieval Sketch-Based Image Retrieval

Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction

no code implementations27 Apr 2024 Guozheng Li, Peng Wang, Wenjun Ke, Yikai Guo, Ke Ji, Ziyu Shang, Jiajun Liu, Zijie Xu

On the one hand, retrieving good demonstrations is a non-trivial process in RE, which easily results in low relevance regarding entities and relations.

In-Context Learning Language Modeling +5

Meta In-Context Learning Makes Large Language Models Better Zero and Few-Shot Relation Extractors

no code implementations27 Apr 2024 Guozheng Li, Peng Wang, Jiajun Liu, Yikai Guo, Ke Ji, Ziyu Shang, Zijie Xu

To this end, we introduce Micre (Meta In-Context learning of LLMs for Relation Extraction), a new meta-training framework for zero- and few-shot RE where an LLM is tuned to do ICL on a diverse collection of RE datasets (i.e., learning to learn in context for RE).

Few-Shot Learning In-Context Learning +2

Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation

no code implementations26 Apr 2024 SeungWook Kim, Yichun Shi, Kejie Li, Minsu Cho, Peng Wang

Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, as images provide more intuitive guidance for the 3D generation process.

3D Generation

Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion

no code implementations23 Apr 2024 Hongyu Chen, Yiqi Gao, Min Zhou, Peng Wang, Xubin Li, Tiezheng Ge, Bo Zheng

Meanwhile, a network, dubbed as Masked ControlNet, is designed to utilize these object masks for object generation in the misaligned visual control region.

Attribute Object

CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

no code implementations CVPR 2024 SeungWook Kim, Kejie Li, Xueqing Deng, Yichun Shi, Minsu Cho, Peng Wang

Leveraging multi-view diffusion models as priors for 3D optimization has alleviated the problem of 3D consistency, e.g., the Janus face problem or the content drift problem, in zero-shot text-to-3D models.

Common Sense Reasoning NeRF +1

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

no code implementations15 Apr 2024 Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie

This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits.

Attribute

COCONut: Modernizing COCO Segmentation

no code implementations CVPR 2024 Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen

By enhancing the annotation quality and expanding the dataset to encompass 383K images with more than 5.18M panoptic masks, we introduce COCONut, the COCO Next Universal segmenTation dataset.

Panoptic Segmentation Segmentation +1

Self-Explainable Affordance Learning with Embodied Caption

no code implementations8 Apr 2024 Zhipeng Zhang, Zhimin Wei, Guolei Sun, Peng Wang, Luc van Gool

In the field of visual affordance learning, previous methods mainly used abundant images or videos that delineate human behavior patterns to identify action possibility regions for object manipulation, with a variety of applications in robotic tasks.

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

1 code implementation CVPR 2024 Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang

Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters.

image-classification Image Classification +1

BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting

1 code implementation18 Mar 2024 Lingzhe Zhao, Peng Wang, Peidong Liu

In this paper, we introduce a novel approach, named BAD-Gaussians (Bundle Adjusted Deblur Gaussian Splatting), which leverages explicit Gaussian representation and handles severe motion-blurred images with inaccurate camera poses to achieve high-quality scene reconstruction.

3D Scene Reconstruction Deblurring +3

CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning

1 code implementation15 Mar 2024 Yukun Li, Guansong Pang, Wei Suo, Chenchen Jing, Yuling Xi, Lingqiao Liu, Hao Chen, Guoqiang Liang, Peng Wang

Large pre-trained VLMs like CLIP have demonstrated superior zero-shot recognition ability, and a number of recent studies leverage this ability to mitigate catastrophic forgetting in CL, but they focus on closed-set CL in a single domain dataset.

class-incremental learning Class Incremental Learning +2

Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning

no code implementations15 Mar 2024 Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen

Aligning these distributions between corresponding regions from different tasks imparts higher flexibility and capacity to capture intra-region structures, accommodating a broader range of tasks.

Depth Estimation Semantic Segmentation +1

Dcl-Net: Dual Contrastive Learning Network for Semi-Supervised Multi-Organ Segmentation

no code implementations6 Mar 2024 Lu Wen, Zhenghao Feng, Yun Hou, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang

Semi-supervised learning is a sound way to relieve the strict demand for abundant annotated datasets, especially for challenging multi-organ segmentation.

Contrastive Learning Organ Segmentation

Vision-Language Navigation with Embodied Intelligence: A Survey

no code implementations22 Feb 2024 Peng Gao, Peng Wang, Feng Gao, Fei Wang, Ruyue Yuan

As a long-term vision in the field of artificial intelligence, the core goal of embodied intelligence is to improve the perception, understanding, and interaction capabilities of agents and the environment.

Survey Vision-Language Navigation

Unlocking Instructive In-Context Learning with Tabular Prompting for Relational Triple Extraction

no code implementations21 Feb 2024 Guozheng Li, Wenjun Ke, Peng Wang, Zijie Xu, Ke Ji, Jiajun Liu, Ziyu Shang, Qiqing Luo

The in-context learning (ICL) for relational triple extraction (RTE) has achieved promising performance, but still encounters two key challenges: (1) how to design effective prompts and (2) how to select proper demonstrations.

Blocking In-Context Learning +1

GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks

1 code implementation11 Feb 2024 Mengmei Zhang, Mingwei Sun, Peng Wang, Shen Fan, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Cheng Yang, Chuan Shi

Large language models (LLMs) like ChatGPT, which exhibit powerful zero-shot and instruction-following capabilities, have catalyzed a revolutionary transformation across diverse fields, especially for open-ended tasks.

Graph Question Answering Instruction Following +6

TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation

no code implementations11 Feb 2024 Peng Wang, Xiang Wei, Fangxu Hu, Wenjuan Han

TransGPT-MM is finetuned on a multi-modal Transportation dataset (MTD) that we manually collected from three areas of the transportation domain: driving tests, traffic signs, and landmarks.

Language Modelling Large Language Model

Image Fusion via Vision-Language Model

3 code implementations3 Feb 2024 Zixiang Zhao, Lilun Deng, Haowen Bai, Yukun Cui, Zhipeng Zhang, Yulun Zhang, Haotong Qin, Dongdong Chen, Jiangshe Zhang, Peng Wang, Luc van Gool

Therefore, we introduce a novel fusion paradigm named image Fusion via vIsion-Language Model (FILM), for the first time, utilizing explicit textual information from source images to guide the fusion process.

Decoder Language Modeling +3

Traffic Flow and Speed Monitoring Based On Optical Fiber Distributed Acoustic Sensor

no code implementations20 Jan 2024 LinLin Wang, Shixin Wang, Peng Wang, Wei Wang, Dezhao Wang, Yongcai Wang, Shanwen Wang

In the realm of intelligent transportation systems, accurate and reliable traffic monitoring is crucial.

LLM A*: Human in the Loop Large Language Models Enabled A* Search for Robotics

1 code implementation4 Dec 2023 Hengjia Xiao, Peng Wang, Mingzhe Yu, Mattia Robbiani

This research focuses on how Large Language Models (LLMs) can help with (path) planning for mobile embodied agents such as robots, in a human-in-the-loop and interactive manner.

ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation

1 code implementation2 Dec 2023 Peng Wang, Yichun Shi

We introduce "ImageDream," an innovative image-prompt, multi-view diffusion model for 3D object generation.

3D Generation Object

Matching Weak Informative Ontologies

1 code implementation1 Dec 2023 Peng Wang

In this paper, these ontologies are termed weak informative ontologies (WIOs), and it is challenging for existing methods to match WIOs.

Ontology Matching

Continual Referring Expression Comprehension via Dual Modular Memorization

1 code implementation25 Nov 2023 Heng Tao Shen, Cheng Chen, Peng Wang, Lianli Gao, Meng Wang, Jingkuan Song

In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC, where a model is learning on a stream of incoming tasks.

Memorization Referring Expression +1

Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval

1 code implementation21 Nov 2023 Xiu-Shen Wei, Yang shen, Xuhao Sun, Peng Wang, Yuxin Peng

Our work focuses on tackling large-scale fine-grained image retrieval as ranking the images depicting the concept of interest (i.e., the same sub-category labels) highest, based on the fine-grained details in the query.

Attribute Deep Hashing +2

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

no code implementations20 Nov 2023 Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, Kai Zhang

We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images even with little visual overlap, while simultaneously estimating the relative camera poses in ~1.3 seconds on a single A100 GPU.

3D Reconstruction Image to 3D +1

DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

no code implementations15 Nov 2023 Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang

We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion.

3D Generation Denoising +3

Open-Vocabulary Video Anomaly Detection

no code implementations CVPR 2024 Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang

Particularly, we devise a semantic knowledge injection module to introduce semantic knowledge from large language models for the detection task, and design a novel anomaly synthesis module to generate pseudo unseen anomaly videos with the help of large vision generation models for the classification task.

Anomaly Detection Weakly-supervised Video Anomaly Detection

SCL-VI: Self-supervised Context Learning for Visual Inspection of Industrial Defects

1 code implementation11 Nov 2023 Peng Wang, Haiming Yao, Wenyong Yu

Current unsupervised models struggle to strike a balance between detecting texture and object defects, lacking the capacity to discern latent representations and intricate features.

Anomaly Detection Self-Supervised Learning

Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination

1 code implementation6 Nov 2023 Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, Qing Qu

To the best of our knowledge, this is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.

Feature Compression Multi-class Classification +2

PERF: Panoramic Neural Radiance Field from a Single Panorama

1 code implementation25 Oct 2023 Guangcong Wang, Peng Wang, Zhaoxi Chen, Wenping Wang, Chen Change Loy, Ziwei Liu

In this paper, we present PERF, a 360-degree novel view synthesis framework that trains a panoramic neural radiance field from a single panorama.

NeRF Novel View Synthesis +1

Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation

no code implementations24 Oct 2023 Yinjie Lei, Zixuan Wang, Feng Chen, Guoqing Wang, Peng Wang, Yang Yang

Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction.

Autonomous Driving Scene Understanding

Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing

1 code implementation NeurIPS 2023 Wei Dong, Dawei Yan, Zhijun Lin, Peng Wang

Consequently, effectively adapting large pre-trained models to downstream tasks in an efficient manner has become a prominent research area.

ARC image-classification +2

Generalized Neural Collapse for a Large Number of Classes

no code implementations9 Oct 2023 Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin Mixon, Chong You, Zhihui Zhu

However, most of the existing empirical and theoretical studies in neural collapse focus on the case that the number of classes is small relative to the dimension of the feature space.

Face Recognition Retrieval

The Emergence of Reproducibility and Generalizability in Diffusion Models

1 code implementation8 Oct 2023 Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Peng Wang, Liyue Shen, Qing Qu

In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs.

Image Generation Memorization

Revisiting Large Language Models as Zero-shot Relation Extractors

no code implementations8 Oct 2023 Guozheng Li, Peng Wang, Wenjun Ke

On the one hand, we analyze the drawbacks of existing RE prompts and attempt to incorporate recent prompt techniques such as chain-of-thought (CoT) to improve zero-shot RE.

Question Answering Relation +1

Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models

no code implementations4 Oct 2023 Jianglong Ye, Peng Wang, Kejie Li, Yichun Shi, Heng Wang

Specifically, we decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions.

Image to 3D Novel View Synthesis

Human-centric Behavior Description in Videos: New Benchmark and Model

no code implementations4 Oct 2023 Lingru Zhou, Yiqi Gao, Manqing Zhang, Peng Wu, Peng Wang, Yanning Zhang

To address this challenge, we construct a human-centric video surveillance captioning dataset, which provides detailed descriptions of the dynamic behaviors of 7,820 individuals.

Video Captioning

USB-NeRF: Unrolling Shutter Bundle Adjusted Neural Radiance Fields

1 code implementation4 Oct 2023 Moyang Li, Peng Wang, Lingzhe Zhao, Bangyan Liao, Peidong Liu

USB-NeRF is able to correct rolling shutter distortions and recover accurate camera motion trajectory simultaneously under the framework of NeRF, by modeling the physical image formation process of a RS camera.

Camera Pose Estimation Image Generation +4

Selective Feature Adapter for Dense Vision Transformers

no code implementations3 Oct 2023 Xueqing Deng, Qi Fan, Xiaojie Jin, Linjie Yang, Peng Wang

Specifically, SFA consists of external adapters and internal adapters which are sequentially operated over a transformer model.

Depth Estimation

MMPI: a Flexible Radiance Field Representation by Multiple Multi-plane Images Blending

no code implementations30 Sep 2023 Yuze He, Peng Wang, Yubin Hu, Wang Zhao, Ran Yi, Yong-Jin Liu, Wenping Wang

In this paper, we explore the potential of MPI and show that MPI can synthesize high-quality novel views of complex scenes with diverse camera distributions and view directions, which are not only limited to simple forward-facing scenes.

Autonomous Driving NeRF +1

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

no code implementations14 Sep 2023 Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen

Despite advancements of end-to-end (E2E) models in speech recognition, named entity recognition (NER) is still challenging but critical for semantic understanding.

Language Modeling Language Modelling +5

MVDream: Multi-view Diffusion for 3D Generation

4 code implementations31 Aug 2023 Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, Xiao Yang

We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt.

3D Generation Prompt Learning

TouchStone: Evaluating Vision-Language Models by Language Models

1 code implementation31 Aug 2023 Shuai Bai, Shusheng Yang, Jinze Bai, Peng Wang, Xingxuan Zhang, Junyang Lin, Xinggang Wang, Chang Zhou, Jingren Zhou

Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting a visual receptor with large language models (LLMs).

Visual Storytelling

Ground-to-Aerial Person Search: Benchmark Dataset and Approach

1 code implementation24 Aug 2023 Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang

In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images with 260,559 annotated bounding boxes for 2,644 identities appearing in both the UAV and ground surveillance cameras.

Knowledge Distillation Person Search

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

2 code implementations24 Aug 2023 Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou

In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images.

Chart Question Answering FS-MEVQA +11

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

1 code implementation22 Aug 2023 Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang

With the benefit of dual branch, VadCLIP achieves both coarse-grained and fine-grained video anomaly detection by transferring pre-trained knowledge from CLIP to WSVAD task.

Anomaly Detection Binary Classification +1

Polymerized Feature-based Domain Adaptation for Cervical Cancer Dose Map Prediction

no code implementations20 Aug 2023 Jie Zeng, Zeyu Han, Xingchen Peng, Jianghong Xiao, Peng Wang, Yan Wang

Recently, deep learning (DL) has automated and accelerated the clinical radiation therapy (RT) planning significantly by predicting accurate dose maps.

Domain Adaptation

Contrastive Diffusion Model with Auxiliary Guidance for Coarse-to-Fine PET Reconstruction

1 code implementation20 Aug 2023 Zeyu Han, YuHan Wang, Luping Zhou, Peng Wang, Binyu Yan, Jiliu Zhou, Yan Wang, Dinggang Shen

To obtain high-quality positron emission tomography (PET) scans while reducing radiation exposure to the human body, various approaches have been proposed to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images.

Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval

no code implementations16 Aug 2023 Guangyuan Ma, Xing Wu, Peng Wang, Zijia Lin, Songlin Hu

Concretely, we leverage the capabilities of LLMs for document expansion, i.e., query generation, and effectively transfer expanded knowledge to retrievers using pre-training strategies tailored for passage retrieval.
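A minimal sketch of the document-expansion idea, assuming a user-supplied LLM callable (`llm_generate` is a placeholder, not a real client API, and the prompt wording is illustrative rather than the paper's):

```python
# Sketch of LLM-based document expansion for dense retrieval pre-training:
# generate pseudo-queries for each passage and pair them as training examples.
from typing import Callable, List, Tuple

def expand_passages(
    passages: List[str],
    llm_generate: Callable[[str], str],
    queries_per_passage: int = 3,
) -> List[Tuple[str, str]]:
    pairs = []
    for passage in passages:
        for _ in range(queries_per_passage):
            prompt = (
                "Write one short search query that this passage answers.\n"
                f"Passage: {passage}\nQuery:"
            )
            query = llm_generate(prompt).strip()
            pairs.append((query, passage))        # (pseudo-query, positive passage)
    return pairs

# Toy stand-in generator so the sketch runs without any external service.
fake_llm = lambda prompt: "what is dense passage retrieval"
print(expand_passages(["Dense passage retrieval encodes queries and passages ..."],
                      fake_llm, 1))
```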

Contrastive Learning Language Modeling +4

EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models

3 code implementations14 Aug 2023 Peng Wang, Ningyu Zhang, Bozhong Tian, Zekun Xi, Yunzhi Yao, Ziwen Xu, Mengru Wang, Shengyu Mao, Xiaohan Wang, Siyuan Cheng, Kangwei Liu, Yuansheng Ni, Guozhou Zheng, Huajun Chen

Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to outdated/noisy data.

knowledge editing

AerialVLN: Vision-and-Language Navigation for UAVs

1 code implementation ICCV 2023 Shubo Liu, Hongsheng Zhang, Yuankai Qi, Peng Wang, Yanning Zhang, Qi Wu

Navigating in the sky is more complicated than navigating on the ground, because agents must take flying height into account and perform more complex spatial relationship reasoning.

cross-modal alignment Navigate +1

Wireless Edge Content Broadcast via Integrated Terrestrial and Non-terrestrial Networks

no code implementations10 Aug 2023 Feng Wang, Giovanni Geraci, Lingxiang Li, Peng Wang, Tony Q. S. Quek

In this paper, we introduce a novel approach to optimize wireless edge content placement using NTN, positioning NTN as a complement to TN for achieving optimal content broadcasting.

TriDo-Former: A Triple-Domain Transformer for Direct PET Reconstruction from Low-Dose Sinograms

no code implementations10 Aug 2023 Jiaqi Cui, Pinxian Zeng, Xinyi Zeng, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang, Dinggang Shen

Specifically, the TriDo-Former consists of two cascaded networks, i.e., a sinogram enhancement transformer (SE-Former) for denoising the input LPET sinograms and a spatial-spectral reconstruction transformer (SSR-Former) for reconstructing SPET images from the denoised sinograms.

Denoising Image Reconstruction +1

A Survey on Deep Learning-based Spatio-temporal Action Detection

no code implementations3 Aug 2023 Peng Wang, Fanwei Zeng, Yuntao Qian

Spatio-temporal action detection (STAD) aims to classify the actions present in a video and localize them in space and time.

Action Detection Autonomous Driving +2

Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition

2 code implementations25 Jul 2023 Cheng Da, Peng Wang, Cong Yao

Specifically, MGP-STR achieves an average recognition accuracy of 94% on standard benchmarks for scene text recognition.

Language Modelling Optical Character Recognition (OCR) +1

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

1 code implementation24 Jul 2023 Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang

In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos by cross-modalities, e.g., language descriptions and synchronous audios.

Anomaly Detection Retrieval +2

Pre-train, Adapt and Detect: Multi-Task Adapter Tuning for Camouflaged Object Detection

no code implementations20 Jul 2023 Yinghui Xing, Dexuan Kong, Shizhou Zhang, Geng Chen, Lingyan Ran, Peng Wang, Yanning Zhang

Camouflaged object detection (COD), which aims to segment camouflaged objects exhibiting patterns similar to the background, is a challenging task.

Multi-Task Learning object-detection +1

Watch out Venomous Snake Species: A Solution to SnakeCLEF2023

1 code implementation19 Jul 2023 Feiran Hu, Peng Wang, Yangyang Li, Chenlong Duan, Zijian Zhu, Fei Wang, Faen Zhang, Yong Li, Xiu-Shen Wei

The SnakeCLEF2023 competition aims to foster the development of advanced algorithms for snake species identification through the analysis of images and accompanying metadata.

Data Augmentation

DiffDP: Radiotherapy Dose Prediction via a Diffusion Model

no code implementations19 Jul 2023 Zhenghao Feng, Lu Wen, Peng Wang, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang

To alleviate this limitation, we innovatively introduce a diffusion-based dose prediction (DiffDP) model for predicting the radiotherapy dose distribution of cancer patients.

Anatomy model +1

6G Network Business Support System

no code implementations19 Jul 2023 Ye Ouyang, Yaqin Zhang, Peng Wang, Yunxin Liu, Wen Qiao, Jun Zhu, Yang Liu, Feng Zhang, Shuling Wang, Xidong Wang

6G is the next-generation intelligent and integrated digital information infrastructure, characterized by ubiquitous interconnection, native intelligence, multi-dimensional perception, global coverage, green and low-carbon, native network security, etc.

Sparsified Simultaneous Confidence Intervals for High-Dimensional Linear Models

no code implementations14 Jul 2023 Xiaorui Zhu, Yichen Qin, Peng Wang

A critical question remains unsettled: is it possible, and if so how, to embed inference on the model itself into the simultaneous inference of the coefficients?

Model Selection

FedDCT: A Dynamic Cross-Tier Federated Learning Framework in Wireless Networks

no code implementations10 Jul 2023 Youquan Xian, Xiaoyun Gan, Chuanjian Yao, Dongcheng Li, Peng Wang, Peng Liu, Ying Zhao

Federated Learning (FL), as a privacy-preserving machine learning paradigm, trains a global model across devices without exposing local data.

Federated Learning Privacy Preserving

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

1 code implementation NeurIPS 2023 Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa

This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama or multi-view images given depth maps and poses).

Image Generation

Fast and Automatic 3D Modeling of Antenna Structure Using CNN-LSTM Network for Efficient Data Generation

no code implementations27 Jun 2023 Zhaohui Wei, Zhao Zhou, Peng Wang, Jian Ren, Yingzeng Yin, Gert Frølund Pedersen, Ming Shen

In this study, we propose a deep learning-assisted, image-based intelligent modeling approach for accelerating the data acquisition of antenna samples with different physical structures.

The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks

1 code implementation1 Jun 2023 Can Yaras, Peng Wang, Wei Hu, Zhihui Zhu, Laura Balzano, Qing Qu

Second, it allows us to better understand deep representation learning by elucidating the linear progressive separation and concentration of representations from shallow to deep layers.

Representation Learning

Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning

1 code implementation1 Jun 2023 Shengqin Jiang, Yaoyu Fang, Haokui Zhang, Qingshan Liu, Yuankai Qi, Yang Yang, Peng Wang

Rehearsal-based video incremental learning often employs knowledge distillation to mitigate catastrophic forgetting of previously learned data.

Incremental Learning Knowledge Distillation +1

Learning Conditional Attributes for Compositional Zero-Shot Learning

1 code implementation CVPR 2023 Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen

Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations.

Attribute Compositional Zero-Shot Learning

Continuous and Noninvasive Measurement of Arterial Pulse Pressure and Pressure Waveform using an Image-free Ultrasound System

no code implementations29 May 2023 Lirui Xu, Pang Wu, Pan Xia, Fanglin Geng, Peng Wang, Xianxiang Chen, Zhenfeng Li, Lidong Du, Shuping Liu, Li Li, Hongbo Chang, Zhen Fang

In in vitro cardiovascular phantom experiments, the results demonstrated high accuracy in the measurement of PP (error < 3 mmHg) and the blood pressure waveform (root-mean-square error (RMSE) < 2 mmHg, correlation coefficient r > 0.99).
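For reference, the reported waveform metrics (RMSE and the Pearson correlation coefficient r) are conventionally computed as below; the arrays are illustrative placeholders, not study data:

```python
# How RMSE and Pearson r between an estimated and a reference pressure waveform
# are conventionally computed (illustrative values only).
import numpy as np

reference = np.array([80.0, 95.0, 110.0, 104.0, 92.0])   # reference waveform (mmHg)
estimate  = np.array([81.2, 94.1, 108.9, 105.0, 91.3])   # estimated waveform (mmHg)

rmse = np.sqrt(np.mean((estimate - reference) ** 2))
r = np.corrcoef(estimate, reference)[0, 1]

print(f"RMSE = {rmse:.2f} mmHg, r = {r:.3f}")
```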

NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images

1 code implementation27 May 2023 Yuan Liu, Peng Wang, Cheng Lin, Xiaoxiao Long, Jiepeng Wang, Lingjie Liu, Taku Komura, Wenping Wang

We present a neural rendering-based method called NeRO for reconstructing the geometry and the BRDF of reflective objects from multiview images captured in an unknown environment.

Neural Rendering Object

A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation

no code implementations CVPR 2023 Congqi Cao, Yue Lu, Peng Wang, Yanning Zhang

At present, it is the largest semi-supervised VAD dataset, with the largest number of scenes and anomaly classes and the longest duration, and it is the only one that considers scene-dependent anomalies.

Anomaly Detection Video Anomaly Detection

Editing Large Language Models: Problems, Methods, and Opportunities

5 code implementations22 May 2023 Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang

Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.

Model Editing
