1 code implementation • Findings (NAACL) 2022 • Zhen Zhang, Wei Zhu, Jinfan Zhang, Peng Wang, Rize Jin, Tae-Sun Chung
In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work along with popular model compression methods.
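To make the exit rule concrete, below is a minimal sketch of a patience-and-confidence early-exit loop, assuming per-layer hidden states and internal classifiers are available; `layer_states`, `classifiers`, `threshold`, and `patience` are illustrative names, not the paper's API.

```python
import torch

def pcee_forward(layer_states, classifiers, threshold=0.9, patience=2):
    # Run an internal classifier after each layer; exit once `patience`
    # consecutive layers are all confident (max softmax prob >= threshold).
    consecutive = 0
    for layer_idx, (h, clf) in enumerate(zip(layer_states, classifiers)):
        logits = clf(h[:, 0])                                # [CLS] representation
        confidence = torch.softmax(logits, dim=-1).max(dim=-1).values
        consecutive = consecutive + 1 if bool((confidence >= threshold).all()) else 0
        if consecutive >= patience:
            return logits, layer_idx                         # early exit
    return logits, layer_idx                                 # reached the last layer
```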
no code implementations • Findings (EMNLP) 2021 • Zhe Pan, Peng Wang
Existing embedding methods are mostly built on Euclidean space, which makes it difficult to handle hierarchical structures.
no code implementations • ICML 2020 • Peng Wang, Zirui Zhou, Anthony Man-Cho So
In this paper, we focus on the problem of exactly recovering the communities in a binary symmetric SBM, where a graph of $n$ vertices is partitioned into two equal-sized communities and the vertices are connected with probability $p = \alpha\log(n)/n$ within communities and $q = \beta\log(n)/n$ across communities for some $\alpha>\beta>0$.
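As a quick way to experiment with this regime, the sketch below samples a binary symmetric SBM with the stated edge probabilities and applies a naive spectral estimate; it is a toy illustration, not the recovery algorithm analyzed in the paper.

```python
import numpy as np

def sample_binary_sbm(n, alpha, beta, seed=0):
    # Two equal communities; p = alpha*log(n)/n within, q = beta*log(n)/n across.
    assert n % 2 == 0 and alpha > beta > 0
    rng = np.random.default_rng(seed)
    p, q = alpha * np.log(n) / n, beta * np.log(n) / n
    labels = np.repeat([1, -1], n // 2)
    probs = np.where(np.equal.outer(labels, labels), p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)         # independent edges, i < j
    A = (upper | upper.T).astype(float)                      # symmetric, no self-loops
    return A, labels

A, y = sample_binary_sbm(1000, alpha=4.0, beta=1.0)
_, vecs = np.linalg.eigh(A)
y_hat = np.sign(vecs[:, -2])                                 # naive spectral estimate
accuracy = max(np.mean(y_hat == y), np.mean(y_hat == -y))    # up to a label flip
```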
no code implementations • 26 Jun 2025 • Li Fan, Peng Wang, Jing Yang, Cong Shen
However, prior ICL-based Transformer models rely on deep architectures with many layers to achieve satisfactory performance, resulting in substantial storage and computational costs.
no code implementations • 25 Jun 2025 • Po Chen, Rujun Jiang, Peng Wang
In this work, we aim to fill this gap by conducting a comprehensive study of the loss landscape of the regularized DMF problem.
no code implementations • 25 Jun 2025 • Yun Xu, Yunxiao Bai, Yunyong Zhang, Peng Wang, Xuelin Wang, Jiqun Guo, Kaijun Xie, Rusheng Zhao
The growing integration of renewable energy sources necessitates adequate reserve capacity to maintain power balance.
no code implementations • 9 Jun 2025 • Haizhao Jing, Haokui Zhang, Zhenhao Shang, Rong Xiao, Peng Wang, Yanning Zhang
Specifically, inspired by large language models (LLMs), we propose a language embedding framework where both neural architectures and hardware platform specifications are projected into a unified semantic space through tokenization and LLM processing, enabling zero-shot prediction across different hardware platforms for the first time.
no code implementations • 5 Jun 2025 • Peng Wang, Yichun Shi, Xiaochen Lian, Zhonghua Zhai, Xin Xia, Xuefeng Xiao, Weilin Huang, Jianchao Yang
We introduce SeedEdit 3.0, as a companion to our T2I model Seedream 3.0, which significantly improves over previous SeedEdit versions in both edit instruction following and image content (e.g., ID/IP) preservation on real image inputs.
1 code implementation • 4 Jun 2025 • Xiaomi LLM-Core Team, :, Zihao Yue, Zhenru Lin, YiFan Song, Weikun Wang, Shuhuai Ren, Shuhao Gu, Shicheng Li, Peidian Li, Liang Zhao, Lei LI, Kainan Bao, Hao Tian, Hailin Zhang, Gang Wang, Dawei Zhu, Cici, Chenhong He, Bowen Ye, Bowen Shen, Zihan Zhang, Zihan Jiang, Zhixian Zheng, Zhichao Song, Zhenbo Luo, Yue Yu, Yudong Wang, Yuanyuan Tian, Yu Tu, Yihan Yan, Yi Huang, Xu Wang, Xinzhe Xu, Xingchen Song, Xing Zhang, Xing Yong, Xin Zhang, Xiangwei Deng, Wenyu Yang, Wenhan Ma, Weiwei Lv, Weiji Zhuang, Wei Liu, Sirui Deng, Shuo Liu, Shimao Chen, Shihua Yu, Shaohui Liu, Shande Wang, Rui Ma, Qiantong Wang, Peng Wang, Nuo Chen, Menghang Zhu, Kangyang Zhou, Kang Zhou, Kai Fang, Jun Shi, Jinhao Dong, Jiebao Xiao, Jiaming Xu, Huaqiu Liu, Hongshen Xu, Heng Qu, Haochen Zhao, Hanglong Lv, Guoan Wang, Duo Zhang, Dong Zhang, Di Zhang, Chong Ma, Chang Liu, Can Cai, Bingquan Xia
We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language models delivering state-of-the-art performance in both general visual understanding and multimodal reasoning.
no code implementations • 3 Jun 2025 • Di Chang, Mingdeng Cao, Yichun Shi, Bo Liu, Shengqu Cai, Shijie Zhou, Weilin Huang, Gordon Wetzstein, Mohammad Soleymani, Peng Wang
To address this gap, we introduce ByteMorph, a comprehensive framework for instruction-based image editing with an emphasis on non-rigid motions.
no code implementations • 26 May 2025 • Huijie Zhang, Zijian Huang, Siyi Chen, Jinfan Zhou, Zekai Zhang, Peng Wang, Qing Qu
Diffusion models have emerged as a powerful class of generative models, capable of producing high-quality samples that generalize beyond the training data.
no code implementations • 22 May 2025 • Hai-Feng Zhang, Zhao-Yun Chen, Peng Wang, Liang-Liang Guo, Tian-Le Wang, Xiao-Yan Yang, Ren-Ze Zhao, Ze-An Zhao, Sheng Zhang, Lei Du, Hao-Ran Tao, Zhi-Long Jia, Wei-Cheng Kong, Huan-Yu Liu, Athanasios V. Vasilakos, Yang Yang, Yu-Chun Wu, Ji Guan, Peng Duan, Guo-Ping Guo
Our benchmarking framework features an efficient adversarial attack algorithm designed for QNNs, enabling quantitative characterization of adversarial robustness and robustness bounds.
1 code implementation • 21 May 2025 • Peng Wang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu
To tackle this, we model sequential editing as a constrained stochastic programming problem.
no code implementations • 21 May 2025 • Kefan Song, Amir Moeini, Peng Wang, Lei Gong, Rohan Chandra, Yanjun Qi, Shangtong Zhang
At the next round, we prompt the LLM again with the same task and a context consisting of all previous responses and rewards.
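A bare-bones sketch of this prompting loop might look as follows; `llm` (any text-in/text-out callable) and `reward_fn` are stand-ins, not components released with the paper.

```python
def in_context_policy_loop(llm, task_prompt, reward_fn, rounds=5):
    # Each round, re-prompt the LLM with the same task plus the full
    # history of (response, reward) pairs from the previous rounds.
    history = []
    for _ in range(rounds):
        context = "\n".join(f"Attempt: {r}\nReward: {rw}" for r, rw in history)
        response = llm(f"{task_prompt}\n{context}\nNext attempt:")
        history.append((response, reward_fn(response)))
    return history
```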
1 code implementation • 21 May 2025 • Peng Wang, Ruihan Tao, Qiguang Chen, Mengkang Hu, Libo Qin
To fill this gap, we introduce X-WebAgentBench, a novel multilingual agent benchmark in an interactive web environment, which evaluates the planning and interaction performance of language agents across multiple languages, thereby contributing to the advancement of global agent intelligence.
4 code implementations • 14 May 2025 • An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, Le Yu, Lianghao Deng, Mei Li, Mingfeng Xue, Mingze Li, Pei Zhang, Peng Wang, Qin Zhu, Rui Men, Ruize Gao, Shixuan Liu, Shuang Luo, TianHao Li, Tianyi Tang, Wenbiao Yin, Xingzhang Ren, Xinyu Wang, Xinyu Zhang, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yinger Zhang, Yu Wan, Yuqiong Liu, Zekun Wang, Zeyu Cui, Zhenru Zhang, Zhipeng Zhou, Zihan Qiu
In this work, we present Qwen3, the latest version of the Qwen model family.
1 code implementation • 12 May 2025 • Junjie Ye, Caishuang Huang, Zhuohan Chen, Wenjie Fu, Chenyuan Yang, Leyi Yang, Yilong Wu, Peng Wang, Meng Zhou, Xiaolong Yang, Tao Gui, Qi Zhang, Zhongchao shi, Jianping Fan, Xuanjing Huang
Instruction following evaluates large language models (LLMs) on their ability to generate outputs that adhere to user-defined constraints.
1 code implementation • 12 May 2025 • Xiaomi LLM-Core Team, :, Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, YiFan Song, Zihan Jiang, Bowen Ye, Can Cai, Chenhong He, Dong Zhang, Duo Zhang, Guoan Wang, Hao Tian, Haochen Zhao, Heng Qu, Hongshen Xu, Jun Shi, Kainan Bao, Qingkai Fang, Kang Zhou, Kangyang Zhou, Lei LI, Menghang Zhu, Nuo Chen, Qiantong Wang, Shaohui Liu, Shicheng Li, Shuhao Gu, Shuhuai Ren, Shuo Liu, Sirui Deng, Weiji Zhuang, Weiwei Lv, Wenyu Yang, Xin Zhang, Xing Yong, Xing Zhang, Xingchen Song, Xinzhe Xu, Xu Wang, Yihan Yan, Yu Tu, Yuanyuan Tian, Yudong Wang, Yue Yu, Zhenru Lin, Zhichao Song, Zihao Yue
We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages.
no code implementations • 4 May 2025 • Yongming Li, Peng Wang, Bangdong Han
X-ray absorption spectroscopy, a non-destructive detection technique, offers advantages such as ease of operation, penetrative observation, and strong substance differentiation capabilities, making it well-suited for application in the field of drug detection and identification.
no code implementations • 1 May 2025 • Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, QiXing Huang, Georgios Pavlakos
The emerging 3D awareness of RayZer is attributed to two key factors.
no code implementations • 16 Apr 2025 • Zhenhuan Zhou, Yuchen Zhang, Along He, Peng Wang, Xueshuo Xie, Tao Li
Additionally, to alleviate the workload of manual annotation for dentists and fully leverage the unlabeled data, we designed a Cross-Frequency Collaborative training semi-supervised learning (SSL) Network called CFC-Net.
no code implementations • 16 Apr 2025 • Peng Wang, Weihua Wu
We formulate a joint optimization problem for vehicular transmit power, Multi-User Detection (MUD) matrices, V2V link spectrum reuse, and IRS reflection coefficients in IRS-aided V2X communication with imperfect CSI.
no code implementations • 15 Apr 2025 • Yu Gao, Lixue Gong, Qiushan Guo, Xiaoxia Hou, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Rui Wang, Xuanda Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Zhonghua Zhai, Xinyu Zhang, Qi Zhang, Yuwei Zhang, Shijia Zhao, Jianchao Yang, Weilin Huang
At the data stratum, we double the dataset using a defect-aware training paradigm and a dual-axis collaborative data-sampling framework.
no code implementations • 14 Apr 2025 • Zongcan Ding, Haodong Zhang, Peng Wu, Guansong Pang, Zhiwei Yang, Peng Wang, Yanning Zhang
Extensive experiments on four benchmarks demonstrate that SlowFastVAD effectively combines the strengths of both fast and slow detectors, and achieves remarkable detection accuracy and interpretability with significantly reduced computational overhead, making it well-suited for real-world VAD applications with high reliability requirements.
no code implementations • 2 Apr 2025 • Zihan Chen, Song Wang, Zhen Tan, Xingbo Fu, Zhenyu Lei, Peng Wang, Huan Liu, Cong Shen, Jundong Li
The rapid advancements in large language models (LLMs) have significantly enhanced their reasoning capabilities, driven by various strategies such as multi-agent collaboration.
no code implementations • 25 Mar 2025 • Laura Balzano, Tianjiao Ding, Benjamin D. Haeffele, Soo Min Kwon, Qing Qu, Peng Wang, Zhangyang Wang, Can Yaras
In this paper, we present a comprehensive review of recent advances in exploiting low-rank structures for deep learning and shed light on their mathematical foundations.
no code implementations • 24 Mar 2025 • Renpu Liu, Peng Wang, Donghao Li, Cong Shen, Jing Yang
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal technique for aligning artificial intelligence systems with human values, achieving remarkable success in fine-tuning large language models.
no code implementations • 24 Mar 2025 • Dawei Yan, Yang Li, Qing-Guo Chen, Weihua Luo, Peng Wang, Haokui Zhang, Chunhua Shen
Compared to single-turn dialogue, multi-turn dialogue involving multiple images better aligns with the needs of real-world human-AI interactions.
1 code implementation • CVPR 2025 • Zhenxuan Zeng, Qiao Wu, Xiyu Zhang, Lin Yuanbo Wu, Pei An, Jiaqi Yang, Ji Wang, Peng Wang
In real-world environments, a LiDAR point cloud registration method with robust generalization capabilities (across varying distances and datasets) is crucial for ensuring safety in autonomous driving and other LiDAR-based applications.
no code implementations • 12 Mar 2025 • Kaifeng Zou, Xiaoyi Feng, Peng Wang, Tao Huang, Zizhou Huang, Zhang Haihang, Yuntao Zou, Dagang Li
Generative models are widely used in visual content creation.
no code implementations • 12 Mar 2025 • Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, YuHang Zhou, Te Gao, Wanxiang Che
This survey seeks to fill this gap by offering a unified perspective on Long CoT.
1 code implementation • 10 Mar 2025 • Lixue Gong, Xiaoxia Hou, Fanshi Li, Liang Li, Xiaochen Lian, Fei Liu, Liyang Liu, Wei Liu, Wei Lu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Linjie Yang, Zhonghua Zhai, Xinyu Zhang, Qi Zhang, Yuwei Zhang, Shijia Zhao, Jianchao Yang, Weilin Huang
To address these limitations, we present Seedream 2.0, a native Chinese-English bilingual image generation foundation model that excels across diverse dimensions, adeptly managing text prompts in both Chinese and English and supporting bilingual image generation and text rendering.
1 code implementation • CVPR 2025 • Shining Wang, Yunlong Wang, Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Peng Wang
To address this issue, previous methods attempt to reduce the differences between viewpoints by leveraging critical attributes and decoupling viewpoint information.
1 code implementation • 27 Feb 2025 • Jinghao Xin, Zhichao Liang, Zihuan Zhang, Peng Wang, Ning Li
Deep Reinforcement Learning (DRL) has demonstrated potential in addressing robotic local planning problems, yet its efficacy remains constrained in highly unstructured and dynamic environments.
no code implementations • 27 Feb 2025 • Yujia Chen, Changsong Li, Yiming Wang, Qingqing Xiao, Nan Zhang, Zifan Kong, Peng Wang, Binyu Yan
To fill this gap, we propose MIND (Multi-agent INner Dialogue), a novel paradigm that provides more immersive psychological healing environments.
1 code implementation • 27 Feb 2025 • Xuzheng Yang, Junzhuo Liu, Peng Wang, Guoqing Wang, Yang Yang, Heng Tao Shen
To address fine-grained compositional REC, we propose novel methods based on a Specialist-MLLM collaboration framework, leveraging their complementary strengths: Specialist Models handle simpler tasks efficiently, while MLLMs are better suited for complex reasoning.
no code implementations • 26 Feb 2025 • Wanyi Li, Wei Wei, Yongkang Luo, Peng Wang
Inspired by the brain's mechanisms for categorization and analogical learning, we propose a novel approach called Brain-inspired Analogical Mixture Prototypes (BAMP).
Tasks: class-incremental learning, Few-Shot Class-Incremental Learning, +1
no code implementations • 21 Feb 2025 • Sumei Fan, Deyun Zhang, Yue Wang, Shijia Geng, Kun Lu, Meng Sang, Weilun Xu, Haixue Wang, Qinghao Zhao, Chuandong Cheng, Peng Wang, Shenda Hong
Home-based single-lead AI-ECG devices have enabled continuous, real-world cardiac monitoring.
no code implementations • 21 Feb 2025 • Yi Zhang, Fan Wei, Jingyi Li, Yan Wang, Yanyan Yu, Jianli Chen, Zipo Cai, Xinyu Liu, Wei Wang, Peng Wang, Zhong Wang
The use of children's drawings to examine their conceptual understanding has been proven to be an effective method, but there are two major problems with previous research: 1.
4 code implementations • 19 Feb 2025 • Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, Junyang Lin
We introduce Qwen2.5-VL, the latest flagship model of the Qwen vision-language series, which demonstrates significant advancements in both foundational capabilities and innovative functionalities.
Ranked #2 on Visual Question Answering (VQA) on VLM2-Bench
no code implementations • 16 Feb 2025 • Po Chen, Rujun Jiang, Peng Wang
The optimization foundations of deep linear networks have received significant attention lately.
no code implementations • 14 Feb 2025 • Peng Wang, Shengchao Hu, Zerui Tao, Guoxia Wang, dianhai yu, Li Shen, Quan Zheng, DaCheng Tao
Weight averaging has become a standard technique for enhancing model performance.
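For reference, uniform weight averaging over checkpoints of the same architecture can be sketched as below; this is the generic recipe, not this paper's specific scheme.

```python
import copy
import torch

def average_checkpoints(state_dicts):
    # Uniformly average floating-point tensors across checkpoints; entries
    # that are not floats (e.g., integer counters) are kept from the first.
    avg = copy.deepcopy(state_dicts[0])
    for key, value in avg.items():
        if torch.is_floating_point(value):
            avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(0)
    return avg
```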
no code implementations • 9 Feb 2025 • Xiao Li, Zekai Zhang, Xiang Li, Siyi Chen, Zhihui Zhu, Peng Wang, Qing Qu
Diffusion models, though originally designed for generative tasks, have demonstrated impressive self-supervised representation learning capabilities.
no code implementations • 8 Feb 2025 • Qirui Wu, Shizhou Zhang, De Cheng, Yinghui Xing, Di Xu, Peng Wang, Yanning Zhang
Catastrophic forgetting is a critical challenge for incremental object detection (IOD).
1 code implementation • 4 Feb 2025 • Zhihao Guo, Jingxuan Su, Shenglin Wang, Jinlong Fan, Jing Zhang, Wei Zhou, Hadi Amirpour, Yunlong Zhao, Liangxiu Han, Peng Wang
3D Gaussian Splatting has emerged as an efficient photorealistic novel view synthesis method.
1 code implementation • 13 Jan 2025 • Peng Wang, Xi Zhang, Luis Badesa
In order to study the role of the DSO as a stakeholder, a Stackelberg game is represented via a bi-level model: the DSO maximizes profits at the upper level, while the VPPs minimize operating costs at the lower level.
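A toy version of such a bi-level Stackelberg structure, with a single follower and the leader's price chosen by grid search over the follower's best response, is sketched below; the cost and profit functions are illustrative assumptions, not the paper's model.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def vpp_best_response(price, demand=10.0, flexibility=2.0):
    # Lower level: a toy VPP trades off energy cost against deviating
    # from its preferred demand level.
    cost = lambda x: price * x + 0.5 * (x - demand) ** 2 / flexibility
    return minimize_scalar(cost, bounds=(0.0, 2 * demand), method="bounded").x

def dso_profit(price, wholesale=1.0):
    # Upper level: the DSO earns the retail-minus-wholesale margin on the
    # quantity the follower chooses in response to its price.
    return (price - wholesale) * vpp_best_response(price)

prices = np.linspace(1.0, 5.0, 401)
best_price = prices[np.argmax([dso_profit(p) for p in prices])]
```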
no code implementations • 6 Jan 2025 • Jiawei Liu, Yuanzhi Zhu, Feiyu Gao, Zhibo Yang, Peng Wang, Junyang Lin, Xinggang Wang, Wenyu Liu
The text in natural scene images needs to meet the following four key criteria: (1) Fidelity: the generated text should appear as realistic as a photograph and be completely accurate, with no errors in any of the strokes.
no code implementations • 6 Jan 2025 • Guangxiao Zhang, Gaoxi Xiao, Xinghua Liu, Yan Xu, Peng Wang
This letter proposes an alternative underdetermined framework for fault location that utilizes current measurements along with the branch-bus matrix, providing another option besides the traditional voltage-based methods.
no code implementations • 4 Jan 2025 • Alec S. Xu, Can Yaras, Peng Wang, Qing Qu
In this work, we address this gap by examining the linear separation capabilities of shallow nonlinear networks.
1 code implementation • CVPR 2025 • Wei Suo, Lijun Zhang, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yanning Zhang
Large Vision-Language Models (LVLMs) have obtained impressive performance in visual content understanding and multi-modal reasoning.
1 code implementation • CVPR 2025 • Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang
Diffusion models have gained tremendous success in text-to-image generation, yet still lag behind on visual understanding tasks, an area dominated by autoregressive vision-language models.
1 code implementation • 23 Dec 2024 • Kai Ruan, Xuan Wang, Jixiang Hong, Peng Wang, Yang Liu, Hao Sun
While Large Language Models (LLMs) have demonstrated remarkable capabilities in scientific tasks, existing evaluation frameworks primarily assess their performance using rich contextual inputs, overlooking their ability to generate novel ideas from minimal information.
1 code implementation • 20 Dec 2024 • Junjie Ye, Yilong Wu, Sixian Li, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Peng Wang, Zhongchao shi, Jianping Fan, Zhengyin Du
Large language models (LLMs) achieve remarkable advancements by leveraging tools to interact with external environments, a critical step toward generalized AI.
no code implementations • 10 Dec 2024 • Can Yaras, Siyi Chen, Peng Wang, Qing Qu
Models such as Contrastive Language-Image Pretraining (CLIP) are designed to bridge different modalities, such as images and text, by learning a shared representation space through contrastive learning.
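The shared space in CLIP-style models is learned with a symmetric contrastive (InfoNCE) objective over matched image-text pairs, which can be sketched as follows.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Matched image/text pairs are positives; every other pairing in the
    # batch is a negative. The loss is symmetric in the two modalities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```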
no code implementations • 9 Dec 2024 • Wei Suo, Ji Ma, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yanning Zhang
Although Large Vision-Language Models (LVLMs) have achieved impressive results, their high computational cost poses a significant barrier to wider application.
no code implementations • 6 Dec 2024 • Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang
Specializing LLMs in various domain-specific tasks has emerged as a critical step towards achieving high performance.
no code implementations • 3 Dec 2024 • Zhibo Yang, Jun Tang, Zhaohai Li, Pengfei Wang, Jianqiang Wan, Humen Zhong, Xuejing Liu, Mingkun Yang, Peng Wang, Shuai Bai, Lianwen Jin, Junyang Lin
The current landscape lacks a comprehensive benchmark to effectively measure the literate capabilities of LMMs.
no code implementations • 3 Dec 2024 • Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang
With the wide application of deep neural network models in various computer vision tasks, there has been a proliferation of adversarial example generation strategies aimed at deeply exploring model security.
no code implementations • 2 Dec 2024 • Dongsheng Han, Peng Wang, Wanli Ni, Wen Wang, Ailing Zheng, Dusit Niyato, Naofal Al-Dhahir
We propose a MF-RIS-enabled multi-user and multi-target ISAC system, and formulate an optimization problem to maximize the signal-to-interference-plus-noise ratio (SINR) of sensing targets.
no code implementations • 29 Nov 2024 • Jiepeng Wang, YuAn Liu, Peng Wang, Cheng Lin, Junhui Hou, Xin Li, Taku Komura, Wenping Wang
3D Gaussian Splatting has achieved impressive performance in novel view synthesis with real-time rendering capabilities.
1 code implementation • 28 Nov 2024 • Rui Xu, Wenyue Chen, Jiepeng Wang, YuAn Liu, Peng Wang, Lin Gao, Shiqing Xin, Taku Komura, Xin Li, Wenping Wang
However, the current Gaussian primitives only have a single view-dependent color and an opacity to represent the appearance and geometry of the scene, resulting in a non-compact representation.
no code implementations • 20 Nov 2024 • Peng Wang, Li Shen, Zerui Tao, Yan Sun, Guodong Zheng, DaCheng Tao
In this work, we first generalize SGD and LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization.
1 code implementation • 19 Nov 2024 • Yang Zou, Zhixin Chen, Zhipeng Zhang, Xingyuan Li, Long Ma, JinYuan Liu, Peng Wang, Yanning Zhang
In this work, we emphasize the infrared spectral distribution fidelity and propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.
no code implementations • 18 Nov 2024 • Dongseok Shim, Yichun Shi, Kejie Li, H. Jin Kim, Peng Wang
Recent advancements in text-to-3D generation, building on the success of high-performance text-to-image generative models, have made it possible to create imaginative and richly textured 3D objects from textual descriptions.
no code implementations • 15 Nov 2024 • Qi Liu, Yanchen Liu, Ruifeng Li, Chenhong Cao, Yufeng Li, Xingyu Li, Peng Wang, Runhan Feng, Shiyang Bu
We systematically analyze the characteristics of the threat: dynamism, time-exciting impact, and low prior knowledge dependency.
1 code implementation • 13 Nov 2024 • Peng Wang, Lingzhe Zhao, Yin Zhang, Shiyu Zhao, Peidong Liu
In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showing superior performance across a range of synthetic and real datasets featuring both sharp and motion-blurred images and highlighting the versatility and robustness of our approach.
no code implementations • 12 Nov 2024 • Zifan Zeng, Chongzhe Zhang, Feng Liu, Joseph Sifakis, Qunli Zhang, Shiming Liu, Peng Wang
With the proliferation of the Large Language Model (LLM), the concept of World Models (WM) has recently attracted a great deal of attention in the AI research community, especially in the context of AI agents.
no code implementations • 11 Nov 2024 • Yichun Shi, Peng Wang, Weilin Huang
We introduce SeedEdit, a diffusion model that is able to revise a given image with any text prompt.
no code implementations • 9 Nov 2024 • Lei Yu, Shiqi Chen, Hang Yuan, Peng Wang, Zhirong Huang, Jingyuan Zhang, Chenjie Shen, Fengjun Zhang, Li Yang, Jiajia Ma
Existing smart contract vulnerability detection methods face three main issues: (1) Insufficient quality of datasets, lacking detailed explanations and precise vulnerability locations.
no code implementations • 9 Nov 2024 • Hongyu Chen, Bingliang Jiao, Wenxuan Wang, Peng Wang
By leveraging this shared textual space as an anchor, we can prompt the ReID model to embed images from various domains into a unified semantic space, thereby alleviating catastrophic forgetting caused by domain shifts.
no code implementations • 3 Nov 2024 • Fei Zhou, Peng Wang, Lei Zhang, Zhenghua Chen, Wei Wei, Chen Ding, Guosheng Lin, Yanning Zhang
Meta-learning offers a promising avenue for few-shot learning (FSL), enabling models to glean a generalizable feature embedding through episodic training on synthetic FSL tasks in a source domain.
no code implementations • 30 Oct 2024 • Wei Dong, Yuan Sun, Yiting Yang, Xing Zhang, Zhijun Lin, Qingsen Yan, Haokui Zhang, Peng Wang, Yang Yang, HengTao Shen
A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix.
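A minimal sketch of such a low-rank adaptation matrix, in the LoRA spirit (freeze the pre-trained weight, learn a rank-r update W + BA), is given below; it is a generic formulation rather than this paper's exact method.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False                          # freeze pre-trained weights
        self.A = nn.Parameter(0.01 * torch.randn(rank, linear.in_features))
        self.B = nn.Parameter(torch.zeros(linear.out_features, rank))  # update starts at 0

    def forward(self, x):
        return self.linear(x) + x @ self.A.t() @ self.B.t()  # W x + (B A) x
```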
no code implementations • 9 Oct 2024 • Wanli Ni, Wen Wang, Ailing Zheng, Peng Wang, Changsheng You, Yonina C. Eldar, Dusit Niyato, Robert Schober
Furthermore, we present two schemes that utilize MF-RISs to enhance the performance of integrated sensing and communication (ISAC).
1 code implementation • 6 Oct 2024 • Yongheng Zhang, Qiguang Chen, Jingxuan Zhou, Peng Wang, Jiasheng Si, Jin Wang, Wenpeng Lu, Libo Qin
To address these challenges, we propose Wrong-of-Thought (WoT), which includes two core modules: (1) Multi-Perspective Verification: a multi-perspective verification method for accurately refining the reasoning process and result, and (2) Wrong Information Utilization: utilizing wrong information to alert LLMs and reduce the probability of LLMs making the same mistakes.
no code implementations • 24 Sep 2024 • Junjie Ye, Yuming Yang, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao shi, Jianping Fan
Large language models (LLMs) encode extensive world knowledge through pre-training on massive datasets, which can then be fine-tuned for the question-answering (QA) task.
1 code implementation • 23 Sep 2024 • Junzhuo Liu, Xuzheng Yang, Weiwei Li, Peng Wang
Referring Expression Comprehension (REC) is a crucial cross-modal task that objectively evaluates the capabilities of language understanding, image comprehension, and language-to-image grounding.
8 code implementations • 18 Sep 2024 • Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, Junyang Lin
We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing.
Ranked #3 on Temporal Relation Extraction on Vinoground
Tasks: Natural Language Visual Grounding, Temporal Relation Extraction, +2
no code implementations • 18 Sep 2024 • Humen Zhong, Zhibo Yang, Zhaohai Li, Peng Wang, Jun Tang, Wenqing Cheng, Cong Yao
Text recognition is an inherent integration of vision and language, encompassing the visual texture in stroke patterns and the semantic context among the character sequences.
1 code implementation • 14 Sep 2024 • Hongyu Sun, Yongcai Wang, Peng Wang, Haoran Deng, Xudong Cai, Deying Li
In particular, we propose to incorporate different views of a 3D shape into a permutation-invariant set, referred to as a View Set, which removes rigid relation assumptions and facilitates adequate information exchange and fusion among views.
no code implementations • 10 Sep 2024 • Peng Wang, Xin Wen, Ruochen Cao, Chengxin Gao, Yanrong Hao, Rui Cao
We then employ a specialized weighted edge aggregation (WEA) module, which uses cross convolution with a channel-wise element-wise convolutional kernel, to integrate dynamic functional connectivity and isolate task-relevant connections.
1 code implementation • 9 Sep 2024 • Ningyu Zhang, Zekun Xi, Yujie Luo, Peng Wang, Bozhong Tian, Yunzhi Yao, Jintian Zhang, Shumin Deng, Mengshu Sun, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen
Knowledge representation has been a central aim of AI since its inception.
no code implementations • 9 Sep 2024 • Peng Wu, Chengyu Pan, Yuting Yan, Guansong Pang, Peng Wang, Yanning Zhang
Video anomaly detection (VAD) aims to discover behaviors or events deviating from normality in videos.
1 code implementation • 6 Sep 2024 • Renming Huang, Shaochong Liu, Yunqiang Pei, Peng Wang, Guoqing Wang, Yang Yang, HengTao Shen
To achieve our goal, we propose a novel subgoal guidance learning strategy.
2 code implementations • 4 Sep 2024 • Siyi Chen, Huijie Zhang, Minzhe Guo, Yifu Lu, Peng Wang, Qing Qu
In this work, we improve the understanding of their semantic spaces from intriguing observations: among a certain range of noise levels, (1) the learned posterior mean predictor (PMP) in the diffusion model is locally linear, and (2) the singular vectors of its Jacobian lie in low-dimensional semantic subspaces.
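To probe these observations on a toy model, one could take the Jacobian of the posterior-mean prediction at a noisy input and inspect its leading singular directions, as sketched below; `denoiser` and `t` are placeholders for a trained model and a noise level, and this is tractable only at small dimensions.

```python
import torch

def pmp_jacobian_directions(denoiser, x_t, t, k=5):
    # View the PMP as a map x_t -> x0_hat, compute its Jacobian at one
    # input, and return the top-k singular values and input directions.
    shape = x_t.shape
    f = lambda x: denoiser(x.reshape(1, *shape), t).flatten()
    J = torch.autograd.functional.jacobian(f, x_t.detach().flatten())
    _, S, Vh = torch.linalg.svd(J)
    return S[:k], Vh[:k]
```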
1 code implementation • 4 Sep 2024 • Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu
Remarkably, these models can achieve this even with a small number of training samples despite a large image dimension, circumventing the curse of dimensionality.
1 code implementation • 27 Aug 2024 • Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang, Cong Yao
Recently, generalist models (such as GPT-4V), trained on tremendous data in a unified way, have shown enormous potential in reading text in various scenarios, but with the drawbacks of limited accuracy and low efficiency.
no code implementations • 27 Aug 2024 • Qiaoxin Li, Ruifeng Chen, Peng Wang, Guotao Quan, Yanfeng Du, Dong Liang, Yinsheng Li
As existing material basis image reconstruction approaches assume that the data sets acquired at two tube potentials are temporally consistent, the violation of this assumption results in inaccurate quantification of material concentration.
1 code implementation • 25 Aug 2024 • Xu Zhang, Zhipeng Xie, Haiyang Yu, Qitong Wang, Peng Wang, Wei Wang
Based on this observation, we introduce the Collaborative Decision Making (CDM) module, which fuses the multiple classifier heads to enhance the inference performance of adaptive deep networks.
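As a simple reference point for fusing several classifier heads at inference time, one can average their predicted probabilities; the sketch below is a generic ensemble rule standing in for the paper's CDM module.

```python
import torch

def fuse_heads(logits_per_head):
    # Average per-head class probabilities and predict the argmax class.
    probs = torch.stack([torch.softmax(l, dim=-1) for l in logits_per_head])
    return probs.mean(dim=0).argmax(dim=-1)
```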
no code implementations • 12 Aug 2024 • Peng Wu, Xuerong Zhou, Guansong Pang, Zhiwei Yang, Qingsen Yan, Peng Wang, Yanning Zhang
Existing works typically involve extracting global features from full-resolution video frames and training frame-level classifiers to detect anomalies in the temporal dimension.
no code implementations • 4 Aug 2024 • Peng Wang, Xiaobin Wang, Chao Lou, Shengyu Mao, Pengjun Xie, Yong Jiang
In-context learning (ICL) is a few-shot learning paradigm that involves learning mappings through input-output pairs and appropriately applying them to new instances.
1 code implementation • 31 Jul 2024 • Lijun Zhang, Wei Suo, Peng Wang, Yanning Zhang
On one hand, considering the crucial role of human-object pairs information in HOI tasks, the feature alignment module aligns the human-object pairs by aggregating instance information.
1 code implementation • 23 Jul 2024 • Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, HengTao Shen
To evaluate the effectiveness and efficiency of the Trajectory Diffuser, we conduct experiments on the D4RL benchmarks.
no code implementations • 22 Jul 2024 • Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang
Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI.
1 code implementation • 19 Jul 2024 • Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang
However, it is still challenging to render high-quality text images in real-world scenarios, as three critical criteria should be satisfied: (1) Fidelity: the generated text images should be photo-realistic and the contents are expected to be the same as specified in the given conditions; (2) Reasonability: the regions and contents of the generated text should cohere with the scene; (3) Utility: the generated text images can facilitate related tasks (e.g., text detection and recognition).
6 code implementations • 15 Jul 2024 • An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, TianHao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, Zhihao Fan
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.
Ranked #3 on Arithmetic Reasoning on GSM8K (using extra training data)
1 code implementation • 14 Jul 2024 • Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang, Yanning Zhang
As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level.
no code implementations • 12 Jul 2024 • Ke Ji, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Jingsheng Gao, Ziyu Shang
The main challenge is how to transfer the unstructured semantic space in PLMs to the downstream domain hierarchy.
1 code implementation • 12 Jul 2024 • Peng Wang, Yongcai Wang, Deying Li
Finally, both the refined feature embeddings and the predicted positions are integrated to enhance the object association.
Ranked #1 on Multi-Object Tracking on VisDrone2019
no code implementations • CVPR 2025 • Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, Peng Wang
This paper introduces the Camera-free Diffusion (CamFreeDiff) model for 360-degree image outpainting from a single camera-free image and text description.
1 code implementation • 8 Jul 2024 • Jiajun Liu, Wenjun Ke, Peng Wang, Jiahao Wang, Jinhua Gao, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji, Yining Li
To address this issue, we propose a fast CKGE framework incorporating an incremental low-rank adapter mechanism to efficiently acquire new knowledge while preserving old knowledge.
no code implementations • 7 Jul 2024 • Peng Wang, Maimaitiniyazi Maimaitiabudula
This equation reformulates the iterative process of machine learning into a time-dependent partial differential equation with a clear mathematical structure, offering a theoretical framework for investigating machine learning iterations through quantum and mathematical theories.
no code implementations • 2 Jul 2024 • Song Wang, Peng Wang, Tong Zhou, Yushun Dong, Zhen Tan, Jundong Li
To address these limitations, we collect a variety of datasets designed for the bias evaluation of LLMs, and further propose CEB, a Compositional Evaluation Benchmark that covers different types of bias across different social groups and tasks.
1 code implementation • 2 Jul 2024 • Wenpu Li, Pian Wan, Peng Wang, Jinghang Li, Yi Zhou, Peidong Liu
Our method can jointly learn both the implicit neural scene representation and recover the camera motion by minimizing the differences between the synthesized data and the real measurements without pre-computed camera poses from COLMAP.
no code implementations • 20 Jun 2024 • Peng Wang, Mattia Robbiani, Zhihao Guo
Assistive robots have attracted significant attention due to their potential to enhance the quality of life for vulnerable individuals like the elderly.
no code implementations • 18 Jun 2024 • Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, Peng Wang
In this model, we have designed a series of modality-specific prompts, which could enable our model to adapt to and make use of the specific information inherent in different modality inputs, thereby reducing the interference caused by the modality gap and achieving better identification.
1 code implementation • 11 Jun 2024 • Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie
The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.
no code implementations • 9 Jun 2024 • Huikang Liu, Peng Wang, Longxiu Huang, Qing Qu, Laura Balzano
We study the problem of symmetric positive semi-definite low-rank matrix completion (MC) with deterministic entry-dependent sampling.
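A standard Burer-Monteiro-style baseline for this setting factorizes X = UU^T and runs gradient descent on the observed entries, as sketched below (assuming the mask and the observed matrix are symmetric); it illustrates the problem, not the paper's analysis.

```python
import numpy as np

def psd_matrix_completion(M_obs, mask, rank, lr=0.005, iters=5000, seed=0):
    # Minimize ||mask * (U U^T - M_obs)||_F^2 over the factor U; with a
    # symmetric mask and M_obs, the gradient is 4 * (mask * residual) @ U.
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((M_obs.shape[0], rank))
    for _ in range(iters):
        R = mask * (U @ U.T - M_obs)       # residual on observed entries
        U -= lr * 4 * (R @ U)
    return U @ U.T
```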
1 code implementation • 9 Jun 2024 • Yue Lu, Shizhou Zhang, De Cheng, Yinghui Xing, Nannan Wang, Peng Wang, Yanning Zhang
Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL), by selecting and updating relevant prompts in the vision-transformer models.
1 code implementation • 6 Jun 2024 • Can Yaras, Peng Wang, Laura Balzano, Qing Qu
In practice, we demonstrate the effectiveness of this approach for deep low-rank matrix completion as well as fine-tuning language models.
no code implementations • 4 Jun 2024 • Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma
The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures.
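For reference, the MCR$^2$ objective expands a global coding rate while compressing class-conditional ones; a direct sketch of the standard formulas is below, with `eps` the distortion parameter and `Z` an n x d feature matrix.

```python
import torch

def coding_rate(Z, eps=0.5):
    # R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z)
    n, d = Z.shape
    return 0.5 * torch.logdet(torch.eye(d) + (d / (n * eps**2)) * Z.t() @ Z)

def mcr2(Z, labels, eps=0.5):
    # Rate of all features minus the size-weighted rates within each class.
    within = sum(
        ((labels == c).sum() / len(Z)) * coding_rate(Z[labels == c], eps)
        for c in labels.unique()
    )
    return coding_rate(Z, eps) - within
```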
no code implementations • 1 Jun 2024 • Qingming Liu, YuAn Liu, Jiepeng Wang, Xianqiang Lyv, Peng Wang, Wenping Wang, Junhui Hou
In this paper, we propose MoDGS, a new pipeline to render novel views of dynamic scenes from a casually captured monocular video.
no code implementations • 29 May 2024 • Peng Wang, Songshuo Lu, Yaohua Tang, Sijie Yan, Wei Xia, Yuanjun Xiong
It is based on a large language model (LLM) carefully aligned to be aware of a perception module, a motor function module, and the concept of a simple finite state machine (called neural FSM) with two states.
no code implementations • 23 May 2024 • Shengyu Mao, Yong Jiang, Boli Chen, Xiao Li, Peng Wang, Xinyu Wang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang
As Large Language Models (LLMs) and Retrieval Augmentation Generation (RAG) techniques have evolved, query rewriting has been widely incorporated into the RAG system for downstream tasks like open-domain QA.
2 code implementations • 23 May 2024 • Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
In WISE, we design a dual parametric memory scheme, which consists of the main memory for the pretrained knowledge and a side memory for the edited knowledge.
no code implementations • 22 May 2024 • Peng Wang, Dongsheng Han, Yashuai Cao, Wanli Ni, Dusit Niyato
In this paper, we investigate the waveform design problem in a downlink multi-user and multi-target ISAC system under different C&S performance preferences.
no code implementations • 22 May 2024 • Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang
In addition, by incorporating curriculum planning, our approach systematically escalates the difficulty levels of tasks, progressively enhancing the student LLM's capabilities.
no code implementations • 21 May 2024 • Ji Ma, Wei Suo, Peng Wang, Yanning Zhang
Vision-Language Instruction Tuning (VLIT) is a critical training phase for Large Vision-Language Models (LVLMs).
1 code implementation • 8 May 2024 • Nian Liu, Shen Fan, Ting Bai, Peng Wang, Mingwei Sun, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Chuan Shi
In this paper, we propose a novel social recommendation method called LSIR (Learning Social Graph for Inactive User Recommendation) that learns an optimal social graph structure for social recommendation, especially for inactive users.
1 code implementation • 7 May 2024 • Jiajun Liu, Wenjun Ke, Peng Wang, Ziyu Shang, Jinhua Gao, Guozheng Li, Ke Ji, Yanhe Liu
On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs.
no code implementations • 1 May 2024 • Zhihao Guo, Peng Wang
This paper proposes a new pipeline that leverages SpinNeRF and monocular depth estimation models like ZoeDepth to enhance NeRF's performance in complex object removal with improved efficiency.
no code implementations • 29 Apr 2024 • Liying Gao, Bingliang Jiao, Peng Wang, Shizhou Zhang, Hanwang Zhang, Yanning Zhang
In this study, we aim to tackle two major challenges of this task simultaneously: i) zero-shot, dealing with unseen categories, and ii) fine-grained, referring to intra-category instance-level retrieval.
no code implementations • 27 Apr 2024 • Guozheng Li, Peng Wang, Wenjun Ke, Yikai Guo, Ke Ji, Ziyu Shang, Jiajun Liu, Zijie Xu
On the one hand, retrieving good demonstrations is a non-trivial process in RE, which easily results in low relevance regarding entities and relations.
no code implementations • 27 Apr 2024 • Guozheng Li, Peng Wang, Jiajun Liu, Yikai Guo, Ke Ji, Ziyu Shang, Zijie Xu
To this end, we introduce MICRE (Meta In-Context learning of LLMs for Relation Extraction), a new meta-training framework for zero- and few-shot RE where an LLM is tuned to do ICL on a diverse collection of RE datasets (i.e., learning to learn in context for RE).
no code implementations • 26 Apr 2024 • SeungWook Kim, Yichun Shi, Kejie Li, Minsu Cho, Peng Wang
Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, as images provide more intuitive guidance for the 3D generation process.
no code implementations • 23 Apr 2024 • Hongyu Chen, Yiqi Gao, Min Zhou, Peng Wang, Xubin Li, Tiezheng Ge, Bo Zheng
Meanwhile, a network, dubbed as Masked ControlNet, is designed to utilize these object masks for object generation in the misaligned visual control region.
no code implementations • CVPR 2024 • SeungWook Kim, Kejie Li, Xueqing Deng, Yichun Shi, Minsu Cho, Peng Wang
Leveraging multi-view diffusion models as priors for 3D optimization has alleviated the problem of 3D consistency, e.g., the Janus face problem or the content drift problem, in zero-shot text-to-3D models.
no code implementations • 15 Apr 2024 • Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie
This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits.
no code implementations • CVPR 2024 • Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen
By enhancing the annotation quality and expanding the dataset to encompass 383K images with more than 5.18M panoptic masks, we introduce COCONut, the COCO Next Universal segmenTation dataset.
no code implementations • 8 Apr 2024 • Zhipeng Zhang, Zhimin Wei, Guolei Sun, Peng Wang, Luc van Gool
In the field of visual affordance learning, previous methods mainly used abundant images or videos that delineate human behavior patterns to identify action possibility regions for object manipulation, with a variety of applications in robotic tasks.
1 code implementation • CVPR 2024 • Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang
Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters.
1 code implementation • 18 Mar 2024 • Lingzhe Zhao, Peng Wang, Peidong Liu
In this paper, we introduce a novel approach, named BAD-Gaussians (Bundle Adjusted Deblur Gaussian Splatting), which leverages explicit Gaussian representation and handles severe motion-blurred images with inaccurate camera poses to achieve high-quality scene reconstruction.
1 code implementation • 15 Mar 2024 • Yukun Li, Guansong Pang, Wei Suo, Chenchen Jing, Yuling Xi, Lingqiao Liu, Hao Chen, Guoqiang Liang, Peng Wang
Large pre-trained VLMs like CLIP have demonstrated superior zero-shot recognition ability, and a number of recent studies leverage this ability to mitigate catastrophic forgetting in CL, but they focus on closed-set CL in a single domain dataset.
no code implementations • 15 Mar 2024 • Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen
Aligning these distributions between corresponding regions from different tasks imparts higher flexibility and capacity to capture intra-region structures, accommodating a broader range of tasks.
no code implementations • 14 Mar 2024 • Yanfei Song, Bangzheng Pu, Peng Wang, Hongxu Jiang, Dong Dong, Yongxiang Cao, Yiqing Shen
Moreover, it takes only 244 MB of memory, which is 3.5% of the vanilla SAM.
no code implementations • 6 Mar 2024 • Lu Wen, Zhenghao Feng, Yun Hou, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang
Semi-supervised learning is a sound measure to relieve the strict demand for abundant annotated datasets, especially for challenging multi-organ segmentation.
no code implementations • 22 Feb 2024 • Peng Gao, Peng Wang, Feng Gao, Fei Wang, Ruyue Yuan
As a long-term vision in the field of artificial intelligence, the core goal of embodied intelligence is to improve the perception, understanding, and interaction capabilities of agents and the environment.
no code implementations • 21 Feb 2024 • Guozheng Li, Wenjun Ke, Peng Wang, Zijie Xu, Ke Ji, Jiajun Liu, Ziyu Shang, Qiqing Luo
The in-context learning (ICL) for relational triple extraction (RTE) has achieved promising performance, but still encounters two key challenges: (1) how to design effective prompts and (2) how to select proper demonstrations.
1 code implementation • 11 Feb 2024 • Mengmei Zhang, Mingwei Sun, Peng Wang, Shen Fan, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Cheng Yang, Chuan Shi
Large language models (LLMs) like ChatGPT, which exhibit powerful zero-shot and instruction-following capabilities, have catalyzed a revolutionary transformation across diverse fields, especially for open-ended tasks.
no code implementations • 11 Feb 2024 • Peng Wang, Xiang Wei, Fangxu Hu, Wenjuan Han
TransGPT-MM is finetuned on a multi-modal Transportation dataset (MTD) that we manually collected from three areas of the transportation domain: driving tests, traffic signs, and landmarks.
3 code implementations • 3 Feb 2024 • Zixiang Zhao, Lilun Deng, Haowen Bai, Yukun Cui, Zhipeng Zhang, Yulun Zhang, Haotong Qin, Dongdong Chen, Jiangshe Zhang, Peng Wang, Luc van Gool
Therefore, we introduce a novel fusion paradigm named image Fusion via vIsion-Language Model (FILM), for the first time, utilizing explicit textual information from source images to guide the fusion process.
no code implementations • 20 Jan 2024 • LinLin Wang, Shixin Wang, Peng Wang, Wei Wang, Dezhao Wang, Yongcai Wang, Shanwen Wang
In the realm of intelligent transportation systems, accurate and reliable traffic monitoring is crucial.
3 code implementations • 2 Jan 2024 • Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen
In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches.
Ranked #1 on knowledge editing on zsRE (using extra training data)
1 code implementation • 4 Dec 2023 • Hengjia Xiao, Peng Wang, Mingzhe Yu, Mattia Robbiani
This research focuses on how Large Language Models (LLMs) can help with (path) planning for mobile embodied agents such as robots, in a human-in-the-loop and interactive manner.
1 code implementation • 2 Dec 2023 • Peng Wang, Yichun Shi
We introduce "ImageDream," an innovative image-prompt, multi-view diffusion model for 3D object generation.
1 code implementation • 1 Dec 2023 • Peng Wang
In this paper, these ontologies are named weak informative ontologies (WIOs), and it is challenging for existing methods to match them.
1 code implementation • 25 Nov 2023 • Heng Tao Shen, Cheng Chen, Peng Wang, Lianli Gao, Meng Wang, Jingkuan Song
In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC, where a model is learning on a stream of incoming tasks.
1 code implementation • 21 Nov 2023 • Xiu-Shen Wei, Yang shen, Xuhao Sun, Peng Wang, Yuxin Peng
Our work focuses on tackling large-scale fine-grained image retrieval as ranking the images depicting the concept of interest (i.e., the same sub-category labels) highest based on the fine-grained details in the query.
no code implementations • 20 Nov 2023 • Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, Kai Zhang
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images even with little visual overlap, while simultaneously estimating the relative camera poses in ~1.3 seconds on a single A100 GPU.
no code implementations • 16 Nov 2023 • Yimin Jing, Renren Jin, Jiahao Hu, Huishi Qiu, Xiaohua Wang, Peng Wang, Deyi Xiong
In pursuit of this goal, various benchmarks have been constructed to evaluate the instruction-following capacity of these models.
no code implementations • 15 Nov 2023 • Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang
We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion.
no code implementations • CVPR 2024 • Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang
Particularly, we devise a semantic knowledge injection module to introduce semantic knowledge from large language models for the detection task, and design a novel anomaly synthesis module to generate pseudo unseen anomaly videos with the help of large vision generation models for the classification task.
1 code implementation • 11 Nov 2023 • Peng Wang, Haiming Yao, Wenyong Yu
Current unsupervised models struggle to strike a balance between detecting texture and object defects, lacking the capacity to discern latent representations and intricate features.
Ranked #86 on Anomaly Detection on MVTec AD
no code implementations • 10 Nov 2023 • Yifei Yang, Peng Wang, Xiaofan He, Dongmian Zou
Detecting unusual patterns in graph data is a crucial task in data mining.
1 code implementation • 6 Nov 2023 • Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, Qing Qu
To the best of our knowledge, this is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
1 code implementation • 25 Oct 2023 • Guangcong Wang, Peng Wang, Zhaoxi Chen, Wenping Wang, Chen Change Loy, Ziwei Liu
In this paper, we present PERF, a 360-degree novel view synthesis framework that trains a panoramic neural radiance field from a single panorama.
no code implementations • 24 Oct 2023 • Yinjie Lei, Zixuan Wang, Feng Chen, Guoqing Wang, Peng Wang, Yang Yang
Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction.
1 code implementation • NeurIPS 2023 • Wei Dong, Dawei Yan, Zhijun Lin, Peng Wang
Consequently, effectively adapting large pre-trained models to downstream tasks in an efficient manner has become a prominent research area.
no code implementations • 9 Oct 2023 • Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin Mixon, Chong You, Zhihui Zhu
However, most of the existing empirical and theoretical studies in neural collapse focus on the case that the number of classes is small relative to the dimension of the feature space.
1 code implementation • 8 Oct 2023 • Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Peng Wang, Liyue Shen, Qing Qu
In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs.
no code implementations • 8 Oct 2023 • Guozheng Li, Peng Wang, Wenjun Ke
On the one hand, we analyze the drawbacks of existing RE prompts and attempt to incorporate recent prompt techniques such as chain-of-thought (CoT) to improve zero-shot RE.
no code implementations • 4 Oct 2023 • Jianglong Ye, Peng Wang, Kejie Li, Yichun Shi, Heng Wang
Specifically, we decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions.
no code implementations • 4 Oct 2023 • Lingru Zhou, Yiqi Gao, Manqing Zhang, Peng Wu, Peng Wang, Yanning Zhang
To address this challenge, we construct a human-centric video surveillance captioning dataset, which provides detailed descriptions of the dynamic behaviors of 7,820 individuals.
1 code implementation • 4 Oct 2023 • Moyang Li, Peng Wang, Lingzhe Zhao, Bangyan Liao, Peidong Liu
USB-NeRF is able to correct rolling shutter distortions and recover accurate camera motion trajectory simultaneously under the framework of NeRF, by modeling the physical image formation process of a RS camera.
no code implementations • 3 Oct 2023 • Xueqing Deng, Qi Fan, Xiaojie Jin, Linjie Yang, Peng Wang
Specifically, SFA consists of external adapters and internal adapters which are sequentially operated over a transformer model.
no code implementations • 30 Sep 2023 • Yuze He, Peng Wang, Yubin Hu, Wang Zhao, Ran Yi, Yong-Jin Liu, Wenping Wang
In this paper, we explore the potential of MPI and show that MPI can synthesize high-quality novel views of complex scenes with diverse camera distributions and view directions, which are not only limited to simple forward-facing scenes.
2 code implementations • 28 Sep 2023 • Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.
Ranked #3 on Multi-Label Text Classification on CC3M-TagMask
no code implementations • 14 Sep 2023 • Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen
Despite advancements of end-to-end (E2E) models in speech recognition, named entity recognition (NER) is still challenging but critical for semantic understanding.
no code implementations • CVPR 2023 • Wei Suo, Mengyang Sun, Weisong Liu, Yiqi Gao, Peng Wang, Yanning Zhang, Qi Wu
VQA Natural Language Explanation (VQA-NLE) task aims to explain the decision-making process of VQA models in natural language.
4 code implementations • 31 Aug 2023 • Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, Xiao Yang
We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt.
1 code implementation • 31 Aug 2023 • Shuai Bai, Shusheng Yang, Jinze Bai, Peng Wang, Xingxuan Zhang, Junyang Lin, Xinggang Wang, Chang Zhou, Jingren Zhou
Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting visual receptor with large language models (LLMs).
1 code implementation • ICCV 2023 • Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao
The diversity in length constitutes a significant characteristic of text.
1 code implementation • 24 Aug 2023 • Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang
In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images of 260,559 annotated bounding boxes for 2,644 identities appearing in both of the UAVs and ground surveillance cameras.
2 code implementations • 24 Aug 2023 • Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images.
Ranked #2 on Spatial Reasoning on EmbSpatial-Bench
1 code implementation • 22 Aug 2023 • Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang
With the benefit of the dual branch, VadCLIP achieves both coarse-grained and fine-grained video anomaly detection by transferring pre-trained knowledge from CLIP to the WSVAD task.
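A rough sketch of a dual-branch head of this kind, operating on frozen CLIP frame features (a simplification for illustration, not VadCLIP's actual architecture):

```python
import torch
import torch.nn as nn

class DualBranchHead(nn.Module):
    """Coarse branch scores the whole clip; fine branch scores each frame.
    Both operate on frozen CLIP frame features."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.fine = nn.Linear(dim, 1)    # per-frame anomaly logit
        self.coarse = nn.Linear(dim, 1)  # clip-level anomaly logit

    def forward(self, frame_feats):      # (T, dim) CLIP features
        fine = torch.sigmoid(self.fine(frame_feats)).squeeze(-1)  # (T,)
        coarse = torch.sigmoid(self.coarse(frame_feats.mean(0)))  # (1,)
        return coarse, fine
```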
no code implementations • 20 Aug 2023 • Jie Zeng, Zeyu Han, Xingchen Peng, Jianghong Xiao, Peng Wang, Yan Wang
Recently, deep learning (DL) has significantly automated and accelerated clinical radiation therapy (RT) planning by predicting accurate dose maps.
1 code implementation • 20 Aug 2023 • Zeyu Han, YuHan Wang, Luping Zhou, Peng Wang, Binyu Yan, Jiliu Zhou, Yan Wang, Dinggang Shen
To obtain high-quality positron emission tomography (PET) scans while reducing radiation exposure to the human body, various approaches have been proposed to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images.
no code implementations • 16 Aug 2023 • Guangyuan Ma, Xing Wu, Peng Wang, Zijia Lin, Songlin Hu
Concretely, we leverage the capabilities of LLMs for document expansion, i.e., query generation, and effectively transfer expanded knowledge to retrievers using pre-training strategies tailored for passage retrieval.
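A minimal sketch of LLM-based document expansion via query generation, using the Hugging Face `pipeline` API; the model choice and prompt here are stand-ins for illustration, not the paper's setup:

```python
from transformers import pipeline

# Stand-in model: any doc2query-style or instruction-following
# seq2seq model can play the role of the expansion LLM.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

def expand_document(passage: str, n_queries: int = 3):
    """Generate pseudo-queries for a passage; appending them to the
    passage is the usual document-expansion trick for retrieval."""
    prompt = f"Write a search query that this passage answers: {passage}"
    outs = generator(prompt, num_return_sequences=n_queries,
                     do_sample=True, max_new_tokens=32)
    return [o["generated_text"] for o in outs]
```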
3 code implementations • 14 Aug 2023 • Peng Wang, Ningyu Zhang, Bozhong Tian, Zekun Xi, Yunzhi Yao, Ziwen Xu, Mengru Wang, Shengyu Mao, Xiaohan Wang, Siyuan Cheng, Kangwei Liu, Yuansheng Ni, Guozhou Zheng, Huajun Chen
Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to outdated/noisy data.
1 code implementation • ICCV 2023 • Shubo Liu, Hongsheng Zhang, Yuankai Qi, Peng Wang, Yaning Zhang, Qi Wu
Navigating in the sky is more complicated than on the ground because agents need to consider the flying height and more complex spatial relationship reasoning.
no code implementations • 10 Aug 2023 • Feng Wang, Giovanni Geraci, Lingxiang Li, Peng Wang, Tony Q. S. Quek
In this paper, we introduce a novel approach to optimize wireless edge content placement using NTN, positioning NTN as a complement to TN for achieving optimal content broadcasting.
no code implementations • 10 Aug 2023 • Jiaqi Cui, Pinxian Zeng, Xinyi Zeng, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang, Dinggang Shen
Specifically, the TriDo-Former consists of two cascaded networks, i.e., a sinogram enhancement transformer (SE-Former) for denoising the input LPET sinograms and a spatial-spectral reconstruction transformer (SSR-Former) for reconstructing SPET images from the denoised sinograms.
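The cascade pattern itself is simple to express; a minimal sketch under the assumption that both stages are supervised end to end (module names are hypothetical, not the paper's code):

```python
import torch.nn as nn

class CascadedPET(nn.Module):
    """Two cascaded stages: a sinogram-domain denoiser followed by an
    image-domain reconstructor."""
    def __init__(self, denoiser: nn.Module, reconstructor: nn.Module):
        super().__init__()
        self.denoiser = denoiser            # LPET sinogram -> clean sinogram
        self.reconstructor = reconstructor  # clean sinogram -> SPET image

    def forward(self, lpet_sinogram):
        clean = self.denoiser(lpet_sinogram)
        image = self.reconstructor(clean)
        return clean, image                 # supervise both outputs
```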
no code implementations • 3 Aug 2023 • Peng Wang, Fanwei Zeng, Yuntao Qian
Spatio-temporal action detection (STAD) aims to classify the actions present in a video and localize them in space and time.
2 code implementations • 25 Jul 2023 • Cheng Da, Peng Wang, Cong Yao
Specifically, MGP-STR achieves an average recognition accuracy of $94\%$ on standard benchmarks for scene text recognition.
1 code implementation • 24 Jul 2023 • Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang
In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos by cross-modalities, e.g., language descriptions and synchronous audios.
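At retrieval time, such a cross-modal setup typically reduces to nearest-neighbor search in a shared embedding space; a minimal sketch, assuming query and video embeddings come from a shared cross-modal encoder:

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb, video_embs, k=5):
    """Rank gallery videos by cosine similarity to a query embedding
    (a text description or synchronous audio)."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), video_embs, dim=-1)
    return torch.topk(sims, k=min(k, len(video_embs))).indices
```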
no code implementations • 20 Jul 2023 • Yinghui Xing, Dexuan Kong, Shizhou Zhang, Geng Chen, Lingyan Ran, Peng Wang, Yanning Zhang
Camouflaged object detection (COD), which aims to segment camouflaged objects exhibiting patterns similar to the background, is a challenging task.
1 code implementation • 19 Jul 2023 • Feiran Hu, Peng Wang, Yangyang Li, Chenlong Duan, Zijian Zhu, Fei Wang, Faen Zhang, Yong Li, Xiu-Shen Wei
The SnakeCLEF2023 competition aims at the development of advanced algorithms for snake species identification through the analysis of images and accompanying metadata.
no code implementations • 19 Jul 2023 • Zhenghao Feng, Lu Wen, Peng Wang, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang
To alleviate this limitation, we innovatively introduce a diffusion-based dose prediction (DiffDP) model for predicting the radiotherapy dose distribution of cancer patients.
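For reference, diffusion-based predictors of this kind build on the standard DDPM forward-noising step; a minimal sketch of that step (the generic formulation, not necessarily the paper's specific variant):

```python
import torch

def q_sample(x0, t, alphas_cumprod):
    """DDPM forward noising: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps.
    t: (B,) integer timesteps; alphas_cumprod: (T,) cumulative products."""
    abar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps, eps
```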
no code implementations • 19 Jul 2023 • Ye Ouyang, Yaqin Zhang, Peng Wang, Yunxin Liu, Wen Qiao, Jun Zhu, Yang Liu, Feng Zhang, Shuling Wang, Xidong Wang
6G is the next-generation intelligent and integrated digital information infrastructure, characterized by ubiquitous interconnection, native intelligence, multi-dimensional perception, global coverage, green and low-carbon operation, native network security, etc.
no code implementations • 14 Jul 2023 • Xiaorui Zhu, Yichen Qin, Peng Wang
A critical question remains unsettled: is it possible, and if so how, to embed the inference of the model into the simultaneous inference of the coefficients?
no code implementations • 10 Jul 2023 • Youquan Xian, Xiaoyun Gan, Chuanjian Yao, Dongcheng Li, Peng Wang, Peng Liu, Ying Zhao
Federated Learning (FL), as a privacy-preserving machine learning paradigm, trains a global model across devices without exposing local data.
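A minimal sketch of one FedAvg-style round, the canonical FL procedure in which only model weights, never raw data, leave each device (uniform averaging shown for brevity; FedAvg proper weights clients by data size):

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, client_loaders, local_steps=1, lr=0.01):
    """Each client trains a private copy of the global model on its local
    data; only the resulting weights are aggregated on the server."""
    states = []
    for loader in client_loaders:
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        model.train()
        for _ in range(local_steps):
            for x, y in loader:
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
        states.append(model.state_dict())
    avg = {k: torch.stack([s[k].float() for s in states]).mean(dim=0)
           for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```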
1 code implementation • NeurIPS 2023 • Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama or multi-view images given depth maps and poses).
no code implementations • 27 Jun 2023 • Zhaohui Wei, Zhao Zhou, Peng Wang, Jian Ren, Yingzeng Yin, Gert Frølund Pedersen, Ming Shen
In this study, we propose a deep learning-assisted, image-based intelligent modeling approach for accelerating the data acquisition of antenna samples with different physical structures.
no code implementations • 8 Jun 2023 • Yuling Xi, Hao Chen, Ning Wang, Peng Wang, Yanning Zhang, Chunhua Shen, Yifan Liu
In particular, one feature merge branch is designed for instance-level recognition, while the other is designed for dense predictions.
1 code implementation • 1 Jun 2023 • Can Yaras, Peng Wang, Wei Hu, Zhihui Zhu, Laura Balzano, Qing Qu
Second, it allows us to better understand deep representation learning by elucidating the linear progressive separation and concentration of representations from shallow to deep layers.
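One common way to quantify this layer-by-layer separation and concentration is the ratio of within-class to between-class scatter of the features; a minimal NumPy sketch of one such metric (an illustration, not necessarily the measure used in the paper):

```python
import numpy as np

def scatter_ratio(feats, labels):
    """Within-class vs. between-class scatter of layer features:
    smaller values mean classes are more separated and concentrated."""
    mu_g = feats.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        fc = feats[labels == c]
        mu_c = fc.mean(axis=0)
        within += ((fc - mu_c) ** 2).sum()
        between += len(fc) * ((mu_c - mu_g) ** 2).sum()
    return within / between
```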
1 code implementation • 1 Jun 2023 • Shengqin Jiang, Yaoyu Fang, Haokui Zhang, Qingshan Liu, Yuankai Qi, Yang Yang, Peng Wang
Rehearsal-based video incremental learning often employs knowledge distillation to mitigate catastrophic forgetting of previously learned data.
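The distillation term typically used in such rehearsal setups is a temperature-scaled KL divergence between the current model and a frozen snapshot of the old model, evaluated on memory samples; a minimal sketch:

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL between the current model (student) and a frozen snapshot
    (teacher) on rehearsal-memory samples; T softens both distributions."""
    log_p = F.log_softmax(student_logits / T, dim=-1)
    q = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)
```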
1 code implementation • CVPR 2023 • Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen
Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations.
Ranked #1 on Compositional Zero-Shot Learning on MIT-States
no code implementations • 29 May 2023 • Lirui Xu, Pang Wu, Pan Xia, Fanglin Geng, Peng Wang, Xianxiang Chen, Zhenfeng Li, Lidong Du, Shuping Liu, Li Li, Hongbo Chang, Zhen Fang
In in vitro cardiovascular phantom experiments, the results demonstrated high accuracy in the measurement of PP (error < 3 mmHg) and the blood pressure waveform (root-mean-square error (RMSE) < 2 mmHg, correlation coefficient (r) > 0.99).
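For reference, the two reported waveform metrics can be computed as follows (a minimal sketch):

```python
import numpy as np

def rmse(pred, ref):
    """Root-mean-square error between predicted and reference waveforms."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def pearson_r(pred, ref):
    """Pearson correlation coefficient between the two waveforms."""
    return float(np.corrcoef(pred, ref)[0, 1])
```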
1 code implementation • 27 May 2023 • YuAn Liu, Peng Wang, Cheng Lin, Xiaoxiao Long, Jiepeng Wang, Lingjie Liu, Taku Komura, Wenping Wang
We present a neural rendering-based method called NeRO for reconstructing the geometry and the BRDF of reflective objects from multiview images captured in an unknown environment.
no code implementations • CVPR 2023 • Congqi Cao, Yue Lu, Peng Wang, Yanning Zhang
At present, it is the largest semi-supervised VAD dataset, with the largest number of scenes and anomaly classes and the longest duration, and it is the only one that considers scene-dependent anomalies.
5 code implementations • 22 May 2023 • Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang
Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.