Search Results for author: Yan Wang

Found 430 papers, 161 papers with code

zydhjh4593@SMM4H’22: A Generic Pre-trained BERT-based Framework for Social Media Health Text Classification

no code implementations SMM4H (COLING) 2022 Chenghao Huang, Xiaolu Chen, Yuxi Chen, Yutong Wu, Weimin Yuan, Yan Wang, Yanru Zhang

This paper describes our proposed framework for the 10 text classification tasks of Task 1a, 2a, 2b, 3a, 4, 5, 6, 7, 8, and 9, in the Social Media Mining for Health (SMM4H) 2022.

text-classification Text Classification

Enabling Deep Residual Networks for Weakly Supervised Object Detection

no code implementations ECCV 2020 Yunhang Shen, Rongrong Ji, Yan Wang, Zhiwei Chen, Feng Zheng, Feiyue Huang, Yunsheng Wu

Weakly supervised object detection (WSOD) has attracted extensive research attention due to its great flexibility of exploiting large-scale image-level annotation for detector training.

Object object-detection +1

NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

no code implementations 17 Apr 2025 Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, Zhengzhong Tu, Yufan Liu, Xiangguang Chen, Zuowei Cao, Minhao Tang, Shan Liu, Kexin Zhang, Jingfen Xie, Yan Wang, Kai Chen, Shijie Zhao, Yunchen Zhang, Xiangkai Xu, Hong Gao, Ji Shi, Yiming Bao, Xiugang Dong, Xiangsheng Zhou, Yaofeng Tu, Ying Liang, Yiwen Wang, Xinning Chai, Yuxuan Zhang, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Rong Xie, Li Song, Wei Sun, Kang Fu, Linhan Cao, Dandan Zhu, Kaiwei Zhang, Yucheng Zhu, ZiCheng Zhang, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Zhi Jin, Jiawei Wu, Wei Wang, Wenjian Zhang, Yuhai Lan, Gaoxiong Yi, Hengyuan Na, Wang Luo, Di wu, MingYin Bai, Jiawang Du, Zilong Lu, Zhenyu Jiang, Hui Zeng, Ziguan Cui, Zongliang Gan, Guijin Tang, Xinglin Xie, Kehuan Song, Xiaoqiang Lu, Licheng Jiao, Fang Liu, Xu Liu, Puhua Chen, Ha Thu Nguyen, Katrien De Moor, Seyed Ali Amirshahi, Mohamed-Chaker Larabi, Qi Tang, Linfeng He, Zhiyong Gao, Zixuan Gao, Guohua Zhang, Zhiye Huang, Yi Deng, Qingmiao Jiang, Lu Chen, Yi Yang, Xi Liao, Nourine Mohammed Nadir, YuXuan Jiang, Qiang Zhu, Siyue Teng, Fan Zhang, Shuyuan Zhu, Bing Zeng, David Bull, Meiqin Liu, Chao Yao, Yao Zhao

This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement.

Safe Data-Driven Predictive Control

no code implementations 11 Apr 2025 Amin Vahidi-Moghaddam, Kaian Chen, Kaixiang Zhang, Zhaojian Li, Yan Wang, Kai Wu

In the realm of control systems, model predictive control (MPC) has exhibited remarkable potential; however, its reliance on accurate models and substantial computational resources has hindered its broader application, especially within real-time nonlinear systems.

Model Predictive Control

AVP-AP: Self-supervised Automatic View Positioning in 3D cardiac CT via Atlas Prompting

no code implementations 8 Apr 2025 Xiaolin Fan, Yan Wang, Yingying Zhang, Mingkun Bao, Bosen Jia, Dong Lu, Yifan Gu, Jian Cheng, Haogang Zhu

We thus introduce a novel framework, AVP-AP, the first to use Atlas Prompting for self-supervised Automatic View Positioning in the 3D CT volume.

Computed Tomography (CT) SSIM

Can Test-Time Scaling Improve World Foundation Model?

1 code implementation 31 Mar 2025 Wenyan Cong, Hanqing Zhu, Peihao Wang, Bangya Liu, Dejia Xu, Kevin Wang, David Z. Pan, Yan Wang, Zhiwen Fan, Zhangyang Wang

Our findings reveal that test-time scaling laws hold for WFMs and that SWIFT provides a scalable and effective pathway for improving WFM inference without retraining or increasing model size.

Autonomous Driving

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

no code implementations 28 Mar 2025 Jiangyong Huang, Baoxiong Jia, Yan Wang, Ziyu Zhu, Xiongkun Linghu, Qing Li, Song-Chun Zhu, Siyuan Huang

Our evaluation of state-of-the-art 3D-VL models on Beacon3D reveals that (i) object-centric evaluation elicits true model performance and particularly weak generalization in QA; (ii) grounding-QA coherence remains fragile in current 3D-VL models, and (iii) incorporating large language models (LLMs) to 3D-VL models, though as a prevalent practice, hinders grounding capabilities and has yet to elevate QA capabilities.

Question Answering

Fundamental Limit of Angular Resolution in Partly Calibrated Arrays with Position Errors

no code implementations 27 Mar 2025 Guangbin Zhang, Yan Wang, Tianyao Huang, Yonina C. Eldar

We then theoretically analyze the declining and plateau phases of CRB, and explain that the turning point of CRB in partly calibrated arrays is close to the angular resolution limit of distributed arrays without errors, demonstrating high resolution ability.

Position

SIT-FER: Integration of Semantic-, Instance-, Text-level Information for Semi-supervised Facial Expression Recognition

1 code implementation 24 Mar 2025 Sixian Ding, Xu Jiang, Zhongjing Du, Jiaqi Cui, Xinyi Zeng, Yan Wang

Specifically, for the unlabeled data, considering the comprehensive knowledge within the textual descriptions and instance representations, we respectively calculate the similarities between the facial vision features and the corresponding textual and instance features to obtain the probabilities at the text- and instance-level.

Facial Expression Recognition

MambaIC: State Space Models for High-Performance Learned Image Compression

1 code implementation 16 Mar 2025 Fanhu Zeng, Hao Tang, Yihua Shao, Siyu Chen, Ling Shao, Yan Wang

Inspired by the effectiveness of state space models (SSMs) in capturing long-range dependencies, we leverage SSMs to address computational inefficiency in existing methods and improve image compression from multiple perspectives.

Image Compression State Space Models

GenDR: Lightning Generative Detail Restorator

no code implementations 9 Mar 2025 Yan Wang, Shijie Zhao, Kai Chen, Kexin Zhang, Junlin Li, Li Zhang

In detail, we train a new SD2.1-VAE16 (0.9B) via representation alignment to expand latent space without enlarging the model size.

Super-Resolution

TR-DQ: Time-Rotation Diffusion Quantization

no code implementations 9 Mar 2025 Yihua Shao, Deyang Lin, Fanhu Zeng, Minxi Yan, Muyang Zhang, Siyu Chen, Yuxuan Fan, Ziyang Yan, Haozhe Wang, Jingcai Guo, Yan Wang, Haotong Qin, Hao Tang

TR-DQ achieves state-of-the-art (SOTA) performance on image generation and video generation tasks and a 1.38-1.89x speedup and 1.97-2.58x memory reduction in inference compared to existing quantization methods.

Image Generation Quantization +1

OrdRankBen: A Novel Ranking Benchmark for Ordinal Relevance in NLP

no code implementations 2 Mar 2025 Yan Wang, Lingfei Qian, Xueqing Peng, Jimin Huang, Dongji Feng

The evaluation of ranking tasks remains a significant challenge in natural language processing (NLP), particularly due to the lack of direct labels for results in real-world scenarios.

VidLBEval: Benchmarking and Mitigating Language Bias in Video-Involved LVLMs

no code implementations 23 Feb 2025 Yiming Yang, Yangyang Guo, Hui Lu, Yan Wang

Recently, Large Vision-Language Models (LVLMs) have made significant strides across diverse multimodal tasks and benchmarks.

Benchmarking

Constructing a Norm for Children's Scientific Drawing: Distribution Features Based on Semantic Similarity of Large Language Models

no code implementations 21 Feb 2025 Yi Zhang, Fan Wei, Jingyi Li, Yan Wang, Yanyan Yu, Jianli Chen, Zipo Cai, Xinyu Liu, Wei Wang, Peng Wang, Zhong Wang

The use of children's drawings to examine their conceptual understanding has been proven to be an effective method, but there are two major problems with previous research: 1.

Large Language Model Semantic Similarity +1

A Macro- and Micro-Hierarchical Transfer Learning Framework for Cross-Domain Fake News Detection

no code implementations 20 Feb 2025 Xuankai Yang, Yan Wang, Xiuzhen Zhang, Shoujin Wang, Huaxiong Wang, Kwok Yan Lam

Secondly, we propose a macro-hierarchical transfer learning module to generate engagement features based on common users' shared behaviors in different domains for improving effectiveness of knowledge transfer.

Fake News Detection Transfer Learning

Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance

1 code implementation 12 Feb 2025 Lingfei Qian, Weipeng Zhou, Yan Wang, Xueqing Peng, Han Yi, Jimin Huang, Qianqian Xie, Jianyun Nie

While large language models (LLMs) have shown strong general reasoning capabilities, their effectiveness in financial reasoning, which is crucial for real-world financial applications, remains underexplored.

Benchmarking Long-Context Understanding

Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving

1 code implementation 11 Feb 2025 Xiang Li, Pengfei Li, Yupeng Zheng, Wei Sun, Yan Wang, Yilun Chen

Considering the high annotation cost for 3D outdoor scenes, we propose a semi-supervised vision-centric 3D occupancy world model, PreWorld, to leverage the potential of 2D labels through a novel two-stage training paradigm: the self-supervised pre-training stage and the fully-supervised fine-tuning stage.

Attribute Autonomous Driving +1

A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems

no code implementations 10 Feb 2025 Linxiao Gong, Hao Yang, Gaoyun Fang, Bobo Ju, Juncen Guo, Xiaoguang Zhu, Xiping Hu, Yan Wang, Peng Sun, Azzedine Boukerche

The explosive growth of video data has driven the development of distributed video analytics in cloud-edge-terminal collaborative (CETC) systems, enabling efficient video processing, real-time inference, and privacy-preserving analysis.

Autonomous Driving Edge-computing +3

A Comprehensive Review of Protein Language Models

1 code implementation 8 Feb 2025 Lei Wang, Xudong Li, Han Zhang, Jinyi Wang, Dingkang Jiang, Zhidong Xue, Yan Wang

At the intersection of the rapidly growing biological data landscape and advancements in Natural Language Processing (NLP), protein language models (PLMs) have emerged as a transformative force in modern research.

Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior

1 code implementation 31 Jan 2025 Tongda Xu, Xiyan Cai, Xinjie Zhang, Xingtong Ge, Dailan He, Ming Sun, Jingjing Liu, Ya-Qin Zhang, Jian Li, Yan Wang

Recent advancements in diffusion models have been leveraged to address inverse problems without additional training, and Diffusion Posterior Sampling (DPS) (Chung et al., 2022a) is among the most popular approaches.

IROAM: Improving Roadside Monocular 3D Object Detection Learning from Autonomous Vehicle Data Domain

no code implementations 30 Jan 2025 Zhe Wang, Xiaoliang Huo, Siqi Fan, Jingjing Liu, Ya-Qin Zhang, Yan Wang

The In-Domain Query Interaction module utilizes a transformer to learn content and depth information for each domain and outputs object queries.

Autonomous Driving Contrastive Learning +2

In-Context Meta LoRA Generation

no code implementations 29 Jan 2025 Yihua Shao, Minxi Yan, Yang Liu, Siyu Chen, Wenjie Chen, Xinwei Long, Ziyang Yan, Lei LI, Chenyu Zhang, Nicu Sebe, Hao Tang, Yan Wang, Hao Zhao, Mengzhu Wang, Jingcai Guo

As a result, our method achieves more accurate LoRA parameter generation for diverse tasks using CVAE.

Meta-Learning

A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection

no code implementations 10 Jan 2025 Tsui Qin Mok, Shuyong Gao, Haozhe Xing, Miaoyang He, Yan Wang, Wenqiang Zhang

Weakly-Supervised Camouflaged Object Detection (WSCOD) has gained popularity for its promise to train models with weak labels to segment objects that visually blend into their surroundings.

object-detection Object Detection

BASIC: Semi-supervised Multi-organ Segmentation with Balanced Subclass Regularization and Semantic-conflict Penalty

no code implementations 7 Jan 2025 Zhenghao Feng, Lu Wen, Yuanyuan Xu, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang

Additionally, based on a mean teacher framework, we elaborately design a balanced subclass regularization to utilize the teacher predictions of SCS task to supervise the student predictions of MoS task, thus effectively transferring unbiased knowledge to the MoS subnetwork and alleviating the influence of the class-imbalance problem.

Multi-Task Learning Organ Segmentation

Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection

no code implementations 4 Jan 2025 Yachao Zhao, Bo wang, Yan Wang

Through extensive experiments on state-of-the-art LLMs across multiple social dimensions, we demonstrate that LLMs exhibit a substantial inconsistency between explicit and implicit biases, where explicit biases manifest as mild stereotypes while implicit biases show strong stereotypes.

DreamDrive: Generative 4D Scene Modeling from Street View Images

no code implementations 31 Dec 2024 Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, Yurong You, Chaowei Xiao, Danfei Xu, Marco Pavone, Yue Wang

In this paper, we present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction, to synthesize generalizable 4D driving scenes and dynamic driving videos with 3D consistency.

Autonomous Driving Neural Rendering +2

P3S-Diffusion: A Selective Subject-driven Generation Framework via Point Supervision

no code implementations 27 Dec 2024 Junjie Hu, Shuyong Gao, Lingyi Hong, Qishan Wang, Yuzhou Zhao, Yan Wang, Wenqiang Zhang

Recent research in subject-driven generation increasingly emphasizes the importance of selective subject features.

Image Generation

Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering

no code implementations 19 Dec 2024 Peize Li, Qingyi Si, Peng Fu, Zheng Lin, Yan Wang

The retrieval-based multi-image question answering (QA) task involves retrieving multiple question-related images and synthesizing these images to generate an answer.

Contrastive Learning Language Modeling +6

Extrapolated Urban View Synthesis Benchmark

1 code implementation 6 Dec 2024 Xiangyu Han, Zhen Jia, Boyi Li, Yan Wang, Boris Ivanovic, Yurong You, Lingjie Liu, Yue Wang, Marco Pavone, Chen Feng, Yiming Li

Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs).

Autonomous Vehicles Novel View Synthesis

Progressive Vision-Language Prompt for Multi-Organ Multi-Class Cell Semantic Segmentation with Single Branch

no code implementations 4 Dec 2024 Qing Zhang, Hang Guo, Siyuan Yang, Qingli Li, Yan Wang

Given that multiple cell types exist across various organs, with subtle differences in cell size and shape, multi-organ, multi-class cell segmentation is particularly challenging.

Cell Segmentation Segmentation +1

DIVD: Deblurring with Improved Video Diffusion Model

no code implementations 1 Dec 2024 Haoyang Long, Yan Wang, Wendong Wang

However, due to the computational complexity and challenges inherent in adapting diffusion models, there is still uncertainty regarding the potential of video diffusion models in video deblurring tasks.

Deblurring model +2

Semantic Data Augmentation for Long-tailed Facial Expression Recognition

no code implementations 26 Nov 2024 Zijian Li, Yan Wang, Bowen Guan, JianKai Yin

Then, for facial expression recognition on the RAF-DB dataset, we use our augmentation method to balance the long-tailed distribution.

Data Augmentation Facial Expression Recognition

Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data

no code implementations 23 Nov 2024 Rui Huang, Henry Zheng, Yan Wang, Zhuofan Xia, Marco Pavone, Gao Huang

However, training 3D models with labels directly derived from pseudo-LiDAR is inadequate due to imprecise boxes estimated from noisy point clouds and severely occluded objects.

Autonomous Driving Monocular 3D Object Detection +1

Global Challenge for Safe and Secure LLMs Track 1

no code implementations 21 Nov 2024 Xiaojun Jia, Yihao Huang, Yang Liu, Peng Yan Tan, Weng Kuan Yau, Mun-Thye Mak, Xin Ming Sim, Wee Siong Ng, See Kiong Ng, Hanqing Liu, Lifeng Zhou, Huanqian Yan, Xiaobing Sun, Wei Liu, Long Wang, Yiming Qian, Yong liu, Junxiao Yang, Zhexin Zhang, Leqi Lei, Renmiao Chen, Yida Lu, Shiyao Cui, Zizhou Wang, Shaohua Li, Yan Wang, Rick Siow Mong Goh, Liangli Zhen, Yingjie Zhang, Zhe Zhao

This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks.

Misinformation

Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration

no code implementations 30 Oct 2024 Yanchu Guan, Dong Wang, Yan Wang, Haiqing Wang, Renen Sun, Chenyi Zhuang, Jinjie Gu, Zhixuan Chu

In this paper, we propose an Explainable Behavior Cloning LLM Agent (EBC-LLMAgent), a novel approach that combines large language models (LLMs) with behavior cloning by learning demonstrations to create intelligent and explainable agents for autonomous mobile app interaction.

Code Generation Language Modeling +3

A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Anomaly Detection

1 code implementation 29 Oct 2024 Yuxuan Lin, Yang Chang, Xuan Tong, Jiawen Yu, Antonio Liotta, Guofan Huang, Wei Song, Deyu Zeng, Zongze Wu, Yan Wang, Wenqiang Zhang

We focus on 3D UIAD and multimodal UIAD, providing a comprehensive summary of unsupervised industrial anomaly detection in three modal settings.

Anomaly Detection

Enhanced channel estimation for near-field IRS-aided multi-user MIMO system via deep residual network

no code implementations 28 Oct 2024 Yan Wang, Yongqiang Li, Minghao Chen, Yu Yao, Feng Shu, Jiangzhou Wang

In this paper, channel estimation (CE) of intelligent reflecting surface aided near-field (NF) multi-user communication is investigated.

Denoising Federated Learning

Label Filling via Mixed Supervision for Medical Image Segmentation from Noisy Annotations

no code implementations 21 Oct 2024 Ming Li, Wei Shen, Qingli Li, Yan Wang

The fundamental idea of label filling is to supervise the segmentation model by a subset of pixels with trustworthy labels, meanwhile filling labels of other pixels by mixed supervision.

Image Segmentation Lesion Segmentation +2

Instruction-Driven Game Engine: A Poker Case Study

no code implementations 17 Oct 2024 Hongqiu Wu, XingYuan Liu, Yan Wang, Hai Zhao

The IDGE allows users to create games simply by natural language instructions, which significantly lowers the barrier for game development.

Diversity Language Modeling +2

MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes

no code implementations 17 Oct 2024 Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, Jun Zhang

Furthermore, we introduce an entropy-constrained Gaussian deformation technique that uses a deformation field to expand the action range of each Gaussian and integrates an opacity-based entropy loss to limit the number of Gaussians, thus forcing our model to use as few Gaussians as possible to fit a dynamic scene well.

Attribute

CrossQuant: A Post-Training Quantization Method with Smaller Quantization Kernel for Precise Large Language Model Compression

no code implementations 10 Oct 2024 Wenyuan Liu, Xindian Ma, Peng Zhang, Yan Wang

Through quantitative analysis of the quantization kernel, we find that these elements are crucial for maintaining the accuracy of quantized LLMs.

Language Modeling Language Modelling +3

FDDM: Frequency-Decomposed Diffusion Model for Rectum Cancer Dose Prediction in Radiotherapy

no code implementations 10 Oct 2024 Xin Liao, Zhenghao Feng, Jianghong Xiao, Xingchen Peng, Yan Wang

To be specific, we design a Coarse Dose Prediction Module (CDPM) to first predict a coarse dose map and then utilize discrete wavelet transform to decompose the coarse dose map into a low-frequency subband and three high-frequency subbands.

Prediction

Performance Analysis of Local Partial MMSE Precoding Based User-Centric Cell-Free Massive MIMO Systems and Deployment Optimization

no code implementations 8 Oct 2024 Peng Jiang, Jiafei Fu, Pengcheng Zhu, Yan Wang, Jiangzhou Wang, Xiaohu You

Cell-free massive multiple-input multiple-output (MIMO) systems, leveraging tight cooperation among wireless access points, exhibit remarkable signal enhancement and interference suppression capabilities, demonstrating significant performance advantages over traditional cellular networks.

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models

1 code implementation 1 Oct 2024 Shitian Zhao, Renrui Zhang, Xu Luo, Yan Wang, Shanghang Zhang, Peng Gao

In this framework, people can propose new basic composition methods and combine them to obtain new mixed composition methods.

Question Answering Visual Question Answering

PlainUSR: Chasing Faster ConvNet for Efficient Super-Resolution

1 code implementation 20 Sep 2024 Yan Wang, Yusen Li, Gang Wang, Xiaoguang Liu

In particular, compared to recent NGswin, the PlainUSR-L is 16.4x faster with competitive performance.

Super-Resolution

RenderWorld: World Model with Self-Supervised 3D Label

no code implementations 17 Sep 2024 Ziyang Yan, Wenzhen Dong, Yihua Shao, Yuhang Lu, Liu Haiyang, Jingwen Liu, Haozhe Wang, Zhe Wang, Yan Wang, Fabio Remondino, Yuexin Ma

End-to-end autonomous driving with vision-only is not only more cost-effective compared to LiDAR-vision fusion but also more reliable than traditional methods.

Autonomous Driving model +2

KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models

no code implementations 17 Sep 2024 Bo Lv, Quan Zhou, Xuanang Ding, Yan Wang, Zeming Ma

The bottleneck associated with the key-value (KV) cache presents a significant challenge during the inference processes of large language models.

Block-Attention for Efficient RAG

1 code implementation 14 Sep 2024 East Sun, Yan Wang, Lan Tian

In RAG scenarios, by defining each passage as a block, Block-Attention enables us to reuse the KV states of passages that have been seen before, thereby significantly reducing the latency and the computation overhead during inference.

RAG

ProteinBench: A Holistic Evaluation of Protein Foundation Models

no code implementations 10 Sep 2024 Fei Ye, Zaixiang Zheng, Dongyu Xue, Yuning Shen, Lihao Wang, Yiming Ma, Yan Wang, Xinyou Wang, Xiangxin Zhou, Quanquan Gu

Recent years have witnessed a surge in the development of protein foundation models, significantly improving performance in protein prediction and generative tasks ranging from 3D structure prediction and protein design to conformational dynamics.

Protein Design

RevSAM2: Prompt SAM2 for Medical Image Segmentation via Reverse-Propagation without Fine-tuning

1 code implementation 6 Sep 2024 Yunhao Bai, Boxiang Yun, Zeli Chen, Qinji Yu, Yingda Xia, Yan Wang

Specifically, to segment a 3D query volume using a limited number of support image-label pairs that define a new segmentation task, we propose a reverse-propagation strategy as a query information selection mechanism.

Image Segmentation Medical Image Segmentation +2

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

1 code implementation 31 Aug 2024 Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi

Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences.

8k

BTMuda: A Bi-level Multi-source unsupervised domain adaptation framework for breast cancer diagnosis

no code implementations 30 Aug 2024 Yuxiang Yang, Xinyi Zeng, Pinxian Zeng, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang

To address these limitations, unsupervised domain adaptation (UDA) methods have been used to transfer knowledge from one labeled source domain to the unlabeled target domain, yet these approaches suffer from severe domain shift issues and often ignore the potential benefits of leveraging multiple relevant sources in practical applications.

Multi-Source Unsupervised Domain Adaptation Unsupervised Domain Adaptation

CogVLM2: Visual Language Models for Image and Video Understanding

3 code implementations 29 Aug 2024 Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications.

MM-Vet MVBench +3

A Survey on Facial Expression Recognition of Static and Dynamic Emotions

1 code implementation 28 Aug 2024 Yan Wang, Shaoqi Yan, Yang Liu, Wei Song, Jing Liu, Yang Chang, Xinji Mai, Xiping Hu, Wenqiang Zhang, Zhongxue Gan

Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies.

cross-modal alignment Facial Expression Recognition +1

ARANet: Attention-based Residual Adversarial Network with Deep Supervision for Radiotherapy Dose Prediction of Cervical Cancer

no code implementations 26 Aug 2024 Lu Wen, Wenxia Yin, Zhenghao Feng, Xi Wu, Deng Xiong, Yan Wang

Radiation therapy is the mainstay treatment for cervical cancer, and its ultimate goal is to ensure the planning target volume (PTV) reaches the prescribed dose while reducing dose deposition of organs-at-risk (OARs) as much as possible.

Alleviating Class Imbalance in Semi-supervised Multi-organ Segmentation via Balanced Subclass Regularization

no code implementations 26 Aug 2024 Zhenghao Feng, Lu Wen, Binyu Yan, Jiaqi Cui, Yan Wang

To alleviate this issue, we present a two-phase semi-supervised network (BSR-Net) with balanced subclass regularization for MoS.

Organ Segmentation

Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype

no code implementations 19 Aug 2024 Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, Yan Wang

Despite recent progress in enhancing the efficacy of Open-Domain Continual Learning (ODCL) in Vision-Language Models (VLM), failing to (1) correctly identify the Task-ID of a test image and (2) use only the category set corresponding to the Task-ID, while preserving the knowledge related to each domain, cannot address the two primary challenges of ODCL: forgetting old knowledge and maintaining zero-shot capabilities, as well as the confusions caused by category-relatedness between domains.

Continual Learning

Open Role-Playing with Delta-Engines

1 code implementation 11 Aug 2024 Hongqiu Wu, Zekai Xu, Tianyang Xu, Shize Wei, Yan Wang, Jiale Hong, Weiqi Wu, Hai Zhao

In this paper, we propose a new style of game-play to bridge self-expression and role-playing: open role-playing games (ORPGs), where players are allowed to craft and embody their unique characters in the game world.

S3PET: Semi-supervised Standard-dose PET Image Reconstruction via Dose-aware Token Swap

no code implementations 30 Jul 2024 Jiaqi Cui, Pinxian Zeng, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Yan Wang

Our S3PET involves an unsupervised pre-training stage (Stage I) to extract representations from unpaired images, and a supervised dose-aware reconstruction stage (Stage II) to achieve LPET-to-SPET reconstruction by transferring the dose-specific knowledge between paired images.

Image Reconstruction

LoFormer: Local Frequency Transformer for Image Deblurring

2 code implementations 24 Jul 2024 Xintian Mao, Jiansheng Wang, Xingran Xie, Qingli Li, Yan Wang

Due to the computational complexity of self-attention (SA), prevalent techniques for image deblurring often resort to either adopting localized SA or employing coarse-grained global SA methods, both of which exhibit drawbacks such as compromising global modeling or lacking fine-grained correlation.

Deblurring Image Deblurring

VILA$^2$: VILA Augmented VILA

no code implementations 24 Jul 2024 Yunhao Fang, Ligeng Zhu, Yao Lu, Yan Wang, Pavlo Molchanov, Jan Kautz, Jang Hyun Cho, Marco Pavone, Song Han, Hongxu Yin

In the self-augment step, the instruction-finetuned VLM recaptions its pretraining caption datasets and then retrains from scratch leveraging refined data.

Hallucination Optical Character Recognition (OCR) +1

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

1 code implementation 23 Jul 2024 Haoran Wang, Xinji Mai, Zeng Tao, Yan Wang, Jiawen Yu, Ziheng Zhou, Xuan Tong, Shaoqi Yan, Qing Zhao, Shuyong Gao, Wenqiang Zhang

We propose a novel Emotion Forecasting (EF) task grounded in the theory that an individual's emotions are easily influenced by the emotions or other information conveyed during interactions with another person.

Benchmarking

All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

no code implementations 22 Jul 2024 Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang

In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions.

All Dynamic Facial Expression Recognition +2

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

1 code implementation 16 Jul 2024 Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu

Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge.

Language Modeling Language Modelling +3

The Oscars of AI Theater: A Survey on Role-Playing with Language Models

1 code implementation 16 Jul 2024 Nuo Chen, Yan Wang, Yang Deng, Jia Li

This survey explores the burgeoning field of role-playing with language models, focusing on their development from early persona-based models to advanced character-driven simulations facilitated by Large Language Models (LLMs).

Survey

Power Optimization and Deep Learning for Channel Estimation of Active IRS-Aided IoT

no code implementations 12 Jul 2024 Yan Wang, Feng Shu, Rongen Dong, Wei Gao, Qi Zhang, Jiajia Liu

In the second case, when the transmit power at the IoT devices is fixed, there exists an optimal reflective power at the active IRS.

Spatially-Variant Degradation Model for Dataset-free Super-resolution

no code implementations 11 Jul 2024 Shaojie Guo, Haofei Song, Qingli Li, Yan Wang

Unlike existing dataset-free BISR methods that focus on obtaining a degradation kernel for the entire image, we are the first to explicitly design a spatially-variant degradation model for each pixel.

Image Super-Resolution

VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool

no code implementations 7 Jul 2024 Yan Wang, Yawen Zeng, Jingsheng Zheng, Xiaofen Xing, Jin Xu, Xiangmin Xu

Therefore, we try to explore the collection of CoT datasets in videos to lead to video OpenQA and improve the reasoning ability of MLLMs.

Active Learning Hallucination +1

xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart

1 code implementation1 Jul 2024 Tianrun Chen, Chaotao Ding, Lanyun Zhu, Tao Xu, Deyi Ji, Yan Wang, Ying Zang, Zejian Li

With comprehensive experiments performed, this technical report highlights the potential of xLSTM-based architectures in advancing biomedical image analysis in both 2D and 3D.

3D Medical Imaging Segmentation Image Classification +5

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

1 code implementation29 Jun 2024 Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jagersand, A. Rupam Mahmood

Many real-world robot learning problems, such as pick-and-place or arriving at a destination, can be seen as a problem of reaching a goal state as soon as possible.

Informativeness reinforcement-learning +1

Prompting Whole Slide Image Based Genetic Biomarker Prediction

no code implementations26 Jun 2024 Ling Zhang, Boxiang Yun, Xingran Xie, Qingli Li, Xinxing Li, Yan Wang

Experimental results on two colorectal cancer datasets show the superiority of our method, achieving 91.49% in AUC for MSI classification.

Decision Making Prediction

D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition

no code implementations24 Jun 2024 Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Boyang Wang, Shaoqi Yan, Qing Zhao, Ziheng Zhou, Shuyong Gao, Wenqiang Zhang

The contemporary state-of-the-art of Dynamic Facial Expression Recognition (DFER) technology facilitates remarkable progress by deriving emotional mappings of facial expressions from video content, underpinned by training on voluminous datasets.

Dynamic Facial Expression Recognition Facial Expression Recognition

Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

no code implementations24 Jun 2024 Junxiong Lin, Zeng Tao, Xuan Tong, Xinji Mai, Haoran Wang, Boyang Wang, Yan Wang, Qing Zhao, Jiawen Yu, Yuxuan Lin, Shaoqi Yan, Shuyong Gao, Wenqiang Zhang

To extract Uncertainty-based Degradation Representation from LR images, the AUDE utilizes the Self-supervised Uncertainty Contrast module with Uncertainty Suppression Loss to suppress the inherent model uncertainty of the Degradation Extractor.

Blind Super-Resolution Image Super-Resolution +1

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

no code implementations24 Jun 2024 Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi

Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications.

Asynchronous Large Language Model Enhanced Planner for Autonomous Driving

1 code implementation20 Jun 2024 Yuan Chen, Zi-han Ding, Ziqin Wang, Yan Wang, Lijun Zhang, Si Liu

Despite real-time planners exhibiting remarkable performance in autonomous driving, the growing exploration of Large Language Models (LLMs) has opened avenues for enhancing the interpretability and controllability of motion planning.

Autonomous Driving Language Modeling +3

MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction

no code implementations19 Jun 2024 Jiaqi Cui, Xinyi Zeng, Pinxian Zeng, Bo Liu, Xi Wu, Jiliu Zhou, Yan Wang

To expedite the diffusion process, we further introduce an adversarial diffusive network with a reduced number of diffusion steps.

Diagnostic Image Reconstruction

Joint Power Allocation and Beamforming Design for Active IRS-Aided Directional Modulation Secure Systems

no code implementations13 Jun 2024 Yifan Zhao, Xiaoyu Wang, Kaibo Zhou, Xuehui Wang, Yan Wang, Wei Gao, Ruiqi Liu, Feng Shu

To meet the requirements of the network performance, a power allocation (PA) strategy is proposed and adopted in the system.

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

no code implementations13 Jun 2024 Yuhang Wu, Wenmeng Yu, Yean Cheng, Yan Wang, Xiaohan Zhang, Jiazheng Xu, Ming Ding, Yuxiao Dong

Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants.

Multiple-choice

AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

4 code implementations CVPR 2024 Xintian Mao, Qingli Li, Yan Wang

Despite the recent progress in enhancing the efficacy of image deblurring, the limited decoding capability constrains the upper limit of State-Of-The-Art (SOTA) methods.

Deblurring Decoder +1

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

no code implementations13 Jun 2024 Jiefeng Ma, Yan Wang, Chenyu Liu, Jun Du, Yu Hu, Zhenrong Zhang, Pengfei Hu, Qing Wang, Jianshu Zhang

Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding.

Form Relation Prediction

OUS: Scene-Guided Dynamic Facial Expression Recognition

no code implementations29 May 2024 Xinji Mai, Haoran Wang, Zeng Tao, Junxiong Lin, Shaoqi Yan, Yan Wang, Jing Liu, Jiawen Yu, Xuan Tong, YaTing Li, Wenqiang Zhang

By analyzing the Rigid Cognitive Problem, OUS successfully understands the complex relationship between scene context and emotional expression, closely aligning with human emotional understanding in real-world scenarios.

Dynamic Facial Expression Recognition Facial Expression Recognition

A New Method in Facial Registration in Clinics Based on Structure Light Images

no code implementations23 May 2024 Pengfei Li, Ziyue Ma, Hong Wang, Juan Deng, Yan Wang, Zhenyu Xu, Feng Yan, Wenjun Tu, Hong Sha

To enrich traditional image methods with depth information, a method for registering depth images with traditional clinical images was investigated.

Face Recognition

Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation

1 code implementation9 May 2024 Mo Guan, Yan Wang, Guangkun Ma, Jiarui Liu, Mingzu Sun

The resulting framework is denoted as MSKA-SLR, which is expanded into a sign language translation (SLT) model through the straightforward addition of an extra translation network.

Sign Language Recognition Sign Language Translation +1

A Causal Explainable Guardrails for Large Language Models

no code implementations7 May 2024 Zhixuan Chu, Yan Wang, Longfei Li, Zhibo Wang, Zhan Qin, Kui Ren

Large Language Models (LLMs) have shown impressive performance in natural language tasks, but their outputs can exhibit undesirable attributes or biases.

Language-Image Models with 3D Understanding

no code implementations6 May 2024 Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone

Our experiments on outdoor benchmarks demonstrate that Cube-LLM significantly outperforms existing baselines by 21.3 points of AP-BEV on the Talk2Car dataset for 3D grounded reasoning and 17.7 points on the DriveLM dataset for complex reasoning about driving scenarios, respectively.

Question Answering Visual Question Answering

I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis

no code implementations5 May 2024 Haofei Song, Xintian Mao, Jing Yu, Qingli Li, Yan Wang

Based on this observation, we propose an Inter-Intra-slice Interpolation Network (I$^3$Net), which fully explores information from high in-plane resolution and compensates for low through-plane resolution.

Image Reconstruction Super-Resolution +1

Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

no code implementations CVPR 2024 Mingcheng Li, Dingkang Yang, Xiao Zhao, Shuaibing Wang, Yan Wang, Kun Yang, Mingyang Sun, Dongliang Kou, Ziyun Qian, Lihua Zhang

Specifically, we present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics.

Disentanglement Knowledge Distillation +1

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

no code implementations CVPR 2024 Wenjin Hou, Shiming Chen, Shuhuang Chen, Ziming Hong, Yan Wang, Xuetao Feng, Salman Khan, Fahad Shahbaz Khan, Xinge You

Generative Zero-shot learning (ZSL) learns a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL.

Zero-Shot Learning

Adaptive Prompt Learning with Negative Textual Semantics and Uncertainty Modeling for Universal Multi-Source Domain Adaptation

no code implementations23 Apr 2024 Yuxiang Yang, Lu Wen, Yuanyuan Xu, Jiliu Zhou, Yan Wang

Universal Multi-source Domain Adaptation (UniMDA) transfers knowledge from multiple labeled source domains to an unlabeled target domain under domain shifts (different data distribution) and class shifts (unknown target classes).

Domain Adaptation

AccidentBlip: Agent of Accident Warning based on MA-former

no code implementations18 Apr 2024 Yihua Shao, Yeling Xu, Xinwei Long, Siyu Chen, Ziyang Yan, Yang Yang, Haoting Liu, Yan Wang, Hao Tang, Zhen Lei

In particular, AccidentBlip achieves SOTA performance in both accident detection and prediction tasks on the DeepAccident dataset.

Language Modelling Large Language Model +2

Causal Deconfounding via Confounder Disentanglement for Dual-Target Cross-Domain Recommendation

no code implementations17 Apr 2024 JiaJie Zhu, Yan Wang, Feng Zhu, Zhu Sun

As a result, dual-target CDR has to meet two challenges: (1) how to effectively decouple observed confounders, including single-domain confounders and cross-domain confounders, and (2) how to preserve the positive effects of observed confounders on predicted interactions, while eliminating their negative effects on capturing comprehensive user preferences.

Disentanglement

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

3 code implementations16 Apr 2024 Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi

In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.

Image Super-Resolution

AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides

1 code implementation15 Apr 2024 Kewei Li, Yuqian Wu, Yinheng Li, Yutong Guo, Yan Wang, Yiyang Liang, Yusi Fan, Lan Huang, Ruochi Zhang, Fengfeng Zhou

This study introduces a quantitative definition and benchmarking framework AMPCliff for the AC phenomenon in antimicrobial peptides (AMPs) composed by canonical amino acids.

Benchmarking Protein Language Model

Task-Aware Encoder Control for Deep Video Compression

no code implementations CVPR 2024 Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin

Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task.

Decoder Video Compression

Two-Phase Multi-Dose-Level PET Image Reconstruction with Dose Level Awareness

no code implementations2 Apr 2024 Yuchen Fei, Yanmei Luo, Yan Wang, Jiaqi Cui, Yuanyuan Xu, Jiliu Zhou, Dinggang Shen

In this paper, to reconstruct high-quality SPET images from multi-dose-level LPET images, we design a novel two-phase multi-dose-level PET reconstruction algorithm with dose level awareness, containing a pre-training phase and a SPET prediction phase.

Image Reconstruction Prediction

Instruction-Driven Game Engines on Large Language Models

1 code implementation30 Mar 2024 Hongqiu Wu, Yan Wang, XingYuan Liu, Hai Zhao, Min Zhang

The Instruction-Driven Game Engine (IDGE) project aims to democratize game development by enabling a large language model (LLM) to follow free-form game rules and autonomously generate game-play processes.

Language Modelling Large Language Model

Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

1 code implementation CVPR 2024 Zhiheng Cheng, Qingyue Wei, Hongru Zhu, Yan Wang, Liangqiong Qu, Wei Shao, Yuyin Zhou

This paper introduces H-SAM: a prompt-free adaptation of SAM tailored for efficient fine-tuning of medical images via a two-stage hierarchical decoding procedure.

Decoder Image Segmentation +4

Multi-stream Transmission for Directional Modulation Network via Distributed Multi-UAV-aided Multi-active-IRS

no code implementations26 Mar 2024 Ke Yang, Rongen Dong, Wei Gao, Feng Shu, Weiping Shi, Yan Wang, Xuehui Wang, Jiangzhou Wang

In this paper, a single large-scale IRS is divided into multiple small IRSs and a novel multi-IRS-aided multi-stream DM network is proposed to achieve a point-to-point multi-stream transmission by creating $K$ ($\geq3$) DoFs, where multiple small IRSs are placed distributively via multiple unmanned aerial vehicles (UAVs).

EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting

no code implementations22 Mar 2024 Kailing Wang, Chen Yang, Yuehao Wang, Sikuang Li, Yan Wang, Qi Dou, Xiaokang Yang, Wei Shen

Precise camera tracking, high-fidelity 3D tissue reconstruction, and real-time online visualization are critical for intrabody medical imaging devices such as endoscopes and capsule robots.

Simultaneous Localization and Mapping

Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

1 code implementation21 Mar 2024 Yan Wang, Lihao Wang, Yuning Shen, Yiqun Wang, Huizhuo Yuan, Yue Wu, Quanquan Gu

The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes.

Diversity

Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization

no code implementations19 Mar 2024 Jixiang Luo, Yan Wang, Hongwei Qin

MSE-based models aim to improve objective metrics while generative models are leveraged to improve visual quality measured by subjective metrics.

Image Compression Quantization

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

1 code implementation13 Mar 2024 Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang

In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage.

Quantization

CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

1 code implementation13 Mar 2024 Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang

Existing learning-based stereo image codecs adopt sophisticated transformations with simple entropy models derived from single image codecs to encode latent representations.

Decoder Image Compression

Few-shot Learning on Heterogeneous Graphs: Challenges, Progress, and Prospects

no code implementations10 Mar 2024 Pengfei Ding, Yan Wang, Guanfeng Liu

In this paper, we provide a comprehensive review of existing FLHG methods, covering challenges, research progress, and future prospects.

Few-Shot Learning

A$^{3}$lign-DFER: Pioneering Comprehensive Dynamic Affective Alignment for Dynamic Facial Expression Recognition with CLIP

no code implementations7 Mar 2024 Zeng Tao, Yan Wang, Junxiong Lin, Haoran Wang, Xinji Mai, Jiawen Yu, Xuan Tong, Ziheng Zhou, Shaoqi Yan, Qing Zhao, Liyuan Han, Wenqiang Zhang

Specifically, our A$^{3}$lign-DFER method is designed with multiple modules that work together to obtain the most suitable expanded-dimensional embeddings for classification and to achieve alignment in three key aspects: affective, dynamic, and bidirectional.

Dynamic Facial Expression Recognition Facial Expression Recognition

Dcl-Net: Dual Contrastive Learning Network for Semi-Supervised Multi-Organ Segmentation

no code implementations6 Mar 2024 Lu Wen, Zhenghao Feng, Yun Hou, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang

Semi-supervised learning is a sound measure to relieve the strict demand of abundant annotated datasets, especially for challenging multi-organ segmentation.

Contrastive Learning Organ Segmentation

CAMixerSR: Only Details Need More "Attention"

1 code implementation CVPR 2024 Yan Wang, Yi Liu, Shijie Zhao, Junlin Li, Li Zhang

To satisfy the rapidly increasing demands on the large image (2K-8K) super-resolution (SR), prevailing methods follow two independent tracks: 1) accelerate existing networks by content-aware routing, and 2) design better super-resolution networks via token mixer refining.

2k 8k +1

Boosting Neural Representations for Videos with a Conditional Decoder

1 code implementation CVPR 2024 Xinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang

Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks.

Decoder

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

2 code implementations23 Feb 2024 Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang

In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint.

3D Object Detection Autonomous Driving +2

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis

1 code implementation20 Feb 2024 Nailei Hei, Qianyu Guo, ZiHao Wang, Yan Wang, Haofen Wang, Wenqiang Zhang

To bridge the distribution gap between user input behavior and model training datasets, we first construct a novel Coarse-Fine Granularity Prompts dataset (CFP) and propose a novel User-Friendly Fine-Grained Text Generation framework (UF-FGTG) for automated prompt optimization.

Image Generation Prompt Engineering +1

Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers

no code implementations9 Feb 2024 Tongda Xu, Ziran Zhu, Jian Li, Dailan He, Yuanyuan Wang, Ming Sun, Ling Li, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

Diffusion Inverse Solvers (DIS) are designed to sample from the conditional distribution $p_{\theta}(X_0|y)$, with a predefined diffusion model $p_{\theta}(X_0)$, an operator $f(\cdot)$, and a measurement $y=f(x'_0)$ derived from an unknown image $x'_0$.

Image Captioning Semantic Segmentation

A Lightweight Inception Boosted U-Net Neural Network for Routability Prediction

1 code implementation7 Feb 2024 Hailiang Li, Yan Huo, Yan Wang, Xu Yang, Miaohui Hao, Xiao Wang

As the modern CPU, GPU, and NPU chip design complexity and transistor counts keep increasing, and with the relentless shrinking of semiconductor technology nodes to nearly 1 nanometer, the placement and routing have gradually become the two most pivotal processes in modern very-large-scale-integrated (VLSI) circuit back-end design.

Avg SSIM

Adaptive Hypergraph Network for Trust Prediction

1 code implementation7 Feb 2024 Rongwei Xu, Guanfeng Liu, Yan Wang, Xuyun Zhang, Kai Zheng, Xiaofang Zhou

In this paper, we propose an Adaptive Hypergraph Network for Trust Prediction (AHNTP), a novel approach that improves trust prediction accuracy by using higher-order correlations.

Contrastive Learning Decision Making +1

Triplet-constraint Transformer with Multi-scale Refinement for Dose Prediction in Radiotherapy

no code implementations7 Feb 2024 Lu Wen, Qihun Zhang, Zhenghao Feng, Yuanyuan Xu, Xiao Chen, Jiliu Zhou, Yan Wang

Radiotherapy is a primary treatment for cancers with the aim of applying sufficient radiation dose to the planning target volume (PTV) while minimizing dose hazards to the organs at risk (OARs).

Triplet

Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies

no code implementations6 Feb 2024 Zhixuan Chu, Yan Wang, Feng Zhu, Lu Yu, Longfei Li, Jinjie Gu

The advent of large language models (LLMs) such as ChatGPT, PaLM, and GPT-4 has catalyzed remarkable advances in natural language processing, demonstrating human-like language fluency and reasoning capacities.

Position

DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models

no code implementations5 Feb 2024 Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan

In this paper, for the first time, we systematically explore the detectability of the poisoned noise input for the backdoored diffusion models, an important performance metric yet little explored in the existing works.

Backdoor Attack

Image2Points: A 3D Point-based Context Clusters GAN for High-Quality PET Image Reconstruction

1 code implementation1 Feb 2024 Jiaqi Cui, Yan Wang, Lu Wen, Pinxian Zeng, Xi Wu, Jiliu Zhou, Dinggang Shen

To obtain high-quality Positron emission tomography (PET) images while minimizing radiation exposure, numerous methods have been proposed to reconstruct standard-dose PET (SPET) images from the corresponding low-dose PET (LPET) images.

Image Reconstruction

GMC-IQA: Exploiting Global-correlation and Mean-opinion Consistency for No-reference Image Quality Assessment

no code implementations19 Jan 2024 Zewen Chen, Juan Wang, Bing Li, Chunfeng Yuan, Weiming Hu, Junxian Liu, Peng Li, Yan Wang, Youqun Zhang, Congxuan Zhang

Due to the subjective nature of image quality assessment (IQA), assessing which image has better quality among a sequence of images is more reliable than assigning an absolute mean opinion score for an image.

No-Reference Image Quality Assessment

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

no code implementations17 Jan 2024 Baoxiong Jia, Yixin Chen, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, Siyuan Huang

In comparison to recent advancements in the 2D domain, grounding language in 3D scenes faces several significant challenges: (i) the inherent complexity of 3D scenes due to the diverse object configurations, their rich attributes, and intricate relationships; (ii) the scarcity of paired 3D vision-language data to support grounded learning; and (iii) the absence of a unified learning framework to distill knowledge from grounded 3D data.

3D visual grounding Scene Understanding

Idempotence and Perceptual Image Compression

1 code implementation17 Jan 2024 Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

However, we find that theoretically: 1) Conditional generative model-based perceptual codec satisfies idempotence; 2) Unconditional generative model with idempotence constraint is equivalent to conditional generative codec.

Image Compression

LLM-Guided Multi-View Hypergraph Learning for Human-Centric Explainable Recommendation

no code implementations16 Jan 2024 Zhixuan Chu, Yan Wang, Qing Cui, Longfei Li, Wenqing Chen, Zhan Qin, Kui Ren

As personalized recommendation systems become vital in the age of information overload, traditional methods relying solely on historical user interactions often fail to fully capture the multifaceted nature of human interests.

Explainable Recommendation Recommendation Systems

A Deep Learning Representation of Spatial Interaction Model for Resilient Spatial Planning of Community Business Clusters

no code implementations9 Jan 2024 Haiyan Hao, Yan Wang

To address the limitation, we propose a SIM-GAT model to predict spatiotemporal visitation flows between community business clusters and their trade areas.

Graph Attention

Energy based diffusion generator for efficient sampling of Boltzmann distributions

no code implementations4 Jan 2024 Yan Wang, Ling Guo, Hao Wu, Tao Zhou

Sampling from Boltzmann distributions, particularly those tied to high-dimensional and complex energy functions, poses a significant challenge in many fields.

Decoder

DeIL: Direct-and-Inverse CLIP for Open-World Few-Shot Learning

1 code implementation CVPR 2024 Shuai Shao, Yu Bai, Yan Wang, BaoDi Liu, Yicong Zhou

Open-World Few-Shot Learning (OFSL) is a critical field of research concentrating on the precise identification of target samples in environments with scarce data and unreliable labels, thus possessing substantial practical significance.

Denoising Few-Shot Learning

PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving

no code implementations CVPR 2024 Xinshuo Weng, Boris Ivanovic, Yan Wang, Yue Wang, Marco Pavone

Recent works have proposed end-to-end autonomous vehicle (AV) architectures comprised of differentiable modules achieving state-of-the-art driving performance.

Autonomous Driving

RepAn: Enhanced Annealing through Re-parameterization

1 code implementation CVPR 2024 Xiang Fei, Xiawu Zheng, Yan Wang, Fei Chao, Chenglin Wu, Liujuan Cao

The simulated annealing algorithm aims to improve model convergence through multiple restarts of training.

Incremental Learning

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

no code implementations25 Dec 2023 Yifan Lu, Ziqi Zhang, Chunfeng Yuan, Peng Li, Yan Wang, Bing Li, Weiming Hu

Each caption in the set is attached to a concept combination indicating the primary semantic content of the caption and facilitating element alignment in set prediction.

Caption Generation Diversity +2

DDistill-SR: Reparameterized Dynamic Distillation Network for Lightweight Image Super-Resolution

1 code implementation22 Dec 2023 Yan Wang, Tongtong Su, Yusen Li, Jiuwen Cao, Gang Wang, Xiaoguang Liu

Specifically, we propose a plug-in reparameterized dynamic unit (RDU) to promote the performance and inference cost trade-off.

Image Super-Resolution

Object Attribute Matters in Visual Question Answering

no code implementations20 Dec 2023 Peize Li, Qingyi Si, Peng Fu, Zheng Lin, Yan Wang

In this paper, we propose a novel VQA approach from the perspective of utilizing object attribute, aiming to achieve better object-level visual-language alignment and multimodal scene understanding.

Attribute Graph Neural Network +6

Appeal: Allow Mislabeled Samples the Chance to be Rectified in Partial Label Learning

no code implementations18 Dec 2023 Chongjie Si, Xuehui Wang, Yan Wang, Xiaokang Yang, Wei Shen

In partial label learning (PLL), each instance is associated with a set of candidate labels among which only one is ground-truth.

Partial Label Learning

CogAgent: A Visual Language Model for GUI Agents

3 code implementations CVPR 2024 Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxuan Zhang, Juanzi Li, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens.

Language Modeling +5

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

no code implementations CVPR 2024 Shitian Zhao, Zhuowan Li, Yadong Lu, Alan Yuille, Yan Wang

We propose Causal Context Generation, Causal-CoG, which is a prompting strategy that engages contextual information to enhance precise VQA during inference.

Question Answering Visual Question Answering

Large Language Models for Intent-Driven Session Recommendations

1 code implementation7 Dec 2023 Zhu Sun, Hongyang Liu, Xinghua Qu, Kaidong Feng, Yan Wang, Yew-Soon Ong

Intent-aware session recommendation (ISR) is pivotal in discerning user intents within sessions for precise predictions.

Unified learning-based lossy and lossless JPEG recompression

no code implementations5 Dec 2023 Jianghui Zhang, Yuanyuan Wang, Lina Guo, Jixiang Luo, Tongda Xu, Yan Wang, Zhi Wang, Hongwei Qin

Most image compression algorithms only consider uncompressed original images, while ignoring the large number of already existing JPEG images.

Image Compression Quantization

An Embodied Generalist Agent in 3D World

1 code implementation18 Nov 2023 Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

However, several significant challenges remain: (i) most of these models rely on 2D images yet exhibit a limited capacity for 3D input; (ii) these models rarely explore the tasks inherently defined in 3D world, e.g., 3D grounding, embodied reasoning and acting.

3D dense captioning Question Answering +3

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

1 code implementation16 Nov 2023 Xiangru Tang, Yuliang Liu, Zefan Cai, Yanjun Shao, Junjie Lu, Yichi Zhang, Zexuan Deng, Helan Hu, Kaikai An, Ruijun Huang, Shuzheng Si, Sheng Chen, Haozhe Zhao, Liang Chen, Yan Wang, Tianyu Liu, Zhiwei Jiang, Baobao Chang, Yin Fang, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Arman Cohan, Mark Gerstein

Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), requiring a deeper comprehension of complex file interactions.

Code Generation Navigate +1

Explainable History Distillation by Marked Temporal Point Process

no code implementations13 Nov 2023 Sishun Liu, Ke Deng, Yan Wang, Xiuzhen Zhang

To efficiently solve EHD, we rewrite the task into a 0-1 integer program and directly estimate the solution to the program by a model called \acrfull{model}.

counterfactual

PepLand: a large-scale pre-trained peptide representation model for a comprehensive landscape of both canonical and non-canonical amino acids

1 code implementation8 Nov 2023 Ruochi Zhang, Haoran Wu, Yuting Xiu, Kewei Li, Ningning Chen, Yu Wang, Yan Wang, Xin Gao, Fengfeng Zhou

In recent years, the scientific community has become increasingly interested in peptides with non-canonical amino acids due to their superior stability and resistance to proteolytic degradation.

Graph Neural Network

Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps

1 code implementation7 Nov 2023 Katie Z Luo, Xinshuo Weng, Yan Wang, Shuang Wu, Jie Li, Kilian Q Weinberger, Yue Wang, Marco Pavone

We propose a novel framework to integrate SD maps into online map prediction and propose a Transformer-based encoder, SD Map Encoder Representations from transFormers, to leverage priors in SD maps for the lane-topology prediction task.

Autonomous Driving Lane Detection +1

Diffusion-based Radiotherapy Dose Prediction Guided by Inter-slice Aware Structure Encoding

no code implementations6 Nov 2023 Zhenghao Feng, Lu Wen, Jianghong Xiao, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Xingchen Peng, Yan Wang

In the forward process, DiffDose transforms dose distribution maps into pure Gaussian noise by gradually adding small noise, and a noise predictor is simultaneously trained to estimate the noise added at each timestep.

Towards End-to-End Unsupervised Saliency Detection with Self-Supervised Top-Down Context

no code implementations14 Oct 2023 Yicheng Song, Shuyong Gao, Haozhe Xing, Yiting Cheng, Yan Wang, Wenqiang Zhang

Unsupervised salient object detection aims to detect salient objects without using supervision signals, eliminating the tedious task of manually labeling salient objects.

Contrastive Learning object-detection +3

3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

3 code implementations11 Oct 2023 Jieneng Chen, Jieru Mei, Xianhang Li, Yongyi Lu, Qihang Yu, Qingyue Wei, Xiangde Luo, Yutong Xie, Ehsan Adeli, Yan Wang, Matthew Lungren, Lei Xing, Le Lu, Alan Yuille, Yuyin Zhou

In this paper, we extend the 2D TransUNet architecture to a 3D network by building upon the state-of-the-art nnU-Net architecture, and fully exploring Transformers' potential in both the encoder and decoder design.

Decoder Image Segmentation +4

Bandwidth-efficient Inference for Neural Image Compression

no code implementations6 Sep 2023 Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices.

Data Compression Image Compression +1

Gene-induced Multimodal Pre-training for Image-omic Classification

no code implementations6 Sep 2023 Ting Jin, Xingran Xie, Renjie Wan, Qingli Li, Yan Wang

Histology analysis of the tumor micro-environment integrated with genomic assays is the gold standard for most cancers in modern medicine.

Classification Triplet +1

Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models

no code implementations24 Aug 2023 Yachao Zhao, Bo Wang, Dongming Zhao, Kun Huang, Yan Wang, Ruifang He, Yuexian Hou

We propose that this re-judge inconsistency can be similar to the inconsistency between humans' unaware implicit social bias and their aware explicit social bias.

A Unified Framework for 3D Point Cloud Visual Grounding

1 code implementation23 Aug 2023 Haojia Lin, Yongdong Luo, Xiawu Zheng, Lijiang Li, Fei Chao, Taisong Jin, Donghao Luo, Yan Wang, Liujuan Cao, Rongrong Ji

This elaborate design enables 3DRefTR to achieve both well-performing 3DRES and 3DREC capacities with only a 6% additional latency compared to the original 3DREC model.

Referring Expression Referring Expression Comprehension +1

Polymerized Feature-based Domain Adaptation for Cervical Cancer Dose Map Prediction

no code implementations20 Aug 2023 Jie Zeng, Zeyu Han, Xingchen Peng, Jianghong Xiao, Peng Wang, Yan Wang

Recently, deep learning (DL) has automated and accelerated the clinical radiation therapy (RT) planning significantly by predicting accurate dose maps.

Domain Adaptation

Contrastive Diffusion Model with Auxiliary Guidance for Coarse-to-Fine PET Reconstruction

1 code implementation20 Aug 2023 Zeyu Han, YuHan Wang, Luping Zhou, Peng Wang, Binyu Yan, Jiliu Zhou, Yan Wang, Dinggang Shen

To obtain high-quality positron emission tomography (PET) scans while reducing radiation exposure to the human body, various approaches have been proposed to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images.

Conditional Perceptual Quality Preserving Image Compression

no code implementations16 Aug 2023 Tongda Xu, Qian Zhang, Yanghao Li, Dailan He, Zhe Wang, Yuanyuan Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

We propose conditional perceptual quality, an extension of the perceptual quality defined in Blau and Michaeli (2018), by conditioning it on user-defined information.

Image Compression

TriDo-Former: A Triple-Domain Transformer for Direct PET Reconstruction from Low-Dose Sinograms

no code implementations10 Aug 2023 Jiaqi Cui, Pinxian Zeng, Xinyi Zeng, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang, Dinggang Shen

Specifically, the TriDo-Former consists of two cascaded networks, i.e., a sinogram enhancement transformer (SE-Former) for denoising the input LPET sinograms and a spatial-spectral reconstruction transformer (SSR-Former) for reconstructing SPET images from the denoised sinograms.

Denoising Image Reconstruction +1

Cross-heterogeneity Graph Few-shot Learning

no code implementations10 Aug 2023 Pengfei Ding, Yan Wang, Guanfeng Liu

In recent years, heterogeneous graph few-shot learning has been proposed to address the label sparsity issue in heterogeneous graphs (HGs), which contain various types of nodes and edges.

Few-Shot Learning Graph Neural Network +1

Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance

1 code implementation9 Aug 2023 Zijun Cheng, Qiujian Lv, Jinyuan Liang, Yan Wang, Degang Sun, Thomas Pasquier, Xueyuan Han

Sifting through their design documents, we identify four common dimensions that drive the development of provenance-based intrusion detection systems (PIDSes): scope (can PIDSes detect modern attacks that infiltrate across application boundaries?

Decoder Graph Neural Network +1

Color Image Recovery Using Generalized Matrix Completion over Higher-Order Finite Dimensional Algebra

no code implementations4 Aug 2023 Liang Liao, Zhuang Guo, Qi Gao, Yan Wang, Fajun Yu, Qifeng Zhao, Stephen John Maybank

To improve the accuracy of color image completion with missing entries, we present a recovery method based on generalized higher-order scalars.

Matrix Completion

Continual Learning in Predictive Autoscaling

no code implementations29 Jul 2023 Hongyan Hao, Zhixuan Chu, Shiyi Zhu, Gangwei Jiang, Yan Wang, Caigao Jiang, James Zhang, Wei Jiang, Siqiao Xue, Jun Zhou

To surmount this challenge and effectively integrate the new sample distribution, we propose a density-based sample selection strategy that uses kernel density estimation to calculate sample density as a reference for computing sample weights, and employs weighted sampling to construct a new memory set.

Continual Learning Density Estimation

Domain Disentanglement with Interpolative Data Augmentation for Dual-Target Cross-Domain Recommendation

no code implementations26 Jul 2023 JiaJie Zhu, Yan Wang, Feng Zhu, Zhu Sun

In DIDA-CDR, we first propose an interpolative data augmentation approach that generates both relevant and diverse augmented user representations to augment the sparser domain and explore potential user preferences.

Data Augmentation Disentanglement

DiffDP: Radiotherapy Dose Prediction via a Diffusion Model

no code implementations19 Jul 2023 Zhenghao Feng, Lu Wen, Peng Wang, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang

To alleviate this limitation, we innovatively introduce a diffusion-based dose prediction (DiffDP) model for predicting the radiotherapy dose distribution of cancer patients.

Anatomy model +1

EasyTPP: Towards Open Benchmarking Temporal Point Processes

1 code implementation16 Jul 2023 Siqiao Xue, Xiaoming Shi, Zhixuan Chu, Yan Wang, Hongyan Hao, Fan Zhou, Caigao Jiang, Chen Pan, James Y. Zhang, Qingsong Wen, Jun Zhou, Hongyuan Mei

In this paper, we present EasyTPP, the first central repository of research assets (e.g., data, models, evaluation programs, documentation) in the area of event sequence modeling.

Benchmarking Point Processes

Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

1 code implementation16 Jul 2023 Longyue Wang, Zefeng Du, Donghuai Liu, Deng Cai, Dian Yu, Haiyun Jiang, Yan Wang, Leyang Cui, Shuming Shi, Zhaopeng Tu

Modeling discourse, the linguistic phenomena that go beyond individual sentences, is a fundamental yet challenging aspect of natural language processing (NLP).

Diagnostic Language Modelling +1

Copy Is All You Need

1 code implementation13 Jul 2023 Tian Lan, Deng Cai, Yan Wang, Heyan Huang, Xian-Ling Mao

The dominant text generation models compose the output by sequentially selecting words from a fixed vocabulary.

All Domain Adaptation +3

SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency

no code implementations1 Jul 2023 Yan Wang, Yuhang Li, Ruihao Gong, Aishan Liu, Yanfei Wang, Jian Hu, Yongqiang Yao, Yunchen Zhang, Tianzi Xiao, Fengwei Yu, Xianglong Liu

Extensive studies have shown that deep learning models are vulnerable to adversarial and natural noise, yet little is known about model robustness to noise caused by different system implementations.

Benchmarking Data Augmentation +6

Improving the Transferability of Time Series Forecasting with Decomposition Adaptation

no code implementations30 Jun 2023 Yan Gao, Yan Wang, Qiang Wang

However, in time series forecasting, it is difficult to obtain enough data, which limits the performance of neural forecasting models.

Multivariate Time Series Forecasting Time Series +1

A Unified Framework for Online Data-Driven Predictive Control with Robust Safety Guarantees

no code implementations29 Jun 2023 Amin Vahidi-Moghaddam, Kaian Chen, Kaixiang Zhang, Zhaojian Li, Yan Wang, Kai Wu

Despite great successes, model predictive control (MPC) relies on an accurate dynamical model and requires high onboard computational power, impeding its wider adoption in engineering systems, especially for nonlinear real-time systems with limited computation power.

Model Predictive Control

Extended Neighboring Extremal Optimal Control with State and Preview Perturbations

no code implementations7 Jun 2023 Amin Vahidi-Moghaddam, Kaixiang Zhang, Zhaojian Li, Xunyuan Yin, Ziyou Song, Yan Wang

In this work, an extended NE (ENE) framework is developed to systematically adapt the nominal control to both state and preview perturbations.

Model Predictive Control

Asymptotic Performance Analysis of Large-Scale Active IRS-Aided Wireless Network

no code implementations31 May 2023 Yan Wang, Feng Shu, Zhihong Zhuang, Rongen Dong, Qi Zhang, Di Wu, Liang Yang, Jiangzhou Wang

Numerical simulation results show that a 3-bit discrete phase shifter is required to achieve a trivial performance loss for a large-scale active IRS.

Quantization

MedNgage: A Dataset for Understanding Engagement in Patient-Nurse Conversations

no code implementations31 May 2023 Yan Wang, Heidi Ann Scharf Donovan, Sabit Hassan, Mailhe Alikhani

In this paper, we present a novel dataset (MedNgage), which consists of patient-nurse conversations about cancer symptom management.

Management

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

1 code implementation30 May 2023 Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, Zhaopeng Tu

To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution.

Arithmetic Reasoning Machine Translation

Joint Uplink and Downlink Resource Allocation Towards Energy-efficient Transmission for URLLC

no code implementations25 May 2023 Kang Li, Pengcheng Zhu, Yan Wang, Fu-Chun Zheng, Xiaohu You

With the proposed packet delivery mechanism, we jointly optimize bandwidth allocation and power control of uplink and downlink, antenna configuration, and subchannel assignment to minimize the average total power under the constraint of URLLC transmission requirements.

PandaGPT: One Model To Instruction-Follow Them All

1 code implementation25 May 2023 Yixuan Su, Tian Lan, Huayang Li, Jialu Xu, Yan Wang, Deng Cai

To do so, PandaGPT combines the multimodal encoders from ImageBind and the large language models from Vicuna.

All Instruction Following

Privacy-preserving Adversarial Facial Features

no code implementations CVPR 2023 Zhibo Wang, He Wang, Shuaifan Jin, Wenwen Zhang, Jiahui Hu, Yan Wang, Peng Sun, Wei Yuan, Kaixin Liu, Kui Ren

In this paper, we propose an adversarial features-based face privacy protection (AdvFace) approach to generate privacy-preserving adversarial features, which can disrupt the mapping from adversarial features to facial images to defend against reconstruction attacks.

Face Recognition Privacy Preserving

Uncertainty Quantification in Machine Learning for Engineering Design and Health Prognostics: A Tutorial

1 code implementation7 May 2023 Venkat Nemani, Luca Biggio, Xun Huan, Zhen Hu, Olga Fink, Anh Tran, Yan Wang, Xiaoge Zhang, Chao Hu

In this tutorial, we aim to provide a holistic lens on emerging UQ methods for ML models with a particular focus on neural networks and the applications of these UQ methods in tackling engineering design as well as prognostics and health management problems.

Decision Making Management +2
