Search Results for author: Wenhao Wang

Found 42 papers, 26 papers with code

Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity

no code implementations · 12 May 2025 · Guang Yan, Yuhui Zhang, Zimu Guo, Lutan Zhao, Xiaojun Chen, Chen Wang, Wenhao Wang, Dan Meng, Rui Hou

With the growing use of large language models (LLMs) hosted on cloud platforms to offer inference services, privacy concerns about the potential leakage of sensitive information are escalating.

Language Modeling, Language Modelling, +1

VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization

no code implementations · 17 Apr 2025 · Menglan Chen, Xianghe Pang, Jingjing Dong, Wenhao Wang, Yaxin Du, Siheng Chen

Aligning Vision-Language Models (VLMs) with safety standards is essential to mitigate risks arising from their multimodal complexity, where integrating vision and language unveils subtle threats beyond the reach of conventional safeguards.

Multimodal Reasoning, Safety Alignment

Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability Prediction

no code implementations · 11 Apr 2025 · Mengying Yuan, Wenhao Wang, Zixuan Wang, Yujie Huang, Kangli Wei, Fei Li, Chong Teng, Donghong Ji

Our work sheds light on the study of NLI and will stimulate research interest in cross-document cross-lingual context understanding, hallucination elimination, and interpretability inference.

Cross-Lingual Natural Language Inference, Graph Attention, +4

FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data

1 code implementation · 7 Mar 2025 · Wenhao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Siheng Chen, Yanfeng Wang

FedMABench features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents across diverse environments.

Benchmarking, Federated Learning

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

1 code implementation · 3 Mar 2025 · Wenhao Wang, Yi Yang

VideoUFO comprises over 1.09 million video clips, each paired with both a brief and a detailed caption (description).

Text-to-Video Generation, Video Generation

GSCE: A Prompt Framework with Enhanced Reasoning for Reliable LLM-driven Drone Control

no code implementations · 18 Feb 2025 · Wenhao Wang, Yanyan Li, Long Jiao, Jiawei Yuan

The integration of Large Language Models (LLMs) into robotic control, including drones, has the potential to revolutionize autonomous systems.

Code Generation

A Unified Modeling Framework for Automated Penetration Testing

no code implementations · 17 Feb 2025 · Yunfei Wang, Shixuan Liu, Wenhao Wang, Changling Zhou, Chao Zhang, Jiandong Jin, Cheng Zhu

The integration of artificial intelligence into automated penetration testing (AutoPT) has highlighted the necessity of simulation modeling for the training of intelligent agents, due to its cost-efficiency and swift feedback capabilities.

Captured by Captions: On Memorization and its Mitigation in CLIP Models

no code implementations · 11 Feb 2025 · Wenhao Wang, Adam Dziedzic, Grace C. Kim, Michael Backes, Franziska Boenisch

Multi-modal models, such as CLIP, have demonstrated strong performance in aligning visual and textual representations, excelling in tasks like image retrieval and zero-shot classification.

Image Retrieval, Memorization, +3

MobileA3gent: Training Mobile GUI Agents Using Decentralized Self-Sourced Data from Diverse Users

no code implementations · 5 Feb 2025 · Wenhao Wang, Mengying Yuan, Zijie Yu, Guangyi Liu, Rui Ye, Tian Jin, Siheng Chen, Yanfeng Wang

Given the vast population of global mobile phone users, if automated data collection from them becomes feasible, the resulting data volume and the subsequently trained mobile agents could reach unprecedented levels.

Origin Identification for Text-Guided Image-to-Image Diffusion Models

no code implementations · 4 Jan 2025 · Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang

Subsequently, it is demonstrated that such a simple linear transformation can be generalized across different diffusion models.

Misinformation

A Variance Minimization Approach to Temporal-Difference Learning

no code implementations · 10 Nov 2024 · Xingguo Chen, Yu Gong, Shangdong Yang, Wenhao Wang

Fast-converging algorithms are a contemporary requirement in reinforcement learning.

TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

no code implementations · 5 Nov 2024 · Wenhao Wang, Yi Yang

In this paper, we introduce TIP-I2V, the first large-scale dataset of over 1.70 million unique user-provided Text and Image Prompts specifically for Image-to-Video generation.

Image to Video Generation, Misinformation

Generalizable Humanoid Manipulation with 3D Diffusion Policies

1 code implementation · 14 Oct 2024 · Yanjie Ze, Zixuan Chen, Wenhao Wang, Tianyi Chen, Xialin He, Ying Yuan, Xue Bin Peng, Jiajun Wu

However, autonomous manipulation by humanoid robots has largely been restricted to one specific scene, primarily due to the difficulty of acquiring generalizable skills and the high cost of in-the-wild humanoid robot data.

Camera Calibration, Point Cloud Segmentation

Image Copy Detection for Diffusion Models

no code implementations · 30 Sep 2024 · Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang

Existing Image Copy Detection (ICD) models, though accurate in detecting hand-crafted replicas, overlook the challenge from diffusion models.

Copy Detection, Marketing

Localizing Memorization in SSL Vision Encoders

no code implementations · 27 Sep 2024 · Wenhao Wang, Adam Dziedzic, Michael Backes, Franziska Boenisch

Recent work on studying memorization in self-supervised learning (SSL) suggests that even though SSL encoders are trained on millions of images, they still memorize individual data points.

Memorization, Self-Supervised Learning

MonoFormer: One Transformer for Both Diffusion and Autoregression

1 code implementation · 24 Sep 2024 · Chuyang Zhao, Yuxing Song, Wenhao Wang, Haocheng Feng, Errui Ding, Yifan Sun, Xinyan Xiao, Jingdong Wang

Most existing multimodality methods use separate backbones for autoregression-based discrete text generation and diffusion-based continuous visual generation, or the same backbone by discretizing the visual data to use autoregression for both text and visual generation.

Image Generation, Text Generation

Replication in Visual Diffusion Models: A Survey and Outlook

1 code implementation · 7 Jul 2024 · Wenhao Wang, Yifan Sun, Zongxin Yang, Zhengdong Hu, Zhentao Tan, Yi Yang

In this survey, we provide the first comprehensive review of replication in visual diffusion models, marking a novel contribution to the field by systematically categorizing the existing studies into unveiling, understanding, and mitigating this phenomenon.

Benchmarking, Survey

AnyPattern: Towards In-context Image Copy Detection

2 code implementations · 21 Apr 2024 · Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang

To accommodate the "seen → unseen" generalization scenario, we construct the first large-scale pattern dataset named AnyPattern, which has the largest number of tamper patterns (90 for training and 10 for testing) among all existing ones.

Copy Detection, In-Context Learning

TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision

1 code implementation · 10 Mar 2024 · Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi, Guoqiang Xu, Yong Yu, Weinan Zhang

Many of these works use in-context examples to achieve generalization without fine-tuning, while few have considered how to select and effectively use these examples.

Language Modelling, Large Language Model, +2

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

1 code implementation · 10 Mar 2024 · Wenhao Wang, Yi Yang

However, Sora, along with other text-to-video diffusion models, is highly reliant on prompts, and there is no publicly available dataset that features a study of text-to-video prompts.

Copy Detection, Image Generation, +3

FedRSU: Federated Learning for Scene Flow Estimation on Roadside Units

1 code implementation · 23 Jan 2024 · Shaoheng Fang, Rui Ye, Wenhao Wang, Zuhong Liu, Yuxiao Wang, Yafei Wang, Siheng Chen, Yanfeng Wang

In this paper, we introduce FedRSU, an innovative federated learning framework for self-supervised scene flow estimation.

Autonomous Vehicles, Federated Learning, +2

Memorization in Self-Supervised Learning Improves Downstream Generalization

1 code implementation · 19 Jan 2024 · Wenhao Wang, Muhammad Ahmad Kaleem, Adam Dziedzic, Michael Backes, Nicolas Papernot, Franziska Boenisch

Our definition compares the difference in alignment of representations for data points and their augmented views returned by both encoders that were trained on these data points and encoders that were not.

Memorization, Self-Supervised Learning

MS-DETR: Efficient DETR Training with Mixed Supervision

1 code implementation · CVPR 2024 · Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang

The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates.

Decoder, Object, +2

Two-Factor Authentication Approach Based on Behavior Patterns for Defeating Puppet Attacks

no code implementations · 17 Nov 2023 · Wenhao Wang, Guyue Li, Zhiming Chu, Haobo Li, Daniele Faccio

Furthermore, we conducted comparative experiments to validate the superiority of combining image features and timing characteristics within PUPGUARD for enhancing resistance against puppet attacks.

feature selection, One-class classifier

Feature-compatible Progressive Learning for Video Copy Detection

2 code implementations · 20 Apr 2023 · Wenhao Wang, Yifan Sun, Yi Yang

Video Copy Detection (VCD) has been developed to identify instances of unauthorized or duplicated video content.

Copy Detection, Video Similarity

TransHP: Image Classification with Hierarchical Prompting

1 code implementation · NeurIPS 2023 · Wenhao Wang, Yifan Sun, Wei Li, Yi Yang

This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.

Classification, image-classification, +1

V²L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

1 code implementation · 26 Jul 2022 · Wenhao Wang, Yifan Sun, Zongxin Yang, Yi Yang

While model ensembles are common, we show that combining vision models and vision-language models brings particular benefits from their complementarity and is a key factor in our superior performance.

Metric Learning, Retrieval

A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection

1 code implementation · 24 May 2022 · Wenhao Wang, Yifan Sun, Yi Yang

Moreover, this paper reveals a unique difficulty in solving the hard negative problem in ICD, i.e., there is a fundamental conflict between current metric learning and ICD.

Copy Detection, Metric Learning

Bag of Tricks and A Strong baseline for Image Copy Detection

1 code implementation · 13 Nov 2021 · Wenhao Wang, Weipu Zhang, Yifan Sun, Yi Yang

In this paper, a bag of tricks and a strong baseline are proposed for image copy detection.

Copy Detection, Unsupervised Pre-training

D²LV: A Data-Driven and Local-Verification Approach for Image Copy Detection

1 code implementation · 13 Nov 2021 · Wenhao Wang, Yifan Sun, Weipu Zhang, Yi Yang

In this paper, a data-driven and local-verification (D²LV) approach is proposed to compete in the Image Similarity Challenge: Matching Track at NeurIPS'21.

Copy Detection, Unsupervised Pre-training

Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction

1 code implementation · ICCV 2021 · Fang Zhao, Wenhao Wang, Shengcai Liao, Ling Shao

While single-view 3D reconstruction has made significant progress benefiting from deep shape representations in recent years, garment reconstruction is still not solved well due to open surfaces, diverse topologies and complex geometric details.

Garment Reconstruction, Single-View 3D Reconstruction

Scenario Forecast of Cross-border Electric Interconnection towards Renewables in South America

no code implementations · 11 Sep 2020 · Wenhao Wang, Jing Meng, Duan Chen, Wei Cong

Cross-border Electric Interconnection towards renewables is a promising solution for the electric sector under the UN 2030 Sustainable Development Goals, and it is widely promoted in emerging economies.

Attentive WaveBlock: Complementarity-enhanced Mutual Networks for Unsupervised Domain Adaptation in Person Re-identification and Beyond

1 code implementation · 11 Jun 2020 · Wenhao Wang, Fang Zhao, Shengcai Liao, Ling Shao

This paper proposes a novel lightweight module, the Attentive WaveBlock (AWB), which can be integrated into the dual networks of mutual learning to enhance their complementarity and further suppress noise in the pseudo-labels.

Clustering, image-classification, +4

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

no code implementations · 13 Mar 2020 · Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiao-Lin Xu, Yanzhi Wang

Weight pruning of deep neural networks (DNNs) has been proposed to accommodate the limited storage and computing capabilities of mobile edge devices.

Model Compression, Privacy Preserving

Adapted Center and Scale Prediction: More Stable and More Accurate

1 code implementation · 20 Feb 2020 · Wenhao Wang

Therefore, in order to enjoy the simplicity of anchor-free detectors and the accuracy of two-stage ones simultaneously, we propose some adaptations based on a detector, Center and Scale Prediction (CSP).

Ranked #1 on Pedestrian Detection on CityPersons (Bare MR^-2 metric, using extra training data)

object-detection, Object Detection, +1
