no code implementations • 12 May 2025 • Guang Yan, Yuhui Zhang, Zimu Guo, Lutan Zhao, Xiaojun Chen, Chen Wang, Wenhao Wang, Dan Meng, Rui Hou
With the growing use of large language models (LLMs) hosted on cloud platforms to offer inference services, privacy concerns about the potential leakage of sensitive information are escalating.
no code implementations • 17 Apr 2025 • Menglan Chen, Xianghe Pang, Jingjing Dong, Wenhao Wang, Yaxin Du, Siheng Chen
Aligning Vision-Language Models (VLMs) with safety standards is essential to mitigate risks arising from their multimodal complexity, where integrating vision and language unveils subtle threats beyond the reach of conventional safeguards.
no code implementations • 11 Apr 2025 • Mengying Yuan, Wenhao Wang, Zixuan Wang, Yujie Huang, Kangli Wei, Fei Li, Chong Teng, Donghong Ji
Our work sheds light on the study of NLI and will bring research interest to cross-document cross-lingual context understanding, hallucination elimination, and interpretability inference.
1 code implementation • 9 Mar 2025 • AgiBot-World-Contributors, Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xuan Hu, Xu Huang, Shu Jiang, Yuxin Jiang, Cheng Jing, Hongyang Li, Jialu Li, Chiming Liu, Yi Liu, Yuxiang Lu, Jianlan Luo, Ping Luo, Yao Mu, Yuehan Niu, Yixuan Pan, Jiangmiao Pang, Yu Qiao, Guanghui Ren, Cheng Ruan, Jiaqi Shan, Yongjian Shen, Chengshi Shi, Mingkang Shi, Modi shi, Chonghao Sima, Jianheng Song, Huijie Wang, Wenhao Wang, Dafeng Wei, Chengen Xie, Guo Xu, Junchi Yan, Cunbiao Yang, Lei Yang, Shukai Yang, Maoqing Yao, Jia Zeng, Chi Zhang, Qinglin Zhang, Bin Zhao, Chengyue Zhao, Jiaqi Zhao, Jianchao Zhu
Introducing AgiBot World, a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, we achieve an order-of-magnitude increase in data scale compared to existing datasets.
1 code implementation • 7 Mar 2025 • Wenhao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Siheng Chen, Yanfeng Wang
FedMABench features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents across diverse environments.
1 code implementation • 3 Mar 2025 • Wenhao Wang, Yi Yang
The VideoUFO comprises over 1.09 million video clips, each paired with both a brief and a detailed caption (description).
no code implementations • 18 Feb 2025 • Wenhao Wang, Yanyan Li, Long Jiao, Jiawei Yuan
The integration of Large Language Models (LLMs) into robotic control, including drones, has the potential to revolutionize autonomous systems.
no code implementations • 17 Feb 2025 • Yunfei Wang, Shixuan Liu, Wenhao Wang, Changling Zhou, Chao Zhang, Jiandong Jin, Cheng Zhu
The integration of artificial intelligence into automated penetration testing (AutoPT) has highlighted the necessity of simulation modeling for the training of intelligent agents, due to its cost-efficiency and swift feedback capabilities.
no code implementations • 11 Feb 2025 • Wenhao Wang, Adam Dziedzic, Grace C. Kim, Michael Backes, Franziska Boenisch
Multi-modal models, such as CLIP, have demonstrated strong performance in aligning visual and textual representations, excelling in tasks like image retrieval and zero-shot classification.
no code implementations • 5 Feb 2025 • Wenhao Wang, Mengying Yuan, Zijie Yu, Guangyi Liu, Rui Ye, Tian Jin, Siheng Chen, Yanfeng Wang
Given the vast population of global mobile phone users, if automated data collection from them becomes feasible, the resulting data volume and the subsequently trained mobile agents could reach unprecedented levels.
no code implementations • 4 Jan 2025 • Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang
Subsequently, it is demonstrated that such a simple linear transformation can be generalized across different diffusion models.
no code implementations • 10 Nov 2024 • Xingguo Chen, Yu Gong, Shangdong Yang, Wenhao Wang
Fast-converging algorithms are a contemporary requirement in reinforcement learning.
no code implementations • 5 Nov 2024 • Wenhao Wang, Yi Yang
In this paper, we introduce TIP-I2V, the first large-scale dataset of over 1.70 million unique user-provided Text and Image Prompts specifically for Image-to-Video generation.
1 code implementation • 14 Oct 2024 • Yanjie Ze, Zixuan Chen, Wenhao Wang, Tianyi Chen, Xialin He, Ying Yuan, Xue Bin Peng, Jiajun Wu
However, autonomous manipulation by humanoid robots has largely been restricted to one specific scene, primarily due to the difficulty of acquiring generalizable skills and the expensiveness of in-the-wild humanoid robot data.
1 code implementation • 8 Oct 2024 • Wenhao Wang, Xiaoyu Liang, Rui Ye, Jingyi Chai, Siheng Chen, Yanfeng Wang
The success of large language models (LLMs) enables many parties to fine-tune LLMs on their own private data.
no code implementations • 30 Sep 2024 • Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang
Existing Image Copy Detection (ICD) models, though accurate in detecting hand-crafted replicas, overlook the challenge from diffusion models.
no code implementations • 27 Sep 2024 • Wenhao Wang, Adam Dziedzic, Michael Backes, Franziska Boenisch
Recent work on studying memorization in self-supervised learning (SSL) suggests that even though SSL encoders are trained on millions of images, they still memorize individual data points.
1 code implementation • 24 Sep 2024 • Chuyang Zhao, Yuxing Song, Wenhao Wang, Haocheng Feng, Errui Ding, Yifan Sun, Xinyan Xiao, Jingdong Wang
Most existing multimodality methods use separate backbones for autoregression-based discrete text generation and diffusion-based continuous visual generation, or the same backbone by discretizing the visual data to use autoregression for both text and visual generation.
1 code implementation • 7 Jul 2024 • Wenhao Wang, Yifan Sun, Zongxin Yang, Zhengdong Hu, Zhentao Tan, Yi Yang
In this survey, we provide the first comprehensive review of replication in visual diffusion models, marking a novel contribution to the field by systematically categorizing the existing studies into unveiling, understanding, and mitigating this phenomenon.
2 code implementations • 21 Apr 2024 • Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang
To accommodate the "seen $\rightarrow$ unseen" generalization scenario, we construct the first large-scale pattern dataset named AnyPattern, which has the largest number of tamper patterns ($90$ for training and $10$ for testing) among all the existing ones.
1 code implementation • 10 Mar 2024 • Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi, Guoqiang Xu, Yong Yu, Weinan Zhang
Among these works, many of them utilize in-context examples to achieve generalization without the need for fine-tuning, while few of them have considered the problem of how to select and effectively utilize these examples.
1 code implementation • 10 Mar 2024 • Wenhao Wang, Yi Yang
However, Sora, along with other text-to-video diffusion models, is highly reliant on prompts, and there is no publicly available dataset that features a study of text-to-video prompts.
3 code implementations • 10 Feb 2024 • Rui Ye, Wenhao Wang, Jingyi Chai, Dihan Li, Zexi Li, Yinda Xu, Yaxin Du, Yanfeng Wang, Siheng Chen
Trained on massive publicly available data, large language models (LLMs) have demonstrated tremendous success across various fields.
1 code implementation • 23 Jan 2024 • Shaoheng Fang, Rui Ye, Wenhao Wang, Zuhong Liu, Yuxiao Wang, Yafei Wang, Siheng Chen, Yanfeng Wang
In this paper, we introduce FedRSU, an innovative federated learning framework for self-supervised scene flow estimation.
1 code implementation • 19 Jan 2024 • Wenhao Wang, Muhammad Ahmad Kaleem, Adam Dziedzic, Michael Backes, Nicolas Papernot, Franziska Boenisch
Our definition compares the difference in alignment of representations for data points and their augmented views returned by both encoders that were trained on these data points and encoders that were not.
1 code implementation • CVPR 2024 • Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang
The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates.
no code implementations • 17 Nov 2023 • Wenhao Wang, Guyue Li, Zhiming Chu, Haobo Li, Daniele Faccio
Furthermore, we conducted comparative experiments to validate the superiority of combining image features and timing characteristics within PUPGUARD for enhancing resistance against puppet attacks.
2 code implementations • 20 Apr 2023 • Wenhao Wang, Yifan Sun, Yi Yang
Video Copy Detection (VCD) has been developed to identify instances of unauthorized or duplicated video content.
1 code implementation • NeurIPS 2023 • Wenhao Wang, Yifan Sun, Wei Li, Yi Yang
This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.
1 code implementation • 26 Jul 2022 • Wenhao Wang, Yifan Sun, Zongxin Yang, Yi Yang
While model ensemble is common, we show that combining the vision models and vision-language models brings particular benefits from their complementarity and is a key factor to our superiority.
1 code implementation • 24 May 2022 • Wenhao Wang, Yifan Sun, Yi Yang
Moreover, this paper further reveals a unique difficulty for solving the hard negative problem in ICD, i.e., there is a fundamental conflict between current metric learning and ICD.
no code implementations • 8 Feb 2022 • Zoë Papakipos, Giorgos Tolias, Tomas Jenicek, Ed Pizzi, Shuhei Yokoo, Wenhao Wang, Yifan Sun, Weipu Zhang, Yi Yang, Sanjay Addicam, Sergio Manuel Papadakis, Cristian Canton Ferrer, Ondrej Chum, Matthijs Douze
The 2021 Image Similarity Challenge introduced a dataset to serve as a new benchmark to evaluate recent image copy detection methods.
1 code implementation • 13 Nov 2021 • Wenhao Wang, Weipu Zhang, Yifan Sun, Yi Yang
In this paper, a bag of tricks and a strong baseline are proposed for image copy detection.
1 code implementation • 13 Nov 2021 • Wenhao Wang, Yifan Sun, Weipu Zhang, Yi Yang
In this paper, a data-driven and local-verification (D$^2$LV) approach is proposed to compete for Image Similarity Challenge: Matching Track at NeurIPS'21.
1 code implementation • ICCV 2021 • Fang Zhao, Wenhao Wang, Shengcai Liao, Ling Shao
While single-view 3D reconstruction has made significant progress benefiting from deep shape representations in recent years, garment reconstruction is still not solved well due to open surfaces, diverse topologies and complex geometric details.
1 code implementation • 24 Nov 2020 • Wenhao Wang, Shengcai Liao, Fang Zhao, Cuicui Kang, Ling Shao
In this way, human annotations are no longer required, and it is scalable to large and diverse real-world datasets.
Generalizable Person Re-identification
Unsupervised Domain Adaptation
3 code implementations • 15 Sep 2020 • Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Maitreya Suin, Kuldeep Purohit, A. N. Rajagopalan, Xiaochuan Li, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Abdul Muqeet, Jiwon Hwang, Subin Yang, JungHeum Kang, Sung-Ho Bae, Yongwoo Kim, Geun-Woo Jeon, Jun-Ho Choi, Jun-Hyuk Kim, Jong-Seok Lee, Steven Marty, Eric Marty, Dongliang Xiong, Siang Chen, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Haicheng Wang, Vineeth Bhaskara, Alex Levinshtein, Stavros Tsogkas, Allan Jepson, Xiangzhen Kong, Tongtong Zhao, Shanshan Zhao, Hrishikesh P. S, Densen Puthussery, Jiji C. V, Nan Nan, Shuai Liu, Jie Cai, Zibo Meng, Jiaming Ding, Chiu Man Ho, Xuehui Wang, Qiong Yan, Yuzhi Zhao, Long Chen, Jiangtao Zhang, Xiaotong Luo, Liang Chen, Yanyun Qu, Long Sun, Wenhao Wang, Zhenbing Liu, Rushi Lan, Rao Muhammad Umer, Christian Micheloni
This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results.
no code implementations • 11 Sep 2020 • Wenhao Wang, Jing Meng, Duan Chen, Wei Cong
Cross-border Electric Interconnection towards renewables is a promising solution for the electric sector under the UN 2030 sustainable development goals, and it is widely promoted in emerging economies.
1 code implementation • 11 Jun 2020 • Wenhao Wang, Fang Zhao, Shengcai Liao, Ling Shao
This paper proposes a novel light-weight module, the Attentive WaveBlock (AWB), which can be integrated into the dual networks of mutual learning to enhance the complementarity and further depress noise in the pseudo-labels.
Ranked #3 on Unsupervised Domain Adaptation on Duke to MSMT
no code implementations • 13 Mar 2020 • Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiao-Lin Xu, Yanzhi Wang
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
1 code implementation • 20 Feb 2020 • Wenhao Wang
Therefore, in order to enjoy the simplicity of anchor-free detectors and the accuracy of two-stage ones simultaneously, we propose some adaptations based on a detector, Center and Scale Prediction (CSP).
Ranked #1 on Pedestrian Detection on CityPersons (Bare MR^-2 metric, using extra training data)
2 code implementations • 4 Nov 2019 • Kai Zhang, Shuhang Gu, Radu Timofte, Zheng Hui, Xiumei Wang, Xinbo Gao, Dongliang Xiong, Shuai Liu, Ruipeng Gang, Nan Nan, Chenghua Li, Xueyi Zou, Ning Kang, Zhan Wang, Hang Xu, Chaofeng Wang, Zheng Li, Lin-Lin Wang, Jun Shi, Wenyu Sun, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Yazhe Niu, Peijin Zhuo, Xiangzhen Kong, Long Sun, Wenhao Wang
The challenge had 3 tracks.