1 code implementation • 27 Feb 2025 • Jiacheng Ye, Zhenyu Wu, Jiahui Gao, Zhiyong Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong
Furthermore, DiffuSearch demonstrates a notable 30% enhancement in puzzle-solving abilities compared to explicit search-based policies, along with a significant 540 Elo increase in game-playing strength assessment.
no code implementations • 13 Feb 2025 • Xingyu Qi, He Li, Linjie Li, Zhenyu Wu
To address these gaps, this paper introduces the EmoAssist Benchmark, a comprehensive benchmark designed to evaluate the assistive performance of LMMs for the VI community.
1 code implementation • 23 Jan 2025 • Zhaoxuan Tan, Zinan Zeng, Qingkai Zeng, Zhenyu Wu, Zheyuan Liu, Fengran Mo, Meng Jiang
To address this, we introduce PerRecBench, disassociating the evaluation from these two factors and assessing recommendation techniques on capturing the personal preferences in a grouped ranking manner.
1 code implementation • 29 Dec 2024 • Xiaona Sun, Zhenyu Wu, ZhiQiang Zhan, Yang Ji
Thus, we propose contrastive conditional alignment based on label shift calibration (CCA-LSC) for IDA, to address both covariate shift and label shift.
no code implementations • 27 Dec 2024 • Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu
To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process.
1 code implementation • 30 Oct 2024 • Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao
Existing efforts in building GUI agents heavily rely on the availability of robust commercial Vision-Language Models (VLMs) such as GPT-4o and GeminiProVision.
Ranked #3 on Natural Language Visual Grounding on ScreenSpot
no code implementations • 16 Oct 2024 • Zhenyu Wu, Qingkai Zeng, Zhihan Zhang, Zhaoxuan Tan, Chao Shen, Meng Jiang
Best-of-N decoding methods instruct large language models (LLMs) to generate multiple solutions, score each using a scoring function, and select the highest scored as the final answer to mathematical reasoning problems.
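The Best-of-N procedure described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generate` and `score` are hypothetical callables standing in for an LLM sampler and a scoring function, and `n` is an assumed parameter name.

```python
from typing import Callable, List


def best_of_n(
    problem: str,
    generate: Callable[[str], str],       # hypothetical: samples one solution
    score: Callable[[str, str], float],   # hypothetical: rates a solution
    n: int = 4,
) -> str:
    """Sample n candidate solutions and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Step 1: generate multiple candidate solutions for the same problem.
    candidates: List[str] = [generate(problem) for _ in range(n)]
    # Steps 2 and 3: score each candidate and keep the best.
    # max() returns the first candidate in case of a tie.
    return max(candidates, key=lambda solution: score(problem, solution))
```

The quality of the selected answer depends entirely on the scoring function, since a candidate that is never sampled or is scored poorly cannot be chosen.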
no code implementations • 10 Oct 2024 • Hang Yin, Xiuwei Xu, Zhenyu Wu, Jie Zhou, Jiwen Lu
Existing zero-shot object navigation methods prompt LLMs with the text of spatially close objects, which lacks enough scene context for in-depth reasoning.
1 code implementation • 17 Aug 2024 • Qingkai Zeng, Yuyang Bai, Zhaoxuan Tan, Zhenyu Wu, Shangbin Feng, Meng Jiang
Taxonomies play a crucial role in various applications by providing a structural representation of knowledge.
no code implementations • 17 Jun 2024 • Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan
For the task planner, we generate the feasible step-by-step plans for human goal accomplishment according to the task completion process and the known visual clues.
no code implementations • 23 May 2024 • Zhenyu Wu, Qingkai Zeng, Zhihan Zhang, Zhaoxuan Tan, Chao Shen, Meng Jiang
The condition can be an entity in an open-domain question or a numeric value in a math question, which requires minimal effort (via prompting) to identify.
1 code implementation • 19 Mar 2024 • Zhenyu Wu, Chao Shen, Meng Jiang
Lastly, it instructs the LLMs with the verification on relevant and irrelevant conditions to avoid confusion and improve reasoning paths.
no code implementations • 13 Mar 2024 • Zhuoxin Chen, Zhenyu Wu, Yang Ji
In the second stage, DFL-FS employs federated feature regeneration based on global feature statistics and utilizes resampling and weighted covariance to calibrate the global classifier to enhance the model's adaptability to long-tailed data distributions.
no code implementations • 8 Feb 2024 • Linjie Li, Zhenyu Wu, Jiaming Liu, Yang Ji
Existing methods mainly focus on preserving representative samples from previous classes to combat catastrophic forgetting.
no code implementations • 22 Jan 2024 • Zhenyu Wu, Fengmao Lv, Chenglizhao Chen, Aimin Hao, Shuo Li
Colorectal polyp segmentation (CPS), an essential problem in medical image analysis, has garnered growing research attention.
1 code implementation • CVPR 2024 • Guohao Peng, Heshan Li, Yangyang Zhao, Jun Zhang, Zhenyu Wu, Pengyu Zheng, Danwei Wang
To validate TransLoc4D, we construct two datasets and set up benchmarks for 4D radar place recognition.
1 code implementation • 11 Dec 2023 • Zhenyu Wu, Meng Jiang, Chao Shen
Given an initial answer from CoT, PRP iterates a verify-then-rectify process to progressively identify incorrect answers and rectify the reasoning paths.
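A verify-then-rectify loop of the kind described above might look like the following. This is only a sketch under stated assumptions: `verify` and `rectify` are hypothetical callables (in practice they would be LLM prompts), and `max_rounds` and the list of rejected answers are illustrative choices, not details taken from the paper.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")


def verify_then_rectify(
    question: str,
    initial_answer: A,
    verify: Callable[[str, A], bool],      # hypothetical: is the answer acceptable?
    rectify: Callable[[str, List[A]], A],  # hypothetical: new answer avoiding rejected ones
    max_rounds: int = 3,
) -> A:
    """Iteratively check an answer and replace it when verification fails."""
    answer = initial_answer
    rejected: List[A] = []
    for _ in range(max_rounds):
        if verify(question, answer):
            break  # the current answer passed verification
        # Record the failed answer so the next attempt can avoid it.
        rejected.append(answer)
        answer = rectify(question, rejected)
    # If the rounds run out, the last (unverified) answer is returned.
    return answer
```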
no code implementations • 9 Oct 2023 • Zhenyu Wu, Xiuwei Xu, Ziwei Wang, Chong Xia, Linqing Zhao, Jiwen Lu, Haibin Yan
Existing methods only consider fixed frames of input data for a single detector, such as monocular RGB-D images or point clouds reconstructed from dense multi-view RGB-D images.
1 code implementation • ICCV 2023 • Yuanhao Zhai, Ziyi Liu, Zhenyu Wu, Yi Wu, Chunluan Zhou, David Doermann, Junsong Yuan, Gang Hua
The former prevents the decoder from reconstructing the video background given video features, and thus helps reduce the background information in feature learning.
1 code implementation • 2 Sep 2023 • Jun Zhang, Huayang Zhuge, Yiyao Liu, Guohao Peng, Zhenyu Wu, Haoyuan Zhang, Qiyang Lyu, Heshan Li, Chunyang Zhao, Dogan Kircali, Sanat Mharolkar, Xun Yang, Su Yi, Yuanzhe Wang, Danwei Wang
5) Considered both middle- and large-scale outdoor environments, i.e., the 6 trajectories range from 246 m to 6.95 km.
no code implementations • 21 Aug 2023 • Xiaona Sun, Zhenyu Wu, Yichen Liu, Saier Hu, ZhiQiang Zhan, Yang Ji
Unsupervised Domain Adaptation (UDA) approaches address the covariate shift problem by minimizing the distribution discrepancy between the source and target domains, assuming that the label distribution is invariant across domains.
no code implementations • 16 Aug 2023 • Jialin Guo, Zhenyu Wu, ZhiQiang Zhan, Yang Ji
Moreover, we noticed that the traditional calibration evaluation metric, Expected Calibration Error (ECE), gives a higher weight to low-confidence samples in the minority classes, which leads to inaccurate evaluation of model calibration.
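For reference, the standard calibration metric mentioned above can be computed as follows. This sketch uses the common equal-width binning definition, ECE = Σ_b (n_b / N) · |acc_b − conf_b|; it is not the alternative the paper proposes, and the function name and `n_bins` default are assumptions made for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Standard ECE with equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    conf_sum = [0.0] * n_bins
    hits = [0] * n_bins
    count = [0] * n_bins
    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        hits[b] += int(ok)
        count[b] += 1
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        accuracy = hits[b] / count[b]
        mean_conf = conf_sum[b] / count[b]
        # Each bin is weighted by the fraction of samples it holds.
        ece += (count[b] / n) * abs(accuracy - mean_conf)
    return ece
```

Because each bin is weighted only by its sample count, the metric says nothing about which classes those samples come from.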
1 code implementation • 4 Jul 2023 • Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan
Equipping embodied agents with commonsense is important for robots to successfully complete complex human instructions in general environments.
1 code implementation • International Conference on Robotics and Automation (ICRA) 2023 • Jun Zhang, Huayang Zhuge, Zhenyu Wu, Guohao Peng, Mingxing Wen, Yiyao Liu, Danwei Wang
LiDAR-based SLAM may easily fail in adverse weather (e.g., rain, snow, smoke, fog), while mmWave Radar remains unaffected.
no code implementations • 26 Apr 2023 • Fu Chen, Junkang Zou, Lingfeng Zhou, Zekai Xu, Zhenyu Wu
In this article, we examine how recommender systems are implemented, how they work, and the algorithms they use.
3 code implementations • 6 Mar 2023 • Zhenyu Wu, Yaoxiang Wang, Jiacheng Ye, Jiangtao Feng, Jingjing Xu, Yu Qiao, Zhiyong Wu
However, the implementation of ICL is sophisticated due to the diverse retrieval and inference methods involved, as well as the varying pre-processing requirements for different models, datasets, and tasks.
no code implementations • 23 Feb 2023 • Zhenyu Wu, Ziwei Wang, Jiwen Lu, Haibin Yan
Then we fuse the feature maps representing the visual information of multi-view RGB images and the pixel affinity learned from the clutter point cloud, where the acquired instance segmentation masks of multi-view RGB images are projected to partition the clutter point cloud.
no code implementations • 13 Dec 2022 • Zhenyu Wu, Lin Wang, Wei Wang, Qing Xia, Chenglizhao Chen, Aimin Hao, Shuo Li
This paper attempts to answer this unexplored question by proving a hypothesis: there is a point-labeled dataset such that saliency models trained on it can achieve performance equivalent to that of models trained on the densely annotated dataset.
1 code implementation • 25 Oct 2022 • Zhenyu Wu, Shuai Li, Chenglizhao Chen, Hong Qin, Aimin Hao
First, instead of using the vanilla convolution with fixed kernel sizes for the encoder design, we propose the dynamic pyramid convolution (DPConv), which dynamically selects the best-suited kernel sizes w.r.t.
1 code implementation • 25 Oct 2022 • Zhenyu Wu, Lin Wang, Wei Wang, Tengfei Shi, Chenglizhao Chen, Aimin Hao, Shuo Li
In this paper, we propose a novel yet effective method for SOD, coined SODGAN, which can generate infinite high-quality image-mask pairs requiring only a few labeled data, and these synthesized pairs can replace the human-labeled DUTS-TR to train any off-the-shelf SOD model.
no code implementations • 13 Jun 2022 • Priya Narayanan, Xin Hu, Zhenyu Wu, Matthew D Thielke, John G Rogers, Andre V Harrison, John A D'Agostino, James D Brown, Long P Quang, James R Uplinger, Heesung Kwon, Zhangyang Wang
The full dataset presented in this paper, including the ground truth object classification bounding boxes and haze density measurements, is provided for the community to evaluate their algorithms at: https://a2i2-archangel.vision.
1 code implementation • 5 Jun 2022 • Zhenyu Hu, Zhenyu Wu, Pengcheng Pi, Yunhe Xue, Jiayi Shen, Jianchao Tan, Xiangru Lian, Zhangyang Wang, Ji Liu
Unmanned Aerial Vehicle (UAV)-based video text spotting has been extensively used in civil and military domains.
1 code implementation • 9 Apr 2022 • Xin Hu, Zhenyu Wu, Hao-Yu Miao, Siqi Fan, Taiyu Long, Zhenyu Hu, Pengcheng Pi, Yi Wu, Zhou Ren, Zhangyang Wang, Gang Hua
Video action detection (spatio-temporal action localization) is usually the starting point for human-centric intelligent analysis of videos nowadays.
1 code implementation • 26 Nov 2021 • Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua
However, such pretrained models are not ideal for downstream detection, due to the disparity between the pretraining and the downstream fine-tuning tasks.
Ranked #3 on Action Detection on Charades
no code implementations • 19 Aug 2021 • Guohao Peng, Yufeng Yue, Jun Zhang, Zhenyu Wu, Xiaoyu Tang, Danwei Wang
(2) By exploiting the interpretability of the local weighting scheme, a semantic constrained initialization is proposed so that the local attention can be reinforced by semantic priors.
1 code implementation • 23 Jul 2021 • Zhenyu Wu, Zhaowen Wang, Ye Yuan, Jianming Zhang, Zhangyang Wang, Hailin Jin
Existing diversity tests of samples from GANs are usually conducted qualitatively on a small scale, and/or depend on access to the original training data as well as the trained model parameters.
no code implementations • 6 Oct 2020 • Yuli Zheng, Zhenyu Wu, Ye Yuan, Tianlong Chen, Zhangyang Wang
While machine learning is increasingly used in this field, the resulting large-scale collection of user private information has reinvigorated the privacy debate, considering dozens of data breach incidents every year caused by unauthorized hackers, and (potentially even more) information misuse/abuse by authorized parties.
1 code implementation • 2 Oct 2020 • Zhenyu Wu, Duc Hoang, Shih-Yao Lin, Yusheng Xie, Liangjian Chen, Yen-Yu Lin, Zhangyang Wang, Wei Fan
Estimating the 3D hand pose from a monocular RGB image is important but challenging.
no code implementations • 25 Sep 2019 • Zhenyu Wu, Ye Yuan, Zhaowen Wang, Jianming Zhang, Zhangyang Wang, Hailin Jin
Generative adversarial networks (GANs) nowadays are capable of producing images of incredible realism.