1 code implementation • 9 Jan 2025 • Hao Wen, Ziqian Lu, Fengli Shen, Zhe-Ming Lu, Jialin Cui
We propose a new action recognition framework introducing object nodes to supplement absent interactive object information.
no code implementations • 24 Dec 2024 • Hao Wen, Shizuo Tian, Borislav Pavlov, Wenjie Du, Yixuan Li, Ge Chang, Shanhui Zhao, Jiacheng Liu, Yunxin Liu, Ya-Qin Zhang, Yuanchun Li
Inspired by the remarkable coding abilities of recent small language models (SLMs), we propose to convert the UI task automation problem to a code generation problem, which can be effectively solved by an on-device SLM and efficiently executed with an on-device code interpreter.
no code implementations • 19 Dec 2024 • Yiyu Zhuang, Jiaxi Lv, Hao Wen, Qing Shuai, Ailing Zeng, Hao Zhu, Shifeng Chen, Yujiu Yang, Xun Cao, Wei Liu
Creating a high-fidelity, animatable 3D full-body avatar from a single image is a challenging task due to the diverse appearance and poses of humans and the limited availability of high-quality training data.
1 code implementation • 13 Dec 2024 • Jiacheng Liu, Yuanchun Li, Liangyan Li, Yi Sun, Hao Wen, Xiangyu Li, Yao Guo, Yunxin Liu
Many applications demand context sensing to offer personalized and timely services.
1 code implementation • 24 Nov 2024 • Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang
Night unmanned aerial vehicle (UAV) tracking is impeded by the challenges of poor illumination, with previous daylight-optimized methods demonstrating suboptimal performance in low-light conditions, limiting the utility of UAV applications.
no code implementations • 9 Nov 2024 • Jing Huang, Hao Wen, Tianyi Zhou, Haozhe Lin, Yu-Kun Lai, Kun Li
This paper aims to reconstruct hundreds of people's 3D poses, shapes, and locations from a single image with unknown camera parameters.
2 code implementations • 25 Sep 2024 • Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang
Based on the proposed dataset, this paper first comprehensively evaluates current advanced visual object tracking methods and SAM- and SAM2-based trackers in challenging underwater environments.
no code implementations • 9 Sep 2024 • Shiming Ge, Kangkai Zhang, Haolin Liu, Yingying Hua, Shengwei Zhao, Xin Jin, Hao Wen
In spite of great success in many image recognition tasks achieved by recent deep models, directly applying them to recognize low-resolution images may suffer from low accuracy due to the loss of informative details during resolution degradation.
no code implementations • 21 Aug 2024 • Yaowen Bi, Yuteng Lian, Jie Cui, Jun Liu, Peijian Wang, Guanghui Li, Xuejun Chen, Jinglin Zhao, Hao Wen, Jing Zhang, Zhaoqi Zhang, Wenzhuo Song, Yang Sun, Weiwei Zhang, Mingchen Cai, Jian Dong, Guanxing Zhang
DTN introduces multiple diversified task-specific feature interaction methods and a task-sensitive network into MTL networks, enabling the model to learn task-specific diversified feature interaction representations, which improves the efficiency of joint representation learning in a general setup.
1 code implementation • 12 Jul 2024 • Endong Gu, Yongxin Chen, Hao Wen, Xingju Cai, Deren Han
This paper proposes LCFL, a novel clustering metric for evaluating clients' data distributions in federated learning.
1 code implementation • 5 Jun 2024 • Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng
However, training these two stages separately leads to significant data bias in the inference phase, thus affecting the quality of reconstructed results.
1 code implementation • 30 May 2024 • Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang
Most existing trackers are tailored for open-air environments, leading to performance degradation when applied to UOT due to domain gaps.
5 code implementations • 23 May 2024 • Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang
To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality.
no code implementations • 13 Apr 2024 • Chenming Shang, Hengyuan Zhang, Hao Wen, Yujiu Yang
The multimodal deep neural networks, represented by CLIP, have generated rich downstream applications owing to their excellent performance, thus making understanding the decision-making process of CLIP an essential research topic.
2 code implementations • 10 Jan 2024 • Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, Rui Kong, Yile Wang, Hanfei Geng, Jian Luan, Xuefeng Jin, Zilong Ye, Guanjing Xiong, Fan Zhang, Xiang Li, Mengwei Xu, Zhijun Li, Peng Li, Yang Liu, Ya-Qin Zhang, Yunxin Liu
Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.
1 code implementation • CVPR 2024 • Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng
Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image.
no code implementations • 6 Dec 2023 • Hao Wen, Jakob Zeitler, Connor Rupnow
To minimize station downtime and maximize experimental throughput, it is practical to run experiments in asynchronous parallel, in which multiple experiments are being performed at once in different stages.
no code implementations • 29 Aug 2023 • Wenxing Xu, Yuanchun Li, Jiacheng Liu, Yi Sun, Zhengyang Cao, Yixuan Li, Hao Wen, Yunxin Liu
Unlike cloud-based deep learning models that are often large and uniform, edge-deployed models usually demand customization for domain-specific tasks and resource-limited environments.
1 code implementation • 29 Aug 2023 • Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, Yunxin Liu
Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones.
1 code implementation • ICCV 2023 • Zhiqiang Shen, Xiaoxiao Sheng, Hehe Fan, Longguang Wang, Yulan Guo, Qiong Liu, Hao Wen, Xi Zhou
In this paper, we propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations.
no code implementations • 14 Aug 2023 • Hao Wen, Jie Wang, Xiaodong Qiao
The recognition of abstracts is crucial for effectively locating the content and clarifying the article.
1 code implementation • 23 Jun 2023 • Xiaogang Peng, Xiao Zhou, Yikai Luo, Hao Wen, Yu Ding, Zizhao Wu
We believe that the proposed MI-Motion benchmark dataset and baseline will facilitate future research in this area, ultimately leading to better understanding and modeling of multi-person interactions.
1 code implementation • 30 May 2023 • Xiaogang Peng, Hao Wen, Yikai Luo, Xiao Zhou, Keyang Yu, Ping Yang, Zizhao Wu
To overcome this, we propose HyperVD, a novel framework that learns snippet embeddings in hyperbolic space to improve model discrimination.
1 code implementation • 14 Apr 2023 • Hao Wen, Hongming Wang, Jiaxuan Liu, Yuanchun Li
Given a natural language description of a desired task, DroidBot-GPT can automatically generate and execute actions that navigate the app to complete the task.
no code implementations • 13 Mar 2023 • Hao Wen, Yuanchun Li, Zunshuai Zhang, Shiqi Jiang, Xiaozhou Ye, Ye Ouyang, Ya-Qin Zhang, Yunxin Liu
Model elastification generates a high-quality search space of model architectures with the guidance of a developer-specified oracle model.
1 code implementation • 6 Mar 2023 • Hao Wen, Jingsu Kang
Aim: The George B. Moody PhysioNet Challenge 2022 raised problems of heart murmur detection and related abnormal cardiac function identification from phonocardiograms (PCGs).
no code implementations • CVPR 2023 • Hao Wen, Jing Huang, Huili Cui, Haozhe Lin, Yukun Lai, Lu Fang, Kun Li
However, existing methods cannot deal with large scenes containing hundreds of people, which pose the challenges of a large number of people, large variations in human scale, and complex spatial distribution.
no code implementations • 9 Dec 2022 • Xinzhe Ni, Yong Liu, Hao Wen, Yatai Ji, Jing Xiao, Yujiu Yang
Then in the visual flow, visual prototypes are computed by a visual prototype-computed module.
1 code implementation • 30 Jul 2022 • Hao Wen, Yunze Liu, Jingwei Huang, Bo Duan, Li Yi
This paper proposes a 4D backbone for long-term point cloud video understanding.
1 code implementation • 1 Jul 2021 • Xiongjie Chen, Hao Wen, Yunpeng Li
Differentiable particle filters provide a flexible mechanism to adaptively train dynamic and measurement models by learning from observed data.
1 code implementation • 11 Nov 2020 • Hao Wen, Xiongjie Chen, Georgios Papagiannis, Conghui Hu, Yunpeng Li
Recent advances in incorporating neural networks into particle filters provide the desired flexibility to apply particle filters in large-scale real-world applications.
no code implementations • Aerospace Science and Technology 2020 • Shidong Xu, Hao Wen, Zheng Huang, Dongping Jin
In the meantime, the asymmetric constraint on tether tension is handled in controller design and stability analysis such that tether tension can be kept within a prescribed positive range during the deployment.