Search Results for author: Hao Wen

Found 32 papers, 19 papers with code

Improving Skeleton-based Action Recognition with Interactive Object Information

1 code implementation • 9 Jan 2025 • Hao Wen, Ziqian Lu, Fengli Shen, Zhe-Ming Lu, Jialin Cui

We propose a new action recognition framework introducing object nodes to supplement absent interactive object information.

Action Recognition Data Augmentation +3

AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation

no code implementations • 24 Dec 2024 • Hao Wen, Shizuo Tian, Borislav Pavlov, Wenjie Du, Yixuan Li, Ge Chang, Shanhui Zhao, Jiacheng Liu, Yunxin Liu, Ya-Qin Zhang, Yuanchun Li

Inspired by the remarkable coding abilities of recent small language models (SLMs), we propose to convert the UI task automation problem to a code generation problem, which can be effectively solved by an on-device SLM and efficiently executed with an on-device code interpreter.

Code Generation

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

no code implementations • 19 Dec 2024 • Yiyu Zhuang, Jiaxi Lv, Hao Wen, Qing Shuai, Ailing Zeng, Hao Zhu, Shifeng Chen, Yujiu Yang, Xun Cao, Wei Liu

Creating a high-fidelity, animatable 3D full-body avatar from a single image is a challenging task due to the diverse appearance and poses of humans and the limited availability of high-quality training data.

MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking

1 code implementation • 24 Nov 2024 • Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang

Night unmanned aerial vehicle (UAV) tracking is impeded by the challenges of poor illumination, with previous daylight-optimized methods demonstrating suboptimal performance in low-light conditions, limiting the utility of UAV applications.

Image Enhancement Mamba +1

Crowd3D++: Robust Monocular Crowd Reconstruction with Upright Space

no code implementations • 9 Nov 2024 • Jing Huang, Hao Wen, Tianyi Zhou, Haozhe Lin, Yu-Kun Lai, Kun Li

This paper aims to reconstruct hundreds of people's 3D poses, shapes, and locations from a single image with unknown camera parameters.

Towards Underwater Camouflaged Object Tracking: Benchmark and Baselines

2 code implementations • 25 Sep 2024 • Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang

Based on the proposed dataset, this paper first comprehensively evaluates current advanced visual object tracking methods and SAM- and SAM2-based trackers in challenging underwater environments.

Object Video Segmentation +2

Look One and More: Distilling Hybrid Order Relational Knowledge for Cross-Resolution Image Recognition

no code implementations • 9 Sep 2024 • Shiming Ge, Kangkai Zhang, Haolin Liu, Yingying Hua, Shengwei Zhao, Xin Jin, Hao Wen

In spite of great success in many image recognition tasks achieved by recent deep models, directly applying them to recognize low-resolution images may suffer from low accuracy due to the missing of informative details during resolution degradation.

Face Recognition Image Classification +3

DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation

no code implementations • 21 Aug 2024 • Yaowen Bi, Yuteng Lian, Jie Cui, Jun Liu, Peijian Wang, Guanghui Li, Xuejun Chen, Jinglin Zhao, Hao Wen, Jing Zhang, Zhaoqi Zhang, Wenzhuo Song, Yang Sun, Weiwei Zhang, Mingchen Cai, Jian Dong, Guanxing Zhang

DTN introduces multiple diversified task-specific feature interaction methods and task-sensitive network in MTL networks, enabling the model to learn task-specific diversified feature interaction representations, which improves the efficiency of joint representation learning in a general setup.

Feature Importance Multi-Task Learning +2

Novel clustered federated learning based on local loss

1 code implementation • 12 Jul 2024 • Endong Gu, Yongxin Chen, Hao Wen, Xingju Cai, Deren Han

This paper proposes LCFL, a novel clustering metric for evaluating clients' data distributions in federated learning.

Clustering Federated Learning

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

1 code implementation • 5 Jun 2024 • Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng

However, training these two stages separately leads to significant data bias in the inference phase, thus affecting the quality of reconstructed results.

3D Generation 3D Reconstruction +2

WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark

1 code implementation • 30 May 2024 • Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang

Most existing trackers are tailored for open-air environments, leading to performance degradation when applied to UOT due to domain gaps.

Knowledge Distillation Object Tracking

Awesome Multi-modal Object Tracking

5 code implementations • 23 May 2024 • Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang

To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality.

Autonomous Driving Knowledge Distillation +5

Understanding Multimodal Deep Neural Networks: A Concept Selection View

no code implementations • 13 Apr 2024 • Chenming Shang, Hengyuan Zhang, Hao Wen, Yujiu Yang

The multimodal deep neural networks, represented by CLIP, have generated rich downstream applications owing to their excellent performance, thus making understanding the decision-making process of CLIP an essential research topic.

Decision Making

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

2 code implementations • 10 Jan 2024 • Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, Rui Kong, Yile Wang, Hanfei Geng, Jian Luan, Xuefeng Jin, Zilong Ye, Guanjing Xiong, Fan Zhang, Xiang Li, Mengwei Xu, Zhijun Li, Peng Li, Yang Liu, Ya-Qin Zhang, Yunxin Liu

Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.

Task Planning

Search Strategies for Self-driving Laboratories with Pending Experiments

no code implementations • 6 Dec 2023 • Hao Wen, Jakob Zeitler, Connor Rupnow

To minimize station downtime and maximize experimental throughput, it is practical to run experiments in asynchronous parallel, in which multiple experiments are being performed at once in different stages.

Bayesian Optimisation

Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints

no code implementations • 29 Aug 2023 • Wenxing Xu, Yuanchun Li, Jiacheng Liu, Yi Sun, Zhengyang Cao, Yixuan Li, Hao Wen, Yunxin Liu

Unlike cloud-based deep learning models that are often large and uniform, edge-deployed models usually demand customization for domain-specific tasks and resource-limited environments.

Image Classification object-detection +1

AutoDroid: LLM-powered Task Automation in Android

1 code implementation • 29 Aug 2023 • Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, Yunxin Liu

Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones.

Language Modelling

A Novel Ehanced Move Recognition Algorithm Based on Pre-trained Models with Positional Embeddings

no code implementations • 14 Aug 2023 • Hao Wen, Jie Wang, Xiaodong Qiao

The recognition of abstracts is crucial for effectively locating the content and clarifying the article.

Position

The MI-Motion Dataset and Benchmark for 3D Multi-Person Motion Prediction

1 code implementation • 23 Jun 2023 • Xiaogang Peng, Xiao Zhou, Yikai Luo, Hao Wen, Yu Ding, Zizhao Wu

We believe that the proposed MI-Motion benchmark dataset and baseline will facilitate future research in this area, ultimately leading to better understanding and modeling of multi-person interactions.

motion prediction Prediction

Learning Weakly Supervised Audio-Visual Violence Detection in Hyperbolic Space

1 code implementation • 30 May 2023 • Xiaogang Peng, Hao Wen, Yikai Luo, Xiao Zhou, Keyang Yu, Ping Yang, Zizhao Wu

To overcome this, we propose HyperVD, a novel framework that learns snippet embeddings in hyperbolic space to improve model discrimination.

Anomaly Detection In Surveillance Videos

DroidBot-GPT: GPT-powered UI Automation for Android

1 code implementation • 14 Apr 2023 • Hao Wen, Hongming Wang, Jiaxuan Liu, Yuanchun Li

Given a natural language description of a desired task, DroidBot-GPT can automatically generate and execute actions that navigate the app to complete the task.

Navigate

AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments

no code implementations • 13 Mar 2023 • Hao Wen, Yuanchun Li, Zunshuai Zhang, Shiqi Jiang, Xiaozhou Ye, Ye Ouyang, Ya-Qin Zhang, Yunxin Liu

Model elastification generates a high-quality search space of model architectures with the guidance of a developer-specified oracle model.

valid

Searching for Effective Neural Network Architectures for Heart Murmur Detection from Phonocardiogram

1 code implementation • 6 Mar 2023 • Hao Wen, Jingsu Kang

Aim: The George B. Moody PhysioNet Challenge 2022 raised problems of heart murmur detection and related abnormal cardiac function identification from phonocardiograms (PCGs).

Model Selection Multi-Task Learning

Crowd3D: Towards Hundreds of People Reconstruction from a Single Image

no code implementations • CVPR 2023 • Hao Wen, Jing Huang, Huili Cui, Haozhe Lin, Yukun Lai, Lu Fang, Kun Li

However, existing methods cannot deal with large scenes containing hundreds of people, which encounter the challenges of large number of people, large variations in human scale, and complex spatial distribution.

Differentiable Particle Filters through Conditional Normalizing Flow

1 code implementation • 1 Jul 2021 • Xiongjie Chen, Hao Wen, Yunpeng Li

Differentiable particle filters provide a flexible mechanism to adaptively train dynamic and measurement models by learning from observed data.

Visual Tracking

End-To-End Semi-supervised Learning for Differentiable Particle Filters

1 code implementation • 11 Nov 2020 • Hao Wen, Xiongjie Chen, Georgios Papagiannis, Conghui Hu, Yunpeng Li

Recent advances in incorporating neural networks into particle filters provide the desired flexibility to apply particle filters in large-scale real-world applications.

A fuzzy control scheme for deployment of space tethered system with tension constraint

no code implementations • Aerospace Science and Technology 2020 • Shidong Xu, Hao Wen, Zheng Huang, Dongping Jin

In the meantime, the asymmetric constraint on tether tension is handled in controller design and stability analysis such that tether tension can be kept within a prescribed positive range during the deployment.
