no code implementations • 17 Jun 2025 • Ziqiao Peng, Wentao Hu, Junyuan Ma, Xiangyu Zhu, Xiaomei Zhang, Hao Zhao, Hui Tian, Jun He, Hongyan Liu, Zhaoxin Fan
A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses.
no code implementations • 7 Jun 2025 • Shiying Duan, Pei Ren, Nanxiang Jiang, Zhengping Che, Jian Tang, Yifan Sun, Zhaoxin Fan, Wenjun Wu
Dual-arm robots play a crucial role in improving efficiency and flexibility in complex multitasking scenarios.
no code implementations • 26 May 2025 • Ming Meng, Qi Dong, Jiajie Li, Zhe Zhu, Xingyu Wang, Zhaoxin Fan, Wei Zhao, Wenjun Wu
Virtual try-on technology has become increasingly important in the fashion and retail industries, enabling the generation of high-fidelity garment images that adapt seamlessly to target human models.
no code implementations • CVPR 2025 • Ziqiao Peng, Yanbo Fan, HaoYu Wu, Xuan Wang, Hongyan Liu, Jun He, Zhaoxin Fan
To address this issue, we propose a new task -- multi-round dual-speaker interaction for 3D talking head generation -- which requires models to handle and generate both speaking and listening behaviors in continuous conversation.
no code implementations • 22 May 2025 • Xiaobei Yan, Yiming Li, Zhaoxin Fan, Han Qiu, Tianwei Zhang
Large language models (LLMs) have shown impressive capabilities across a wide range of applications, but their ever-increasing size and resource demands make them vulnerable to inference cost attacks, where attackers induce victim LLMs to generate the longest possible output content.
no code implementations • 21 May 2025 • Tianbao Zhang, Jian Zhao, Yuer Li, Zheng Zhu, Ping Hu, Zhaoxin Fan, Wenjun Wu, Xuelong Li
Whole-body audio-driven avatar pose and expression generation is a critical task for creating lifelike digital humans and enhancing the capabilities of interactive virtual agents, with wide-ranging applications in virtual reality, digital entertainment, and remote communication.
no code implementations • 19 May 2025 • Yuanze Hu, Zhaoxin Fan, Xinyu Wang, Gen Li, Ye Qiu, Zhichao Yang, Wenjun Wu, Kejian Wu, Yifan Sun, Xiaotie Deng, Jin Dong
Our work thus offers a practical pathway for developing more capable lightweight VLMs while introducing a fresh theoretical lens to better understand and address alignment bottlenecks in constrained multimodal systems.
no code implementations • 17 May 2025 • Zhiying Li, GuangGang Geng, Yeying Jin, Zhizhi Guo, Bruce Gu, Jidong Huo, Zhaoxin Fan, Wenjun Wu
These findings underscore the urgent need to address and mitigate security risks associated with digital human generation systems.
no code implementations • 22 Apr 2025 • Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Shicheng Xu, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Kai Wang, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Xiaolong Jin, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Liang Pang, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, XiaoFeng Wang, DaCheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu
Currently, existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment or fine-tuning phase, and thus lack a comprehensive understanding of the entire "lifechain" of LLMs.
1 code implementation • 28 Mar 2025 • Xiaomin Yu, Pengxiang Ding, Wenjie Zhang, Siteng Huang, Songyang Gao, Chengwei Qin, Kejian Wu, Zhaoxin Fan, Ziyue Qiao, Donglin Wang
By eliminating the dependency on real images while maintaining data quality and diversity, our framework offers a cost-effective and scalable solution for VLM training.
no code implementations • 27 Mar 2025 • Yongxu Wang, Xu Cao, Weiyun Yi, Zhaoxin Fan
Simultaneous Localization and Mapping (SLAM) is a critical task in robotics, enabling systems to autonomously navigate and understand complex environments.
1 code implementation • 12 Mar 2025 • Jihao Zhao, Zhiyuan Ji, Zhaoxin Fan, Hanyu Wang, Simin Niu, Bo Tang, Feiyu Xiong, Zhiyu Li
Retrieval-Augmented Generation (RAG), while serving as a viable complement to large language models (LLMs), often overlooks the crucial aspect of text chunking within its pipeline.
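As a point of reference for the chunking step this entry targets, the sketch below shows the naive fixed-size, overlapping chunker that RAG pipelines commonly default to; it is a generic illustration rather than the paper's method, and the chunk_size and overlap values are arbitrary.

```python
# Minimal illustration (not the paper's method): a naive fixed-size,
# overlapping chunker often used as the default in RAG pipelines.
from typing import List

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into overlapping character windows."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks

print(len(chunk_text("some long document " * 200)))
```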
no code implementations • 9 Mar 2025 • Xukun Zhou, Fengxin Li, Ming Chen, Yan Zhou, Pengfei Wan, Di Zhang, Hongyan Liu, Jun He, Zhaoxin Fan
Audio-driven human gesture synthesis is a crucial task with broad applications in virtual avatars, human-computer interaction, and creative content generation.
no code implementations • 19 Feb 2025 • Feiyuan Zhang, Dezhi Zhu, James Ming, Yilun Jin, Di Chai, Liu Yang, Han Tian, Zhaoxin Fan, Kai Chen
Retrieval-Augmented Generation (RAG) systems have shown substantial benefits in applications such as question answering and multi-turn dialogue (Lewis et al., 2020).
1 code implementation • 15 Feb 2025 • Ming Meng, Ke Mu, Yonggui Zhu, Zhe Zhu, Haoyu Sun, Heyang Yan, Zhaoxin Fan
Generating expressive and diverse human gestures from audio is crucial in fields like human-computer interaction, virtual reality, and animation.
1 code implementation • 26 Jan 2025 • Xingjian Zhang, Xi Weng, Yihao Yue, Zhaoxin Fan, Wenjun Wu, Lei Huang
We present TinyLLaVA-Video, a video understanding model with no more than 4B parameters that processes video sequences in a simple manner, without complex architectures, and supports both fps sampling and uniform frame sampling.
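For context on the two sampling strategies named above, the following minimal sketch (not TinyLLaVA-Video's actual code) contrasts uniform frame sampling with fps-based sampling; the function names and parameters are placeholders.

```python
# Minimal sketch of the two frame-selection strategies, using numpy only.
import numpy as np

def uniform_sample(num_frames: int, num_samples: int) -> np.ndarray:
    """Pick `num_samples` indices evenly spread over the whole video."""
    return np.linspace(0, num_frames - 1, num_samples).round().astype(int)

def fps_sample(num_frames: int, video_fps: float, target_fps: float) -> np.ndarray:
    """Pick indices at a fixed temporal rate (e.g. 1 frame per second)."""
    step = max(int(round(video_fps / target_fps)), 1)
    return np.arange(0, num_frames, step)

print(uniform_sample(300, 16))                        # 16 frames over a 300-frame clip
print(fps_sample(300, video_fps=30, target_fps=1))    # roughly 1 frame per second
```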
no code implementations • CVPR 2025 • Yifan Wang, Jian Zhao, Zhaoxin Fan, Xin Zhang, Xuecheng Wu, Yudian Zhang, Lei Jin, Xinyue Li, Gang Wang, Mengxi Jia, Ping Hu, Zheng Zhu, Xuelong Li
To benchmark this task, we introduce the TDUAV dataset, the largest dataset for joint UAV tracking and intent understanding, featuring 1,328 challenging video sequences, over 163K annotated thermal frames, and 3K VQA pairs.
1 code implementation • 29 Dec 2024 • Daiheng Gao, Shilin Lu, Shaw Walters, Wenbo Zhou, Jiaming Chu, Jie Zhang, Bang Zhang, Mengxi Jia, Jian Zhao, Zhaoxin Fan, Weiming Zhang
Removing unwanted concepts from large-scale text-to-image (T2I) diffusion models while maintaining their overall generative quality remains an open challenge.
no code implementations • CVPR 2025 • Shuo Wang, Wanting Li, Yongcai Wang, Zhaoxin Fan, Zhe Huang, Xudong Cai, Jian Zhao, Deying Li
To address this challenge, this paper proposes MambaVO, which conducts robust initialization, Mamba-based sequential matching refinement, and smoothed training to enhance the matching quality and improve the pose estimation in deep visual odometry.
no code implementations • 27 Dec 2024 • Xudong Cai, Yongcai Wang, Zhaoxin Fan, Deng Haoran, Shuo Wang, Wanting Li, Deying Li, Lun Luo, Minhang Wang, Jintao Xu
To refine the 3D model at novel viewpoints, we propose a Confidence Aware Depth Alignment (CADA) module that refines the coarse depth maps by aligning their confident parts with depths estimated by a monocular depth model.
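As a rough illustration of what alignment against confident regions can look like, the snippet below fits a scale and shift that maps a monocular depth prediction onto the confident pixels of a coarse depth map; this is a generic least-squares sketch, not the actual CADA module, and the affine model and confidence threshold are assumptions.

```python
# Generic illustration: align a monocular depth prediction to a coarse depth
# map by solving a least-squares scale/shift over confident pixels only.
import numpy as np

def align_mono_depth(mono: np.ndarray, coarse: np.ndarray,
                     conf: np.ndarray, thresh: float = 0.8) -> np.ndarray:
    mask = conf > thresh                          # keep only confident pixels
    A = np.stack([mono[mask], np.ones(mask.sum())], axis=1)
    scale, shift = np.linalg.lstsq(A, coarse[mask], rcond=None)[0]
    return scale * mono + shift                   # aligned dense depth

mono = np.random.rand(64, 64) * 5
coarse = 2.0 * mono + 0.5 + np.random.randn(64, 64) * 0.01
conf = np.random.rand(64, 64)
print(align_mono_depth(mono, coarse, conf).shape)
```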
no code implementations • 12 Dec 2024 • Bofang Jia, Pengxiang Ding, Can Cui, Mingyang Sun, Pengfang Qian, Siteng Huang, Zhaoxin Fan, Donglin Wang
Visual-motor policy learning has advanced with architectures like diffusion-based policies, known for modeling complex robotic trajectories.
no code implementations • 10 Dec 2024 • Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong
To fill this gap, we first examine the current 'gold standard' in Machine Unlearning (MU), i.e., re-training the model after removing the undesirable training data, and find that it does not work in SGMs.
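For readers unfamiliar with this baseline, the toy example below shows the 'gold standard' procedure in its simplest form: drop the samples to be forgotten and re-train from scratch on the rest (illustrated here with a least-squares model, not a score-based generative model).

```python
# Minimal illustration of exact unlearning by re-training on retained data.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
forget = np.zeros(100, dtype=bool)
forget[:10] = True                        # samples requested to be unlearned

def train(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

w_full = train(X, y)                      # original model
w_retrain = train(X[~forget], y[~forget]) # re-trained without the forget set
print(np.abs(w_full - w_retrain).max())
```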
no code implementations • 9 Dec 2024 • Zhefei Gong, Pengxiang Ding, Shangke Lyu, Siteng Huang, Mingyang Sun, Wei Zhao, Zhaoxin Fan, Donglin Wang
In this paper, we introduce Coarse-to-Fine AutoRegressive Policy (CARP), a novel paradigm for visuomotor policy learning that redefines the autoregressive action generation process as a coarse-to-fine, next-scale approach.
no code implementations • 15 Sep 2024 • HaoYu Wu, Ziqiao Peng, Xukun Zhou, Yunfei Cheng, Jun He, Hongyan Liu, Zhaoxin Fan
Specifically, VGG-Tex includes a Facial Attributes Encoding Module, a Geometry-Guided Texture Generator, and a Visibility-Enhanced Texture Completion Module.
no code implementations • 21 Aug 2024 • Yihong Lin, Liang Peng, Zhaoxin Fan, Xianjia Wu, Jianqiao Hu, Xiandong Li, Wenxiong Kang, Songju Lei
EmoFace employs a novel Mesh Attention mechanism to analyse and fuse the emotion features and content features.
no code implementations • 18 Aug 2024 • Xukun Zhou, Fengxin Li, Ziqiao Peng, Kejian Wu, Jun He, Biao Qin, Zhaoxin Fan, Hongyan Liu
Audio-driven 3D face animation is increasingly vital in live streaming and augmented reality applications.
no code implementations • 3 Aug 2024 • Yihong Lin, Zhaoxin Fan, Xianjia Wu, Lingyu Xiong, Liang Peng, Xiandong Li, Wenxiong Kang, Songju Lei, Huang Xu
Speech-driven talking head generation is a critical yet challenging task with applications in augmented reality and virtual human modeling.
no code implementations • 23 Jun 2024 • Jian Yang, Jiakun Li, Guoming Li, Zhen Shen, Huai-Yu Wu, Zhaoxin Fan, Heng Huang
Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge.
no code implementations • 15 Jun 2024 • Ming Meng, Yufei Zhao, Bo Zhang, Yonggui Zhu, Weimin Shi, Maxwell Wen, Zhaoxin Fan
Talking head synthesis, an advanced method for generating portrait videos from a still image driven by specific content, has garnered widespread attention in virtual reality, augmented reality and game production.
1 code implementation • 5 Apr 2024 • JunHao Chen, Xiang Li, Xiaojun Ye, Chao Li, Zhaoxin Fan, Hao Zhao
Recently, this success has been extended to 3D AIGC, with state-of-the-art methods generating textured 3D models from single images or text.
1 code implementation • 18 Mar 2024 • Mingjin Chen, JunHao Chen, Xiaojun Ye, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao
In this paper, we propose a new method called Ultraman for fast reconstruction of textured 3D human models from a single image.
Ranked #2 on Lifelike 3D Human Generation on THuman2.0 Dataset
no code implementations • 11 Mar 2024 • Zhenbo Song, Wenhao Gao, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu
Extensive experiments demonstrate the efficacy of the degradation objective on state-of-the-art face restoration models.
1 code implementation • 5 Mar 2024 • Zhaoxin Fan, Runmin Jiang, Junhao Wu, Xin Huang, Tianyang Wang, Heng Huang, Min Xu
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning.
no code implementations • 21 Feb 2024 • Zhenbo Song, Zhenyuan Zhang, Kaihao Zhang, Zhaoxin Fan, Jianfeng Lu
This study delves into the enhancement of Under-Display Camera (UDC) image restoration models, focusing on their robustness against adversarial attacks.
1 code implementation • CVPR 2024 • Ziqiao Peng, Wentao Hu, Yue Shi, Xiangyu Zhu, Xiaomei Zhang, Hao Zhao, Jun He, Hongyan Liu, Zhaoxin Fan
A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses.
no code implementations • 16 Oct 2023 • Kaixing Yang, Xukun Zhou, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan
Dance and music are closely related forms of expression, with mutual retrieval between dance videos and music being a fundamental task in various fields like education, art, and sports.
no code implementations • 15 Sep 2023 • Xukun Zhou, Zhenbo Song, Jun He, Hongyan Liu, Zhaoxin Fan
Scene Graph Generation is a critical enabler of environmental comprehension for autonomous robotic systems.
no code implementations • 12 Sep 2023 • Yixing Lu, Zhaoxin Fan, Min Xu
In this paper, we introduce a novel semi-supervised learning framework tailored for medical image segmentation.
1 code implementation • ICCV 2023 • Xueting Yang, Yihao Luo, Yuliang Xiu, Wei Wang, Hao Xu, Zhaoxin Fan
In this paper, we propose replacing the implicit value with an adaptive uncertainty distribution, to differentiate between points based on their distance to the surface.
1 code implementation • 1 Aug 2023 • Zhenyuan Zhang, Zhenbo Song, Kaihao Zhang, Zhaoxin Fan, Jianfeng Lu
To the best of our knowledge, these two datasets are the first large-scale UHD datasets for SIRR.
no code implementations • 13 Jul 2023 • Zhaoxin Fan, Puquan Pan, Zeren Zhang, Ce Chen, Tianyang Wang, Siyang Zheng, Min Xu
Few-shot medical image semantic segmentation is of paramount importance in the domain of medical image analysis.
1 code implementation • 19 Jun 2023 • Ziqiao Peng, Yihao Luo, Yue Shi, Hao Xu, Xiangyu Zhu, Jun He, Hongyan Liu, Zhaoxin Fan
To enhance the visual accuracy of generated lip movements while reducing the dependence on labeled data, we propose a novel framework, SelfTalk, which incorporates self-supervision into a cross-modal network system to learn 3D talking faces.
2 code implementations • ICCV 2023 • Ziqiao Peng, HaoYu Wu, Zhenbo Song, Hao Xu, Xiangyu Zhu, Jun He, Hongyan Liu, Zhaoxin Fan
Specifically, we introduce the emotion disentangling encoder (EDE) to disentangle the emotion and content in speech by cross-reconstructing speech signals with different emotion labels.
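To make the cross-reconstruction idea concrete, here is a schematic sketch (not the paper's EDE) in which content and emotion latents from two clips are swapped before decoding; all module names and dimensions are placeholders, and it assumes the two clips share content but differ in emotion.

```python
# Schematic cross-reconstruction sketch: swap emotion codes between two clips
# that share content, then reconstruct the clip carrying that emotion.
import torch
import torch.nn as nn

class CrossRecon(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.content_enc = nn.Linear(80, dim)   # stand-in audio encoders
        self.emotion_enc = nn.Linear(80, dim)
        self.decoder = nn.Linear(2 * dim, 80)

    def forward(self, a1, a2):
        c1, e1 = self.content_enc(a1), self.emotion_enc(a1)
        c2, e2 = self.content_enc(a2), self.emotion_enc(a2)
        # swap emotion codes: content of clip 1 + emotion of clip 2, and vice versa
        r12 = self.decoder(torch.cat([c1, e2], dim=-1))
        r21 = self.decoder(torch.cat([c2, e1], dim=-1))
        return r12, r21

model = CrossRecon()
a1, a2 = torch.randn(4, 80), torch.randn(4, 80)  # same content, different emotion (assumed)
r12, r21 = model(a1, a2)
loss = nn.functional.mse_loss(r12, a2) + nn.functional.mse_loss(r21, a1)
print(loss.item())
```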
1 code implementation • CVPR 2023 • Zhenbo Song, Zhenyuan Zhang, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Wenqi Ren, Jianfeng Lu
This paper addresses the problem of robust deep single-image reflection removal (SIRR) against adversarial attacks.
Ranked #4 on Reflection Removal on Real20
1 code implementation • 22 Dec 2022 • Zhaoxin Fan, Kaixing Yang, Min Zhang, Zhenbo Song, Hongyan Liu, Jun He
In stage 1, a novel device detection and tracking scheme is introduced, which accurately locates the height-limit devices in the left or right image.
1 code implementation • 30 Nov 2022 • Zhaoxin Fan, Yuqing Pan, Hao Xu, Zhenbo Song, Zhicheng Wang, Kejian Wu, Hongyan Liu, Jun He
These novel elements of FuRPE not only further refine the model but also reduce potential biases arising from inaccurate pseudo labels, thereby optimizing the network's training process and enhancing the model's robustness.
no code implementations • 23 Sep 2022 • Zhaoxin Fan, Zhenbo Song, Hongyan Liu, Jun He
Large-scale place recognition is a fundamental but challenging task, which plays an increasingly important role in autonomous driving and robotics.
no code implementations • 17 Sep 2022 • Zhaoxin Fan, Fengxin Li, Hongyan Liu, Jun He, Xiaoyong Du
In this paper, we study the new topic of object effects recommendation on micro-video platforms, which is a challenging but important task for many practical applications such as advertisement insertion.
1 code implementation • 19 Aug 2022 • Han Sun, Zhaoxin Fan, Zhenbo Song, Zhicheng Wang, Kejian Wu, Jianfeng Lu
The insight behind introducing MonoSIM is to simulate the feature learning behaviors of a point cloud-based detector for the monocular detector during the training period.
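As a generic illustration of this kind of feature-behavior simulation, the sketch below adds a simple feature-imitation (distillation) term that pushes the monocular detector's intermediate features toward those of a frozen point cloud-based detector; the projection layer and loss weight are assumptions, not MonoSIM's exact scheme.

```python
# Generic feature-imitation sketch: an MSE term between monocular features
# and frozen point-cloud-detector features, added to the usual detection losses.
import torch
import torch.nn as nn

mono_feat = torch.randn(2, 64, 32, 32, requires_grad=True)  # from monocular branch
pc_feat = torch.randn(2, 128, 32, 32)                        # from frozen point-cloud detector

project = nn.Conv2d(64, 128, kernel_size=1)                  # match channel dims
sim_loss = nn.functional.mse_loss(project(mono_feat), pc_feat.detach())
total_loss = 1.0 * sim_loss          # would be added to the detection losses
total_loss.backward()
print(sim_loss.item())
```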
no code implementations • 20 Apr 2022 • Zhaoxin Fan, Yulin He, Zhicheng Wang, Kejian Wu, Hongyan Liu, Jun He
Real-world sensors often produce incomplete, irregular, and noisy point clouds, making point cloud completion increasingly important.
no code implementations • 4 Apr 2022 • Zhaoxin Fan, Zhenbo Song, Jian Xu, Zhicheng Wang, Kejian Wu, Hongyan Liu, Jun He
Recently, RGBD-based category-level 6D object pose estimation has achieved promising performance improvements; however, the requirement of depth information prohibits broader applications.
no code implementations • 20 Nov 2021 • Zhaoxin Fan, Zhenbo Song, Jian Xu, Zhicheng Wang, Kejian Wu, Hongyan Liu, Jun He
ACR-Pose consists of a Reconstructor and a Discriminator.
no code implementations • 29 Aug 2021 • Zhaoxin Fan, Zhenbo Song, Wenping Zhang, Hongyan Liu, Jun He, Xiaoyong Du
Third, we apply these kernels to the previous point cloud features to generate new features, which constitutes the well-known SO(3) mapping process.
no code implementations • 29 May 2021 • Zhaoxin Fan, Yazhi Zhu, Yulin He, Qi Sun, Hongyan Liu, Jun He
Therefore, this study presents a comprehensive review of recent progress in deep learning-based object pose detection and tracking.
no code implementations • 1 May 2021 • Zhaoxin Fan, Zhenbo Song, Hongyan Liu, Zhiwu Lu, Jun He, Xiaoyong Du
Point cloud-based large scale place recognition is fundamental for many applications like Simultaneous Localization and Mapping (SLAM).
Ranked #2 on 3D Place Recognition on Oxford RobotCar Dataset