no code implementations • 17 Dec 2024 • Mazeyu Ji, Xuanbin Peng, Fangchen Liu, Jialong Li, Ge Yang, Xuxin Cheng, Xiaolong Wang
This paper enables real-world humanoid robots to maintain stability while performing expressive motions like humans do.
1 code implementation • 12 Dec 2024 • Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen
Specifically, we generate a dense motion field from a sparse motion field and the reference image, which provides region-level dense guidance while maintaining the generalization of the sparse pose control.
no code implementations • 10 Dec 2024 • Chenhao Lu, Xuxin Cheng, Jialong Li, Shiqi Yang, Mazeyu Ji, Chengjing Yuan, Ge Yang, Sha Yi, Xiaolong Wang
The locomotion policy is trained conditioned on this upper-body motion representation, ensuring that the system remains robust with both manipulation and locomotion.
no code implementations • 30 Sep 2024 • Qi Wu, Zipeng Fu, Xuxin Cheng, Xiaolong Wang, Chelsea Finn
We present a system for quadrupedal mobile manipulation in indoor environments.
no code implementations • 16 Sep 2024 • Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang, Yuexian Zou
To this end, we present a diffusion-based ATR framework (DiffATR), which models ATR as an iterative procedure that progressively generates joint distribution from noise.
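A minimal sketch of the iterative-generation idea behind such a diffusion-based framework (the toy denoiser, step count, and step size are assumptions for illustration, not details from the paper): start from pure noise and progressively refine it toward a target joint distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x, target):
    """Stand-in for a learned network: predicts the 'noise' to remove as
    the residual between the current sample and the target distribution."""
    return x - target

def iterative_generation(target, steps=50, step_size=0.1):
    """Start from Gaussian noise and progressively denoise it, mirroring
    the iterative generation-from-noise view of retrieval described above."""
    x = rng.normal(size=target.shape)      # pure noise
    for _ in range(steps):
        pred_noise = toy_denoiser(x, target)
        x = x - step_size * pred_noise     # remove a fraction of the noise
    return x

target = np.array([0.7, 0.2, 0.1])         # toy relevance scores over 3 items
x = iterative_generation(target)
```

After 50 steps the sample contracts geometrically toward the target (each step keeps 90% of the residual), so `x` ends up close to `target`.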
no code implementations • 21 Aug 2024 • Shiqi Yang, Minghuan Liu, Yuzhe Qin, Runyu Ding, Jialong Li, Xuxin Cheng, Ruihan Yang, Sha Yi, Xiaolong Wang
Compared to previous systems, which often require hardware customization according to different robots, our single system can generalize to humanoid hands, arm-hands, arm-gripper, and quadruped-gripper systems with high-precision teleoperation.
no code implementations • 18 Aug 2024 • Ziyu Yao, Xuxin Cheng, Zhiqi Huang
Therefore, we propose a Facial Decoupled Diffusion model for Talking head generation called FD2Talk, which fully leverages the advantages of diffusion models and decouples the complex facial details through multiple stages.
1 code implementation • 24 Jul 2024 • Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, Yizhi Li, HaoNing Wu, J. H. Liu, Ruibo Liu, Xingwei Qu, Xuxin Cheng, Ge Zhang, Wenhao Huang, Chenghua Lin
Moreover, we explored the ability of LVLMs to perceive image sequences within the context of our multi-image association task.
no code implementations • 1 Jul 2024 • Jingheng Ye, Zishan Xu, Yinghui Li, Xuxin Cheng, Linlin Song, Qingyu Zhou, Hai-Tao Zheng, Ying Shen, Xin Su
The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which has received little attention in previous studies.
no code implementations • 1 Jul 2024 • Xuxin Cheng, Jialong Li, Shiqi Yang, Ge Yang, Xiaolong Wang
Teleoperation serves as a powerful method for collecting on-robot data essential for robot learning from demonstrations.
no code implementations • 1 Jul 2024 • Jingheng Ye, Shang Qin, Yinghui Li, Xuxin Cheng, Libo Qin, Hai-Tao Zheng, Peng Xing, Zishan Xu, Guo Cheng, Zhao Wei
Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations.
no code implementations • 31 May 2024 • Xuxin Cheng, Wanshi Xu, Zhihong Zhu, Hongxiang Li, Yuexian Zou
SLU usually consists of two subtasks: intent detection and slot filling.
no code implementations • 30 May 2024 • Xuan Wu, Hongxiang Li, Yuanjiang Luo, Xuxin Cheng, Xianwei Zhuang, Meng Cao, Keren Fu
Sign language video retrieval plays a key role in facilitating information access for the deaf community.
no code implementations • 17 May 2024 • Cheng Niu, Xingguang Wang, Xuxin Cheng, Juntong Song, Tong Zhang
Then, a two-stage fine-tuning of LLaMA 2 is performed on the generated data and the real data for DST prediction.
no code implementations • 25 Mar 2024 • Minghuan Liu, Zixuan Chen, Xuxin Cheng, Yandong Ji, Ri-Zhao Qiu, Ruihan Yang, Xiaolong Wang
We propose a framework that can conduct the whole-body control autonomously with visual observations.
1 code implementation • 27 Feb 2024 • Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi
To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement.
no code implementations • 26 Feb 2024 • Xuxin Cheng, Yandong Ji, Junming Chen, Ruihan Yang, Ge Yang, Xiaolong Wang
Can we enable humanoid robots to generate rich, diverse, and expressive motions in the real world?
1 code implementation • 30 Jan 2024 • Bang Yang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, Yuexian Zou
To alleviate CF raised by covariate shift and lexical overlap, we further propose a novel approach that ensures the identical distribution of all token embeddings during initialization and regularizes token embedding learning during training.
no code implementations • 19 Nov 2023 • Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou
Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models.
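The mutual-learning objective named above can be sketched as follows (a generic deep-mutual-learning loss, assumed for illustration rather than taken from the paper): each model minimizes its own cross-entropy plus a KL term pulling it toward its peer's predictions, so the manual-transcript and ASR-transcript models share knowledge iteratively.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-9):
    """KL divergence KL(p || q) per example."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def mutual_learning_losses(logits_manual, logits_asr, labels):
    """Each model gets its own cross-entropy plus a KL term that pulls its
    predictions toward the peer model's, sharing knowledge both ways."""
    p_m, p_a = softmax(logits_manual), softmax(logits_asr)
    idx = np.arange(len(labels))
    ce_m = -np.log(p_m[idx, labels] + 1e-9)
    ce_a = -np.log(p_a[idx, labels] + 1e-9)
    loss_m = ce_m + kl(p_a, p_m)   # manual-transcript model mimics the ASR model
    loss_a = ce_a + kl(p_m, p_a)   # ASR-transcript model mimics the manual model
    return loss_m.mean(), loss_a.mean()

logits_m = np.array([[2.0, 0.5, -1.0]])
logits_a = np.array([[1.5, 1.0, -0.5]])
loss_m, loss_a = mutual_learning_losses(logits_m, logits_a, labels=np.array([0]))
```

In practice each loss would be backpropagated into its own model only; the peer's predictions are treated as a fixed target for that step.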
1 code implementation • 13 Oct 2023 • Qichen Ye, Junling Liu, Dading Chong, Peilin Zhou, Yining Hua, Fenglin Liu, Meng Cao, ZiMing Wang, Xuxin Cheng, Zhu Lei, Zhenhua Guo
In the CPT and SFT phases, Qilin-Med achieved 38.4% and 40.0% accuracy on the CMExam test set, respectively.
no code implementations • 25 Sep 2023 • Xuxin Cheng, Kexin Shi, Ananye Agarwal, Deepak Pathak
In this paper, we take a similar approach to developing robot parkour on a small low-cost robot with imprecise actuation and a single front-facing depth camera, whose perception is low-frequency, jittery, and prone to artifacts.
no code implementations • ICCV 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou
Vanilla contrastive learning suffers from two issues in video grounding: (1) some visual entities co-exist in both the ground-truth moment and other moments, i.e., semantic overlap; (2) only a few moments in a video are annotated, i.e., the sparse annotation dilemma. As a result, it fails to model the correlations between temporally distant moments and learns inconsistent video representations.
no code implementations • 5 Jun 2023 • Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang
For the speech synthesis part, we adopt the existing VALL-E X approach and build a unit-based audio language model.
no code implementations • ICCV 2023 • Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou
Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists.
no code implementations • 20 Mar 2023 • Xuxin Cheng, Ashish Kumar, Deepak Pathak
Locomotion has seen dramatic progress for walking or running across challenging terrains.
1 code implementation • 15 Mar 2023 • Ziyu Yao, Xuxin Cheng, Yuexian Zou
Moreover, we introduce a pose-level method, PoseRAC, which is based on this representation and achieves state-of-the-art performance on new versions of two datasets by using Pose Saliency Annotation to annotate salient poses for training.
Ranked #2 on Repetitive Action Counting on RepCount (using extra training data).
no code implementations • 15 Jan 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Zhihong Zhu, Yaowei Li, Yuexian Zou
Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.
no code implementations • 7 Dec 2022 • Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou
How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)?
no code implementations • 8 Nov 2022 • Zhihong Zhu, Weiyuan Xu, Xuxin Cheng, Tengtao Song, Yuexian Zou
Multi-intent detection and slot filling joint models are gaining increasing traction since they are closer to complicated real-world scenarios.
Ranked #1 on Intent Detection on MixATIS.
no code implementations • 18 Oct 2022 • Zipeng Fu, Xuxin Cheng, Deepak Pathak
The standard hierarchical control pipeline for such legged manipulators is to decouple the controller into separate manipulation and locomotion controllers.
no code implementations • 26 Mar 2021 • Zhongyu Li, Xuxin Cheng, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath
Developing robust walking controllers for bipedal robots is a challenging endeavor.
no code implementations • 7 Feb 2020 • Fei Ye, Xuxin Cheng, Pin Wang, Ching-Yao Chan, Jiucai Zhang
The simulation results demonstrate that lane change maneuvers can be learned and executed in a safe, smooth, and efficient manner.
no code implementations • 23 Apr 2019 • Tianyu Shi, Pin Wang, Xuxin Cheng, Ching-Yao Chan, Ding Huang
We apply Deep Q-network (DQN) with the consideration of safety during the task for deciding whether to conduct the maneuver.
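The decision step above can be sketched as standard DQN epsilon-greedy action selection with a safety veto layered on top (the action encoding, Q-values, and gap check here are illustrative assumptions, not the paper's actual design):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """DQN-style action selection: usually pick the highest-value action,
    occasionally explore a random one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def safe_decision(q_values, gap_ok, epsilon=0.05):
    """Overlay a simple safety check on the learned policy: a lane change
    (action 1) proposed by the Q-network is vetoed when the gap is unsafe,
    falling back to keeping the lane (action 0)."""
    action = epsilon_greedy(q_values, epsilon)
    if action == 1 and not gap_ok:
        return 0
    return action

q = np.array([0.2, 0.9])  # toy Q-values for [keep lane, change lane]
keep = safe_decision(q, gap_ok=False, epsilon=0.0)   # veto: unsafe gap
change = safe_decision(q, gap_ok=True, epsilon=0.0)  # greedy: change lane
```

With `epsilon=0.0` the policy is purely greedy, so the only thing that blocks the higher-valued lane change is the safety check.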