Search Results for author: Xuxin Cheng

Found 33 papers, 6 papers with code

ExBody2: Advanced Expressive Humanoid Whole-Body Control

no code implementations 17 Dec 2024 Mazeyu Ji, Xuanbin Peng, Fangchen Liu, Jialong Li, Ge Yang, Xuxin Cheng, Xiaolong Wang

This paper enables real-world humanoid robots to maintain stability while performing expressive motions like humans do.

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

1 code implementation 12 Dec 2024 Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen

Specifically, we generate a dense motion field from a sparse motion field and the reference image, which provides region-level dense guidance while maintaining the generalization of the sparse pose control.

Image Animation

Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control

no code implementations 10 Dec 2024 Chenhao Lu, Xuxin Cheng, Jialong Li, Shiqi Yang, Mazeyu Ji, Chengjing Yuan, Ge Yang, Sha Yi, Xiaolong Wang

The locomotion policy is trained conditioned on this upper-body motion representation, ensuring that the system remains robust with both manipulation and locomotion.

motion retargeting Reinforcement Learning (RL)

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

no code implementations 16 Sep 2024 Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang, Yuexian Zou

To this end, we present a diffusion-based ATR framework (DiffATR), which models ATR as an iterative procedure that progressively generates the joint distribution from noise.

AudioCaps Text Retrieval

ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation

no code implementations 21 Aug 2024 Shiqi Yang, Minghuan Liu, Yuzhe Qin, Runyu Ding, Jialong Li, Xuxin Cheng, Ruihan Yang, Sha Yi, Xiaolong Wang

Compared to previous systems, which often require hardware customization according to different robots, our single system can generalize to humanoid hands, arm-hands, arm-gripper, and quadruped-gripper systems with high-precision teleoperation.

Imitation Learning

FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model

no code implementations 18 Aug 2024 Ziyu Yao, Xuxin Cheng, Zhiqi Huang

Therefore, we propose a Facial Decoupled Diffusion model for Talking head generation called FD2Talk, which fully leverages the advantages of diffusion models and decouples the complex facial details through multiple stages.

Talking Head Generation

CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

no code implementations 1 Jul 2024 Jingheng Ye, Zishan Xu, Yinghui Li, Xuxin Cheng, Linlin Song, Qingyu Zhou, Hai-Tao Zheng, Ying Shen, Xin Su

The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which has received little attention in previous studies.

Grammatical Error Correction

Open-TeleVision: Teleoperation with Immersive Active Visual Feedback

no code implementations 1 Jul 2024 Xuxin Cheng, Jialong Li, Shiqi Yang, Ge Yang, Xiaolong Wang

Teleoperation serves as a powerful method for collecting on-robot data essential for robot learning from demonstrations.

Imitation Learning

EXCGEC: A Benchmark of Edit-wise Explainable Chinese Grammatical Error Correction

no code implementations 1 Jul 2024 Jingheng Ye, Shang Qin, Yinghui Li, Xuxin Cheng, Libo Qin, Hai-Tao Zheng, Peng Xing, Zishan Xu, Guo Cheng, Zhao Wei

Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations.

Grammatical Error Correction

Visual Whole-Body Control for Legged Loco-Manipulation

no code implementations 25 Mar 2024 Minghuan Liu, Zixuan Chen, Xuxin Cheng, Yandong Ji, Ri-Zhao Qiu, Ruihan Yang, Xiaolong Wang

We propose a framework that can conduct whole-body control autonomously with visual observations.

Position

Retrieval is Accurate Generation

1 code implementation 27 Feb 2024 Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement.

Language Modeling Language Modelling +2

Expressive Whole-Body Control for Humanoid Robots

no code implementations 26 Feb 2024 Xuxin Cheng, Yandong Ji, Junming Chen, Ruihan Yang, Ge Yang, Xiaolong Wang

Can we enable humanoid robots to generate rich, diverse, and expressive motions in the real world?

Imitation Learning

Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning

1 code implementation 30 Jan 2024 Bang Yang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, Yuexian Zou

To alleviate CF raised by covariate shift and lexical overlap, we further propose a novel approach that ensures the identical distribution of all token embeddings during initialization and regularizes token embedding learning during training.

Diversity Image-text Retrieval +1

ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding

no code implementations 19 Nov 2023 Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Extreme Parkour with Legged Robots

no code implementations 25 Sep 2023 Xuxin Cheng, Kexin Shi, Ananye Agarwal, Deepak Pathak

In this paper, we take a similar approach to developing robot parkour on a small low-cost robot with imprecise actuation and a single front-facing depth camera for perception, which is low-frequency, jittery, and prone to artifacts.

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory

no code implementations ICCV 2023 Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou

Video grounding suffers from two issues: (1) some visual entities co-exist in both the ground truth and other moments, i.e., semantic overlapping; and (2) only a few moments in the video are annotated, i.e., the sparse annotation dilemma. As a result, vanilla contrastive learning cannot model the correlations between temporally distant moments and learns inconsistent video representations.

Contrastive Learning Video Grounding

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation

no code implementations ICCV 2023 Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists.

Sentence Triplet

Legs as Manipulator: Pushing Quadrupedal Agility Beyond Locomotion

no code implementations 20 Mar 2023 Xuxin Cheng, Ashish Kumar, Deepak Pathak

Locomotion has seen dramatic progress for walking or running across challenging terrains.

PoseRAC: Pose Saliency Transformer for Repetitive Action Counting

1 code implementation 15 Mar 2023 Ziyu Yao, Xuxin Cheng, Yuexian Zou

Moreover, we introduce a pose-level method, PoseRAC, which is based on this representation and achieves state-of-the-art performance on two new version datasets by using Pose Saliency Annotation to annotate salient poses for training.

Ranked #2 on Repetitive Action Counting on RepCount (using extra training data)

Repetitive Action Counting

Exploiting Auxiliary Caption for Video Grounding

no code implementations 15 Jan 2023 Hongxiang Li, Meng Cao, Xuxin Cheng, Zhihong Zhu, Yaowei Li, Yuexian Zou

Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.

Contrastive Learning Dense Video Captioning +2

A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding

no code implementations 8 Nov 2022 Zhihong Zhu, Weiyuan Xu, Xuxin Cheng, Tengtao Song, Yuexian Zou

Multi-intent detection and slot filling joint models are gaining increasing traction since they are closer to complicated real-world scenarios.

Intent Detection Semantic Frame Parsing +3

Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion

no code implementations 18 Oct 2022 Zipeng Fu, Xuxin Cheng, Deepak Pathak

The standard hierarchical control pipeline for such legged manipulators is to decouple the controller into that of manipulation and locomotion.

Automated Lane Change Strategy using Proximal Policy Optimization-based Deep Reinforcement Learning

no code implementations 7 Feb 2020 Fei Ye, Xuxin Cheng, Pin Wang, Ching-Yao Chan, Jiucai Zhang

The simulation results demonstrate that lane change maneuvers can be learned efficiently and executed in a safe and smooth manner.

Autonomous Driving Deep Reinforcement Learning +2

Driving Decision and Control for Autonomous Lane Change based on Deep Reinforcement Learning

no code implementations 23 Apr 2019 Tianyu Shi, Pin Wang, Xuxin Cheng, Ching-Yao Chan, Ding Huang

We apply a Deep Q-network (DQN), taking safety into consideration during the task, to decide whether to conduct the maneuver.

Autonomous Driving Decision Making +4
