no code implementations • 6 Jun 2025 • Mu Yang, Szu-Jui Chen, Jiamin Xie, John Hansen
One challenge of integrating speech input with large language models (LLMs) stems from the discrepancy between the continuous nature of audio data and the discrete token-based paradigm of LLMs.
Tasks: Automatic Speech Recognition (ASR) +2
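One common way to bridge the continuous/discrete gap described above is to map continuous audio feature frames to discrete token ids via a learned codebook. The sketch below is illustrative only and not the paper's method; the function name and toy codebook are hypothetical, and the nearest-neighbor assignment stands in for a full vector-quantization pipeline:

```python
import numpy as np

def quantize(frames, codebook):
    """Map each continuous feature frame to the index of its nearest
    codebook vector (Euclidean distance), yielding discrete token ids.

    frames:   (T, D) array of continuous features
    codebook: (K, D) array of codebook vectors
    returns:  (T,)   array of integer token ids
    """
    # Pairwise distances via broadcasting: (T, K)
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)
```

The resulting integer sequence can then be consumed by a token-based model the same way text tokens are.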
no code implementations • 29 May 2025 • Jianbo Zhao, Taiyu Ban, Xiyang Wang, Qibin Zhou, Hangning Zhou, Zhihao Liu, Mu Yang, Lei Liu, Bin Li
Controllable trajectory generation guided by high-level semantic decisions, termed meta-actions, is crucial for autonomous driving systems.
no code implementations • 19 Mar 2025 • Jianbo Zhao, Taiyu Ban, Zhihao Liu, Hangning Zhou, Xiyang Wang, Qibin Zhou, Hailong Qin, Mu Yang, Lei Liu, Bin Li
We theoretically analyze DRoPE's correctness and efficiency, demonstrating its capability to simultaneously optimize trajectory generation accuracy, time complexity, and space complexity.
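For context, DRoPE builds on rotary position embeddings. The sketch below shows standard RoPE (not DRoPE itself) in NumPy, as an assumed baseline: each even/odd feature pair is rotated by a position-dependent angle, which is the mechanism such variants adapt:

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply standard Rotary Position Embedding to a (seq_len, dim) array.

    Each feature pair (x1_i, x2_i) at position m is rotated by angle
    m * base**(-i/half), so relative positions become rotation offsets.
    Assumes dim is even.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)               # per-pair frequencies
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because each pair is transformed by an orthogonal rotation, vector norms are preserved and position 0 is left unchanged.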
no code implementations • CVPR 2025 • Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang, Shuchang Zhou, Li Zhang, Xiaojuan Qi, Hao Zhao, Mu Yang, Wenjun Zeng, Xin Jin
UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling.
no code implementations • 7 Nov 2024 • Mu Yang, Bowen Shi, Matthew Le, Wei-Ning Hsu, Andros Tjandra
This work focuses on improving Text-To-Audio (TTA) generation in zero-shot and few-shot settings (i.e., generating unseen or uncommon audio events).
1 code implementation • 23 Sep 2024 • Xiyang Wang, Shouzheng Qi, Jieyou Zhao, Hangning Zhou, Siyu Zhang, Guoan Wang, Kai Tu, Songlin Guo, Jianbo Zhao, Jian Li, Mu Yang
This paper introduces MCTrack, a new 3D multi-object tracking method that achieves state-of-the-art (SOTA) performance across KITTI, nuScenes, and Waymo datasets.
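MCTrack's actual association logic is not reproduced here; as a generic illustration of the tracking-by-detection paradigm such methods follow, the sketch below (hypothetical function names, 2D boxes for simplicity) greedily pairs existing tracks with new detections by descending IoU:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_match(tracks, detections, thresh=0.3):
    """Greedily pair tracks with detections by descending IoU score.

    Returns a list of (track_index, detection_index) matches; each track
    and detection is used at most once.
    """
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)),
                   reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score >= thresh and ti not in used_t and di not in used_d:
            used_t.add(ti)
            used_d.add(di)
            matches.append((ti, di))
    return matches
```

Production trackers typically replace the greedy step with Hungarian assignment and fuse motion cues, but the association structure is the same.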
1 code implementation • 14 Sep 2023 • Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka
End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges, such as speaker diarization (SD) without accurate word timestamps and handling of overlapping speech in a streaming fashion.
no code implementations • 10 Jun 2023 • Mu Yang, Ram C. M. C. Shekar, Okim Kang, John H. L. Hansen
This study is focused on understanding and quantifying the change in phoneme and prosody information encoded in the Self-Supervised Learning (SSL) model, brought by an accent identification (AID) fine-tuning task.
no code implementations • 13 Sep 2022 • Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, Ozlem Kalinli
Neural network pruning compresses automatic speech recognition (ASR) models effectively.
Tasks: Automatic Speech Recognition (ASR) +3
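The paper's pruning method is not shown here; as a commonly used baseline for the technique it studies, the sketch below implements global magnitude pruning, which zeros out the smallest-magnitude fraction of a weight array (an illustrative assumption, not the authors' approach):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of weights.

    w:        array of weights
    sparsity: fraction in [0, 1] of weights to set to zero
    """
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # Threshold = k-th smallest absolute value across the whole array
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)
```

In practice the pruned model is usually fine-tuned afterward to recover accuracy lost to the removed weights.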
2 code implementations • 29 Mar 2022 • Mu Yang, Kevin Hirschi, Stephen D. Looney, Okim Kang, John H. L. Hansen
We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and 2.48% MDD F1 score improvement over a labeled-samples-only fine-tuning baseline.
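Phoneme error rate, the metric reported above, is the Levenshtein edit distance between reference and hypothesis phoneme sequences, normalized by the reference length. A minimal implementation:

```python
def phoneme_error_rate(ref, hyp):
    """Edit distance between phoneme sequences, divided by reference length.

    Counts substitutions, insertions, and deletions equally.
    """
    m, n = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n] / m
```

A "5.35% reduction" in this metric can be read as either an absolute or a relative drop depending on the paper's convention.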
1 code implementation • 31 Dec 2021 • Chin-Tung Lin, Mu Yang
We conduct experiments with human evaluation on VMT, a Seq2Seq model (our baseline), and the original piano-version soundtrack.
1 code implementation • 9 Oct 2021 • Mu Yang, Shaojin Ding, Tianlong Chen, Tong Wang, Zhangyang Wang
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system, where each language is treated as an individual task and is learned sequentially and continually.
1 code implementation • NAACL 2021 • Mingyu Derek Ma, Jiao Sun, Mu Yang, Kung-Hsiang Huang, Nuan Wen, Shikhar Singh, Rujun Han, Nanyun Peng
We present EventPlus, a temporal event understanding pipeline that integrates various state-of-the-art event understanding components including event trigger and type detection, event argument detection, event duration and temporal relation extraction.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Kung-Hsiang Huang, Mu Yang, Nanyun Peng
To better recognize the trigger words, each sentence is first grounded to a sentence graph based on a jointly modeled hierarchical knowledge graph from UMLS.
Ranked #2 on Event Extraction on GENIA
no code implementations • LREC 2020 • Mu Yang, Chi-Yen Chen, Yi-Hui Lee, Qian-hui Zeng, Wei-Yun Ma, Chen-Yang Shih, Wei-Jhih Chen
In this paper, we design headword-oriented entity linking (HEL), a specialized entity linking problem in which only the headwords of the entities are to be linked to knowledge bases; mention scopes of the entities do not need to be identified in the problem setting.
1 code implementation • CoNLL 2019 • Rujun Han, I-Hung Hsu, Mu Yang, Aram Galstyan, Ralph Weischedel, Nanyun Peng
We propose a novel deep structured learning framework for event temporal relation extraction.
1 code implementation • 7 Apr 2019 • Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis Georgiou
In this paper, we address the spoken language intent detection under noisy conditions imposed by automatic speech recognition (ASR) systems.
Tasks: Automatic Speech Recognition (ASR) +3
no code implementations • 19 Sep 2018 • Mu Yang, Zheng-Hao Liu, Ze-Di Cheng, Jin-Shi Xu, Chuan-Feng Li, Guang-Can Guo
A well-trained deep neural network is shown to be capable of simultaneously restoring two kinds of images that are completely destroyed by two distinct scattering media, respectively.