2 code implementations • 19 May 2025 • Sand.ai, Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, W. Q. Zhang, Weifeng Luo, Xiaoyang Kang, Yuchen Sun, Yue Cao, Yunpeng Huang, Yutong Lin, Yuxin Fang, Zewei Tao, Zheng Zhang, Zhongshu Wang, Zixun Liu, Dai Shi, Guoli Su, Hanwen Sun, Hong Pan, Jie Wang, Jiexin Sheng, Min Cui, Min Hu, Ming Yan, Shucheng Yin, Siran Zhang, Tingting Liu, Xianping Yin, Xiaoyu Yang, Xin Song, Xuan Hu, Yankai Zhang, Yuqiao Li
We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames.
no code implementations • 19 May 2025 • Xiaoyu Yang, Jie Lu, En Yu
This paper uncovers a critical yet overlooked phenomenon in multi-modal large language models (MLLMs): detrimental concept drift within chain-of-thought (CoT) reasoning during non-stationary reinforcement fine-tuning (RFT), where reasoning token distributions evolve unpredictably, thereby introducing significant biases in final predictions.
no code implementations • 19 May 2025 • En Yu, Jie Lu, Xiaoyu Yang, Guangquan Zhang, Zhen Fang
Modern machine learning models struggle to maintain performance in dynamic environments where temporal distribution shifts, i.e., concept drift, are prevalent.
no code implementations • 16 May 2025 • Qing Yu, Xiaobei Wang, Shuchang Liu, Yandong Bai, Xiaoyu Yang, Xueliang Wang, Chang Meng, Shanshan Wu, Hailan Yang, Huihui Xiao, Xiang Li, Fan Yang, Xiaoqiang Feng, Lantao Hu, Han Li, Kun Gai, Lixin Zou
Recommender systems filter contents/items valuable to users by inferring preferences from user features and historical behaviors.
no code implementations • 22 Apr 2025 • Hailan Yang, Zhenyu Qi, Shuchang Liu, Xiaoyu Yang, Xiaobei Wang, Xiang Li, Lantao Hu, Han Li, Kun Gai
Reranking models solve the final recommendation lists that best fulfill users' demands.
no code implementations • 10 Apr 2025 • Shanshan Wu, Shuchang Liu, Shuai Zhang, Xiaoyu Yang, Xiang Li, Lantao Hu, Han Li
To improve the prediction accuracy for such an uncertain behavior, existing approaches show that one can either reduce the noise through duration bias modeling or formulate a distribution modeling task to capture the uncertainty.
1 code implementation • 28 Feb 2025 • Haitao Li, Yifan Chen, Yiran Hu, Qingyao Ai, Junjie Chen, Xiaoyu Yang, Jianhui Yang, Yueyue Wu, Zeyang Liu, Yiqun Liu
To fill this gap, we propose LexRAG, the first benchmark to evaluate RAG systems for multi-turn legal consultations.
no code implementations • 11 Feb 2025 • Xiaoyu Yang, Jie Lu, En Yu
A critical emerging challenge is the effective pre-training of models on dynamic data streams characterized by concept drift, unpredictable changes in the underlying data distribution.
no code implementations • 24 Jan 2025 • Guangjin Pan, Yuan Gao, Yilin Gao, Zhiyong Zhong, Xiaoyu Yang, Xinyu Guo, Shugong Xu
Based on the AI/ML-assisted positioning and direct AI/ML positioning schemes outlined in the standards, we conduct an in-depth investigation of related research.
no code implementations • 27 Nov 2024 • Wenyi Yu, Siyin Wang, Xiaoyu Yang, Xianzhao Chen, Xiaohai Tian, Jun Zhang, Guangzhi Sun, Lu Lu, Yuxuan Wang, Chao Zhang
Unlike traditional modularised conversational AI systems, which separate speech recognition, understanding, and text-to-speech generation into distinct components, multimodal LLMs operate as single end-to-end models.
1 code implementation • 26 Nov 2024 • Yifan Yang, Jianheng Zhuo, Zengrui Jin, Ziyang Ma, Xiaoyu Yang, Zengwei Yao, Liyong Guo, Wei Kang, Fangjun Kuang, Long Lin, Daniel Povey, Xie Chen
Self-supervised learning (SSL) has achieved great success in speech-related tasks.
Automatic Speech Recognition (ASR)
no code implementations • 15 Nov 2024 • Xiaoyu Yang, Lijian Xu
This paper proposes a scalable and straightforward pre-training paradigm for efficient visual conceptual representation called masked image contrastive learning (MiCL).
1 code implementation • 7 Oct 2024 • Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey
Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency.
Ranked #1 on Speech Recognition on GigaSpeech TEST
Automatic Speech Recognition (ASR)
no code implementations • 25 Sep 2024 • Xiaoyu Yang, Qiujia Li, Chao Zhang, Phil Woodland
In this work, MT2KD, a novel two-stage multi-task learning framework, is proposed to build a general-purpose speech and audio encoder that jointly performs three fundamental tasks: automatic speech recognition (ASR), audio tagging (AT) and speaker verification (SV).
no code implementations • 1 Sep 2024 • Zengrui Jin, Yifan Yang, Mohan Shi, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey
This paper presents a large-scale far-field overlapping speech dataset, crafted to advance research in speech separation, recognition, and speaker diarization.
no code implementations • 7 Jul 2024 • Yangyang Niu, Zhiqing Wei, Dingyou Ma, Xiaoyu Yang, Huici Wu, Zhiyong Feng, Jianhua Yuan
The integrated sensing and communication (ISAC) system under multi-input multi-output (MIMO) architecture achieves dual functionalities of sensing and communication on the same platform by utilizing spatial gain, which provides a feasible paradigm facing spectrum congestion.
1 code implementation • 3 Jun 2024 • Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang
In inference, it boosts speeds by up to 37% and reduces memory by 1GB per GPU.
no code implementations • 22 May 2024 • Xiaoyu Yang, Jie Lu, En Yu
This mainly includes gradual drift due to long-tailed data and sudden drift from Out-Of-Distribution (OOD) data, both of which have increasingly drawn the attention of the research community.
no code implementations • 21 Nov 2023 • Xiaoyu Yang, Lijian Xu, Hao Sun, Hongsheng Li, Shaoting Zhang
Furthermore, we contribute a VG dataset, especially with multi-tasks.
1 code implementation • 17 Oct 2023 • Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey
The Conformer has become the most popular encoder model for automatic speech recognition (ASR).
Ranked #3 on Speech Recognition on WenetSpeech
Automatic Speech Recognition (ASR)
2 code implementations • 15 Sep 2023 • Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey
In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox.
2 code implementations • 14 Sep 2023 • Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey
An additional style prompt can be given to the text encoder and guide the ASR system to output different styles of transcriptions.
no code implementations • 13 Aug 2023 • Zhiqing Wei, Ruizhong Xu, Zhiyong Feng, Huici Wu, Ning Zhang, Wangjun Jiang, Xiaoyu Yang
This work may provide a guideline for the design of a multi-BS cooperative sensing system to exploit the widely deployed networked mobile communication system.
1 code implementation • 19 May 2023 • Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey
Our work is open-sourced and publicly available at https://github.com/k2-fsa/k2.
1 code implementation • 19 May 2023 • Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey
Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems.
no code implementations • 7 May 2023 • Xiaoyu Yang, Lijian Xu, Simon Yu, Qing Xia, Hongsheng Li, Shaoting Zhang
3) A dataset named CCA-200 is collected, consisting of 200 CCTA images with coronary artery disease.
no code implementations • 20 Mar 2023 • Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland
The performance of the student model can be further enhanced when multiple teachers are used jointly, achieving word error rate reductions (WERRs) of 17.5% and 10.6%.
Automatic Speech Recognition (ASR)
no code implementations • 11 Nov 2022 • Zhiqing Wei, Xinyi Yang, Chunwei Meng, Xiaoyu Yang, Kaifeng Han, Chen Qiu, Huici Wu
This paper proves the efficiency of the IRS-enabled ISAC system, which motivates the implementation of IRS to enhance the sensing capability of ISAC systems.
1 code implementation • 31 Oct 2022 • Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey
In this work, we introduce a constrained version of transducer loss to learn strictly monotonic alignments between the sequences; we also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step in transducer decoding, making it more efficient to decode in parallel with batches.
1 code implementation • 31 Oct 2022 • Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang, Long Lin, Mingshuang Luo, Piotr Zelasko, Daniel Povey
Although on-the-fly teacher label generation tackles this issue, the training speed is significantly slower as the teacher model has to be evaluated every batch.
Automatic Speech Recognition (ASR)
1 code implementation • 31 Oct 2022 • Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey
In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy.
Automatic Speech Recognition (ASR)
1 code implementation • 9 Mar 2022 • Yufei Feng, Xiaoyu Yang, Xiaodan Zhu, Michael Greenspan
We introduce a neuro-symbolic natural logic framework based on reinforcement learning with introspective revision.
1 code implementation • 15 Jan 2022 • Siming Zheng, Xiaoyu Yang, Xin Yuan
We consider the reconstruction problem of video compressive sensing (VCS) under the deep unfolding/rolling structure.
no code implementations • 15 Jan 2022 • Yuefei Lyu, Xiaoyu Yang, Jiaxin Liu, Philip S. Yu, Sihong Xie, Xi Zhang
To discover subtle vulnerabilities, we design a powerful attacking algorithm to camouflage rumors in social networks based on reinforcement learning that can interact with and attack any black-box detectors.
no code implementations • 7 Oct 2021 • Xiaoyu Yang, Qiujia Li, Philip C. Woodland
Self-supervised pre-training is an effective approach to leveraging a large amount of unlabelled data to reduce word error rates (WERs) of automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR)
1 code implementation • Findings (EMNLP) 2021 • Xiaoyu Yang, Xiaodan Zhu
Fact verification based on structured data is challenging as it requires models to understand both natural language and symbolic operations performed over tables.
no code implementations • 8 Sep 2021 • Xiaoyu Yang, Xiaodan Zhu, Zhan Shi, Tianda Li
There have been two lines of approaches that can be used to further address the limitation: (1) unsupervised pretraining can leverage knowledge in much larger unstructured text data; (2) structured (often human-curated) knowledge has started to be considered in neural-network-based models for NLI.
1 code implementation • SEMEVAL 2021 • Boyuan Zheng, Xiaoyu Yang, Yu-Ping Ruan, ZhenHua Ling, Quan Liu, Si Wei, Xiaodan Zhu
Given a passage and the corresponding question, a participating system is expected to choose the correct answer from five candidates of abstract concepts in a cloze-style machine reading comprehension setup.
no code implementations • 27 May 2021 • Mingdong Fan, Robert W. Brown, Xi Gao, Soumyajit Mandal, Labros Petropoulos, Xiaoyu Yang, Shinya Handa, Hiroyuki Fujita
Non-conductive transmission solutions based on fiber-optic cables are considered to be one of the alternatives, but are limited by the high dynamic range (>80 dB) of typical MRI signals.
no code implementations • 19 Mar 2021 • Xiaoyu Yang, Huiyun Li
In dense and dynamic scenarios, planning a safe and comfortable trajectory is full of challenges when traffic participants are driving at high speed.
1 code implementation • 22 Dec 2020 • Chao-Hong Tan, Xiaoyu Yang, Zi'ou Zheng, Tianda Li, Yufei Feng, Jia-Chen Gu, Quan Liu, Dan Liu, Zhen-Hua Ling, Xiaodan Zhu
Task-oriented conversational modeling with unstructured knowledge access, track 1 of the 9th Dialogue System Technology Challenges (DSTC 9), requires building a system that generates a response given the dialogue history and knowledge access.
1 code implementation • EMNLP 2020 • Xiaoyu Yang, Feng Nie, Yufei Feng, Quan Liu, Zhigang Chen, Xiaodan Zhu
Built on that, we construct the graph attention verification networks, which are designed to fuse different sources of evidence from verbalized program execution, program structures, and the original statements and tables, to make the final verification decision.
1 code implementation • SEMEVAL 2020 • Xiaoyu Yang, Stephen Obadinma, Huasha Zhao, Qiong Zhang, Stan Matwin, Xiaodan Zhu
Subtask-1 aims to determine whether a given sentence is a counterfactual statement or not.
no code implementations • 16 Jul 2020 • Weijun Zhu, Fengyuan LU, Xiaoyu Yang, En Li
How to accurately classify and diagnose whether an individual has Coronary Stenosis (CS) without invasive physical examination?
no code implementations • 24 Feb 2020 • Ye Li, Guangqiang Yin, Chunhui Liu, Xiaoyu Yang, Zhiguo Wang
Triplet loss processes batch construction in a complicated and fussy way and converges slowly.