Search Results for author: Xiaoyu Yang

Found 45 papers, 20 papers with code

Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning

no code implementations · 19 May 2025 · Xiaoyu Yang, Jie Lu, En Yu

This paper uncovers a critical yet overlooked phenomenon in multi-modal large language models (MLLMs): detrimental concept drift within chain-of-thought (CoT) reasoning during non-stationary reinforcement fine-tuning (RFT), where reasoning token distributions evolve unpredictably, thereby introducing significant biases in final predictions.

Tasks: counterfactual, Counterfactual Reasoning

Learning Robust Spectral Dynamics for Temporal Domain Generalization

no code implementations · 19 May 2025 · En Yu, Jie Lu, Xiaoyu Yang, Guangquan Zhang, Zhen Fang

Modern machine learning models struggle to maintain performance in dynamic environments where temporal distribution shifts, i.e., concept drift, are prevalent.

Tasks: Domain Generalization

Explicit Uncertainty Modeling for Video Watch Time Prediction

no code implementations · 10 Apr 2025 · Shanshan Wu, Shuchang Liu, Shuai Zhang, Xiaoyu Yang, Xiang Li, Lantao Hu, Han Li

To improve the prediction accuracy for such an uncertain behavior, existing approaches show that one can either reduce the noise through duration bias modeling or formulate a distribution modeling task to capture the uncertainty.

Tasks: Prediction

Rolling with the Punches: Resilient Contrastive Pre-training under Non-Stationary Drift

no code implementations · 11 Feb 2025 · Xiaoyu Yang, Jie Lu, En Yu

A critical emerging challenge is the effective pre-training of models on dynamic data streams characterized by concept drift, unpredictable changes in the underlying data distribution.

Tasks: Causal Inference, Contrastive Learning

AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges

no code implementations · 24 Jan 2025 · Guangjin Pan, Yuan Gao, Yilin Gao, Zhiyong Zhong, Xiaoyu Yang, Xinyu Guo, Shugong Xu

Based on the AI/ML-assisted positioning and direct AI/ML positioning schemes outlined in the standards, we conduct an in-depth investigation of related research.

Tasks: Autonomous Driving

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation

no code implementations · 27 Nov 2024 · Wenyi Yu, Siyin Wang, Xiaoyu Yang, Xianzhao Chen, Xiaohai Tian, Jun Zhang, Guangzhi Sun, Lu Lu, Yuxuan Wang, Chao Zhang

Unlike traditional modularised conversational AI systems, which separate speech recognition, understanding, and text-to-speech generation into distinct components, multimodal LLMs operate as single end-to-end models.

Tasks: Question Answering, Speech Enhancement, and 4 more

Masked Image Contrastive Learning for Efficient Visual Conceptual Pre-training

no code implementations · 15 Nov 2024 · Xiaoyu Yang, Lijian Xu

This paper proposes a scalable and straightforward pre-training paradigm for efficient visual conceptual representation called masked image contrastive learning (MiCL).

Tasks: Contrastive Learning, Image Reconstruction

CR-CTC: Consistency regularization on CTC for improved speech recognition

1 code implementation · 7 Oct 2024 · Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey

Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency.

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), and 3 more

MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events

no code implementations · 25 Sep 2024 · Xiaoyu Yang, Qiujia Li, Chao Zhang, Phil Woodland

In this work, MT2KD, a novel two-stage multi-task learning framework, is proposed to build a general-purpose speech and audio encoder that jointly performs three fundamental tasks: automatic speech recognition (ASR), audio tagging (AT) and speaker verification (SV).

Tasks: Audio Tagging, Automatic Speech Recognition, and 5 more

Interference Management in MIMO-ISAC Systems: A Transceiver Design Approach

no code implementations · 7 Jul 2024 · Yangyang Niu, Zhiqing Wei, Dingyou Ma, Xiaoyu Yang, Huici Wu, Zhiyong Feng, Jianhua Yuan

The integrated sensing and communication (ISAC) system under a multi-input multi-output (MIMO) architecture achieves the dual functions of sensing and communication on the same platform by exploiting spatial gain, offering a feasible paradigm for coping with spectrum congestion.

Tasks: Integrated sensing and communication, ISAC, and 1 more

Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards

no code implementations · 22 May 2024 · Xiaoyu Yang, Jie Lu, En Yu

Concept drift mainly includes gradual drift due to long-tailed data and sudden drift from Out-Of-Distribution (OOD) data, both of which have increasingly drawn the attention of the research community.

Tasks: Language Modeling, Language Modelling, and 1 more

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

2 code implementations · 15 Sep 2023 · Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey

In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox.

PromptASR for contextualized ASR with controllable style

2 code implementations · 14 Sep 2023 · Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey

An additional style prompt can be given to the text encoder to guide the ASR system to output transcriptions in different styles.

Tasks: Automatic Speech Recognition, speech-recognition, and 1 more

Symbol-level Integrated Sensing and Communication enabled Multiple Base Stations Cooperative Sensing

no code implementations · 13 Aug 2023 · Zhiqing Wei, Ruizhong Xu, Zhiyong Feng, Huici Wu, Ning Zhang, Wangjun Jiang, Xiaoyu Yang

This work may provide a guideline for the design of multi-BS cooperative sensing system to exploit the widely deployed networked mobile communication system.

Tasks: Integrated sensing and communication, ISAC

Blank-regularized CTC for Frame Skipping in Neural Transducer

1 code implementation · 19 May 2023 · Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey

Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems.

Tasks: Automatic Speech Recognition, speech-recognition, and 1 more

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition

no code implementations · 20 Mar 2023 · Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland

The performance of the student model can be further enhanced when multiple teachers are used jointly, achieving word error rate reductions (WERRs) of 17.5% and 10.6%.

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), and 3 more

Intelligent Reflecting Surface assisted Integrated Sensing and Communication System

no code implementations · 11 Nov 2022 · Zhiqing Wei, Xinyi Yang, Chunwei Meng, Xiaoyu Yang, Kaifeng Han, Chen Qiu, Huici Wu

This paper proves the efficiency of the IRS-enabled ISAC system, which motivates deploying IRS to enhance the sensing capability of ISAC systems.

Tasks: Integrated sensing and communication, ISAC

Fast and parallel decoding for transducer

1 code implementation · 31 Oct 2022 · Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey

In this work, we introduce a constrained version of transducer loss to learn strictly monotonic alignments between the sequences; we also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step in transducer decoding, making it more efficient to decode in parallel with batches.

Tasks: speech-recognition, Speech Recognition

Delay-penalized transducer for low-latency streaming ASR

1 code implementation · 31 Oct 2022 · Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey

In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy.

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), and 1 more

Two-Stage is Enough: A Concise Deep Unfolding Reconstruction Network for Flexible Video Compressive Sensing

1 code implementation · 15 Jan 2022 · Siming Zheng, Xiaoyu Yang, Xin Yuan

We consider the reconstruction problem of video compressive sensing (VCS) under the deep unfolding/rolling structure.

Tasks: Compressive Sensing, Demosaicking, and 1 more

Interpretable and Effective Reinforcement Learning for Attacking against Graph-based Rumor Detection

no code implementations · 15 Jan 2022 · Yuefei Lyu, Xiaoyu Yang, Jiaxin Liu, Philip S. Yu, Sihong Xie, Xi Zhang

To discover subtle vulnerabilities, we design a powerful attacking algorithm to camouflage rumors in social networks based on reinforcement learning that can interact with and attack any black-box detectors.

Tasks: reinforcement-learning, Reinforcement Learning (RL)

Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-trained Models

no code implementations · 7 Oct 2021 · Xiaoyu Yang, Qiujia Li, Philip C. Woodland

Self-supervised pre-training is an effective approach to leveraging a large amount of unlabelled data to reduce word error rates (WERs) of automatic speech recognition (ASR) systems.

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), and 3 more

Exploring Decomposition for Table-based Fact Verification

1 code implementation · Findings (EMNLP) 2021 · Xiaoyu Yang, Xiaodan Zhu

Fact verification based on structured data is challenging as it requires models to understand both natural language and symbolic operations performed over tables.

Tasks: Fact Verification, Table-based Fact Verification

Unsupervised Pre-training with Structured Knowledge for Improving Natural Language Inference

no code implementations · 8 Sep 2021 · Xiaoyu Yang, Xiaodan Zhu, Zhan Shi, Tianda Li

There have been two lines of approaches that can be used to further address the limitation: (1) unsupervised pretraining can leverage knowledge in much larger unstructured text data; (2) structured (often human-curated) knowledge has started to be considered in neural-network-based models for NLI.

Tasks: Natural Language Inference, Sentence, and 2 more

SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning

1 code implementation · SEMEVAL 2021 · Boyuan Zheng, Xiaoyu Yang, Yu-Ping Ruan, ZhenHua Ling, Quan Liu, Si Wei, Xiaodan Zhu

Given a passage and the corresponding question, a participating system is expected to choose the correct answer from five candidates of abstract concepts in a cloze-style machine reading comprehension setup.

Tasks: Machine Reading Comprehension

A High-Dynamic-Range Digital RF-Over-Fiber Link for MRI Receive Coils Using Delta-Sigma Modulation

no code implementations · 27 May 2021 · Mingdong Fan, Robert W. Brown, Xi Gao, Soumyajit Mandal, Labros Petropoulos, Xiaoyu Yang, Shinya Handa, Hiroyuki Fujita

Non-conductive transmission solutions based on fiber-optic cables are considered one of the alternatives, but are limited by the high dynamic range (>80 dB) of typical MRI signals.

IA Planner: Motion Planning Using Instantaneous Analysis for Autonomous Vehicle in the Dense Dynamic Scenarios on Highways

no code implementations · 19 Mar 2021 · Xiaoyu Yang, Huiyun Li

In dense and dynamic scenarios, planning a safe and comfortable trajectory is full of challenges when traffic participants are driving at high speed.

Tasks: Motion Planning, Trajectory Planning

Learning to Retrieve Entity-Aware Knowledge and Generate Responses with Copy Mechanism for Task-Oriented Dialogue Systems

1 code implementation · 22 Dec 2020 · Chao-Hong Tan, Xiaoyu Yang, Zi'ou Zheng, Tianda Li, Yufei Feng, Jia-Chen Gu, Quan Liu, Dan Liu, Zhen-Hua Ling, Xiaodan Zhu

Task-oriented conversational modeling with unstructured knowledge access, track 1 of the 9th Dialog System Technology Challenge (DSTC 9), requires building a system that generates a response given the dialogue history and access to knowledge.

Tasks: Response Generation, Task-Oriented Dialogue Systems

Program Enhanced Fact Verification with Verbalization and Graph Attention Network

1 code implementation · EMNLP 2020 · Xiaoyu Yang, Feng Nie, Yufei Feng, Quan Liu, Zhigang Chen, Xiaodan Zhu

Built on that, we construct the graph attention verification networks, which are designed to fuse different sources of evidence from verbalized program execution, program structures, and the original statements and tables, to make the final verification decision.

Tasks: Fact Verification, Graph Attention

Auxiliary Diagnosing Coronary Stenosis Using Machine Learning

no code implementations · 16 Jul 2020 · Weijun Zhu, Fengyuan LU, Xiaoyu Yang, En Li

How to accurately classify and diagnose whether an individual has Coronary Stenosis (CS) without invasive physical examination?

Tasks: BIG-bench Machine Learning
