1 code implementation • 28 Feb 2025 • Jiaqi Bai, Hongcheng Guo, Zhongyuan Peng, Jian Yang, Zhoujun Li, Mohan Li, Zhihong Tian
Furthermore, we propose an entropy-based noise-controlling strategy that adaptively constrains the injected noise according to the smoothness of the similarity distribution.
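A minimal sketch of what such entropy-based control could look like (the function and scaling rule below are illustrative assumptions, not the paper's implementation): the entropy of the softmax-normalised similarity distribution measures its smoothness, and the injected noise is scaled accordingly.

```python
import numpy as np

def entropy_scaled_noise(similarities: np.ndarray, base_scale: float = 0.1) -> np.ndarray:
    """Scale injected Gaussian noise by the entropy (smoothness) of a similarity distribution.

    A flat (smooth) distribution has high entropy and tolerates more noise;
    a sharply peaked distribution has low entropy, so the noise is attenuated.
    """
    probs = np.exp(similarities - similarities.max())
    probs /= probs.sum()                      # softmax over similarity scores
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    max_entropy = np.log(len(probs))          # entropy of the uniform distribution
    scale = base_scale * entropy / max_entropy
    return np.random.normal(0.0, scale, size=similarities.shape)
```

Normalising by the maximum (uniform) entropy keeps the noise scale within [0, base_scale] regardless of how many candidates the distribution covers.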
no code implementations • 18 Feb 2025 • Wenpeng Xing, Minghao Li, Mohan Li, Meng Han
Embodied AI systems, including robots and autonomous vehicles, are increasingly integrated into real-world applications, where they encounter a range of vulnerabilities stemming from both environmental and system-level factors.
1 code implementation • 16 Jan 2025 • Yixiao Xu, Binxing Fang, Rui Wang, Yinghai Zhou, Shouling Ji, YuAn Liu, Mohan Li, Zhihong Tian
Guided by the model, we further introduce: (1) a similarity-based training-free watermarking method for plug-and-play and flexible watermarking, and (2) a distribution-based multi-step watermark information transmission strategy for robust watermarking.
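Purely as an illustration of a similarity-based, training-free check (the embedding source, trigger set, and threshold are hypothetical), verification can reduce to a thresholded cosine similarity between a suspect model's outputs on trigger inputs and the stored watermark signatures:

```python
import numpy as np

def verify_watermark(suspect_embeddings: np.ndarray,
                     signature_embeddings: np.ndarray,
                     threshold: float = 0.8) -> bool:
    """Training-free check: declare a watermark match if the mean cosine
    similarity between suspect outputs and stored signatures exceeds a threshold."""
    a = suspect_embeddings / np.linalg.norm(suspect_embeddings, axis=1, keepdims=True)
    b = signature_embeddings / np.linalg.norm(signature_embeddings, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1))) >= threshold
```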
no code implementations • 7 Jan 2025 • Mohan Li, Martin Gjoreski, Pietro Barbiero, Gašper Slapničar, Mitja Luštrek, Nicholas D. Lane, Marc Langheinrich
However, its reliance on detailed and often privacy-sensitive data as the basis for its machine learning (ML) models raises significant legal and ethical concerns.
no code implementations • 29 Aug 2024 • Mohan Li, Cong-Thanh Do, Simon Keizer, Youmna Farag, Svetlana Stoyanchev, Rama Doddipatla
Speech large language models (speech-LLMs) integrate speech and text-based foundation models to provide a unified framework for handling a wide range of downstream tasks.
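One common way such integration is realised (a generic sketch, not necessarily this paper's architecture) is to project speech-encoder features into the LLM's token-embedding space so that speech and text share a single input sequence:

```python
import torch
import torch.nn as nn

class SpeechLLMBridge(nn.Module):
    """Generic speech-LLM glue: map speech-encoder features into the
    token-embedding space of a text LLM (dimensions are placeholders)."""
    def __init__(self, speech_dim: int = 512, llm_dim: int = 4096):
        super().__init__()
        self.projector = nn.Linear(speech_dim, llm_dim)

    def forward(self, speech_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend projected speech frames to the text token embeddings,
        # forming one sequence the LLM can attend over.
        return torch.cat([self.projector(speech_feats), text_embeds], dim=1)
```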
no code implementations • 21 Jun 2024 • Mohan Li, Simon Keizer, Rama Doddipatla
The system is efficiently trained with prefix-tuning, optimising a minimal set of parameters rather than the entire Whisper model.
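A minimal sketch of the prefix-tuning idea in generic PyTorch (not the authors' code; the dimensions and initialisation are placeholders): the base model is frozen, and only a small learnable prefix, prepended to the decoder inputs, receives gradients.

```python
import torch
import torch.nn as nn

class PrefixTuner(nn.Module):
    """Wrap a frozen encoder-decoder model with a small trainable prefix."""
    def __init__(self, base_model: nn.Module, prefix_len: int = 16, d_model: int = 768):
        super().__init__()
        self.base_model = base_model
        for p in self.base_model.parameters():
            p.requires_grad_(False)           # freeze the entire base model
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def prepend_prefix(self, decoder_embeds: torch.Tensor) -> torch.Tensor:
        batch = decoder_embeds.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, decoder_embeds], dim=1)
```

Only `self.prefix` enters the optimiser's parameter list, which is what keeps the method lightweight compared with full fine-tuning.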
no code implementations • CVPR 2024 • Chao Zhang, Mohan Li, Ignas Budvytis, Stephan Liwicki
However, most existing works in embodied dialog research focus on navigation and leave the localization task understudied.
no code implementations • 6 Jan 2024 • Yue Chen, Mohan Li
Motivated by this, we conduct an attribution analysis based on the general framework of their model to further demonstrate the importance of economic factors and to identify which specific factors are significant.
no code implementations • 24 Apr 2023 • Mohan Li, Rama Doddipatla, Catalin Zorila
In previous works, latency was optimised by truncating the online attention weights based on the hard alignments obtained from conventional ASR models, without taking into account the potential loss of ASR accuracy.
Automatic Speech Recognition (ASR)
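For illustration, truncating online attention at a hard-alignment boundary might look as follows (the boundary source and the renormalisation are assumptions): frames beyond the aligned end point are masked out before the softmax, trading context for latency.

```python
import numpy as np

def truncate_attention(scores: np.ndarray, boundary: int) -> np.ndarray:
    """Zero out attention beyond a hard-alignment boundary, then renormalise.

    scores: raw (float) attention scores over encoder frames, 1-D.
    boundary: last frame index (exclusive) permitted by the hard alignment;
    assumed >= 1.
    """
    masked = np.full_like(scores, -np.inf)
    masked[:boundary] = scores[:boundary]
    weights = np.exp(masked - masked[:boundary].max())   # exp(-inf) -> 0
    return weights / weights.sum()
```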
no code implementations • 21 Apr 2023 • Mohan Li, Rama Doddipatla
This paper presents the use of non-autoregressive (NAR) approaches for joint automatic speech recognition (ASR) and spoken language understanding (SLU) tasks.
Automatic Speech Recognition (ASR)
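As a generic sketch of non-autoregressive decoding (a mask-predict-style loop, offered only as an illustration and not necessarily the method used in the paper): all output positions are predicted in parallel, and low-confidence tokens are re-masked and re-predicted over a fixed number of passes.

```python
import numpy as np

def nar_decode(predict_fn, length: int, iters: int = 3, mask_id: int = 0):
    """Iterative parallel decoding: predict_fn maps a token sequence to
    per-position probabilities of shape (length, vocab)."""
    tokens = np.full(length, mask_id)
    for it in range(iters):
        probs = predict_fn(tokens)                     # predict all positions at once
        tokens = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        n_mask = int(length * (1 - (it + 1) / iters))  # re-mask fewer tokens each pass
        if n_mask > 0:
            tokens[np.argsort(conf)[:n_mask]] = mask_id
    return tokens
```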
no code implementations • 29 Jul 2022 • Cong-Thanh Do, Mohan Li, Rama Doddipatla
The multiple-hypothesis approach yields a relative reduction of 3.3% WER on the CHiME-4's single-channel real noisy evaluation set when compared with the single-hypothesis approach.
Automatic Speech Recognition (ASR)
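For clarity, the relative reduction is measured against the baseline WER; the absolute numbers in the example below are made up purely to illustrate the arithmetic.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction = (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical example: a drop from 10.0% to 9.67% WER is a ~3.3% relative reduction.
print(f"{relative_wer_reduction(10.0, 9.67):.1%}")   # -> 3.3%
```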
no code implementations • 11 Mar 2022 • Mohan Li, Shucong Zhang, Catalin Zorila, Rama Doddipatla
In this paper, we propose an online attention mechanism, known as cumulative attention (CA), for streaming Transformer-based automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
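A rough sketch of the cumulative, trigger-style idea (the halting rule and the uniform pooling below are simplifying assumptions, not the paper's exact formulation): per-frame scores accumulate until a threshold fires, at which point attention is computed over the frames seen so far.

```python
import numpy as np

def cumulative_attention(halting_scores: np.ndarray, values: np.ndarray,
                         threshold: float = 1.0) -> np.ndarray:
    """Accumulate per-frame halting scores; once the running sum crosses the
    threshold, attend (uniformly here, for simplicity) over frames seen so far."""
    total = 0.0
    for t, score in enumerate(halting_scores):
        total += score
        if total >= threshold:
            return values[: t + 1].mean(axis=0)   # context for one output token
    return values.mean(axis=0)                    # fall back to the full sequence
```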
no code implementations • 26 Apr 2021 • Mohan Li, Catalin Zorila, Rama Doddipatla
Online Transformer-based automatic speech recognition (ASR) systems have been extensively studied due to the increasing demand for streaming applications.
Automatic Speech Recognition (ASR)
no code implementations • 30 Aug 2018 • Mohan Li, Min Liu, Masanori Hattori
In this paper, we present the Adaptive Computation Steps (ACS) algorithm, which enables end-to-end speech recognition models to dynamically decide how many frames should be processed to predict a linguistic output.
Ranked #18 on Speech Recognition on AISHELL-1
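The core idea admits a short sketch (the emit-probability source and the threshold are placeholders for a learned predictor): frames are consumed until enough acoustic evidence accumulates, and each variable-length segment yields one output token.

```python
import numpy as np

def adaptive_computation_steps(emit_probs: np.ndarray, threshold: float = 0.5):
    """Split frames into variable-length segments, one per linguistic output.

    emit_probs: per-frame probability that enough acoustic evidence has been
    seen to emit the next output (stand-in for a learned halting predictor).
    """
    segments, start = [], 0
    for t, p in enumerate(emit_probs):
        if p >= threshold:                 # decide to emit after frame t
            segments.append((start, t + 1))
            start = t + 1
    return segments                        # trailing frames emit nothing in this sketch
```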