Search Results for author: Yingzhi Wang

Found 7 papers, 3 papers with code

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

no code implementations · 26 May 2025 · Pooneh Mousavi, Yingzhi Wang, Mirco Ravanelli, Cem Subakan

A key consideration for these models is the cross-modal alignment between the text and audio modalities, which indicates whether the LLM can associate semantic meaning with audio segments.

cross-modal alignment · Emotion Recognition · +2

Open Universal Arabic ASR Leaderboard

1 code implementation · 18 Dec 2024 · Yingzhi Wang, Anas Alhmoud, Muhammad Alqurishi

In recent years, the enhanced capabilities of ASR models and the emergence of multi-dialect datasets have increasingly pushed Arabic ASR development toward an all-dialects-in-one direction.

Benchmarking

What Are They Doing? Joint Audio-Speech Co-Reasoning

1 code implementation · 22 Sep 2024 · Yingzhi Wang, Pooneh Mousavi, Artem Ploujnikov, Mirco Ravanelli

In audio and speech processing, tasks usually focus on either the audio or the speech modality, even when both environmental sounds and human speech are present in the same audio clip.
