1 code implementation • 11 Jun 2024 • Payal Mohapatra, Shamika Likhite, Subrata Biswas, Bashima Islam, Qi Zhu
In experiments across five disfluency-detection tasks, our unified multimodal approach significantly outperforms audio-only unimodal methods. It yields an average absolute improvement of 10% (i.e., a 10-percentage-point increase) when both the video and audio modalities are always available, and of 7 percentage points even when the video modality is missing in half of the samples.