no code implementations • 24 Mar 2024 • Yunlong Tang, Daiki Shimada, Jing Bi, Chenliang Xu
In everyday communication, humans frequently use speech and gestures to refer to specific areas or objects, a process known as Referential Dialogue (RD).
no code implementations • 1 Jun 2021 • Tokuhiro Nishikawa, Daiki Shimada, Jerry Jun Yokono
Although several works have addressed audio-visual sound source localization in unconstrained videos, no datasets or metrics have been proposed in the literature to quantitatively evaluate its performance.