Scene-Aware Dialogue
4 papers with code • 1 benchmark • 1 dataset
Most implemented papers
Audio-Visual Scene-Aware Dialog
We introduce the task of scene-aware dialog.
A Simple Baseline for Audio-Visual Scene-Aware Dialog
The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems.
Maintaining Common Ground in Dynamic Environments
Common grounding is the process of creating and maintaining mutual understandings, which is a critical aspect of sophisticated human communication.
An Embodied Generalist Agent in 3D World
However, several significant challenges remain: (i) most of these models rely on 2D images yet exhibit a limited capacity for 3D input; (ii) these models rarely explore the tasks inherently defined in 3D world, e.g., 3D grounding, embodied reasoning and acting.