Scene-Aware Dialogue
4 papers with code • 1 benchmark • 1 dataset
Latest papers
An Embodied Generalist Agent in 3D World
However, several significant challenges remain: (i) most of these models rely on 2D images yet exhibit a limited capacity for 3D input; (ii) these models rarely explore the tasks inherently defined in 3D world, e.g., 3D grounding, embodied reasoning and acting.
Maintaining Common Ground in Dynamic Environments
Common grounding is the process of creating and maintaining mutual understandings, which is a critical aspect of sophisticated human communication.
A Simple Baseline for Audio-Visual Scene-Aware Dialog
The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems.
Audio-Visual Scene-Aware Dialog
We introduce the task of scene-aware dialog.