
Embodied Language Grounding with Implicit 3D Visual Feature Representations

Consider the utterance "the tomato is to the left of the pot." Humans can answer numerous questions about the situation described, as well as reason through counterfactuals and alternatives, such as, "is the pot larger than the tomato?" ...
