Grounded language learning
22 papers with code • 0 benchmarks • 1 dataset
Acquire the meaning of language in situated environments.
These leaderboards are used to track progress in grounded language learning.
Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of current learning methods, reaching this goal may require substantial research effort.
Most existing methods employ a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens.
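The joint modeling idea can be sketched as follows: both modalities are tagged with a modality-type embedding and concatenated into a single token sequence, which a transformer encoder would then attend over. This is a minimal, illustrative sketch; all names and dimensions are hypothetical, not drawn from any specific model.

```python
def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def build_joint_sequence(region_features, word_embeddings,
                         visual_type, text_type):
    """Tag each token with its modality embedding and concatenate.

    A transformer encoder would attend over the returned sequence,
    letting visual and word tokens interact in every layer.
    """
    visual_tokens = [add_vectors(v, visual_type) for v in region_features]
    word_tokens = [add_vectors(w, text_type) for w in word_embeddings]
    return visual_tokens + word_tokens

# Toy example with 2-dimensional embeddings:
regions = [[0.5, 0.1], [0.3, 0.9]]   # two region-based image features
words = [[1.0, 0.0], [0.0, 1.0]]     # two word embeddings
vis_type = [0.0, 0.1]                # modality embedding for vision
txt_type = [0.1, 0.0]                # modality embedding for text

seq = build_joint_sequence(regions, words, vis_type, txt_type)
print(len(seq))  # 4 tokens in the joint sequence
```

In real systems the region features come from an object detector and the projection into the shared space is learned; the sketch only shows how the two token streams end up in one sequence.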
To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task, which simulates the continual acquisition of compositional phrases from streaming visual scenes.
Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate linguistic symbols to emergent perceptual representations of its physical surroundings and to pertinent sequences of actions.
Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game
Building intelligent agents that can communicate with and learn from humans in natural language is of great value.
Our goal is to develop a model that can learn to follow new instructions given prior instruction-perception-action examples.
Recent work has shown how to learn better visual-semantic embeddings by leveraging image descriptions in more than one language.
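A common objective in this line of work is a margin-based ranking loss that pulls each caption toward its image and away from a mismatched image; with multiple languages, the loss is simply accumulated over captions in every language, so all languages shape one shared visual-semantic space. The sketch below assumes toy hand-set embeddings and hypothetical function names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def ranking_loss(image, caption, negative_image, margin=0.2):
    """Hinge loss: the true image should outscore a negative by `margin`."""
    return max(0.0,
               margin - cosine(image, caption) + cosine(negative_image, caption))

def multilingual_loss(image, captions, negative_image):
    """Sum the ranking loss over captions in every available language."""
    return sum(ranking_loss(image, c, negative_image) for c in captions)

# Toy embeddings: one image, captions in two languages, one negative image.
img = [1.0, 0.0]
neg = [0.0, 1.0]
caps = {"en": [0.9, 0.1], "de": [0.8, 0.2]}

loss = multilingual_loss(img, caps.values(), neg)
```

Here both captions already sit close to their image, so the hinge is inactive; a misaligned caption would contribute a positive loss and push the embeddings apart during training.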
Previous work on grounded language learning did not fully capture the semantics underlying the correspondences between structured world state representations and texts, especially those between numerical values and lexical terms.