The development of recommender systems that optimize multi-turn interaction with users, and that model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem, has drawn increasing attention in recent years.
Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.
Identifying a short segment in a long video that semantically matches a text query is a challenging task with important potential applications in language-based video search, browsing, and navigation.
Summarization is the task of compressing source document(s) into coherent and succinct passages.
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset.
Recent progress has leveraged the ideas of pre-training (from language modeling) and attention layers in Transformers to learn representations from datasets containing images aligned with linguistic expressions that describe them.
We present a multi-level geocoding model (MLG) that learns to associate texts with geographic locations.
Many methods have been proposed to quantify the predictive uncertainty associated with the outputs of deep neural networks.
To this end, we propose BabyWalk, a new VLN agent that learns to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially.
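The decompose-and-follow idea can be sketched as below. Note this is only an illustrative skeleton: BabyWalk learns its decomposition, whereas here the split on ", then " is a hypothetical heuristic, and `execute_step` is a stand-in for the agent's navigation policy.

```python
def decompose(instruction):
    """Split a long instruction into shorter BabySteps.

    Hypothetical heuristic: split on ", then ". The actual agent
    learns this segmentation rather than using a fixed delimiter.
    """
    return [step.strip() for step in instruction.split(", then ")]


def follow(instruction, execute_step):
    """Complete BabySteps sequentially by handing each one to a
    step-level policy (here an arbitrary callable)."""
    for step in decompose(instruction):
        execute_step(step)
```

For example, `decompose("go left, then climb the stairs, then stop")` yields three BabySteps that the policy executes in order.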
Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog.
These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn.
VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture.
We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are retrieved by approximate nearest neighbor search.
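The retrieval step of such a dual-encoder setup can be sketched as follows. This is a simplified illustration: it uses exact inner-product scoring with NumPy in place of a true approximate-nearest-neighbor index, and assumes mention and entity vectors have already been produced by the two towers.

```python
import numpy as np


def retrieve(mention_vec, entity_matrix, k=5):
    """Retrieve the top-k candidate entities for a mention embedding.

    mention_vec: (d,) embedding from the mention encoder.
    entity_matrix: (n_entities, d) precomputed entity embeddings.
    Scores are inner products in the shared dense vector space;
    a production system would replace the argsort with ANN search.
    """
    scores = entity_matrix @ mention_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]
```

With four one-hot entity embeddings and a mention embedded near entity 1, `retrieve` returns entity 1 first, mirroring how candidates would be pulled from an ANN index at scale.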
We propose RecSim, a configurable platform for authoring simulation environments for recommender systems (RSs) that naturally supports sequential interaction with users.
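A toy sequential-interaction environment in the spirit of such a platform is sketched below. This is illustrative only and is not the actual RecSim API: the user model, click probability, and episode horizon are all simplified assumptions.

```python
import random


class SequentialRecEnv:
    """Toy environment for sequential user-recommender interaction.

    Each user has a latent interest per item; recommending an item
    yields a click (reward 1.0) with probability equal to that
    interest. Episodes last a fixed horizon of steps.
    """

    def __init__(self, n_items=10, horizon=5, seed=0):
        self.rng = random.Random(seed)
        self.n_items = n_items
        self.horizon = horizon
        self.interest = [self.rng.random() for _ in range(n_items)]

    def reset(self):
        self.t = 0
        return self.interest  # simplified: interests are observable

    def step(self, item):
        reward = 1.0 if self.rng.random() < self.interest[item] else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.interest, reward, done
```

An agent interacts by calling `reset()` once and then `step(item)` until `done`, the same control flow a configurable simulator exposes to RL recommender agents.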
Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals.
We address fundamental flaws in previously used metrics and show how Dynamic Time Warping (DTW), a long-known method of measuring similarity between two time series, can be used for evaluation of navigation agents.
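The core DTW computation between a reference path and an agent's path can be sketched as a standard dynamic program. This is a minimal textbook implementation using Euclidean point distances; the paper's evaluation metric builds on this idea rather than being this exact code.

```python
import numpy as np


def dtw_distance(path_a, path_b):
    """Dynamic Time Warping distance between two 2-D point sequences.

    cost[i][j] holds the minimal accumulated distance aligning the
    first i points of path_a with the first j points of path_b,
    allowing points to be matched many-to-one (time warping).
    """
    n, m = len(path_a), len(path_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(path_a[i - 1]) - np.asarray(path_b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```

Identical paths score 0, and unlike endpoint-only success metrics, the score grows smoothly with how far the agent's trajectory strays from the reference path.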
Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals.
29 May 2019 • Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Morgane Lustman, Vince Gatto, Paul Covington, Jim McFadden, Tushar Chandra, Craig Boutilier
(i) We develop SLATEQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates.
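One way to read the decomposition is that a slate's Q-value is a choice-probability-weighted sum of per-item Q-values. The sketch below assumes a softmax user-choice model and omits the no-click alternative, so it is a simplified illustration of the idea rather than the full SLATEQ algorithm.

```python
import numpy as np


def slate_q_value(item_q, choice_logits, slate):
    """Approximate slate Q-value from per-item quantities.

    item_q: dict item -> Q-value conditional on the user choosing it.
    choice_logits: dict item -> user-choice logit (softmax model,
        a simplifying assumption; the no-click option is omitted here).
    slate: list of item ids shown to the user.
    """
    logits = np.array([choice_logits[i] for i in slate])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()  # choice probability of each slate item
    qs = np.array([item_q[i] for i in slate])
    return float(probs @ qs)
```

Because the slate value factors into per-item terms, the Q-function only needs to be learned item-by-item, which is what makes RL over combinatorially many slates tractable.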
We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths.
Despite the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy.