We make multiple contributions to initiate research on this task.
In recent years, we have witnessed a surge of Graph Neural Networks (GNNs), most of which learn powerful representations in an end-to-end fashion and have achieved great success in many real-world applications.
Instead of planning with the expensive MCTS, we use the learned model to construct an advantage estimation based on a one-step rollout.
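The one-step rollout idea can be sketched as follows. This is an illustrative toy, not the paper's implementation: the dynamics, reward, and value models here are hypothetical stand-ins for learned networks.

```python
# Hypothetical learned components: a dynamics model f, a reward model r,
# and a value function V. All definitions are toy stand-ins for the sketch.
def dynamics(state, action):          # learned model f(s, a) -> s'
    return state + action             # toy linear dynamics

def reward(state, action):            # learned reward model r(s, a)
    return -abs(state + action)       # toy reward: stay near zero

def value(state):                     # learned value estimate V(s)
    return -abs(state)

def one_step_advantage(state, action, gamma=0.99):
    """Advantage estimate from a single rollout step of the learned model:
    A(s, a) ~= r(s, a) + gamma * V(f(s, a)) - V(s)."""
    next_state = dynamics(state, action)
    return reward(state, action) + gamma * value(next_state) - value(state)

# Compare two actions at state s = 2.0: moving toward zero scores higher.
adv_toward = one_step_advantage(2.0, -1.0)
adv_away = one_step_advantage(2.0, +1.0)
```

Because the rollout is only one step, the estimate avoids the cost of a full MCTS search while still exploiting the learned model.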
Natural language generally describes objects and spatial relations with compositionality and ambiguity, two major obstacles to effective language grounding.
This ignores the context of entity pairs and the label correlations between the none class and pre-defined classes, leading to sub-optimal predictions.
Individualization refers to artificially distinguishing a node in the graph and refinement is the propagation of this information to other nodes through message passing.
This paper exemplifies the integration of entropic regularized optimal transport techniques as a layer in a deep reinforcement learning network.
To solve these issues, we present Context Hierarchy IRL (CHIRL), a new IRL algorithm that exploits the context to scale up IRL and learn reward functions of complex behaviors.
Depending upon the smoothness of the action-value function, one approach to overcoming this issue is through online learning, where information is interpolated among similar states; Policy Gradient Search provides a practical algorithm to achieve this.
These layers rely on Gaussian dropouts and are inserted in between the layers of the deep neural network model to help facilitate variational Thompson sampling.
Message passing Graph Neural Networks (GNNs) are known to be limited in expressive power by the 1-WL color-refinement test for graph isomorphism.
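The 1-WL test referenced above can be sketched in a few lines: nodes start with a uniform color, and each round every node's new color is a hash of its own color together with the multiset of its neighbors' colors, mirroring one round of GNN message passing. This is a minimal illustration, not tied to any particular paper's code.

```python
from collections import Counter

def wl_signature(adj, rounds=3):
    """Run 1-WL color refinement and return the final color histogram."""
    colors = {v: 0 for v in adj}          # uniform initial coloring
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())       # histogram acts as a graph signature

# Two non-isomorphic graphs that 1-WL cannot distinguish:
# a 6-cycle vs. two disjoint triangles (both are 2-regular on 6 nodes).
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
same_signature = wl_signature(cycle6) == wl_signature(two_triangles)  # True
```

Since every node in both graphs sees the same local color pattern at every round, the histograms coincide, which is exactly the expressiveness limit that message-passing GNNs inherit.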
Ensembles and auxiliary tasks are both well known to improve the performance of machine learning models when data is limited.
Manipulating deformable objects, such as ropes and clothing, is a long-standing challenge in robotics, because of their large degrees of freedom, complex non-linear dynamics, and self-occlusion in visual perception.
We derive a variational Thompson sampling approximation for DQNs which uses a deep network whose parameters are perturbed by a learned variational noise distribution.
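A minimal sketch of the weight-perturbation idea, assuming multiplicative Gaussian noise on the Q-network weights (the network, noise parameterization, and scales below are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Q-network: weights W map 4 state features to 2 action values.
# Each forward pass perturbs W by learned Gaussian noise, W * (1 + sigma*eps),
# so acting greedily on the sampled network performs a Thompson-sampling step.
W = rng.normal(size=(4, 2))
log_sigma = np.full_like(W, -2.0)    # noise scale (learned in practice; fixed here)

def sample_q_values(state):
    """Draw one weight sample from the variational distribution, then score actions."""
    eps = rng.normal(size=W.shape)
    W_sample = W * (1.0 + np.exp(log_sigma) * eps)   # Gaussian-dropout-style noise
    return state @ W_sample

state = np.ones(4)
q1, q2 = sample_q_values(state), sample_q_values(state)
# Different noise draws yield different Q-values, which drives exploration.
```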
Designing a network to learn a molecule structure given its physical/chemical properties is a hard problem, but is useful for drug discovery tasks.
This paper presents Contrastive Variational Reinforcement Learning (CVRL), a model-based method that tackles complex visual observations in DRL.
In this work, we study performance degradation of GCNs by experimentally examining how stacking only TRANs or PROPs works.
The particle filter maintains a belief using a learned discriminative update, which is trained end-to-end for decision making.
We address the problem of Visual Relationship Detection (VRD) which aims to describe the relationships between pairs of objects in the form of triplets of (subject, predicate, object).
Aspect-based sentiment analysis produces a list of aspect terms and their corresponding sentiments for a natural language sentence.
LeTS-Drive leverages the robustness of planning and the runtime efficiency of learning to enhance the performance of both.
This paper introduces the Differentiable Algorithm Network (DAN), a composable architecture for robot learning systems.
Our key observation is that experience can be directly generalized over target contexts.
We consider the cross-domain sentiment classification problem, where a sentiment classifier learned from a source domain must generalize to a target domain.
First, we propose a method for target representation that better captures the semantic meaning of the opinion target.
We propose to take a novel approach to robot system design where each building block of a larger system is represented as a differentiable program, i.e., a deep neural network.
This paper introduces Push-Net, a deep recurrent neural network model, which enables a robot to push objects of unknown physical properties for re-positioning and re-orientation, using only visual camera images as input.
Attention-based long short-term memory (LSTM) networks have proven to be useful in aspect-level sentiment classification.
Our planning system combines a POMDP algorithm with the pedestrian motion model and runs in near real time.
Particle filtering is a powerful approach to sequential state estimation and finds application in many domains, including robot localization and object tracking.
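The classical bootstrap filter can be sketched for a 1-D localization toy problem. Everything below (motion model, noise scales, the standing-still robot) is an illustrative assumption, not drawn from any particular system.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, weights, control, observation, obs_noise=0.5):
    """One bootstrap-filter cycle: predict with the motion model, reweight by
    the observation likelihood, then resample in proportion to the weights."""
    # Predict: move each particle under the control, adding process noise.
    particles = particles + control + rng.normal(0.0, 0.1, size=particles.shape)
    # Update: Gaussian likelihood of the observation given each particle.
    weights = weights * np.exp(-0.5 * ((observation - particles) / obs_noise) ** 2)
    weights = weights / weights.sum()
    # Resample: draw particles with probability proportional to weight.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles = rng.uniform(-10, 10, size=1000)      # uninformed initial belief
weights = np.full(1000, 1.0 / 1000)
true_pos = 0.0
for _ in range(20):                              # robot stands still and observes
    obs = true_pos + rng.normal(0.0, 0.5)
    particles, weights = particle_filter_step(particles, weights, 0.0, obs)
estimate = particles.mean()                      # belief concentrates near 0
```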
Planning under uncertainty is critical for robust robot performance in uncertain, dynamic environments, but it incurs high computational cost.
How can a delivery robot navigate reliably to a destination in a new office building, with minimal prior information?
Unlike topic models which typically assume independently generated words, word embedding models encourage words that appear in similar contexts to be located close to each other in the embedding space.
It is a recurrent policy network, but it represents a policy for a parameterized set of tasks by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in a network learning architecture.
Scarce data is a major challenge to scaling robot learning to truly complex tasks, as we need to generalize locally learned policies over different "contexts".
We show that the best policy obtained from a DESPOT is near-optimal, with a regret bound that depends on the representation size of the optimal policy.
We show that a POMDP-lite is equivalent to a set of fully observable Markov decision processes indexed by a hidden parameter and is useful for modeling a variety of interesting robotic tasks.
For the representer theorem to hold, the linear functionals are required to be bounded in the RKHS, and we show that this is true for a variety of commonly used RKHS and invariances.
We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses.
Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in a model and represents uncertainty in model parameters by maintaining a probability distribution over them.
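As a toy illustration of maintaining a distribution over model parameters, consider a single unknown Bernoulli success probability with a conjugate Beta prior; the outcomes and prior below are made up for the sketch, not taken from the paper.

```python
# Beta-Bernoulli posterior over an unknown success probability p.
alpha, beta = 1.0, 1.0            # Beta(1, 1): uniform prior over p

outcomes = [1, 1, 0, 1, 1, 1]     # observed successes/failures of an action
for o in outcomes:
    alpha += o                    # conjugate update: count successes ...
    beta += 1 - o                 # ... and failures

posterior_mean = alpha / (alpha + beta)   # (1 + 5) / (6 + 2) = 0.75
```

A BRL agent would plan against this evolving posterior, so its actions trade off exploiting the current estimate of p against gathering observations that sharpen it.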