Text-based games (TGs) are exciting testbeds for developing deep reinforcement learning techniques due to their partially observed environments and large action spaces.
Offline goal-conditioned RL (GCRL) offers a feasible paradigm to learn general-purpose policies from diverse and multi-task offline datasets.
Dialogue state tracking (DST) plays an important role in task-oriented dialogue systems.
Furthermore, we propose distilling the rewriting capabilities of LLMs into smaller models to reduce rewriting latency.
Although large language models (LLMs) are impressive in solving various tasks, they can quickly be outdated after deployment.
Data-centric AI, with its primary focus on the collection, management, and utilization of data to drive AI models and applications, has attracted increasing attention in recent years.
We posit that our research marks a significant paradigm shift in human mobility modelling, transitioning from building complex domain-specific models to harnessing general-purpose LLMs that yield accurate predictions through language instructions.
In ERC, we propose a regularizer that guides the approximation error tending towards the 1-eigensubspace, resulting in a more efficient and stable path of value approximation.
Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization.
We hereby carry out a first-of-its-kind study unveiling that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more effective teachers for small-kernel ConvNets, due to more similar architectures.
Dynamic sparse training (DST), as a leading sparse training approach, can train deep neural networks at high sparsity from scratch to match the performance of their dense counterparts.
While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable reward redistribution and preserving policy invariance.
We evaluate two popular pretrained Chinese conversational models, CDial-GPT and EVA2. 0, using CHBias.
Our proposed framework provides access: (i) for verifying whether automatic metrics are faithful to human preference, regardless of their correlation level to human; and (ii) for inspecting the strengths and limitations of NLG systems via pairwise evaluation.
Text-based games (TGs) are language-based interactive environments for reinforcement learning.
1 code implementation • 28 Nov 2022 • Tianjin Huang, Tianlong Chen, Meng Fang, Vlado Menkovski, Jiaxu Zhao, Lu Yin, Yulong Pei, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy, Shiwei Liu
Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i. e., untrained networks).
Thus, a trade-off between accuracy, memory consumption and processing speed is pursued.
We present Twin Answer Sentences Attack (TASA), an adversarial attack method for question answering (QA) models that produces fluent and grammatical adversarial contexts while maintaining gold answers.
In PGU, we adopt a set of shared and learnable prototypes as the queries to extract diverse and semantically aligned features for both modalities in the granularity-unified feature space, which further promotes the ReID performance.
Although the vision-and-language pretraining (VLP) equipped cross-modal image-text retrieval (ITR) has achieved remarkable progress in the past two years, it suffers from a major drawback: the ever-increasing size of VLP models restricts its deployment to real-world search scenarios (where the high latency is unacceptable).
In this work, we propose logic structural-constraint modeling to solve the logical reasoning QA and introduce discourse-aware graph networks (DAGNs).
Recent works on sparse neural network training (sparse training) have shown that a compelling trade-off between performance and efficiency can be achieved by training intrinsically sparse neural networks from scratch.
Generating high-quality textual adversarial examples is critical for investigating the pitfalls of natural language processing (NLP) models and further promoting their robustness.
To alleviate the above data issues, we propose a data manipulation method, which is model-agnostic to be packed with any persona-based dialogue generation model to improve its performance.
Recent work incorporates pre-trained word embeddings such as BERT embeddings into Neural Topic Models (NTMs), generating highly coherent topics.
Text-based games provide an interactive way to study natural language processing.
In this paper, we revisit the theoretical property of GCSL -- optimizing a lower bound of the goal reaching objective, and extend GCSL as a novel offline goal-conditioned RL algorithm.
In a variety of text-based games, we show that this simple method results in competitive performance for agents.
Imitation Learning (IL) merely concentrates on reproducing expert behaviors and could take dangerous actions, which is unbearable in safety-critical scenarios.
Deep reinforcement learning provides a promising approach for text-based games in studying natural language communication between humans and artificial agents.
The ability to detect Out-of-Domain (OOD) inputs has been a critical requirement in many real-world NLP applications.
Safe exploration is crucial for the real-world application of reinforcement learning (RL).
Replacing original goals with virtual goals generated from interaction with a trained dynamics model leads to a novel relabeling method, model-based relabeling (MBR).
The model encodes discourse information as a graph with elementary discourse units (EDUs) and discourse relations, and learns the discourse-aware features via a graph network for downstream QA tasks.
Ranked #24 on Reading Comprehension on ReClor
Despite the remarkable progress made by the policy gradient algorithms in reinforcement learning (RL), sub-optimal policies usually result from the local exploration property of the policy gradient update.
Dealing with multi-agent control in networked systems is one of the biggest challenges in Reinforcement Learning (RL) and limited success has been presented compared to recent deep reinforcement learning in single-agent domain.
It is crucial since the quality of the evidence is the key to answering commonsense questions, and even determines the upper bound on the QA systems performance.
We show that with orders of less computation scale, a faithful reimplementation of AlphaStar's methods can not succeed and the proposed techniques are necessary to ensure TStarBot-X's competitive performance.
This poses non-trivial difficulties for researchers or engineers and prevents the application of MARL to a broader range of real-world problems.
We study reinforcement learning (RL) for text-based games, which are interactive simulations in the context of natural language.
By contrast, this paper proves that LfO is almost equivalent to LfD in the deterministic robot environment, and more generally even in the robot environment with bounded randomness.
In this work, we study dialogue models with multiple input sources adapted from the pretrained language model GPT2.
However, PG algorithms rely on exploiting the value function being learned with the first-order update locally, which results in limited sample efficiency.
A great challenge in cooperative decentralized multi-agent reinforcement learning (MARL) is generating diversified behaviors for each individual agent when receiving only a team reward.
This ``Goal-and-Curiosity-driven Curriculum Learning'' leads to ``Curriculum-guided HER (CHER)'', which adaptively and dynamically controls the exploration-exploitation trade-off during the learning process via hindsight experience selection.
On the other hand, it further reduces domain distribution discrepancy through conditional adversarial learning across domains.
The framework builds a user simulator by first generating diverse dialogue data from templates and then build a new State2Seq user simulator on the data.
We introduce Arena, a toolkit for multi-agent reinforcement learning (MARL) research.
However, the importance of feature embedding, i. e., exploring the relationship among training samples, is neglected.
We propose a new neural transfer method termed Dual Adversarial Transfer Network (DATNet) for addressing low-resource Named Entity Recognition (NER).
Dealing with sparse rewards is one of the most important challenges in reinforcement learning (RL), especially when a goal is dynamic (e. g., to grasp a moving object).
Graph convolutional networks are used to obtain a relation-aware representation of nodes for entity graphs built from documents with multi-level features.
First, we initialize an adversarial example with a gray color image on which every pixel has roughly the same importance for the target model.
In this paper, we comprehensively describe the methodology of our submissions to the One-Minute Gradual-Emotion Behavior Challenge 2018.
Active learning aims to select a small subset of data for annotation such that a classifier learned on the data is highly accurate.
Cross-lingual model transfer is a compelling and popular method for predicting annotations in a low-resource language, whereby parallel corpora provide a bridge to a high-resource language and its associated annotated corpora.
Multi-view spectral clustering, which aims at yielding an agreement or consensus data objects grouping across multi-views with their graph laplacian matrices, is a fundamental clustering problem.
Cross lingual projection of linguistic annotation suffers from many sources of bias and noise, leading to unreliable annotations that cannot be used directly.
In the teaching-to-learn step, a teacher is designed to arrange the regions from simple to difficult and then assign the simplest regions to the learner.
In this paper, we propose a new transfer learning algorithm that attempts to transfer common latent structure features across the source and target networks.