Fine-tuning large pre-trained models with task-specific data has achieved great success in NLP.
Inspired by the recent success of leveraging a retrieval module to augment large-scale neural network models, we propose to retrieve examples that are semantically similar to a test sample to formulate its corresponding prompt.
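A minimal sketch of this retrieval-based prompt construction, assuming a generic sentence encoder and a simple demonstration template (the toy hashing embedding, the example pool, and the prompt format below are placeholders, not the paper's actual components):

```python
# Sketch only: retrieve the most similar labelled examples and build a prompt.
import numpy as np

def embed(text, dim=256):
    """Toy bag-of-words hashing embedding; a real system would use a trained sentence encoder."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def build_prompt(test_sample, pool, k=2):
    """Select the k nearest examples by cosine similarity and prepend them to the test sample."""
    q = embed(test_sample)
    scored = sorted(pool, key=lambda ex: -float(q @ embed(ex["text"])))
    lines = [f"Input: {ex['text']}\nLabel: {ex['label']}" for ex in scored[:k]]
    lines.append(f"Input: {test_sample}\nLabel:")
    return "\n\n".join(lines)

pool = [
    {"text": "the movie was wonderful", "label": "positive"},
    {"text": "a dull and lifeless film", "label": "negative"},
    {"text": "great acting and a moving story", "label": "positive"},
]
print(build_prompt("a wonderful, moving story", pool, k=2))
```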
Large-scale language models have recently demonstrated impressive empirical performance.
To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks.
We prove, from a theoretical perspective, that the gradients derived from this new masking scheme have a smaller variance and can lead to more efficient self-supervised training.
Ranked #1 on Sentence Classification on ACL-ARC
Adversarial training has been shown effective at endowing the learned representations with stronger generalization ability.
Ranked #2 on Machine Translation on IWSLT2014 German-English
Generative semantic hashing is a promising technique for large-scale information retrieval thanks to its fast retrieval speed and small memory footprint.
Learning disentangled representations of natural language is essential for many NLP tasks, e.g., conditional text generation, style transfer, personalized dialogue systems, etc.
Attention-based models have shown significant improvement over traditional algorithms in several NLP tasks.
Generating high-quality paraphrases is a fundamental yet challenging natural language processing task.
The Straight-Through (ST) estimator is a widely used technique for back-propagating gradients through discrete random variables.
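For reference, a minimal PyTorch sketch of the basic ST trick: the forward pass applies a hard discretization (thresholding at zero here, which is an illustrative choice), while the backward pass treats that discretization as the identity so gradients still reach the real-valued input:

```python
# Minimal Straight-Through estimator sketch (illustrative, not any paper's exact variant).
import torch

x = torch.randn(5, requires_grad=True)   # real-valued logits
hard = (x > 0).float()                    # discrete 0/1 values used in the forward pass
# Forward uses `hard`; backward pretends the thresholding was the identity map.
y = x + (hard - x).detach()
loss = (y - 1.0).pow(2).sum()
loss.backward()
print(hard, x.grad)                       # gradients flow to x despite the discrete step
```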
Hashing is promising for large-scale information retrieval tasks thanks to the efficiency of distance evaluation between binary codes.
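The efficiency comes from the fact that, once binary codes are packed into machine words, distance evaluation reduces to an XOR followed by a popcount. A toy illustration (the 8-bit codes below are made up):

```python
# Illustrative only: Hamming distance between packed binary codes via XOR + popcount.
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

codes = {"doc1": 0b10110010, "doc2": 0b10110011, "doc3": 0b01001100}
query = 0b10110110
print(sorted(codes, key=lambda d: hamming(codes[d], query)))  # nearest documents first
```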
Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems.
We present a syntax-infused variational autoencoder (SIVAE) that integrates sentences with their syntactic trees to improve the grammar of generated sentences.
Constructing highly informative network embeddings is an important tool for network analysis.
We propose a topic-guided variational autoencoder (TGVAE) model for text generation.
Variational autoencoders (VAEs) have received much attention recently as an end-to-end architecture for text generation with latent variables.
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.
Ranked #2 on Vision-Language Navigation on Room2Room
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations.
Ranked #1 on Named Entity Recognition on CoNLL 2000
Semantic hashing has become a powerful paradigm for fast similarity search in many information retrieval systems.
Word embeddings are effective intermediate representations for capturing semantic regularities between words when learning the representations of text sequences.
Ranked #11 on Text Classification on DBpedia
In this paper, we conduct an extensive comparative study between Simple Word-Embedding-based Models (SWEMs), which have no compositional parameters, and models that employ word embeddings within RNN/CNN-based architectures.
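As a rough illustration of what "no compositional parameters" means, a SWEM-style encoder simply pools the word embeddings of a sequence, e.g. by averaging or max-pooling, instead of composing them with an RNN/CNN (the random embedding table below stands in for real pre-trained vectors):

```python
# Sketch of SWEM-style pooling over word embeddings (illustrative vocabulary and vectors).
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}
emb = rng.normal(size=(len(vocab), 50))        # |V| x d embedding matrix

def swem(tokens, mode="avg"):
    vecs = emb[[vocab[t] for t in tokens]]     # (seq_len, d)
    return vecs.mean(axis=0) if mode == "avg" else vecs.max(axis=0)

doc = ["the", "movie", "was", "great"]
print(swem(doc, "avg").shape, swem(doc, "max").shape)   # (50,) (50,)
```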
The TCNLM (Topic Compositional Neural Language Model) learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is then used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that captures the local structure of a word sequence.
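A rough sketch of the Mixture-of-Experts idea described above, mixing per-topic expert predictions by the topic probabilities; the dimensions are arbitrary, and the actual TCNLM may compose its experts differently (e.g. at the level of RNN parameters rather than output logits):

```python
# Sketch only: per-topic expert RNNs whose next-word logits are mixed by topic probabilities.
import torch
import torch.nn as nn

vocab_size, d, n_topics = 100, 32, 3
experts = nn.ModuleList([nn.GRU(d, d, batch_first=True) for _ in range(n_topics)])
heads = nn.ModuleList([nn.Linear(d, vocab_size) for _ in range(n_topics)])

def moe_logits(word_embs, topic_probs):
    """word_embs: (B, T, d); topic_probs: (B, n_topics) from a neural topic model."""
    outs = []
    for gru, head in zip(experts, heads):
        h, _ = gru(word_embs)
        outs.append(head(h))                        # (B, T, V) per expert
    stacked = torch.stack(outs, dim=-1)             # (B, T, V, n_topics)
    return (stacked * topic_probs[:, None, None, :]).sum(-1)

x = torch.randn(2, 5, d)
theta = torch.softmax(torch.randn(2, n_topics), dim=-1)
print(moe_logits(x, theta).shape)                   # torch.Size([2, 5, 100])
```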
Parametric embedding methods such as parametric t-SNE (pt-SNE) have been widely adopted for data visualization and out-of-sample data embedding without further computationally expensive optimization or approximation.
The role of the meta network is to abstract the contextual information of a sentence or document into a set of input-aware filters.
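A hedged sketch of the input-aware-filter idea, assuming a simple mean summary of the sentence and a single 1-D convolutional filter generated by a small meta network (all sizes and the summary method are placeholders, not the paper's architecture):

```python
# Sketch only: a meta network maps a sentence summary to the weights of a conv filter
# that is then applied to that same sentence.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k = 32, 3                                   # embedding dim, filter width
meta = nn.Linear(d, d * k)                     # meta network: summary -> filter weights

def context_filter(sent_embs):
    """sent_embs: (T, d) word embeddings of one sentence."""
    summary = sent_embs.mean(dim=0)            # crude sentence summary
    filt = meta(summary).view(1, d, k)         # (out_channels=1, in_channels=d, width=k)
    x = sent_embs.t().unsqueeze(0)             # (1, d, T) layout expected by conv1d
    return F.conv1d(x, filt, padding=k // 2)   # (1, 1, T) input-aware feature map

print(context_filter(torch.randn(7, d)).shape)  # torch.Size([1, 1, 7])
```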
Ranked #13 on Text Classification on DBpedia
A latent-variable model is introduced for text matching, inferring sentence representations by jointly optimizing generative and discriminative objectives.