These alignments are learned separately from parser training and require a complex pipeline of rule-based components, pre-processing, and post-processing to satisfy domain-specific constraints.
We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings.
Ranked #7 on Semantic Textual Similarity on STS16
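A minimal sketch of the contrastive component that such sentence-embedding objectives share (an InfoNCE-style loss over positive pairs with in-batch negatives). This is an illustrative stand-in, not DiffCSE's full objective, and the function name and temperature are assumptions:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """Illustrative InfoNCE loss over a batch of positive sentence-embedding pairs.

    z1[i] and z2[i] are two embeddings of the same sentence (positives);
    all other rows in the batch act as in-batch negatives.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature          # scaled cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)   # match each z1[i] to its own z2[i]
```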
This work aims to develop a control mechanism by which a user can select spans of context as "highlights" for the model to focus on, and generate relevant output.
We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data.
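A schematic of the co-training loop this builds on (Blum &amp; Mitchell, 1998): two views iteratively pseudo-label unlabeled data for each other. The view names, confidence threshold, and `fit`/`predict_proba` interface are hypothetical, and prompt-specific details are omitted:

```python
# Hypothetical co-training loop: two views (e.g., a prompted LM and a smaller
# task model) iteratively label unlabeled examples for each other.
def co_train(view_a, view_b, labeled, unlabeled, rounds=5, threshold=0.9):
    for _ in range(rounds):
        for teacher, student in ((view_a, view_b), (view_b, view_a)):
            teacher.fit(labeled)
            confident = []
            for x in unlabeled:
                probs = teacher.predict_proba(x)       # assumed interface
                label = max(probs, key=probs.get)
                if probs[label] >= threshold:
                    confident.append((x, label))       # pseudo-labeled example
            student.fit(labeled + confident)           # student trains on both
    return view_a, view_b
```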
While flexible and performant, these models often require large datasets for training and can fail spectacularly on benchmarks designed to test for compositional generalization.
no code implementations • 13 Jul 2021 • Stanislav Lukyanenko, Won-Dong Jang, Donglai Wei, Robbert Struyven, Yoon Kim, Brian Leahy, Helen Yang, Alexander Rush, Dalit Ben-Yosef, Daniel Needleman, Hanspeter Pfister
In this work, we propose a two-stream model for developmental stage classification.
While vector-based language representations from pretrained language models have set a new standard for many NLP tasks, there is not yet a complete accounting of their inner workings.
Importing from computational and cognitive neuroscience the notion of representational invariance, we perform a series of probes designed to test the sensitivity of Transformer representations to several kinds of structure in sentences.
This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks.
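A minimal sketch of the diff-vector view of finetuning (names and module structure are illustrative, not the paper's released code): the pretrained weights stay frozen and shared across tasks, and only a per-task delta is learned and stored.

```python
import torch

class DiffVectorLinear(torch.nn.Module):
    def __init__(self, pretrained_weight: torch.Tensor):
        super().__init__()
        # Shared, frozen pretrained parameters.
        self.register_buffer("w_pretrained", pretrained_weight)
        # Task-specific diff vector, initialized at zero and trained per task.
        self.delta = torch.nn.Parameter(torch.zeros_like(pretrained_weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective task weights = frozen pretrained weights + learned diff.
        return x @ (self.w_pretrained + self.delta).T
```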
In addition, we find that the emergence of linear separability in these manifolds is driven by a combined reduction of manifolds' radius, dimensionality and inter-manifold correlations.
One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) for improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).
We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar.
Ranked #5 on Constituency Grammar Induction on PTB
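Schematically, the compound PCFG's generative story (notation ours): a continuous sentence-level latent modulates the rule probabilities of an otherwise standard PCFG, so marginalizing over it couples rule choices within a sentence.

```latex
% Sketch of the compound PCFG generative process (our notation):
z \sim p(z), \qquad
\pi_z = f_\theta(z), \qquad
t \sim \mathrm{PCFG}(\pi_z), \qquad
x = \mathrm{yield}(t)
```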
On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese.
Ranked #6 on Constituency Grammar Induction on PTB (Max F1 (WSJ) metric)
There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning.
VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful.
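Concretely, with the usual evidence lower bound (notation ours), collapse is the regime in which the KL term is driven to zero, so the approximate posterior matches the prior and a powerful decoder models the data without using the latent variable:

```latex
% ELBO for a latent-variable model with approximate posterior q_\phi(z|x):
\mathcal{L}(x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
               - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
% Latent variable collapse: KL -> 0, so q_\phi(z|x) \approx p(z) and the
% decoder ignores z.
```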
This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.
Ranked #23 on Machine Translation on IWSLT2014 German-English
Visual question answering (VQA) models respond to open-ended natural language questions about images.
Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network.
Ranked #2 on Text Generation on Yahoo Questions
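A minimal sketch of what a global inference network means in practice (a toy Gaussian encoder; sizes and names are illustrative): instead of optimizing local variational parameters separately for every instance, one shared network predicts them from each input.

```python
import torch

class InferenceNetwork(torch.nn.Module):
    """Toy amortized inference network: a single shared network predicts the
    per-instance variational parameters (mu, log_var) from the input,
    replacing instance-specific optimization of those parameters."""

    def __init__(self, input_dim: int, latent_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.hidden = torch.nn.Linear(input_dim, hidden_dim)
        self.mu = torch.nn.Linear(hidden_dim, latent_dim)
        self.log_var = torch.nn.Linear(hidden_dim, latent_dim)

    def forward(self, x: torch.Tensor):
        h = torch.tanh(self.hidden(x))
        return self.mu(h), self.log_var(h)   # parameters of q(z | x)
```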
We introduce an open-source toolkit for neural machine translation (NMT) to support research into model architectures, feature representations, and source modalities, while maintaining competitive performance, modularity and reasonable training requirements.
In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches.
This adversarially regularized autoencoder (ARAE) allows us to generate natural textual outputs as well as perform manipulations in the latent space to induce change in the output space.
Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network.
We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and also introduce two novel sequence-level versions of knowledge distillation that further improve performance, and somewhat surprisingly, seem to eliminate the need for beam search (even when applied on the original teacher model).
Ranked #1 on Machine Translation on IWSLT2015 Thai-English
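An illustrative sketch of the word-level distillation loss mentioned above (temperature and reduction choices are assumptions; the sequence-level variants, which instead train the student on the teacher's beam-search outputs, are not shown):

```python
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits: torch.Tensor,
                       teacher_logits: torch.Tensor,
                       temperature: float = 1.0) -> torch.Tensor:
    """Per-position KL divergence between the teacher's and student's
    output distributions over the target vocabulary."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student), averaged over batch and positions.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)
```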
We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016.
We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks.
Ranked #33 on Natural Language Inference on SNLI
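A schematic of the CNN-over-word-vectors classifier described above (dimensions and filter widths are illustrative defaults, not the paper's exact configuration): convolutions of several widths over the embedded sentence, max-over-time pooling, then a linear classifier.

```python
import torch
import torch.nn.functional as F

class SentenceCNN(torch.nn.Module):
    def __init__(self, embeddings: torch.Tensor, num_classes: int,
                 widths=(3, 4, 5), num_filters: int = 100):
        super().__init__()
        # Initialize from pre-trained word vectors (rows = vocabulary).
        self.embed = torch.nn.Embedding.from_pretrained(embeddings, freeze=False)
        dim = embeddings.size(1)
        self.convs = torch.nn.ModuleList(
            torch.nn.Conv1d(dim, num_filters, w) for w in widths)
        self.out = torch.nn.Linear(num_filters * len(widths), num_classes)

    def forward(self, token_ids: torch.Tensor):            # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)           # (batch, dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))           # class logits
```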
We provide a simple but novel supervised weighting scheme for adjusting term frequency in tf-idf for sentiment analysis and text classification.
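A generic sketch of supervised term weighting in this spirit (not the paper's exact scheme): each term's frequency is rescaled by a label-derived statistic, here a smoothed log-ratio of class-conditional document frequencies, used in place of plain idf.

```python
import math
from collections import Counter

def supervised_term_weights(docs, labels, smoothing: float = 1.0):
    """Illustrative supervised weights: smoothed log-ratio of a term's
    document frequency in the positive vs. negative class (labels in {0, 1},
    docs given as lists of tokens)."""
    pos_df, neg_df = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for term in set(doc):
            (pos_df if y == 1 else neg_df)[term] += 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    weights = {}
    for term in set(pos_df) | set(neg_df):
        p = (pos_df[term] + smoothing) / (n_pos + 2 * smoothing)
        q = (neg_df[term] + smoothing) / (n_neg + 2 * smoothing)
        weights[term] = abs(math.log(p / q))   # strength of class association
    return weights
```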
We provide a method for automatically detecting change in language across time through a chronologically trained neural language model.