Paraphrase Identification
72 papers with code • 10 benchmarks • 17 datasets
The goal of Paraphrase Identification is to determine whether a pair of sentences has the same meaning.
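As a rough illustration of the task's input and output (a sentence pair in, a yes/no decision out), here is a minimal sketch of a purely lexical baseline. It scores the pair by the Jaccard overlap of their lowercase word sets and compares that score against a threshold. The tokenizer, the `is_paraphrase` helper and the 0.5 threshold are hypothetical choices made for illustration. This is not a method from any of the papers listed here.

```python
from __future__ import annotations

import re


def tokenize(sentence: str) -> set[str]:
    """Lowercase the sentence and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| between the token sets of a and b."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0  # treat two empty sentences as identical
    return len(ta & tb) / len(ta | tb)


def is_paraphrase(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical baseline: call the pair a paraphrase if overlap >= threshold."""
    return jaccard(a, b) >= threshold


if __name__ == "__main__":
    pairs = [
        ("The cat sat on the mat.", "On the mat the cat sat."),
        ("The cat sat on the mat.", "Stock prices fell sharply today."),
        ("The dog bit the man.", "The man bit the dog."),
    ]
    for a, b in pairs:
        print(f"{jaccard(a, b):.2f}  {is_paraphrase(a, b)}  | {a} / {b}")
```

The third pair shows why such baselines are weak. Swapping subject and object keeps the word set identical, so the pair is labeled a paraphrase even though the meaning changes. Surface-overlap heuristics fail in exactly this way.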
Source: Adversarial Examples with Difficult Common Words for Paraphrase Identification
Image source: On Paraphrase Identification Corpora
Libraries
Use these libraries to find Paraphrase Identification models and implementations
Latest papers
GAPX: Generalized Autoregressive Paraphrase-Identification X
Paraphrase Identification is a fundamental task in Natural Language Processing.
Adversarial Self-Attention for Language Understanding
Deep neural models (e.g. Transformer) naturally learn spurious features, which create a "shortcut" between the labels and inputs, thus impairing the generalization and robustness.
NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures
Being able to rank the similarity of short text segments is an interesting bonus feature of neural machine translation.
Match-Prompt: Improving Multi-task Generalization Ability for Neural Text Matching via Prompt Learning
In the generalization stage, the matching model explores the essential matching signals by being trained on diverse matching tasks.
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
Predominantly, two formulations are used for sentence-pair tasks: bi-encoders and cross-encoders.
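The difference between the two formulations can be sketched with toy stand-ins. A bi-encoder maps each sentence to a vector independently and then compares the vectors, so per-sentence vectors can be cached and reused. A cross-encoder scores the pair jointly, with access to both inputs at once. In this sketch, the "encoder" is just a bag-of-words count vector compared by cosine similarity. The "cross-encoder" is `difflib.SequenceMatcher`, which aligns the two token sequences together. Both are hypothetical simplifications chosen only to show the structural contrast. Neither is a neural model or the Trans-Encoder method.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokens(sentence: str) -> list[str]:
    """Whitespace tokenization after lowercasing (toy preprocessing)."""
    return sentence.lower().split()


def encode(sentence: str) -> Counter:
    """Toy bi-encoder tower: each sentence is encoded on its own as word counts."""
    return Counter(tokens(sentence))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def bi_encoder_score(a: str, b: str) -> float:
    """Encode independently, then compare; encode(a) could be cached and reused."""
    return cosine(encode(a), encode(b))


def cross_encoder_score(a: str, b: str) -> float:
    """Toy cross-encoder: scores the pair jointly via an order-aware alignment."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


if __name__ == "__main__":
    a, b = "the dog bit the man", "the man bit the dog"
    print("bi-encoder:   ", round(bi_encoder_score(a, b), 3))
    print("cross-encoder:", round(cross_encoder_score(a, b), 3))
```

Because the toy bi-encoder discards word order before comparing, it rates the swapped sentences as identical. The joint scorer sees how the two sequences align and gives a lower score. The trade-off this mirrors is that bi-encoders are cheap to apply at scale, while cross-encoders must reprocess every pair.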
Towards Better Characterization of Paraphrases
To effectively characterize the nature of paraphrase pairs without expert human annotation, we propose two new metrics: word position deviation (WPD) and lexical deviation (LD).
Modelling Latent Translations for Cross-Lingual Transfer
To remedy this, we propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model, by treating the intermediate translations as a latent random variable.
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model.
Improving Paraphrase Detection with the Adversarial Paraphrasing Task
Can we teach them instead to identify paraphrases in a way that draws on the inferential properties of the sentences, and is not over-reliant on lexical and syntactic similarities of a sentence pair?