Search Results for author: Keita Kurita

Found 4 papers, 3 papers with code

Weight Poisoning Attacks on Pre-trained Models

2 code implementations • 14 Apr 2020 • Keita Kurita, Paul Michel, Graham Neubig

We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure.

Sentiment Analysis, Sentiment Classification, +1

Towards Robust Toxic Content Classification

1 code implementation • 14 Dec 2019 • Keita Kurita, Anna Belova, Antonios Anastasopoulos

We propose a method of generating realistic model-agnostic attacks using a lexicon of toxic tokens, which attempts to mislead toxicity classifiers by diluting the toxicity signal either by obfuscating toxic tokens through character-level perturbations, or by injecting non-toxic distractor tokens.

Classification, Denoising, +1

Measuring Bias in Contextualized Word Representations

1 code implementation • WS 2019 • Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W. Black, Yulia Tsvetkov

Contextual word embeddings such as BERT have achieved state-of-the-art performance in numerous NLP tasks.

Word Embeddings
