Search Results for author: Alan Ansell

Found 7 papers, 4 papers with code

MAD-G: Multilingual Adapter Generation for Efficient Cross-Lingual Transfer

no code implementations • Findings (EMNLP) 2021 • Alan Ansell, Edoardo Maria Ponti, Jonas Pfeiffer, Sebastian Ruder, Goran Glavaš, Ivan Vulić, Anna Korhonen

While offering (1) improved fine-tuning efficiency (by a factor of around 50 in our experiments), (2) a smaller parameter budget, and (3) increased language coverage, MAD-G remains competitive with more expensive methods for language-specific adapter training across the board.

Tasks: Dependency Parsing, Named Entity Recognition, +4
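The listing only summarises MAD-G's results, so as a rough illustration of what "adapter generation" can mean, here is a minimal sketch in which a small hypernetwork maps a language embedding to the weights of a bottleneck adapter. All module names, dimensions, and the random language embedding are placeholders for illustration; this is not the MAD-G architecture from the paper.

```python
import torch
import torch.nn as nn

class AdapterGenerator(nn.Module):
    """Toy hypernetwork: language embedding -> bottleneck adapter weights."""
    def __init__(self, lang_emb_dim=32, hidden_dim=768, bottleneck_dim=48):
        super().__init__()
        self.hidden_dim, self.bottleneck_dim = hidden_dim, bottleneck_dim
        # One linear head per generated weight matrix.
        self.gen_down = nn.Linear(lang_emb_dim, hidden_dim * bottleneck_dim)
        self.gen_up = nn.Linear(lang_emb_dim, bottleneck_dim * hidden_dim)

    def forward(self, lang_emb):
        # lang_emb: (lang_emb_dim,) vector identifying the target language.
        w_down = self.gen_down(lang_emb).view(self.bottleneck_dim, self.hidden_dim)
        w_up = self.gen_up(lang_emb).view(self.hidden_dim, self.bottleneck_dim)
        return w_down, w_up

def adapter_forward(hidden_states, w_down, w_up):
    # Standard residual bottleneck adapter using the generated weights.
    return hidden_states + torch.relu(hidden_states @ w_down.T) @ w_up.T

generator = AdapterGenerator()
lang_emb = torch.randn(32)             # placeholder language embedding
w_down, w_up = generator(lang_emb)     # one layer's adapter weights
out = adapter_forward(torch.randn(4, 16, 768), w_down, w_up)
print(out.shape)                       # torch.Size([4, 16, 768])
```

The appeal of this setup is that a single generator can produce an adapter for any language it has an embedding for, which is where the efficiency and language-coverage gains claimed in the abstract come from.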

Scaling Sparse Fine-Tuning to Large Language Models

2 code implementations • 29 Jan 2024 • Alan Ansell, Ivan Vulić, Hannah Sterz, Anna Korhonen, Edoardo M. Ponti

We experiment with instruction-tuning of LLMs on standard dataset mixtures, finding that SpIEL is often superior to popular parameter-efficient fine-tuning methods like LoRA (low-rank adaptation) in terms of performance and comparable in terms of run time.

Tasks: Quantization
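For readers unfamiliar with sparse fine-tuning, the sketch below shows the basic idea in PyTorch: select a small set of weight entries and mask gradients so only those entries change, yielding a compact (indices, values) delta comparable in spirit to LoRA's low-rank delta. The selection rule, dimensions, and training loop are illustrative only; SpIEL's actual drop-and-grow procedure for maintaining the sparse set is more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(768, 768)   # stand-in for one LLM weight matrix
density = 0.01                # fraction of weight entries allowed to change
original = model.weight.detach().clone()

# Choose trainable entries, here by gradient magnitude on a warm-up batch
# (an illustrative criterion, not necessarily the one SpIEL uses).
x, y = torch.randn(8, 768), torch.randn(8, 768)
F.mse_loss(model(x), y).backward()
k = int(density * model.weight.numel())
top_idx = model.weight.grad.abs().flatten().topk(k).indices
mask = torch.zeros(model.weight.numel())
mask[top_idx] = 1.0
mask = mask.view_as(model.weight)
model.zero_grad()

# Weight decay is disabled so the untouched entries stay exactly at their
# pretrained values; only the selected entries receive optimizer updates.
optimizer = torch.optim.AdamW([model.weight], lr=1e-4, weight_decay=0.0)
for _ in range(10):                      # toy training loop
    loss = F.mse_loss(model(x), y)
    loss.backward()
    model.weight.grad.mul_(mask)         # zero gradients outside the sparse set
    optimizer.step()
    model.zero_grad()

changed = (model.weight.detach() != original).sum().item()
print(f"entries changed: {changed} (budget {k} of {model.weight.numel()})")
```

Because the learned change is just a sparse delta over the pretrained weights, it can be stored and merged as cheaply as other parameter-efficient methods while touching the original parameters directly.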

Cross-Lingual Transfer with Target Language-Ready Task Adapters

no code implementations • 5 Jun 2023 • Marinela Parović, Alan Ansell, Ivan Vulić, Anna Korhonen

We address this mismatch by exposing the task adapter to the target language adapter during training, and empirically validate several variants of the idea: in the simplest form, we alternate between using the source and target language adapters during task adapter training, which can be generalized to cycling over any set of language adapters.

Tasks: Zero-Shot Cross-Lingual Transfer
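A compact way to picture the "alternating / cycling" variant described in the abstract is sketched below: the frozen language adapters are swapped from step to step underneath the task adapter, so the task adapter is exposed to every language adapter it may later be combined with. The toy encoder, adapter class, and data are placeholders, not the authors' implementation.

```python
import itertools
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=48):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))   # residual bottleneck

encoder = nn.Linear(768, 768)                       # stand-in for a frozen MMT layer
lang_adapters = {"en": Adapter(), "sw": Adapter()}  # pretrained, kept frozen
task_adapter = Adapter()                            # trainable
classifier = nn.Linear(768, 3)                      # trainable task head

for module in [encoder, *lang_adapters.values()]:
    module.requires_grad_(False)

optimizer = torch.optim.AdamW(
    list(task_adapter.parameters()) + list(classifier.parameters()), lr=1e-4)

cycle = itertools.cycle(lang_adapters)              # alternate en -> sw -> en -> ...
for step in range(6):
    lang = next(cycle)
    x, y = torch.randn(8, 768), torch.randint(0, 3, (8,))
    h = lang_adapters[lang](encoder(x))             # frozen language adapter underneath
    logits = classifier(task_adapter(h))            # trainable task adapter on top
    loss = nn.functional.cross_entropy(logits, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Cycling over a larger set of language adapters follows the same pattern: only the dictionary of frozen adapters changes, while the task adapter and head remain the single trainable component.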

Distilling Efficient Language-Specific Models for Cross-Lingual Transfer

1 code implementation • 2 Jun 2023 • Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić

Specifically, we use a two-phase distillation approach, termed BiStil: (i) the first phase distils a general bilingual model from the MMT, while (ii) the second, task-specific phase sparsely fine-tunes the bilingual "student" model using a task-tuned variant of the original MMT as its "teacher".

Tasks: Transfer Learning, XLM-R, +1
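The two phases described in the abstract read naturally as two distillation loops, and the sketch below mirrors that structure with toy modules and losses. The stand-in models, the hidden-state matching objective, and the dense phase-2 update are simplifications: the paper distils from an actual MMT and makes the second phase a sparse fine-tune of the bilingual student.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

mmt_teacher = nn.Linear(768, 768)    # stand-in for the frozen MMT encoder
task_teacher = nn.Linear(768, 3)     # stand-in for the task-tuned MMT teacher
student = nn.Linear(768, 768)        # smaller bilingual "student"
student_head = nn.Linear(768, 3)

# Phase 1: general bilingual distillation, matching teacher representations
# on unlabelled source- and target-language text.
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
for _ in range(10):
    x = torch.randn(16, 768)                         # bilingual text batch (toy)
    with torch.no_grad():
        target = mmt_teacher(x)
    loss = F.mse_loss(student(x), target)            # hidden-state matching
    loss.backward()
    opt.step()
    opt.zero_grad()

# Phase 2: task-specific distillation against the task-tuned teacher's
# output distribution (dense here; the paper makes this update sparse).
opt = torch.optim.AdamW(
    list(student.parameters()) + list(student_head.parameters()), lr=1e-4)
for _ in range(10):
    x = torch.randn(16, 768)                         # task-data batch (toy)
    with torch.no_grad():
        teacher_logits = task_teacher(x)
    student_logits = student_head(student(x))
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    loss.backward()
    opt.step()
    opt.zero_grad()
```

The end product is a compact, language-pair-specific model whose task behaviour is inherited from the task-tuned multilingual teacher rather than learned from scratch.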

PolyLM: Learning about Polysemy through Language Modeling

1 code implementation • EACL 2021 • Alan Ansell, Felipe Bravo-Marquez, Bernhard Pfahringer

To avoid the "meaning conflation deficiency" of word embeddings, a number of models have aimed to embed individual word senses.

Tasks: Language Modelling, Word Embeddings, +1
