no code implementations • 3 Nov 2023 • Lucas Georges Gabriel Charpentier, David Samuel
This paper introduces a novel modification of the transformer architecture, tailored for the data-efficient pretraining of language models.
1 code implementation • 30 Oct 2023 • David Samuel
This paper explores the use of latent bootstrapping, an alternative self-supervision technique, for pretraining language models.
1 code implementation • 13 Jun 2023 • Matias Jentoft, David Samuel
While there has been a surge of large language models for Norwegian in recent years, we lack any tool to evaluate their understanding of grammaticality.
1 code implementation • 13 Jun 2023 • David Samuel, Lilja Øvrelid
In recent years, language models have become increasingly larger and more complex.
1 code implementation • 6 May 2023 • David Samuel, Andrey Kutuzov, Samia Touileb, Erik Velldal, Lilja Øvrelid, Egil Rønningstad, Elina Sigdel, Anna Palatkina
We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics.
1 code implementation • 19 Apr 2023 • Lucas Georges Gabriel Charpentier, Sondre Wold, David Samuel, Egil Rønningstad
After training, we also separate the language model, which we call the reader, from the retriever components, and show that the reader alone can be fine-tuned on a range of downstream tasks.
2 code implementations • 17 Mar 2023 • David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal
While modern masked language models (LMs) are trained on ever larger corpora, we explore the effects of down-scaling training to a modestly sized but representative, well-balanced, and publicly available English text source -- the British National Corpus.
1 code implementation • 18 Oct 2022 • Huiling You, David Samuel, Samia Touileb, Lilja Øvrelid
This paper presents our submission to the 2022 edition of the CASE 2021 shared task 1, subtask 4.
1 code implementation • 16 Oct 2022 • Huiling You, David Samuel, Samia Touileb, Lilja Øvrelid
Event extraction therefore becomes a graph parsing problem, which provides the following advantages: 1) performing event detection and argument extraction jointly; 2) detecting and extracting multiple events from a piece of text; and 3) capturing the complicated interaction between event arguments and triggers.
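The graph view described above can be sketched minimally: triggers become typed nodes, arguments become role-labeled edges, and a single sentence graph can hold multiple events at once. The class names, character offsets, event types, and role labels below are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Trigger:
    """A node anchoring one event: the trigger span and its event type."""
    span: tuple          # (start, end) character offsets into the text
    event_type: str

@dataclass
class Argument:
    """An edge from a trigger node to an argument span, labeled with a role."""
    span: tuple
    role: str

@dataclass
class EventGraph:
    """One sentence may contain several events, possibly sharing arguments."""
    text: str
    events: dict = field(default_factory=dict)   # Trigger -> [Argument, ...]

    def add_event(self, trigger, arguments):
        self.events[trigger] = list(arguments)

# Two events detected and filled jointly from a single sentence:
sent = "Protesters clashed with police after the government raised fuel prices."
g = EventGraph(sent)
g.add_event(
    Trigger((11, 18), "Conflict"),
    [Argument((0, 10), "participant"), Argument((24, 30), "participant")],
)
g.add_event(
    Trigger((52, 58), "Policy-Change"),
    [Argument((37, 51), "agent"), Argument((59, 70), "theme")],
)
```

Because triggers are nodes rather than a single per-sentence label, the same structure naturally represents multiple events per text and arbitrary trigger–argument interactions.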
1 code implementation • ACL 2022 • David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, Erik Velldal
This paper demonstrates how a graph-based semantic parser can be applied to the task of structured sentiment analysis, directly predicting sentiment graphs from text.
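As a rough illustration of what "predicting sentiment graphs" yields, a predicted graph can be flattened into opinion tuples for inspection or evaluation. The dictionary layout and the (holder, target, expression, polarity) tuple format here are assumptions for illustration, not the paper's exact representation.

```python
def flatten(graph):
    """Turn one predicted sentiment graph into opinion tuples."""
    tuples = []
    for expr in graph["expressions"]:
        tuples.append((
            expr.get("holder"),      # who holds the opinion (may be absent)
            expr.get("target"),      # what the opinion is about
            expr["span"],            # the opinionated phrase itself
            expr["polarity"],        # e.g. "positive" / "negative"
        ))
    return tuples

# One sentence, two opinion expressions with different targets:
graph = {
    "text": "The staff were friendly but the food was bland.",
    "expressions": [
        {"span": "friendly", "target": "The staff", "polarity": "positive"},
        {"span": "bland", "target": "the food", "polarity": "negative"},
    ],
}

opinions = flatten(graph)
# -> [(None, 'The staff', 'friendly', 'positive'),
#     (None, 'the food', 'bland', 'negative')]
```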
1 code implementation • WNUT (ACL) 2021 • David Samuel, Milan Straka
We present the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluates lexical-normalization systems on 12 social media datasets in 11 languages.
no code implementations • 24 May 2021 • Milan Straka, Jakub Náplava, Jana Straková, David Samuel
We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data.
Ranked #1 on Semantic Parsing on PTG (Czech, MRP 2020)
2 code implementations • 2 Nov 2020 • David Samuel, Milan Straka
PERIN was one of the winners of the shared task.
Ranked #1 on Semantic Parsing on DRG (English, MRP 2020)
1 code implementation • CONLL 2020 • David Samuel, Milan Straka
PERIN was one of the winners of the shared task.
1 code implementation • 17 Feb 2020 • David Samuel, Aditya Ganeshan, Jason Naradowsky
We propose a hierarchical meta-learning-inspired model for music source separation (Meta-TasNet) in which a generator model is used to predict the weights of individual extractor models.
Ranked #23 on Music Source Separation on MUSDB18
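The generator-predicts-extractor-weights idea is a form of hypernetwork: a per-source embedding is mapped to the parameters of that source's extractor. A minimal NumPy sketch follows; the sizes, names, and the single linear extractor layer are illustrative assumptions, not Meta-TasNet's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, in_ch, out_ch = 8, 16, 16

# Generator: one linear map from a source embedding to the full
# parameter vector (weight matrix + bias) of an extractor layer.
W_gen = rng.normal(scale=0.1, size=(out_ch * in_ch + out_ch, emb_dim))

def make_extractor(instrument_emb):
    """Predict an extractor's weights from its instrument embedding."""
    params = W_gen @ instrument_emb
    w = params[: out_ch * in_ch].reshape(out_ch, in_ch)
    b = params[out_ch * in_ch :]
    return lambda feats: feats @ w.T + b   # the generated linear extractor

# Distinct embeddings yield distinct extractors sharing one generator:
drums = make_extractor(rng.normal(size=emb_dim))
vocals = make_extractor(rng.normal(size=emb_dim))

mixture_feats = rng.normal(size=(4, in_ch))   # 4 frames of mixture features
separated = drums(mixture_feats)              # shape (4, out_ch)
```

In training, the generator and the instrument embeddings would be learned jointly, so knowledge shared across sources lives in the generator while each extractor stays lightweight.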