no code implementations • 14 Feb 2024 • Phillip Rust, Bowen Shi, Skyler Wang, Necati Cihan Camgöz, Jean Maillard
A major impediment to the advancement of sign language translation (SLT) is data scarcity.
no code implementations • 1 Nov 2023 • Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott
Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling.
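The core idea — treating a rendered strip of text as a sequence of image patches rather than subword tokens — can be sketched as follows. This is a minimal illustration with NumPy; the strip height, width, and patch size here are assumptions for the example (actual pixel-based models render text with a real text renderer and feed the patches to a ViT-style encoder).

```python
import numpy as np

# Toy stand-in for a line of text rendered as a grayscale image:
# a fixed-height strip 16 pixels tall and 368 pixels wide.
# (Dimensions are illustrative, not taken from any specific model.)
image = np.zeros((16, 368), dtype=np.float32)

# Split the strip into non-overlapping 16x16 patches — each patch
# plays the role of one input "token", so any script the renderer
# can draw becomes representable without a fixed vocabulary.
patch_size = 16
num_patches = image.shape[1] // patch_size          # 368 // 16 = 23
patches = image.reshape(16, num_patches, patch_size).transpose(1, 0, 2)

# Flatten each patch into a vector, as a ViT-style encoder would
# before projecting it into the model's embedding space.
patch_vectors = patches.reshape(num_patches, -1)
print(patch_vectors.shape)  # (23, 256)
```

Because the input is pixels rather than vocabulary indices, out-of-vocabulary symbols simply never arise — any character the renderer can draw is handled the same way.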
1 code implementation • 22 Oct 2023 • Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein
We then pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
no code implementations • 17 Aug 2023 • Phillip Rust, Anders Søgaard
Language models such as mBERT, XLM-R, and BLOOM aim to achieve multilingual generalization or compression to facilitate transfer to a large number of (potentially unseen) languages.
1 code implementation • 14 Jul 2022 • Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
We pretrain the 86M-parameter PIXEL model on the same English data as BERT and evaluate it on syntactic and semantic tasks in typologically diverse languages, including various non-Latin scripts.
Ranked #1 on Named Entity Recognition (NER) on MasakhaNER
no code implementations • ACL 2022 • Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard
Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages.
1 code implementation • ACL 2021 • Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych
In this work, we provide a systematic and comprehensive empirical comparison of pretrained multilingual language models versus their monolingual counterparts with regard to their monolingual task performance.
no code implementations • ACL 2020 • Gözde Gül Şahin, Yova Kementchedjhieva, Phillip Rust, Iryna Gurevych
To expose this problem in a new light, we introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta Stone puzzles from Linguistic Olympiads for high school students.