Search Results for author: Jaume Zaragoza-Bernabeu

Found 7 papers, 3 papers with code

Human evaluation of web-crawled parallel corpora for machine translation

no code implementations HumEval (ACL) 2022 Gema Ramírez-Sánchez, Marta Bañón, Jaume Zaragoza-Bernabeu, Sergio Ortiz Rojas

Quality assessment has been an ongoing activity of the series of ParaCrawl efforts to crawl massive amounts of parallel data from multilingual websites for 29 languages.

Machine Translation Translation

Bicleaner AI: Bicleaner Goes Neural

1 code implementation LREC 2022 Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Marta Bañón, Sergio Ortiz Rojas

This paper describes the experiments carried out during the development of the latest version of Bicleaner, named Bicleaner AI, a tool that aims at detecting noisy sentences in parallel corpora.

Binary Classification Machine Translation +2

FastSpell: the LangId Magic Spell

no code implementations 12 Apr 2024 Marta Bañón, Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Sergio Ortiz-Rojas

Language identification is a crucial component in the automated production of language resources, particularly in multilingual and big data contexts.

Language Identification

A New Massive Multilingual Dataset for High-Performance Language Technologies

no code implementations 20 Mar 2024 Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer Van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive.

Language Modelling Machine Translation +2
