The corpus is the largest existing corpus of Catalan, containing 687 million words. This is a significant increase, roughly four times the size of the previous largest Catalan corpus, CuCWeb, which contains 166 million words.
Source: caWaC -- A web corpus of Catalan and its application to language modeling and machine translation