CLIRMatrix is a large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval (CLIR).
In total, 49 million unique queries and 34 billion (query, document, label) triplets were mined, making CLIRMatrix the largest and most comprehensive CLIR dataset to date.
Source: CLIRMatrix: A massively large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval