BoostCLIR is a bilingual (Japanese-English) corpus of patent abstracts extracted from the MAREC patent data and the NTCIR PatentMT workshop collections. It is accompanied by relevance judgements for the task of patent prior-art search.

Important: The English side of the corpus contains both the patent IDs and the text of the abstracts. The Japanese side contains only patent IDs because of NTCIR copyright restrictions. The Japanese patent abstracts can be extracted from the full-text Japanese patent documents, which are available from the organizers of the NTCIR workshop.

