'BonTen' -- Corpus Concordance System for 'NINJAL Web Japanese Corpus'
The National Institute for Japanese Language and Linguistics (NINJAL), Japan, has undertaken a corpus compilation project to construct a ten-billion-word web corpus for linguistic research. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system, named 'BonTen', which enables the ten-billion-word corpus to be queried by string, by a sequence of morphological information, or by a subtree of the syntactic dependency structure.
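The three query types named above can be illustrated with a minimal sketch on a single toy sentence. Nothing here reflects BonTen's actual implementation or query syntax: the `Token` class, the function names, and the pattern formats are all hypothetical, the dependency links are token-level for simplicity, and a linear scan like this would not scale to ten billion words without indexing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    surface: str  # written form
    pos: str      # coarse part of speech (illustrative labels only)
    head: int     # index of the governing token, -1 for the root


# Toy sentence "太郎が本を読む" with hand-assigned, simplified dependencies.
SENTENCE = [
    Token("太郎", "noun", 4),
    Token("が", "particle", 0),
    Token("本", "noun", 4),
    Token("を", "particle", 2),
    Token("読む", "verb", -1),
]


def _fits(token, constraint):
    """True if every attribute named in the constraint has the given value."""
    return all(getattr(token, key) == value for key, value in constraint.items())


def match_string(sentence, needle):
    """String query: does the needle occur in the concatenated surface text?"""
    return needle in "".join(t.surface for t in sentence)


def match_morph_sequence(sentence, pattern):
    """Morphological query: start indices where consecutive tokens fit the
    consecutive constraints in the pattern."""
    n = len(pattern)
    return [
        i
        for i in range(len(sentence) - n + 1)
        if all(_fits(sentence[i + j], pattern[j]) for j in range(n))
    ]


def _matches_at(sentence, i, pattern):
    """True if token i fits the pattern node and each child pattern is matched
    by some dependent of token i (dependents may be reused across children)."""
    constraint, children = pattern
    if not _fits(sentence[i], constraint):
        return False
    dependents = [j for j, t in enumerate(sentence) if t.head == i]
    return all(
        any(_matches_at(sentence, j, child) for j in dependents)
        for child in children
    )


def match_subtree(sentence, pattern):
    """Dependency query: indices of tokens that root a matching subtree.
    A pattern is a pair (constraint, list of child patterns)."""
    return [i for i in range(len(sentence)) if _matches_at(sentence, i, pattern)]


if __name__ == "__main__":
    print(match_string(SENTENCE, "本を読"))
    print(match_morph_sequence(SENTENCE, [{"pos": "noun"}, {"pos": "particle"}]))
    verb_with_object = ({"pos": "verb"}, [({"surface": "本"}, [({"surface": "を"}, [])])])
    print(match_subtree(SENTENCE, verb_with_object))
```

The subtree pattern is written as nested pairs so that a query can constrain a head and any of its dependents at arbitrary depth, which is the property that distinguishes it from the flat sequence query.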
COLING 2016