corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora

This paper introduces an open source, interoperable generic software tool set catering for the entire workflow of creation, migration, annotation, query and analysis of multi-layer linguistic corpora. It consists of four components: Salt, a graph-based meta model and API for linguistic data, the common data model for the rest of the tool set; Pepper, a conversion tool and platform for linguistic data that can be used to convert many different linguistic formats into each other; Atomic, an extensible, platform-independent multi-layer desktop annotation software for linguistic corpora; ANNIS, a search and visualization architecture for multi-layer linguistic corpora with many different visualizations and a powerful native query language. The set was designed to solve the following issues in a multi-layer corpus workflow: Lossless data transition between tools through a common data model generic enough to allow for a potentially unlimited number of different types of annotation, conversion capabilities for different linguistic formats to cater for the processing of data from different sources and/or with existing annotations, a high level of extensibility to enhance the sustainability of the whole tool set, analysis capabilities encompassing corpus and annotation query alongside multi-faceted visualizations of all annotation layers.

PDF Abstract LREC 2016 PDF LREC 2016 Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here