Adapting the TANL tool suite to Universal Dependencies

LREC 2016  ·  Maria Simi, Giuseppe Attardi ·

TANL is a suite of tools for text analytics based on the software architecture paradigm of data driven pipelines. The strategies for upgrading TANL to the use of Universal Dependencies range from a minimalistic approach consisting of introducing pre/post-processing steps into the native pipeline to revising the whole pipeline. We explore the issue in the context of the Italian Treebank, considering both the efforts involved, how to avoid losing linguistically relevant information and the loss of accuracy in the process. In particular we compare different strategies for parsing and discuss the implications of simplifying the pipeline when detailed part-of-speech and morphological annotations are not available, as it is the case for less resourceful languages. The experiments are relative to the Italian linguistic pipeline, but the use of different parsers in our evaluations and the avoidance of language specific tagging make the results general enough to be useful in helping the transition to UD for other languages.

PDF Abstract LREC 2016 PDF LREC 2016 Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here