T2K^2: a System for Automatically Extracting and Organizing Knowledge from Texts

In this paper, we present T2K^2, a suite of tools for automatically extracting domain-specific knowledge from collections of Italian and English texts. T2K^2 (Text-To-Knowledge v2) relies on a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine learning, which are dynamically integrated to provide an accurate and incremental representation of the content of vast repositories of unstructured documents. Extracted knowledge ranges from domain-specific entities and named entities to the relations connecting them, and can be used for indexing document collections with respect to different information types. T2K^2 also includes “linguistic profiling” functionalities aimed at supporting the user in constructing the acquisition corpus, e.g. in selecting texts that belong to the same genre or share the same degree of specialization, or in monitoring the “added value” of newly inserted documents. T2K^2 is a web application that can be accessed from any browser through a personal account, and it has been tested in a wide range of domains.
