The corpus contains 1199 sessions and 79, 373 speeches, for a total of about 31 million words and was encoded according to the ParlaCLARIN TEI XML format, as well as in CoNLL-UD format.
Available language technology is hardly applicable to scarcely attested ancient languages, yet their digital semantic representation, though challenging, is an asset for the purpose of sharing and preserving existing cultural knowledge.
Poor digital representation of minority languages further prevents their usability on digital media and devices.
An experiment is presented to induce a set of polysemous basic type alternations (such as Animal-Food, or Building-Institution) by deriving them from the sense alternations found in an existing lexical resource.
The paper describes the multimodal enrichment of ItalWordNet action verbsÂ’ entries by means of an automatic mapping with an ontology of action types instantiated by video scenes (ImagAct).
This paper presents the platform developed in the PANACEA project, a distributed factory that automates the stages involved in the acquisition, production, updating and maintenance of Language Resources required by Machine Translation and other Language Technologies.
In this paper we describe the implementation of a pos-tagged-corpus converter, which is needed for chaining together in a workflow the Freeling PoS tagger for Italian and the DESR dependency parser, given that these two tools have been developed independently.
The FLaReNet Strategic Agenda highlights the most pressing needs for the sector of Language Resources and Technologies and presents a set of recommendations for its development and progress in Europe, as issued from a three-year consultation of the FLaReNet European project.
In addition to this, we assigned to the extracted entries of the lexicon a confidence score based on the relative frequency and evaluated the extractor on domain specific data.