Search Results for author: Jan Hajič

Found 34 papers, 2 papers with code

Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech

no code implementations LREC 2014 Nianwen Xue, Ondřej Bojar, Jan Hajič, Martha Palmer, Zdeňka Urešová, Xiuhong Zhang

Abstract Meaning Representations (AMRs) are rooted, directional and labeled graphs that abstract away from morpho-syntactic idiosyncrasies such as word category (verbs and nouns), word order, and function words (determiners, some prepositions).

Machine Translation Semantic Parsing +2

CLARA: A New Generation of Researchers in Common Language Resources and Their Applications

no code implementations LREC 2014 Koenraad De Smedt, Erhard Hinrichs, Detmar Meurers, Inguna Skadiņa, Bolette Pedersen, Costanza Navarretta, Núria Bel, Krister Lindén, Markéta Lopatková, Jan Hajič, Gisle Andersen, Przemyslaw Lenkiewicz

CLARA (Common Language Resources and Their Applications) is a Marie Curie Initial Training Network which ran from 2009 until 2014 with the aim of providing researcher training in crucial areas related to language resources and infrastructure.

QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages

no code implementations LREC 2016 Arantxa Otegi, Nora Aranberri, Antonio Branco, Jan Hajič, Martin Popel, Kiril Simov, Eneko Agirre, Petya Osenova, Rita Pereira, João Silva, Steven Neale

This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference.

Cross-Lingual Transfer Entity Disambiguation +9

UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing

no code implementations LREC 2016 Milan Straka, Jan Hajič, Jana Straková

Automatic natural language processing of large texts often presents recurring challenges in multiple languages: even for most advanced tasks, the texts are first processed by basic processing steps, from tokenization to parsing.

Dependency Parsing Lemmatization +4

Enriching a Valency Lexicon by Deverbative Nouns

no code implementations WS 2016 Eva Fučíková, Jan Hajič, Zdeňka Urešová

The motivation for the task is to extend a verbal valency (i.e., predicate-argument) lexicon by adding nouns that share the valency properties with the base verb, assuming their properties can be derived (even if not trivially) from the underlying verb by deterministic grammatical rules.

Joint search in a bilingual valency lexicon and an annotated corpus

no code implementations COLING 2016 Eva Fučíková, Jan Hajič, Zdeňka Urešová

In this paper and the associated system demo, we present an advanced search system that allows a joint search over a (bilingual) valency lexicon and a correspondingly annotated linked parallel corpus.

Synonymy in Bilingual Context: The CzEngClass Lexicon

no code implementations COLING 2018 Zdeňka Urešová, Eva Fučíková, Eva Hajičová, Jan Hajič

This paper describes CzEngClass, a bilingual lexical resource being built to investigate verbal synonymy in bilingual context and to relate semantic roles common to one synonym class to verb arguments (verb valency).

Word Sense Disambiguation

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs

1 code implementation EMNLP 2018 Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.

Lemmatization Machine Translation +4

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

no code implementations CONLL 2018 Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov

Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.

Dependency Parsing Morphological Analysis +1

Prague Dependency Treebank - Consolidated 1.0

no code implementations LREC 2020 Jan Hajič, Eduard Bejček, Jaroslava Hlavacova, Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková

We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is - as has always been the case for the family of the Prague Dependency Treebanks - to serve both as training data for various types of NLP tasks and as a basis for linguistically oriented research.

Translation
