Search Results for author: Simon Clematide

Found 34 papers, 6 papers with code

Evaluation of Transfer Learning and Domain Adaptation for Analyzing German-Speaking Job Advertisements

no code implementations LREC 2022 Ann-Sophie Gnehm, Eva Bühlmann, Simon Clematide

Our main contribution consists in building language models which are adapted to the domain of job advertisements, and their assessment on a broad range of machine learning problems.

Domain Adaptation Transfer Learning

On Isotropy Calibration of Transformer Models

no code implementations insights (ACL) 2022 Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer

Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic - the embeddings are distributed in a narrow cone.

CLUZH at SIGMORPHON 2021 Shared Task on Multilingual Grapheme-to-Phoneme Conversion: Variations on a Baseline

no code implementations ACL (SIGMORPHON) 2021 Simon Clematide, Peter Makarov

This paper describes the submission by the team from the Department of Computational Linguistics, Zurich University, to the Multilingual Grapheme-to-Phoneme Conversion (G2P) Task 1 of the SIGMORPHON 2021 challenge in the low and medium settings.

Imitation Learning

Text Zoning and Classification for Job Advertisements in German, French and English

no code implementations EMNLP (NLP+CSS) 2020 Ann-Sophie Gnehm, Simon Clematide

We present experiments to structure job ads into text zones and classify them into professions, industries and management functions, thereby facilitating social science analyses on labor market demand.

Management

Transformer-based HTR for Historical Documents

1 code implementation 21 Mar 2022 Phillip Benjamin Ströbel, Simon Clematide, Martin Volk, Tobias Hodel

We apply the TrOCR framework to real-world, historical manuscripts and show that TrOCR per se is a strong model, ideal for transfer learning.

HTR Transfer Learning

Evaluation of HTR models without Ground Truth Material

1 code implementation LREC 2022 Phillip Benjamin Ströbel, Simon Clematide, Martin Volk, Raphael Schwitter, Tobias Hodel, David Schoch

The evaluation of Handwritten Text Recognition (HTR) models during their development is straightforward: because HTR is a supervised problem, the usual data split into training, validation, and test data sets allows the evaluation of models in terms of accuracy or error rates.

Handwritten Text Recognition HTR +1

On Isotropy Calibration of Transformers

no code implementations 27 Sep 2021 Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer

Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic - the embeddings are distributed in a narrow cone.

Semi-supervised Contextual Historical Text Normalization

no code implementations ACL 2020 Peter Makarov, Simon Clematide

Historical text normalization, the task of mapping historical word forms to their modern counterparts, has recently attracted a lot of interest (Bollmann, 2019; Tang et al., 2018; Lusetti et al., 2018; Bollmann et al., 2018; Robertson and Goldwater, 2018; Bollmann et al., 2017; Korchagina, 2017).

Language Modelling

Language Resources for Historical Newspapers: the Impresso Collection

no code implementations LREC 2020 Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel, Raphaël Barman

If this represents a huge step forward in terms of preservation and accessibility, the next fundamental challenge, and real promise of digitization, is to exploit the contents of these digital assets, and therefore to adapt and develop appropriate language technologies to search and retrieve information from this 'Big Data of the Past'.

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

3 code implementations 14 Feb 2020 Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration.

Document Layout Analysis Semantic Segmentation

Geotagging a Diachronic Corpus of Alpine Texts: Comparing Distinct Approaches to Toponym Recognition

no code implementations RANLP 2019 Tannon Kew, Anastassia Shaitarova, Isabel Meraner, Janis Goldzycher, Simon Clematide, Martin Volk

Geotagging historic and cultural texts provides valuable access to heritage data, enabling location-based searching and new geographically related discoveries.

Toponym Recognition

Imitation Learning for Neural Morphological String Transduction

1 code implementation EMNLP 2018 Peter Makarov, Simon Clematide

We employ imitation learning to train a neural transition-based string transducer for morphological tasks such as inflection generation and lemmatization.

Imitation Learning Lemmatization

Neural Transition-based String Transduction for Limited-Resource Setting in Morphology

1 code implementation COLING 2018 Peter Makarov, Simon Clematide

We present a neural transition-based model that uses a simple set of edit actions (copy, delete, insert) for morphological transduction tasks such as inflection generation, lemmatization, and reinflection.

Lemmatization Machine Translation +1

Align and Copy: UZH at SIGMORPHON 2017 Shared Task for Morphological Reinflection

no code implementations CONLL 2017 Peter Makarov, Tatiana Ruzsics, Simon Clematide

The second approach is a neural state-transition system over a set of explicit edit actions, including a designated COPY action.

LEMMA

Stance Detection in Facebook Posts of a German Right-wing Party

no code implementations WS 2017 Manfred Klenner, Don Tuggener, Simon Clematide

We argue that in order to detect stance, not only the explicit attitudes of the stance holder towards the targets are crucial.

Relation Extraction Stance Detection

Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus

no code implementations LREC 2016 Simon Clematide, Lenz Furrer, Martin Volk

Crowdsourcing approaches for post-correction of OCR output (Optical Character Recognition) have been successfully applied to several historic text collections.

Optical Character Recognition Optical Character Recognition (OCR)

Collaboratively Annotating Multilingual Parallel Corpora in the Biomedical Domain---some MANTRAs

no code implementations LREC 2014 Johannes Hellrich, Simon Clematide, Udo Hahn, Dietrich Rebholz-Schuhmann

The coverage of multilingual biomedical resources is high for the English language, yet sparse for non-English languages, an observation which holds for seemingly well-resourced, yet still dramatically low-resourced ones such as Spanish, French or German but even more so for really under-resourced ones such as Dutch.

Named Entity Recognition (NER) Translation

Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction

no code implementations LREC 2014 Tilia Ellendorff, Fabio Rinaldi, Simon Clematide

We show how to use large biomedical databases in order to obtain a gold standard for training a machine learning system over a corpus of biomedical text.

Document Classification Entity Extraction using GAN +2

Dependency parsing for interaction detection in pharmacogenomics

no code implementations LREC 2012 Gerold Schneider, Fabio Rinaldi, Simon Clematide

We give an overview of our approach to the extraction of interactions between pharmacogenomic entities like drugs, genes and diseases and suggest classes of interaction types driven by data from PharmGKB and partly following the top level ontology WordNet and biomedical types from BioNLP.

Dependency Parsing
