Search Results for author: Simon Clematide

Found 34 papers, 6 papers with code

Searching for Legal Documents at Paragraph Level: Automating Label Generation and Use of an Extended Attention Mask for Boosting Neural Models of Semantic Similarity

no code implementations • EMNLP (NLLP) 2021 • Li Tang, Simon Clematide

For the legal domain, we formulated a task that compares different ways of modeling semantic similarity at the paragraph level, using both neural and non-neural systems.

Information Retrieval, Legal Reasoning +3

CLUZH at SIGMORPHON 2022 Shared Tasks on Morpheme Segmentation and Inflection Generation

1 code implementation • NAACL (SIGMORPHON) 2022 • Silvan Wehrli, Simon Clematide, Peter Makarov

We report competitive results for morpheme segmentation (including sharing first place in part 2 of the challenge).

Ranked #3 on Morpheme Segmentation on UniMorph 4.0 (F1 macro average, subtask 2)

Morpheme Segmentation, Morphological Inflection +2

On Isotropy Calibration of Transformer Models

no code implementations • Insights (ACL) 2022 • Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer

Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic: the embeddings are distributed in a narrow cone.

Evaluation of Transfer Learning and Domain Adaptation for Analyzing German-Speaking Job Advertisements

no code implementations • LREC 2022 • Ann-Sophie Gnehm, Eva Bühlmann, Simon Clematide

Our main contribution consists of building language models adapted to the domain of job advertisements and assessing them on a broad range of machine learning problems.

Domain Adaptation, Transfer Learning

CLUZH at SIGMORPHON 2021 Shared Task on Multilingual Grapheme-to-Phoneme Conversion: Variations on a Baseline

no code implementations • ACL (SIGMORPHON) 2021 • Simon Clematide, Peter Makarov

This paper describes the submission by the team from the Department of Computational Linguistics, University of Zurich, to the Multilingual Grapheme-to-Phoneme Conversion (G2P) Task 1 of the SIGMORPHON 2021 challenge in the low- and medium-resource settings.

Decoder, Imitation Learning

Results of the Second SIGMORPHON Shared Task on Multilingual Grapheme-to-Phoneme Conversion

no code implementations • ACL (SIGMORPHON) 2021 • Lucas F.E. Ashby, Travis M. Bartley, Simon Clematide, Luca Del Signore, Cameron Gibson, Kyle Gorman, Yeonju Lee-Sikka, Peter Makarov, Aidan Malanoski, Sean Miller, Omar Ortiz, Reuben Raff, Arundhati Sengupta, Bora Seo, Yulia Spektor, Winnie Yan

Grapheme-to-phoneme conversion is an important component in many speech technologies, but until recently there were no multilingual benchmarks for this task.

Text Zoning and Classification for Job Advertisements in German, French and English

no code implementations • EMNLP (NLP+CSS) 2020 • Ann-Sophie Gnehm, Simon Clematide

We present experiments to structure job ads into text zones and classify them into professions, industries and management functions, thereby facilitating social science analyses of labor market demand.

Management

Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models

no code implementations • 11 Apr 2024 • Andreas Säuberli, Simon Clematide

We then used this protocol and the dataset to evaluate the quality of items generated by Llama 2 and GPT-4.

Multiple-choice, Reading Comprehension

UZH_CLyp at SemEval-2023 Task 9: Head-First Fine-Tuning and ChatGPT Data Generation for Cross-Lingual Learning in Tweet Intimacy Prediction

no code implementations • 2 Mar 2023 • Andrianos Michail, Stefanos Konstantinou, Simon Clematide

This paper describes the submission of UZH_CLyp for the SemEval 2023 Task 9 "Multilingual Tweet Intimacy Analysis".

Cross-Lingual Transfer, Domain Adaptation +3

Transformer-based HTR for Historical Documents

1 code implementation • 21 Mar 2022 • Phillip Benjamin Ströbel, Simon Clematide, Martin Volk, Tobias Hodel

We apply the TrOCR framework to real-world, historical manuscripts and show that TrOCR per se is a strong model, ideal for transfer learning.

HTR, Transfer Learning

Evaluation of HTR models without Ground Truth Material

1 code implementation • LREC 2022 • Phillip Benjamin Ströbel, Simon Clematide, Martin Volk, Raphael Schwitter, Tobias Hodel, David Schoch

The evaluation of Handwritten Text Recognition (HTR) models during their development is straightforward: because HTR is a supervised problem, the usual data split into training, validation, and test data sets allows the evaluation of models in terms of accuracy or error rates.

Handwritten Text Recognition, HTR +1

On Isotropy Calibration of Transformers

no code implementations • 27 Sep 2021 • Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer

Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic: the embeddings are distributed in a narrow cone.

Semi-supervised Contextual Historical Text Normalization

no code implementations • ACL 2020 • Peter Makarov, Simon Clematide

Historical text normalization, the task of mapping historical word forms to their modern counterparts, has recently attracted a lot of interest (Bollmann, 2019; Tang et al., 2018; Lusetti et al., 2018; Bollmann et al., 2018; Robertson and Goldwater, 2018; Bollmann et al., 2017; Korchagina, 2017).

Language Modelling

CLUZH at SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion

no code implementations • WS 2020 • Peter Makarov, Simon Clematide

The submission adapts our system from the 2018 edition of the SIGMORPHON shared task.

Imitation Learning

How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR

no code implementations • LREC 2020 • Phillip Benjamin Ströbel, Simon Clematide, Martin Volk

Recent advances in Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) have led to more accurate text recognition of historical documents.

Handwritten Text Recognition, HTR +2

Language Resources for Historical Newspapers: the Impresso Collection

no code implementations • LREC 2020 • Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel, Raphaël Barman

While this represents a huge step forward in terms of preservation and accessibility, the next fundamental challenge, and the real promise of digitization, is to exploit the contents of these digital assets and therefore to adapt and develop appropriate language technologies to search and retrieve information from this "Big Data of the Past".

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

3 code implementations • 14 Feb 2020 • Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration.

Document Layout Analysis, Semantic Segmentation

Geotagging a Diachronic Corpus of Alpine Texts: Comparing Distinct Approaches to Toponym Recognition

no code implementations • RANLP 2019 • Tannon Kew, Anastassia Shaitarova, Isabel Meraner, Janis Goldzycher, Simon Clematide, Martin Volk

Geotagging historic and cultural texts provides valuable access to heritage data, enabling location-based searching and new geographically related discoveries.

Toponym Recognition

UZH at CoNLL-SIGMORPHON 2018 Shared Task on Universal Morphological Reinflection

no code implementations • CoNLL 2018 • Peter Makarov, Simon Clematide

Imitation Learning, Morphological Inflection

Imitation Learning for Neural Morphological String Transduction

1 code implementation • EMNLP 2018 • Peter Makarov, Simon Clematide

We employ imitation learning to train a neural transition-based string transducer for morphological tasks such as inflection generation and lemmatization.

Imitation Learning, Lemmatization

Neural Transition-based String Transduction for Limited-Resource Setting in Morphology

1 code implementation • COLING 2018 • Peter Makarov, Simon Clematide

We present a neural transition-based model that uses a simple set of edit actions (copy, delete, insert) for morphological transduction tasks such as inflection generation, lemmatization, and reinflection.

Lemmatization, Machine Translation +1

Strategies and Challenges for Crowdsourcing Regional Dialect Perception Data for Swiss German and Swiss French

no code implementations • LREC 2018 • Jean-Philippe Goldman, Simon Clematide, Mathieu Avanzi, Raphael Tandler

Align and Copy: UZH at SIGMORPHON 2017 Shared Task for Morphological Reinflection

no code implementations • CoNLL 2017 • Peter Makarov, Tatiana Ruzsics, Simon Clematide

The second approach is a neural state-transition system over a set of explicit edit actions, including a designated COPY action.

Decoder, LEMMA

CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects

no code implementations • WS 2017 • Simon Clematide, Peter Makarov

Measured by classification accuracy, our ensemble run (Naïve Bayes, CRF, SVM) reaches 67% (second rank), only 1% below the best system.

General Classification, Language Identification +1

Stance Detection in Facebook Posts of a German Right-wing Party

no code implementations • WS 2017 • Manfred Klenner, Don Tuggener, Simon Clematide

We argue that in order to detect stance, not only the explicit attitudes of the stance holder towards the targets are crucial.

Relation Extraction, Stance Detection

How Factuality Determines Sentiment Inferences

no code implementations • SemEval 2016 • Manfred Klenner, Simon Clematide

Common Sense Reasoning

Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus

no code implementations • LREC 2016 • Simon Clematide, Lenz Furrer, Martin Volk

Crowdsourcing approaches for the post-correction of Optical Character Recognition (OCR) output have been successfully applied to several historic text collections.

Optical Character Recognition (OCR)

Detecting Code-Switching in a Multilingual Alpine Heritage Corpus

no code implementations • WS 2014 • Martin Volk, Simon Clematide

Language Identification, Named Entity Recognition (NER) +1

Collaboratively Annotating Multilingual Parallel Corpora in the Biomedical Domain: Some MANTRAs

no code implementations • LREC 2014 • Johannes Hellrich, Simon Clematide, Udo Hahn, Dietrich Rebholz-Schuhmann

The coverage of multilingual biomedical resources is high for the English language, yet sparse for non-English languages, an observation that holds for seemingly well-resourced yet still dramatically low-resourced languages such as Spanish, French, or German, and even more so for truly under-resourced ones such as Dutch.

Named Entity Recognition (NER), Translation

Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction

no code implementations • LREC 2014 • Tilia Ellendorff, Fabio Rinaldi, Simon Clematide

We show how to use large biomedical databases in order to obtain a gold standard for training a machine learning system over a corpus of biomedical text.

Document Classification, Entity Extraction using GAN +2

A Pilot Study on the Semantic Classification of Two German Prepositions: Combining Monolingual and Multilingual Evidence

no code implementations • RANLP 2013 • Simon Clematide, Manfred Klenner

General Classification, Machine Translation +1

UZH in BioNLP 2013

no code implementations • WS 2013 • Gerold Schneider, Simon Clematide, Tilia Ellendorff, Don Tuggener, Fabio Rinaldi, Gintarė Grigonytė

Chunking, Dependency Parsing +2

MLSA: A Multi-layered Reference Corpus for German Sentiment Analysis

no code implementations • LREC 2012 • Simon Clematide, Stefan Gindl, Manfred Klenner, Stefanos Petrakis, Robert Remus, Josef Ruppenhofer, Ulli Waltinger, Michael Wiegand

The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity.

Opinion Mining, Question Answering +2

Dependency parsing for interaction detection in pharmacogenomics

no code implementations • LREC 2012 • Gerold Schneider, Fabio Rinaldi, Simon Clematide

We give an overview of our approach to extracting interactions between pharmacogenomic entities such as drugs, genes and diseases, and suggest classes of interaction types driven by data from PharmGKB and partly following the top-level ontology of WordNet and the biomedical types from BioNLP.

Dependency Parsing
