no code implementations • EMNLP (NLLP) 2021 • Li Tang, Simon Clematide
For this domain, we formulated a task to compare different ways of modeling semantic similarity at paragraph level, using neural and non-neural systems.
1 code implementation • NAACL (SIGMORPHON) 2022 • Silvan Wehrli, Simon Clematide, Peter Makarov
We report competitive results for morpheme segmentation (including sharing first place in part 2 of the challenge).
Ranked #3 on Morpheme Segmentation on UniMorph 4.0 (F1 macro average, subtask 2)
no code implementations • insights (ACL) 2022 • Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer
Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic: the embeddings are distributed in a narrow cone.
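To make "anisotropic" concrete, one common diagnostic (an assumption here; this listing does not say which measure the paper uses) is the average cosine similarity between pairs of embeddings. Values near 0 or below suggest vectors spread across many directions, and values near 1 suggest they crowd into a narrow cone. The sketch below computes that score in plain Python; the function name `mean_pairwise_cosine` and the toy vectors are hypothetical illustrations, not real model embeddings.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs (hypothetical anisotropy score)."""
    pairs = list(itertools.combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy data (made up for illustration): the first set points in varied directions,
# the second set shares a large common component, i.e. lies in a narrow cone.
isotropic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]]
cone = [[5.0, 0.1, 0.0], [5.0, 0.0, 0.1], [5.0, -0.1, 0.0], [5.0, 0.0, -0.1]]

print(round(mean_pairwise_cosine(isotropic), 3))
print(round(mean_pairwise_cosine(cone), 3))
```

The "cone" vectors score close to 1 even though they differ slightly, which is the kind of pattern the anisotropy finding describes.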
no code implementations • LREC 2022 • Ann-Sophie Gnehm, Eva Bühlmann, Simon Clematide
Our main contribution consists of building language models adapted to the domain of job advertisements and assessing them on a broad range of machine learning problems.
no code implementations • ACL (SIGMORPHON) 2021 • Simon Clematide, Peter Makarov
This paper describes the submission by the team from the Department of Computational Linguistics, University of Zurich, to the Multilingual Grapheme-to-Phoneme Conversion (G2P) Task 1 of the SIGMORPHON 2021 challenge, in the low- and medium-resource settings.
no code implementations • ACL (SIGMORPHON) 2021 • Lucas F.E. Ashby, Travis M. Bartley, Simon Clematide, Luca Del Signore, Cameron Gibson, Kyle Gorman, Yeonju Lee-Sikka, Peter Makarov, Aidan Malanoski, Sean Miller, Omar Ortiz, Reuben Raff, Arundhati Sengupta, Bora Seo, Yulia Spektor, Winnie Yan
Grapheme-to-phoneme conversion is an important component in many speech technologies, but until recently there were no multilingual benchmarks for this task.
no code implementations • EMNLP (NLP+CSS) 2020 • Ann-Sophie Gnehm, Simon Clematide
We present experiments to structure job ads into text zones and classify them into professions, industries, and management functions, thereby facilitating social science analyses of labor market demand.
no code implementations • 11 Apr 2024 • Andreas Säuberli, Simon Clematide
We then used this protocol and the dataset to evaluate the quality of items generated by Llama 2 and GPT-4.
no code implementations • 2 Mar 2023 • Andrianos Michail, Stefanos Konstantinou, Simon Clematide
This paper describes the submission of UZH_CLyp for the SemEval 2023 Task 9 "Multilingual Tweet Intimacy Analysis".
1 code implementation • 21 Mar 2022 • Phillip Benjamin Ströbel, Simon Clematide, Martin Volk, Tobias Hodel
We apply the TrOCR framework to real-world, historical manuscripts and show that TrOCR per se is a strong model, ideal for transfer learning.
1 code implementation • LREC 2022 • Phillip Benjamin Ströbel, Simon Clematide, Martin Volk, Raphael Schwitter, Tobias Hodel, David Schoch
The evaluation of Handwritten Text Recognition (HTR) models during their development is straightforward: because HTR is a supervised problem, the usual data split into training, validation, and test data sets allows the evaluation of models in terms of accuracy or error rates.
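As an illustration of the "error rates" mentioned above, the sketch below computes a character error rate (CER): the Levenshtein edit distance between the recognized text and the ground-truth transcription, divided by the length of the ground truth. CER is assumed here as a typical HTR metric; the snippet does not state which rates the paper reports, and the function names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the ground-truth reference."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: an HTR model misreads "m" as "rn".
print(character_error_rate("anno domini", "anno dornini"))
```

Here two edits (one substitution, one insertion) over an 11-character reference give a CER of 2/11. This only works when ground-truth transcriptions exist, which is why evaluation on a held-out test split is straightforward.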
no code implementations • 27 Sep 2021 • Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer
Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic: the embeddings are distributed in a narrow cone.
no code implementations • ACL 2020 • Peter Makarov, Simon Clematide
Historical text normalization, the task of mapping historical word forms to their modern counterparts, has recently attracted a lot of interest (Bollmann, 2019; Tang et al., 2018; Lusetti et al., 2018; Bollmann et al., 2018; Robertson and Goldwater, 2018; Bollmann et al., 2017; Korchagina, 2017).
no code implementations • WS 2020 • Peter Makarov, Simon Clematide
The submission adapts our system from the 2018 edition of the SIGMORPHON shared task.
no code implementations • LREC 2020 • Phillip Benjamin Ströbel, Simon Clematide, Martin Volk
Recent advances in Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) have led to more accurate text recognition of historical documents.
no code implementations • LREC 2020 • Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel, Raphaël Barman
While this represents a huge step forward in terms of preservation and accessibility, the next fundamental challenge, and the real promise of digitization, is to exploit the contents of these digital assets, and therefore to adapt and develop appropriate language technologies to search and retrieve information from this 'Big Data of the Past'.
3 code implementations • 14 Feb 2020 • Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan
The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration.
no code implementations • RANLP 2019 • Tannon Kew, Anastassia Shaitarova, Isabel Meraner, Janis Goldzycher, Simon Clematide, Martin Volk
Geotagging historic and cultural texts provides valuable access to heritage data, enabling location-based searching and new geographically related discoveries.
1 code implementation • EMNLP 2018 • Peter Makarov, Simon Clematide
We employ imitation learning to train a neural transition-based string transducer for morphological tasks such as inflection generation and lemmatization.
1 code implementation • COLING 2018 • Peter Makarov, Simon Clematide
We present a neural transition-based model that uses a simple set of edit actions (copy, delete, insert) for morphological transduction tasks such as inflection generation, lemmatization, and reinflection.
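To show what a transduction over copy, delete, and insert actions looks like, the sketch below executes a given action sequence left to right over an input string. It covers only the execution step: in the model described above a neural network predicts which action to take at each step, and that learned part is omitted here. The tuple encoding of actions and the helper `apply_actions` are hypothetical choices for illustration.

```python
from typing import List, Tuple

# Hypothetical encoding: ("COPY",), ("DELETE",), or ("INSERT", char).
Action = Tuple[str, ...]


def apply_actions(source: str, actions: List[Action]) -> str:
    """Execute a left-to-right sequence of edit actions over `source`.

    COPY appends the current input character to the output and advances;
    DELETE skips the current input character; INSERT writes a character
    without consuming any input.
    """
    output = []
    i = 0  # current position in the input string
    for action in actions:
        kind = action[0]
        if kind == "COPY":
            if i >= len(source):
                raise ValueError("COPY past the end of the input")
            output.append(source[i])
            i += 1
        elif kind == "DELETE":
            if i >= len(source):
                raise ValueError("DELETE past the end of the input")
            i += 1
        elif kind == "INSERT":
            output.append(action[1])
        else:
            raise ValueError(f"unknown action: {action!r}")
    if i != len(source):
        raise ValueError("actions did not consume the whole input")
    return "".join(output)


# Lemmatization-style example: "walked" -> "walk"
lemma_actions = [("COPY",)] * 4 + [("DELETE",), ("DELETE",)]
print(apply_actions("walked", lemma_actions))

# Inflection-style example: "walk" -> "walks"
inflect_actions = [("COPY",)] * 4 + [("INSERT", "s")]
print(apply_actions("walk", inflect_actions))
```

Because most characters in morphological tasks are simply carried over, a COPY action lets the model handle long unchanged stems with one repeated decision instead of re-generating each character.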
no code implementations • CONLL 2017 • Peter Makarov, Tatiana Ruzsics, Simon Clematide
The second approach is a neural state-transition system over a set of explicit edit actions, including a designated COPY action.
no code implementations • WS 2017 • Simon Clematide, Peter Makarov
Measured by classification accuracy, our ensemble run (Naïve Bayes, CRF, SVM) reaches 67% (second rank), 1% lower than the best system.
no code implementations • WS 2017 • Manfred Klenner, Don Tuggener, Simon Clematide
We argue that detecting stance requires more than the explicit attitudes of the stance holder towards the targets.
no code implementations • LREC 2016 • Simon Clematide, Lenz Furrer, Martin Volk
Crowdsourcing approaches for post-correcting the output of Optical Character Recognition (OCR) have been successfully applied to several historic text collections.
no code implementations • LREC 2014 • Johannes Hellrich, Simon Clematide, Udo Hahn, Dietrich Rebholz-Schuhmann
The coverage of multilingual biomedical resources is high for English, yet sparse for non-English languages. This observation holds for seemingly well-resourced but still dramatically low-resourced languages such as Spanish, French, or German, and even more so for truly under-resourced ones such as Dutch.
no code implementations • LREC 2014 • Tilia Ellendorff, Fabio Rinaldi, Simon Clematide
We show how to use large biomedical databases in order to obtain a gold standard for training a machine learning system over a corpus of biomedical text.
no code implementations • LREC 2012 • Simon Clematide, Stefan Gindl, Manfred Klenner, Stefanos Petrakis, Robert Remus, Josef Ruppenhofer, Ulli Waltinger, Michael Wiegand
The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity.
no code implementations • LREC 2012 • Gerold Schneider, Fabio Rinaldi, Simon Clematide
We give an overview of our approach to extracting interactions between pharmacogenomic entities such as drugs, genes, and diseases, and suggest classes of interaction types driven by data from PharmGKB and partly following the top-level ontology of WordNet and biomedical types from BioNLP.