Search Results for author: Besim Kabashi

Found 13 papers, 1 paper with code

Modelling Collocations in OntoLex-FrAC

no code implementations • gwll (LREC) 2022 • Christian Chiarcos, Katerina Gkirtzou, Maxim Ionov, Besim Kabashi, Fahad Khan, Ciprian-Octavian Truică

Following presentations of frequency and attestations, and embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC.

TIAD 2022: The Fifth Translation Inference Across Dictionaries Shared Task

no code implementations • gwll (LREC) 2022 • Jorge Gracia, Besim Kabashi, Ilan Kernerman

The objective of the Translation Inference Across Dictionaries (TIAD) series of shared tasks is to explore and compare methods and techniques that infer translations indirectly between language pairs, based on other bilingual/multilingual lexicographic resources.

Translation

Modelling Frequency, Attestation, and Corpus-Based Information with OntoLex-FrAC

no code implementations • COLING 2022 • Christian Chiarcos, Elena-Simona Apostol, Besim Kabashi, Ciprian-Octavian Truică

OntoLex-Lemon has become a de facto standard for lexical resources in the web of data.

The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri without even knowing the alphabet

no code implementations • NSURL 2019 • Thomas Proisl, Peter Uhrig, Andreas Blombach, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Sefora Mammarella

Part-Of-Speech Tagging

EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus

no code implementations • LREC 2020 • Thomas Proisl, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Andreas Blombach, Stefan Evert

The EmpiriST corpus (Beißwenger et al., 2016) is a manually tokenized and part-of-speech tagged corpus of approximately 23,000 tokens of German Web and CMC (computer-mediated communication) data.

Lemmatization

A Corpus of German Reddit Exchanges (GeRedE)

no code implementations • LREC 2020 • Andreas Blombach, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Thomas Proisl

GeRedE is a 270 million token German CMC corpus containing approximately 380,000 submissions and 6,800,000 comments posted on Reddit between 2010 and 2018.