no code implementations • NLP4DH (ICON) 2021 • Aynat Rubinstein, Avi Shmidman
A big unknown in Digital Humanities (DH) projects that seek to analyze previously untouched corpora is the question of how to adapt existing Natural Language Processing (NLP) resources to the specific nature of the target corpus.
no code implementations • 9 Jul 2024 • Shaltiel Shmidman, Avi Shmidman, Amir DN Cohen, Moshe Koppel
Adapting a pre-trained model to a new language involves specialized techniques that differ significantly from training a model from scratch or further training existing models on well-resourced languages such as English.
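A common first step in such adaptation is extending the base model's tokenizer with target-language vocabulary and resizing its embedding matrix before continued pretraining on target-language text. The sketch below illustrates that generic step with Hugging Face transformers; the base model and the added tokens are placeholders, not details taken from the paper.

```python
# A minimal sketch of one standard adaptation step: extend the tokenizer
# with target-language tokens and resize the embedding matrix.
# "gpt2" and the token list are illustrative stand-ins only.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "gpt2"  # stand-in for whatever base model is being adapted
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Add Hebrew tokens that the original vocabulary segments poorly.
new_tokens = ["שלום", "ירושלים"]  # illustrative examples only
num_added = tokenizer.add_tokens(new_tokens)

# New rows are appended to the embedding matrix; they still need to be
# learned via continued pretraining on target-language text.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```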
1 code implementation • 11 May 2024 • Avi Shmidman, Cheyn Shmuel Shmidman, Dan Bareket, Moshe Koppel, Reut Tsarfaty
We evaluate all existing models for contextualized Hebrew embeddings on a novel Hebrew homograph challenge set that we deliver.
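As an illustration of the kind of probe such a challenge set supports, the sketch below compares contextual embeddings of one surface form across sense-disambiguating contexts. It uses an English homograph and bert-base-uncased purely as stand-ins for the Hebrew setting; nothing here is the paper's evaluation code.

```python
# Probe sketch: a model that resolves a homograph should embed the two
# same-sense uses closer together than the cross-sense pair.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def word_vec(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of `word` (a single-token word)."""
    enc = tok(sentence, return_tensors="pt")
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    return hidden[idx]

river = word_vec("He sat on the bank of the river.", "bank")
money = word_vec("She deposited cash at the bank.", "bank")
money2 = word_vec("The bank approved the loan.", "bank")

cos = torch.nn.functional.cosine_similarity
print(cos(money, money2, dim=0).item())  # expected: higher (same sense)
print(cos(river, money, dim=0).item())   # expected: lower (cross sense)
```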
no code implementations • 11 Mar 2024 • Shaltiel Shmidman, Avi Shmidman, Moshe Koppel, Reut Tsarfaty
Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking.
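To make the connection to relation extraction concrete, the sketch below pulls (subject, verb, object) triples off dependency arcs. It uses spaCy's small English model as a stand-in (install with `pip install spacy` and `python -m spacy download en_core_web_sm`); the paper itself concerns Hebrew parsing.

```python
# Sketch: relation extraction from a dependency parse via SVO triples.
import spacy

nlp = spacy.load("en_core_web_sm")

def svo_triples(text: str):
    """Yield (subject, verb lemma, object) triples from dependency arcs."""
    doc = nlp(text)
    for tok in doc:
        if tok.pos_ == "VERB":
            subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in tok.children if c.dep_ == "dobj"]
            for s in subjects:
                for o in objects:
                    yield (s.text, tok.lemma_, o.text)

print(list(svo_triples("The committee approved the new budget.")))
# e.g. [('committee', 'approve', 'budget')]
```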
no code implementations • 25 Sep 2023 • Shaltiel Shmidman, Avi Shmidman, Amir David Nissan Cohen, Moshe Koppel
As a commitment to promoting research and development in the Hebrew language, we release both the foundation model and the instruct-tuned model under a Creative Commons license.
no code implementations • 31 Aug 2023 • Shaltiel Shmidman, Avi Shmidman, Moshe Koppel
We present DictaBERT, a new state-of-the-art pre-trained BERT model for modern Hebrew, outperforming existing models on most benchmarks.
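A minimal usage sketch, assuming the model is published on the Hugging Face hub under the id dicta-il/dictabert (check the hub for the exact identifier and mask token):

```python
# Fill-mask with DictaBERT; the hub id is an assumption, verify it.
from transformers import pipeline

fill = pipeline("fill-mask", model="dicta-il/dictabert")
# Hebrew: "The capital of Israel is [MASK]."
for pred in fill("בירת ישראל היא [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```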
no code implementations • 28 Nov 2022 • Eylon Gueta, Avi Shmidman, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Joshua Guedalia, Moshe Koppel, Dan Bareket, Amit Seker, Reut Tsarfaty
We perform a contrastive analysis of this model against all previous Hebrew PLMs (mBERT, heBERT, AlephBERT) and assess the effects of larger vocabularies on task performance.
Ranked #1 on Named Entity Recognition (NER) on NEMO-Corpus
no code implementations • 3 Aug 2022 • Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel
We present a new pre-trained language model (PLM) for Rabbinic Hebrew, termed Berel (BERT Embeddings for Rabbinic-Encoded Language).
no code implementations • Findings of the Association for Computational Linguistics 2020 • Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Moshe Koppel, Reut Tsarfaty
One of the primary tasks of morphological parsers is the disambiguation of homographs.
no code implementations • ACL 2020 • Avi Shmidman, Shaltiel Shmidman, Moshe Koppel, Yoav Goldberg
We present a system for automatic diacritization of Hebrew text.
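The paper's system itself is not reproduced here; as a sketch of how the task is commonly framed, diacritization can be cast as per-character sequence labeling. The helper below, a hypothetical preprocessing step rather than the authors' code, strips a diacritized string into (base letter, diacritic marks) pairs, i.e., the input/label alignment such a labeler would train on.

```python
# Sketch: derive per-character diacritization labels from pointed text.
# Hebrew points (e.g. U+05B0-U+05C2) are Unicode combining marks.
import unicodedata

def char_labels(diacritized: str) -> list[tuple[str, str]]:
    """Pair each base letter with the diacritic marks that follow it."""
    pairs: list[tuple[str, str]] = []
    for ch in diacritized:
        if unicodedata.combining(ch):      # a diacritic mark
            if pairs:
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:                              # a base letter
            pairs.append((ch, ""))
    return pairs

# "שָׁלוֹם" -> each letter paired with its marks ('ל' and 'ם' get none)
print(char_labels("שָׁלוֹם"))
```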
1 code implementation • 11 Sep 2018 • Yonatan Belinkov, Alexander Magidow, Alberto Barrón-Cedeño, Avi Shmidman, Maxim Romanov
Arabic is a widely spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties.
no code implementations • WS 2016 • Yonatan Belinkov, Alexander Magidow, Maxim Romanov, Avi Shmidman, Moshe Koppel
Arabic is a widely spoken language with a rich and long history spanning more than fourteen centuries.
no code implementations • 28 Feb 2016 • Avi Shmidman, Moshe Koppel, Ely Porat
We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation.
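The sketch below is not the paper's algorithm; it shows the generic baseline idea of hash-based candidate generation, indexing overlapping word n-grams ("shingles") and reporting passage pairs that share one. Exact shingle matching breaks under the rephrasing and orthographic variation the paper explicitly handles, which is why a variation-tolerant representation of each shingle would be needed in practice.

```python
# Generic near-duplicate candidate generation via word n-gram shingling.
from collections import defaultdict

def shingles(words: list[str], n: int = 4):
    """All overlapping n-word windows of a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def candidate_pairs(passages: dict[str, str], n: int = 4) -> set[tuple[str, str]]:
    """Return passage-id pairs that share at least one shingle."""
    index: dict[tuple, set[str]] = defaultdict(set)
    for pid, text in passages.items():
        for sh in shingles(text.split(), n):
            index[sh].add(pid)
    pairs: set[tuple[str, str]] = set()
    for ids in index.values():
        for a in ids:
            for b in ids:
                if a < b:
                    pairs.add((a, b))
    return pairs

docs = {
    "a": "in the beginning god created the heaven and the earth",
    "b": "when god created the heaven and the earth it was void",
    "c": "an entirely unrelated sentence about something else",
}
print(candidate_pairs(docs))  # {('a', 'b')}
```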