Search Results for author: Avi Shmidman

Found 13 papers, 2 papers with code

NLP in the DH pipeline: Transfer-learning to a Chronolect

no code implementations • NLP4DH (ICON) 2021 • Aynat Rubinstein, Avi Shmidman

A major open question in Digital Humanities (DH) projects that analyze previously untouched corpora is how to adapt existing Natural Language Processing (NLP) resources to the specific nature of the target corpus.

Transfer Learning

Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities

no code implementations • 9 Jul 2024 • Shaltiel Shmidman, Avi Shmidman, Amir DN Cohen, Moshe Koppel

Adapting a pre-trained model to a new language involves specialized techniques that differ significantly from training a model from scratch or further training existing models on well-resourced languages such as English.

Multilingual NLP, Question Answering +1
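The vocabulary-extension step highlighted in this paper's title follows a common general recipe: add language-specific tokens to the tokenizer and resize the model's embedding matrix before continued pretraining. The sketch below illustrates that generic recipe with Hugging Face transformers; the base model ("gpt2") and the token list are placeholders, not the actual DictaLM 2.0 setup.

```python
# Generic sketch of vocabulary extension before continued pretraining.
# The base model ("gpt2") and the token list are illustrative stand-ins,
# not the DictaLM 2.0 training setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical Hebrew tokens to add to the vocabulary.
new_tokens = ["שלום", "ירושלים", "ספרייה"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix so the new token ids have rows; the new rows
# are randomly initialized and must be learned during continued
# pretraining on Hebrew text.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```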

Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?

1 code implementation • 11 May 2024 • Avi Shmidman, Cheyn Shmuel Shmidman, Dan Bareket, Moshe Koppel, Reut Tsarfaty

We evaluate all existing models for contextualized Hebrew embeddings on a novel Hebrew homograph challenge set that we deliver.

Word Sense Disambiguation
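One way to probe whether a contextual encoder separates homograph readings is to embed the same surface form in two disambiguating contexts and compare the resulting vectors. The sketch below assumes a multilingual stand-in model ("bert-base-multilingual-cased") and two hand-picked sentences for the form הספר ("the book" vs. "the barber"); the paper itself evaluates Hebrew-specific encoders on a purpose-built challenge set rather than an ad-hoc pair like this.

```python
# Probe whether a contextual encoder separates two readings of a Hebrew
# homograph. Model name and sentences are illustrative placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-multilingual-cased"  # stand-in; swap in a Hebrew encoder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Mean of the last-layer vectors of the subtokens that spell `word`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (seq_len, dim)
    word_ids = tok(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(word_ids) + 1):         # locate the word's span
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError("word not found in sentence")

# Two contexts forcing different analyses of the same surface form.
v1 = word_embedding("הספר נמצא על השולחן", "הספר")   # 'the book'
v2 = word_embedding("הספר גזר את השיער", "הספר")     # 'the barber'
print(torch.cosine_similarity(v1, v2, dim=0).item())
```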

MRL Parsing Without Tears: The Case of Hebrew

no code implementations • 11 Mar 2024 • Shaltiel Shmidman, Avi Shmidman, Moshe Koppel, Reut Tsarfaty

Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking.

Dependency Parsing, POS +2

Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew

no code implementations • 25 Sep 2023 • Shaltiel Shmidman, Avi Shmidman, Amir David Nissan Cohen, Moshe Koppel

As a commitment to promoting research and development in the Hebrew language, we release both the foundation model and the instruct-tuned model under a Creative Commons license.

Language Modelling, Sentiment Analysis

DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

no code implementations • 31 Aug 2023 • Shaltiel Shmidman, Avi Shmidman, Moshe Koppel

We present DictaBERT, a new state-of-the-art pre-trained BERT model for modern Hebrew, outperforming existing models on most benchmarks.

Morphological Tagging, Question Answering +1
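A minimal usage sketch for a released checkpoint via the Hugging Face fill-mask pipeline; the model id "dicta-il/dictabert" is assumed from the paper's release, so check the official model card for the exact identifiers of the base and fine-tuned variants.

```python
# Masked-token prediction with a released Hebrew BERT checkpoint.
# The model id is assumed; verify it against the official release.
from transformers import pipeline

fill = pipeline("fill-mask", model="dicta-il/dictabert")

# "He went to [MASK] yesterday" -- a natural completion is הספר (school, as in בית הספר).
masked = f"הוא הלך לבית {fill.tokenizer.mask_token} אתמול"
for pred in fill(masked, top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```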

Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

no code implementations • 3 Aug 2022 • Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel

We present a new pre-trained language model (PLM) for Rabbinic Hebrew, termed Berel (BERT Embeddings for Rabbinic-Encoded Language).

Language Modelling

Studying the History of the Arabic Language: Language Technology and a Large-Scale Historical Corpus

1 code implementation • 11 Sep 2018 • Yonatan Belinkov, Alexander Magidow, Alberto Barrón-Cedeño, Avi Shmidman, Maxim Romanov

Arabic is a widely-spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties.

Shamela: A Large-Scale Historical Arabic Corpus

no code implementations • WS 2016 • Yonatan Belinkov, Alexander Magidow, Maxim Romanov, Avi Shmidman, Moshe Koppel

Arabic is a widely-spoken language with a rich and long history spanning more than fourteen centuries.

Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus

no code implementations • 28 Feb 2016 • Avi Shmidman, Moshe Koppel, Ely Porat

We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation.
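For illustration only, the toy baseline below indexes word n-grams in an inverted index and reports document pairs that share a shingle; the paper's actual algorithm is substantially more robust, tolerating the rephrasing and orthographic variation that exact n-gram matching misses.

```python
# Toy baseline for surfacing candidate parallel passages: index word
# n-grams and pair up documents that share a shingle. This illustrates
# the general inverted-index idea only, not the paper's algorithm.
from collections import defaultdict
from itertools import combinations

def shingles(text: str, n: int = 4):
    words = text.split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])

def candidate_pairs(docs: dict[str, str], n: int = 4) -> set[tuple[str, str]]:
    index = defaultdict(set)                  # shingle -> ids of documents containing it
    for doc_id, text in docs.items():
        for sh in shingles(text, n):
            index[sh].add(doc_id)
    pairs = set()
    for doc_ids in index.values():            # any shared shingle yields a candidate pair
        for a, b in combinations(sorted(doc_ids), 2):
            pairs.add((a, b))
    return pairs

docs = {
    "a": "בראשית ברא אלהים את השמים ואת הארץ",
    "b": "כתוב בראשית ברא אלהים את השמים ואת הארץ ואת כל צבאם",
    "c": "טקסט אחר לגמרי בלי חפיפה",
}
print(candidate_pairs(docs))   # expected: {("a", "b")}
```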
