First, Hebrew resources for training large language models are, so far, not of the same magnitude as their English counterparts. Second, there are no accepted tasks or benchmarks against which to evaluate the progress of Hebrew PLMs.
While fine-tuned language models perform well on many tasks, they have also been shown to rely on superficial surface features such as lexical overlap.
Furthermore, we propose a method that, given a sentence, identifies points in the quality-control space that are expected to yield optimal generated paraphrases.