Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results

25 Sep 2016  ·  Yonatan Belinkov, James Glass ·

Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair. Previous work relied on manually-crafted grammars or pivoting via English, both of which are unsatisfactory for building a scalable and accurate MT system. In this work, we compare standard phrase-based and neural systems on Arabic-Hebrew translation. We experiment with tokenization by external tools and sub-word modeling by character-level neural models, and show that both methods lead to improved translation performance, with a small advantage to the neural models.

PDF Abstract


  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here