PheMT is a phenomenon-wise dataset designed for evaluating the robustness of Japanese-English machine translation systems. The dataset is based on the MTNT dataset, with additional annotations of four linguistic phenomena common in user-generated content (UGC): Proper Noun, Abbreviated Noun, Colloquial Expression, and Variant.

Source: https://github.com/cl-tohoku/PheMT

Source: Fujii et al.

License


  • Unknown
