1 code implementation • 13 Apr 2024 • Hayato Tsukagoshi, Tsutomu Hirao, Makoto Morishita, Katsuki Chousa, Ryohei Sasano, Koichi Takeda
The task of Split and Rephrase, which splits a complex sentence into multiple simple sentences with the same meaning, improves readability and enhances the performance of downstream tasks in natural language processing (NLP).
no code implementations • LREC 2022 • Makoto Morishita, Katsuki Chousa, Jun Suzuki, Masaaki Nagata
Most current machine translation models are mainly trained with parallel corpora, and their translation accuracy largely depends on the quality and quantity of the corpora.
no code implementations • ACL (WAT) 2021 • Katsuki Chousa, Makoto Morishita
This paper describes our systems that were submitted to the restricted translation task at WAT 2021.
no code implementations • COLING 2020 • Yui Oka, Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura
Since length constraints with exact target sentence lengths degrade translation performance, we add random noise within a certain window size to the length constraints in the positional encoding (PE) during training.
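The noise injection described above can be sketched as follows. This is a minimal illustration, assuming uniform integer noise over a symmetric window; the function name and window parameter are hypothetical, not from the paper.

```python
import random

def noisy_length_constraint(target_len, window=5):
    """Perturb an exact target-length constraint with uniform noise.

    Adding noise within +/- `window` tokens (a hypothetical setting)
    keeps the model from overfitting to exact-length positional
    encodings during training.
    """
    noise = random.randint(-window, window)
    # Clamp so the constraint stays a valid positive length.
    return max(1, target_len + noise)
```

At inference time the exact desired length would be used without noise; the perturbation applies only during training.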
1 code implementation • COLING 2020 • Katsuki Chousa, Masaaki Nagata, Masaaki Nishino
In particular, our method improved the F1 score by +53.9 points for extracting non-parallel sentences.
no code implementations • 29 Apr 2020 • Katsuki Chousa, Masaaki Nagata, Masaaki Nishino
We also conduct a sentence alignment experiment using En-Ja newspaper articles and find that the proposed method using multilingual BERT achieves significantly better accuracy than a baseline method using a bilingual dictionary and dynamic programming.
no code implementations • 27 Nov 2019 • Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura
Simultaneous machine translation is a variant of machine translation that starts the translation process before the input is complete.
no code implementations • 30 Jul 2018 • Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura
The proposed loss function encourages the NMT decoder to generate words close to their references in the embedding space; this helps the decoder choose similar, acceptable words when the actual best candidates are missing from the vocabulary due to its size limitation.
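One way to realize such an embedding-aware objective is a cosine distance between the decoder's expected output embedding and the reference word's embedding. This is a minimal sketch under that assumption, using NumPy; the function name, the expected-embedding formulation, and the use of cosine distance are illustrative choices, not necessarily the paper's exact loss.

```python
import numpy as np

def embedding_similarity_loss(probs, embeddings, ref_id):
    """Cosine-distance loss in the word-embedding space (illustrative).

    probs:      (V,) softmax distribution over the vocabulary
    embeddings: (V, d) word-embedding matrix
    ref_id:     index of the reference word
    """
    # Expected embedding of the predicted distribution.
    pred_vec = probs @ embeddings
    ref_vec = embeddings[ref_id]
    cos = pred_vec @ ref_vec / (
        np.linalg.norm(pred_vec) * np.linalg.norm(ref_vec) + 1e-8
    )
    # Low when the prediction lies near the reference embedding,
    # so near-synonyms of the reference are penalized only mildly.
    return 1.0 - cos
```

Because the penalty depends on embedding distance rather than an exact token match, words similar to the reference receive a smaller loss than unrelated ones.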