1 code implementation • Findings (ACL) 2022 • Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki
We present two simple modifications of word-level perturbation: Word Replacement considering Length (WR-L) and Compositional Word Replacement (CWR). In conventional word replacement, a word in the input is replaced with a word sampled from the entire vocabulary, regardless of the length and context of the target word. WR-L accounts for the length of the target word by sampling the replacement's length from a Poisson distribution. CWR obtains compositional candidates by restricting the sampling source to related words that appear during subword regularization. Experimental results showed that the combination of WR-L and CWR improved performance on text classification and machine translation.
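A minimal sketch of the WR-L idea, assuming a plain Python vocabulary grouped by word length; the Poisson rate (the target word's length) matches the description above, but the grouping and the fallback are illustrative choices, not the authors' exact implementation:

```python
import math
import random
from collections import defaultdict

def wr_l_replace(word, vocab, rng=random):
    """Replace `word` with a word whose length is drawn from a Poisson
    distribution whose rate is len(word) (sketch of WR-L)."""
    by_length = defaultdict(list)
    for w in vocab:
        by_length[len(w)].append(w)
    # Sample a target length from Poisson(lambda = len(word)) via
    # Knuth's algorithm (adequate for short words).
    threshold, k, p = math.exp(-len(word)), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            break
        k += 1
    # Fall back to the closest available length if none match exactly.
    candidates = by_length.get(k) or by_length[min(by_length, key=lambda n: abs(n - k))]
    return rng.choice(candidates)

vocab = ["a", "an", "cat", "word", "model", "replacement"]
print(wr_l_replace("word", vocab))
```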
no code implementations • 28 Dec 2023 • Sho Takase, Shun Kiyono, Sosuke Kobayashi, Jun Suzuki
Loss spikes often occur during pre-training of large language models.
no code implementations • 29 May 2023 • Mengsay Loem, Masahiro Kaneko, Sho Takase, Naoaki Okazaki
Large-scale pre-trained language models such as GPT-3 have shown remarkable performance across various natural language processing tasks.
no code implementations • 26 Aug 2022 • Ayana Niwa, Sho Takase, Naoaki Okazaki
In addition, the proposed method outperforms an NAR baseline on the WMT'14 En-De dataset.
no code implementations • 27 Jul 2022 • Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki
The impressive performance of the Transformer has been attributed to self-attention, in which dependencies across the entire input sequence are considered at every position.
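The self-attention the entry refers to is standard scaled dot-product attention; a compact NumPy rendering of that formula (shapes and variable names are ours):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every position attends to
    every other position, so all pairwise dependencies are modeled."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                         # 5 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (5, 8)
```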
1 code implementation • 1 Jun 2022 • Sho Takase, Shun Kiyono, Sosuke Kobayashi, Jun Suzuki
Recent Transformers tend to be Pre-LN because, in Post-LN with deep Transformers (e.g., those with ten or more layers), training is often unstable, resulting in useless models.
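A minimal PyTorch sketch of the two layer-normalization placements the entry contrasts; `ffn` stands in for any sublayer (attention or feed-forward), and this shows the generic Pre-/Post-LN blocks rather than the paper's proposed method:

```python
import torch
import torch.nn as nn

def post_ln_block(x, sublayer, norm):
    # Post-LN: normalize *after* the residual addition (original Transformer).
    return norm(x + sublayer(x))

def pre_ln_block(x, sublayer, norm):
    # Pre-LN: normalize the sublayer input; the residual path stays
    # identity, which keeps deep stacks easier to train.
    return x + sublayer(norm(x))

d = 16
norm = nn.LayerNorm(d)
ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
x = torch.randn(3, d)
print(post_ln_block(x, ffn, norm).shape, pre_ln_block(x, ffn, norm).shape)
```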
no code implementations • Findings (ACL) 2022 • Sho Takase, Tatsuya Hiraoka, Naoaki Okazaki
Subword regularization uses multiple subword segmentations during training to improve the robustness of neural machine translation models.
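In practice, subword regularization is often realized with SentencePiece's sampling mode; a sketch assuming a unigram model has already been trained to a hypothetical `spm.model` file:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spm.model")  # assumes a trained model

sentence = "subword regularization samples multiple segmentations"
for _ in range(3):
    # enable_sampling=True draws a segmentation from the unigram LM;
    # alpha sharpens or flattens the distribution, and nbest_size=-1
    # samples from all segmentation hypotheses.
    pieces = sp.encode(sentence, out_type=str, enable_sampling=True,
                       alpha=0.1, nbest_size=-1)
    print(pieces)
```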
1 code implementation • ACL 2022 • Masahiro Kaneko, Sho Takase, Ayana Niwa, Naoaki Okazaki
In this study, we introduce an Example-Based GEC (EB-GEC) that presents examples to language learners as a basis for a correction result.
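One plausible way to picture the example lookup is nearest-neighbor retrieval over learned representations; the stores and distance below are our assumptions for illustration, not necessarily EB-GEC's exact mechanism:

```python
import numpy as np

def retrieve_examples(query_vec, example_vecs, examples, k=3):
    # Return the k stored corrections whose vectors lie closest to the query.
    dists = np.linalg.norm(example_vecs - query_vec, axis=1)
    return [examples[i] for i in np.argsort(dists)[:k]]

rng = np.random.default_rng(0)
example_vecs = rng.normal(size=(100, 32))        # hypothetical encoded examples
examples = [f"correction #{i}" for i in range(100)]
print(retrieve_examples(rng.normal(size=32), example_vecs, examples))
```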
no code implementations • NAACL (ACL) 2022 • Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki
Through experiments, we show that ExtraPhrase improves performance on abstractive summarization tasks by more than 0.50 ROUGE points compared to the setting without data augmentation.
2 code implementations • Findings (ACL) 2021 • Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki
Since traditional tokenizers are isolated from the downstream task and model, they cannot adapt their tokenization to either, even though recent studies imply that an appropriate tokenization improves performance.
2 code implementations • 13 Apr 2021 • Sho Takase, Shun Kiyono
We propose a parameter sharing method for Transformers (Vaswani et al., 2017).
Ranked #1 on Machine Translation on WMT2014 English-German
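A sketch of one sharing strategy consistent with this idea, cycling a small set of unique layers through a deeper stack; the assignment rule and sizes are illustrative, and the paper compares several such strategies:

```python
import torch.nn as nn

def build_shared_stack(num_applications, num_unique, d_model=16, nhead=4):
    """Stack `num_applications` Transformer layers while training only
    `num_unique` parameter sets, reusing them cyclically (one possible
    assignment; illustrative, not the paper's exact recipe)."""
    unique_layers = nn.ModuleList(
        nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        for _ in range(num_unique)
    )
    # Cycle-style assignment: position i reuses parameter set i % num_unique.
    return [unique_layers[i % num_unique] for i in range(num_applications)]

stack = build_shared_stack(num_applications=6, num_unique=2)
print([id(layer) for layer in stack])  # only 2 distinct parameter sets appear
```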
1 code implementation • NAACL 2021 • Sho Takase, Shun Kiyono
We often use perturbations to regularize neural models.
Ranked #1 on Text Summarization on DUC 2004 Task 1 (using extra training data)
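Two widely used perturbations in this setting are word dropout and additive Gaussian noise on embeddings; a minimal sketch with illustrative hyperparameters:

```python
import torch

def word_dropout(token_ids, unk_id, p=0.1):
    # Randomly replace input tokens with <unk> during training only.
    mask = torch.rand(token_ids.shape) < p
    return torch.where(mask, torch.full_like(token_ids, unk_id), token_ids)

def embedding_noise(embeddings, std=0.1):
    # Add Gaussian noise to embeddings as a regularizer.
    return embeddings + std * torch.randn_like(embeddings)

ids = torch.tensor([[5, 9, 2, 7]])
print(word_dropout(ids, unk_id=3))
```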
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki
In traditional NLP, we tokenize a given sentence as a preprocessing step, so the tokenization is unrelated to the target downstream task.
no code implementations • LREC 2022 • Sho Takase, Naoaki Okazaki
Experimental results indicate that Transum improves performance over the strong Transformer baseline on Chinese-English, Arabic-English, and English-Japanese translation datasets.
1 code implementation • ACL 2020 • Kazuki Matsumaru, Sho Takase, Naoaki Okazaki
We build a binary classifier that predicts the entailment relation between an article and its headline, and use it to filter untruthful instances out of the supervision data.
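The filtering step can be pictured as a loop over (article, headline) pairs that keeps only those the classifier judges entailed; `entails` below is a toy stand-in for the trained binary classifier:

```python
def filter_supervision(pairs, entails, threshold=0.5):
    """Keep only (article, headline) pairs judged entailed by a binary
    classifier (`entails` is a hypothetical stand-in returning a score)."""
    return [(a, h) for a, h in pairs if entails(a, h) >= threshold]

def entails(article, headline):
    # Toy stand-in: fraction of headline words appearing in the article.
    words = set(article.lower().split())
    hits = sum(w in words for w in headline.lower().split())
    return hits / max(len(headline.split()), 1)

pairs = [("the cabinet approved the budget", "the budget approved"),
         ("the cabinet approved the budget", "stocks plunge on news")]
print(filter_supervision(pairs, entails))  # keeps only the first pair
```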
no code implementations • LREC 2020 • Sho Shimazu, Sho Takase, Toshiaki Nakazawa, Naoaki Okazaki
Therefore, we present a hand-crafted dataset for evaluating whether translation models can resolve zero-pronoun problems in Japanese-to-English translation.
1 code implementation • NeurIPS 2020 • Sho Takase, Sosuke Kobayashi
The proposed method, ALONE (all word embeddings from one), constructs the embedding of a word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable.
Ranked #3 on Text Summarization on DUC 2004 Task 1
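A sketch of the ALONE construction as described above: one shared trainable vector, a word-specific non-trainable filter, and a small feed-forward net; drawing one independent random filter per word is a simplification of the paper's memory-efficient filter scheme:

```python
import torch
import torch.nn as nn

class ALONESketch(nn.Module):
    """One shared trainable embedding, modified per word by a fixed random
    filter vector and refined by a feed-forward net (illustrative)."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(dim))  # trainable, shared by all words
        filters = torch.randn(vocab_size, dim)        # word-specific, non-trainable
        self.register_buffer("filters", filters)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, word_ids):
        return self.ffn(self.shared * self.filters[word_ids])

emb = ALONESketch(vocab_size=100, dim=16)
print(emb(torch.tensor([1, 2, 3])).shape)  # (3, 16)
```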
no code implementations • IJCNLP 2019 • Masaaki Nishino, Sho Takase, Tsutomu Hirao, Masaaki Nagata
An anagram is a sentence or phrase made by permuting the characters of an input sentence or phrase.
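Concretely, two strings are anagrams exactly when they share the same multiset of characters; a one-function check (ignoring case and spaces, an illustrative normalization):

```python
from collections import Counter

def is_anagram(a, b):
    # Compare character multisets, ignoring case and whitespace.
    canon = lambda s: Counter(s.lower().replace(" ", ""))
    return canon(a) == canon(b)

print(is_anagram("dormitory", "dirty room"))  # True
```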
no code implementations • WS 2019 • Yuichi Sasazawa, Sho Takase, Naoaki Okazaki
One of the key requirements of QG is to generate a question that leads to a given target answer.
no code implementations • 13 Jun 2019 • Sho Takase, Jun Suzuki, Masaaki Nagata
This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information.
1 code implementation • NAACL 2019 • Sho Takase, Naoaki Okazaki
Neural encoder-decoder models have been successful in natural language generation tasks.
Ranked #2 on Text Summarization on DUC 2004 Task 1
no code implementations • WS 2018 • Shun Kiyono, Sho Takase, Jun Suzuki, Naoaki Okazaki, Kentaro Inui, Masaaki Nagata
Developing a method for understanding the inner workings of black-box neural methods is an important research endeavor.
1 code implementation • EMNLP 2018 • Sho Takase, Jun Suzuki, Masaaki Nagata
This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from the final RNN layer but also from middle layers.
Ranked #8 on Language Modelling on Penn Treebank (Word Level)
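A sketch of the core idea, mixing word distributions computed from several layers' hidden states rather than from the top layer alone; the single shared projection and uniform initial mixture weights are our simplifications:

```python
import torch
import torch.nn.functional as F

def multilayer_lm_distribution(hidden_states, out_proj, mix_logits):
    """Mix per-layer word distributions (sketch; the paper's exact
    weighting and projections may differ)."""
    weights = F.softmax(mix_logits, dim=0)                  # one weight per layer
    dists = [F.softmax(out_proj(h), dim=-1) for h in hidden_states]
    return sum(w * d for w, d in zip(weights, dists))

vocab, dim, layers = 50, 16, 3
out_proj = torch.nn.Linear(dim, vocab)
hs = [torch.randn(1, dim) for _ in range(layers)]           # per-layer hidden states
mix = torch.zeros(layers)                                   # learnable in practice
print(multilayer_lm_distribution(hs, out_proj, mix).sum())  # ~1.0
```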
1 code implementation • ACL 2018 • Jun Suzuki, Sho Takase, Hidetaka Kamigaito, Makoto Morishita, Masaaki Nagata
This paper investigates the construction of a strong baseline based on general-purpose sequence-to-sequence models for constituency parsing.
Ranked #16 on Constituency Parsing on Penn Treebank
no code implementations • 22 Dec 2017 • Shun Kiyono, Sho Takase, Jun Suzuki, Naoaki Okazaki, Kentaro Inui, Masaaki Nagata
The encoder-decoder model is widely used in natural language generation tasks.
1 code implementation • IJCNLP 2017 • Sho Takase, Jun Suzuki, Masaaki Nagata
This paper proposes a reinforcing method that refines the output layers of existing Recurrent Neural Network (RNN) language models.
1 code implementation • ACL 2016 • Sho Takase, Naoaki Okazaki, Kentaro Inui
Learning distributed representations for relation instances is a central technique in downstream NLP applications.