1 code implementation • Findings (ACL) 2022 • Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki
We present two simple modifications for word-level perturbation: Word Replacement considering Length (WR-L) and Compositional Word Replacement (CWR). In conventional word replacement, a word in the input is replaced with a word sampled from the entire vocabulary, regardless of the length and context of the target word. WR-L considers the length of the target word by sampling replacement words from a Poisson distribution. CWR considers compositional candidates by restricting the sampling source to related words that appear during subword regularization. Experimental results showed that combining WR-L and CWR improved performance on text classification and machine translation.
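The sketch below is a minimal Python illustration of the WR-L idea under stated assumptions: the replacement word's length is drawn from a Poisson distribution whose mean is the length of the target word, and a replacement is then sampled uniformly from vocabulary entries of that length. The names `build_length_index` and `wr_l_replace`, the fallback behavior, and the toy vocabulary are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import defaultdict

import numpy as np


def build_length_index(vocab):
    """Group vocabulary entries by character length."""
    by_len = defaultdict(list)
    for word in vocab:
        by_len[len(word)].append(word)
    return by_len


def wr_l_replace(word, by_len, rng):
    """Word Replacement considering Length (WR-L), sketched:
    sample a target length from Poisson(lam=len(word)), then
    sample a replacement uniformly from words of that length.
    Falls back to the original word if no candidate exists
    (a hypothetical choice, not from the paper)."""
    length = max(1, int(rng.poisson(lam=len(word))))
    candidates = by_len.get(length)
    return str(rng.choice(candidates)) if candidates else word


vocab = ["a", "an", "the", "cat", "dog", "house", "garden", "running"]
by_len = build_length_index(vocab)
rng = np.random.default_rng(0)
print(wr_l_replace("tree", by_len, rng))  # typically a word of similar length
```

Sampling the length rather than fixing it keeps the perturbation stochastic while biasing replacements toward words of a similar length to the original.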
2 code implementations • Findings (ACL) 2021 • Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki
Because traditional tokenizers are isolated from the downstream task and model, they cannot output a tokenization appropriate to that task and model, even though recent studies suggest that an appropriate tokenization improves performance.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki
In traditional NLP, we tokenize a given sentence as a preprocessing step, and thus the tokenization is unrelated to the target downstream task.
no code implementations • COLING 2020 • Chihiro Shibata, Kei Uchiumi, Daichi Mochihashi
We show that the activations of some dimensions of the context vector are highly correlated with the depth of phrase structures, such as VP and NP.
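As a hedged illustration of this kind of analysis, the sketch below computes, for each dimension of a sequence of context (hidden-state) vectors, the Pearson correlation with a per-token phrase-structure depth sequence. The toy `hidden_states` and `depths` arrays are hypothetical stand-ins for actual LSTM activations and treebank-derived depths, and `depth_correlations` is an assumed helper name, not the authors' code.

```python
import numpy as np


def depth_correlations(hidden_states, depths):
    """Pearson correlation between each hidden dimension's
    activations over time and the phrase-structure depth at
    each token. hidden_states: (T, D); depths: (T,)."""
    h = hidden_states - hidden_states.mean(axis=0)
    d = depths - depths.mean()
    cov = h.T @ d / len(d)
    return cov / (h.std(axis=0) * d.std() + 1e-12)


# Toy stand-ins: 6 tokens, 4 hidden dimensions.
rng = np.random.default_rng(0)
depths = np.array([1, 2, 3, 3, 2, 1], dtype=float)
hidden_states = rng.normal(size=(6, 4))
hidden_states[:, 0] = depths + 0.1 * rng.normal(size=6)  # dim 0 tracks depth

print(depth_correlations(hidden_states, depths))  # dim 0 near 1.0
```

Dimensions whose correlation is near 1 (or -1) would be the ones whose activations track how deeply the current token is nested in the phrase structure.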