no code implementations • EMNLP (MRQA) 2021 • Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
Question answering (QA) models for reading comprehension have been demonstrated to exploit unintended dataset biases such as question–context lexical overlap.
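As a rough illustration of the bias this entry refers to, here is a minimal sketch (not taken from the paper; the tokenizer and the overlap metric are assumptions for illustration only) of how question-context lexical overlap could be quantified for a reading-comprehension example.

```python
# Minimal sketch (assumption: not the paper's implementation) of measuring
# question-context lexical overlap for a reading-comprehension example.
import re


def tokenize(text: str) -> set[str]:
    # Lowercased word tokenization; a real setup would use a proper tokenizer.
    return set(re.findall(r"\w+", text.lower()))


def lexical_overlap(question: str, context: str) -> float:
    # Fraction of question tokens that also appear in the context.
    q_tokens = tokenize(question)
    c_tokens = tokenize(context)
    if not q_tokens:
        return 0.0
    return len(q_tokens & c_tokens) / len(q_tokens)


# Example: high overlap means a model may locate the answer by string matching
# alone, without the intended reasoning.
print(lexical_overlap(
    "Who wrote the novel?",
    "The novel was written by the author in 1922.",
))
```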
no code implementations • 4 Jul 2024 • LLM-jp: Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, Atsushi Keyaki, Keisuke Kiryu, Hirokazu Kiyomaru, Takashi Kodama, Takahiro Kubo, Yohei Kuga, Ryoma Kumon, Shuhei Kurita, Sadao Kurohashi, Conglong Li, Taiki Maekawa, Hiroshi Matsuda, Yusuke Miyao, Kentaro Mizuki, Sakae Mizuki, Yugo Murawaki, Ryo Nakamura, Taishi Nakamura, Kouta Nakayama, Tomoka Nakazato, Takuro Niitsuma, Jiro Nishitoba, Yusuke Oda, Hayato Ogawa, Takumi Okamoto, Naoaki Okazaki, Yohei Oseki, Shintaro Ozaki, Koki Ryu, Rafal Rzepka, Keisuke Sakaguchi, Shota Sasaki, Satoshi Sekine, Kohei Suda, Saku Sugawara, Issa Sugiura, Hiroaki Sugiyama, Hisami Suzuki, Jun Suzuki, Toyotaro Suzumura, Kensuke Tachibana, Yu Takagi, Kyosuke Takami, Koichi Takeda, Masashi Takeshita, Masahiro Tanaka, Kenjiro Taura, Arseny Tolmachev, Nobuhiro Ueda, Zhen Wan, Shuntaro Yada, Sakiko Yahata, Yuya Yamamoto, Yusuke Yamauchi, Hitomi Yanaka, Rio Yokota, Koichiro Yoshino
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs).
1 code implementation • 19 Jun 2024 • Julian Schnitzler, Xanh Ho, Jiahao Huang, Florian Boudin, Saku Sugawara, Akiko Aizawa
Instead of relying solely on factual reasoning, we enhance the existing multi-hop questions by adding another layer of questioning that involves one, two, or all three of the following types of reasoning: commonsense, arithmetic, and symbolic.
1 code implementation • 6 Jun 2024 • Daiki Asami, Saku Sugawara
This result suggests that shallower models with fewer attention heads can learn good-enough language processing.
1 code implementation • 14 Dec 2023 • Daiki Asami, Saku Sugawara
The projectivity may vary depending on the combination of presupposition triggers and environments.
no code implementations • 30 Nov 2023 • Akira Kawabata, Saku Sugawara
To precisely evaluate a language model's capability for logical reading comprehension, we present a dataset for testing the understanding of the rationale behind critical reasoning.
no code implementations • 4 Jun 2023 • Kazushi Kondo, Saku Sugawara, Akiko Aizawa
In this study, we create a CConS (Counter-commonsense Contextual Size comparison) dataset to investigate how physical commonsense affects the contextualized size comparison task; the proposed dataset consists of both contexts that fit physical commonsense and those that do not.
no code implementations • 24 May 2023 • Saku Sugawara, Shun Tsugita
By demonstrating that current practices in NLU studies can be associated with those criteria and organizing them into a comprehensive checklist, we prove that the validity argument can serve as a coherent guideline for designing credible test sets and facilitating scientific communication.
2 code implementations • 12 Feb 2023 • Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa
To explain the predicted answers and evaluate the reasoning abilities of models, several studies have utilized underlying reasoning (UR) tasks in multi-hop question answering (QA) datasets.
no code implementations • 14 Dec 2022 • Hongkuan Zhang, Saku Sugawara, Akiko Aizawa, Lei Zhou, Ryohei Sasano, Koichi Takeda
Moreover, the model's higher performance on difficult examples and unseen data also demonstrates its generalization ability.
1 code implementation • 29 Nov 2022 • Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
We assume that the learnability of shortcuts, i.e., how easy it is to learn a shortcut, is useful for mitigating the problem.
no code implementations • 29 Nov 2022 • Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
Contrary to our expectations, although models become sensitive to the four types of perturbations, we find that the OOD generalization is not improved.
1 code implementation • 28 Oct 2022 • Johannes Mario Meissner, Saku Sugawara, Akiko Aizawa
We propose a new debiasing method in which we identify debiased pruning masks that can be applied to a finetuned model.
no code implementations • 26 Oct 2022 • Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
Specifically, we find that when the relative positions in a training set are biased, the performance on examples with relative positions unseen during training is significantly degraded.
1 code implementation • 11 Oct 2022 • Xanh Ho, Saku Sugawara, Akiko Aizawa
Other results reveal that our probing questions can help improve the performance of the models (e.g., by +10.3 F1) on the main QA task, and that our dataset can be used for data augmentation to improve the robustness of the models.
1 code implementation • COLING 2022 • Mana Ashida, Saku Sugawara
The possible consequences for the same context may vary depending on the situation we refer to.
no code implementations • 5 Sep 2022 • Xanh Ho, Johannes Mario Meissner, Saku Sugawara, Akiko Aizawa
The issue of shortcut learning is widely known in NLP and has been an important research focus in recent years.
1 code implementation • ACL 2022 • Saku Sugawara, Nikita Nangia, Alex Warstadt, Samuel R. Bowman
For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems.
1 code implementation • 23 Sep 2021 • Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
Question answering (QA) models for reading comprehension have been demonstrated to exploit unintended dataset biases such as question-context lexical overlap.
1 code implementation • ACL 2021 • Johannes Mario Meissner, Napat Thumwanit, Saku Sugawara, Akiko Aizawa
Natural Language Inference (NLI) datasets contain examples with highly ambiguous labels.
1 code implementation • ACL 2021 • Nikita Nangia, Saku Sugawara, Harsh Trivedi, Alex Warstadt, Clara Vania, Samuel R. Bowman
However, we find that training crowdworkers and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data.
1 code implementation • COLING 2020 • Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa
The evidence information has two benefits: (i) providing a comprehensive explanation for predictions and (ii) evaluating the reasoning skills of a model.
1 code implementation • ACL 2021 • Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
While most existing QAG methods aim to improve the quality of synthetic examples, we conjecture that diversity-promoting QAG can mitigate the sparsity of training sets and lead to better robustness.
no code implementations • EACL 2021 • Saku Sugawara, Pontus Stenetorp, Akiko Aizawa
Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding.
no code implementations • 21 Nov 2019 • Saku Sugawara, Pontus Stenetorp, Kentaro Inui, Akiko Aizawa
Existing analysis work in machine reading comprehension (MRC) is largely concerned with evaluating the capabilities of systems.
1 code implementation • EMNLP 2018 • Saku Sugawara, Kentaro Inui, Satoshi Sekine, Akiko Aizawa
From this study, we observed that (i) baseline performance on the hard subsets degrades considerably compared to that on the entire datasets, (ii) hard questions require knowledge inference and multiple-sentence reasoning more than easy questions do, and (iii) multiple-choice questions tend to require a broader range of reasoning skills than answer-extraction and description questions.
no code implementations • ACL 2017 • Saku Sugawara, Yusuke Kido, Hikaru Yokono, Akiko Aizawa
Knowing the quality of reading comprehension (RC) datasets is important for the development of natural-language understanding systems.
no code implementations • WS 2016 • Kimi Kaneko, Saku Sugawara, Koji Mineshima, Daisuke Bekki
This paper proposes a methodology for building a specialized Japanese data set for recognizing temporal relations and discourse relations.