Search Results for author: Saku Sugawara

Found 29 papers, 15 papers with code

Can Question Generation Debias Question Answering Models? A Case Study on Question–Context Lexical Overlap

no code implementations EMNLP (MRQA) 2021 Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa

Question answering (QA) models for reading comprehension have been demonstrated to exploit unintended dataset biases such as question–context lexical overlap.

Data Augmentation Question Answering +3

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

no code implementations 4 Jul 2024 LLM-jp: Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, Atsushi Keyaki, Keisuke Kiryu, Hirokazu Kiyomaru, Takashi Kodama, Takahiro Kubo, Yohei Kuga, Ryoma Kumon, Shuhei Kurita, Sadao Kurohashi, Conglong Li, Taiki Maekawa, Hiroshi Matsuda, Yusuke Miyao, Kentaro Mizuki, Sakae Mizuki, Yugo Murawaki, Ryo Nakamura, Taishi Nakamura, Kouta Nakayama, Tomoka Nakazato, Takuro Niitsuma, Jiro Nishitoba, Yusuke Oda, Hayato Ogawa, Takumi Okamoto, Naoaki Okazaki, Yohei Oseki, Shintaro Ozaki, Koki Ryu, Rafal Rzepka, Keisuke Sakaguchi, Shota Sasaki, Satoshi Sekine, Kohei Suda, Saku Sugawara, Issa Sugiura, Hiroaki Sugiyama, Hisami Suzuki, Jun Suzuki, Toyotaro Suzumura, Kensuke Tachibana, Yu Takagi, Kyosuke Takami, Koichi Takeda, Masashi Takeshita, Masahiro Tanaka, Kenjiro Taura, Arseny Tolmachev, Nobuhiro Ueda, Zhen Wan, Shuntaro Yada, Sakiko Yahata, Yuya Yamamoto, Yusuke Yamauchi, Hitomi Yanaka, Rio Yokota, Koichiro Yoshino

This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs).

MoreHopQA: More Than Multi-hop Reasoning

1 code implementation 19 Jun 2024 Julian Schnitzler, Xanh Ho, Jiahao Huang, Florian Boudin, Saku Sugawara, Akiko Aizawa

Instead of relying solely on factual reasoning, we enhance the existing multi-hop questions by adding another layer of questioning that involves one, two, or all three of the following types of reasoning: commonsense, arithmetic, and symbolic.

Question Answering

What Makes Language Models Good-enough?

1 code implementation 6 Jun 2024 Daiki Asami, Saku Sugawara

This result suggests that models with shallower depth and fewer heads can learn good-enough language processing.

Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension

no code implementations 30 Nov 2023 Akira Kawabata, Saku Sugawara

To precisely evaluate a language model's capability for logical reading comprehension, we present a dataset for testing the understanding of the rationale behind critical reasoning.

Multiple-choice Reading Comprehension

Probing Physical Reasoning with Counter-Commonsense Context

no code implementations 4 Jun 2023 Kazushi Kondo, Saku Sugawara, Akiko Aizawa

In this study, we create a CConS (Counter-commonsense Contextual Size comparison) dataset to investigate how physical commonsense affects the contextualized size comparison task; the proposed dataset consists of both contexts that fit physical commonsense and those that do not.

On Degrees of Freedom in Defining and Testing Natural Language Understanding

no code implementations 24 May 2023 Saku Sugawara, Shun Tsugita

By demonstrating that current practices in NLU studies can be associated with those criteria and organizing them into a comprehensive checklist, we prove that the validity argument can serve as a coherent guideline for designing credible test sets and facilitating scientific communication.

Natural Language Understanding valid

Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering

2 code implementations 12 Feb 2023 Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa

To explain the predicted answers and evaluate the reasoning abilities of models, several studies have utilized underlying reasoning (UR) tasks in multi-hop question answering (QA) datasets.

Multi-hop Question Answering Open-Ended Question Answering +1

Cross-Modal Similarity-Based Curriculum Learning for Image Captioning

no code implementations 14 Dec 2022 Hongkuan Zhang, Saku Sugawara, Akiko Aizawa, Lei Zhou, Ryohei Sasano, Koichi Takeda

Moreover, the higher model performance on difficult examples and unseen data also demonstrates the generalization ability.

Image Captioning Language Modelling

Which Shortcut Solution Do Question Answering Models Prefer to Learn?

1 code implementation 29 Nov 2022 Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa

We assume that the learnability of shortcuts, i.e., how easy it is to learn a shortcut, is useful to mitigate the problem.

Multiple-choice Question Answering +1

Debiasing Masks: A New Framework for Shortcut Mitigation in NLU

1 code implementation 28 Oct 2022 Johannes Mario Meissner, Saku Sugawara, Akiko Aizawa

We propose a new debiasing method in which we identify debiased pruning masks that can be applied to a finetuned model.

Natural Language Understanding

Look to the Right: Mitigating Relative Position Bias in Extractive Question Answering

no code implementations 26 Oct 2022 Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa

Specifically, we find that when the relative positions in a training set are biased, the performance on examples with relative positions unseen during training is significantly degraded.

Extractive Question-Answering Position +1

How Well Do Multi-hop Reading Comprehension Models Understand Date Information?

1 code implementation 11 Oct 2022 Xanh Ho, Saku Sugawara, Akiko Aizawa

Other results reveal that our probing questions can help to improve the performance of the models (e.g., by +10.3 F1) on the main QA task and our dataset can be used for data augmentation to improve the robustness of the models.

Data Augmentation Multi-Hop Reading Comprehension +1

What Makes Reading Comprehension Questions Difficult?

1 code implementation ACL 2022 Saku Sugawara, Nikita Nangia, Alex Warstadt, Samuel R. Bowman

For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems.

Logical Reasoning Multiple-choice +2

Can Question Generation Debias Question Answering Models? A Case Study on Question-Context Lexical Overlap

1 code implementation 23 Sep 2021 Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa

Question answering (QA) models for reading comprehension have been demonstrated to exploit unintended dataset biases such as question-context lexical overlap.

Data Augmentation Question Answering +3

What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?

1 code implementation ACL 2021 Nikita Nangia, Saku Sugawara, Harsh Trivedi, Alex Warstadt, Clara Vania, Samuel R. Bowman

However, we find that training crowdworkers, and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data.

Multiple-choice Natural Language Understanding +1

Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

1 code implementation COLING 2020 Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa

The evidence information has two benefits: (i) providing a comprehensive explanation for predictions and (ii) evaluating the reasoning skills of a model.

Multi-hop Question Answering Question Answering

Improving the Robustness of QA Models to Challenge Sets with Variational Question-Answer Pair Generation

1 code implementation ACL 2021 Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa

While most existing QAG methods aim to improve the quality of synthetic examples, we conjecture that diversity-promoting QAG can mitigate the sparsity of training sets and lead to better robustness.

Data Augmentation Diversity +2

Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets

no code implementations 21 Nov 2019 Saku Sugawara, Pontus Stenetorp, Kentaro Inui, Akiko Aizawa

Existing analysis work in machine reading comprehension (MRC) is largely concerned with evaluating the capabilities of systems.

Benchmarking Machine Reading Comprehension +1

What Makes Reading Comprehension Questions Easier?

1 code implementation EMNLP 2018 Saku Sugawara, Kentaro Inui, Satoshi Sekine, Akiko Aizawa

From this study, we observed that (i) the baseline performances for the hard subsets remarkably degrade compared to those of entire datasets, (ii) hard questions require knowledge inference and multiple-sentence reasoning in comparison with easy questions, and (iii) multiple-choice questions tend to require a broader range of reasoning skills than answer extraction and description questions.

Machine Reading Comprehension Multiple-choice +1
