1 code implementation • 28 Feb 2024 • Yuiga Wada, Kanta Kaneda, Daichi Saito, Komei Sugiura
Establishing an automatic evaluation metric that closely aligns with human judgments is essential for effectively developing image captioning models.
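A standard way to quantify how closely a metric aligns with human judgments (a generic protocol sketch, not necessarily this paper's setup) is to correlate metric scores with human ratings over a set of captions:

```python
# Minimal sketch: metric-human alignment via rank correlation.
# Scores below are made up; real studies use benchmarks such as
# Flickr8k-Expert, where captions carry human quality ratings.
from scipy.stats import kendalltau

metric_scores = [0.71, 0.42, 0.88, 0.15, 0.63]  # automatic metric, per caption
human_ratings = [4, 2, 5, 1, 3]                 # human judgments, per caption

tau, p_value = kendalltau(metric_scores, human_ratings)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```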
1 code implementation • 26 Dec 2023 • Kanta Kaneda, Shunya Nagashima, Ryosuke Korekata, Motonari Kambara, Komei Sugiura
Therefore, we focus on the task of retrieving target objects from open-vocabulary user instructions in a human-in-the-loop setting, which we define as the learning-to-rank physical objects (LTRPO) task.
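As a rough illustration of the learning-to-rank formulation (a generic sketch; the scorer, feature sizes, and loss here are assumptions, not the paper's design), a pairwise margin loss trains a scorer so that objects matching the instruction rank above non-matching ones:

```python
# Generic pairwise learning-to-rank sketch in PyTorch (illustrative only).
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))

# Hypothetical fused instruction-object features, one pair per row:
pos = torch.randn(32, 512)  # objects that match the instruction
neg = torch.randn(32, 512)  # objects that do not

loss_fn = nn.MarginRankingLoss(margin=1.0)
target = torch.ones(32)     # +1: pos should score higher than neg
loss = loss_fn(scorer(pos).squeeze(-1), scorer(neg).squeeze(-1), target)
loss.backward()
```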
1 code implementation • 12 Nov 2023 • Kanta Kaneda, Ryosuke Korekata, Yuiga Wada, Shunya Nagashima, Motonari Kambara, Yui Iioka, Haruka Matsuo, Yuto Imai, Takayuki Nishimura, Komei Sugiura
This paper focuses on the DialFRED task: embodied instruction following in a setting where the agent can actively ask questions about the task.
1 code implementation • 7 Nov 2023 • Yuiga Wada, Kanta Kaneda, Komei Sugiura
Image captioning studies heavily rely on automatic evaluation metrics such as BLEU and METEOR.
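For reference, sentence-level BLEU is a one-liner in NLTK; this only illustrates the kind of n-gram overlap metric in question, not the paper's proposed metric:

```python
# Sentence-level BLEU with NLTK; smoothing avoids zero scores when a
# short caption has no higher-order n-gram matches.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "dog", "runs", "on", "the", "beach"]]
candidate = ["a", "dog", "is", "running", "on", "the", "beach"]

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.3f}")
```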
no code implementations • 7 Nov 2023 • Motonari Kambara, Komei Sugiura
This paper aims to develop a framework that enables a robot to execute Fetch-and-Carry with Object Grounding (FCOG) tasks from visual information in response to natural language instructions.
no code implementations • 17 Jul 2023 • Yui Iioka, Yu Yoshida, Yuiga Wada, Shumpei Hatanaka, Komei Sugiura
In this study, we aim to develop a model that comprehends a natural language instruction (e.g., "Go to the living room and get the nearest pillow to the radio art on the wall") and generates a segmentation mask for the target everyday object.
no code implementations • 14 Jul 2023 • Ryosuke Korekata, Motonari Kambara, Yu Yoshida, Shintaro Ishikawa, Yosuke Kawasaki, Masaki Takahashi, Komei Sugiura
The results show that our method outperforms the baseline method in terms of language comprehension accuracy.
no code implementations • 12 Jul 2023 • Seitaro Otsuki, Shintaro Ishikawa, Komei Sugiura
Most conventional models have been trained on real-world datasets that are labor-intensive to collect, and they have not fully leveraged simulation data through a transfer learning framework.
no code implementations • 24 Jun 2023 • Hidenori Itaya, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura
The decoder in AQT utilizes action queries, which represent the information of each action, as queries.
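To make the action-query idea concrete, the sketch below (assumed shapes and hyperparameters, not the AQT implementation) feeds a set of learned query embeddings, one per action, into a standard transformer decoder:

```python
# Minimal sketch of learned action queries driving a transformer decoder.
import torch
import torch.nn as nn

num_actions, d_model = 4, 256
action_queries = nn.Parameter(torch.randn(num_actions, d_model))

layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=2)

memory = torch.randn(1, 49, d_model)       # e.g., a flattened 7x7 feature map
queries = action_queries.unsqueeze(0)      # (1, num_actions, d_model)
out = decoder(tgt=queries, memory=memory)  # one output slot per action
```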
1 code implementation • 19 Jul 2022 • Motonari Kambara, Komei Sugiura
Domestic service robots that support daily tasks are a promising solution for elderly or disabled people.
1 code implementation • 2 Apr 2022 • Shintaro Ishikawa, Komei Sugiura
This is challenging because the robot needs to break down the instruction sentences into subgoals and execute them in the correct order.
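Conceptually the control flow is simple, as in this toy sketch (the decomposer and executor are hypothetical placeholders, not the paper's pipeline):

```python
# Toy sketch: decompose an instruction into subgoals, execute in order.
def decompose(instruction: str) -> list[str]:
    # A real model predicts this sequence; here it is hard-coded.
    return ["goto(kitchen)", "pickup(cup)", "goto(table)", "put(cup)"]

def execute(subgoal: str) -> bool:
    print(f"executing {subgoal}")
    return True  # a real executor reports success or failure

for subgoal in decompose("Bring the cup to the table"):
    if not execute(subgoal):
        break  # order matters: abort when a prerequisite subgoal fails
```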
1 code implementation • 28 Dec 2021 • Shoya Matsumori, Yuki Abe, Kosuke Shingyouchi, Komei Sugiura, Michita Imai
Previous models for this task successfully generate images iteratively, given a sequence of instructions and a previously generated image.
Ranked #1 on Text-to-Image Generation on GeNeVA (CoDraw)
no code implementations • 2 Jul 2021 • Motonari Kambara, Komei Sugiura
The CRT can handle the objects because of the Case Relation Block.
no code implementations • 2 Jul 2021 • Shintaro Ishikawa, Komei Sugiura
Domestic service robots are currently limited in their ability to interact naturally through language.
1 code implementation • ICCV 2021 • Shoya Matsumori, Kosuke Shingyouchi, Yuki Abe, Yosuke Fukuchi, Komei Sugiura, Michita Imai
In addition, we build a goal-oriented visual dialogue task called CLEVR Ask.
no code implementations • 6 Mar 2021 • Hidenori Itaya, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura
A3C consists of a feature extractor that extracts features from an image, a policy branch that outputs the policy, and a value branch that outputs the state value.
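That decomposition maps directly onto a small actor-critic module; the sizes below are illustrative, not the paper's exact network:

```python
# Minimal A3C-style network: shared convolutional feature extractor,
# a policy branch (action distribution), and a value branch.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        self.features = nn.Sequential(              # feature extractor
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        )
        self.policy = nn.Linear(256, num_actions)   # policy branch
        self.value = nn.Linear(256, 1)              # value branch

    def forward(self, x):
        h = self.features(x)
        return torch.softmax(self.policy(h), dim=-1), self.value(h)

net = ActorCritic(num_actions=6)
probs, value = net(torch.randn(1, 3, 84, 84))  # 84x84 RGB observation
```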
no code implementations • 1 Mar 2021 • Aly Magassouba, Komei Sugiura, Hisashi Kawai
Navigation guided by natural language instructions is particularly suitable for domestic service robots that interact naturally with users.
no code implementations • 12 Feb 2021 • Aly Magassouba, Komei Sugiura, Angelica Nakayama, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Hisashi Kawai
Thus, inferring the collision risk before a placing motion is crucial for achieving the requested task.
no code implementations • 9 Jul 2020 • Tadashi Ogura, Aly Magassouba, Komei Sugiura, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Hisashi Kawai
Domestic service robots (DSRs) are a promising solution to the shortage of home care workers.
no code implementations • 23 Dec 2019 • Aly Magassouba, Komei Sugiura, Hisashi Kawai
To solve such a task, we propose the multimodal target-source classifier model with attention branches (MTCM-AB), which is an extension of the MTCM.
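The MTCM-AB details are in the paper; the general attention-branch pattern it builds on can be sketched as a side branch that predicts an attention map and reweights the main features (the names and the residual weighting here are assumptions based on attention branch networks, not the MTCM-AB code):

```python
# Generic attention-branch sketch: a side branch predicts a spatial
# attention map that reweights the main-branch features.
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, feat):    # feat: (B, C, H, W)
        a = self.attn(feat)     # (B, 1, H, W) attention map in [0, 1]
        return feat * (1 + a)   # residual attention weighting

out = AttentionBranch(64)(torch.randn(2, 64, 14, 14))
```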
no code implementations • 10 Sep 2019 • Aly Magassouba, Komei Sugiura, Hisashi Kawai
In this paper, we address the automatic sentence generation of fetching instructions for domestic service robots.
no code implementations • 11 Jun 2018 • Aly Magassouba, Komei Sugiura, Hisashi Kawai
This paper focuses on a multimodal language understanding method for carry-and-place tasks with domestic service robots.
no code implementations • 16 Jan 2018 • Komei Sugiura, Hisashi Kawai
The target task of this study is grounded language understanding for domestic service robots (DSRs).