Search Results for author: Komei Sugiura

Found 23 papers, 8 papers with code

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

1 code implementation • 28 Feb 2024 • Yuiga Wada, Kanta Kaneda, Daichi Saito, Komei Sugiura

Establishing an automatic evaluation metric that closely aligns with human judgments is essential for effectively developing image captioning models.

Contrastive Learning Image Captioning +1

Paper
Code

Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine

1 code implementation • 26 Dec 2023 • Kanta Kaneda, Shunya Nagashima, Ryosuke Korekata, Motonari Kambara, Komei Sugiura

Therefore, we focus on the task of retrieving target objects from open-vocabulary user instructions in a human-in-the-loop setting, which we define as the learning-to-rank physical objects (LTRPO) task.

Learning-To-Rank

Paper
Code

DialMAT: Dialogue-Enabled Transformer with Moment-Based Adversarial Training

1 code implementation • 12 Nov 2023 • Kanta Kaneda, Ryosuke Korekata, Yuiga Wada, Shunya Nagashima, Motonari Kambara, Yui Iioka, Haruka Matsuo, Yuto Imai, Takayuki Nishimura, Komei Sugiura

This paper focuses on the DialFRED task, which is the task of embodied instruction following in a setting where an agent can actively ask questions about the task.

Instruction Following Position

Paper
Code

JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models

1 code implementation • 7 Nov 2023 • Yuiga Wada, Kanta Kaneda, Komei Sugiura

Image captioning studies heavily rely on automatic evaluation metrics such as BLEU and METEOR.

Image Captioning

Paper
Code

Fully Automated Task Management for Generation, Execution, and Evaluation: A Framework for Fetch-and-Carry Tasks with Natural Language Instructions in Continuous Space

no code implementations • 7 Nov 2023 • Motonari Kambara, Komei Sugiura

This paper aims to develop a framework that enables a robot to execute tasks based on visual information, in response to natural language instructions for Fetch-and-Carry with Object Grounding (FCOG) tasks.

Paper
Add Code

Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions

no code implementations • 17 Jul 2023 • Yui Iioka, Yu Yoshida, Yuiga Wada, Shumpei Hatanaka, Komei Sugiura

In this study, we aim to develop a model that comprehends a natural language instruction (e. g., "Go to the living room and get the nearest pillow to the radio art on the wall") and generates a segmentation mask for the target everyday object.

Segmentation Semantic Segmentation +1

Paper
Add Code

Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks

no code implementations • 14 Jul 2023 • Ryosuke Korekata, Motonari Kambara, Yu Yoshida, Shintaro Ishikawa, Yosuke Kawasaki, Masaki Takahashi, Komei Sugiura

The results show that our method outperforms the baseline method in terms of language comprehension accuracy.

Object Referring Expression +1

Paper
Add Code

Prototypical Contrastive Transfer Learning for Multimodal Language Understanding

no code implementations • 12 Jul 2023 • Seitaro Otsuki, Shintaro Ishikawa, Komei Sugiura

Most conventional models have been trained on real-world datasets that are labor-intensive to collect, and they have not fully leveraged simulation data through a transfer learning framework.

Transfer Learning

Paper
Add Code

Action Q-Transformer: Visual Explanation in Deep Reinforcement Learning with Encoder-Decoder Model using Action Query

no code implementations • 24 Jun 2023 • Hidenori Itaya, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura

The decoder in AQT utilizes action queries, which represent the information of each action, as queries.

Atari Games Decision Making +1

Paper
Add Code

Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks

1 code implementation • 19 Jul 2022 • Motonari Kambara, Komei Sugiura

Domestic service robots that support daily tasks are a promising solution for elderly or disabled people.

Text Generation

Paper
Code

Moment-based Adversarial Training for Embodied Language Comprehension

1 code implementation • 2 Apr 2022 • Shintaro Ishikawa, Komei Sugiura

This is challenging because the robot needs to break down the instruction sentences into subgoals and execute them in the correct order.

Paper
Code

LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation

1 code implementation • 28 Dec 2021 • Shoya Matsumori, Yuki Abe, Kosuke Shingyouchi, Komei Sugiura, Michita Imai

Previous models for this task successfully generate images iteratively, given a sequence of instructions and a previously generated image.

Ranked #1 on Text-to-Image Generation on GeNeVA (CoDraw)

Image Manipulation Text-to-Image Generation

Paper
Code

Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions

no code implementations • 2 Jul 2021 • Motonari Kambara, Komei Sugiura

The CRT can handle the objects because of the Case Relation Block.

Relation Sentence +1

Paper
Add Code

Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots

no code implementations • 2 Jul 2021 • Shintaro Ishikawa, Komei Sugiura

Currently, domestic service robots have an insufficient ability to interact naturally through language.

Paper
Add Code

Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue

1 code implementation • ICCV 2021 • Shoya Matsumori, Kosuke Shingyouchi, Yuki Abe, Yosuke Fukuchi, Komei Sugiura, Michita Imai

In addition, we build a goal-oriented visual dialogue task called CLEVR Ask.

Descriptive Question Generation +1

Paper
Code

Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning

no code implementations • 6 Mar 2021 • Hidenori Itaya, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura

A3C consists of a feature extractor that extracts features from an image, a policy branch that outputs the policy, and a value branch that outputs the state value.

Decision Making reinforcement-learning +1

Paper
Add Code

CrossMap Transformer: A Crossmodal Masked Path Transformer Using Double Back-Translation for Vision-and-Language Navigation

no code implementations • 1 Mar 2021 • Aly Magassouba, Komei Sugiura, Hisashi Kawai

Navigation guided by natural language instructions is particularly suitable for Domestic Service Robots that interacts naturally with users.

Translation Vision and Language Navigation

Paper
Add Code

Predicting and Attending to Damaging Collisions for Placing Everyday Objects in Photo-Realistic Simulations

no code implementations • 12 Feb 2021 • Aly Magassouba, Komei Sugiura, Angelica Nakayama, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Hisashi Kawai

Thus, inferring the collision-risk before a placing motion is crucial for achieving the requested task.

Paper
Add Code

Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network

no code implementations • 9 Jul 2020 • Tadashi Ogura, Aly Magassouba, Komei Sugiura, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Hisashi Kawai

Domestic service robots (DSRs) are a promising solution to the shortage of home care workers.

Image Captioning Sentence

Paper
Add Code

A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects

no code implementations • 23 Dec 2019 • Aly Magassouba, Komei Sugiura, Hisashi Kawai

To solve such a task, we propose the multimodal target-source classifier model with attention branches (MTCM-AB), which is an extension of the MTCM.

Sentence

Paper
Add Code

Multimodal Attention Branch Network for Perspective-Free Sentence Generation

no code implementations • 10 Sep 2019 • Aly Magassouba, Komei Sugiura, Hisashi Kawai

In this paper, we address the automatic sentence generation of fetching instructions for domestic service robots.

Sentence

Paper
Add Code

A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions

no code implementations • 11 Jun 2018 • Aly Magassouba, Komei Sugiura, Hisashi Kawai

This paper focuses on a multimodal language understanding method for carry-and-place tasks with domestic service robots.

Generative Adversarial Network

Paper
Add Code

Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification

no code implementations • 16 Jan 2018 • Komei Sugiura, Hisashi Kawai

The target task of this study is grounded language understanding for domestic service robots (DSRs).

Classification General Classification

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.