no code implementations • LREC 2022 • Kento Tanaka, Taichi Nishimura, Hiroaki Nanjo, Keisuke Shirai, Hirotaka Kameko, Masatake Dantsuji
We focus on image description and a corresponding assessment system for language learners.
no code implementations • 4 Apr 2024 • Taichi Nishimura, Koki Yamamoto, Yuto Haneji, Keiya Kajimura, Chihiro Nishiwaki, Eriko Daikoku, Natsuko Okuda, Fumihito Ono, Hirotaka Kameko, Shinsuke Mori
From our preliminary study, we found that detecting objects using only Micro QR Codes remains difficult because researchers frequently manipulate objects, causing blur and occlusion.
no code implementations • 3 Apr 2024 • Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori
The key idea of our approach is to employ textual instructions targeting various affordances for a wide range of objects.
no code implementations • 25 Mar 2024 • Keyaki Ohno, Hirotaka Kameko, Keisuke Shirai, Taichi Nishimura, Shinsuke Mori
By utilizing hyperlinks, we can accurately assign coordinates to location expressions, even when those expressions are ambiguous in the texts.
no code implementations • 18 Jan 2024 • Taichi Nishimura, Shota Nakada, Masayoshi Kondo
This paper refers to this phenomenon as audio hallucination and analyzes it in large audio-video language models.
no code implementations • 1 Dec 2023 • Taichi Nishimura, Shota Nakada, Masayoshi Kondo
The zero-shot QASIR yields two discoveries: (1) it enables VLMs to generalize to super images and (2) the grid size $N$, image resolution, and VLM size are key trade-off parameters between performance and computation costs.
no code implementations • 28 Nov 2023 • Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato
We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.
no code implementations • 21 Sep 2022 • Taichi Nishimura, Atsushi Hashimoto, Yoshitaka Ushiku, Hirotaka Kameko, Shinsuke Mori
However, unlike DVC, recipe generation requires story awareness: a model should extract an appropriate number of events in the correct order and generate accurate sentences based on them.
no code implementations • COLING 2022 • Keisuke Shirai, Atsushi Hashimoto, Taichi Nishimura, Hirotaka Kameko, Shuhei Kurita, Yoshitaka Ushiku, Shinsuke Mori
We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn the result of each cooking action in a recipe text.
no code implementations • LREC 2020 • Taichi Nishimura, Suzushi Tomori, Hayato Hashimoto, Atsushi Hashimoto, Yoko Yamakata, Jun Harashima, Yoshitaka Ushiku, Shinsuke Mori
Visual grounding is provided as bounding boxes to image sequences of recipes, and each bounding box is linked to an element of the workflow.
no code implementations • WS 2019 • Taichi Nishimura, Atsushi Hashimoto, Shinsuke Mori
Multimedia procedural texts, such as instructions and manuals with pictures, help people share how-to knowledge.