Search Results for author: Taichi Nishimura

Found 11 papers, 0 papers with code

BioVL-QR: Egocentric Biochemical Video-and-Language Dataset Using Micro QR Codes

no code implementations · 4 Apr 2024 · Taichi Nishimura, Koki Yamamoto, Yuto Haneji, Keiya Kajimura, Chihiro Nishiwaki, Eriko Daikoku, Natsuko Okuda, Fumihito Ono, Hirotaka Kameko, Shinsuke Mori

From our preliminary study, we found that detecting objects using only Micro QR Codes is still difficult because researchers frequently manipulate objects, causing blur and occlusion.

Text-driven Affordance Learning from Egocentric Vision

no code implementations · 3 Apr 2024 · Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori

The key idea of our approach is to employ textual instructions targeting various affordances for a wide range of objects.

Referring Expression · Referring Expression Comprehension

Automatic Construction of a Large-Scale Corpus for Geoparsing Using Wikipedia Hyperlinks

no code implementations · 25 Mar 2024 · Keyaki Ohno, Hirotaka Kameko, Keisuke Shirai, Taichi Nishimura, Shinsuke Mori

By utilizing hyperlinks, we can accurately assign coordinates to location expressions, even when the expressions in the text are ambiguous.
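The corpus construction pipeline is not shown here, but the core idea can be sketched: treat the anchor text of each Wikipedia hyperlink as a location expression and resolve the linked article title to its coordinates. In the sketch below, the COORDS table, the annotate_locations helper, and the sample wikitext are hypothetical stand-ins, not the authors' pipeline or data.

```python
# Illustrative sketch only: resolve each wikilink's anchor text to coordinates
# via the linked article title. The coordinate table is a hypothetical stand-in.
import re

# Hypothetical lookup: linked article title -> (latitude, longitude)
COORDS = {
    "Kyoto": (35.0116, 135.7681),
    "Paris": (48.8566, 2.3522),
}

WIKILINK = re.compile(r"\[\[([^|\]]+)(?:\|([^\]]+))?\]\]")  # [[Target|anchor text]]

def annotate_locations(wikitext: str):
    """Yield (anchor_text, article_title, coords) for links whose target has coordinates."""
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip()
        anchor = (match.group(2) or title).strip()
        if title in COORDS:
            yield anchor, title, COORDS[title]

text = "She moved to [[Kyoto|the old capital]] before visiting [[Paris]]."
for anchor, title, (lat, lon) in annotate_locations(text):
    print(f"{anchor!r} -> {title} ({lat}, {lon})")
```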

On the Audio Hallucinations in Large Audio-Video Language Models

no code implementations · 18 Jan 2024 · Taichi Nishimura, Shota Nakada, Masayoshi Kondo

This paper refers to this as audio hallucinations and analyzes them in large audio-video language models.

Hallucination · Sentence

Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval

no code implementations · 1 Dec 2023 · Taichi Nishimura, Shota Nakada, Masayoshi Kondo

The zero-shot QASIR yields two discoveries: (1) it enables VLMs to generalize to super images and (2) the grid size $N$, image resolution, and VLM size are key trade-off parameters between performance and computation costs.

Image Retrieval · Partially Relevant Video Retrieval +2
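The abstract above describes super images built by tiling sampled video frames onto an N x N grid before feeding them to a vision-language model. A minimal sketch of that construction follows, assuming frames have already been sampled from the video; the function name, grid size, and cell resolution are illustrative choices, not the paper's settings.

```python
# Minimal sketch: tile up to n*n sampled video frames into one n x n grid image.
# Frame sampling and the VLM scoring step are out of scope here.
from PIL import Image

def make_super_image(frames: list[Image.Image], n: int, cell_size: int = 224) -> Image.Image:
    """Arrange up to n*n frames into an n x n grid in row-major order."""
    canvas = Image.new("RGB", (n * cell_size, n * cell_size))
    for i, frame in enumerate(frames[: n * n]):
        row, col = divmod(i, n)
        canvas.paste(frame.resize((cell_size, cell_size)), (col * cell_size, row * cell_size))
    return canvas

# Example with dummy frames; in practice these would be frames sampled from a video.
frames = [Image.new("RGB", (320, 240), color=(i * 20 % 256, 80, 120)) for i in range(16)]
super_image = make_super_image(frames, n=4)
super_image.save("super_image.jpg")
```

Increasing n packs more frames into a single model call but shrinks each frame, which reflects the performance/computation trade-off the abstract mentions.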

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

no code implementations · 28 Nov 2023 · Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato

We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.

Dense Video Captioning · Transfer Learning

Recipe Generation from Unsegmented Cooking Videos

no code implementations · 21 Sep 2022 · Taichi Nishimura, Atsushi Hashimoto, Yoshitaka Ushiku, Hirotaka Kameko, Shinsuke Mori

However, unlike DVC, recipe generation requires story awareness: a model should extract an appropriate number of events in the correct order and generate accurate sentences based on them.

Dense Video Captioning · Recipe Generation +1

Visual Grounding Annotation of Recipe Flow Graph

no code implementations · LREC 2020 · Taichi Nishimura, Suzushi Tomori, Hayato Hashimoto, Atsushi Hashimoto, Yoko Yamakata, Jun Harashima, Yoshitaka Ushiku, Shinsuke Mori

Visual grounding is provided as bounding boxes on the image sequences of recipes, and each bounding box is linked to an element of the workflow.

Visual Grounding
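The released annotations are not reproduced here, but the linking scheme the abstract describes, a bounding box on a recipe image tied to a flow-graph element, can be illustrated with a hypothetical structure; all field names and values below are invented for illustration and do not reproduce the corpus format.

```python
# Hypothetical illustration of box-to-flow-graph linking; not the released schema.
from dataclasses import dataclass

@dataclass
class GroundingBox:
    image_id: str                    # which image in the recipe's photo sequence
    box: tuple[int, int, int, int]   # (x, y, width, height) in pixels
    flow_node_id: str                # the linked flow-graph element (e.g., ingredient or action)

annotations = [
    GroundingBox(image_id="step_01.jpg", box=(120, 48, 200, 160), flow_node_id="F:onion"),
    GroundingBox(image_id="step_02.jpg", box=(60, 30, 240, 180), flow_node_id="Ac:chop"),
]

for a in annotations:
    print(f"{a.image_id}: node {a.flow_node_id} at {a.box}")
```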

Procedural Text Generation from a Photo Sequence

no code implementations · WS 2019 · Taichi Nishimura, Atsushi Hashimoto, Shinsuke Mori

Multimedia procedural texts, such as instructions and manuals with pictures, help people share how-to knowledge.

Text Generation
