no code implementations • LREC 2022 • Kento Tanaka, Taichi Nishimura, Hiroaki Nanjo, Keisuke Shirai, Hirotaka Kameko, Masatake Dantsuji
We focus on image description and a corresponding assessment system for language learners.
no code implementations • 4 Apr 2024 • Taichi Nishimura, Koki Yamamoto, Yuto Haneji, Keiya Kajimura, Chihiro Nishiwaki, Eriko Daikoku, Natsuko Okuda, Fumihito Ono, Hirotaka Kameko, Shinsuke Mori
From our preliminary study, we found that detecting objects using only Micro QR Codes remains difficult because researchers frequently manipulate objects, causing blur and occlusion.
no code implementations • 3 Apr 2024 • Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori
The key idea of our approach is to employ textual instructions targeting various affordances for a wide range of objects.
no code implementations • 25 Mar 2024 • Keyaki Ohno, Hirotaka Kameko, Keisuke Shirai, Taichi Nishimura, Shinsuke Mori
By utilizing hyperlinks, we can accurately assign coordinates to location expressions, even when those expressions are ambiguous in the texts.
no code implementations • 18 Jan 2024 • Taichi Nishimura, Shota Nakada, Masayoshi Kondo
This paper refers to this phenomenon as audio hallucination and analyzes it in large audio-video language models.
no code implementations • 1 Dec 2023 • Taichi Nishimura, Shota Nakada, Masayoshi Kondo
The zero-shot QASIR yields two discoveries: (1) it enables VLMs to generalize to super images and (2) the grid size $N$, image resolution, and VLM size are key trade-off parameters between performance and computation costs.
no code implementations • 28 Nov 2023 • Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato
We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.
no code implementations • 21 Sep 2022 • Taichi Nishimura, Atsushi Hashimoto, Yoshitaka Ushiku, Hirotaka Kameko, Shinsuke Mori
However, unlike DVC, recipe generation requires story awareness: a model should extract an appropriate number of events in the correct order and generate accurate sentences based on them.
no code implementations • COLING 2022 • Keisuke Shirai, Atsushi Hashimoto, Taichi Nishimura, Hirotaka Kameko, Shuhei Kurita, Yoshitaka Ushiku, Shinsuke Mori
We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn the result of each cooking action in a recipe text.
no code implementations • LREC 2020 • Taichi Nishimura, Suzushi Tomori, Hayato Hashimoto, Atsushi Hashimoto, Yoko Yamakata, Jun Harashima, Yoshitaka Ushiku, Shinsuke Mori
Visual grounding is provided as bounding boxes to image sequences of recipes, and each bounding box is linked to an element of the workflow.
no code implementations • WS 2019 • Taichi Nishimura, Atsushi Hashimoto, Shinsuke Mori
Multimedia procedural texts, such as instructions and manuals with pictures, help people share how-to knowledge.