Search Results for author: Yasutomo Kawanishi

Found 11 papers, 5 papers with code

J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution

2 code implementations • 28 Mar 2024 • Nobuhiro Ueda, Hideko Habe, Yoko Matsui, Akishige Yuguchi, Seiya Kawano, Yasutomo Kawanishi, Sadao Kurohashi, Koichiro Yoshino

Understanding expressions that refer to the physical world is crucial for real-world human-assisting systems, such as robots that must perform the actions users expect.

A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions

1 code implementation • 26 Mar 2024 • Shun Inadumi, Seiya Kawano, Akishige Yuguchi, Yasutomo Kawanishi, Koichiro Yoshino

Such ambiguities in questions are often clarified by the contexts in conversational situations, such as joint attention with a user or user gaze information.

Gaze Target Estimation, Question Answering, +1

Multi-View Video-Based Learning: Leveraging Weak Labels for Frame-Level Perception

no code implementations • 18 Mar 2024 • Vijay John, Yasutomo Kawanishi

In this paper, we propose a novel learning framework, where the weak labels are first used to train a multi-view video-based base model, which is subsequently used for downstream frame-level perception tasks.

Action Recognition

ManifoldNeRF: View-dependent Image Feature Supervision for Few-shot Neural Radiance Fields

no code implementations • 20 Oct 2023 • Daiju Kanaoka, Motoharu Sonogashira, Hakaru Tamukoh, Yasutomo Kawanishi

DietNeRF is an extension of NeRF that aims to achieve this task from only a few images by introducing a new loss function for unknown viewpoints with no input images.

Novel View Synthesis

Small Object Detection for Birds with Swin Transformer

no code implementations • 18th International Conference on Machine Vision and Applications (MVA) 2023 • Da Huo, Marc A. Kastner, TingWei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide

Object detection is the task of detecting objects in an image.

Ranked #4 on Small Object Detection on SOD4SB Public Test (using extra training data)

Object object-detection +1

MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results

1 code implementation • 18 Jul 2023 • Yuki Kondo, Norimichi Ukita, Takayuki Yamaguchi, Hao-Yu Hou, Mu-Yi Shen, Chia-Chi Hsu, En-Ming Huang, Yu-Chen Huang, Yu-Cheng Xia, Chien-Yao Wang, Chun-Yi Lee, Da Huo, Marc A. Kastner, TingWei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide, Yosuke Shinya, Xinyao Liu, Guang Liang, Syusuke Yasui

Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects.

Ranked #2 on Small Object Detection on SOD4SB Public Test (using extra training data)

Object object-detection +1

DeePoint: Visual Pointing Recognition and Direction Estimation

no code implementations • ICCV 2023 • Shu Nakamura, Yasutomo Kawanishi, Shohei Nobuhara, Ko Nishino

The first is the introduction of a first-of-its-kind large-scale dataset for pointing recognition and direction estimation, which we refer to as the DP Dataset.

IPA-CLIP: Integrating Phonetic Priors into Vision and Language Pretraining

no code implementations • 6 Mar 2023 • Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, Ichiro Ide

Furthermore, in some multimodal retrieval tasks, we confirm that the proposed pronunciation encoder enhances the performance of the text encoder and that the pronunciation encoder handles nonsense words in a more phonetic manner than the text encoder.

Retrieval

A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition

no code implementations • 20 Oct 2022 • Vijay John, Yasutomo Kawanishi

In the framework, a novel deep latent embedding framework, termed the AVTNet, is proposed to learn multiple latent embeddings.

Person Recognition, Sensor Fusion

SDOF-Tracker: Fast and Accurate Multiple Human Tracking by Skipped-Detection and Optical-Flow

1 code implementation • 27 Jun 2021 • Hitoshi Nishimura, Satoshi Komorita, Yasutomo Kawanishi, Hiroshi Murase

To maintain the tracking accuracy, we introduce robust interest point selection within human regions and a tracking termination metric calculated by the distribution of the interest points.

Human Detection, Optical Flow Estimation, +1

Multiple Human Tracking using Multi-Cues including Primitive Action Features

1 code implementation • 18 Sep 2019 • Hitoshi Nishimura, Kazuyuki Tasaka, Yasutomo Kawanishi, Hiroshi Murase

The accurate human tracking results obtained using PAF help multi-frame-based action recognition.

Action Recognition
