no code implementations • 24 Dec 2024 • Kanoko Goto, Takumi Karasawa, Takumi Hirose, Rei Kawakami, Nakamasa Inoue
Small object detection aims to localize and classify small objects within images.
no code implementations • 19 Dec 2024 • Masanari Ohi, Masahiro Kaneko, Naoaki Okazaki, Nakamasa Inoue
Vision-language models (VLMs) have shown impressive abilities in text and image understanding.
no code implementations • 6 Oct 2024 • Yuto Nishimura, Takumi Hirose, Masanari Ohi, Hideki Nakayama, Nakamasa Inoue
Specifically, it incorporates MRVQ sub-modules and continues training from a pre-trained LLM-based TTS model.
no code implementations • 20 Sep 2024 • Ryosuke Yamada, Kensho Hara, Hirokatsu Kataoka, Koshi Makihara, Nakamasa Inoue, Rio Yokota, Yutaka Satoh
Throughout the history of computer vision, research has explored the integration of images (visual) and point clouds (geometric), yet many advances in image and 3D object recognition have tended to process these modalities separately.
1 code implementation • 1 Sep 2024 • Go Ohtani, Ryu Tadokoro, Ryosuke Yamada, Yuki M. Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka, Yoshimitsu Aoki
In this work, we investigate the understudied effect of the training data used for image super-resolution (SR).
1 code implementation • 1 Aug 2024 • Ryo Nakamura, Ryu Tadokoro, Ryosuke Yamada, Yuki M. Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka
To this end, we search for a minimal, purely synthetic pre-training dataset that allows us to achieve performance similar to the 1 million images of ImageNet-1k.
no code implementations • 30 Jul 2024 • Ruoyue Shen, Nakamasa Inoue, Koichi Shinoda
Visual question answering (VQA) is the task of providing accurate answers to natural language questions based on visual input.
no code implementations • 28 Jul 2024 • Mahiro Ukai, Shuhei Kurita, Atsushi Hashimoto, Yoshitaka Ushiku, Nakamasa Inoue
In the inference phase, given an input question, AdaCoder predicts the question type and selects the corresponding compressed preprompt to generate code that answers the question.
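As a rough illustration of this pipeline (the question-type predictor, preprompt table, and `llm` callable below are hypothetical placeholders, not the AdaCoder implementation):

```python
# Hypothetical sketch of an AdaCoder-style inference loop: classify the
# question, pick a type-specific compressed preprompt, and ask an LLM to
# emit a short program. All names here are illustrative.

# Hypothetical compressed preprompts, one per question type.
PREPROMPTS = {
    "counting": "# API: detect(image, label) -> list of boxes\n",
    "spatial": "# API: detect(image, label) -> boxes with (x, y, w, h)\n",
    "attribute": "# API: classify(crop, labels) -> label\n",
}

def predict_question_type(question: str) -> str:
    """Toy stand-in for the learned question-type predictor."""
    q = question.lower()
    if q.startswith("how many"):
        return "counting"
    if any(w in q for w in ("left", "right", "above", "below")):
        return "spatial"
    return "attribute"

def generate_program(question: str, llm) -> str:
    qtype = predict_question_type(question)
    prompt = PREPROMPTS[qtype] + f"# Question: {question}\n# Code:\n"
    return llm(prompt)  # generated code is later executed on the image
```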
no code implementations • 28 Jul 2024 • Nakamasa Inoue, Shinta Otake, Takumi Hirose, Masanari Ohi, Rei Kawakami
The L-adapters create paths from each encoder layer to the downstream head and help to extract non-linguistic features from lower encoder layers that are effective for speaker verification and emotion recognition.
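A minimal sketch of the idea, assuming transformer hidden states of a fixed dimension and a simple linear adapter design (the paper's actual adapter architecture may differ):

```python
import torch
import torch.nn as nn

class LAdapters(nn.Module):
    """Sketch: one small adapter per encoder layer, combined by a learned
    weighted sum before the downstream head. The adapter design and
    pooling are assumptions, not the paper's exact architecture."""

    def __init__(self, num_layers: int, dim: int, num_classes: int):
        super().__init__()
        self.adapters = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
             for _ in range(num_layers)]
        )
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, hidden_states):
        # hidden_states: list of (batch, time, dim), one per encoder layer.
        adapted = torch.stack(
            [a(h).mean(dim=1) for a, h in zip(self.adapters, hidden_states)]
        )  # (num_layers, batch, dim)
        w = torch.softmax(self.layer_weights, dim=0)
        return self.head((w[:, None, None] * adapted).sum(dim=0))
```

Giving every encoder layer its own path to the head is what lets lower-layer, non-linguistic features reach the classifier directly.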
no code implementations • 20 Jun 2024 • Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, Nakamasa Inoue
The findings are revealing: (i) our aerial agent models trained on human demonstration trajectories outperform those trained on shortest-path trajectories by a large margin; (ii) incorporating 2D spatial map information markedly and robustly enhances navigation performance at city scale; (iii) despite the use of map information, our challenging CityNav dataset reveals a persistent performance gap between our baseline models and human performance.
1 code implementation • NeurIPS 2023 • Taiki Miyanishi, Fumiya Kitamori, Shuhei Kurita, Jungdae Lee, Motoaki Kawanabe, Nakamasa Inoue
To tackle this problem, we introduce the CityRefer dataset for city-level visual grounding.
1 code implementation • ICCV 2023 • Risa Shinoda, Ryo Hayamizu, Kodai Nakashima, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka
SegRCDB has a high potential to contribute to semantic segmentation pre-training and investigation by enabling the creation of large datasets without manual annotation.
1 code implementation • ICCV 2023 • Ryo Nakamura, Hirokatsu Kataoka, Sora Takashima, Edgar Josafat Martinez Noriega, Rio Yokota, Nakamasa Inoue
Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks.
no code implementations • CVPR 2023 • Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota
Unlike JFT-300M, which is a static dataset, the quality of synthetic datasets will continue to improve, and the current work is a testament to this possibility.
1 code implementation • 19 Dec 2022 • Tatsukichi Shibuya, Nakamasa Inoue, Rei Kawakami, Ikuro Sato
Learning the feedforward and feedback networks is sufficient to make TP methods capable of training, but is having these layer-wise autoencoders a necessary condition for TP to work?
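To make the question concrete, here is a toy sketch of target propagation on a two-layer network, where the feedback module is trained as a layer-wise autoencoder and targets, not gradients, flow backward (difference corrections and per-layer optimizers are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy sketch of target propagation (TP) on a two-layer net. The feedback
# network g2 is trained as a layer-wise autoencoder to invert f2, and
# targets flow backward through it instead of gradients. Illustrative only.

f1 = nn.Sequential(nn.Linear(784, 256), nn.Tanh())
f2 = nn.Linear(256, 10)
g2 = nn.Sequential(nn.Linear(10, 256), nn.Tanh())  # feedback for f2

def tp_step(x, y, lr=0.1):
    h = f1(x)
    out = f2(h)
    loss = F.cross_entropy(out, y)

    # Feedback (autoencoder) loss: g2 should invert f2 on the layer's input.
    rec_loss = F.mse_loss(g2(f2(h.detach()).detach()), h.detach())

    # Output target: a small step down the loss surface at the output.
    grad_out = torch.autograd.grad(loss, out, retain_graph=True)[0]
    out_target = (out - lr * grad_out).detach()

    # Propagate the target backward through the feedback network.
    h_target = g2(out_target).detach()

    # Layer-local losses: each block matches its own target, so no
    # gradient needs to flow through the whole stack.
    loss_f2 = F.mse_loss(f2(h.detach()), out_target)
    loss_f1 = F.mse_loss(h, h_target)
    return rec_loss, loss_f2, loss_f1
```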
1 code implementation • 5 Jul 2022 • Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami
We developed a training algorithm called PoF: Post-Training of Feature Extractor, which updates the feature extractor part of an already-trained deep model to search for a flatter minimum.
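PoF's exact update rule is given in the paper; as a stand-in for the general idea of flatness-seeking post-training restricted to the feature extractor, the following SAM-style sketch perturbs and updates only the feature extractor's weights while the task head stays frozen:

```python
import torch

# Not the PoF algorithm itself: a SAM-style stand-in showing flat-minimum-
# seeking post-training applied only to the feature extractor.

def flatness_step(feat, head, x, y, loss_fn, rho=0.05, lr=1e-3):
    params = list(feat.parameters())            # head is left untouched
    loss = loss_fn(head(feat(x)), y)
    grads = torch.autograd.grad(loss, params)

    # Ascend to a nearby worst-case point in feature-extractor weights.
    scale = rho / (torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=scale.item())

    # Take a descent step using the gradient at the perturbed point,
    # after undoing the perturbation.
    loss_adv = loss_fn(head(feat(x)), y)
    grads_adv = torch.autograd.grad(loss_adv, params)
    with torch.no_grad():
        for p, g, g_adv in zip(params, grads, grads_adv):
            p.sub_(g, alpha=scale.item())       # remove perturbation
            p.sub_(g_adv, alpha=lr)             # flatness-aware update
```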
no code implementations • CVPR 2022 • Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota
In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without using real images, human supervision, or self-supervision during the pre-training of Vision Transformers (ViTs).
1 code implementation • 24 Mar 2021 • Kodai Nakashima, Hirokatsu Kataoka, Asato Matsumoto, Kenji Iwata, Nakamasa Inoue
Moreover, although a ViT pre-trained without natural images produces visualizations that differ somewhat from those of an ImageNet pre-trained ViT, it can interpret natural image datasets to a large extent.
2 code implementations • 21 Jan 2021 • Hirokatsu Kataoka, Kazushige Okayasu, Asato Matsumoto, Eisuke Yamagata, Ryosuke Yamada, Nakamasa Inoue, Akio Nakamura, Yutaka Satoh
Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding?
no code implementations • 19 Jan 2021 • Nakamasa Inoue, Eisuke Yamagata, Hirokatsu Kataoka
Our main idea is to initialize the network parameters by solving an artificial noise classification problem, where the aim is to classify Perlin noise samples into their noise categories.
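A sketch of what such pre-training data could look like, using smoothed value noise as a simplified stand-in for true Perlin noise and treating the grid frequency as the (assumed) noise category:

```python
import numpy as np

# Smoothed value noise as a simplified stand-in for Perlin noise; the
# class label is the noise category, here taken to be the grid frequency.

def noise_image(freq: int, size: int = 64, rng=None) -> np.ndarray:
    if rng is None:
        rng = np.random.default_rng()
    grid = rng.random((freq + 1, freq + 1))  # random values on a coarse grid
    xs = np.linspace(0, freq, size, endpoint=False)
    x0 = xs.astype(int)
    t = xs - x0
    t = t * t * (3 - 2 * t)                  # smoothstep interpolation
    # Bilinear blend of the four surrounding grid corners.
    return ((1 - t)[:, None] * (1 - t)[None, :] * grid[x0][:, x0]
            + t[:, None] * (1 - t)[None, :] * grid[x0 + 1][:, x0]
            + (1 - t)[:, None] * t[None, :] * grid[x0][:, x0 + 1]
            + t[:, None] * t[None, :] * grid[x0 + 1][:, x0 + 1])

def make_batch(batch: int = 8, num_classes: int = 4, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    ys = rng.integers(num_classes, size=batch)   # noise-category labels
    xs = np.stack([noise_image(2 ** (y + 1), rng=rng) for y in ys])
    return xs, ys                # pre-train a classifier on these pairs
```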
no code implementations • 16 Apr 2020 • Mariana Rodrigues Makiuchi, Tifani Warnita, Nakamasa Inoue, Koichi Shinoda, Michitaka Yoshimura, Momoko Kitazawa, Kei Funaki, Yoko Eguchi, Taishiro Kishimoto
We propose a non-invasive and cost-effective method to automatically detect dementia by utilizing solely speech audio data.
no code implementations • 29 Feb 2020 • Takehiko Ohkawa, Naoto Inoue, Hirokatsu Kataoka, Nakamasa Inoue
Herein, we propose Augmented Cyclic Consistency Regularization (ACCR), a novel regularization method for unpaired I2I translation.
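For orientation, standard cyclic consistency enforces x ≈ F(G(x)) on unpaired data; the sketch below additionally applies the cycle to an augmented copy of the input, which is one plausible reading of the acronym rather than the paper's exact loss:

```python
import torch
import torch.nn.functional as F

# Cycle-consistency regularization for unpaired translation: x -> G(x)
# -> F_net(G(x)) should reconstruct x. The "augmented" variant below
# also enforces the cycle on an augmented copy of x (an assumption).

def cycle_loss(G, F_net, x, augment):
    loss = F.l1_loss(F_net(G(x)), x)           # standard cyclic consistency
    x_aug = augment(x)                         # e.g. flip, crop, noise
    loss_aug = F.l1_loss(F_net(G(x_aug)), x_aug)
    return loss + loss_aug
```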
no code implementations • 12 Nov 2018 • Raden Mu'az Mun'im, Nakamasa Inoue, Koichi Shinoda
We investigate the feasibility of sequence-level knowledge distillation of Sequence-to-Sequence (Seq2Seq) models for Large Vocabulary Continuous Speech Recognition (LVCSR).
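The core recipe of sequence-level distillation is to train the student on the teacher's decoded outputs rather than only the ground-truth transcripts; a sketch with placeholder model APIs:

```python
import torch
import torch.nn.functional as F

# Sketch of sequence-level knowledge distillation: the student is trained
# on the teacher's beam-search outputs. `teacher.beam_decode` and the
# model call signatures are placeholders, not a specific toolkit's API.

def seq_kd_step(student, teacher, speech_batch, optimizer):
    with torch.no_grad():
        pseudo_targets = teacher.beam_decode(speech_batch, beam=4)  # token ids

    logits = student(speech_batch, pseudo_targets[:, :-1])  # teacher forcing
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        pseudo_targets[:, 1:].reshape(-1),
        ignore_index=0,                 # assume 0 is the padding id
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```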
no code implementations • 19 Jul 2018 • Nakamasa Inoue, Koichi Shinoda
Few-shot adaptation provides robust parameter estimation with few training examples by optimizing the parameters of zero-shot learning and supervised many-shot learning simultaneously.
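One simple way to picture this is a data-dependent blend between a zero-shot parameter estimate and a many-shot estimate; the interpolation below is an illustrative choice, not the paper's estimator:

```python
import numpy as np

# Blend a zero-shot estimate (e.g. derived from semantic attributes) with
# a many-shot estimate from labeled examples, trusting the data more as
# the number of examples grows. Illustrative only.

def adapted_mean(zero_shot_mu, examples, tau=10.0):
    n = len(examples)
    mle = np.mean(examples, axis=0) if n else zero_shot_mu
    w = n / (n + tau)                  # weight on the data-driven estimate
    return w * mle + (1 - w) * zero_shot_mu
```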
no code implementations • 30 May 2018 • Thao Minh Le, Nakamasa Inoue, Koichi Shinoda
This paper presents a new framework for human action recognition from a 3D skeleton sequence.
Ranked #111 on Skeleton Based Action Recognition on NTU RGB+D
no code implementations • 1 Apr 2018 • Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda
I-vector-based text-independent speaker verification (SV) systems often perform poorly on short utterances, as the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable (see the sketch below).
Generative Adversarial Network • Text-Independent Speaker Verification
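A simplified reading of the setup: a generator maps short-utterance i-vectors toward their long-utterance counterparts, adversarially trained against a discriminator (architecture, dimensionality, and losses below are assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

# Sketch of GAN-based i-vector compensation: G maps an unreliable
# short-utterance i-vector toward the long-utterance one; D tries to
# tell compensated vectors from real long-utterance vectors.

DIM = 400  # typical i-vector dimensionality (assumption)

G = nn.Sequential(nn.Linear(DIM, 512), nn.ReLU(), nn.Linear(512, DIM))
D = nn.Sequential(nn.Linear(DIM, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))
bce = nn.BCEWithLogitsLoss()

def gan_step(short_iv, long_iv, opt_g, opt_d):
    fake = G(short_iv)

    # Discriminator: real long-utterance i-vectors vs. compensated ones.
    d_loss = (bce(D(long_iv), torch.ones(len(long_iv), 1))
              + bce(D(fake.detach()), torch.zeros(len(fake), 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: fool D, plus an L1 term pulling the compensated vector
    # toward the paired long-utterance i-vector of the same speaker.
    g_loss = (bce(D(fake), torch.ones(len(fake), 1))
              + nn.functional.l1_loss(fake, long_iv))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```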