Search Results for author: Nakamasa Inoue

Found 26 papers, 9 papers with code

Formula-Supervised Visual-Geometric Pre-training

no code implementations · 20 Sep 2024 · Ryosuke Yamada, Kensho Hara, Hirokatsu Kataoka, Koshi Makihara, Nakamasa Inoue, Rio Yokota, Yutaka Satoh

Throughout the history of computer vision, research has explored integrating images (visual) and point clouds (geometric), yet many advances in image and 3D object recognition have tended to process these modalities separately.

3D Object Classification · 3D Object Recognition +2

Scaling Backwards: Minimal Synthetic Pre-training?

1 code implementation · 1 Aug 2024 · Ryo Nakamura, Ryu Tadokoro, Ryosuke Yamada, Yuki M. Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

To this end, we search for a minimal, purely synthetic pre-training dataset that allows us to achieve performance similar to the 1 million images of ImageNet-1k.

Transfer Learning

Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering

no code implementations · 30 Jul 2024 · Ruoyue Shen, Nakamasa Inoue, Koichi Shinoda

Visual question answering (VQA) is the task of providing accurate answers to natural language questions based on visual input.

Code Generation · Question Answering +2

AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering

no code implementations · 28 Jul 2024 · Mahiro Ukai, Shuhei Kurita, Atsushi Hashimoto, Yoshitaka Ushiku, Nakamasa Inoue

In the inference phase, given an input question, AdaCoder predicts the question type and chooses the corresponding compressed preprompt to generate code that answers the question.

Question Answering · Visual Question Answering
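The type-conditioned prompt selection that the AdaCoder snippet describes can be sketched as follows. This is an illustrative sketch only: the question types, the keyword-based predictor, and the API preprompts are hypothetical placeholders, not the paper's actual components (the paper uses learned components and its own prompt format).

```python
# Hypothetical compressed preprompts, one per question type. In the real
# system these would be compressed versions of a full API preprompt.
COMPRESSED_PREPROMPTS = {
    "counting":  "# API: find(img, name) -> objs; count(objs) -> int",
    "existence": "# API: exists(img, name) -> bool",
    "spatial":   "# API: locate(img, name) -> box; left_of(a, b) -> bool",
}

def predict_question_type(question: str) -> str:
    """Toy keyword-based type predictor (stand-in for a learned classifier)."""
    q = question.lower()
    if q.startswith("how many"):
        return "counting"
    if q.startswith(("is there", "are there")):
        return "existence"
    return "spatial"

def build_prompt(question: str) -> str:
    """Pick the compressed preprompt for the predicted question type,
    then append the question for the code-generating language model."""
    qtype = predict_question_type(question)
    return f"{COMPRESSED_PREPROMPTS[qtype]}\n# Question: {question}\n# Write code to answer:"

prompt = build_prompt("How many dogs are in the picture?")
```

The point of the adaptive step is that only the preprompt relevant to the predicted question type is sent to the model, rather than one long preprompt covering every capability.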

ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks

no code implementations · 28 Jul 2024 · Nakamasa Inoue, Shinta Otake, Takumi Hirose, Masanari Ohi, Rei Kawakami

The L-adapters create paths from each encoder layer to the downstream head and help to extract non-linguistic features from lower encoder layers that are effective for speaker verification and emotion recognition.

Emotion Recognition · parameter-efficient fine-tuning +4
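The layer-to-head paths described in the L-adapter snippet can be sketched in numpy as below. All shapes, the bottleneck-adapter form, and the learned weighted-sum combination are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, seq_len, dim, adapter_dim, num_classes = 4, 10, 16, 8, 3

# Frozen-encoder outputs: one hidden-state tensor per encoder layer.
layer_outputs = [rng.standard_normal((seq_len, dim)) for _ in range(num_layers)]

# One small bottleneck adapter per layer (down-project, ReLU, up-project),
# giving every layer its own path to the downstream head.
adapters = [
    (rng.standard_normal((dim, adapter_dim)) * 0.1,
     rng.standard_normal((adapter_dim, dim)) * 0.1)
    for _ in range(num_layers)
]
layer_weights = np.ones(num_layers) / num_layers  # learnable mixing weights

def adapt(h, down, up):
    """Bottleneck adapter applied to one layer's hidden states."""
    return np.maximum(h @ down, 0.0) @ up

# Combine all per-layer adapter outputs before the head, so lower-layer
# (non-linguistic) features can reach the classifier directly.
mixed = sum(w * adapt(h, d, u)
            for w, h, (d, u) in zip(layer_weights, layer_outputs, adapters))
pooled = mixed.mean(axis=0)                      # mean-pool over time
head = rng.standard_normal((dim, num_classes)) * 0.1
logits = pooled @ head                           # downstream classifier
```

Only the adapters, mixing weights, and head would be trained; the encoder stays frozen, which is what makes the tuning parameter-efficient.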

CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information

no code implementations · 20 Jun 2024 · Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, Nakamasa Inoue

The findings are revealing: (i) our aerial agent model trained on human demonstration trajectories outperforms those trained on shortest-path trajectories by a large margin; (ii) incorporating 2D spatial map information markedly and robustly enhances navigation performance at a city scale; (iii) despite the use of map information, our challenging CityNav dataset reveals a persistent performance gap between our baseline models and human performance.

Vision and Language Navigation

SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

1 code implementation · ICCV 2023 · Risa Shinoda, Ryo Hayamizu, Kodai Nakashima, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

SegRCDB has a high potential to contribute to semantic segmentation pre-training and investigation by enabling the creation of large datasets without manual annotation.

Segmentation · Semantic Segmentation

Pre-training Vision Transformers with Very Limited Synthesized Images

1 code implementation · ICCV 2023 · Ryo Nakamura, Hirokatsu Kataoka, Sora Takashima, Edgar Josafat Martinez Noriega, Rio Yokota, Nakamasa Inoue

Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks.

Data Augmentation

Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

no code implementations · CVPR 2023 · Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota

Unlike JFT-300M, which is a static dataset, synthetic datasets will continue to improve in quality, and the current work is a testament to this possibility.

Fixed-Weight Difference Target Propagation

1 code implementation · 19 Dec 2022 · Tatsukichi Shibuya, Nakamasa Inoue, Rei Kawakami, Ikuro Sato

Learning of the feedforward and feedback networks is sufficient to make TP methods capable of training, but is having these layer-wise autoencoders a necessary condition for TP to work?

PoF: Post-Training of Feature Extractor for Improving Generalization

1 code implementation · 5 Jul 2022 · Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami

We developed a training algorithm called PoF: Post-Training of Feature Extractor, which updates the feature extractor part of an already-trained deep model to search for a flatter minimum.

Replacing Labeled Real-image Datasets with Auto-generated Contours

no code implementations · CVPR 2022 · Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota

In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human-, and self-supervision during the pre-training of Vision Transformers (ViTs).

Can Vision Transformers Learn without Natural Images?

1 code implementation · 24 Mar 2021 · Kodai Nakashima, Hirokatsu Kataoka, Asato Matsumoto, Kenji Iwata, Nakamasa Inoue

Moreover, although the ViT pre-trained without natural images produces visualizations that differ somewhat from those of an ImageNet pre-trained ViT, it can interpret natural image datasets to a large extent.

Fairness · Self-Supervised Learning

Pre-training without Natural Images

2 code implementations · 21 Jan 2021 · Hirokatsu Kataoka, Kazushige Okayasu, Asato Matsumoto, Eisuke Yamagata, Ryosuke Yamada, Nakamasa Inoue, Akio Nakamura, Yutaka Satoh

Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding?

Initialization Using Perlin Noise for Training Networks with a Limited Amount of Data

no code implementations · 19 Jan 2021 · Nakamasa Inoue, Eisuke Yamagata, Hirokatsu Kataoka

Our main idea is to initialize the network parameters by solving an artificial noise classification problem, where the aim is to classify Perlin noise samples into their noise categories.

Classification · General Classification +1
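The artificial noise classification problem described above can be sketched as follows: build a labeled dataset of Perlin noise images whose label is the noise category, then pre-train the target network to classify them and keep the resulting weights as the initialization. In this sketch the categories are defined by the base frequency of the noise, which is an assumption for illustration; the paper's actual category definitions may differ, and the classifier-training step is only indicated in a comment.

```python
import numpy as np

def perlin(size, freq, rng):
    """2D Perlin noise on a `size` x `size` grid with `freq` x `freq` gradient cells."""
    g = rng.standard_normal((freq + 1, freq + 1, 2))
    g /= np.linalg.norm(g, axis=-1, keepdims=True)   # unit gradients at lattice points
    lin = np.linspace(0, freq, size, endpoint=False)
    x, y = np.meshgrid(lin, lin)
    xi, yi = x.astype(int), y.astype(int)            # cell indices
    xf, yf = x - xi, y - yi                          # offsets within each cell
    fade = lambda t: 6 * t**5 - 15 * t**4 + 10 * t**3

    def corner(ix, iy, dx, dy):                      # dot(gradient, offset) at a corner
        grad = g[iy, ix]
        return grad[..., 0] * dx + grad[..., 1] * dy

    u, v = fade(xf), fade(yf)
    top = corner(xi, yi, xf, yf) * (1 - u) + corner(xi + 1, yi, xf - 1, yf) * u
    bot = corner(xi, yi + 1, xf, yf - 1) * (1 - u) + corner(xi + 1, yi + 1, xf - 1, yf - 1) * u
    return top * (1 - v) + bot * v

rng = np.random.default_rng(0)
frequencies = [2, 4, 8]                  # one noise category per base frequency (assumed)
samples_per_class, size = 10, 32
images = np.stack([perlin(size, f, rng)
                   for f in frequencies for _ in range(samples_per_class)])
labels = np.repeat(np.arange(len(frequencies)), samples_per_class)
# `images`/`labels` define the artificial classification problem; training the
# target architecture on it would supply the proposed initialization.
```

Because the labels come from the noise generator itself, the initialization requires no real images and no manual annotation.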

Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation

no code implementations · 29 Feb 2020 · Takehiko Ohkawa, Naoto Inoue, Hirokatsu Kataoka, Nakamasa Inoue

Herein, we propose Augmented Cyclic Consistency Regularization (ACCR), a novel regularization method for unpaired I2I translation.

Data Augmentation · Image-to-Image Translation +1

Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition

no code implementations · 12 Nov 2018 · Raden Mu'az Mun'im, Nakamasa Inoue, Koichi Shinoda

We investigate the feasibility of sequence-level knowledge distillation of Sequence-to-Sequence (Seq2Seq) models for Large Vocabulary Continuous Speech Recognition (LVCSR).

Knowledge Distillation · Model Compression +2

Few-Shot Adaptation for Multimedia Semantic Indexing

no code implementations · 19 Jul 2018 · Nakamasa Inoue, Koichi Shinoda

Few-shot adaptation provides robust parameter estimation with few training examples, by optimizing the parameters of zero-shot learning and supervised many-shot learning simultaneously.

Few-Shot Learning · Zero-Shot Learning
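The joint optimization mentioned in the snippet above can be illustrated with a toy objective: fit classifier parameters to a handful of labeled examples while tethering them to the zero-shot solution. The quadratic tether and all shapes below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_classes, num_shots = 8, 3, 5

theta_zero = rng.standard_normal((dim, num_classes)) * 0.1   # zero-shot parameters
X = rng.standard_normal((num_shots * num_classes, dim))      # few labeled examples
y = np.repeat(np.arange(num_classes), num_shots)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Joint objective: few-shot cross-entropy + lam * ||theta - theta_zero||^2 / 2,
# optimized by plain gradient descent.
theta, lam, lr = theta_zero.copy(), 0.5, 0.1
for _ in range(200):
    p = softmax(X @ theta)
    p[np.arange(len(y)), y] -= 1.0                   # dL/dlogits for cross-entropy
    grad = X.T @ p / len(y) + lam * (theta - theta_zero)
    theta -= lr * grad

train_acc = (softmax(X @ theta).argmax(axis=1) == y).mean()
```

With only a few shots, the tether keeps the estimate from overfitting the small sample while still moving it toward the supervised solution, which is the robustness the snippet describes.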

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification

no code implementations · 1 Apr 2018 · Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda

I-vector based text-independent speaker verification (SV) systems often have poor performance with short utterances, as the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable.

Generative Adversarial Network · Text-Independent Speaker Verification
