no code implementations • 22 Oct 2024 • Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Kazuki Egashira, Jeonghun Baek, Xiang Yue, Graham Neubig, Kiyoharu Aizawa
Accelerating research on Large Multimodal Models (LMMs) in non-English languages is crucial for enhancing user experiences across broader populations.
no code implementations • 27 Sep 2024 • Yuki Imajuku, Yoko Yamakata, Kiyoharu Aizawa
Research on food image understanding using recipe data has been a long-standing focus due to the diversity and complexity of the data.
no code implementations • 31 Aug 2024 • Sandra Zhang Ding, Jiafeng Mao, Kiyoharu Aizawa
We introduce latent optimization, a method that refines the noisy latent at each intermediate step of the generation process using cross-attention maps to ensure that the generated images closely adhere to the desired structure outlined in the reference sketch.
no code implementations • 26 Jul 2024 • Hikaru Ikuta, Leslie Wöhler, Kiyoharu Aizawa
MangaUB is designed to assess the recognition and understanding of content shown in a single panel as well as conveyed across multiple panels, allowing for a fine-grained analysis of a model's various capabilities required for manga understanding.
1 code implementation • 22 Apr 2024 • Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui
To address this problem, we propose an iterative multimodal framework, the first to employ multimodal information for both character identification and speaker prediction tasks.
1 code implementation • 29 Mar 2024 • Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa
This paper introduces a novel and significant challenge for Vision Language Models (VLMs), termed Unsolvable Problem Detection (UPD).
no code implementations • CVPR 2024 • Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa
Recent advancements in the study of Neural Radiance Fields (NeRF) for dynamic scenes often involve explicit modeling of scene dynamics.
1 code implementation • 17 Dec 2023 • Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa
We aim to find the condition that exploits knowledge from high-resource languages for improving performance in low-resource languages.
1 code implementation • 13 Dec 2023 • Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
Text-to-image diffusion models allow users control over the content of generated images.
1 code implementation • CVPR 2024 • Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa
We show that a simple retrieval augmentation can significantly improve the generation quality.
1 code implementation • 2 Oct 2023 • Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa
We consider that such data may significantly affect the performance of large pre-trained networks because the discriminability of these OOD data depends on the pre-training algorithm.
Out-of-Distribution Detection Out of Distribution (OOD) Detection
no code implementations • 30 Jul 2023 • Qing Yu, Go Irie, Kiyoharu Aizawa
Unsupervised domain adaptation (UDA) has proven to be very effective in transferring knowledge obtained from a source domain with labeled data to a target domain with unlabeled data.
2 code implementations • 30 Jun 2023 • Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui
For further understanding of comics, an automated approach is needed to link text in comics to characters speaking the words.
1 code implementation • NeurIPS 2023 • Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa
CLIP's local features have a lot of ID-irrelevant nuisances (e.g., backgrounds), and by learning to push them away from the ID class text embeddings, we can remove the nuisances in the ID class text embeddings and enhance the separation between ID and OOD.
Out-of-Distribution Detection Out of Distribution (OOD) Detection
1 code implementation • 5 May 2023 • Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
Diffusion models can generate high-quality images by denoising pure Gaussian noise images.
2 code implementations • 10 Apr 2023 • Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa
Zero-shot out-of-distribution (OOD) detection is a task that detects OOD images during inference with only in-distribution (ID) class names.
1 code implementation • 1 Mar 2023 • Koki Tsubota, Kiyoharu Aizawa
The experimental results reveal that the best approximated quantization differs across network architectures, and that the best approximations of the three are different from the ones originally used for the architectures.
no code implementations • 7 Dec 2022 • Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa
We propose two non-uniform ray sampling schemes for NeRF to suit 360° images: distortion-aware ray sampling and content-aware ray sampling.
1 code implementation • 18 Nov 2022 • Daichi Horita, Jiaolong Yang, Dong Chen, Yuki Koyama, Kiyoharu Aizawa, Nicu Sebe
The structure generator generates an edge image representing plausible structures within the holes, which is then used for guiding the texture generation process.
1 code implementation • 2 Nov 2022 • Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa
This task aims to compress images belonging to arbitrary domains, such as natural images, line drawings, and comics.
1 code implementation • 23 Oct 2022 • Atsuyuki Miyai, Qing Yu, Daiki Ikami, Go Irie, Kiyoharu Aizawa
The semantics of an image can be rotation-invariant or rotation-variant, so whether the rotated image is treated as positive or negative should be determined based on the content of the image.
no code implementations • 8 Sep 2022 • Yuuki Sawabe, Satoshi Ikehata, Kiyoharu Aizawa
360° images are informative: they contain omnidirectional visual information around the camera.
no code implementations • 20 Jul 2022 • Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa
We comprehensively evaluate four deep IQAs on the same five datasets, and the experimental results show that image scale significantly influences IQA performance.
1 code implementation • 11 Jul 2022 • Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa
To encourage research on this topic, we provide a novel comic onomatopoeia dataset (COO), which consists of onomatopoeia texts in Japanese comics.
no code implementations • 21 Jun 2022 • Haruka Aoki, Kiyoharu Aizawa
Designing fonts for Chinese characters is highly labor-intensive and time-consuming.
no code implementations • 10 Apr 2022 • Naoki Sugimoto, Satoshi Ikehata, Kiyoharu Aizawa
We constructed a large-scale 360° Image Intersection Identification (iii360) dataset for training and evaluation, with 360° videos collected from various areas such as school campuses, downtown, suburbs, and Chinatown, and demonstrate that our PDoT-based method achieves 88% accuracy, which is significantly better than that achieved by a naive direct binary-classification method.
no code implementations • 3 Apr 2022 • Yuya Hasegawa, Ikehata Satoshi, Kiyoharu Aizawa
We propose a framework that uses ERP directly, with coordinate conversion of correspondences and a distortion-aware upsampling module to deal with ERP-related problems, and extend a self-supervised learning method to open environments.
no code implementations • 7 Feb 2022 • Miao Cao, Satoshi Ikehata, Kiyoharu Aizawa
360° cameras have gained popularity over the last few years.
no code implementations • 20 Oct 2021 • Jiafeng Mao, Qing Yu, Yoko Yamakata, Kiyoharu Aizawa
In this study, we propose a new problem setting of training object detectors on datasets with entangled noises of annotations of class labels and bounding boxes.
no code implementations • 7 May 2021 • Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, SungJun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, ZiRui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, Yifan Chen, Yujiu Yang, Yang Li, Tao Zhang, Longtao Feng, Yiting Liao, Junlin Li, William Thong, Jose Costa Pereira, Ales Leonardis, Steven McDonagh, Kele Xu, Lehan Yang, Hengxing Cai, Pengfei Sun, Seyed Mehdi Ayyoubzadeh, Ali Royat, Sid Ahmed Fezza, Dounia Hammou, Wassim Hamidouche, Sewoong Ahn, Gwangjin Yoon, Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa
This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021.
no code implementations • 8 Mar 2021 • Daiki Tanaka, Daiki Ikami, Kiyoharu Aizawa
Positive-unlabeled learning refers to the process of training a binary classifier using only positive and unlabeled data.
1 code implementation • CVPR 2021 • Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa
To the best of our knowledge, this is the first study that 1) shows sufficient performance by only using real labels and 2) introduces semi- and self-supervised methods into STR with fewer labels.
no code implementations • 17 Nov 2020 • Naoki Sugimoto, Yoshihito Ebine, Kiyoharu Aizawa
Frames of the video are localized on the map, intersections are detected, and videos are segmented.
no code implementations • 4 Nov 2020 • Haruka Aoki, Koki Tsubota, Hikaru Ikuta, Kiyoharu Aizawa
Designing fonts for languages with a large number of characters, such as Japanese and Chinese, is an extremely labor-intensive and time-consuming task.
no code implementations • 3 Nov 2020 • Takumi Kawashima, Qing Yu, Akari Asai, Daiki Ikami, Kiyoharu Aizawa
We propose a new optimization framework for aleatoric uncertainty estimation in regression problems.
no code implementations • 16 Sep 2020 • Daichi Horita, Kiyoharu Aizawa
Furthermore, we show that our proposal can interpolate facial makeup images to determine the unique features, compare existing methods, and help users find desirable makeup configurations.
no code implementations • ECCV 2020 • Qing Yu, Daiki Ikami, Go Irie, Kiyoharu Aizawa
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
1 code implementation • 15 Jul 2020 • Zhisheng Zhong, Hiroaki Akutsu, Kiyoharu Aizawa
In this paper, we propose a channel-level variable quantization network to dynamically allocate more bitrates for significant channels and withdraw bitrates for negligible channels.
3 code implementations • 9 May 2020 • Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke Matsui, Koki Tsubota, Hikaru Ikuta
Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset.
no code implementations • 28 Mar 2020 • Mohammed Almatrafi, Raymond Baldwin, Kiyoharu Aizawa, Keigo Hirakawa
We propose DistSurf-OF, a novel optical flow method for neuromorphic cameras.
1 code implementation • ICCV 2019 • Qing Yu, Kiyoharu Aizawa
Unlike previous methods, we also utilize unlabeled data for unsupervised training and we use these unlabeled data to maximize the discrepancy between the decision boundaries of two classifiers to push OOD samples outside the manifold of the in-distribution (ID) samples, which enables us to detect OOD samples that are far from the support of the ID samples.
Out-of-Distribution Detection Out of Distribution (OOD) Detection
no code implementations • ICCV 2019 • Satoshi Kosugi, Toshihiko Yamasaki, Kiyoharu Aizawa
Weakly supervised object detection (WSOD), where a detector is trained with only image-level annotations, is attracting more and more attention.
no code implementations • 3 May 2019 • Masaya Kaneko, Ken Sakurada, Kiyoharu Aizawa
We propose a novel and efficient representation for single-view depth estimation using Convolutional Neural Networks (CNNs).
no code implementations • 18 Apr 2019 • Onkar Krishna, Kiyoharu Aizawa, Go Irie
Observers of different age groups have shown different scene-viewing tendencies, independent of the class of the image viewed.
no code implementations • 3 Mar 2019 • Masashi Anzawa, Sosuke Amano, Yoko Yamakata, Keiko Motonaga, Akiko Kamei, Kiyoharu Aizawa
We investigate image recognition of multiple food items in a single photo, focusing on a buffet restaurant application, where the menu changes at every meal and only a few images per class are available.
2 code implementations • 3 Sep 2018 • Junjun Jiang, Yi Yu, Suhua Tang, Jiayi Ma, Akiko Aizawa, Kiyoharu Aizawa
To this end, this study incorporates the contextual information of image patch and proposes a powerful and efficient context-patch based face hallucination approach, namely Thresholding Locality-constrained Representation and Reproducing learning (TLcR-RL).
no code implementations • 26 Aug 2018 • Kazuya Iwami, Satoshi Ikehata, Kiyoharu Aizawa
Camera geo-localization from a monocular video is a fundamental task for video analysis and autonomous navigation.
no code implementations • CVPR 2018 • Daiki Ikami, Toshihiko Yamasaki, Kiyoharu Aizawa
We propose a local optimization method, which is widely applicable to graph-based clustering cost functions.
no code implementations • CVPR 2018 • Daiki Ikami, Toshihiko Yamasaki, Kiyoharu Aizawa
M-estimator using iteratively reweighted least squares (IRLS) is one of the best-known methods for robust estimation.
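As a rough illustration of the IRLS idea mentioned above (not the method proposed in the paper), the sketch below estimates a robust 1-D location with Huber weights. The function names, the Huber threshold `delta`, and the toy data are assumptions made for this example only.

```python
def huber_weight(residual, delta):
    """IRLS weight for the Huber loss: 1 inside the threshold, delta/|r| outside."""
    a = abs(residual)
    return 1.0 if a <= delta else delta / a


def irls_location(xs, delta=1.0, iters=50, tol=1e-9):
    """Robust location estimate via iteratively reweighted least squares."""
    mu = sorted(xs)[len(xs) // 2]  # start from a median-like value
    for _ in range(iters):
        # Re-weight every sample using the residuals of the current estimate.
        ws = [huber_weight(x - mu, delta) for x in xs]
        # Solve the weighted least-squares problem (a weighted mean in 1-D).
        new_mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu


data = [1.0, 1.2, 0.9, 1.1, 0.8, 25.0]  # toy data with one gross outlier
estimate = irls_location(data)
mean = sum(data) / len(data)
```

On this toy data the plain mean is pulled far toward the outlier, while the reweighted estimate stays close to the bulk of the samples.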
no code implementations • 8 May 2018 • Yi Yu, Suhua Tang, Kiyoharu Aizawa, Akiko Aizawa
Given a photo as input, this model performs (i) exact venue search (find the venue where the photo was taken), and (ii) group venue search (find relevant venues with the same category as that of the photo), using the cross-modal correlation between the input photo and the textual descriptions of venues.
no code implementations • 8 Apr 2018 • Shota Horiguchi, Sosuke Amano, Makoto Ogawa, Kiyoharu Aizawa
In this paper, we address the personalization problem, which involves adapting to the user's domain incrementally using a very limited number of samples.
3 code implementations • CVPR 2018 • Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, Kiyoharu Aizawa
Can we detect common objects in a variety of image domains without instance-level annotations?
Ranked #5 on Weakly Supervised Object Detection on Watercolor2k (using extra training data)
1 code implementation • 30 Mar 2018 • Akito Takeki, Daiki Ikami, Go Irie, Kiyoharu Aizawa
Convolutional neural network (CNN) architectures utilize downsampling layers, which restrict the subsequent layers to learn spatially invariant features while reducing computational costs.
1 code implementation • CVPR 2018 • Daiki Tanaka, Daiki Ikami, Toshihiko Yamasaki, Kiyoharu Aizawa
Deep neural networks (DNNs) trained on large-scale datasets have exhibited significant performance in image classification.
Ranked #39 on Image Classification on Clothing1M
5 code implementations • 23 Mar 2018 • Toru Ogawa, Atsushi Otsubo, Rei Narita, Yusuke Matsui, Toshihiko Yamasaki, Kiyoharu Aizawa
We annotated an existing image dataset of comics and created the largest annotation dataset, named Manga109-annotations.
no code implementations • 29 Dec 2017 • Shota Horiguchi, Daiki Ikami, Kiyoharu Aizawa
However, in these DML studies, there were no equitable comparisons between features extracted from a DML-based network and those from a softmax-based network.
1 code implementation • 12 Sep 2017 • Yusuke Matsui, Keisuke Ogaki, Toshihiko Yamasaki, Kiyoharu Aizawa
Data clustering is a fundamental operation in data analysis.
no code implementations • CVPR 2017 • Ionut Cosmin Duta, Bogdan Ionescu, Kiyoharu Aizawa, Nicu Sebe
The proposed method addresses an important problem of video understanding: how to build a video representation that incorporates the CNN features over the entire video.
Action Recognition In Videos Temporal Action Localization +1
1 code implementation • 21 Jun 2017 • Paulina Hensman, Kiyoharu Aizawa
The final results are sharp, clear, and in high resolution, and stay true to the character's original color scheme.
no code implementations • CVPR 2017 • Daiki Ikami, Toshihiko Yamasaki, Kiyoharu Aizawa
We propose the residual expansion (RE) algorithm: a global (or near-global) optimization method for nonconvex least squares problems.
no code implementations • 20 May 2017 • Onkar Krishna, Kiyoharu Aizawa, Andrea Helo, Rama Pia
In this paper, we investigate how visual scene processing changes with age and propose an age-adapted framework that helps to develop a computational model that can predict saliency across different age groups.
no code implementations • 21 Apr 2017 • Yusuke Matsui, Toshihiko Yamasaki, Kiyoharu Aizawa
In this paper, we propose a product quantization table (PQTable), a fast search method for product-quantized codes via hash tables.
no code implementations • CVPR 2016 • Keisuke Midorikawa, Toshihiko Yamasaki, Kiyoharu Aizawa
We propose a model that represents various isotropic reflectance functions by using the principal components of items in a dataset, and formulate the uncalibrated photometric stereo as a regression problem.
no code implementations • ICCV 2015 • Yusuke Matsui, Toshihiko Yamasaki, Kiyoharu Aizawa
We propose the product quantization table (PQTable), a product quantization-based hash table that is fast and requires neither parameter tuning nor training steps.
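The general idea of using a product-quantized code as a hash-table key can be sketched as follows. This is only a minimal illustration under assumed, hand-written codebooks and toy vectors; it does not reproduce the PQTable algorithm itself, which presumably has to handle cases this sketch ignores (for example, a query whose bucket is empty).

```python
from collections import defaultdict


def nearest(codewords, sub):
    """Index of the codeword closest (squared L2) to the sub-vector."""
    return min(
        range(len(codewords)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codewords[k], sub)),
    )


def pq_encode(vec, codebooks):
    """Split vec into len(codebooks) equal parts and quantize each part."""
    d = len(vec) // len(codebooks)
    return tuple(
        nearest(cb, vec[i * d:(i + 1) * d]) for i, cb in enumerate(codebooks)
    )


# Hypothetical codebooks: two sub-spaces, two 2-D codewords each.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]

# Toy database of 4-D vectors.
database = {
    "a": (0.1, 0.0, 0.1, 0.9),
    "b": (0.9, 1.1, 1.0, 0.1),
    "c": (0.0, 0.1, 0.9, 0.0),
}

# Build the table: the PQ code (a tuple of codeword indices) is the key.
table = defaultdict(list)
for name, vec in database.items():
    table[pq_encode(vec, codebooks)].append(name)

# Query: encode, then fetch the matching bucket directly.
query = (0.95, 0.9, 0.8, 0.2)
hits = table.get(pq_encode(query, codebooks), [])
```

Here the lookup cost does not depend on the database size, which is the appeal of hashing the codes rather than scanning them linearly.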
no code implementations • 15 Oct 2015 • Yusuke Matsui, Kota Ito, Yuji Aramaki, Toshihiko Yamasaki, Kiyoharu Aizawa
From the experiments, we verified that: (1) the retrieval accuracy of the proposed method is higher than those of previous methods; (2) the proposed method can localize an object instance with reasonable runtime and accuracy; and (3) sketch querying is useful for manga search.
no code implementations • CVPR 2014 • Satoshi Ikehata, Kiyoharu Aizawa
This paper presents a photometric stereo method that is purely pixelwise and handles general isotropic surfaces in a stable manner.