Search Results for author: Takayuki Okatani

Found 61 papers, 20 papers with code

Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models

no code implementations 18 Jan 2024 Li Sun, Liuan Wang, Jun Sun, Takayuki Okatani

This study introduces an innovative method to address event-level hallucinations in MLLMs, focusing on specific temporal understanding in video content.

Hallucination

SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers

1 code implementation 7 Nov 2023 Xiangyong Lu, Masanori Suganuma, Takayuki Okatani

For the first time, it achieves an ImageNet-1K top-1 accuracy of around 80% at a speed of 1.0 frame/sec on the SBC.

Image Classification

Visual Abductive Reasoning Meets Driving Hazard Prediction

1 code implementation 7 Oct 2023 Korawat Charoenpitaks, Van-Quang Nguyen, Masanori Suganuma, Masahiro Takahashi, Ryoma Niihara, Takayuki Okatani

To enable research in this understudied area, a new dataset named the DHPR (Driving Hazard Prediction and Reasoning) dataset is created.

Anomaly Detection Visual Abductive Reasoning

That's BAD: Blind Anomaly Detection by Implicit Local Feature Clustering

no code implementations 6 Jul 2023 Jie Zhang, Masanori Suganuma, Takayuki Okatani

Previous methods consider an unsupervised setting, specifically the one-class setting, which assumes the availability of a set of normal (i.e., anomaly-free) images for training.

Anomaly Detection Clustering +1

Reference-based Motion Blur Removal: Learning to Utilize Sharpness in the Reference Image

no code implementations 6 Jul 2023 Han Zou, Masanori Suganuma, Takayuki Okatani

We can use another shot of the same scene, as in video deblurring, or even an image of a different scene.

Deblurring Image Deblurring

RefVSR++: Exploiting Reference Inputs for Reference-based Video Super-resolution

no code implementations 6 Jul 2023 Han Zou, Masanori Suganuma, Takayuki Okatani

Then, we propose an improved method, RefVSR++, which aggregates two feature streams in parallel along the temporal direction: one aggregating the fused LR and Ref inputs, and the other aggregating the Ref inputs over time (see the sketch after this entry).

Reference-based Video Super-Resolution Video Super-Resolution
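
A minimal sketch of the parallel temporal aggregation idea from the RefVSR++ entry above, assuming a generic recurrent formulation; the module and variable names here are illustrative, not the authors' implementation.

    # Illustrative only: a generic two-stream recurrent aggregator, not the
    # authors' RefVSR++ architecture.
    import torch
    import torch.nn as nn

    class ParallelTemporalAggregator(nn.Module):
        def __init__(self, channels: int = 64):
            super().__init__()
            self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)          # fuse LR and Ref
            self.update_fused = nn.Conv2d(2 * channels, channels, 3, padding=1)  # stream 1 update
            self.update_ref = nn.Conv2d(2 * channels, channels, 3, padding=1)    # stream 2 update

        def forward(self, lr_feats, ref_feats):
            # lr_feats, ref_feats: lists of per-frame feature maps, each (B, C, H, W)
            h_fused = torch.zeros_like(lr_feats[0])
            h_ref = torch.zeros_like(ref_feats[0])
            outputs = []
            for lr, ref in zip(lr_feats, ref_feats):
                fused = torch.relu(self.fuse(torch.cat([lr, ref], dim=1)))
                # the two hidden states are propagated in parallel along time
                h_fused = torch.relu(self.update_fused(torch.cat([h_fused, fused], dim=1)))
                h_ref = torch.relu(self.update_ref(torch.cat([h_ref, ref], dim=1)))
                outputs.append(h_fused + h_ref)
            return outputs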

Contextual Affinity Distillation for Image Anomaly Detection

no code implementations 6 Jul 2023 Jie Zhang, Masanori Suganuma, Takayuki Okatani

The local student, which is used in previous studies, mainly focuses on structural anomaly detection, while the global student attends to logical anomalies.

Anomaly Detection Knowledge Distillation

Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering

no code implementations 18 Feb 2023 Tatsuro Yamane, Pang-jo Chun, Ji Dang, Takayuki Okatani

For this, a VQA model was developed that uses bridge images for dataset creation and outputs the damage type or member name and whether it is present, given the images and questions.

Question Answering Visual Question Answering

SuperGF: Unifying Local and Global Features for Visual Localization

no code implementations 23 Dec 2022 Wenzheng Song, Ran Yan, Boshu Lei, Takayuki Okatani

In this study, we present a novel method called SuperGF, which effectively unifies local and global features for visual localization, achieving a better trade-off between localization accuracy and computational efficiency.

Computational Efficiency Image Retrieval +4

GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features

2 code implementations 20 Jul 2022 Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani

Current state-of-the-art methods for image captioning employ region-based features, as they provide object-level information that is essential to describe the content of images; they are usually extracted by an object detector such as Faster R-CNN.

Image Captioning

Rectifying Open-set Object Detection: A Taxonomy, Practical Applications, and Proper Evaluation

no code implementations 20 Jul 2022 Yusuke Hosoya, Masanori Suganuma, Takayuki Okatani

In this paper, we first point out that the recent studies' formalization of OSOD, which generalizes open-set recognition (OSR) and thus considers an unlimited variety of unknown objects, has a fundamental issue.

Image Classification Object +3

Single-image Defocus Deblurring by Integration of Defocus Map Prediction Tracing the Inverse Problem Computation

no code implementations 7 Jul 2022 Qian Ye, Masanori Suganuma, Takayuki Okatani

Considering the spatially variant nature of defocus blur and the blur level indicated in the defocus map, we employ the defocus map as conditional guidance to adjust the features of the input blurred images, rather than simply concatenating it (see the sketch below).

Deblurring Image Deblurring +1
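
A minimal sketch of the conditional-guidance idea described above, under the assumption of a FiLM-style modulation; the block and parameter names are hypothetical, not taken from the paper: the defocus map predicts spatially varying scale and shift terms that adjust the image features, rather than being concatenated with them.

    # Hypothetical illustration of defocus-map-conditioned feature modulation;
    # not the paper's actual network.
    import torch
    import torch.nn as nn

    class DefocusConditionedBlock(nn.Module):
        def __init__(self, channels: int = 64):
            super().__init__()
            self.to_scale = nn.Conv2d(1, channels, 3, padding=1)  # from the 1-channel defocus map
            self.to_shift = nn.Conv2d(1, channels, 3, padding=1)
            self.body = nn.Conv2d(channels, channels, 3, padding=1)

        def forward(self, feats: torch.Tensor, defocus_map: torch.Tensor) -> torch.Tensor:
            # the blur level at each pixel controls how strongly the features are adjusted
            scale = torch.sigmoid(self.to_scale(defocus_map))
            shift = self.to_shift(defocus_map)
            return torch.relu(self.body(feats * scale + shift))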

Learning Regularized Multi-Scale Feature Flow for High Dynamic Range Imaging

no code implementations 6 Jul 2022 Qian Ye, Masanori Suganuma, Jun Xiao, Takayuki Okatani

Reconstructing ghosting-free high dynamic range (HDR) images of dynamic scenes from a set of multi-exposure images is a challenging task, especially with large object motion and occlusions, which lead to visible artifacts in existing methods.

High Dynamic Range Imaging

Rethinking Unsupervised Domain Adaptation for Semantic Segmentation

2 code implementations 30 Jun 2022 Zhijie Wang, Masanori Suganuma, Takayuki Okatani

Due to the high annotation cost of semantic segmentation, researchers have developed many UDA methods for this task, which assume that no labeled sample is available in the target domain.

Semantic Segmentation Unsupervised Domain Adaptation

Symmetry-aware Neural Architecture for Embodied Visual Navigation

no code implementations 17 Dec 2021 Shuang Liu, Takayuki Okatani

We then propose a network design for the actor and the critic to inherently attain these symmetries.

Reinforcement Learning (RL) Visual Navigation

Improved Few-shot Segmentation by Redefinition of the Roles of Multi-level CNN Features

no code implementations 14 Sep 2021 Zhijie Wang, Masanori Suganuma, Takayuki Okatani

This study is concerned with few-shot segmentation, i.e., segmenting the region of an unseen object class in a query image, given support image(s) of its instances.

Cross-Region Domain Adaptation for Class-level Alignment

no code implementations 14 Sep 2021 Zhijie Wang, Xing Liu, Masanori Suganuma, Takayuki Okatani

To cope with this, we propose a method that applies adversarial training to align two feature distributions in the target domain.

Semantic Segmentation Synthetic-to-Real Translation +1

Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes

no code implementations ICCV 2021 Wenzheng Song, Masanori Suganuma, Xing Liu, Noriyuki Shimobayashi, Daisuke Maruta, Takayuki Okatani

To examine whether and how well such information stored in RAW-format images can be utilized for image matching, we have created a new dataset named MID (Matching in the Dark).

Look Wide and Interpret Twice: Improving Performance on Interactive Instruction-following Tasks

1 code implementation 1 Jun 2021 Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani

It then integrates the prediction with the visual information and other inputs, yielding the final prediction of an action and an object.

Instruction Following

Pushing the Envelope of Thin Crack Detection

no code implementations 9 Jan 2021 Liang Xu, Taro Hatsutani, Xing Liu, Engkarat Techapanurak, Han Zou, Takayuki Okatani

We experimentally show that this makes it possible to detect cracks, with about the same accuracy, from an image at one-third the resolution of the images used for annotation.

Bridging In- and Out-of-distribution Samples for Their Better Discriminability

no code implementations 7 Jan 2021 Engkarat Techapanurak, Anh-Chuong Dang, Takayuki Okatani

We estimate where samples generated by a single image transformation lie between ID and OOD, using a network trained on clean ID samples.

Out of Distribution (OOD) Detection

How Can CNNs Use Image Position for Segmentation?

no code implementations 7 May 2020 Rito Murase, Masanori Suganuma, Takayuki Okatani

We draw a mixed conclusion from the experimental results: positional encoding certainly works in some cases, but absolute image position may not be as important for segmentation tasks as is often assumed.

Image Segmentation Medical Image Segmentation +4

Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs

1 code implementation ECCV 2020 Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani

It has been a primary concern in recent studies of vision and language tasks to design an effective attention mechanism dealing with interactions between the two modalities.

Visual Dialog

Analysis of Deep Networks for Monocular Depth Estimation Through Adversarial Attacks with Proposal of a Defense Method

no code implementations 20 Nov 2019 Junjie Hu, Takayuki Okatani

However, the prediction of saliency maps is itself vulnerable to the attacks, even though it is not the direct target of the attacks.

Monocular Depth Estimation

Analysis and a Solution of Momentarily Missed Detection for Anchor-based Object Detectors

no code implementations 21 Oct 2019 Yusuke Hosoya, Masanori Suganuma, Takayuki Okatani

The employment of convolutional neural networks has led to significant performance improvement on the task of object detection.

object-detection Object Detection

Restoring Images with Unknown Degradation Factors by Recurrent Use of a Multi-branch Network

1 code implementation 10 Jul 2019 Xing Liu, Masanori Suganuma, Xiyang Luo, Takayuki Okatani

The employment of convolutional neural networks has achieved unprecedented performance in the task of image restoration for a variety of degradation factors.

Deblurring JPEG Artifact Removal +1

Evaluating Artificial Systems for Pairwise Ranking Tasks Sensitive to Individual Differences

no code implementations 30 May 2019 Xing Liu, Takayuki Okatani

There is another type of task in which the prediction target is human perception itself, where individual differences are often present.

Hyperparameter-Free Out-of-Distribution Detection Using Softmax of Scaled Cosine Similarity

1 code implementation 25 May 2019 Engkarat Techapanurak, Masanori Suganuma, Takayuki Okatani

The ability to detect out-of-distribution (OOD) samples is vital to secure the reliability of deep neural networks in real-world applications.

Metric Learning Out-of-Distribution Detection
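
The title above states the core idea: class logits are scaled cosine similarities, with the scale learned rather than hand-tuned, and the softmax over them scores in-distribution samples. A minimal sketch under that reading; layer names and the specific scoring function are assumptions, not taken from the paper.

    # Sketch of a cosine-similarity classification head with a learned scale;
    # the max softmax probability then serves as an in-distribution score.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScaledCosineHead(nn.Module):
        def __init__(self, feat_dim: int, num_classes: int):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
            self.scale = nn.Parameter(torch.tensor(10.0))  # learned, so no hand-tuned temperature

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            cos = F.linear(F.normalize(features, dim=1), F.normalize(self.weight, dim=1))
            return self.scale * cos  # logits = scaled cosine similarities

    def in_distribution_score(logits: torch.Tensor) -> torch.Tensor:
        # lower values suggest the input is more likely out-of-distribution
        return logits.softmax(dim=1).max(dim=1).values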

Visualization of Convolutional Neural Networks for Monocular Depth Estimation

1 code implementation ICCV 2019 Junjie Hu, Yan Zhang, Takayuki Okatani

We formulate it as an optimization problem of identifying the smallest number of image pixels from which the CNN can estimate a depth map with the minimum difference from the estimate from the entire image.

Interpretable Machine Learning
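
The stated formulation can be written compactly. The notation below is one plausible reading, not taken from the paper: M is a binary pixel mask, f the trained depth network, I the input image, and epsilon a tolerance on the deviation from the full-image estimate.

    \min_{M \in \{0,1\}^{H \times W}} \; \|M\|_0
    \quad \text{subject to} \quad
    \big\| f(I \odot M) - f(I) \big\| \le \epsilon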

Toward Explainable Fashion Recommendation

no code implementations 15 Jan 2019 Pongsate Tangseng, Takayuki Okatani

For this purpose, we propose a method for quantifying how influential each feature of each item is to the score.

Multi-task Learning of Hierarchical Vision-Language Representation

no code implementations CVPR 2019 Duy-Kien Nguyen, Takayuki Okatani

The representation is hierarchical, and prediction for each task is computed from the representation at its corresponding level of the hierarchy.

Multi-Task Learning Question Answering +3

Feature Quantization for Defending Against Distortion of Images

no code implementations CVPR 2018 Zhun Sun, Mete Ozay, Yan Zhang, Xing Liu, Takayuki Okatani

In this work, we address the problem of improving robustness of convolutional neural networks (CNNs) to image distortion.

Quantization

Recommending Outfits from Personal Closet

no code implementations 26 Apr 2018 Pongsate Tangseng, Kota Yamaguchi, Takayuki Okatani

We consider grading a fashion outfit for recommendation, where we assume that users have a closet of items and we aim at producing a score for an arbitrary combination of items in the closet.

General Classification

Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering

1 code implementation CVPR 2018 Duy-Kien Nguyen, Takayuki Okatani

A key solution to visual question answering (VQA) exists in how to fuse visual and language features extracted from an input image and question.

Visual Question Answering
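
A minimal sketch of symmetric co-attention fusion between visual and language features, in the spirit of the entry above; this is a generic formulation, and the paper's dense co-attention layers may differ in detail.

    # Generic symmetric co-attention: each modality attends to the other
    # through a shared affinity matrix. Illustrative, not the paper's exact design.
    import torch
    import torch.nn as nn

    class SymmetricCoAttention(nn.Module):
        def __init__(self, dim: int = 512):
            super().__init__()
            self.affinity = nn.Linear(dim, dim, bias=False)

        def forward(self, vis: torch.Tensor, lang: torch.Tensor):
            # vis: (B, Nv, D) image-region features; lang: (B, Nl, D) word features
            A = torch.bmm(self.affinity(vis), lang.transpose(1, 2))           # (B, Nv, Nl)
            vis_attended = torch.bmm(A.softmax(dim=2), lang)                  # words -> regions
            lang_attended = torch.bmm(A.softmax(dim=1).transpose(1, 2), vis)  # regions -> words
            return vis + vis_attended, lang + lang_attended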

Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries

4 code implementations 23 Mar 2018 Junjie Hu, Mete Ozay, Yan Zhang, Takayuki Okatani

Experimental results show that these two improvements enable higher accuracy than the current state of the art, achieved through finer-resolution reconstruction, for example, of small objects and object boundaries.

Monocular Depth Estimation

Exploiting the Potential of Standard Convolutional Autoencoders for Image Restoration by Evolutionary Search

1 code implementation ICML 2018 Masanori Suganuma, Mete Ozay, Takayuki Okatani

Researchers have applied deep neural networks to image restoration tasks, in which they proposed various network architectures, loss functions, and training methods.

Image Restoration

Deep Structured Energy-Based Image Inpainting

1 code implementation 24 Jan 2018 Fazil Altinel, Mete Ozay, Takayuki Okatani

In this paper, we propose a structured image inpainting method employing an energy-based model.

Image Inpainting Structured Prediction

A vision based system for underwater docking

no code implementations 12 Dec 2017 Shuang Liu, Mete Ozay, Takayuki Okatani, Hongli Xu, Kai Sun, Yang Lin

In the experiments, we first evaluate the performance of the proposed detection module on UDID and its deformed variations.

Pose Estimation Position

HyperNetworks with statistical filtering for defending adversarial examples

no code implementations 6 Nov 2017 Zhun Sun, Mete Ozay, Takayuki Okatani

This problem was addressed by employing several defense methods for detection and rejection of particular types of attacks.

General Classification Image Classification

End-to-end learning potentials for structured attribute prediction

no code implementations 6 Aug 2017 Kota Yamaguchi, Takayuki Okatani, Takayuki Umeda, Kazuhiko Murasaki, Kyoko Sudo

We present a structured inference approach in deep neural networks for multiple attribute prediction.

Attribute

Linear Discriminant Generative Adversarial Networks

no code implementations 25 Jul 2017 Zhun Sun, Mete Ozay, Takayuki Okatani

We develop a novel method for training of GANs for unsupervised and class conditional generation of images, called Linear Discriminant GAN (LD-GAN).

Improving Robustness of Feature Representations to Image Deformations using Powered Convolution in CNNs

no code implementations 25 Jul 2017 Zhun Sun, Mete Ozay, Takayuki Okatani

In this work, we address the problem of improving the robustness of feature representations learned using convolutional neural networks (CNNs) to image deformations.

object-detection Object Detection +1

Information Potential Auto-Encoders

no code implementations 14 Jun 2017 Yan Zhang, Mete Ozay, Zhun Sun, Takayuki Okatani

In order to estimate the entropy of the encoding variables and the mutual information, we propose a non-parametric method.

Representation Learning

Truncating Wide Networks using Binary Tree Architectures

1 code implementation ICCV 2017 Yan Zhang, Mete Ozay, Shuo-Hao Li, Takayuki Okatani

By employing the proposed architecture on a baseline wide network, we can construct and train a new network with the same depth but a considerably smaller number of parameters.

General Classification Image Classification

Optimization on Product Submanifolds of Convolution Kernels

no code implementations 22 Jan 2017 Mete Ozay, Takayuki Okatani

The results show that the geometry-adaptive step-size computation methods of G-SGD can improve the training loss and convergence properties of CNNs.

Optimization on Submanifolds of Convolution Kernels in CNNs

no code implementations 22 Oct 2016 Mete Ozay, Takayuki Okatani

Following our theoretical results, we propose an SGD algorithm that is guaranteed to converge almost surely to a solution at a single minimum of the classification loss of CNNs.

General Classification Image Classification

Automatic Attribute Discovery with Neural Activations

1 code implementation 25 Jul 2016 Sirion Vittayakorn, Takayuki Umeda, Kazuhiko Murasaki, Kyoko Sudo, Takayuki Okatani, Kota Yamaguchi

This paper proposes an automatic approach to discover and analyze visual attributes from a noisy collection of image-text data on the Web.

Attribute

Design of Kernels in Convolutional Neural Networks for Image Classification

1 code implementation 30 Nov 2015 Zhun Sun, Mete Ozay, Takayuki Okatani

Despite the effectiveness of Convolutional Neural Networks (CNNs) for image classification, our understanding of the relationship between shape of convolution kernels and learned representations is limited.

Classification General Classification +1

Integrating Deep Features for Material Recognition

no code implementations 20 Nov 2015 Yan Zhang, Mete Ozay, Xing Liu, Takayuki Okatani

We propose a method for material recognition that integrates features extracted from deep representations of Convolutional Neural Networks (CNNs), each of which is learned on a different image dataset of objects and materials.

feature selection Material Recognition +1

Transformation of Markov Random Fields for Marginal Distribution Estimation

no code implementations CVPR 2015 Masaki Saito, Takayuki Okatani

Although downsizing MRFs should directly reduce the computational cost, there is no systematic way of doing this, since it is unclear how to obtain the MRF energy for the downsized MRFs and also how to translate the estimates of their marginal distributions to those of the original MRFs.

Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera

no code implementations CVPR 2013 Ken Sakurada, Takayuki Okatani, Koichiro Deguchi

The proposed method is compared with the methods that use multi-view stereo (MVS) to reconstruct the scene structures of the two time points and then differentiate them to detect changes.

Change Detection
