no code implementations • 4 Apr 2024 • Tianwei Chen, Yusuke Hirota, Mayu Otani, Noa Garcia, Yuta Nakashima
We investigate the impact of deep generative models on potential social biases in upcoming computer vision models.
no code implementations • 27 Mar 2024 • Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui, Mayu Otani, Hideki Nakayama
Finding a suitable layout is a crucial task in diverse graphic design applications.
no code implementations • 8 Aug 2023 • Qianru Qiu, Xueting Wang, Mayu Otani
Additionally, it is applicable to another color recommendation task, full palette generation, which produces a complete color palette corresponding to a given text.
no code implementations • CVPR 2023 • Mayu Otani, Riku Togashi, Yu Sawai, Ryosuke Ishigami, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh
Human evaluation is critical for validating the performance of text-to-image generative models, as this highly cognitive process requires deep comprehension of text and images.
1 code implementation • CVPR 2023 • Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi
Creative workflows for generating graphical documents involve complex inter-related tasks, such as aligning elements, choosing appropriate fonts, or employing aesthetically harmonious colors.
1 code implementation • CVPR 2023 • Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi
Controllable layout generation aims at synthesizing a plausible arrangement of element bounding boxes under optional constraints, such as the type or position of a specific element.
1 code implementation • 22 Dec 2022 • Kotaro Kikuchi, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi
The web page colorization problem is then formalized as the task of estimating plausible color styles for given web page content and the hierarchical structure of its elements.
1 code implementation • 18 Nov 2022 • Zongshang Pang, Yuta Nakashima, Mayu Otani, Hajime Nagahara
Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing.
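As a toy illustration of this subset-selection formulation (the per-frame scores and the length budget below are hypothetical, not any particular method from the paper):

```python
# Toy subset selection: keep the highest-scoring frames under a
# summary-length budget, then restore temporal order.
def summarize(frame_scores, budget_ratio=0.15):
    k = max(1, int(len(frame_scores) * budget_ratio))
    keep = sorted(range(len(frame_scores)),
                  key=lambda i: frame_scores[i], reverse=True)[:k]
    return sorted(keep)  # indices of the frames forming the summary

print(summarize([0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4], budget_ratio=0.3))  # -> [1, 3]
```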
no code implementations • 21 Oct 2022 • Mayu Otani, Yale Song, Yang Wang
With the rapid growth of video capture devices and web applications, there is increasing demand to deliver desired video content to users efficiently.
no code implementations • 22 Sep 2022 • Qianru Qiu, Xueting Wang, Mayu Otani, Yuki Iwazaki
We train the model and build a color recommendation system on a large-scale dataset of vector graphic documents.
no code implementations • 23 Aug 2022 • Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara
Is more data always better to train vision-and-language models?
no code implementations • CVPR 2022 • Yutaro Yamada, Mayu Otani
For object detection and semantic segmentation, we find that a vanilla Swin Transformer, a variant of the Vision Transformer tailored for dense prediction tasks, transfers robustness better than Convolutional Neural Networks trained to be robust to the corrupted version of ImageNet.
no code implementations • CVPR 2022 • Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Tetsuya Sakai
First, it is rank-insensitive: It ignores the rank positions of successfully localised moments in the top-$K$ ranked list by treating the list as a set.
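A toy sketch of what rank insensitivity means in practice: Recall@K scores the two hypothetical rankings below identically, while a rank-aware metric such as reciprocal rank separates them (the moment IDs are made up for illustration).

```python
# Toy illustration: Recall@K treats the top-K list as a set, so both
# rankings below score 1.0 even though the correct moment is at rank 1
# in one and rank 5 in the other; a rank-aware metric separates them.
def recall_at_k(ranked, relevant, k=5):
    return len(set(ranked[:k]) & relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    return next((1.0 / (i + 1) for i, m in enumerate(ranked) if m in relevant), 0.0)

relevant = {"moment_A"}                            # hypothetical ground truth
run1 = ["moment_A", "m2", "m3", "m4", "m5"]        # hit at rank 1
run2 = ["m2", "m3", "m4", "m5", "moment_A"]        # hit at rank 5

assert recall_at_k(run1, relevant) == recall_at_k(run2, relevant) == 1.0
print(reciprocal_rank(run1), reciprocal_rank(run2))  # 1.0 vs 0.2
```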
1 code implementation • CVPR 2022 • Mayu Otani, Riku Togashi, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh
OC-cost computes the cost of correcting detections to ground truths as a measure of accuracy.
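A rough sketch of the idea behind a correction-cost metric, not the paper's OC-cost implementation: detections are matched one-to-one to ground truths by an optimal assignment, matched pairs pay a localisation cost, and unmatched boxes pay a fixed penalty (the 1 - IoU cost term and the `miss_cost` value are assumptions).

```python
# Rough sketch of a correction-cost style detection metric.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def correction_cost(dets, gts, miss_cost=1.0):
    """Total cost of correcting detections into ground truths: matched
    pairs pay a localisation cost (1 - IoU); every unmatched detection
    or ground truth pays a fixed miss_cost."""
    if not dets or not gts:
        return miss_cost * (len(dets) + len(gts))
    C = np.array([[1.0 - iou(d, g) for g in gts] for d in dets])
    rows, cols = linear_sum_assignment(C)   # optimal one-to-one matching
    unmatched = (len(dets) - len(rows)) + (len(gts) - len(cols))
    return C[rows, cols].sum() + miss_cost * unmatched
```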
no code implementations • 26 Oct 2021 • Tianran Wu, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Haruo Takemura
Video question answering (VideoQA) aims to answer a given question based on a relevant video clip.
1 code implementation • 2 Aug 2021 • Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi
We optimize using the latent space of an off-the-shelf layout generation model, allowing our approach to be complementary to and used with existing layout generation models.
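A minimal sketch of this latent-space optimization pattern, assuming a frozen pretrained `generator` and a differentiable `constraint_loss` (both hypothetical stand-ins; the optimizer choice and step count are illustrative):

```python
# Sketch of constraint-guided optimisation in a generator's latent space.
import torch

def optimise_layout(generator, constraint_loss, latent_dim=64, steps=200, lr=0.01):
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)   # only z is updated; the generator stays frozen
    for _ in range(steps):
        layout = generator(z)            # decode latent -> element bounding boxes
        loss = constraint_loss(layout)   # penalise violated constraints
        opt.zero_grad()
        loss.backward()                  # gradients flow through the generator to z
        opt.step()
    return generator(z).detach()
```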
no code implementations • ACL 2021 • Jules Samaran, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
The impressive performance of pre-trained visually grounded language models has motivated a growing body of research investigating what has been learned during pre-training.
no code implementations • 25 Jun 2021 • Yusuke Hirota, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Ittetsu Taniguchi, Takao Onoye
This paper delves into the effectiveness of textual representations for image understanding in the specific context of VQA.
no code implementations • 11 May 2021 • Riku Togashi, Masahiro Kato, Mayu Otani, Tetsuya Sakai, Shin'ichi Satoh
However, such methods have two main drawbacks, particularly in large-scale applications: (1) the pairwise approach is severely inefficient due to its quadratic computational cost; and (2) even recent model-based samplers (e.g., IRGAN) cannot achieve practical efficiency because they require training an extra model.
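To make drawback (1) concrete: a BPR-style pairwise objective sums over every (positive, negative) pair, i.e. O(|P|·|N|) terms per user, whereas negative sampling draws one negative per positive. A toy sketch with hypothetical scores:

```python
# Toy contrast of the exact pairwise objective (O(|P| * |N|) terms) with
# one-sample negative sampling; all scores are hypothetical model outputs.
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_full(pos_scores, neg_scores):
    """Exact pairwise loss over every (positive, negative) pair."""
    return sum(-math.log(sigmoid(p - n))
               for p in pos_scores for n in neg_scores) / (len(pos_scores) * len(neg_scores))

def bpr_sampled(pos_scores, neg_scores):
    """Cheap estimate: one uniformly sampled negative per positive."""
    return sum(-math.log(sigmoid(p - random.choice(neg_scores)))
               for p in pos_scores) / len(pos_scores)
```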
no code implementations • 19 Jan 2021 • Riku Togashi, Masahiro Kato, Mayu Otani, Shin'ichi Satoh
Learning from implicit user feedback is challenging as we can only observe positive samples but never access negative ones.
2 code implementations • 10 Nov 2020 • Riku Togashi, Mayu Otani, Shin'ichi Satoh
Solving cold-start problems is indispensable for providing meaningful recommendations to new users and items.
1 code implementation • 1 Sep 2020 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä
In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.
1 code implementation • 28 Aug 2020 • Noa Garcia, Chentao Ye, Zihua Liu, Qingtao Hu, Mayu Otani, Chenhui Chu, Yuta Nakashima, Teruko Mitamura
Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions.
no code implementations • 17 Apr 2020 • Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
We propose a novel video understanding task by fusing knowledge-based and video question answering.
no code implementations • 23 Oct 2019 • Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
We propose a novel video understanding task by fusing knowledge-based and video question answering.
2 code implementations • CVPR 2019 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä
Video summarization is a technique for creating a short skim of the original video while preserving its main story and content.
1 code implementation • COLING 2018 • Chenhui Chu, Mayu Otani, Yuta Nakashima
These extracted VGPs have the potential to improve language and image multimodal tasks such as visual question answering and image captioning.
2 code implementations • 28 Sep 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
To this end, we design a deep neural network that maps both videos and descriptions to a common semantic space and jointly train it on associated pairs of videos and descriptions.
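A minimal sketch of such a two-branch joint embedding, assuming precomputed video and text features; the dimensions and the triplet-style ranking loss are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal two-branch joint embedding sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, video_dim=2048, text_dim=768, embed_dim=512):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)

    def forward(self, video_feat, text_feat):
        v = F.normalize(self.video_proj(video_feat), dim=-1)
        t = F.normalize(self.text_proj(text_feat), dim=-1)
        return v, t

def ranking_loss(v, t, margin=0.2):
    """Push each matched video/description pair above mismatched ones."""
    sim = v @ t.T                                  # batch x batch similarities
    pos = sim.diag().unsqueeze(1)                  # matched pairs on the diagonal
    cost = (margin + sim - pos).clamp(min=0)       # hinge over mismatched pairs
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return cost.masked_fill(eye, 0.0).mean()       # drop positive-positive terms
```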
no code implementations • 8 Aug 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
In description generation, the performance level is comparable to the current state-of-the-art, although our embeddings were trained for the retrieval tasks.