no code implementations • 4 Apr 2024 • Tianwei Chen, Yusuke Hirota, Mayu Otani, Noa Garcia, Yuta Nakashima
We investigate the impact of deep generative models on potential social biases in upcoming computer vision models.
no code implementations • 27 Mar 2024 • Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui, Mayu Otani, Hideki Nakayama
Finding a suitable layout is a crucial task in diverse graphic design applications.
no code implementations • 8 Aug 2023 • Qianru Qiu, Xueting Wang, Mayu Otani
Additionally, it is applicable to another color recommendation task, full palette generation, which produces a complete color palette corresponding to a given text.
no code implementations • CVPR 2023 • Mayu Otani, Riku Togashi, Yu Sawai, Ryosuke Ishigami, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh
Human evaluation is critical for validating the performance of text-to-image generative models, as this highly cognitive process requires deep comprehension of text and images.
1 code implementation • CVPR 2023 • Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi
Creative workflows for generating graphical documents involve complex inter-related tasks, such as aligning elements, choosing appropriate fonts, or employing aesthetically harmonious colors.
1 code implementation • CVPR 2023 • Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi
Controllable layout generation aims at synthesizing a plausible arrangement of element bounding boxes under optional constraints, such as the type or position of a specific element.
1 code implementation • 22 Dec 2022 • Kotaro Kikuchi, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi
The web page colorization problem is then formalized as the task of estimating plausible color styles for given web page content and the hierarchical structure of its elements.
1 code implementation • 18 Nov 2022 • Zongshang Pang, Yuta Nakashima, Mayu Otani, Hajime Nagahara
Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing.
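As a toy illustration of this subset-selection formulation (the per-frame scores and the length budget below are hypothetical, not any particular method from the paper):

```python
# Toy subset selection: keep the highest-scoring frames under a
# summary-length budget, then restore temporal order.
def summarize(frame_scores, budget_ratio=0.15):
    k = max(1, int(len(frame_scores) * budget_ratio))
    keep = sorted(range(len(frame_scores)),
                  key=lambda i: frame_scores[i], reverse=True)[:k]
    return sorted(keep)  # indices of the frames forming the summary

print(summarize([0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4], budget_ratio=0.3))  # -> [1, 3]
```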
no code implementations • 21 Oct 2022 • Mayu Otani, Yale Song, Yang Wang
With the rapid growth of video capture devices and web applications, there is increasing demand to deliver desired video content to users efficiently.
no code implementations • 22 Sep 2022 • Qianru Qiu, Xueting Wang, Mayu Otani, Yuki Iwazaki
We train the model and build a color recommendation system on a large-scale dataset of vector graphic documents.
no code implementations • 23 Aug 2022 • Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara
Is more data always better to train vision-and-language models?
no code implementations • CVPR 2022 • Yutaro Yamada, Mayu Otani
For object detection and semantic segmentation, we find that a vanilla Swin Transformer, a variant of the Vision Transformer tailored for dense prediction tasks, transfers robustness better than Convolutional Neural Networks trained to be robust to the corrupted version of ImageNet.
no code implementations • CVPR 2022 • Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Tetsuya Sakai
First, it is rank-insensitive: It ignores the rank positions of successfully localised moments in the top-$K$ ranked list by treating the list as a set.
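A toy sketch of what rank insensitivity means in practice: Recall@K scores the two hypothetical rankings below identically, while a rank-aware metric such as reciprocal rank separates them (the moment IDs are made up for illustration).

```python
# Toy illustration: Recall@K treats the top-K list as a set, so both
# rankings below score 1.0 even though the correct moment is at rank 1
# in one and rank 5 in the other; a rank-aware metric separates them.
def recall_at_k(ranked, relevant, k=5):
    return len(set(ranked[:k]) & relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    return next((1.0 / (i + 1) for i, m in enumerate(ranked) if m in relevant), 0.0)

relevant = {"moment_A"}                            # hypothetical ground truth
run1 = ["moment_A", "m2", "m3", "m4", "m5"]        # hit at rank 1
run2 = ["m2", "m3", "m4", "m5", "moment_A"]        # hit at rank 5

assert recall_at_k(run1, relevant) == recall_at_k(run2, relevant) == 1.0
print(reciprocal_rank(run1), reciprocal_rank(run2))  # 1.0 vs 0.2
```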
1 code implementation • CVPR 2022 • Mayu Otani, Riku Togashi, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh
OC-cost computes the cost of correcting detections to ground truths as a measure of accuracy.
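A rough sketch of the idea behind a correction-cost metric, not the paper's OC-cost implementation: detections are matched one-to-one to ground truths by an optimal assignment, matched pairs pay a localisation cost, and unmatched boxes pay a fixed penalty (the 1 - IoU cost term and the `miss_cost` value are assumptions).

```python
# Rough sketch of a correction-cost style detection metric.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def correction_cost(dets, gts, miss_cost=1.0):
    """Total cost of correcting detections into ground truths: matched
    pairs pay a localisation cost (1 - IoU); every unmatched detection
    or ground truth pays a fixed miss_cost."""
    if not dets or not gts:
        return miss_cost * (len(dets) + len(gts))
    C = np.array([[1.0 - iou(d, g) for g in gts] for d in dets])
    rows, cols = linear_sum_assignment(C)   # optimal one-to-one matching
    unmatched = (len(dets) - len(rows)) + (len(gts) - len(cols))
    return C[rows, cols].sum() + miss_cost * unmatched
```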
no code implementations • 26 Oct 2021 • Tianran Wu, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Haruo Takemura
Video question answering (VideoQA) aims to answer a given question based on a relevant video clip.
1 code implementation • 2 Aug 2021 • Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi
We optimize using the latent space of an off-the-shelf layout generation model, allowing our approach to be complementary to and used with existing layout generation models.
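A minimal sketch of this latent-space optimization pattern, assuming a frozen pretrained `generator` and a differentiable `constraint_loss` (both hypothetical stand-ins; the optimizer choice and step count are illustrative):

```python
# Sketch of constraint-guided optimisation in a generator's latent space.
import torch

def optimise_layout(generator, constraint_loss, latent_dim=64, steps=200, lr=0.01):
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)   # only z is updated; the generator stays frozen
    for _ in range(steps):
        layout = generator(z)            # decode latent -> element bounding boxes
        loss = constraint_loss(layout)   # penalise violated constraints
        opt.zero_grad()
        loss.backward()                  # gradients flow through the generator to z
        opt.step()
    return generator(z).detach()
```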
no code implementations • ACL 2021 • Jules Samaran, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
The impressive performance of pre-trained visually grounded language models has motivated a growing body of research investigating what has been learned during pre-training.
no code implementations • 25 Jun 2021 • Yusuke Hirota, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Ittetsu Taniguchi, Takao Onoye
This paper delves into the effectiveness of textual representations for image understanding in the specific context of VQA.
no code implementations • 11 May 2021 • Riku Togashi, Masahiro Kato, Mayu Otani, Tetsuya Sakai, Shin'ichi Satoh
However, such methods have two main drawbacks, particularly in large-scale applications: (1) the pairwise approach is severely inefficient due to its quadratic computational cost; and (2) even recent model-based samplers (e.g., IRGAN) cannot achieve practical efficiency because they require training an extra model.
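To make drawback (1) concrete: a BPR-style pairwise objective sums over every (positive, negative) pair, i.e. O(|P|·|N|) terms per user, whereas negative sampling draws one negative per positive. A toy sketch with hypothetical scores:

```python
# Toy contrast of the exact pairwise objective (O(|P| * |N|) terms) with
# one-sample negative sampling; all scores are hypothetical model outputs.
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_full(pos_scores, neg_scores):
    """Exact pairwise loss over every (positive, negative) pair."""
    return sum(-math.log(sigmoid(p - n))
               for p in pos_scores for n in neg_scores) / (len(pos_scores) * len(neg_scores))

def bpr_sampled(pos_scores, neg_scores):
    """Cheap estimate: one uniformly sampled negative per positive."""
    return sum(-math.log(sigmoid(p - random.choice(neg_scores)))
               for p in pos_scores) / len(pos_scores)
```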
no code implementations • 19 Jan 2021 • Riku Togashi, Masahiro Kato, Mayu Otani, Shin'ichi Satoh
Learning from implicit user feedback is challenging as we can only observe positive samples but never access negative ones.
2 code implementations • 10 Nov 2020 • Riku Togashi, Mayu Otani, Shin'ichi Satoh
Solving cold-start problems is indispensable for providing meaningful recommendations to new users and items.
1 code implementation • 1 Sep 2020 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä
In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.
1 code implementation • 28 Aug 2020 • Noa Garcia, Chentao Ye, Zihua Liu, Qingtao Hu, Mayu Otani, Chenhui Chu, Yuta Nakashima, Teruko Mitamura
Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions.
no code implementations • 17 Apr 2020 • Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
We propose a novel video understanding task by fusing knowledge-based and video question answering.
no code implementations • 23 Oct 2019 • Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
We propose a novel video understanding task by fusing knowledge-based and video question answering.
2 code implementations • CVPR 2019 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä
Video summarization is a technique for creating a short skim of the original video while preserving its main story and content.
1 code implementation • COLING 2018 • Chenhui Chu, Mayu Otani, Yuta Nakashima
These extracted VGPs have the potential to improve language and image multimodal tasks such as visual question answering and image captioning.
2 code implementations • 28 Sep 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
To this end, we design a deep neural network that maps both videos and descriptions to a common semantic space and jointly train it on associated pairs of videos and descriptions.
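A minimal sketch of such a two-branch joint embedding, assuming precomputed video and text features; the dimensions and the triplet-style ranking loss are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal two-branch joint embedding sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, video_dim=2048, text_dim=768, embed_dim=512):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)

    def forward(self, video_feat, text_feat):
        v = F.normalize(self.video_proj(video_feat), dim=-1)
        t = F.normalize(self.text_proj(text_feat), dim=-1)
        return v, t

def ranking_loss(v, t, margin=0.2):
    """Push each matched video/description pair above mismatched ones."""
    sim = v @ t.T                                  # batch x batch similarities
    pos = sim.diag().unsqueeze(1)                  # matched pairs on the diagonal
    cost = (margin + sim - pos).clamp(min=0)       # hinge over mismatched pairs
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return cost.masked_fill(eye, 0.0).mean()       # drop positive-positive terms
```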
no code implementations • 8 Aug 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
In description generation, the performance level is comparable to the current state-of-the-art, although our embeddings were trained for the retrieval tasks.