Search Results for author: Yuta Nakashima

Found 54 papers, 26 papers with code

Would Deep Generative Models Amplify Bias in Future Models?

no code implementations • 4 Apr 2024 • Tianwei Chen, Yusuke Hirota, Mayu Otani, Noa Garcia, Yuta Nakashima

We investigate the impact of deep generative models on potential social biases in upcoming computer vision models.

Image Captioning · Image Generation

Stable Diffusion Exposed: Gender Bias from Prompt to Image

no code implementations • 5 Dec 2023 • Yankun Wu, Yuta Nakashima, Noa Garcia

Recent studies have highlighted biases in generative models, shedding light on their predisposition towards gender-based stereotypes and imbalances.

Instruct Me More! Random Prompting for Visual In-Context Learning

1 code implementation • 7 Nov 2023 • Jiahao Zhang, Bowen Wang, Liangzhi Li, Yuta Nakashima, Hajime Nagahara

Our findings suggest that InMeMo offers a versatile and efficient way to enhance the performance of visual ICL with lightweight training.

Foreground Segmentation · In-Context Learning +2

Learning Bottleneck Concepts in Image Classification

1 code implementation • CVPR 2023 • Bowen Wang, Liangzhi Li, Yuta Nakashima, Hajime Nagahara

Using some image classification tasks as our testbed, we demonstrate BotCL's potential to rebuild neural networks for better interpretability.

Classification · Image Classification

Model-Agnostic Gender Debiased Image Captioning

1 code implementation • CVPR 2023 • Yusuke Hirota, Yuta Nakashima, Noa Garcia

From this observation, we hypothesize that there are two types of gender bias affecting image captioning models: 1) bias that exploits context to predict gender, and 2) bias in the probability of generating certain (often stereotypical) words because of gender.

Image Captioning

Uncurated Image-Text Datasets: Shedding Light on Demographic Bias

1 code implementation • CVPR 2023 • Noa Garcia, Yusuke Hirota, Yankun Wu, Yuta Nakashima

The increasing tendency to collect large and uncurated datasets to train vision-and-language models has raised concerns about fair representations.

Image Captioning · Text-to-Image Generation

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

no code implementations • CVPR 2023 • Mayu Otani, Riku Togashi, Yu Sawai, Ryosuke Ishigami, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh

Human evaluation is critical for validating the performance of text-to-image generative models, as this highly cognitive process requires deep comprehension of text and images.

Text-to-Image Generation

Inference Time Evidences of Adversarial Attacks for Forensic on Transformers

no code implementations • 31 Jan 2023 • Hugo Lemarchant, Liangzi Li, Yiming Qian, Yuta Nakashima, Hajime Nagahara

Vision Transformers (ViTs) are becoming a very popular paradigm for vision tasks as they achieve state-of-the-art performance on image classification.

Image Classification

Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization

1 code implementation • 18 Nov 2022 • Zongshang Pang, Yuta Nakashima, Mayu Otani, Hajime Nagahara

Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing.

Image Classification · Representation Learning +1
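
The excerpt above defines summarization as selecting an informative subset of frames. As a loose illustration only, and not the paper's actual contrastive criteria, the following sketch scores precomputed frame features with a simple contrastive-style distinctiveness measure and keeps the top-scoring frames; the function name, feature dimensions, and scoring rule are all assumptions.

import numpy as np

def summarize(frame_features: np.ndarray, budget: int) -> np.ndarray:
    # Toy frame selection: score each frame by how little it resembles the
    # rest of the video (a contrastive-style distinctiveness criterion),
    # then keep the `budget` highest-scoring frames. Illustration only.
    feats = frame_features / np.linalg.norm(frame_features, axis=1, keepdims=True)
    sim = feats @ feats.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, 0.0)                 # ignore self-similarity
    distinctiveness = -sim.mean(axis=1)        # dissimilar-on-average frames score high
    return np.sort(np.argsort(distinctiveness)[-budget:])

# Random features stand in for per-frame CNN embeddings.
rng = np.random.default_rng(0)
print(summarize(rng.normal(size=(120, 512)), budget=12))  # selected frame indices, in temporal order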

Gender and Racial Bias in Visual Question Answering Datasets

no code implementations • 17 May 2022 • Yusuke Hirota, Yuta Nakashima, Noa Garcia

Our findings suggest that there are dangers associated with using VQA datasets without considering and dealing with the potentially harmful stereotypes.

Question Answering · Visual Question Answering

AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

no code implementations • CVPR 2022 • Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Tetsuya Sakai

First, it is rank-insensitive: It ignores the rank positions of successfully localised moments in the top-$K$ ranked list by treating the list as a set.

Moment Retrieval · Retrieval
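
To make the rank-insensitivity concrete, here is a minimal sketch of the conventional R@K measure with an IoU threshold, i.e. the kind of measure the paper critiques rather than AxIoU itself; the helper names are assumptions, but the definition (a query counts as a hit if any of its top-K moments overlaps the ground truth enough) is the standard one.

def temporal_iou(pred, gt):
    # IoU of two temporal segments given as (start, end) in seconds.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_k(ranked_moments, gt, k, threshold=0.5):
    # 1.0 if ANY of the top-K moments overlaps the ground truth enough,
    # regardless of where in the top-K it appears: the list is treated as a set.
    return float(any(temporal_iou(m, gt) >= threshold for m in ranked_moments[:k]))

# A hit at rank 1 and a hit at rank K score identically:
gt = (10.0, 20.0)
print(recall_at_k([(10.5, 19.5), (50.0, 60.0), (70.0, 80.0)], gt, k=3))  # 1.0
print(recall_at_k([(50.0, 60.0), (70.0, 80.0), (10.5, 19.5)], gt, k=3))  # 1.0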

Built Year Prediction from Buddha Face with Heterogeneous Labels

no code implementations • 2 Sep 2021 • Yiming Qian, Cheikh Brahim El Vaigh, Yuta Nakashima, Benjamin Renoust, Hajime Nagahara, Yutaka Fujioka

Buddha statues are a part of human culture, especially in Asia, and they have been alongside human civilisation for more than 2,000 years.

Cultural Vocal Bursts Intensity Prediction

Attending Self-Attention: A Case Study of Visually Grounded Supervision in Vision-and-Language Transformers

no code implementations • ACL 2021 • Jules Samaran, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima

The impressive performances of pre-trained visually grounded language models have motivated a growing body of research investigating what has been learned during the pre-training.

Language Modelling · Visual Grounding

A Picture May Be Worth a Hundred Words for Visual Question Answering

no code implementations • 25 Jun 2021 • Yusuke Hirota, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Ittetsu Taniguchi, Takao Onoye

This paper delves into the effectiveness of textual representations for image understanding in the specific context of VQA.

Data Augmentation · Descriptive +2

WRIME: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations

1 code implementation • NAACL 2021 • Tomoyuki Kajiwara, Chenhui Chu, Noriko Takemura, Yuta Nakashima, Hajime Nagahara

We annotate 17,000 SNS posts with both the writer's subjective emotional intensity and the reader's objective one to construct a Japanese emotion analysis dataset.

Emotion Recognition

Development of a Vertex Finding Algorithm using Recurrent Neural Network

no code implementations • 28 Jan 2021 • Kiichi Goto, Taikan Suehara, Tamaki Yoshioka, Masakazu Kurata, Hajime Nagahara, Yuta Nakashima, Noriko Takemura, Masako Iwasaki

Deep learning is a rapidly evolving technology with the potential to significantly improve the physics reach of collider experiments.

Understanding the Role of Scene Graphs in Visual Question Answering

no code implementations • 14 Jan 2021 • Vinay Damodaran, Sharanya Chakravarthy, Akshay Kumar, Anjana Umapathy, Teruko Mitamura, Yuta Nakashima, Noa Garcia, Chenhui Chu

Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search.

Graph Generation · Question Answering +2

Match Them Up: Visually Explainable Few-shot Image Classification

1 code implementation • 25 Nov 2020 • Bowen Wang, Liangzhi Li, Manisha Verma, Yuta Nakashima, Ryo Kawasaki, Hajime Nagahara

Few-shot learning (FSL) approaches are usually based on an assumption that the pre-trained knowledge can be obtained from base (seen) categories and can be well transferred to novel (unseen) categories.

Classification · Few-Shot Image Classification +2

Demographic Influences on Contemporary Art with Unsupervised Style Embeddings

no code implementations • 30 Sep 2020 • Nikolai Huckle, Noa Garcia, Yuta Nakashima

Art produced today, on the other hand, is numerous and easily accessible, through the internet and social networks that are used by professional and amateur artists alike to display their work.

Art Analysis

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

1 code implementation • 1 Sep 2020 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä

In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.

Moment Retrieval · Retrieval +2

Depthwise Spatio-Temporal STFT Convolutional Neural Networks for Human Action Recognition

no code implementations • 22 Jul 2020 • Sudhakar Kumawat, Manisha Verma, Yuta Nakashima, Shanmuganathan Raman

To address these issues, we propose spatio-temporal short term Fourier transform (STFT) blocks, a new class of convolutional blocks that can serve as an alternative to the 3D convolutional layer and its variants in 3D CNNs.

Action Recognition · Temporal Action Localization
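
As a rough sketch of the general idea of swapping a full 3D convolution for a cheaper depthwise factorization, assuming PyTorch and deliberately omitting the paper's actual STFT filter construction, a block could look like the following; the class name, kernel size, and layer choices are illustrative assumptions.

import torch
import torch.nn as nn

class DepthwiseSpatioTemporalBlock(nn.Module):
    # A lightweight stand-in for a full 3D convolution: a depthwise 3D conv
    # (one spatio-temporal filter per channel) followed by a pointwise 1x1x1
    # conv that mixes channels. This mirrors the generic depthwise-separable
    # idea, NOT the paper's specific STFT filters.
    def __init__(self, in_ch: int, out_ch: int, kernel=(3, 3, 3)):
        super().__init__()
        pad = tuple(k // 2 for k in kernel)
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel, padding=pad, groups=in_ch, bias=False)
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm3d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (batch, channels, time, height, width)
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

block = DepthwiseSpatioTemporalBlock(64, 128)
print(block(torch.randn(2, 64, 8, 56, 56)).shape)  # torch.Size([2, 128, 8, 56, 56])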

Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

1 code implementation • ECCV 2020 • Noa Garcia, Yuta Nakashima

To understand movies, humans constantly reason over the dialogues and actions shown in specific scenes and relate them to the overall storyline already seen.

Question Answering · Video Question Answering +1

Constructing a Public Meeting Corpus

no code implementations • LREC 2020 • Koji Tanaka, Chenhui Chu, Haolin Ren, Benjamin Renoust, Yuta Nakashima, Noriko Takemura, Hajime Nagahara, Takao Fujikawa

In this paper, we propose a full pipeline for the analysis of a large corpus covering a century of public meetings in historical Australian newspapers, from construction to visual exploration.

Optical Character Recognition (OCR)

Yoga-82: A New Dataset for Fine-grained Classification of Human Poses

1 code implementation • 22 Apr 2020 • Manisha Verma, Sudhakar Kumawat, Yuta Nakashima, Shanmuganathan Raman

To handle more variety in human poses, we propose the concept of fine-grained hierarchical pose classification, in which we formulate the pose estimation as a classification task, and propose a dataset, Yoga-82, for large-scale yoga pose recognition with 82 classes.

General Classification · Pose Estimation
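
A hierarchical formulation like the one described can be sketched as a shared backbone feature feeding one classification head per level of the hierarchy, with the per-level losses summed. The 82 fine-grained classes come from the paper; the coarser level sizes, feature dimension, and unweighted loss below are assumptions for illustration.

import torch
import torch.nn as nn

class HierarchicalPoseClassifier(nn.Module):
    def __init__(self, feat_dim: int = 2048, level_sizes=(6, 20, 82)):
        super().__init__()
        # One linear head per hierarchy level over a shared backbone feature.
        self.heads = nn.ModuleList([nn.Linear(feat_dim, n) for n in level_sizes])

    def forward(self, features):
        return [head(features) for head in self.heads]

def hierarchical_loss(logits_per_level, labels_per_level):
    # Plain sum of per-level cross-entropies; weighting schemes are possible.
    ce = nn.CrossEntropyLoss()
    return sum(ce(logits, labels) for logits, labels in zip(logits_per_level, labels_per_level))

# Backbone features for a batch of 4 images plus labels at each level.
model = HierarchicalPoseClassifier()
feats = torch.randn(4, 2048)
labels = [torch.randint(0, n, (4,)) for n in (6, 20, 82)]
print(hierarchical_loss(model(feats), labels))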

Knowledge-Based Visual Question Answering in Videos

no code implementations • 17 Apr 2020 • Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima

We propose a novel video understanding task by fusing knowledge-based and video question answering.

Question Answering · Video Question Answering +2

BUDA.ART: A Multimodal Content-Based Analysis and Retrieval System for Buddha Statues

no code implementations • 17 Sep 2019 • Benjamin Renoust, Matheus Oliveira Franca, Jacob Chan, Van Le, Ayaka Uesaka, Yuta Nakashima, Hajime Nagahara, Jueren Wang, Yutaka Fujioka

We introduce BUDA.ART, a system designed to assist researchers in Art History in exploring and analyzing an archive of pictures of Buddha statues.

Retrieval

Understanding Art through Multi-Modal Retrieval in Paintings

no code implementations • 24 Apr 2019 • Noa Garcia, Benjamin Renoust, Yuta Nakashima

In computer vision, visual arts are often studied from a purely aesthetics perspective, mostly by analysing the visual appearance of an artistic reproduction to infer its style, its author, or its representative features.

Art Analysis · Retrieval

Context-Aware Embeddings for Automatic Art Analysis

1 code implementation • 10 Apr 2019 • Noa Garcia, Benjamin Renoust, Yuta Nakashima

Whereas visual representations are able to capture information about the content and the style of an artwork, our proposed context-aware embeddings additionally encode relationships between different artistic attributes, such as author, school, or historical period.

Art Analysis · Cross-Modal Retrieval +3

Rethinking the Evaluation of Video Summaries

2 code implementations • CVPR 2019 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä

Video summarization is a technique to create a short skim of the original video while preserving the main stories/content.

Video Segmentation · Video Semantic Segmentation +1

Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-Deformation

no code implementations • 7 Jul 2018 • Ryosuke Kimura, Akihiko Sayo, Fabian Lorenzo Dayrit, Yuta Nakashima, Hiroshi Kawasaki, Ambrosio Blanco, Katsushi Ikeuchi

For full-body reconstruction with loose clothes, we propose to use lower dimensional embeddings of texture and deformation referred to as eigen-texturing and eigen-deformation, to reproduce views of even unobserved surfaces.
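
The eigen-texture/eigen-deformation idea rests on a standard principal component analysis over flattened per-frame maps; the following is a minimal sketch of that generic decomposition (via SVD), not the paper's full reconstruction pipeline, and all array shapes are chosen only for illustration.

import numpy as np

def eigen_basis(samples: np.ndarray, n_components: int):
    # samples: (num_frames, num_values) flattened texture or deformation maps.
    mean = samples.mean(axis=0)
    # Rows of vt are the principal directions ("eigen-textures").
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(sample, mean, components):
    return (sample - mean) @ components.T      # low-dimensional coefficients

def reconstruct(coeffs, mean, components):
    return mean + coeffs @ components          # back to the full map

# Toy data standing in for flattened texture maps from observed frames.
rng = np.random.default_rng(0)
textures = rng.normal(size=(50, 4096))
mean, comps = eigen_basis(textures, n_components=10)
coeffs = project(textures[0], mean, comps)
print(coeffs.shape, reconstruct(coeffs, mean, comps).shape)  # (10,) (4096,)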

iParaphrasing: Extracting Visually Grounded Paraphrases via an Image

1 code implementation • COLING 2018 • Chenhui Chu, Mayu Otani, Yuta Nakashima

These extracted VGPs have the potential to improve language and image multimodal tasks such as visual question answering and image captioning.

Image Captioning · Question Answering +1

Summarization of User-Generated Sports Video by Using Deep Action Recognition Features

no code implementations • 25 Sep 2017 • Antonio Tejero-de-Pablos, Yuta Nakashima, Tomokazu Sato, Naokazu Yokoya, Marko Linna, Esa Rahtu

The labels are provided by annotators with different levels of experience in Kendo to demonstrate how the proposed method adapts to different needs.

Action Recognition · Temporal Action Localization +1

Video Summarization using Deep Semantic Features

2 code implementations • 28 Sep 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

For this, we design a deep neural network that maps videos as well as descriptions to a common semantic space and jointly train it with associated pairs of videos and descriptions.

Clustering · Video Summarization
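
One common way to realise such a joint video–description space, sketched here with assumed feature dimensions and a margin-based ranking loss rather than the paper's exact training objective, is to project both modalities into a shared space and pull matched pairs together while pushing mismatched ones apart.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    # Two linear branches mapping precomputed video and sentence features
    # into a shared, L2-normalised space. Dimensions are illustrative.
    def __init__(self, video_dim=2048, text_dim=768, joint_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, joint_dim)
        self.text_proj = nn.Linear(text_dim, joint_dim)

    def forward(self, video_feats, text_feats):
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return v, t

def pairwise_ranking_loss(v, t, margin=0.2):
    # Push each video closer to its own description than to the others in the batch.
    sim = v @ t.T                              # (batch, batch) cosine similarities
    pos = sim.diag().unsqueeze(1)              # similarity of matched pairs
    cost = (margin + sim - pos).clamp(min=0)   # hinge on mismatched columns
    cost.fill_diagonal_(0)
    return cost.mean()

model = JointEmbedding()
v, t = model(torch.randn(8, 2048), torch.randn(8, 768))
print(pairwise_ranking_loss(v, t))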

Learning Joint Representations of Videos and Sentences with Web Image Search

no code implementations • 8 Aug 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

In description generation, the performance level is comparable to the current state-of-the-art, although our embeddings were trained for the retrieval tasks.

Image Retrieval · Natural Language Queries +5
