Search Results for author: Yuta Nakashima

Found 64 papers, 30 papers with code

No Annotations for Object Detection in Art through Stable Diffusion

1 code implementation • 9 Dec 2024 • Patrick Ramos, Nicolas Gonthier, Selina Khan, Yuta Nakashima, Noa Garcia

Object detection in art is a valuable tool for the digital humanities, as it allows objects in artistic and historical images to be identified far faster than by manual human annotation.

Object, Object Detection +1

VASCAR: Content-Aware Layout Generation via Visual-Aware Self-Correction

no code implementations • 5 Dec 2024 • Jiahao Zhang, Ryota Yoshihashi, Shunsuke Kitada, Atsuki Osanai, Yuta Nakashima

To answer this, we propose Visual-Aware Self-Correction LAyout GeneRation (VASCAR) for LVLM-based content-aware layout generation.

ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training

no code implementations • 14 Oct 2024 • Zhouqiang Jiang, Bowen Wang, JunHao Chen, Yuta Nakashima

Recent approaches for visually-rich document understanding (VrDU) use manually annotated semantic groups, where a semantic group encompasses all semantically relevant but not obviously grouped words.

Document Understanding, Optical Character Recognition (OCR)

Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter

1 code implementation • 20 Aug 2024 • JunHao Chen, Bowen Wang, Zhouqiang Jiang, Yuta Nakashima

By enhancing the intelligibility of human questions for black-box LLMs, our question rewriter improves the quality of generated answers.

Long Form Question Answering

SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

no code implementations • 19 Aug 2024 • Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma

To mitigate societal bias in CLIP and overcome these limitations simultaneously, we introduce a simple yet effective debiasing method called SANER (societal attribute neutralizer), which removes attribute information from the CLIP text features of attribute-neutral descriptions only.

Attribute, Text-to-Image Generation

DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models

1 code implementation • 4 Aug 2024 • Bowen Wang, Jiuyang Chang, Yiming Qian, Guoxin Chen, JunHao Chen, Zhouqiang Jiang, Jiahao Zhang, Yuta Nakashima, Hajime Nagahara

Large language models (LLMs) have recently showcased remarkable capabilities, spanning a wide range of tasks and applications, including those in the medical domain.

Question Answering

Explainable Image Recognition via Enhanced Slot-attention Based Classifier

no code implementations • 8 Jul 2024 • Bowen Wang, Liangzhi Li, Jiahao Zhang, Yuta Nakashima, Hajime Nagahara

A novel loss function designed specifically for ESCOUTER fine-tunes the model's behavior, enabling it to toggle between positive and negative explanations.

Explainable artificial intelligence, Explainable Artificial Intelligence (XAI)

Enhancing Fake News Detection in Social Media via Label Propagation on Cross-modal Tweet Graph

1 code implementation • 14 Jun 2024 • Wanqing Zhao, Yuta Nakashima, Haiyuan Chen, Noboru Babaguchi

Our method consistently outperforms state-of-the-art methods on all benchmark datasets, demonstrating that it generalizes well for fake news detection in social media.

Domain Generalization, Fake News Detection

Stable Diffusion Exposed: Gender Bias from Prompt to Image

no code implementations • 5 Dec 2023 • Yankun Wu, Yuta Nakashima, Noa Garcia

Several studies have raised awareness about social biases in image generative models, demonstrating their predisposition towards stereotypes and imbalances.

Image Generation

Instruct Me More! Random Prompting for Visual In-Context Learning

1 code implementation • 7 Nov 2023 • Jiahao Zhang, Bowen Wang, Liangzhi Li, Yuta Nakashima, Hajime Nagahara

Our findings suggest that InMeMo offers a versatile and efficient way to enhance the performance of visual ICL with lightweight training.

Foreground Segmentation, In-Context Learning +2

Learning Bottleneck Concepts in Image Classification

1 code implementation • CVPR 2023 • Bowen Wang, Liangzhi Li, Yuta Nakashima, Hajime Nagahara

Using some image classification tasks as our testbed, we demonstrate BotCL's potential to rebuild neural networks for better interpretability.

Classification, Image Classification

Model-Agnostic Gender Debiased Image Captioning

1 code implementation • CVPR 2023 • Yusuke Hirota, Yuta Nakashima, Noa Garcia

From this observation, we hypothesize that there are two types of gender bias affecting image captioning models: 1) bias that exploits context to predict gender, and 2) bias in the probability of generating certain (often stereotypical) words because of gender.

Image Captioning, model

Uncurated Image-Text Datasets: Shedding Light on Demographic Bias

1 code implementation • CVPR 2023 • Noa Garcia, Yusuke Hirota, Yankun Wu, Yuta Nakashima

The increasing tendency to collect large and uncurated datasets to train vision-and-language models has raised concerns about fair representations.

Image Captioning, Text-to-Image Generation

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

no code implementations • CVPR 2023 • Mayu Otani, Riku Togashi, Yu Sawai, Ryosuke Ishigami, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh

Human evaluation is critical for validating the performance of text-to-image generative models, as this highly cognitive process requires deep comprehension of text and images.

Text-to-Image Generation

Inference Time Evidences of Adversarial Attacks for Forensic on Transformers

no code implementations • 31 Jan 2023 • Hugo Lemarchant, Liangzi Li, Yiming Qian, Yuta Nakashima, Hajime Nagahara

Vision Transformers (ViTs) are becoming a very popular paradigm for vision tasks as they achieve state-of-the-art performance on image classification.

Image Classification

Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization

1 code implementation • 18 Nov 2022 • Zongshang Pang, Yuta Nakashima, Mayu Otani, Hajime Nagahara

Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing.

Diversity, Image Classification +2

Gender and Racial Bias in Visual Question Answering Datasets

no code implementations • 17 May 2022 • Yusuke Hirota, Yuta Nakashima, Noa Garcia

Our findings suggest that there are dangers associated with using VQA datasets without considering and dealing with the potentially harmful stereotypes.

Question Answering, Visual Question Answering

AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

no code implementations • CVPR 2022 • Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Tetsuya Sakai

First, it is rank-insensitive: it ignores the rank positions of successfully localised moments in the top-$K$ ranked list by treating the list as a set (a small worked example follows below).

Moment Retrieval, Retrieval
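To make "rank-insensitive" concrete, here is a small illustration (not the paper's code; the moments and numbers are made up): the widely used Recall@$K$ with an IoU threshold treats the top-$K$ moments as a set, so reordering them never changes the score.

```python
# Toy illustration of a rank-insensitive moment-retrieval measure:
# Recall@K with an IoU threshold succeeds if ANY top-K moment overlaps the
# ground truth enough, so the rank of the correct moment is irrelevant.
def temporal_iou(a, b):
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def recall_at_k(ranked_moments, ground_truth, k=3, threshold=0.5):
    return float(any(temporal_iou(m, ground_truth) >= threshold
                     for m in ranked_moments[:k]))

gt = (10.0, 20.0)
ranking_a = [(50.0, 60.0), (30.0, 40.0), (11.0, 19.0)]  # correct moment ranked last
ranking_b = [(11.0, 19.0), (30.0, 40.0), (50.0, 60.0)]  # correct moment ranked first
print(recall_at_k(ranking_a, gt), recall_at_k(ranking_b, gt))  # 1.0 1.0 -- identical
```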

Built Year Prediction from Buddha Face with Heterogeneous Labels

no code implementations • 2 Sep 2021 • Yiming Qian, Cheikh Brahim El Vaigh, Yuta Nakashima, Benjamin Renoust, Hajime Nagahara, Yutaka Fujioka

Buddha statues are a part of human culture, especially in Asia, and they have accompanied human civilisation for more than 2,000 years.

Cultural Vocal Bursts Intensity Prediction, Prediction

Attending Self-Attention: A Case Study of Visually Grounded Supervision in Vision-and-Language Transformers

no code implementations • ACL 2021 • Jules Samaran, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima

The impressive performances of pre-trained visually grounded language models have motivated a growing body of research investigating what has been learned during the pre-training.

Language Modeling, Language Modelling +1

A Picture May Be Worth a Hundred Words for Visual Question Answering

no code implementations • 25 Jun 2021 • Yusuke Hirota, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Ittetsu Taniguchi, Takao Onoye

This paper delves into the effectiveness of textual representations for image understanding in the specific context of VQA.

Data Augmentation, Descriptive +3

WRIME: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations

1 code implementation • NAACL 2021 • Tomoyuki Kajiwara, Chenhui Chu, Noriko Takemura, Yuta Nakashima, Hajime Nagahara

We annotate 17,000 SNS posts with both the writer's subjective emotional intensity and the reader's objective one to construct a Japanese emotion analysis dataset.

Emotion Recognition

Development of a Vertex Finding Algorithm using Recurrent Neural Network

no code implementations • 28 Jan 2021 • Kiichi Goto, Taikan Suehara, Tamaki Yoshioka, Masakazu Kurata, Hajime Nagahara, Yuta Nakashima, Noriko Takemura, Masako Iwasaki

Deep learning is a rapidly evolving technology with the potential to significantly improve the physics reach of collider experiments.

Decoder

Understanding the Role of Scene Graphs in Visual Question Answering

no code implementations • 14 Jan 2021 • Vinay Damodaran, Sharanya Chakravarthy, Akshay Kumar, Anjana Umapathy, Teruko Mitamura, Yuta Nakashima, Noa Garcia, Chenhui Chu

Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search.

Graph Generation, Question Answering +2

Match Them Up: Visually Explainable Few-shot Image Classification

1 code implementation • 25 Nov 2020 • Bowen Wang, Liangzhi Li, Manisha Verma, Yuta Nakashima, Ryo Kawasaki, Hajime Nagahara

Few-shot learning (FSL) approaches are usually based on the assumption that pre-trained knowledge can be obtained from base (seen) categories and transfers well to novel (unseen) categories.

Classification, Few-Shot Image Classification +2

Demographic Influences on Contemporary Art with Unsupervised Style Embeddings

no code implementations • 30 Sep 2020 • Nikolai Huckle, Noa Garcia, Yuta Nakashima

Art produced today, on the other hand, is abundant and easily accessible through the internet and social networks, which professional and amateur artists alike use to display their work.

Art Analysis

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

1 code implementation • 1 Sep 2020 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä

In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.

Moment Retrieval, Retrieval +2

Depthwise Spatio-Temporal STFT Convolutional Neural Networks for Human Action Recognition

no code implementations • 22 Jul 2020 • Sudhakar Kumawat, Manisha Verma, Yuta Nakashima, Shanmuganathan Raman

To address these issues, we propose spatio-temporal short term Fourier transform (STFT) blocks, a new class of convolutional blocks that can serve as an alternative to the 3D convolutional layer and its variants in 3D CNNs (a rough sketch of the idea follows below).

Action Recognition, Temporal Action Localization
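As a rough sketch of the kind of block described above (not the authors' implementation; shown for the temporal dimension only, with an assumed window size and channel count), the Fourier filtering can be kept fixed while only a pointwise convolution is learned:

```python
# Minimal sketch: a non-learnable temporal DFT (real + imaginary parts) applied
# depthwise over a short window, followed by a learnable 1x1x1 convolution.
import math
import torch
import torch.nn as nn

class TemporalSTFTBlock(nn.Module):
    def __init__(self, channels, window=3):
        super().__init__()
        k = torch.arange(window, dtype=torch.float32)
        freqs = torch.arange(window, dtype=torch.float32)
        real = torch.cos(2 * math.pi * freqs[:, None] * k[None, :] / window)
        imag = -torch.sin(2 * math.pi * freqs[:, None] * k[None, :] / window)
        basis = torch.cat([real, imag], dim=0)              # (2*window, window)
        self.depthwise = nn.Conv3d(channels, channels * 2 * window,
                                   kernel_size=(window, 1, 1),
                                   padding=(window // 2, 0, 0),
                                   groups=channels, bias=False)
        # Freeze the depthwise filters to the Fourier basis (no learned 3D kernel).
        self.depthwise.weight = nn.Parameter(
            basis.view(2 * window, 1, window, 1, 1).repeat(channels, 1, 1, 1, 1),
            requires_grad=False)
        # Only this pointwise convolution is learnable.
        self.pointwise = nn.Conv3d(channels * 2 * window, channels, kernel_size=1)

    def forward(self, x):                                    # x: (N, C, T, H, W)
        return self.pointwise(self.depthwise(x))

block = TemporalSTFTBlock(channels=8)
clip = torch.randn(2, 8, 16, 32, 32)
print(block(clip).shape)  # torch.Size([2, 8, 16, 32, 32])
```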

Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

1 code implementation • ECCV 2020 • Noa Garcia, Yuta Nakashima

To understand movies, humans constantly reason over the dialogues and actions shown in specific scenes and relate them to the overall storyline already seen.

Question Answering, Video Question Answering +1

Constructing a Public Meeting Corpus

no code implementations • LREC 2020 • Koji Tanaka, Chenhui Chu, Haolin Ren, Benjamin Renoust, Yuta Nakashima, Noriko Takemura, Hajime Nagahara, Takao Fujikawa

In this paper, we propose a full analysis pipeline for a large corpus covering a century of public meetings in historical Australian newspapers, from construction to visual exploration.

Optical Character Recognition (OCR)

Yoga-82: A New Dataset for Fine-grained Classification of Human Poses

1 code implementation • 22 Apr 2020 • Manisha Verma, Sudhakar Kumawat, Yuta Nakashima, Shanmuganathan Raman

To handle more variety in human poses, we propose the concept of fine-grained hierarchical pose classification, in which we formulate pose estimation as a classification task (see the sketch below), and propose a dataset, Yoga-82, for large-scale yoga pose recognition with 82 classes.

Diversity, General Classification +1
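A minimal sketch of the hierarchical-classification formulation mentioned above, assuming a toy backbone and illustrative per-level class counts (Yoga-82 defines its own coarse-to-fine hierarchy ending in 82 classes):

```python
# Minimal sketch: one shared feature extractor, one classification head per
# level of the pose hierarchy, trained with a summed cross-entropy loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalPoseClassifier(nn.Module):
    def __init__(self, feature_dim=512, level_sizes=(6, 20, 82)):
        super().__init__()
        # Stand-in for a real CNN backbone; any image encoder would do here.
        self.backbone = nn.Sequential(nn.Flatten(),
                                      nn.Linear(3 * 224 * 224, feature_dim),
                                      nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(feature_dim, n) for n in level_sizes])

    def forward(self, images):
        feats = self.backbone(images)
        return [head(feats) for head in self.heads]   # one set of logits per level

model = HierarchicalPoseClassifier()
images = torch.randn(4, 3, 224, 224)
logits = model(images)
targets = [torch.randint(0, n, (4,)) for n in (6, 20, 82)]
loss = sum(F.cross_entropy(l, t) for l, t in zip(logits, targets))
print([l.shape for l in logits], float(loss))
```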

Knowledge-Based Visual Question Answering in Videos

no code implementations • 17 Apr 2020 • Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima

We propose a novel video understanding task by fusing knowledge-based and video question answering.

Question Answering, Video Question Answering +2

BUDA.ART: A Multimodal Content-Based Analysis and Retrieval System for Buddha Statues

no code implementations • 17 Sep 2019 • Benjamin Renoust, Matheus Oliveira Franca, Jacob Chan, Van Le, Ayaka Uesaka, Yuta Nakashima, Hajime Nagahara, Jueren Wang, Yutaka Fujioka

We introduce BUDA.ART, a system designed to assist researchers in Art History in exploring and analyzing an archive of pictures of Buddha statues.

Retrieval

Understanding Art through Multi-Modal Retrieval in Paintings

no code implementations • 24 Apr 2019 • Noa Garcia, Benjamin Renoust, Yuta Nakashima

In computer vision, visual arts are often studied from a purely aesthetics perspective, mostly by analysing the visual appearance of an artistic reproduction to infer its style, its author, or its representative features.

Art Analysis, Retrieval

Context-Aware Embeddings for Automatic Art Analysis

1 code implementation • 10 Apr 2019 • Noa Garcia, Benjamin Renoust, Yuta Nakashima

Whereas visual representations are able to capture information about the content and the style of an artwork, our proposed context-aware embeddings additionally encode relationships between different artistic attributes, such as author, school, or historical period.

Art Analysis, Cross-Modal Retrieval +3

Rethinking the Evaluation of Video Summaries

2 code implementations • CVPR 2019 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä

Video summarization is a technique to create a short skim of the original video while preserving the main stories/content.

Video Segmentation, Video Semantic Segmentation +1

Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-Deformation

no code implementations • 7 Jul 2018 • Ryosuke Kimura, Akihiko Sayo, Fabian Lorenzo Dayrit, Yuta Nakashima, Hiroshi Kawasaki, Ambrosio Blanco, Katsushi Ikeuchi

For full-body reconstruction with loose clothes, we propose to use lower-dimensional embeddings of texture and deformation, referred to as eigen-texturing and eigen-deformation, to reproduce views of even unobserved surfaces (a minimal sketch follows below).
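A minimal sketch of the eigen-embedding idea, assuming per-frame textures (or per-vertex deformations) have been flattened into vectors; PCA via an SVD yields the low-dimensional basis from which unseen views can be approximated:

```python
# Minimal sketch: PCA (via SVD) over flattened textures gives an "eigen-texture"
# basis; the same recipe applies to deformation vectors ("eigen-deformation").
import numpy as np

def fit_eigen_basis(samples, n_components=8):
    # samples: (num_frames, dim) matrix of flattened textures or deformations.
    mean = samples.mean(axis=0)
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]            # mean and top principal directions

def embed(sample, mean, basis):
    return basis @ (sample - mean)            # low-dimensional coefficients

def reconstruct(coeffs, mean, basis):
    return mean + basis.T @ coeffs            # approximate the full texture

rng = np.random.default_rng(0)
textures = rng.normal(size=(50, 1024))        # toy data: 50 frames, 1024-dim textures
mean, basis = fit_eigen_basis(textures)
coeffs = embed(textures[0], mean, basis)
approx = reconstruct(coeffs, mean, basis)
print(coeffs.shape, np.linalg.norm(textures[0] - approx))
```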

iParaphrasing: Extracting Visually Grounded Paraphrases via an Image

1 code implementation • COLING 2018 • Chenhui Chu, Mayu Otani, Yuta Nakashima

These extracted VGPs have the potential to improve language and image multimodal tasks such as visual question answering and image captioning.

Image Captioning, Question Answering +1

Summarization of User-Generated Sports Video by Using Deep Action Recognition Features

no code implementations • 25 Sep 2017 • Antonio Tejero-de-Pablos, Yuta Nakashima, Tomokazu Sato, Naokazu Yokoya, Marko Linna, Esa Rahtu

The labels are provided by annotators with different levels of experience in Kendo, to demonstrate how the proposed method adapts to different needs.

Action Recognition, Temporal Action Localization +1

Video Summarization using Deep Semantic Features

2 code implementations • 28 Sep 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

For this, we design a deep neural network that maps both videos and descriptions to a common semantic space and jointly train it with associated pairs of videos and descriptions (see the sketch below).

Clustering, Video Summarization
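A minimal sketch of the joint embedding described above (not the authors' implementation; the input feature dimensions are placeholders): two small towers map pre-extracted video and sentence features into a common space and are trained with a max-margin ranking loss over associated pairs.

```python
# Minimal sketch: two towers embed video and sentence features into a common
# semantic space; associated pairs are pulled together with a ranking loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    def __init__(self, in_dim, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, embed_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)       # unit-length embeddings

def ranking_loss(video_emb, text_emb, margin=0.2):
    sim = video_emb @ text_emb.t()                    # cosine similarities
    pos = sim.diag()                                  # matched pairs on the diagonal
    mask = 1.0 - torch.eye(sim.size(0))               # exclude positives as negatives
    cost_v2t = F.relu(margin + sim - pos.unsqueeze(1)) * mask
    cost_t2v = F.relu(margin + sim - pos.unsqueeze(0)) * mask
    n = sim.size(0)
    return (cost_v2t.sum() + cost_t2v.sum()) / (n * (n - 1))

video_tower, text_tower = Tower(in_dim=4096), Tower(in_dim=300)
videos, sentences = torch.randn(8, 4096), torch.randn(8, 300)
loss = ranking_loss(video_tower(videos), text_tower(sentences))
loss.backward()
print(float(loss))
```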

Learning Joint Representations of Videos and Sentences with Web Image Search

no code implementations • 8 Aug 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

In description generation, the performance level is comparable to the current state-of-the-art, although our embeddings were trained for the retrieval tasks.

Image Retrieval, Natural Language Queries +6
