Search Results for author: Satoshi Tsutsui

Found 24 papers, 12 papers with code

Integrating Clinical Knowledge into Concept Bottleneck Models

1 code implementation 9 Jul 2024 Winnie Pang, Xueyi Ke, Satoshi Tsutsui, Bihan Wen

Concept bottleneck models (CBMs), which predict human-interpretable concepts (e.g., nucleus shapes in cell images) before predicting the final output (e.g., cell type), provide insights into the decision-making processes of the model.

Clinical Knowledge Decision Making

Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models

no code implementations 20 May 2024 Xiyu Wang, YuFei Wang, Satoshi Tsutsui, Weisi Lin, Bihan Wen, Alex C. Kot

Additionally, to mitigate the character confusion of generated results, we propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters, seamlessly integrating them into established character dynamics.

Knowledge Distillation Story Generation +1

Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces

no code implementations 19 Aug 2023 Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

In the recovering stage, the model focuses on randomly masking regions of interest (ROIs) and reconstructing real faces without unpredictable tampered traces, resulting in a relatively good recovery effect for real faces but a poor recovery effect for fake faces.

DeepFake Detection Face Swapping

Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection

no code implementations 3 Mar 2023 Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

Specifically, given a real face image, we first pretrain a masked autoencoder to learn facial part consistency by dividing faces into three parts and randomly masking ROIs, which are then recovered based on the unmasked facial parts.

DeepFake Detection Face Swapping

Benchmarking White Blood Cell Classification Under Domain Shift

1 code implementation 3 Mar 2023 Satoshi Tsutsui, Zhengyang Su, Bihan Wen

Recognizing the types of white blood cells (WBCs) in microscopic images of human blood smears is a fundamental task in the fields of pathology and hematology.

Benchmarking Classification

Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization

2 code implementations 18 Aug 2022 Xizhe Xue, Dongdong Yu, Lingqiao Liu, Yu Liu, Satoshi Tsutsui, Ying Li, Zehuan Yuan, Ping Song, Mike Zheng Shou

Based on the single-stage instance segmentation framework, we propose a regularization model to predict foreground pixels and use its relation to instance segmentation to construct a cross-task consistency loss.

Autonomous Driving Object +3

Action Recognition based on Cross-Situational Action-object Statistics

1 code implementation 15 Aug 2022 Satoshi Tsutsui, Xizi Wang, Guangyuan Weng, Yayun Zhang, David Crandall, Chen Yu

We set out to identify properties of training data that lead to action recognition models with greater generalization ability.

Action Recognition Object +1

Novel View Synthesis for High-fidelity Headshot Scenes

1 code implementation 31 May 2022 Satoshi Tsutsui, Weijia Mao, Sijing Lin, Yunyi Zhu, Murong Ma, Mike Zheng Shou

Based on these observations, we propose a method to use both NeRF and 3DMM to synthesize a high-fidelity novel view of a scene with a face.

Generative Adversarial Network Novel View Synthesis +1

Reinforcing Generated Images via Meta-learning for One-Shot Fine-Grained Visual Recognition

no code implementations 22 Apr 2022 Satoshi Tsutsui, Yanwei Fu, David Crandall

One-shot fine-grained visual recognition often suffers from the problem of having few training examples for new fine-grained classes.

Diversity Fine-Grained Image Classification +4

AVA-AVD: Audio-Visual Speaker Diarization in the Wild

7 code implementations 29 Nov 2021 Eric Zhongcong Xu, Zeyang Song, Satoshi Tsutsui, Chao Feng, Mang Ye, Mike Zheng Shou

Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals.

Relation Network speaker-diarization +1

How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors

no code implementations 4 Oct 2021 Satoshi Tsutsui, Ruta Desai, Karl Ridgeway

We are particularly interested in learning egocentric video representations benefiting from the head-motion generated by users' daily activities, which can be easily obtained from IMU sensors embedded in AR/VR devices.

Representation Learning Self-Supervised Learning

Whose hand is this? Person Identification from Egocentric Hand Gestures

no code implementations 17 Nov 2020 Satoshi Tsutsui, Yanwei Fu, David Crandall

But while one's own face is not frequently visible, one's hands are: in fact, hands are among the most common objects in one's own field of view.

Gesture Recognition Person Identification

A Computational Model of Early Word Learning from the Infant's Point of View

1 code implementation 4 Jun 2020 Satoshi Tsutsui, Arjun Chandrasekaran, Md. Alimoor Reza, David Crandall, Chen Yu

Human infants have the remarkable ability to learn the associations between object names and visual objects from inherently ambiguous experiences.

Active Object Manipulation Facilitates Visual Object Learning: An Egocentric Vision Study

no code implementations 4 Jun 2019 Satoshi Tsutsui, Dian Zhi, Md. Alimoor Reza, David Crandall, Chen Yu

Inspired by the remarkable ability of the infant visual learning system, a recent study collected first-person images from children to analyze the 'training data' that they receive.

Few-Shot Learning Object

edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

1 code implementation 7 Sep 2018 Zheng Gao, Gang Fu, Chunping Ouyang, Satoshi Tsutsui, Xiaozhong Liu, Jeremy Yang, Christopher Gessner, Brian Foote, David Wild, Qi Yu, Ying Ding

We propose this method for its added value relative to existing graph analytical methodology, and for its applicability in the real-world context of biomedical knowledge discovery.

Biomedical Information Retrieval Information Retrieval +3

Minimizing Supervision for Free-space Segmentation

1 code implementation 16 Nov 2017 Satoshi Tsutsui, Tommi Kerola, Shunta Saito, David J. Crandall

Our work demonstrates the potential for performing free-space segmentation without tedious and costly manual annotation, which will be important for adapting autonomous driving systems to different types of vehicles and environments.

Autonomous Driving Autonomous Navigation +3

Distantly Supervised Road Segmentation

no code implementations 21 Aug 2017 Satoshi Tsutsui, Tommi Kerola, Shunta Saito

We present an approach for road segmentation that only requires image-level annotations at training time.

Road Segmentation Segmentation

Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation

1 code implementation 20 Jun 2017 Satoshi Tsutsui, David Crandall

Recent work in computer vision has yielded impressive results in automatically describing images with natural language.

Caption Generation

A Data Driven Approach for Compound Figure Separation Using Convolutional Neural Networks

no code implementations 15 Mar 2017 Satoshi Tsutsui, David Crandall

CNNs eliminate the need for manually designing features and separation rules, but require a large amount of annotated training data.

Transfer Learning
