1 code implementation • 9 Jul 2024 • Winnie Pang, Xueyi Ke, Satoshi Tsutsui, Bihan Wen
Concept bottleneck models (CBMs), which predict human-interpretable concepts (e.g., nucleus shapes in cell images) before predicting the final output (e.g., cell type), provide insights into the model's decision-making process.
no code implementations • 20 May 2024 • Xiyu Wang, YuFei Wang, Satoshi Tsutsui, Weisi Lin, Bihan Wen, Alex C. Kot
Additionally, to mitigate character confusion in the generated results, we propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters, seamlessly integrating them into established character dynamics.
no code implementations • 24 Jan 2024 • Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou
Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areas that vary between frames.
no code implementations • 19 Aug 2023 • Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou
In the recovering stage, the model focuses on randomly masking regions of interest (ROIs) and reconstructing real faces without unpredictable tampered traces, resulting in relatively good recovery for real faces but poor recovery for fake faces.
1 code implementation • NeurIPS 2023 • Satoshi Tsutsui, Winnie Pang, Bihan Wen
We then annotated ten thousand WBC images with these attributes.
no code implementations • 3 Mar 2023 • Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou
Specifically, given a real face image, we first pretrain a masked autoencoder to learn facial part consistency by dividing faces into three parts and randomly masking ROIs, which are then recovered based on the unmasked facial parts.
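The masking step described above can be illustrated with a small sketch; the three-part split boundaries, image size, and patch size are hypothetical, since the abstract does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_random_roi(face, part_boundaries=(32, 64), patch=8):
    """Split a face image into three horizontal parts (hypothetical boundaries)
    and zero out one random patch-sized ROI inside a randomly chosen part.
    The masked ROI is what a masked autoencoder would be asked to reconstruct
    from the unmasked facial parts."""
    h, w = face.shape
    rows = [0, *part_boundaries, h]
    i = rng.integers(3)                        # pick one of the three facial parts
    top, bottom = rows[i], rows[i + 1]
    r = rng.integers(top, bottom - patch + 1)  # ROI position inside that part
    c = rng.integers(0, w - patch + 1)
    masked = face.copy()
    masked[r:r + patch, c:c + patch] = 0.0     # masked ROI to be recovered
    return masked

face = rng.random((96, 96))                    # stand-in for a real face image
masked = mask_random_roi(face)
```

Training a reconstruction model on such masked real faces teaches it facial-part consistency, which fake faces then violate.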
1 code implementation • 3 Mar 2023 • Satoshi Tsutsui, Zhengyang Su, Bihan Wen
Recognizing the types of white blood cells (WBCs) in microscopic images of human blood smears is a fundamental task in the fields of pathology and hematology.
2 code implementations • 18 Aug 2022 • Xizhe Xue, Dongdong Yu, Lingqiao Liu, Yu Liu, Satoshi Tsutsui, Ying Li, Zehuan Yuan, Ping Song, Mike Zheng Shou
Based on the single-stage instance segmentation framework, we propose a regularization model to predict foreground pixels and use its relation to instance segmentation to construct a cross-task consistency loss.
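One plausible form of the cross-task relation mentioned above is that the union of per-instance masks should agree with the predicted foreground map; the sketch below uses a binary cross-entropy formulation as a hypothetical stand-in for the paper's loss:

```python
import numpy as np

def cross_task_consistency(foreground_prob, instance_masks, eps=1e-7):
    """Hypothetical consistency loss: the soft union of per-instance masks
    should match the foreground probability map pixel-wise."""
    # Soft union over instances: a pixel is foreground if any instance covers it.
    union = 1.0 - np.prod(1.0 - instance_masks, axis=0)
    # Binary cross-entropy between the union and the foreground prediction.
    bce = -(union * np.log(foreground_prob + eps)
            + (1.0 - union) * np.log(1.0 - foreground_prob + eps))
    return bce.mean()

rng = np.random.default_rng(0)
fg = rng.uniform(0.01, 0.99, size=(16, 16))      # predicted foreground map
masks = rng.uniform(0.0, 1.0, size=(3, 16, 16))  # 3 predicted instance masks
loss = cross_task_consistency(fg, masks)
```

Minimizing such a term pushes the two heads toward mutually consistent predictions, which is the regularization effect the entry describes.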
1 code implementation • 15 Aug 2022 • Satoshi Tsutsui, Xizi Wang, Guangyuan Weng, Yayun Zhang, David Crandall, Chen Yu
We set out to identify properties of training data that lead to action recognition models with greater generalization ability.
1 code implementation • 31 May 2022 • Satoshi Tsutsui, Weijia Mao, Sijing Lin, Yunyi Zhu, Murong Ma, Mike Zheng Shou
Based on these observations, we propose a method to use both NeRF and 3DMM to synthesize a high-fidelity novel view of a scene with a face.
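One simple way to combine the two renderers is alpha compositing a 3DMM-driven face render onto a NeRF-rendered scene view; the function below is a hypothetical sketch of that compositing step, not the authors' pipeline:

```python
import numpy as np

def composite_face_view(scene_rgb, face_rgb, face_alpha):
    """Hypothetical compositing: overlay a 3DMM-based face render onto a
    NeRF-rendered novel view of the scene using the face's alpha matte."""
    a = face_alpha[..., None]                  # broadcast alpha over RGB channels
    return a * face_rgb + (1.0 - a) * scene_rgb

rng = np.random.default_rng(0)
scene = rng.random((64, 64, 3))                # stand-in for a NeRF-rendered view
face = rng.random((64, 64, 3))                 # stand-in for a 3DMM face render
alpha = rng.random((64, 64))                   # stand-in for the face matte
out = composite_face_view(scene, face, alpha)
```

Each output pixel is a convex combination of the two renders, so the scene stays intact outside the face region.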
no code implementations • 22 Apr 2022 • Satoshi Tsutsui, Yanwei Fu, David Crandall
One-shot fine-grained visual recognition often suffers from the problem of having few training examples for new fine-grained classes.
7 code implementations • 29 Nov 2021 • Eric Zhongcong Xu, Zeyang Song, Satoshi Tsutsui, Chao Feng, Mang Ye, Mike Zheng Shou
Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals.
no code implementations • 4 Oct 2021 • Satoshi Tsutsui, Ruta Desai, Karl Ridgeway
We are particularly interested in learning egocentric video representations benefiting from the head-motion generated by users' daily activities, which can be easily obtained from IMU sensors embedded in AR/VR devices.
no code implementations • 12 Jun 2021 • Satoshi Tsutsui, David Crandall, Chen Yu
We analyze egocentric views of attended objects from infants.
no code implementations • 17 Nov 2020 • Satoshi Tsutsui, Yanwei Fu, David Crandall
But while one's own face is not frequently visible, one's hands are: in fact, hands are among the most common objects in one's own field of view.
1 code implementation • 4 Jun 2020 • Satoshi Tsutsui, Arjun Chandrasekaran, Md. Alimoor Reza, David Crandall, Chen Yu
Human infants have the remarkable ability to learn the associations between object names and visual objects from inherently ambiguous experiences.
1 code implementation • NeurIPS 2019 • Satoshi Tsutsui, Yanwei Fu, David Crandall
One-shot fine-grained visual recognition often suffers from the problem of training data scarcity for new fine-grained classes.
no code implementations • 4 Jun 2019 • Satoshi Tsutsui, Dian Zhi, Md. Alimoor Reza, David Crandall, Chen Yu
Inspired by the remarkable ability of the infant visual learning system, a recent study collected first-person images from children to analyze the 'training data' that they receive.
1 code implementation • 7 Sep 2018 • Zheng Gao, Gang Fu, Chunping Ouyang, Satoshi Tsutsui, Xiaozhong Liu, Jeremy Yang, Christopher Gessner, Brian Foote, David Wild, Qi Yu, Ying Ding
We propose this method for its added value relative to existing graph-analytical methodology, and for its real-world applicability to biomedical knowledge discovery.
no code implementations • 1 Jun 2018 • Ting-Ting Liang, Satoshi Tsutsui, Liangcai Gao, Jing-Jing Lu, Mengyan Sun
One of the time-consuming routine tasks for a radiologist is discerning anatomical structures from tomographic images.
1 code implementation • 16 Nov 2017 • Satoshi Tsutsui, Tommi Kerola, Shunta Saito, David J. Crandall
Our work demonstrates the potential for performing free-space segmentation without tedious and costly manual annotation, which will be important for adapting autonomous driving systems to different types of vehicles and environments.
no code implementations • 21 Aug 2017 • Satoshi Tsutsui, Tommi Kerola, Shunta Saito
We present an approach for road segmentation that only requires image-level annotations at training time.
1 code implementation • 20 Jun 2017 • Satoshi Tsutsui, David Crandall
Recent work in computer vision has yielded impressive results in automatically describing images with natural language.
no code implementations • 15 Mar 2017 • Satoshi Tsutsui, David Crandall
CNNs eliminate the need for manually designing features and separation rules, but require a large amount of annotated training data.