Search Results for author: Yoichi Sato

Found 59 papers, 11 papers with code

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

1 code implementation 7 Mar 2024 Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato

These two stereo constraints are used in a complementary manner to generate pseudo-labels, allowing reliable adaptation.

3D Hand Pose Estimation

FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation

no code implementations 1 Feb 2024 Takuma Yagi, Misaki Ohashi, Yifei Huang, Ryosuke Furuta, Shungo Adachi, Toutai Mitsuyama, Yoichi Sato

The dataset consists of multi-view videos of 32 participants performing mock biological experiments with a total duration of 14.5 hours.

Object Object Detection +1

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

no code implementations 30 Nov 2023 Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

no code implementations 28 Nov 2023 Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato

We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.

Dense Video Captioning Transfer Learning

Seeking Flat Minima with Mean Teacher on Semi- and Weakly-Supervised Domain Generalization for Object Detection

no code implementations 30 Oct 2023 Ryosuke Furuta, Yoichi Sato

In contrast to the conventional domain generalization for object detection that requires labeled data from multiple domains, SS-DGOD and WS-DGOD require labeled data only from one domain and unlabeled or weakly-labeled data from multiple domains for training.

Domain Generalization Object +3

Image Cropping under Design Constraints

no code implementations 13 Oct 2023 Takumi Nishiyasu, Wataru Shimoda, Yoichi Sato

We explore two derived approaches, a proposal-based approach and a heatmap-based approach, and we construct a dataset for evaluating their performance on image cropping under design constraints.

Image Cropping

Proposal-based Temporal Action Localization with Point-level Supervision

no code implementations 9 Oct 2023 Yuan Yin, Yifei Huang, Ryosuke Furuta, Yoichi Sato

Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos where only a single point (frame) within every action instance is annotated in training data.

Action Classification Multiple Instance Learning +1

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

no code implementations CVPR 2023 Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei Huang, Yoichi Sato, Yan Lu

The Multiplane Image (MPI), containing a set of fronto-parallel RGBA layers, is an effective and efficient representation for view synthesis from sparse inputs.

3D Reconstruction

Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos

1 code implementation 7 Feb 2023 Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato

Object affordance is an important concept in hand-object interaction, providing information on action possibilities based on human motor capacity and objects' physical properties, thus benefiting tasks such as action anticipation and robot imitation learning.

Action Anticipation Action Recognition +3

Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training

no code implementations CVPR 2023 Yifei Huang, Lijin Yang, Yoichi Sato

The task of weakly supervised temporal sentence grounding aims at finding the temporal moments in a video that correspond to a language description, given video-language correspondence only at the video level.

Data Augmentation Sentence +2

DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-To-Fine Contrastive Ranking

no code implementations CVPR 2023 Lijin Yang, Quan Kong, Hsuan-Kung Yang, Wadim Kehl, Yoichi Sato, Norimasa Kobori

Compositional temporal grounding is the task of localizing actions described by novel query sentences that combine known words in novel ways.

Boundary Detection Sentence

Surgical Skill Assessment via Video Semantic Aggregation

no code implementations 4 Aug 2022 Zhenqiang Li, Lin Gu, Weimin Wang, Ryosuke Nakamura, Yoichi Sato

Automated video-based assessment of surgical skills is a promising task in assisting young surgical trainees, especially in resource-poor areas.

Representation Learning

CompNVS: Novel View Synthesis with Scene Completion

no code implementations 23 Jul 2022 Zuoyue Li, Tianxing Fan, Zhenqiang Li, Zhaopeng Cui, Yoichi Sato, Marc Pollefeys, Martin R. Oswald

We introduce a scalable framework for novel view synthesis from RGB-D images with largely incomplete scene coverage.

Novel View Synthesis Scene Understanding

Compound Prototype Matching for Few-shot Action Recognition

no code implementations 12 Jul 2022 Yifei Huang, Lijin Yang, Yoichi Sato

Each global prototype is encouraged to summarize a specific aspect from the entire video, for example, the start/evolution of the action.

Few-Shot Action Recognition +1

Precise Affordance Annotation for Egocentric Action Video Datasets

no code implementations 11 Jun 2022 Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato

Object affordance is an important concept in human-object interaction, providing information on action possibilities based on human motor capacity and objects' physical properties, thus benefiting tasks such as action anticipation and robot imitation learning.

Action Anticipation Affordance Recognition +2

Object Instance Identification in Dynamic Environments

1 code implementation 10 Jun 2022 Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato

We study the problem of identifying object instances in a dynamic environment where people interact with the objects.

Feature Selection Object

Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey

no code implementations 5 Jun 2022 Takehiko Ohkawa, Ryosuke Furuta, Yoichi Sato

In this survey, we present a systematic review of 3D hand pose estimation from the perspective of efficient annotation and learning.

3D Hand Pose Estimation Domain Adaptation +1

Domain Adaptive Hand Keypoint and Pixel Localization in the Wild

no code implementations 16 Mar 2022 Takehiko Ohkawa, Yu-Jhe Li, Qichen Fu, Ryosuke Furuta, Kris M. Kitani, Yoichi Sato

We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors).

Domain Adaptation Knowledge Distillation

Background Mixup Data Augmentation for Hand and Object-in-Contact Detection

no code implementations 28 Feb 2022 Koya Tango, Takehiko Ohkawa, Ryosuke Furuta, Yoichi Sato

Detecting the positions of human hands and objects-in-contact (hand-object detection) in each video frame is vital for understanding human activities from videos.

Contact Detection Data Augmentation +3

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

no code implementations 2 Dec 2021 Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato

Previous works attempted to address this problem by applying temporal attention but failed to consider the global context of the full video, which is critical for identifying the relatively significant parts.

Action Recognition Video Understanding

Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data

no code implementations 2 Dec 2021 Yifei Huang, Xiaoxiao Li, Lijin Yang, Lin Gu, Yingying Zhu, Hirofumi Seo, Qiuming Meng, Tatsuya Harada, Yoichi Sato

Then we design a novel Auxiliary Attention Block (AAB) to allow information from SAN to be utilized by the backbone encoder to focus on selective areas.

Tumor Segmentation

Neural Routing by Memory

no code implementations NeurIPS 2021 Kaipeng Zhang, Zhenqiang Li, Zhifeng Li, Wei Liu, Yoichi Sato

However, they use the same procedure sequence for all inputs, regardless of the intermediate features. This paper proffers a simple yet effective idea of constructing parallel procedures and assigning similar intermediate features to the same specialized procedures in a divide-and-conquer fashion.

Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction

1 code implementation 19 Oct 2021 Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato

In this study, we introduce a video-based method for predicting contact between a hand and an object.

Object

Ego4D: Around the World in 3,000 Hours of Egocentric Video

5 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Spatio-Temporal Perturbations for Video Attribution

1 code implementation 1 Sep 2021 Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato

Attribution methods offer a way to interpret opaque neural networks visually by identifying and visualizing the input regions/pixels that dominate the output of a network.

Video Understanding

EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report

no code implementations 18 Jun 2021 Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato

In this report, we describe the technical details of our submission to the 2021 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition.

Action Recognition Unsupervised Domain Adaptation

Towards Visually Explaining Video Understanding Networks with Perturbation

2 code implementations 1 May 2020 Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato

''Making black box models explainable'' is a vital problem that accompanies the development of deep learning networks.

Video Understanding

Manipulation-skill Assessment from Videos with Spatial Attention Network

no code implementations 9 Jan 2019 Zhenqiang Li, Yifei Huang, Minjie Cai, Yoichi Sato

Recent advances in computer vision have made it possible to automatically assess from videos the manipulation skills of humans in performing a task, which enables many important applications in domains such as health rehabilitation and manufacturing.

Mutual Context Network for Jointly Estimating Egocentric Gaze and Actions

no code implementations 7 Jan 2019 Yifei Huang, Zhenqiang Li, Minjie Cai, Yoichi Sato

In this work, we address two coupled tasks of gaze prediction and action recognition in egocentric videos by exploring their mutual context.

Action Recognition Gaze Prediction +1

Understanding hand-object manipulation by modeling the contextual relationship between actions, grasp types and object attributes

no code implementations 22 Jul 2018 Minjie Cai, Kris Kitani, Yoichi Sato

In the proposed model, we explore various semantic relationships between actions, grasp types and object attributes, and show how the context can be used to boost the recognition of each component.

Object

Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition

2 code implementations ECCV 2018 Yifei Huang, Minjie Cai, Zhenqiang Li, Yoichi Sato

We present a new computational model for gaze prediction in egocentric videos by exploring patterns in temporal shift of gaze fixations (attention transition) that are dependent on egocentric manipulation tasks.

Gaze Prediction Saliency Prediction

Future Person Localization in First-Person Videos

1 code implementation CVPR 2018 Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato

We present a new task that predicts future locations of people observed in first-person videos.

From RGB to Spectrum for Natural Scenes via Manifold-Based Mapping

no code implementations ICCV 2017 Yan Jia, Yinqiang Zheng, Lin Gu, Art Subpa-Asa, Antony Lam, Yoichi Sato, Imari Sato

Spectral analysis of natural scenes can provide much more detailed information about the scene than an ordinary RGB camera.

Dimensionality Reduction

Fast Multi-frame Stereo Scene Flow with Motion Segmentation

no code implementations CVPR 2017 Tatsunori Taniai, Sudipta N. Sinha, Yoichi Sato

This unified framework benefits all four tasks - stereo, optical flow, visual odometry and motion segmentation leading to overall higher accuracy and efficiency.

Motion Segmentation Optical Flow Estimation +3

Hierarchical Gaussian Descriptors with Application to Person Re-Identification

no code implementations 14 Jun 2017 Tetsu Matsukawa, Takahiro Okabe, Einoshin Suzuki, Yoichi Sato

To solve this problem, we describe a local region in an image via hierarchical Gaussian distribution in which both means and covariances are included in their parameters.

Image Classification Person Re-Identification

Ego-Surfing: Person Localization in First-Person Videos Using Ego-Motion Signatures

no code implementations 15 Jun 2016 Ryo Yonetani, Kris M. Kitani, Yoichi Sato

We envision a future time when wearable cameras are worn by the masses and recording first-person point-of-view videos of everyday life.

Clustering Retrieval +1

Exploiting Spectral-Spatial Correlation for Coded Hyperspectral Image Restoration

no code implementations CVPR 2016 Ying Fu, Yinqiang Zheng, Imari Sato, Yoichi Sato

In this paper, we propose an effective method for coded hyperspectral image restoration, which exploits extensive structure sparsity in the hyperspectral image.

Image Restoration

Recognizing Micro-Actions and Reactions From Paired Egocentric Videos

no code implementations CVPR 2016 Ryo Yonetani, Kris M. Kitani, Yoichi Sato

We aim to understand the dynamics of social interactions between two people by recognizing their actions and reactions using a head-mounted camera.

Video Summarization

Hierarchical Gaussian Descriptor for Person Re-Identification

no code implementations CVPR 2016 Tetsu Matsukawa, Takahiro Okabe, Einoshin Suzuki, Yoichi Sato

In both steps, unlike the hierarchical covariance descriptor, the proposed descriptor can model both the mean and the covariance information of pixel features properly.

Image Classification Person Re-Identification

Continuous 3D Label Stereo Matching using Local Expansion Moves

2 code implementations 28 Mar 2016 Tatsunori Taniai, Yasuyuki Matsushita, Yoichi Sato, Takeshi Naemura

The local expansion moves extend traditional expansion moves in two ways: localization and spatial propagation.

Patch Matching Stereo Matching +1

Separating Fluorescent and Reflective Components by Using a Single Hyperspectral Image

no code implementations ICCV 2015 Yinqiang Zheng, Ying Fu, Antony Lam, Imari Sato, Yoichi Sato

This paper introduces a novel method to separate fluorescent and reflective components in the spectral domain.

Adaptive Spatial-Spectral Dictionary Learning for Hyperspectral Image Denoising

no code implementations ICCV 2015 Ying Fu, Antony Lam, Imari Sato, Yoichi Sato

Hyperspectral imaging is beneficial in a diverse range of applications, from diagnostic medicine to agriculture to surveillance, to name a few.

Dictionary Learning Hyperspectral Image Denoising +1

Uncalibrated Photometric Stereo Based on Elevation Angle Recovery From BRDF Symmetry of Isotropic Materials

no code implementations CVPR 2015 Feng Lu, Imari Sato, Yoichi Sato

This sort of symmetry can be observed in a 1D BRDF slice from a subset of surface normals with the same azimuth angle, and we use it to devise an efficient modeling and solution method to constrain and recover the elevation angles of surface normals accurately.

Ego-Surfing First-Person Videos

no code implementations CVPR 2015 Ryo Yonetani, Kris M. Kitani, Yoichi Sato

We incorporate this feature into our proposed approach that computes the motion correlation over supervoxel hierarchies to localize target instances in observer videos.

Illumination and Reflectance Spectra Separation of a Hyperspectral Image Meets Low-Rank Matrix Factorization

no code implementations CVPR 2015 Yinqiang Zheng, Imari Sato, Yoichi Sato

This paper addresses the illumination and reflectance spectra separation (IRSS) problem of a hyperspectral image captured under general spectral illumination.

Computational Efficiency

Learning-by-Synthesis for Appearance-based 3D Gaze Estimation

no code implementations CVPR 2014 Yusuke Sugano, Yasuyuki Matsushita, Yoichi Sato

Unlike existing appearance-based methods that assume person-specific training data, we use a large amount of cross-subject training data to train a 3D gaze estimator.

3D Reconstruction Gaze Estimation +1

Reflectance and Fluorescent Spectra Recovery based on Fluorescent Chromaticity Invariance under Varying Illumination

no code implementations CVPR 2014 Ying Fu, Antony Lam, Yasuyuki Kobashi, Imari Sato, Takahiro Okabe, Yoichi Sato

We then show that given the spectral reflectance and fluorescent chromaticity, the fluorescence absorption and emission spectra can also be estimated.

3D Reconstruction

Shape-Preserving Half-Projective Warps for Image Stitching

no code implementations CVPR 2014 Che-Han Chang, Yoichi Sato, Yung-Yu Chuang

It provides alignment accuracy as good as projective warps while preserving the perspective of individual images as similarity warps do.

Image Stitching

Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances

no code implementations CVPR 2013 Feng Lu, Yasuyuki Matsushita, Imari Sato, Takahiro Okabe, Yoichi Sato

We propose an uncalibrated photometric stereo method that works with general and unknown isotropic reflectances.
