no code implementations • 15 Sep 2024 • Nie Lin, Takehiko Ohkawa, Mingfang Zhang, Yifei HUANG, Ryosuke Furuta, Yoichi Sato
Our experiments demonstrate that our method outperforms conventional contrastive learning approaches that produce positive pairs solely from a single image with data augmentation.
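For reference, a minimal PyTorch sketch of the conventional baseline mentioned above: SimCLR-style contrastive learning in which the only positive pair for an image is two augmented views of that same image. This is not the paper's hand-aware pairing; the function and tensor names are illustrative.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, D) embeddings of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2N, D)
    sim = z @ z.t() / temperature                             # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))                     # drop self-similarity
    # the positive for sample i is its other augmented view
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```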
no code implementations • 22 Jul 2024 • Quan Kong, Yuki Kawana, Rajat Saini, Ashutosh Kumar, Jingjing Pan, Ta Gu, Yohei Ozao, Balazs Opra, David C. Anastasiu, Yoichi Sato, Norimasa Kobori
In this paper, we address the challenge of fine-grained video event understanding in traffic scenarios, vital for autonomous driving and safety.
1 code implementation • 10 Jul 2024 • Liangyang Ouyang, Ruicong Liu, Yifei HUANG, Ryosuke Furuta, Yoichi Sato
Experimental results on the VISOR dataset reveal that ActionVOS significantly reduces the mis-segmentation of inactive objects, confirming that actions help the ActionVOS model understand objects' involvement.
no code implementations • 9 Jul 2024 • Mingfang Zhang, Yifei HUANG, Ruicong Liu, Yoichi Sato
Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs can capture accurate motion signals while being robust to lighting variation and occlusion.
no code implementations • 12 Jun 2024 • Michele Mazzamuto, Antonino Furnari, Yoichi Sato, Giovanni Maria Farinella
We address the challenge of unsupervised mistake detection in egocentric video of skilled human activities through the analysis of gaze signals.
no code implementations • 2 May 2024 • Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato
We further accumulate the derived object states to take past state context into account when inferring current object state pseudo-labels.
2 code implementations • 25 Mar 2024 • Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Zheng Liu, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao
A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation.
2 code implementations • CVPR 2024 • Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato
These two stereo constraints are used in a complementary manner to generate pseudo-labels, allowing reliable adaptation.
no code implementations • 1 Feb 2024 • Takuma Yagi, Misaki Ohashi, Yifei HUANG, Ryosuke Furuta, Shungo Adachi, Toutai Mitsuyama, Yoichi Sato
The dataset consists of multi-view videos of 32 participants performing mock biological experiments with a total duration of 14.5 hours.
2 code implementations • CVPR 2024 • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.
no code implementations • 29 Nov 2023 • Yilin Wen, Hao Pan, Takehiko Ohkawa, Lei Yang, Jia Pan, Yoichi Sato, Taku Komura, Wenping Wang
Furthermore, to faithfully model the semantic dependency and different temporal granularity of hand pose and action, we decompose the framework into two cascaded VAE blocks: the first and second blocks respectively model short-span poses and long-span actions, and are connected by a mid-level feature representing a sub-second series of hand poses.
no code implementations • 28 Nov 2023 • Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato
We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.
no code implementations • 30 Oct 2023 • Ryosuke Furuta, Yoichi Sato
In contrast to the conventional domain generalization for object detection that requires labeled data from multiple domains, SS-DGOD and WS-DGOD require labeled data only from one domain and unlabeled or weakly-labeled data from multiple domains for training.
no code implementations • 13 Oct 2023 • Takumi Nishiyasu, Wataru Shimoda, Yoichi Sato
We explore two derived approaches, a proposal-based approach and a heatmap-based approach, and we construct a dataset for evaluating their performance on image cropping under design constraints.
no code implementations • 9 Oct 2023 • Yuan Yin, Yifei HUANG, Ryosuke Furuta, Yoichi Sato
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos where only a single point (frame) within every action instance is annotated in training data.
no code implementations • 11 May 2023 • Aneeq Zia, Kiran Bhattacharyya, Xi Liu, Max Berniker, Ziheng Wang, Rogerio Nespolo, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Bo Liu, David Austin, Yiheng Wang, Michal Futrega, Jean-Francois Puget, Zhenqiang Li, Yoichi Sato, Ryo Fujii, Ryo Hachiuma, Mana Masuda, Hideo Saito, An Wang, Mengya Xu, Mobarakol Islam, Long Bai, Winnie Pang, Hongliang Ren, Chinedu Nwoye, Luca Sestini, Nicolas Padoy, Maximilian Nielsen, Samuel Schüttler, Thilo Sentker, Hümeyra Husseini, Ivo Baltruschat, Rüdiger Schmitz, René Werner, Aleksandr Matsun, Mugariya Farooq, Numan Saaed, Jose Renato Restom Viera, Mohammad Yaqub, Neil Getty, Fangfang Xia, Zixuan Zhao, Xiaotian Duan, Xing Yao, Ange Lou, Hao Yang, Jintong Han, Jack Noble, Jie Ying Wu, Tamer Abdulbaki Alshirbaji, Nour Aldeen Jalal, Herag Arabian, Ning Ding, Knut Moeller, Weiliang Chen, Quan He, Muhammad Bilal, Taofeek Akinosho, Adnan Qayyum, Massimo Caputo, Hunaid Vohra, Michael Loizou, Anuoluwapo Ajayi, Ilhem Berrou, Faatihah Niyi-Odumosu, Lena Maier-Hein, Danail Stoyanov, Stefanie Speidel, Anthony Jarc
Unfortunately, obtaining the annotations needed to train machine learning models to identify and localize surgical tools is a difficult task.
no code implementations • CVPR 2023 • Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei HUANG, Yoichi Sato, Yan Lu
The Multiplane Image (MPI), containing a set of fronto-parallel RGBA layers, is an effective and efficient representation for view synthesis from sparse inputs.
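For readers unfamiliar with MPIs, a minimal sketch of how a Multiplane Image is rendered: the RGBA layers are alpha-composited back to front with the standard "over" operator. For a novel view, each plane would first be warped by its homography; the layer ordering and array shapes here are assumptions.

```python
import numpy as np

def composite_mpi(rgba_layers):
    """rgba_layers: (D, H, W, 4) in [0, 1], index 0 = farthest plane."""
    out = np.zeros(rgba_layers.shape[1:3] + (3,))
    for layer in rgba_layers:                      # back to front
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)    # "over" operator
    return out
```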
1 code implementation • 7 Feb 2023 • Zecheng Yu, Yifei HUANG, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato
Object affordance is an important concept in hand-object interaction, providing information on action possibilities based on human motor capacity and objects' physical properties, thus benefiting tasks such as action anticipation and robot imitation learning.
no code implementations • CVPR 2023 • Lijin Yang, Quan Kong, Hsuan-Kung Yang, Wadim Kehl, Yoichi Sato, Norimasa Kobori
Compositional temporal grounding is the task of localizing actions in video given novel query sentences that combine known words in novel ways.
no code implementations • CVPR 2023 • Yifei HUANG, Lijin Yang, Yoichi Sato
The task of weakly supervised temporal sentence grounding aims at finding the temporal moments in a video that correspond to a language description, given video-language correspondence only at the video level.
no code implementations • 21 Nov 2022 • Zhihang Zhong, Mingxi Cheng, Zhirong Wu, Yuhui Yuan, Yinqiang Zheng, Ji Li, Han Hu, Stephen Lin, Yoichi Sato, Imari Sato
Image cropping has progressed tremendously under the data-driven paradigm.
no code implementations • 4 Aug 2022 • Zhenqiang Li, Lin Gu, Weimin WANG, Ryosuke Nakamura, Yoichi Sato
Automated video-based assessment of surgical skills is a promising way to assist young surgical trainees, especially in resource-poor areas.
no code implementations • 23 Jul 2022 • Zuoyue Li, Tianxing Fan, Zhenqiang Li, Zhaopeng Cui, Yoichi Sato, Marc Pollefeys, Martin R. Oswald
We introduce a scalable framework for novel view synthesis from RGB-D images with largely incomplete scene coverage.
no code implementations • 12 Jul 2022 • Yifei HUANG, Lijin Yang, Yoichi Sato
Each global prototype is encouraged to summarize a specific aspect from the entire video, for example, the start/evolution of the action.
no code implementations • 11 Jun 2022 • Zecheng Yu, Yifei HUANG, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato
Object affordance is an important concept in human-object interaction, providing information on action possibilities based on human motor capacity and objects' physical properties, thus benefiting tasks such as action anticipation and robot imitation learning.
1 code implementation • 10 Jun 2022 • Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato
We study the problem of identifying object instances in a dynamic environment where people interact with the objects.
no code implementations • 5 Jun 2022 • Takehiko Ohkawa, Ryosuke Furuta, Yoichi Sato
In this survey, we present a systematic review of 3D hand pose estimation from the perspective of efficient annotation and learning.
no code implementations • 16 Mar 2022 • Takehiko Ohkawa, Yu-Jhe Li, Qichen Fu, Ryosuke Furuta, Kris M. Kitani, Yoichi Sato
We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors).
no code implementations • 28 Feb 2022 • Koya Tango, Takehiko Ohkawa, Ryosuke Furuta, Yoichi Sato
Detecting the positions of human hands and objects-in-contact (hand-object detection) in each video frame is vital for understanding human activities from videos.
no code implementations • CVPR 2022 • Lijin Yang, Yifei HUANG, Yusuke Sugano, Yoichi Sato
Different from previous works, we find that cross-domain alignment can be done more effectively by applying cross-modal interaction first.
no code implementations • 2 Dec 2021 • Lijin Yang, Yifei HUANG, Yusuke Sugano, Yoichi Sato
Previous works attempted to address this problem by applying temporal attention, but failed to consider the global context of the full video, which is critical for determining the relatively significant parts.
no code implementations • 2 Dec 2021 • Yifei HUANG, Xiaoxiao Li, Lijin Yang, Lin Gu, Yingying Zhu, Hirofumi Seo, Qiuming Meng, Tatsuya Harada, Yoichi Sato
Then we design a novel Auxiliary Attention Block (AAB) to allow information from SAN to be utilized by the backbone encoder to focus on selective areas.
no code implementations • NeurIPS 2021 • Kaipeng Zhang, Zhenqiang Li, Zhifeng Li, Wei Liu, Yoichi Sato
However, they use the same procedure sequence for all inputs, regardless of the intermediate features. This paper proffers a simple yet effective idea of constructing parallel procedures and assigning similar intermediate features to the same specialized procedures in a divide-and-conquer fashion.
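As a rough illustration of the divide-and-conquer idea (not the paper's architecture), the sketch below routes each sample's intermediate feature to one of several specialized branches via a hard assignment; in practice a differentiable or clustering-based assignment would be needed, and all module names are placeholders.

```python
import torch
import torch.nn as nn

class RoutedBlock(nn.Module):
    def __init__(self, dim, num_branches=4):
        super().__init__()
        self.router = nn.Linear(dim, num_branches)           # scores per branch
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_branches)
        )

    def forward(self, x):                                     # x: (N, dim)
        idx = self.router(x).argmax(dim=1)                    # hard assignment (no router gradient)
        out = torch.zeros_like(x)
        for b, branch in enumerate(self.branches):
            sel = idx == b
            if sel.any():
                out[sel] = branch(x[sel])                     # each branch handles similar features
        return out
```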
1 code implementation • 19 Oct 2021 • Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato
In this study, we introduce a video-based method for predicting contact between a hand and an object.
8 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
1 code implementation • 1 Sep 2021 • Zhenqiang Li, Weimin WANG, Zuoyue Li, Yifei HUANG, Yoichi Sato
Attribution methods provide a way to interpret opaque neural networks visually by identifying and visualizing the input regions/pixels that dominate the output of a network.
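As background, a minimal generic gradient-based attribution sketch, not the specific method of this paper: the magnitude of the class score's gradient with respect to the input highlights the pixels that most influence the prediction.

```python
import torch

def saliency_map(model, image, target_class):
    """image: (1, C, H, W) tensor; returns an (H, W) attribution map."""
    model.eval()
    image = image.detach().clone().requires_grad_(True)
    score = model(image)[0, target_class]
    score.backward()
    return image.grad.abs().max(dim=1)[0].squeeze(0)   # max over channels
```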
1 code implementation • 6 Jul 2021 • Takehiko Ohkawa, Takuma Yagi, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato
We validated our method on domain adaptation of hand segmentation from real and simulation images.
no code implementations • 18 Jun 2021 • Lijin Yang, Yifei HUANG, Yusuke Sugano, Yoichi Sato
In this report, we describe the technical details of our submission to the 2021 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition.
no code implementations • 18 Jan 2021 • Takuma Yagi, Takumi Nishiyasu, Kunimasa Kawasaki, Moe Matsuki, Yoichi Sato
People spend an enormous amount of time and effort looking for lost objects.
2 code implementations • 1 May 2020 • Zhenqiang Li, Weimin WANG, Zuoyue Li, Yifei HUANG, Yoichi Sato
"Making black-box models explainable" is a vital problem that accompanies the development of deep learning networks.
no code implementations • 11 Mar 2020 • Yoichi Sato, Yasuhiko Takegami, Takamune Asamoto, Yutaro Ono, Tsugeno Hidetoshi, Ryosuke Goto, Akira Kitamura, Seiwa Honda
Gradient-weighted class activation mapping (Grad-CAM) was used to conceptualize the diagnostic basis of the CAD system.
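A minimal PyTorch sketch of Grad-CAM as it is typically implemented, included only as background on how such class-activation maps are computed; the model and the chosen convolutional layer are placeholders, not the CAD system described in the paper.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, target_layer):
    """image: (1, C, H, W); target_layer: a conv module inside `model`."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    model.eval()
    model(image)[0, target_class].backward()
    h1.remove(); h2.remove()
    weights = grads['g'].mean(dim=(2, 3), keepdim=True)          # GAP of gradients
    cam = F.relu((weights * acts['a']).sum(dim=1, keepdim=True))  # weighted activations
    cam = F.interpolate(cam, size=image.shape[2:], mode='bilinear', align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()
```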
no code implementations • 9 Jan 2019 • Zhenqiang Li, Yifei Huang, Minjie Cai, Yoichi Sato
Recent advances in computer vision have made it possible to automatically assess from videos the manipulation skills of humans performing a task, which enables many important applications in domains such as health rehabilitation and manufacturing.
no code implementations • 7 Jan 2019 • Yifei Huang, Zhenqiang Li, Minjie Cai, Yoichi Sato
In this work, we address two coupled tasks of gaze prediction and action recognition in egocentric videos by exploring their mutual context.
no code implementations • 22 Jul 2018 • Minjie Cai, Kris Kitani, Yoichi Sato
In the proposed model, we explore various semantic relationships between actions, grasp types and object attributes, and show how the context can be used to boost the recognition of each component.
2 code implementations • ECCV 2018 • Yifei Huang, Minjie Cai, Zhenqiang Li, Yoichi Sato
We present a new computational model for gaze prediction in egocentric videos by exploring patterns in temporal shift of gaze fixations (attention transition) that are dependent on egocentric manipulation tasks.
1 code implementation • CVPR 2018 • Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato
We present a new task that predicts future locations of people observed in first-person videos.
no code implementations • ICCV 2017 • Yan Jia, Yinqiang Zheng, Lin Gu, Art Subpa-Asa, Antony Lam, Yoichi Sato, Imari Sato
Spectral analysis of natural scenes can provide much more detailed information about a scene than an ordinary RGB camera can capture.
no code implementations • CVPR 2017 • Tatsunori Taniai, Sudipta N. Sinha, Yoichi Sato
This unified framework benefits all four tasks - stereo, optical flow, visual odometry and motion segmentation - leading to overall higher accuracy and efficiency.
no code implementations • 14 Jun 2017 • Tetsu Matsukawa, Takahiro Okabe, Einoshin Suzuki, Yoichi Sato
To solve this problem, we describe a local region in an image via a hierarchical Gaussian distribution in which both means and covariances are included in its parameters.
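One standard way to realize this in the Gaussian-descriptor literature is to embed each Gaussian of pixel features into a symmetric positive definite matrix so that the mean and covariance are carried together in a single descriptor; normalization factors are omitted here, and the exact form used in the paper may differ.

```latex
\[
\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\;\longmapsto\;
P =
\begin{pmatrix}
\boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}^{\top} & \boldsymbol{\mu} \\
\boldsymbol{\mu}^{\top} & 1
\end{pmatrix},
\qquad P \in \mathbb{S}_{++}^{\,d+1}.
\]
```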
no code implementations • ICCV 2017 • Ryo Yonetani, Vishnu Naresh Boddeti, Kris M. Kitani, Yoichi Sato
We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data.
no code implementations • 15 Jun 2016 • Ryo Yonetani, Kris M. Kitani, Yoichi Sato
We envision a future time when wearable cameras are worn by the masses and recording first-person point-of-view videos of everyday life.
no code implementations • CVPR 2016 • Tetsu Matsukawa, Takahiro Okabe, Einoshin Suzuki, Yoichi Sato
In both steps, unlike the hierarchical covariance descriptor, the proposed descriptor can model both the mean and the covariance information of pixel features properly.
no code implementations • CVPR 2016 • Ying Fu, Yinqiang Zheng, Imari Sato, Yoichi Sato
In this paper, we propose an effective method for coded hyperspectral image restoration, which exploits extensive structure sparsity in the hyperspectral image.
1 code implementation • CVPR 2016 • Tatsunori Taniai, Sudipta N. Sinha, Yoichi Sato
We propose a new technique to jointly recover cosegmentation and dense per-pixel correspondence in two images.
no code implementations • CVPR 2016 • Ryo Yonetani, Kris M. Kitani, Yoichi Sato
We aim to understand the dynamics of social interactions between two people by recognizing their actions and reactions using a head-mounted camera.
2 code implementations • 28 Mar 2016 • Tatsunori Taniai, Yasuyuki Matsushita, Yoichi Sato, Takeshi Naemura
The local expansion moves extend traditional expansion moves in two ways: localization and spatial propagation.
no code implementations • ICCV 2015 • Yinqiang Zheng, Ying Fu, Antony Lam, Imari Sato, Yoichi Sato
This paper introduces a novel method to separate fluorescent and reflective components in the spectral domain.
no code implementations • ICCV 2015 • Ying Fu, Antony Lam, Imari Sato, Yoichi Sato
Hyperspectral imaging is beneficial in a diverse range of applications, from diagnostic medicine to agriculture to surveillance, to name a few.
no code implementations • CVPR 2015 • Ryo Yonetani, Kris M. Kitani, Yoichi Sato
We incorporate this feature into our proposed approach that computes the motion correlation over supervoxel hierarchies to localize target instances in observer videos.
no code implementations • CVPR 2015 • Yinqiang Zheng, Imari Sato, Yoichi Sato
This paper addresses the illumination and reflectance spectra separation (IRSS) problem of a hyperspectral image captured under general spectral illumination.
no code implementations • CVPR 2015 • Feng Lu, Imari Sato, Yoichi Sato
This sort of symmetry can be observed in a 1D BRDF slice from a subset of surface normals with the same azimuth angle, and we use it to devise an efficient modeling and solution method to constrain and recover the elevation angles of surface normals accurately.
no code implementations • CVPR 2014 • Ying Fu, Antony Lam, Yasuyuki Kobashi, Imari Sato, Takahiro Okabe, Yoichi Sato
We then show that given the spectral reflectance and fluorescent chromaticity, the fluorescence absorption and emission spectra can also be estimated.
no code implementations • CVPR 2014 • Yusuke Sugano, Yasuyuki Matsushita, Yoichi Sato
Unlike existing appearance-based methods that assume person-specific training data, we use a large amount of cross-subject training data to train a 3D gaze estimator.
no code implementations • CVPR 2014 • Che-Han Chang, Yoichi Sato, Yung-Yu Chuang
It provides alignment accuracy as good as projective warps while preserving the perspective of individual images as similarity warps do.
no code implementations • CVPR 2013 • Feng Lu, Yasuyuki Matsushita, Imari Sato, Takahiro Okabe, Yoichi Sato
We propose an uncalibrated photometric stereo method that works with general and unknown isotropic reflectances.