no code implementations • 8 Oct 2024 • Paul Streli, Mark Richardson, Fadi Botros, Shugao Ma, Robert Wang, Christian Holz
Our method TouchInsight comprises a neural network to predict the moment of a touch event, the finger making contact, and the touch location.
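The snippet describes a multi-task network with three outputs. As a minimal sketch (not the paper's actual architecture; all layer sizes, the feature input, and the weight initialization are hypothetical), this could look like a shared trunk with three heads: a touch-event probability, a finger-identity distribution, and a 2D touch location:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

D_IN, D_H = 64, 32  # hypothetical feature / hidden sizes
W1, b1 = rng.normal(size=(D_IN, D_H)) * 0.1, np.zeros(D_H)
W_evt, b_evt = rng.normal(size=(D_H, 1)) * 0.1, np.zeros(1)    # touch-event head
W_fin, b_fin = rng.normal(size=(D_H, 10)) * 0.1, np.zeros(10)  # finger-id head (10 fingers)
W_loc, b_loc = rng.normal(size=(D_H, 2)) * 0.1, np.zeros(2)    # 2D location head

def touch_heads(features):
    h = relu(features @ W1 + b1)          # shared trunk
    p_event = sigmoid(h @ W_evt + b_evt)  # probability a touch occurs now
    p_finger = softmax(h @ W_fin + b_fin) # which finger made contact
    xy = h @ W_loc + b_loc                # predicted touch location
    return p_event, p_finger, xy

x = rng.normal(size=(1, D_IN))            # one frame of hand-tracking features
p_event, p_finger, xy = touch_heads(x)
```

A multi-head design like this lets one backbone be trained jointly on the three prediction tasks the snippet names.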
1 code implementation • CVPR 2024 • Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma
Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition.
no code implementations • 26 Mar 2024 • Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin
For the textual guidance, we contribute comprehensive text descriptions to the GRAB dataset and show that they enable our method to have more fine-grained control over hand-object interactions.
1 code implementation • 14 Mar 2024 • Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao
3D hand pose is an underexplored modality for action recognition.
Ranked #1 on Action Recognition on H2O (2 Hands and Objects)
no code implementations • 12 Mar 2024 • Qihang Fang, Chengcheng Tang, Shugao Ma, Yanchao Yang
Skeleton-based motion representations are robust for action localization and understanding, owing to their invariance to perspective, lighting, and occlusion compared with images.
1 code implementation • NeurIPS 2023 • Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao
Given a set of verbs and objects observed during training, the goal is to generalize the verbs to an open vocabulary of actions with seen and novel objects.
Ranked #1 on Open Vocabulary Action Recognition on Assembly101 (using extra training data)
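Generalizing seen verbs to novel objects is often done by composing separate verb and object embeddings and scoring them against a video embedding. The sketch below illustrates that compositional idea only; the embedding source, dimensions, and vocabulary are hypothetical, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 16  # embedding dimension (illustrative)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical embeddings, e.g. from a pretrained vision-language model.
verb_emb = {v: normalize(rng.normal(size=D)) for v in ["pick", "screw", "rotate"]}
obj_emb = {o: normalize(rng.normal(size=D)) for o in ["bolt", "cover", "novel_part"]}

def score_action(video_emb, verb, obj):
    """Cosine similarity between the video embedding and a composed
    verb+object embedding, enabling scoring of unseen combinations."""
    action = normalize(verb_emb[verb] + obj_emb[obj])
    return float(normalize(video_emb) @ action)

video = rng.normal(size=D)  # hypothetical video embedding
candidates = [(v, o) for v in verb_emb for o in obj_emb]
best = max(candidates, key=lambda vo: score_action(video, *vo))
```

Because verbs and objects are embedded independently, any verb can be paired with a novel object at test time without retraining.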
no code implementations • 31 Jul 2023 • Guodong Ding, Fadime Sener, Shugao Ma, Angela Yao
Our framework constructs a knowledge base with spatial and temporal beliefs based on observed mistakes.
1 code implementation • ICCV 2023 • Shubhra Aich, Jesus Ruiz-Santaquiteria, Zhenyu Lu, Prachi Garg, K J Joseph, Alvaro Fernandez Garcia, Vineeth N Balasubramanian, Kenrick Kin, Chengde Wan, Necati Cihan Camgoz, Shugao Ma, Fernando de la Torre
Our sampling scheme significantly outperforms SOTA methods on two 3D skeleton gesture datasets: the publicly available SHREC 2017, and EgoGesture3D, which we extract from a publicly available RGBD dataset.
no code implementations • 15 Mar 2022 • Emre Aksan, Shugao Ma, Akin Caliskan, Stanislav Pidhorskyi, Alexander Richard, Shih-En Wei, Jason Saragih, Otmar Hilliges
To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.
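A normalizing flow tying a conditional prior to a latent space relies on an invertible, conditioned transform. As a minimal sketch of one conditional affine flow step (a single coupling-free affine layer for illustration; the conditioner, dimensions, and weights are hypothetical, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4  # latent dimensionality (illustrative)

# Conditioner: maps runtime inputs c to per-dimension scale and shift.
Wc = rng.normal(size=(3, 2 * D)) * 0.1

def conditional_affine_flow(x, c):
    """One affine flow step z = exp(s(c)) * x + t(c), invertible in x."""
    params = c @ Wc
    s, t = params[:, :D], params[:, D:]
    z = np.exp(s) * x + t
    log_det = s.sum(axis=1)  # log |dz/dx|, needed for exact likelihoods
    return z, log_det

def inverse_flow(z, c):
    params = c @ Wc
    s, t = params[:, :D], params[:, D:]
    return (z - t) * np.exp(-s)

x = rng.normal(size=(2, D))  # face-model latent codes
c = rng.normal(size=(2, 3))  # runtime (e.g. headset sensor) inputs
z, log_det = conditional_affine_flow(x, c)
x_rec = inverse_flow(z, c)
```

Invertibility is what lets the prior space and the 3D face model's latent space be tied together exactly rather than approximately.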
1 code implementation • CVPR 2021 • Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando de la Torre, Yaser Sheikh
Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances.
no code implementations • 8 Mar 2021 • Xiaofan Zhang, Dawei Wang, Pierce Chuang, Shugao Ma, Deming Chen, Yuecheng Li
Creating virtual avatars with realistic rendering is one of the most essential and challenging tasks to provide highly immersive virtual reality (VR) experiences.
no code implementations • ECCV 2020 • Hang Chu, Shugao Ma, Fernando de la Torre, Sanja Fidler, Yaser Sheikh
It is important to note that traditional person-specific CAs are learned from few training samples, and typically lack robustness and exhibit limited expressiveness when transferring facial expressions.
no code implementations • 11 Aug 2020 • Alexander Richard, Colin Lea, Shugao Ma, Juergen Gall, Fernando de la Torre, Yaser Sheikh
Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i.e., for virtual reality), and are almost indistinguishable from video.
3 code implementations • 5 Oct 2019 • Chaitanya Ahuja, Shugao Ma, Louis-Philippe Morency, Yaser Sheikh
In this paper, we introduce a neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose conditioned on audio and body pose of the interlocutor and audio of the human operating the avatar.
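The core idea of integrating monadic and dyadic dynamics via selective attention can be sketched as a learned gate that blends the avatar's own features with the interlocutor's before predicting the next pose. This is a toy illustration with hypothetical dimensions and weights, not the DRAM architecture itself:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 8  # feature dimension (illustrative)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

W_attn = rng.normal(size=(2 * D, 2)) * 0.1  # gate over {self, other}
W_out = rng.normal(size=(D, D)) * 0.1       # pose-residual projection

def dyadic_step(self_feat, other_feat):
    """Selectively blend monadic (self) and dyadic (interlocutor) streams."""
    gate = softmax(np.concatenate([self_feat, other_feat], axis=-1) @ W_attn)
    blended = gate[:, :1] * self_feat + gate[:, 1:] * other_feat
    return blended @ W_out  # residual update to the next body pose

self_feat = rng.normal(size=(1, D))   # avatar operator's audio+pose features
other_feat = rng.normal(size=(1, D))  # interlocutor's audio+pose features
residual = dyadic_step(self_feat, other_feat)
```

The gate makes the trade-off between intrapersonal and interpersonal cues explicit and input-dependent, which is the "selective attention" the snippet describes.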
1 code implementation • ECCV 2018 • Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh
We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if contents of John Oliver's speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert's style.
no code implementations • CVPR 2015 • Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech
We study the problem of Salient Object Subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues.
no code implementations • CVPR 2016 • Shugao Ma, Leonid Sigal, Stan Sclaroff
In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection.
no code implementations • 22 Dec 2015 • Shugao Ma, Sarah Adel Bargal, Jianming Zhang, Leonid Sigal, Stan Sclaroff
In contrast, collecting action images from the Web is much easier and training on images requires much less computation.
Ranked #14 on Action Recognition on ActivityNet (using extra training data)
no code implementations • CVPR 2015 • Shugao Ma, Leonid Sigal, Stan Sclaroff
Using the action vocabulary we then utilize tree mining with subsequent tree clustering and ranking to select a compact set of highly discriminative tree patterns.