Search Results for author: Shugao Ma

Found 18 papers, 7 papers with code

X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization

1 code implementation 28 Mar 2024 Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma

Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition.

Video Classification · Zero-Shot Learning

DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions

no code implementations 26 Mar 2024 Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin

In the grasping stage, the model only generates hand motions, whereas in the interaction phase both hand and object poses are synthesized.

Object
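For illustration only, here is a minimal Python/PyTorch sketch of the two-stage idea described in the DiffH2O snippet above: a grasping stage that denoises hand motion alone, followed by an interaction stage that synthesizes hand and object poses jointly. The toy MLP denoiser, the crude sampling loop, and all dimensions are placeholder assumptions, not the authors' model.

import torch
import torch.nn as nn

HAND_DIM, OBJ_DIM, T_FRAMES, STEPS = 48, 7, 60, 50

class Denoiser(nn.Module):
    """Toy MLP that predicts the noise to remove at each diffusion step."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.ReLU(), nn.Linear(256, dim))
    def forward(self, x, t):
        # x: (T, dim) noisy poses; t: scalar step index, normalized and appended per frame
        t_col = torch.full((x.shape[0], 1), float(t) / STEPS)
        return self.net(torch.cat([x, t_col], dim=-1))

def sample(denoiser, dim, steps=STEPS):
    """Very simplified reverse-diffusion loop (illustrative only)."""
    x = torch.randn(T_FRAMES, dim)
    for t in reversed(range(steps)):
        x = x - denoiser(x, t) / steps  # crude denoising update
    return x

# Stage 1 (grasping): only hand motion is generated.
hand_motion = sample(Denoiser(HAND_DIM), HAND_DIM)
# Stage 2 (interaction): hand and object poses are synthesized jointly.
hand_and_object = sample(Denoiser(HAND_DIM + OBJ_DIM), HAND_DIM + OBJ_DIM)
print(hand_motion.shape, hand_and_object.shape)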

BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training

no code implementations 12 Mar 2024 Qihang Fang, Chengcheng Tang, Shugao Ma, Yanchao Yang

Skeleton-based motion representations are robust for action localization and understanding, owing to their invariance to perspective, lighting, and occlusion compared with images.

Temporal Action Localization · Unsupervised Pre-training

Opening the Vocabulary of Egocentric Actions

1 code implementation NeurIPS 2023 Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao

Given a set of verbs and objects observed during training, the goal is to generalize the verbs to an open vocabulary of actions with seen and novel objects.

Ranked #1 on Open Vocabulary Action Recognition on Assembly101 (using extra training data)

Object · Open Vocabulary Action Recognition

Every Mistake Counts in Assembly

no code implementations 31 Jul 2023 Guodong Ding, Fadime Sener, Shugao Ma, Angela Yao

Our framework constructs a knowledge base with spatial and temporal beliefs based on observed mistakes.
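As an illustrative (hypothetical) reading of the snippet above, the sketch below accumulates spatial and temporal "beliefs" from observed mistakes into a small knowledge base and scores a new assembly sequence against them. The data structures and the additive risk score are assumptions made for illustration, not the paper's actual formulation.

from collections import defaultdict

class MistakeKnowledgeBase:
    def __init__(self):
        self.temporal = defaultdict(int)  # (earlier_step, later_step) -> mistake count
        self.spatial = defaultdict(int)   # (part, location) -> mistake count

    def observe_mistake(self, ordered_steps, placements):
        """Record one mistaken assembly: its step order and part placements."""
        for i, a in enumerate(ordered_steps):
            for b in ordered_steps[i + 1:]:
                self.temporal[(a, b)] += 1
        for part, loc in placements.items():
            self.spatial[(part, loc)] += 1

    def risk(self, ordered_steps, placements):
        """Simple additive risk score for a candidate sequence."""
        s = sum(self.temporal[(a, b)]
                for i, a in enumerate(ordered_steps) for b in ordered_steps[i + 1:])
        s += sum(self.spatial[(p, l)] for p, l in placements.items())
        return s

kb = MistakeKnowledgeBase()
kb.observe_mistake(["attach_leg", "flip_table"], {"leg": "front_left"})
print(kb.risk(["attach_leg", "flip_table"], {"leg": "front_left"}))  # higher score = riskier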

Data-Free Class-Incremental Hand Gesture Recognition

1 code implementation ICCV 2023 Shubhra Aich, Jesus Ruiz-Santaquiteria, Zhenyu Lu, Prachi Garg, K J Joseph, Alvaro Fernandez Garcia, Vineeth N Balasubramanian, Kenrick Kin, Chengde Wan, Necati Cihan Camgoz, Shugao Ma, Fernando de la Torre

Our sampling scheme outperforms SOTA methods significantly on two 3D skeleton gesture datasets: the publicly available SHREC 2017 and EgoGesture3D, which we extract from a publicly available RGBD dataset.

Class Incremental Learning · Hand Gesture Recognition +3

LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space

no code implementations 15 Mar 2022 Emre Aksan, Shugao Ma, Akin Caliskan, Stanislav Pidhorskyi, Alexander Richard, Shih-En Wei, Jason Saragih, Otmar Hilliges

To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.

Face Model
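The sketch below is a hedged illustration of the LiP-Flow sentence above: a prior network conditioned on runtime inputs outputs a distribution, and a single invertible affine layer stands in for the normalizing flow that ties this prior space to the face model's latent space. All module names and dimensions are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

LATENT, RUNTIME_IN = 128, 32

class ConditionalPrior(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(RUNTIME_IN, 2 * LATENT)  # predicts mean and log-variance
    def forward(self, runtime_obs):
        mu, logvar = self.net(runtime_obs).chunk(2, dim=-1)
        return mu, logvar

class AffineFlow(nn.Module):
    """One invertible affine layer standing in for a normalizing flow."""
    def __init__(self):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(LATENT))
        self.shift = nn.Parameter(torch.zeros(LATENT))
    def forward(self, z_prior):   # prior space -> face-model latent space
        return z_prior * self.log_scale.exp() + self.shift
    def inverse(self, z_face):    # face-model latent space -> prior space
        return (z_face - self.shift) * (-self.log_scale).exp()

prior, flow = ConditionalPrior(), AffineFlow()
runtime_obs = torch.randn(1, RUNTIME_IN)               # e.g., headset sensor features
mu, logvar = prior(runtime_obs)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # sample from the conditional prior
face_latent = flow(z)                                  # decode-ready face-model latent
print(face_latent.shape)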

Pixel Codec Avatars

1 code implementation CVPR 2021 Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando de la Torre, Yaser Sheikh

Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances.

F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding

no code implementations 8 Mar 2021 Xiaofan Zhang, Dawei Wang, Pierce Chuang, Shugao Ma, Deming Chen, Yuecheng Li

Creating virtual avatars with realistic rendering is one of the most essential and challenging tasks to provide highly immersive virtual reality (VR) experiences.

Expressive Telepresence via Modular Codec Avatars

no code implementations ECCV 2020 Hang Chu, Shugao Ma, Fernando de la Torre, Sanja Fidler, Yaser Sheikh

It is important to note that traditional person-specific CAs are learned from few training samples, and typically lack robustness and have limited expressiveness when transferring facial expressions.

Audio- and Gaze-driven Facial Animation of Codec Avatars

no code implementations 11 Aug 2020 Alexander Richard, Colin Lea, Shugao Ma, Juergen Gall, Fernando de la Torre, Yaser Sheikh

Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i.e., for virtual reality), and are almost indistinguishable from video.

To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations

3 code implementations 5 Oct 2019 Chaitanya Ahuja, Shugao Ma, Louis-Philippe Morency, Yaser Sheikh

In this paper, we introduce a neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose conditioned on audio and body pose of the interlocutor and audio of the human operating the avatar.
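A minimal, hypothetical sketch of that selective-attention idea: one recurrent stream encodes the operator's own (monadic) audio and pose history, another encodes the interlocutor's (dyadic) audio and pose, and a learned gate mixes them before predicting the avatar's next body pose. The module sizes and the simple gating are placeholders, not the DRAM architecture itself.

import torch
import torch.nn as nn

POSE, AUDIO, HID = 42, 26, 128

class SelectiveAttentionPose(nn.Module):
    def __init__(self):
        super().__init__()
        self.monadic = nn.GRU(POSE + AUDIO, HID, batch_first=True)  # operator's own history
        self.dyadic = nn.GRU(POSE + AUDIO, HID, batch_first=True)   # interlocutor's stream
        self.gate = nn.Sequential(nn.Linear(2 * HID, 1), nn.Sigmoid())
        self.head = nn.Linear(HID, POSE)

    def forward(self, own_seq, partner_seq):
        _, h_m = self.monadic(own_seq)      # final hidden state: (1, B, HID)
        _, h_d = self.dyadic(partner_seq)
        h_m, h_d = h_m[0], h_d[0]
        a = self.gate(torch.cat([h_m, h_d], dim=-1))  # selective attention weight
        h = a * h_m + (1 - a) * h_d
        return self.head(h)                 # next body pose for the avatar

model = SelectiveAttentionPose()
own = torch.randn(1, 30, POSE + AUDIO)      # 30 frames of the operator's pose + audio
partner = torch.randn(1, 30, POSE + AUDIO)  # 30 frames of the interlocutor
print(model(own, partner).shape)            # torch.Size([1, 42])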

Recycle-GAN: Unsupervised Video Retargeting

1 code implementation ECCV 2018 Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh

We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if contents of John Oliver's speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert's style.

Face to Face Translation · Translation +1

Salient Object Subitizing

no code implementations CVPR 2015 Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech

We study the problem of Salient Object Subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues.

Image Retrieval · Object +4
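For illustration, the subitizing formulation above can be read as holistic image classification over count bins {0, 1, 2, 3, 4+}; the tiny CNN below is a placeholder sketch under that assumption, not the paper's model.

import torch
import torch.nn as nn

COUNT_BINS = 5  # 0, 1, 2, 3, and "4 or more" salient objects

subitizer = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, COUNT_BINS),
)

image = torch.randn(1, 3, 224, 224)                 # whole image: holistic cues only
probs = subitizer(image).softmax(dim=-1)
print("predicted salient-object count bin:", probs.argmax(dim=-1).item())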

Learning Activity Progression in LSTMs for Activity Detection and Early Detection

no code implementations CVPR 2016 Shugao Ma, Leonid Sigal, Stan Sclaroff

In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection.

Action Detection · Activity Detection +1

Space-Time Tree Ensemble for Action Recognition

no code implementations CVPR 2015 Shugao Ma, Leonid Sigal, Stan Sclaroff

Using the action vocabulary, we then apply tree mining with subsequent tree clustering and ranking to select a compact set of highly discriminative tree patterns.

Action Recognition · Clustering +1
