Search Results for author: Shugao Ma

Found 18 papers, 7 papers with code

X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization

1 code implementation 28 Mar 2024 Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma

Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition.

Video Classification · Zero-Shot Learning

DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions

no code implementations 26 Mar 2024 Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin

In the grasping stage, the model only generates hand motions, whereas in the interaction phase both hand and object poses are synthesized.

Object
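For illustration only, here is a minimal Python/PyTorch sketch of the two-stage idea described in the DiffH2O snippet above: a grasping stage that denoises hand motion alone, followed by an interaction stage that synthesizes hand and object poses jointly. The toy MLP denoiser, the crude sampling loop, and all dimensions are placeholder assumptions, not the authors' model.

import torch
import torch.nn as nn

HAND_DIM, OBJ_DIM, T_FRAMES, STEPS = 48, 7, 60, 50

class Denoiser(nn.Module):
    """Toy MLP that predicts the noise to remove at each diffusion step."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.ReLU(), nn.Linear(256, dim))
    def forward(self, x, t):
        # x: (T, dim) noisy poses; t: scalar step index, normalized and appended per frame
        t_col = torch.full((x.shape[0], 1), float(t) / STEPS)
        return self.net(torch.cat([x, t_col], dim=-1))

def sample(denoiser, dim, steps=STEPS):
    """Very simplified reverse-diffusion loop (illustrative only)."""
    x = torch.randn(T_FRAMES, dim)
    for t in reversed(range(steps)):
        x = x - denoiser(x, t) / steps  # crude denoising update
    return x

# Stage 1 (grasping): only hand motion is generated.
hand_motion = sample(Denoiser(HAND_DIM), HAND_DIM)
# Stage 2 (interaction): hand and object poses are synthesized jointly.
hand_and_object = sample(Denoiser(HAND_DIM + OBJ_DIM), HAND_DIM + OBJ_DIM)
print(hand_motion.shape, hand_and_object.shape)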

BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training

no code implementations 12 Mar 2024 Qihang Fang, Chengcheng Tang, Shugao Ma, Yanchao Yang

Skeleton-based motion representations are robust for action localization and understanding, owing to their invariance to perspective, lighting, and occlusion compared with images.

Temporal Action Localization · Unsupervised Pre-training

Opening the Vocabulary of Egocentric Actions

1 code implementation NeurIPS 2023 Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao

Given a set of verbs and objects observed during training, the goal is to generalize the verbs to an open vocabulary of actions with seen and novel objects.

Ranked #1 on Open Vocabulary Action Recognition on Assembly101 (using extra training data)

Object · Open Vocabulary Action Recognition

Every Mistake Counts in Assembly

no code implementations 31 Jul 2023 Guodong Ding, Fadime Sener, Shugao Ma, Angela Yao

Our framework constructs a knowledge base with spatial and temporal beliefs based on observed mistakes.
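As an illustrative (hypothetical) reading of the snippet above, the sketch below accumulates spatial and temporal "beliefs" from observed mistakes into a small knowledge base and scores a new assembly sequence against them. The data structures and the additive risk score are assumptions made for illustration, not the paper's actual formulation.

from collections import defaultdict

class MistakeKnowledgeBase:
    def __init__(self):
        self.temporal = defaultdict(int)  # (earlier_step, later_step) -> mistake count
        self.spatial = defaultdict(int)   # (part, location) -> mistake count

    def observe_mistake(self, ordered_steps, placements):
        """Record one mistaken assembly: its step order and part placements."""
        for i, a in enumerate(ordered_steps):
            for b in ordered_steps[i + 1:]:
                self.temporal[(a, b)] += 1
        for part, loc in placements.items():
            self.spatial[(part, loc)] += 1

    def risk(self, ordered_steps, placements):
        """Simple additive risk score for a candidate sequence."""
        s = sum(self.temporal[(a, b)]
                for i, a in enumerate(ordered_steps) for b in ordered_steps[i + 1:])
        s += sum(self.spatial[(p, l)] for p, l in placements.items())
        return s

kb = MistakeKnowledgeBase()
kb.observe_mistake(["attach_leg", "flip_table"], {"leg": "front_left"})
print(kb.risk(["attach_leg", "flip_table"], {"leg": "front_left"}))  # higher score = riskier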

Data-Free Class-Incremental Hand Gesture Recognition

1 code implementation ICCV 2023 Shubhra Aich, Jesus Ruiz-Santaquiteria, Zhenyu Lu, Prachi Garg, K J Joseph, Alvaro Fernandez Garcia, Vineeth N Balasubramanian, Kenrick Kin, Chengde Wan, Necati Cihan Camgoz, Shugao Ma, Fernando de la Torre

Our sampling scheme outperforms SOTA methods significantly on two 3D skeleton gesture datasets: the publicly available SHREC 2017 and EgoGesture3D, which we extract from a publicly available RGBD dataset.

Class Incremental Learning · Hand Gesture Recognition +3

LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space

no code implementations 15 Mar 2022 Emre Aksan, Shugao Ma, Akin Caliskan, Stanislav Pidhorskyi, Alexander Richard, Shih-En Wei, Jason Saragih, Otmar Hilliges

To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.

Face Model
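The sketch below is a hedged illustration of the LiP-Flow sentence above: a prior network conditioned on runtime inputs outputs a distribution, and a single invertible affine layer stands in for the normalizing flow that ties this prior space to the face model's latent space. All module names and dimensions are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

LATENT, RUNTIME_IN = 128, 32

class ConditionalPrior(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(RUNTIME_IN, 2 * LATENT)  # predicts mean and log-variance
    def forward(self, runtime_obs):
        mu, logvar = self.net(runtime_obs).chunk(2, dim=-1)
        return mu, logvar

class AffineFlow(nn.Module):
    """One invertible affine layer standing in for a normalizing flow."""
    def __init__(self):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(LATENT))
        self.shift = nn.Parameter(torch.zeros(LATENT))
    def forward(self, z_prior):   # prior space -> face-model latent space
        return z_prior * self.log_scale.exp() + self.shift
    def inverse(self, z_face):    # face-model latent space -> prior space
        return (z_face - self.shift) * (-self.log_scale).exp()

prior, flow = ConditionalPrior(), AffineFlow()
runtime_obs = torch.randn(1, RUNTIME_IN)               # e.g., headset sensor features
mu, logvar = prior(runtime_obs)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # sample from the conditional prior
face_latent = flow(z)                                  # decode-ready face-model latent
print(face_latent.shape)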

Pixel Codec Avatars

1 code implementation CVPR 2021 Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando de la Torre, Yaser Sheikh

Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances.

F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding

no code implementations 8 Mar 2021 Xiaofan Zhang, Dawei Wang, Pierce Chuang, Shugao Ma, Deming Chen, Yuecheng Li

Creating virtual avatars with realistic rendering is one of the most essential and challenging tasks to provide highly immersive virtual reality (VR) experiences.

Expressive Telepresence via Modular Codec Avatars

no code implementations ECCV 2020 Hang Chu, Shugao Ma, Fernando de la Torre, Sanja Fidler, Yaser Sheikh

It is important to note that traditional person-specific CAs are learned from few training samples, and typically lack robustness and have limited expressiveness when transferring facial expressions.

Audio- and Gaze-driven Facial Animation of Codec Avatars

no code implementations 11 Aug 2020 Alexander Richard, Colin Lea, Shugao Ma, Juergen Gall, Fernando de la Torre, Yaser Sheikh

Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i.e., for virtual reality), and are almost indistinguishable from video.

To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations

3 code implementations 5 Oct 2019 Chaitanya Ahuja, Shugao Ma, Louis-Philippe Morency, Yaser Sheikh

In this paper, we introduce a neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose conditioned on audio and body pose of the interlocutor and audio of the human operating the avatar.
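A minimal, hypothetical sketch of that selective-attention idea: one recurrent stream encodes the operator's own (monadic) audio and pose history, another encodes the interlocutor's (dyadic) audio and pose, and a learned gate mixes them before predicting the avatar's next body pose. The module sizes and the simple gating are placeholders, not the DRAM architecture itself.

import torch
import torch.nn as nn

POSE, AUDIO, HID = 42, 26, 128

class SelectiveAttentionPose(nn.Module):
    def __init__(self):
        super().__init__()
        self.monadic = nn.GRU(POSE + AUDIO, HID, batch_first=True)  # operator's own history
        self.dyadic = nn.GRU(POSE + AUDIO, HID, batch_first=True)   # interlocutor's stream
        self.gate = nn.Sequential(nn.Linear(2 * HID, 1), nn.Sigmoid())
        self.head = nn.Linear(HID, POSE)

    def forward(self, own_seq, partner_seq):
        _, h_m = self.monadic(own_seq)      # final hidden state: (1, B, HID)
        _, h_d = self.dyadic(partner_seq)
        h_m, h_d = h_m[0], h_d[0]
        a = self.gate(torch.cat([h_m, h_d], dim=-1))  # selective attention weight
        h = a * h_m + (1 - a) * h_d
        return self.head(h)                 # next body pose for the avatar

model = SelectiveAttentionPose()
own = torch.randn(1, 30, POSE + AUDIO)      # 30 frames of the operator's pose + audio
partner = torch.randn(1, 30, POSE + AUDIO)  # 30 frames of the interlocutor
print(model(own, partner).shape)            # torch.Size([1, 42])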

Recycle-GAN: Unsupervised Video Retargeting

1 code implementation ECCV 2018 Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh

We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if contents of John Oliver's speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert's style.

Face to Face Translation · Translation +1

Salient Object Subitizing

no code implementations CVPR 2015 Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech

We study the problem of Salient Object Subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues.

Image Retrieval · Object +4
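For illustration, the subitizing formulation above can be read as holistic image classification over count bins {0, 1, 2, 3, 4+}; the tiny CNN below is a placeholder sketch under that assumption, not the paper's model.

import torch
import torch.nn as nn

COUNT_BINS = 5  # 0, 1, 2, 3, and "4 or more" salient objects

subitizer = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, COUNT_BINS),
)

image = torch.randn(1, 3, 224, 224)                 # whole image: holistic cues only
probs = subitizer(image).softmax(dim=-1)
print("predicted salient-object count bin:", probs.argmax(dim=-1).item())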

Learning Activity Progression in LSTMs for Activity Detection and Early Detection

no code implementations CVPR 2016 Shugao Ma, Leonid Sigal, Stan Sclaroff

In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection.

Action Detection · Activity Detection +1

Space-Time Tree Ensemble for Action Recognition

no code implementations CVPR 2015 Shugao Ma, Leonid Sigal, Stan Sclaroff

Using the action vocabulary, we then apply tree mining with subsequent tree clustering and ranking to select a compact set of highly discriminative tree patterns.

Action Recognition · Clustering +1
