Search Results for author: Matthias Grundmann

Found 25 papers, 7 papers with code

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

no code implementations • 13 Feb 2024 • Fei Deng, Qifei Wang, Wei Wei, Matthias Grundmann, Tingbo Hou

In the vision domain, existing RL-based reward finetuning methods are limited by their instability in large-scale training, rendering them incapable of generalizing to complex, unseen prompts.

Denoising • Reinforcement Learning (RL)

Binaural Angular Separation Network

no code implementations • 16 Jan 2024 • Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann

We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones.

StreamVC: Real-Time Low-Latency Voice Conversion

no code implementations • 5 Jan 2024 • Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann

We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech.

Speech Synthesis • Voice Conversion

On-device Real-time Custom Hand Gesture Recognition

no code implementations • 19 Sep 2023 • Esha Uboweja, David Tian, Qifei Wang, Yi-Chun Kuo, Joe Zou, Lu Wang, George Sung, Matthias Grundmann

Our framework provides a pre-trained single-hand embedding model that can be fine-tuned for custom gesture recognition.

Hand Gesture Recognition • Hand-Gesture Recognition

Blendshapes GHUM: Real-time Monocular Facial Blendshape Prediction

no code implementations • 11 Sep 2023 • Ivan Grishchenko, Geng Yan, Eduard Gabriel Bazavan, Andrei Zanfir, Nikolai Chinaev, Karthik Raveendran, Matthias Grundmann, Cristian Sminchisescu

We present Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients at 30+ FPS on modern mobile phones, from a single monocular RGB image and enables facial motion capture applications like virtual avatars.

Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond

no code implementations • ICCV 2023 • Yang Zhao, Tingbo Hou, Yu-Chuan Su, Xuhui Jia, Yandong Li, Matthias Grundmann

An authentic face restoration system is increasingly in demand in many computer vision applications, e.g., image enhancement, video communication, and portrait photography.

Blind Face Restoration • Denoising • +2

Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

no code implementations • 21 Apr 2023 • Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann

The rapid development and application of foundation models have revolutionized the field of artificial intelligence.

Quantization

Guided Speech Enhancement Network

no code implementations • 13 Mar 2023 • Yang Yang, Shao-Fu Shih, Hakan Erdogan, Jamie Menjay Lin, Chehung Lee, Yunpeng Li, George Sung, Matthias Grundmann

The multi-microphone speech enhancement problem is often decomposed into two decoupled steps: a beamformer that provides spatial filtering and a single-channel speech enhancement model that cleans up the beamformer output.

Denoising • Speech Enhancement

Efficient Heterogeneous Video Segmentation at the Edge

no code implementations • 24 Aug 2022 • Jamie Menjay Lin, Siargey Pisarchyk, Juhyun Lee, David Tian, Tingbo Hou, Karthik Raveendran, Raman Sarokin, George Sung, Trent Tolley, Matthias Grundmann

We introduce an efficient video segmentation system for resource-limited edge devices leveraging heterogeneous compute.

Video Segmentation • Video Semantic Segmentation

BlazePose GHUM Holistic: Real-time 3D Human Landmarks and Pose Estimation

no code implementations • 23 Jun 2022 • Ivan Grishchenko, Valentin Bazarevsky, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, Richard Yee, Karthik Raveendran, Matsvei Zhdanovich, Matthias Grundmann, Cristian Sminchisescu

We present BlazePose GHUM Holistic, a lightweight neural network pipeline for 3D human body landmarks and pose estimation, specifically tailored to real-time on-device inference.

3D Human Pose Estimation

On-device Real-time Hand Gesture Recognition

no code implementations • 29 Oct 2021 • George Sung, Kanstantsin Sokal, Esha Uboweja, Valentin Bazarevsky, Jonathan Baccash, Eduard Gabriel Bazavan, Chuo-Ling Chang, Matthias Grundmann

We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera.

Hand Gesture Recognition • Hand-Gesture Recognition

On the Estimation of the Number of Unreachable Peers in the Bitcoin P2P Network by Observation of Peer Announcements

no code implementations • 25 Feb 2021 • Matthias Grundmann, Hedwig Amberg, Hannes Hartenstein

Thus, the number of unreachable peers can only be estimated based on some indicators.

Cryptography and Security • Networking and Internet Architecture

Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations

1 code implementation • CVPR 2021 • Adel Ahmadyan, Liangkai Zhang, Jianing Wei, Artsiom Ablavatski, Matthias Grundmann

3D object detection has recently become popular due to many applications in robotics, augmented reality, autonomy, and image retrieval.

Ranked #2 on Monocular 3D Object Detection on Google Objectron

3D Object Tracking • 3D Shape Representation • +6

2,205

Instant 3D Object Tracking with Applications in Augmented Reality

no code implementations • 23 Jun 2020 • Adel Ahmadyan, Tingbo Hou, Jianing Wei, Liangkai Zhang, Artsiom Ablavatski, Matthias Grundmann

Our tracker is capable of performing relative-scale 9-DoF tracking in real-time on mobile devices.

3D Object Tracking • Object Tracking

Attention Mesh: High-fidelity Face Mesh Prediction in Real-time

1 code implementation • 19 Jun 2020 • Ivan Grishchenko, Artsiom Ablavatski, Yury Kartynnik, Karthik Raveendran, Matthias Grundmann

We present Attention Mesh, a lightweight architecture for 3D face mesh prediction that uses attention to semantically meaningful regions.

Vocal Bursts Intensity Prediction

25,547

MediaPipe Hands: On-device Real-time Hand Tracking

4 code implementations • 18 Jun 2020 • Fan Zhang, Valentin Bazarevsky, Andrey Vakunov, Andrei Tkachenka, George Sung, Chuo-Ling Chang, Matthias Grundmann

We present a real-time on-device hand tracking pipeline that predicts a hand skeleton from a single RGB camera for AR/VR applications.

BlazePose: On-device Real-time Body Pose tracking

7 code implementations • 17 Jun 2020 • Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, Matthias Grundmann

We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices.

Ranked #1 on 3D Pose Estimation on Google-Yoga

2D Human Pose Estimation • 3D Human Pose Estimation • +4

25,547

MobilePose: Real-Time Pose Estimation for Unseen Objects with Weak Shape Supervision

no code implementations • 7 Mar 2020 • Tingbo Hou, Adel Ahmadyan, Liangkai Zhang, Jianing Wei, Matthias Grundmann

The former is used when there is only pose supervision, and the latter is for the case when shape supervision is available, even a weak one.

Ranked #3 on Monocular 3D Object Detection on Google Objectron

Monocular 3D Object Detection • Pose Estimation

Instant Motion Tracking and Its Applications to Augmented Reality

no code implementations • 16 Jul 2019 • Jianing Wei, Genzhi Ye, Tyler Mullen, Matthias Grundmann, Adel Ahmadyan, Tingbo Hou

Augmented Reality (AR) brings immersive experiences to users.

Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs

3 code implementations • 15 Jul 2019 • Yury Kartynnik, Artsiom Ablavatski, Ivan Grishchenko, Matthias Grundmann

The relatively dense mesh model of 468 vertices is well-suited for face-based AR effects.

Face Reconstruction

281

BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs

10 code implementations • 11 Jul 2019 • Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, Matthias Grundmann

We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference.

Face Detection

12,105

On-Device Neural Net Inference with Mobile GPUs

no code implementations • 3 Jul 2019 • Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann

On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy.

MediaPipe: A Framework for Building Perception Pipelines

2 code implementations • 14 Jun 2019 • Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, Wan-Teh Chang, Wei Hua, Manfred Georg, Matthias Grundmann

A developer can use MediaPipe to build prototypes by combining existing perception components, to advance them to polished cross-platform applications and measure system performance and resource consumption on target platforms.

Distributed, Parallel, and Cluster Computing

25,547

Finding Temporally Consistent Occlusion Boundaries in Videos using Geometric Context

no code implementations • 25 Oct 2015 • S. Hussain Raza, Ahmad Humayun, Matthias Grundmann, David Anderson, Irfan Essa

Our proposed framework provides an efficient approach for finding temporally consistent occlusion boundaries in video by utilizing causality, redundancy in videos, and semantic layout of the scene.

Geometric Context from Videos

no code implementations • CVPR 2013 • S. Hussain Raza, Matthias Grundmann, Irfan Essa

We present a novel algorithm for estimating the broad 3D geometric structure of outdoor video scenes.

Segmentation • Video Segmentation • +1
