no code implementations • 13 Feb 2024 • Fei Deng, Qifei Wang, Wei Wei, Matthias Grundmann, Tingbo Hou
However, in the vision domain, existing RL-based reward finetuning methods are limited by their instability in large-scale training, rendering them incapable of generalizing to complex, unseen prompts.
no code implementations • 16 Jan 2024 • Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann
We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones.
no code implementations • 5 Jan 2024 • Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann
We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech.
no code implementations • 19 Sep 2023 • Esha Uboweja, David Tian, Qifei Wang, Yi-Chun Kuo, Joe Zou, Lu Wang, George Sung, Matthias Grundmann
Our framework provides a pre-trained single-hand embedding model that can be fine-tuned for custom gesture recognition.
no code implementations • 11 Sep 2023 • Ivan Grishchenko, Geng Yan, Eduard Gabriel Bazavan, Andrei Zanfir, Nikolai Chinaev, Karthik Raveendran, Matthias Grundmann, Cristian Sminchisescu
We present Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients at 30+ FPS on modern mobile phones from a single monocular RGB image, enabling facial motion capture applications such as virtual avatars.
no code implementations • ICCV 2023 • Yang Zhao, Tingbo Hou, Yu-Chuan Su, Xuhui Jia, Yandong Li, Matthias Grundmann
Authentic face restoration systems are increasingly in demand in many computer vision applications, e.g., image enhancement, video communication, and portrait photography.
no code implementations • 21 Apr 2023 • Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann
The rapid development and application of foundation models have revolutionized the field of artificial intelligence.
no code implementations • 13 Mar 2023 • Yang Yang, Shao-Fu Shih, Hakan Erdogan, Jamie Menjay Lin, Chehung Lee, Yunpeng Li, George Sung, Matthias Grundmann
The multi-microphone speech enhancement problem is often decomposed into two decoupled steps: a beamformer that provides spatial filtering and a single-channel speech enhancement model that cleans up the beamformer output.
no code implementations • 24 Aug 2022 • Jamie Menjay Lin, Siargey Pisarchyk, Juhyun Lee, David Tian, Tingbo Hou, Karthik Raveendran, Raman Sarokin, George Sung, Trent Tolley, Matthias Grundmann
We introduce an efficient video segmentation system for resource-limited edge devices leveraging heterogeneous compute.
no code implementations • 23 Jun 2022 • Ivan Grishchenko, Valentin Bazarevsky, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, Richard Yee, Karthik Raveendran, Matsvei Zhdanovich, Matthias Grundmann, Cristian Sminchisescu
We present BlazePose GHUM Holistic, a lightweight neural network pipeline for 3D human body landmarks and pose estimation, specifically tailored to real-time on-device inference.
no code implementations • 29 Oct 2021 • George Sung, Kanstantsin Sokal, Esha Uboweja, Valentin Bazarevsky, Jonathan Baccash, Eduard Gabriel Bazavan, Chuo-Ling Chang, Matthias Grundmann
We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera.
no code implementations • 25 Feb 2021 • Matthias Grundmann, Hedwig Amberg, Hannes Hartenstein
Thus, the number of unreachable peers can only be estimated based on some indicators.
Cryptography and Security • Networking and Internet Architecture
1 code implementation • CVPR 2021 • Adel Ahmadyan, Liangkai Zhang, Jianing Wei, Artsiom Ablavatski, Matthias Grundmann
3D object detection has recently become popular due to many applications in robotics, augmented reality, autonomy, and image retrieval.
Ranked #2 on Monocular 3D Object Detection on Google Objectron
no code implementations • 23 Jun 2020 • Adel Ahmadyan, Tingbo Hou, Jianing Wei, Liangkai Zhang, Artsiom Ablavatski, Matthias Grundmann
Our tracker is capable of performing relative-scale 9-DoF tracking in real-time on mobile devices.
1 code implementation • 19 Jun 2020 • Ivan Grishchenko, Artsiom Ablavatski, Yury Kartynnik, Karthik Raveendran, Matthias Grundmann
We present Attention Mesh, a lightweight architecture for 3D face mesh prediction that uses attention to semantically meaningful regions.
4 code implementations • 18 Jun 2020 • Fan Zhang, Valentin Bazarevsky, Andrey Vakunov, Andrei Tkachenka, George Sung, Chuo-Ling Chang, Matthias Grundmann
We present a real-time on-device hand tracking pipeline that predicts a hand skeleton from a single RGB camera for AR/VR applications.
7 code implementations • 17 Jun 2020 • Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, Matthias Grundmann
We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices.
Ranked #1 on 3D Pose Estimation on Google-Yoga
no code implementations • 7 Mar 2020 • Tingbo Hou, Adel Ahmadyan, Liangkai Zhang, Jianing Wei, Matthias Grundmann
The former is used when there is only pose supervision, and the latter is for the case when shape supervision is available, even a weak one.
Ranked #3 on Monocular 3D Object Detection on Google Objectron
no code implementations • 16 Jul 2019 • Jianing Wei, Genzhi Ye, Tyler Mullen, Matthias Grundmann, Adel Ahmadyan, Tingbo Hou
Augmented Reality (AR) brings immersive experiences to users.
3 code implementations • 15 Jul 2019 • Yury Kartynnik, Artsiom Ablavatski, Ivan Grishchenko, Matthias Grundmann
The relatively dense mesh model of 468 vertices is well-suited for face-based AR effects.
10 code implementations • 11 Jul 2019 • Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, Matthias Grundmann
We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference.
no code implementations • 3 Jul 2019 • Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann
On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy.
2 code implementations • 14 Jun 2019 • Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, Wan-Teh Chang, Wei Hua, Manfred Georg, Matthias Grundmann
A developer can use MediaPipe to build prototypes by combining existing perception components, to advance them to polished cross-platform applications and measure system performance and resource consumption on target platforms.
Distributed, Parallel, and Cluster Computing
no code implementations • 25 Oct 2015 • S. Hussain Raza, Ahmad Humayun, Matthias Grundmann, David Anderson, Irfan Essa
Our proposed framework provides an efficient approach for finding temporally consistent occlusion boundaries in video by utilizing causality, redundancy in videos, and semantic layout of the scene.
no code implementations • CVPR 2013 • S. Hussain Raza, Matthias Grundmann, Irfan Essa
We present a novel algorithm for estimating the broad 3D geometric structure of outdoor video scenes.