2 code implementations • ICML 2020 • Jie Xu, Yunsheng Tian, Pingchuan Ma, Daniela Rus, Shinjiro Sueda, Wojciech Matusik
Many real-world control problems involve conflicting objectives, for which we desire a dense, high-quality set of control policies, each optimal for a different preference over the objectives (i.e., Pareto-optimal).
Multi-Objective Reinforcement Learning, Reinforcement Learning
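To make "objective preference" concrete, a common device (an illustration, not necessarily this paper's formulation) is linear scalarization: a weight vector collapses the per-objective returns into a single score, and sweeping the weights targets different Pareto-optimal policies.

```python
import numpy as np

def scalarized_return(returns, preference):
    # returns: vector of per-objective returns; preference: non-negative
    # weights summing to 1. Different preferences favour different
    # Pareto-optimal policies.
    w = np.asarray(preference, dtype=float)
    return float(w @ np.asarray(returns, dtype=float))

# e.g. a speed-vs-energy trade-off: weight speed 0.7, energy cost 0.3
print(scalarized_return([120.0, -3.5], [0.7, 0.3]))
```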
1 code implementation • 25 Mar 2023 • Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic
Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets.
Ranked #1 on Audio-Visual Speech Recognition on LRS3-TED (using extra training data)
Audio-Visual Speech Recognition, Automatic Speech Recognition, +4
no code implementations • 16 Mar 2023 • Tsun-Hsuan Wang, Pingchuan Ma, Andrew Everett Spielberg, Zhou Xian, Hao Zhang, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan
Existing work has typically been tailored for particular environments or representations.
no code implementations • 14 Mar 2023 • Andreas Zinonos, Alexandros Haliassos, Pingchuan Ma, Stavros Petridis, Maja Pantic
Cross-lingual self-supervised learning has been a growing research topic in the last few years.
no code implementations • 12 Dec 2022 • Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic
We observe strong results in low- and high-resource labelled data settings when fine-tuning the visual and auditory encoders resulting from a single pre-training stage, in which the encoders are jointly trained.
Ranked #1 on Speech Recognition on LRS2 (using extra training data)
no code implementations • 3 Nov 2022 • Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic
The audio and the visual encoder neural networks are both based on the conformer architecture, which is made streamable using chunk-wise self-attention (CSA) and causal convolution.
Audio-Visual Speech Recognition, Automatic Speech Recognition, +5
1 code implementation • 3 Sep 2022 • Pingchuan Ma, Yujiang Wang, Stavros Petridis, Jie Shen, Maja Pantic
In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models, and other training strategies such as self-distillation and the use of word boundary indicators.
Ranked #1 on Lipreading on Lip Reading in the Wild (using extra training data)
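For reference, one common self-distillation recipe (a generic sketch; this paper's exact setup may differ) trains a student against both the labels and a teacher that is an earlier generation of the same model:

```python
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, targets,
                           alpha=0.5, T=2.0):
    # Standard label loss plus a temperature-softened KL term that pulls the
    # student toward the teacher's output distribution.
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kd
```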
no code implementations • 26 Jul 2022 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang
XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the contribution of each explanation to a data fact.
no code implementations • ICLR 2022 • Pingchuan Ma, Tao Du, Joshua B. Tenenbaum, Wojciech Matusik, Chuang Gan
To train this predictor, we formulate a new loss on rendering variances using gradients from differentiable rendering.
no code implementations • 30 Mar 2022 • Elvis Nava, John Z. Zhang, Mike Y. Michelis, Tao Du, Pingchuan Ma, Benjamin F. Grewe, Wojciech Matusik, Robert K. Katzschmann
For the deformable solid simulation of the swimmer's body, we use state-of-the-art techniques from the field of computer graphics to speed up the finite-element method (FEM).
no code implementations • 24 Mar 2022 • Yujiang Wang, Mingzhi Dong, Jie Shen, Yiming Luo, Yiming Lin, Pingchuan Ma, Stavros Petridis, Maja Pantic
We also investigate face clustering in egocentric videos, a fast-emerging setting that prior face clustering work has not yet explored.
Ranked #1 on Face Clustering on EasyCom
2 code implementations • 26 Feb 2022 • Pingchuan Ma, Stavros Petridis, Maja Pantic
However, these advances are usually driven by larger training sets rather than by model design.
Ranked #1 on Lipreading on GRID corpus (mixed-speech) (using extra training data)
no code implementations • 30 Sep 2021 • John Z. Zhang, Yu Zhang, Pingchuan Ma, Elvis Nava, Tao Du, Philip Arm, Wojciech Matusik, Robert K. Katzschmann
Accurate simulation of soft mechanisms under dynamic actuation is critical for the design of soft robots.
1 code implementation • 9 Sep 2021 • Artsiom Sanakoyeu, Pingchuan Ma, Vadim Tschernezki, Björn Ommer
We propose to build a more expressive representation by jointly splitting the embedding space and the data hierarchically into smaller sub-parts.
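A minimal sketch of the joint split idea, with made-up helper names: cluster the data into K groups and give each group its own slice of the embedding, so each sub-embedding specializes on its own sub-problem.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_embedding_and_data(embeddings, k):
    # Hypothetical helper: cluster the data into k groups; learner i then
    # trains only the i-th slice of the embedding on cluster i.
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(embeddings)
    d = embeddings.shape[1]
    slices = [slice(i * d // k, (i + 1) * d // k) for i in range(k)]
    return labels, slices
```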
no code implementations • 16 Jun 2021 • Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic
The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning.
no code implementations • 27 Apr 2021 • Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic
In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video directly to waveform, without any intermediate representation or separate waveform synthesis algorithm.
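Schematically (a toy sketch with made-up sizes, not the paper's GAN), the generator maps video frames straight to waveform samples, with no spectrogram or vocoder in between; in the paper a discriminator on the waveform drives adversarial training, which is omitted here.

```python
import torch

class VideoToWaveform(torch.nn.Module):
    # Toy generator: each video frame is encoded and decoded into `hop`
    # consecutive waveform samples, so the output is raw audio end-to-end.
    def __init__(self, hop=640):
        super().__init__()
        self.encode = torch.nn.Sequential(
            torch.nn.Flatten(2), torch.nn.LazyLinear(256), torch.nn.ReLU())
        self.decode = torch.nn.Linear(256, hop)

    def forward(self, video):             # video: (batch, frames, C, H, W)
        h = self.encode(video)            # (batch, frames, 256)
        return self.decode(h).flatten(1)  # (batch, frames * hop) raw samples
```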
no code implementations • 2 Apr 2021 • Pingchuan Ma, Tao Du, John Z. Zhang, Kui Wu, Andrew Spielberg, Robert K. Katzschmann, Wojciech Matusik
The computational design of soft underwater swimmers is challenging because of the high degrees of freedom in soft-body modeling.
2 code implementations • 12 Feb 2021 • Pingchuan Ma, Stavros Petridis, Maja Pantic
In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and a Convolution-augmented Transformer (Conformer) that can be trained in an end-to-end manner.
Ranked #3 on Audio-Visual Speech Recognition on LRS2
Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR), +5
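The usual hybrid training objective interpolates the two losses; a sketch (the weight lam is illustrative, and target padding/ignore-index handling is omitted for brevity):

```python
import torch.nn.functional as F

def hybrid_ctc_attention_loss(ctc_log_probs, att_logits, targets,
                              input_lengths, target_lengths, lam=0.1):
    # ctc_log_probs: (T, batch, vocab) log-probabilities for the CTC branch;
    # att_logits: (batch, target_len, vocab) from the attention decoder.
    ctc = F.ctc_loss(ctc_log_probs, targets, input_lengths, target_lengths)
    att = F.cross_entropy(att_logits.transpose(1, 2), targets)
    return lam * ctc + (1 - lam) * att
```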
no code implementations • 15 Jan 2021 • Tao Du, Kui Wu, Pingchuan Ma, Sebastien Wah, Andrew Spielberg, Daniela Rus, Wojciech Matusik
Inspired by Projective Dynamics (PD), we present Differentiable Projective Dynamics (DiffPD), an efficient differentiable soft-body simulator based on PD with implicit time integration.
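For background on why an implicit time step can still be differentiated (the generic recipe, not DiffPD's PD-specific acceleration): the step is defined implicitly, and gradients follow from the implicit function theorem.

```latex
% Implicit step: x_{n+1} solves g(x_{n+1}; x_n, \theta) = 0 (e.g. backward Euler).
% Gradients w.r.t. parameters \theta via the implicit function theorem:
\[
  \frac{\partial \mathbf{x}_{n+1}}{\partial \theta}
  \;=\;
  -\left(\frac{\partial g}{\partial \mathbf{x}_{n+1}}\right)^{-1}
  \frac{\partial g}{\partial \theta}.
\]
```

DiffPD's speed-up comes from exploiting the Projective Dynamics structure of the implicit system in these solves.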
no code implementations • 21 Dec 2020 • Pingchuan Ma, Shuai Wang
Anticipating the general difficulty text-to-SQL models have in preserving prediction consistency under linguistic and schema variations, we propose MT-Teql, a Metamorphic Testing-based framework for systematically evaluating and augmenting the consistency of text-to-SQL models.
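The metamorphic idea, in toy form (`model.predict` and the transform are hypothetical, not MT-Teql's API): a semantics-preserving rewrite of the question or schema should leave the predicted SQL unchanged, and any disagreement flags an inconsistency.

```python
def check_consistency(model, question, schema, transform):
    # Hypothetical API: model.predict(question, schema) -> SQL string.
    base = model.predict(question, schema)
    variant = model.predict(*transform(question, schema))
    return base == variant  # False => consistency violation to report

# Example semantics-preserving transform: add a polite prefix to the question.
polite = lambda q, s: ("Could you tell me " + q[0].lower() + q[1:], s)
```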
1 code implementation • 29 Sep 2020 • Pingchuan Ma, Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic
In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words.
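The densely connected pattern, sketched for one temporal layer (an illustration, not the paper's exact block): each layer's output is concatenated onto its input, DenseNet-style, while dilation widens the temporal receptive field.

```python
import torch

class DenseTemporalBlock(torch.nn.Module):
    # Concatenates each layer's output to its input along the channel axis;
    # dilation grows the temporal receptive field without extra parameters.
    def __init__(self, channels, growth, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation
        self.conv = torch.nn.Conv1d(channels, growth, kernel_size,
                                    padding=pad, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        return torch.cat([x, torch.relu(self.conv(x))], dim=1)
```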
1 code implementation • 13 Jul 2020 • Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic
However, our most promising lightweight models are on par with the current state-of-the-art while showing reductions of 8.2x in computational cost and 3.9x in number of parameters, which we hope will enable the deployment of lipreading models in practical applications.
Ranked #4 on Lipreading on Lip Reading in the Wild
1 code implementation • ICML 2020 • Pingchuan Ma, Tao Du, Wojciech Matusik
We present a novel, efficient method that generates locally continuous Pareto sets and Pareto fronts, which opens up the possibility of continuous analysis of Pareto optimal solutions in machine learning problems.
1 code implementation • CVPR 2019 • Dmytro Kotovenko, Artsiom Sanakoyeu, Pingchuan Ma, Sabine Lang, Björn Ommer
Recent work has significantly improved the representation of color and texture, as well as computational speed and image resolution.
2 code implementations • 23 Jan 2020 • Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic
We present results on the largest publicly-available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively.
Ranked #7 on Lipreading on CAS-VSR-W1k (LRW-1000)
no code implementations • 13 Jan 2020 • Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic
Self-supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities.
Ranked #6 on Speech Emotion Recognition on CREMA-D
no code implementations • 18 Dec 2019 • Pingchuan Ma, Stavros Petridis, Maja Pantic
In this work, we propose an efficient and straightforward detection method based on the temporal correlation between audio and video streams.
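One simple way to operationalize "temporal correlation between streams" (a hypothetical sketch, not the paper's detector): slide one feature stream against the other and score each candidate offset by mean cosine similarity, assuming both streams share a frame rate and length.

```python
import numpy as np

def av_offset_by_correlation(audio_feat, video_feat, max_shift=15):
    # audio_feat, video_feat: (T, D) frame-level features at the same rate.
    def score(shift):
        a = audio_feat[max(shift, 0): len(audio_feat) + min(shift, 0)]
        v = video_feat[max(-shift, 0): len(video_feat) + min(-shift, 0)]
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        v = v / np.linalg.norm(v, axis=1, keepdims=True)
        return float((a * v).sum(axis=1).mean())
    # The best-scoring shift estimates the audio-video offset; a low best
    # score suggests the streams are not genuinely correlated.
    return max(range(-max_shift, max_shift + 1), key=score)
```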
no code implementations • 26 Nov 2019 • Pingchuan Ma, Yao Zhou, Yu Lu, Wei zhang
To this end, we propose the video shuffle, a parameter-free plug-in component that efficiently reallocates the inputs of 2D convolution so that its receptive field can be extended to the temporal dimension.
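As a stand-in for the described reallocation, here is a TSM-style temporal shift (named plainly as such; the paper's exact shuffle may differ): a parameter-free op that moves channel slices between neighbouring frames so a per-frame 2D convolution sees temporal context.

```python
import torch

def temporal_channel_shift(x, fold=4):
    # x: (batch, time, channels, H, W). Parameter-free: 1/fold of the channels
    # come from the previous frame and 1/fold from the next, so a 2D conv
    # applied per frame now mixes information across time.
    out = x.clone()
    c = x.size(2) // fold
    out[:, 1:, :c] = x[:, :-1, :c]            # slice shifted forward in time
    out[:, :-1, c:2 * c] = x[:, 1:, c:2 * c]  # slice shifted backward in time
    return out
```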
no code implementations • 14 Nov 2019 • Shiyang Cheng, Pingchuan Ma, Georgios Tzimiropoulos, Stavros Petridis, Adrian Bulat, Jie Shen, Maja Pantic
The proposed model significantly outperforms previous approaches on non-frontal views while retaining the superior performance on frontal and near frontal mouth views.
no code implementations • 14 Jun 2019 • Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic
Speech is a means of communication which relies on both audio and visual information.
no code implementations • 5 Jun 2019 • Pingchuan Ma, Stavros Petridis, Maja Pantic
Several audio-visual speech recognition models have recently been proposed which aim to improve robustness over audio-only models in the presence of noise.
no code implementations • 2 Apr 2019 • Stavros Petridis, Yujiang Wang, Pingchuan Ma, Zuwei Li, Maja Pantic
In this work, we present an end-to-end visual speech recognition system based on fully-connected layers and Long Short-Term Memory (LSTM) networks which is suitable for small-scale datasets.
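A minimal sketch of such a system (sizes made up, not the paper's configuration): per-frame fully-connected encoding followed by an LSTM and a classification head.

```python
import torch

class SmallScaleVSR(torch.nn.Module):
    # Per-frame FC encoder -> LSTM over time -> classify from the last step.
    def __init__(self, input_dim, hidden=256, classes=10):
        super().__init__()
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(input_dim, hidden), torch.nn.ReLU())
        self.lstm = torch.nn.LSTM(hidden, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, classes)

    def forward(self, frames):      # frames: (batch, time, input_dim)
        h, _ = self.lstm(self.fc(frames))
        return self.head(h[:, -1])  # logits from the final time step
```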
1 code implementation • 7 Feb 2019 • Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty, Jennifer Hicks, Sean F. Carroll, Bo Zhou, Hongsheng Zeng, Fan Wang, Rongzhong Lian, Hao Tian, Wojciech Jaśkowski, Garrett Andersen, Odd Rune Lykkebø, Nihat Engin Toklu, Pranav Shyam, Rupesh Kumar Srivastava, Sergey Kolesnikov, Oleksii Hrinchuk, Anton Pechenko, Mattias Ljungström, Zhen Wang, Xu Hu, Zehong Hu, Minghui Qiu, Jun Huang, Aleksei Shpilman, Ivan Sosin, Oleg Svidchenko, Aleksandra Malysheva, Daniel Kudenko, Lance Rane, Aditya Bhatt, Zhengfei Wang, Penghui Qi, Zeyang Yu, Peng Peng, Quan Yuan, Wenxin Li, Yunsheng Tian, Ruihan Yang, Pingchuan Ma, Shauharda Khadka, Somdeb Majumdar, Zach Dwiel, Yinyin Liu, Evren Tumer, Jeremy Watson, Marcel Salathé, Sergey Levine, Scott Delp
In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector.
no code implementations • 28 Sep 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic
Therefore, we could use a CTC loss in combination with an attention-based model in order to force monotonic alignments while removing the conditional independence assumption.
Ranked #5 on Audio-Visual Speech Recognition on LRS2
Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR), +3
2 code implementations • 18 Feb 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, Maja Pantic
In the presence of high levels of noise, the end-to-end audio-visual model significantly outperforms both audio-only models.
Ranked #17 on Lipreading on Lip Reading in the Wild