Search Results for author: Pingchuan Ma

Found 24 papers, 8 papers with code

Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control

1 code implementation • ICML 2020 • Jie Xu, Yunsheng Tian, Pingchuan Ma, Daniela Rus, Shinjiro Sueda, Wojciech Matusik

Many real-world control problems involve conflicting objectives where we desire a dense and high-quality set of control policies that are optimal for different objective preferences (called Pareto-optimal).
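
The notion of Pareto optimality used here can be made concrete with a short dominance check. A minimal sketch with made-up two-objective policy evaluations (nothing below is from the paper):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Four hypothetical policies scored on two objectives (higher is better):
policies = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
print(pareto_front(policies))  # (0.4, 0.4) drops out: (0.5, 0.5) dominates it
```

The surviving set is the (discrete) Pareto front; the paper's goal of a "dense" front amounts to filling the gaps between such points.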

Learning Material Parameters and Hydrodynamics of Soft Robotic Fish via Differentiable Simulation

no code implementations • 30 Sep 2021 • John Z. Zhang, Yu Zhang, Pingchuan Ma, Elvis Nava, Tao Du, Philip Arm, Wojciech Matusik, Robert K. Katzschmann

We address this gap with our differentiable simulation tool by learning the material parameters and hydrodynamics of our robots.

Improving Deep Metric Learning by Divide and Conquer

no code implementations • 9 Sep 2021 • Artsiom Sanakoyeu, Pingchuan Ma, Vadim Tschernezki, Björn Ommer

We propose to build a more expressive representation by jointly splitting the embedding space and the data hierarchically into smaller sub-parts.

Image Retrieval • Metric Learning

LiRA: Learning Visual Speech Representations from Audio through Self-supervision

no code implementations • 16 Jun 2021 • Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic

The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning.

Lip Reading • Self-Supervised Learning

End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

no code implementations • 27 Apr 2021 • Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic

In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm.

Lip Reading • Speech Synthesis

DiffAqua: A Differentiable Computational Design Pipeline for Soft Underwater Swimmers with Shape Interpolation

no code implementations • 2 Apr 2021 • Pingchuan Ma, Tao Du, John Z. Zhang, Kui Wu, Andrew Spielberg, Robert K. Katzschmann, Wojciech Matusik

The computational design of soft underwater swimmers is challenging because of the high degrees of freedom in soft-body modeling.

End-to-end Audio-visual Speech Recognition with Conformers

no code implementations • 12 Feb 2021 • Pingchuan Ma, Stavros Petridis, Maja Pantic

In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and Convolution-augmented transformer (Conformer), that can be trained in an end-to-end manner.

Ranked #1 on Lipreading on LRS2

Audio-Visual Speech Recognition • Language Modelling • +3
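
Hybrid CTC/attention models are commonly trained on a weighted sum of the two losses. A minimal sketch of that combination; the weight `alpha` and the loss values are illustrative, not taken from the paper:

```python
def hybrid_loss(ctc_loss, attention_loss, alpha=0.2):
    """Convex combination of the CTC and attention losses; alpha balances
    the monotonic-alignment pressure (CTC) against the attention decoder."""
    return alpha * ctc_loss + (1.0 - alpha) * attention_loss

print(hybrid_loss(2.0, 1.0))  # 0.2 * 2.0 + 0.8 * 1.0 ≈ 1.2
```

In practice the two losses come from the same encoder output, so the CTC branch regularizes the attention decoder's alignments at no extra encoder cost.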

DiffPD: Differentiable Projective Dynamics

1 code implementation • 15 Jan 2021 • Tao Du, Kui Wu, Pingchuan Ma, Sebastien Wah, Andrew Spielberg, Daniela Rus, Wojciech Matusik

Inspired by Projective Dynamics (PD), we present Differentiable Projective Dynamics (DiffPD), an efficient differentiable soft-body simulator based on PD with implicit time integration.
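
Implicit time integration, the backbone of PD-style simulators, can be illustrated on a 1-D spring; the constants below are arbitrary, and DiffPD performs the analogous linear solve over an entire soft body per step rather than this scalar version:

```python
def implicit_euler_spring(x, v, k=10.0, m=1.0, h=0.05, steps=200):
    """1-D spring-mass integrated with implicit (backward) Euler:
    v' = v + h * (-k * x') / m  and  x' = x + h * v'.
    Substituting x' gives a linear equation in v', solved in closed form."""
    for _ in range(steps):
        v = (v - h * k * x / m) / (1.0 + h * h * k / m)  # the "linear solve"
        x = x + h * v
    return x, v
```

Backward Euler is unconditionally stable but numerically dissipative, so the oscillation amplitude shrinks over time even without explicit damping; that stability is what makes large simulation steps (and hence practical differentiation through them) feasible.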

MT-Teql: Evaluating and Augmenting Consistency of Text-to-SQL Models with Metamorphic Testing

no code implementations • 21 Dec 2020 • Pingchuan Ma, Shuai Wang

Envisioning the general difficulty for text-to-SQL models to preserve prediction consistency against linguistic and schema variations, we propose MT-Teql, a Metamorphic Testing-based framework for systematically evaluating and augmenting the consistency of TExt-to-SQL models.
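
The metamorphic-testing idea can be sketched generically: a semantics-preserving transformation of the input utterance should leave the predicted SQL unchanged. The `model` and transforms below are hypothetical stand-ins, not MT-Teql's actual API:

```python
def consistency_violations(model, utterances, transforms):
    """Report (utterance, transform) pairs where a semantics-preserving
    transform changes the model's predicted SQL."""
    violations = []
    for u in utterances:
        for t in transforms:
            if model(u) != model(t(u)):  # metamorphic relation violated
                violations.append((u, t.__name__))
    return violations
```

A real framework would also use the violating pairs as augmentation data, which is the "augmenting" half of the paper's title.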


Lip-reading with Densely Connected Temporal Convolutional Networks

no code implementations • 29 Sep 2020 • Pingchuan Ma, Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic

In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words.

Lip Reading

Towards Practical Lipreading with Distilled and Efficient Models

1 code implementation • 13 Jul 2020 • Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic

However, our most promising lightweight models are on par with the current state-of-the-art while showing a reduction of 8.2x and 3.9x in terms of computational cost and number of parameters, respectively, which we hope will enable the deployment of lipreading models in practical applications.

Knowledge Distillation • Lipreading

Efficient Continuous Pareto Exploration in Multi-Task Learning

1 code implementation • ICML 2020 • Pingchuan Ma, Tao Du, Wojciech Matusik

We present a novel, efficient method that generates locally continuous Pareto sets and Pareto fronts, which opens up the possibility of continuous analysis of Pareto optimal solutions in machine learning problems.

Multiobjective Optimization • Multi-Task Learning

A Content Transformation Block For Image Style Transfer

1 code implementation • CVPR 2019 • Dmytro Kotovenko, Artsiom Sanakoyeu, Pingchuan Ma, Sabine Lang, Björn Ommer

Recent work has significantly improved the representation of color and texture, as well as computational speed and image resolution.

Image Generation • Style Transfer

Lipreading using Temporal Convolutional Networks

2 code implementations • 23 Jan 2020 • Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic

We present results on the largest publicly-available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively.

Lipreading • Lip Reading

Visually Guided Self Supervised Learning of Speech Representations

no code implementations • 13 Jan 2020 • Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

Self supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities.

Emotion Recognition • Representation Learning • +2

Detecting Adversarial Attacks On Audiovisual Speech Recognition

no code implementations • 18 Dec 2019 • Pingchuan Ma, Stavros Petridis, Maja Pantic

In this work, we propose an efficient and straightforward detection method based on the temporal correlation between audio and video streams.

Audio-Visual Speech Recognition • Visual Speech Recognition
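
A detector in this spirit can be sketched as thresholding the correlation between per-frame audio and video feature streams; the feature vectors and the threshold below are illustrative, not values from the paper:

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def looks_adversarial(audio_feats, video_feats, thresh=0.3):
    """Flag a sample whose audio and video streams are weakly correlated;
    an attack that perturbs one modality tends to break their temporal sync."""
    return pearson(audio_feats, video_feats) < thresh
```

A genuine audiovisual clip keeps its streams strongly correlated, so a low correlation score is the anomaly signal here.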

Learning Efficient Video Representation with Video Shuffle Networks

no code implementations • 26 Nov 2019 • Pingchuan Ma, Yao Zhou, Yu Lu, Wei Zhang

To this end, we propose the video shuffle, a parameter-free plug-in component that efficiently reallocates the inputs of 2D convolution so that its receptive field can be extended to the temporal dimension.

Video Recognition
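
The shuffle itself can be sketched in plain Python on toy nested lists: split each frame's channels into groups, then transpose the frame and group axes so every output frame mixes channels from every input frame. Real implementations do this as a tensor reshape; everything below, including the toy constraint that groups equal frames, is illustrative:

```python
def video_shuffle(frames, groups):
    """Parameter-free temporal shuffle (toy version): chunk each frame's
    channel list into `groups` groups, then swap the frame and group axes."""
    t, c = len(frames), len(frames[0])
    assert t == groups and c % groups == 0  # simplifying constraint for this sketch
    size = c // groups
    chunks = [[f[i * size:(i + 1) * size] for i in range(groups)] for f in frames]
    shuffled = [[chunks[i][j] for i in range(t)] for j in range(groups)]
    return [[ch for grp in frame for ch in grp] for frame in shuffled]

# Two 2-channel frames: each output frame now holds one channel from
# frame "a" and one from frame "b".
print(video_shuffle([["a0", "a1"], ["b0", "b1"]], 2))  # [['a0', 'b0'], ['a1', 'b1']]
```

Because channels from different time steps end up in the same frame, the 2D convolution that follows effectively sees a temporal receptive field at zero parameter cost.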

Towards Pose-invariant Lip-Reading

no code implementations • 14 Nov 2019 • Shiyang Cheng, Pingchuan Ma, Georgios Tzimiropoulos, Stavros Petridis, Adrian Bulat, Jie Shen, Maja Pantic

The proposed model significantly outperforms previous approaches on non-frontal views while retaining the superior performance on frontal and near frontal mouth views.

Lip Reading

Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition

no code implementations • 5 Jun 2019 • Pingchuan Ma, Stavros Petridis, Maja Pantic

Several audio-visual speech recognition models have been recently proposed which aim to improve the robustness over audio-only models in the presence of noise.

Audio-Visual Speech Recognition • Visual Speech Recognition

End-to-End Visual Speech Recognition for Small-Scale Datasets

no code implementations • 2 Apr 2019 • Stavros Petridis, Yujiang Wang, Pingchuan Ma, Zuwei Li, Maja Pantic

In this work, we present an end-to-end visual speech recognition system based on fully-connected layers and Long Short-Term Memory (LSTM) networks which is suitable for small-scale datasets.

General Classification • Visual Speech Recognition

Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture

no code implementations • 28 Sep 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic

Therefore, we could use a CTC loss in combination with an attention-based model in order to force monotonic alignments and at the same time get rid of the conditional independence assumption.

Audio-Visual Speech Recognition • Lipreading • +1

End-to-end Audiovisual Speech Recognition

2 code implementations • 18 Feb 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, Maja Pantic

In presence of high levels of noise, the end-to-end audiovisual model significantly outperforms both audio-only models.

Lipreading • Speech Recognition
