Search Results for author: Pingchuan Ma

Found 59 papers, 21 papers with code

Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control

2 code implementations • ICML 2020 • Jie Xu, Yunsheng Tian, Pingchuan Ma, Daniela Rus, Shinjiro Sueda, Wojciech Matusik

Many real-world control problems involve conflicting objectives where we desire a dense and high-quality set of control policies that are optimal for different objective preferences (called Pareto-optimal).

Multi-Objective Reinforcement Learning reinforcement-learning

220

Paper
Code

ZigMa: A DiT-style Zigzag Mamba Diffusion Model

1 code implementation • 20 Mar 2024 • Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Björn Ommer

The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures.

116

Paper
Code

DepthFM: Fast Monocular Depth Estimation with Flow Matching

no code implementations • 20 Mar 2024 • Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer

Due to the generative nature of our approach, our model reliably predicts the confidence of its depth estimates.

Monocular Depth Estimation

Paper
Add Code

Eliminating Information Leakage in Hard Concept Bottleneck Models with Supervised, Hierarchical Concept Learning

no code implementations • 3 Feb 2024 • Ao Sun, Yuanyuan Yuan, Pingchuan Ma, Shuai Wang

This paper alleviates the information leakage issue by introducing label supervision in concept predication and constructing a hierarchical concept set.

Paper
Add Code

An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios

no code implementations • 27 Jan 2024 • Zongjie Li, Wenying Qiu, Pingchuan Ma, Yichen Li, You Li, Sijia He, Baozheng Jiang, Shuai Wang, Weixi Gu

In this paper, we present a comprehensive empirical study on the accuracy and robustness of LLMs in the context of the Chinese industrial production area.

Paper
Add Code

Motion Flow Matching for Human Motion Synthesis and Editing

no code implementations • 14 Dec 2023 • Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek

In this paper, we propose \emph{Motion Flow Matching}, a novel generative model designed for human motion generation featuring efficient sampling and effectiveness in motion editing applications.

Motion Interpolation motion prediction +1

Paper
Add Code

Boosting Latent Diffusion with Flow Matching

2 code implementations • 12 Dec 2023 • Johannes S. Fischer, Ming Gui, Pingchuan Ma, Nick Stracke, Stefan A. Baumann, Björn Ommer

We demonstrate that introducing FM between the Diffusion model and the convolutional decoder offers high-resolution image synthesis with reduced computational cost and model size.

Image Generation

Paper
Code

VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models

no code implementations • 7 Dec 2023 • Zongjie Li, Chaozheng Wang, Chaowei Liu, Pingchuan Ma, Daoyuan Wu, Shuai Wang, Cuiyun Gao

With recent advancements in Large Multimodal Models (LMMs) across various domains, a novel prompting method called visual referring prompting has emerged, showing significant potential in enhancing human-computer interaction within multimodal systems.

Paper
Add Code

InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models

no code implementations • 4 Dec 2023 • Xunguang Wang, Zhenlan Ji, Pingchuan Ma, Zongjie Li, Shuai Wang

Initially, we utilize a public text-to-image generative model to "reverse" the target response into a target image, and employ GPT-4 to infer a reasonable instruction $\boldsymbol{p}^\prime$ from the target response.

Adversarial Attack Language Modelling +2

Paper
Add Code

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

1 code implementation • 27 Oct 2023 • Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

TorchAudio is an open-source audio and speech processing library built for PyTorch.

Self-Supervised Learning Speech Enhancement +2

2,379

Paper
Code

Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach

1 code implementation • 10 Oct 2023 • Zhenlan Ji, Pingchuan Ma, Zongjie Li, Shuai Wang

We illustrate the insights that our framework can provide by studying over 3 popular LLMs with over 12 prompt adjustment strategies.

Benchmarking Code Generation +2

Paper
Code

Split and Merge: Aligning Position Biases in Large Language Model based Evaluators

no code implementations • 29 Sep 2023 • Zongjie Li, Chaozheng Wang, Pingchuan Ma, Daoyuan Wu, Shuai Wang, Cuiyun Gao, Yang Liu

Specifically, PORTIA splits the answers into multiple segments, aligns similar content across candidate answers, and then merges them back into a single prompt for evaluation by LLMs.

Language Modelling Large Language Model +1

Paper
Add Code

ASAP: Automated Sequence Planning for Complex Robotic Assembly with Physical Feasibility

no code implementations • 29 Sep 2023 • Yunsheng Tian, Karl D. D. Willis, Bassel Al Omari, Jieliang Luo, Pingchuan Ma, Yichen Li, Farhad Javid, Edward Gu, Joshua Jacob, Shinjiro Sueda, Hui Li, Sachin Chitta, Wojciech Matusik

The automated assembly of complex products requires a system that can automatically plan a physically feasible sequence of actions for assembling many parts together.

Paper
Add Code

Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning (Extended Version)

no code implementations • 11 Sep 2023 • Pingchuan Ma, Zhenlan Ji, Peisen Yao, Shuai Wang, Kui Ren

Based on the decision procedure to CIR, CICheck includes two variants: ED-CICheck and ED-CICheck, which detect erroneous CI tests (to enhance reliability) and prune excessive CI tests (to enhance privacy), respectively.

Causal Discovery

Paper
Add Code

How Can Large Language Models Help Humans in Design and Manufacturing?

no code implementations • 25 Jul 2023 • Liane Makatura, Michael Foshey, Bohan Wang, Felix HähnLein, Pingchuan Ma, Bolei Deng, Megan Tjandrasuwita, Andrew Spielberg, Crystal Elaine Owens, Peter Yichen Chen, Allan Zhao, Amy Zhu, Wil J Norton, Edward Gu, Joshua Jacob, Yifei Li, Adriana Schulz, Wojciech Matusik

The advancement of Large Language Models (LLMs), including GPT-4, provides exciting new opportunities for generative design.

Paper
Add Code

SparseVSR: Lightweight and Noise Robust Visual Speech Recognition

no code implementations • 10 Jul 2023 • Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic

We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent.

speech-recognition Visual Speech Recognition

Paper
Add Code

Towards Practical Federated Causal Structure Learning

1 code implementation • 15 Jun 2023 • Zhaoyu Wang, Pingchuan Ma, Shuai Wang

Federated learning can solve this problem, but existing solutions for federated causal structure learning make unrealistic assumptions about data and lack convergence guarantees.

Federated Learning

Paper
Code

Causality-Aided Trade-off Analysis for Machine Learning Fairness

no code implementations • 22 May 2023 • Zhenlan Ji, Pingchuan Ma, Shuai Wang, Yanhui Li

This paper uses causality analysis as a principled method for analyzing trade-offs between fairness parameters and other crucial metrics in ML pipelines.

Causal Discovery Causal Inference +1

Paper
Add Code

Explain Any Concept: Segment Anything Meets Concept-Based Explanation

1 code implementation • NeurIPS 2023 • Ao Sun, Pingchuan Ma, Yuanyuan Yuan, Shuai Wang

For computer vision tasks, mainstream pixel-based XAI methods explain DNN decisions by identifying important pixels, and emerging concept-based XAI explore forming explanations with concepts (e. g., a head in an image).

Instance Segmentation Semantic Segmentation

Paper
Code

Medical records condensation: a roadmap towards healthcare data democratisation

no code implementations • 5 May 2023 • Yujiang Wang, Anshul Thakur, Mingzhi Dong, Pingchuan Ma, Stavros Petridis, Li Shang, Tingting Zhu, David A. Clifton

The prevalence of artificial intelligence (AI) has envisioned an era of healthcare democratisation that promises every stakeholder a new and better way of life.

Clinical Knowledge Dataset Condensation +2

Paper
Add Code

"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process

1 code implementation • 4 May 2023 • Pingchuan Ma, Zongjie Li, Ao Sun, Shuai Wang

Moreover, we propose a novel on-the-fly (OTF) repairing scheme that repairs unethical suggestions made by LLMs in real-time.

Moral Scenarios

Paper
Code

Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics

no code implementations • 27 Apr 2023 • Pingchuan Ma, Peter Yichen Chen, Bolei Deng, Joshua B. Tenenbaum, Tao Du, Chuang Gan, Wojciech Matusik

Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models).

Out-of-Distribution Generalization

Paper
Add Code

Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System

no code implementations • 2 Apr 2023 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang

In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts and simplifies the exploration process for users.

Language Modelling Large Language Model

Paper
Add Code

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

no code implementations • CVPR 2023 • Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Furthermore, when combined with large-scale pseudo-labeled audio-visual data SynthVSR yields a new state-of-the-art VSR WER of 16. 9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90, 000 hours).

Lip Reading speech-recognition +1

Paper
Add Code

Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

1 code implementation • 25 Mar 2023 • Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic

Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets.

Ranked #1 on Automatic Speech Recognition (ASR) on LRS3-TED

Audio-Visual Speech Recognition Automatic Speech Recognition +4

124

Paper
Code

SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments

no code implementations • 16 Mar 2023 • Tsun-Hsuan Wang, Pingchuan Ma, Andrew Everett Spielberg, Zhou Xian, Hao Zhang, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan

Existing work has typically been tailored for particular environments or representations.

Paper
Add Code

Learning Cross-lingual Visual Speech Representations

no code implementations • 14 Mar 2023 • Andreas Zinonos, Alexandros Haliassos, Pingchuan Ma, Stavros Petridis, Maja Pantic

Cross-lingual self-supervised learning has been a growing research topic in the last few years.

Representation Learning Self-Supervised Learning

Paper
Add Code

Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning

no code implementations • CVPR 2023 • Dmytro Kotovenko, Pingchuan Ma, Timo Milbich, Björn Ommer

Experiments on established DML benchmarks show that our cross-attention conditional embedding during training improves the underlying standard DML pipeline significantly so that it outperforms the state-of-the-art.

Metric Learning

Paper
Add Code

Jointly Learning Visual and Auditory Speech Representations from Raw Data

1 code implementation • 12 Dec 2022 • Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic

We observe strong results in low- and high-resource labelled data settings when fine-tuning the visual and auditory encoders resulting from a single pre-training stage, in which the encoders are jointly trained.

Ranked #1 on Speech Recognition on LRS2 (using extra training data)

Audio-Visual Speech Recognition Lipreading +2

Paper
Code

Streaming Audio-Visual Speech Recognition with Alignment Regularization

no code implementations • 3 Nov 2022 • Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic

In this work, we propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture.

Audio-Visual Speech Recognition Automatic Speech Recognition +5

Paper
Add Code

Training Strategies for Improved Lip-reading

1 code implementation • 3 Sep 2022 • Pingchuan Ma, Yujiang Wang, Stavros Petridis, Jie Shen, Maja Pantic

In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, like self-distillation and using word boundary indicators.

Ranked #1 on Lipreading on Lip Reading in the Wild (using extra training data)

Data Augmentation Lipreading +1

362

Paper
Code

XInsight: eXplainable Data Analysis Through The Lens of Causality

no code implementations • 26 Jul 2022 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang

XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the quantitative contribution of each explanation to a data fact.

Decision Making

Paper
Add Code

RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation

no code implementations • ICLR 2022 • Pingchuan Ma, Tao Du, Joshua B. Tenenbaum, Wojciech Matusik, Chuang Gan

To train this predictor, we formulate a new loss on rendering variances using gradients from differentiable rendering.

Imitation Learning

Paper
Add Code

Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models

no code implementations • 30 Mar 2022 • Elvis Nava, John Z. Zhang, Mike Y. Michelis, Tao Du, Pingchuan Ma, Benjamin F. Grewe, Wojciech Matusik, Robert K. Katzschmann

For the deformable solid simulation of the swimmer's body, we use state-of-the-art techniques from the field of computer graphics to speed up the finite-element method (FEM).

Computational Efficiency

Paper
Add Code

Self-supervised Video-centralised Transformer for Video Face Clustering

no code implementations • 24 Mar 2022 • Yujiang Wang, Mingzhi Dong, Jie Shen, Yiming Luo, Yiming Lin, Pingchuan Ma, Stavros Petridis, Maja Pantic

We also investigate face clustering in egocentric videos, a fast-emerging field that has not been studied yet in works related to face clustering.

Ranked #1 on Face Clustering on EasyCom

Clustering Contrastive Learning +1

Paper
Add Code

Visual Speech Recognition for Multiple Languages in the Wild

2 code implementations • 26 Feb 2022 • Pingchuan Ma, Stavros Petridis, Maja Pantic

However, these advances are usually due to the larger training sets rather than the model design.

Ranked #1 on Lipreading on GRID corpus (mixed-speech) (using extra training data)

Hyperparameter Optimization Lipreading +2

287

Paper
Code

Sim2Real for Soft Robotic Fish via Differentiable Simulation

no code implementations • 30 Sep 2021 • John Z. Zhang, Yu Zhang, Pingchuan Ma, Elvis Nava, Tao Du, Philip Arm, Wojciech Matusik, Robert K. Katzschmann

Accurate simulation of soft mechanisms under dynamic actuation is critical for the design of soft robots.

MORPH

Paper
Add Code

Improving Deep Metric Learning by Divide and Conquer

1 code implementation • 9 Sep 2021 • Artsiom Sanakoyeu, Pingchuan Ma, Vadim Tschernezki, Björn Ommer

We propose to build a more expressive representation by jointly splitting the embedding space and the data hierarchically into smaller sub-parts.

Image Retrieval Metric Learning +1

Paper
Code

LiRA: Learning Visual Speech Representations from Audio through Self-supervision

no code implementations • 16 Jun 2021 • Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic

The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning.

Lip Reading Self-Supervised Learning +1

Paper
Add Code

End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

no code implementations • 27 Apr 2021 • Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic

In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm.

Lip Reading Speech Synthesis

Paper
Add Code

DiffAqua: A Differentiable Computational Design Pipeline for Soft Underwater Swimmers with Shape Interpolation

no code implementations • 2 Apr 2021 • Pingchuan Ma, Tao Du, John Z. Zhang, Kui Wu, Andrew Spielberg, Robert K. Katzschmann, Wojciech Matusik

The computational design of soft underwater swimmers is challenging because of the high degrees of freedom in soft-body modeling.

Paper
Add Code

End-to-end Audio-visual Speech Recognition with Conformers

3 code implementations • 12 Feb 2021 • Pingchuan Ma, Stavros Petridis, Maja Pantic

In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and Convolution-augmented transformer (Conformer), that can be trained in an end-to-end manner.

Ranked #3 on Audio-Visual Speech Recognition on LRS2

Audio-Visual Speech Recognition Automatic Speech Recognition (ASR) +6

15,397

Paper
Code

DiffPD: Differentiable Projective Dynamics

no code implementations • 15 Jan 2021 • Tao Du, Kui Wu, Pingchuan Ma, Sebastien Wah, Andrew Spielberg, Daniela Rus, Wojciech Matusik

Inspired by Projective Dynamics (PD), we present Differentiable Projective Dynamics (DiffPD), an efficient differentiable soft-body simulator based on PD with implicit time integration.

Friction

Paper
Add Code

MT-Teql: Evaluating and Augmenting Consistency of Text-to-SQL Models with Metamorphic Testing

no code implementations • 21 Dec 2020 • Pingchuan Ma, Shuai Wang

Envisioning the general difficulty for text-to-SQL models to preserve prediction consistency against linguistic and schema variations, we propose MT-Teql, a Metamorphic Testing-based framework for systematically evaluating and augmenting the consistency of TExt-to-SQL models.

Text-To-SQL

Paper
Add Code

Lip-reading with Densely Connected Temporal Convolutional Networks

1 code implementation • 29 Sep 2020 • Pingchuan Ma, Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic

In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words.

Lip Reading

362

Paper
Code

Towards Practical Lipreading with Distilled and Efficient Models

1 code implementation • 13 Jul 2020 • Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic

However, our most promising lightweight models are on par with the current state-of-the-art while showing a reduction of 8. 2x and 3. 9x in terms of computational cost and number of parameters, respectively, which we hope will enable the deployment of lipreading models in practical applications.

Ranked #4 on Lipreading on Lip Reading in the Wild

Knowledge Distillation Lipreading

362

Paper
Code

Efficient Continuous Pareto Exploration in Multi-Task Learning

1 code implementation • ICML 2020 • Pingchuan Ma, Tao Du, Wojciech Matusik

We present a novel, efficient method that generates locally continuous Pareto sets and Pareto fronts, which opens up the possibility of continuous analysis of Pareto optimal solutions in machine learning problems.

BIG-bench Machine Learning Multiobjective Optimization +1

131

Paper
Code

A Content Transformation Block For Image Style Transfer

1 code implementation • CVPR 2019 • Dmytro Kotovenko, Artsiom Sanakoyeu, Pingchuan Ma, Sabine Lang, Björn Ommer

Recent work has significantly improved the representation of color and texture and computational speed and image resolution.

Image Generation Style Transfer

Paper
Code

Lipreading using Temporal Convolutional Networks

2 code implementations • 23 Jan 2020 • Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic

We present results on the largest publicly-available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively.

Ranked #7 on Lipreading on CAS-VSR-W1k (LRW-1000)

Lipreading Lip Reading

362

Paper
Code

Visually Guided Self Supervised Learning of Speech Representations

no code implementations • 13 Jan 2020 • Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

Self supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities.

Ranked #8 on Speech Emotion Recognition on CREMA-D

Representation Learning Self-Supervised Learning +3

Paper
Add Code

Detecting Adversarial Attacks On Audiovisual Speech Recognition

no code implementations • 18 Dec 2019 • Pingchuan Ma, Stavros Petridis, Maja Pantic

In this work, we propose an efficient and straightforward detection method based on the temporal correlation between audio and video streams.

Audio-Visual Speech Recognition speech-recognition +1

Paper
Add Code

Learning Efficient Video Representation with Video Shuffle Networks

no code implementations • 26 Nov 2019 • Pingchuan Ma, Yao Zhou, Yu Lu, Wei zhang

To this end, we propose the video shuffle, a parameter-free plug-in component that efficiently reallocates the inputs of 2D convolution so that its receptive field can be extended to the temporal dimension.

Video Recognition

Paper
Add Code

Towards Pose-invariant Lip-Reading

no code implementations • 14 Nov 2019 • Shiyang Cheng, Pingchuan Ma, Georgios Tzimiropoulos, Stavros Petridis, Adrian Bulat, Jie Shen, Maja Pantic

The proposed model significantly outperforms previous approaches on non-frontal views while retaining the superior performance on frontal and near frontal mouth views.

Lip Reading

Paper
Add Code

Video-Driven Speech Reconstruction using Generative Adversarial Networks

no code implementations • 14 Jun 2019 • Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

Speech is a means of communication which relies on both audio and visual information.

Paper
Add Code

Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition

no code implementations • 5 Jun 2019 • Pingchuan Ma, Stavros Petridis, Maja Pantic

Several audio-visual speech recognition models have been recently proposed which aim to improve the robustness over audio-only models in the presence of noise.

Audio-Visual Speech Recognition speech-recognition +1

Paper
Add Code

End-to-End Visual Speech Recognition for Small-Scale Datasets

no code implementations • 2 Apr 2019 • Stavros Petridis, Yujiang Wang, Pingchuan Ma, Zuwei Li, Maja Pantic

In this work, we present an end-to-end visual speech recognition system based on fully-connected layers and Long-Short Memory (LSTM) networks which is suitable for small-scale datasets.

General Classification speech-recognition +1

Paper
Add Code

Artificial Intelligence for Prosthetics - challenge solutions

1 code implementation • 7 Feb 2019 • Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty, Jennifer Hicks, Sean F. Carroll, Bo Zhou, Hongsheng Zeng, Fan Wang, Rongzhong Lian, Hao Tian, Wojciech Jaśkowski, Garrett Andersen, Odd Rune Lykkebø, Nihat Engin Toklu, Pranav Shyam, Rupesh Kumar Srivastava, Sergey Kolesnikov, Oleksii Hrinchuk, Anton Pechenko, Mattias Ljungström, Zhen Wang, Xu Hu, Zehong Hu, Minghui Qiu, Jun Huang, Aleksei Shpilman, Ivan Sosin, Oleg Svidchenko, Aleksandra Malysheva, Daniel Kudenko, Lance Rane, Aditya Bhatt, Zhengfei Wang, Penghui Qi, Zeyang Yu, Peng Peng, Quan Yuan, Wenxin Li, Yunsheng Tian, Ruihan Yang, Pingchuan Ma, Shauharda Khadka, Somdeb Majumdar, Zach Dwiel, Yinyin Liu, Evren Tumer, Jeremy Watson, Marcel Salathé, Sergey Levine, Scott Delp

In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector.

Imitation Learning reinforcement-learning +1

Paper
Code

Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture

no code implementations • 28 Sep 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic

Therefore, we could use a CTC loss in combination with an attention-based model in order to force monotonic alignments and at the same time get rid of the conditional independence assumption.

Ranked #5 on Audio-Visual Speech Recognition on LRS2

Audio-Visual Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

End-to-end Audiovisual Speech Recognition

2 code implementations • 18 Feb 2018 • Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, Maja Pantic

In presence of high levels of noise, the end-to-end audiovisual model significantly outperforms both audio-only models.

Ranked #17 on Lipreading on Lip Reading in the Wild

Lipreading speech-recognition +1

173

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.