Search Results for author: Honglie Chen

Found 7 papers, 3 papers with code

SparseVSR: Lightweight and Noise Robust Visual Speech Recognition

no code implementations 10 Jul 2023 Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic

We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent.
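A minimal, hedged sketch of what a 50% sparse model can look like in practice, using unstructured magnitude pruning from torch.nn.utils.prune on a stand-in network; the layer sizes and output dimension are illustrative assumptions, not the SparseVSR architecture or its actual sparsification method.

```python
# Illustrative only: 50% unstructured magnitude pruning of a stand-in model.
# The network below is NOT the SparseVSR architecture.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 40),  # hypothetical output vocabulary size
)

# Zero out the 50% smallest-magnitude weights in every Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Report the resulting sparsity over the weight matrices.
weights = [p for p in model.parameters() if p.dim() > 1]
total = sum(p.numel() for p in weights)
zeros = sum((p == 0).sum().item() for p in weights)
print(f"weight sparsity: {zeros / total:.2%}")
```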

speech-recognition Visual Speech Recognition

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

no code implementations CVPR 2023 Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Furthermore, when combined with large-scale pseudo-labeled audio-visual data, SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours).
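Taken at face value, and assuming the 29x figure is relative to the publicly available data used here, that corresponds to roughly 90,000 / 29 ≈ 3,100 hours of public training video versus 90,000 hours of non-public machine-transcribed video.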

Lip Reading speech-recognition +1

Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

1 code implementation 25 Mar 2023 Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic

Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets.
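The "automatic labels" in the title point at pseudo-labelling unlabelled audio-visual data with pre-trained ASR models and training on the resulting transcriptions. Below is a hedged sketch of that general idea using a public torchaudio wav2vec 2.0 checkpoint with greedy CTC decoding; the specific model, decoding, and file handling are placeholders, not the Auto-AVSR pipeline.

```python
# Hedged sketch: generate an automatic transcription ("pseudo-label") for an
# unlabeled clip with a public ASR checkpoint. Not the Auto-AVSR pipeline.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H  # public pre-trained ASR
asr_model = bundle.get_model().eval()
labels = bundle.get_labels()  # CTC symbol table; "-" is the blank token

def pseudo_label(wav_path: str) -> str:
    """Return a greedy CTC transcription to use as an automatic label."""
    waveform, sample_rate = torchaudio.load(wav_path)
    waveform = waveform.mean(dim=0, keepdim=True)  # mix down to mono
    if sample_rate != bundle.sample_rate:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, bundle.sample_rate
        )
    with torch.inference_mode():
        emissions, _ = asr_model(waveform)
    tokens = emissions[0].argmax(dim=-1).tolist()
    # Greedy CTC decoding: collapse repeats, drop blanks.
    decoded, prev = [], None
    for t in tokens:
        if t != prev and labels[t] != "-":
            decoded.append(labels[t])
        prev = t
    return "".join(decoded).replace("|", " ").strip()

# Hypothetical usage on an unlabeled clip:
# transcript = pseudo_label("unlabeled_clip.wav")
```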

Audio-Visual Speech Recognition Automatic Speech Recognition +4

Audio-Visual Synchronisation in the wild

no code implementations 8 Dec 2021 Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Finally, we set the first benchmark for general audio-visual synchronisation with over 160 diverse classes in the new VGG-Sound Sync video dataset.
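A toy sketch of the synchronisation task itself, not the paper's model: score candidate temporal offsets between per-frame visual and audio embeddings by cosine similarity and pick the best-aligned shift. The random features, offset range, and sign convention below are assumptions for the example.

```python
# Toy offset search over placeholder embeddings; not the paper's method.
import torch
import torch.nn.functional as F

def predict_offset(video_feats, audio_feats, max_shift=15):
    """video_feats, audio_feats: (T, D) tensors; returns the best shift in
    frames (negative means the audio lags the video under this convention)."""
    scores = {}
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            v = video_feats[shift:]
            a = audio_feats[: audio_feats.shape[0] - shift]
        else:
            v = video_feats[:shift]
            a = audio_feats[-shift:]
        n = min(len(v), len(a))
        scores[shift] = F.cosine_similarity(v[:n], a[:n], dim=-1).mean().item()
    return max(scores, key=scores.get)

# Placeholder features standing in for audio/visual encoder outputs.
T, D = 100, 256
video = torch.randn(T, D)
audio = torch.roll(video, shifts=4, dims=0) + 0.1 * torch.randn(T, D)
print("estimated offset:", predict_offset(video, audio))  # expect -4 here
```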

Lip Reading

VGGSound: A Large-scale Audio-Visual Dataset

2 code implementations 29 Apr 2020 Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

Our goal is to collect a large-scale audio-visual dataset with low label noise from videos in the wild using computer vision techniques.
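The computer-vision filtering mentioned above can be illustrated, very loosely, as checking that some frame of a candidate clip visually contains an object consistent with its audio label. The sketch below does this with an off-the-shelf torchvision classifier; the label-to-class mapping, top-k rule, and frame sampling are hypothetical and do not reproduce the VGGSound collection pipeline.

```python
# Hedged sketch: visually verify an audio label with an image classifier.
import torch
from torchvision.io import read_image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
classifier = resnet50(weights=weights).eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]  # ImageNet class names

# Hypothetical mapping from an audio label to acceptable visual classes.
VISUAL_CLASSES = {"dog barking": {"Chihuahua", "golden retriever", "beagle"}}

def frame_supports_label(frame_path: str, audio_label: str, topk: int = 5) -> bool:
    """Keep a clip only if a sampled frame's top-k classes match the label."""
    image = read_image(frame_path)
    with torch.inference_mode():
        logits = classifier(preprocess(image).unsqueeze(0))
    top = logits.squeeze(0).topk(topk).indices.tolist()
    predicted = {categories[i] for i in top}
    return bool(predicted & VISUAL_CLASSES.get(audio_label, set()))

# Hypothetical usage: discard the clip if no sampled frame passes the check.
# keep = any(frame_supports_label(p, "dog barking") for p in sampled_frames)
```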

Image Classification

AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations

no code implementations 14 Aug 2019 Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

We propose AutoCorrect, a method to automatically learn object-annotation alignments from a dataset with annotations affected by geometric noise.
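As a loose, hedged sketch of learning a geometric correction for a noisy annotation (not the AutoCorrect model or training setup), the toy example below has a small CNN look at the image region under a noisy box and regress a four-parameter adjustment, trained with a smooth L1 loss on placeholder data.

```python
# Toy box-correction regressor on random data; not the AutoCorrect method.
import torch
import torch.nn as nn

class BoxCorrector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 4)  # predicted (dx, dy, dw, dh)

    def forward(self, crops):  # crops: (B, 3, H, W) regions under noisy boxes
        return self.head(self.backbone(crops))

model = BoxCorrector()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.SmoothL1Loss()

# One dummy training step on random data standing in for (crop, true offset).
crops = torch.randn(8, 3, 64, 64)
target_offsets = torch.randn(8, 4)
loss = criterion(model(crops), target_offsets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"step loss: {loss.item():.4f}")
```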

Object
