Search Results for author: Honglie Chen

Found 7 papers, 3 papers with code

SparseVSR: Lightweight and Noise Robust Visual Speech Recognition

no code implementations • 10 Jul 2023 • Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic

We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent.

speech-recognition, Visual Speech Recognition


SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

no code implementations • CVPR 2023 • Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Furthermore, when combined with large-scale pseudo-labeled audio-visual data, SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours).

Lip Reading, speech-recognition, +1


Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

1 code implementation • 25 Mar 2023 • Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic

Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets.

Ranked #1 on Automatic Speech Recognition (ASR) on LRS3-TED

Audio-Visual Speech Recognition, Automatic Speech Recognition, +4


Audio-Visual Synchronisation in the Wild

no code implementations • 8 Dec 2021 • Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Finally, we set the first benchmark for general audio-visual synchronisation with over 160 diverse classes in the new VGG-Sound Sync video dataset.

Lip Reading


Localizing Visual Sounds the Hard Way

1 code implementation • CVPR 2021 • Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

We show that our algorithm achieves state-of-the-art performance on the popular Flickr SoundNet dataset.

Contrastive Learning


VGGSound: A Large-scale Audio-Visual Dataset

2 code implementations • 29 Apr 2020 • Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

Our goal is to collect a large-scale audio-visual dataset with low label noise from videos in the wild using computer vision techniques.

Image Classification


AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations

no code implementations • 14 Aug 2019 • Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

We propose AutoCorrect, a method to automatically learn object-annotation alignments from a dataset with annotations affected by geometric noise.

Object

