1 code implementation • 23 Feb 2024 • Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro
In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements.
Ranked #4 on Lipreading on LRS3-TED (using extra training data)
no code implementations • 18 Jan 2024 • Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro
Using visual speech units as the inputs to our system, we pre-train the model to predict the corresponding text outputs on massive multilingual data constructed by merging several VSR databases.
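The pipeline this snippet describes, discretizing continuous lip features into "visual speech units" and training a model to map unit sequences to text, can be sketched roughly as below. The k-means quantizer, feature shapes, unit inventory size, and toy transformer are illustrative assumptions, not the paper's exact components:

```python
# Hedged sketch: quantize lip-movement features into discrete "visual
# speech units" via k-means, then feed the unit ids to a unit-to-text
# model. All sizes here are stand-ins for illustration only.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Suppose `feats` are frame-level visual features from a pre-trained
# video encoder: (num_frames, feat_dim). Random stand-ins here.
feats = torch.randn(500, 256)

# 1) Learn a unit inventory by clustering features (done offline, once).
kmeans = KMeans(n_clusters=100, n_init=10).fit(feats.numpy())

# 2) Map each frame to its nearest cluster id -> a discrete unit sequence.
units = torch.as_tensor(kmeans.predict(feats.numpy()), dtype=torch.long)

# 3) Toy unit-to-text model: embed unit ids, encode, project to characters.
model = nn.Sequential(
    nn.Embedding(num_embeddings=100, embedding_dim=256),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
        num_layers=2,
    ),
    nn.Linear(256, 40),  # 40 = assumed character vocabulary size
)
logits = model(units.unsqueeze(0))  # (1, num_frames, 40)
print(logits.shape)
```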
no code implementations • 15 Sep 2023 • Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro
To this end, we begin by importing rich image-comprehension and language-modeling knowledge from a large-scale pre-trained vision-language model into Im2Sp.
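As a rough illustration of "importing" pre-trained visual knowledge into an image-to-speech model, the sketch below initializes an encoder from a pre-trained torchvision backbone and attaches a toy spectrogram head. The backbone, head, and shapes are stand-in assumptions; Im2Sp's actual design draws on a large-scale vision-language model rather than this ResNet:

```python
# Hedged sketch of initializing an image-to-speech model from pre-trained
# vision weights. Architecture and sizes are illustrative only.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class Im2SpSketch(nn.Module):
    def __init__(self, num_mel_bins: int = 80):
        super().__init__()
        # Encoder starts from large-scale pre-trained weights, so it
        # begins with general image-comprehension knowledge.
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        # Toy head mapping the image embedding to a short mel spectrogram
        # (num_mel_bins x 50 frames) -- a stand-in for a real speech decoder.
        self.decoder = nn.Linear(512, num_mel_bins * 50)
        self.num_mel_bins = num_mel_bins

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        z = self.encoder(images).flatten(1)             # (B, 512)
        mel = self.decoder(z)                           # (B, 80 * 50)
        return mel.view(-1, self.num_mel_bins, 50)      # (B, 80, 50)

model = Im2SpSketch()
print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 80, 50])
```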
no code implementations • 15 Sep 2023 • Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro
Unlike previous methods that tried to improve VSR performance for a target language using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for different languages without human intervention.
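One way to grow training data without human intervention, consistent with this snippet, is to pseudo-label the audio track of unlabeled multilingual videos with an off-the-shelf ASR model and pair each transcript with the video's lip frames. The sketch below uses Whisper purely as an illustrative labeler; the paper's actual pipeline may differ:

```python
# Hedged sketch: auto-transcribe unlabeled videos' audio to create
# pseudo-labeled VSR training pairs. Whisper is an illustrative choice.
import whisper  # pip install openai-whisper

asr = whisper.load_model("base")  # multilingual checkpoint

def auto_label(video_paths):
    """Yield (video_path, pseudo_transcript) pairs for VSR training."""
    for path in video_paths:
        # Whisper decodes the audio stream from the media file via ffmpeg
        # and auto-detects the spoken language.
        result = asr.transcribe(path)
        yield path, result["text"]

# Hypothetical file names, for illustration only.
for path, text in auto_label(["clip_es.mp4", "clip_fr.mp4"]):
    print(path, "->", text)
```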
no code implementations • ICCV 2023 • Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro
To mitigate this challenge, we learn general speech knowledge, the ability to model lip movements, from a high-resource language through the prediction of speech units.
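The pre-training objective hinted at here, predicting discrete speech units from lip frames of a high-resource language, might look roughly like the following. The conv front-end, the unit inventory of 500, and all shapes are assumptions for illustration:

```python
# Hedged sketch of unit-prediction pre-training: a visual front-end
# classifies each lip frame into a discrete speech unit (e.g., a cluster
# id derived from self-supervised audio features). Sizes are illustrative.
import torch
import torch.nn as nn

NUM_UNITS = 500  # assumed size of the speech-unit inventory

class LipToUnits(nn.Module):
    def __init__(self):
        super().__init__()
        # Toy visual front-end over per-frame lip crops (B, T, 1, 88, 88).
        self.frontend = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(32, NUM_UNITS)

    def forward(self, lips):
        b, t = lips.shape[:2]
        feats = self.frontend(lips.flatten(0, 1))     # (B*T, 32)
        return self.classifier(feats).view(b, t, -1)  # (B, T, NUM_UNITS)

model = LipToUnits()
lips = torch.randn(2, 16, 1, 88, 88)            # fake lip-crop clips
targets = torch.randint(0, NUM_UNITS, (2, 16))  # unit ids from the audio side
logits = model(lips)
loss = nn.functional.cross_entropy(logits.flatten(0, 1), targets.flatten())
loss.backward()  # "general speech knowledge" is learned through this loss
print(float(loss))
```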
no code implementations • 15 Aug 2023 • Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements.
no code implementations • 8 May 2023 • Jeong Hun Yeo, Minsu Kim, Yong Man Ro
Visual Speech Recognition (VSR) is the task of predicting a sentence or word from lip movements.
Automatic Speech Recognition (ASR) +3
1 code implementation • The AAAI Conference on Artificial Intelligence (AAAI) 2022 • Minsu Kim, Jeong Hun Yeo, Yong Man Ro
With its multi-head key memories, MVM extracts possible candidate audio features from the memory, which lets the lip reading model consider which pronunciations the input lip movement could represent (a rough sketch of this addressing scheme follows this entry).
Ranked #2 on Lipreading on CAS-VSR-W1k (LRW-1000)
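A minimal sketch of a multi-head key memory consistent with the MVM description above: a visual query addresses several learnable key memories, and each head reads an attention-weighted candidate audio feature from a value memory. Slot counts, dimensions, and the single shared value memory are assumptions, not the published configuration:

```python
# Hedged sketch of a multi-head key memory. Each head matches the visual
# query against its own key memory, so different heads can surface
# different plausible pronunciations for the same lip movement.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadKeyMemory(nn.Module):
    def __init__(self, dim=256, slots=64, heads=4):
        super().__init__()
        # One learnable key memory per head; a single value memory that
        # stands in for saved audio representations.
        self.keys = nn.Parameter(torch.randn(heads, slots, dim))
        self.values = nn.Parameter(torch.randn(slots, dim))

    def forward(self, visual_query):  # (B, dim)
        # Address every head's key memory with the same visual query.
        scores = torch.einsum('bd,hsd->bhs', visual_query, self.keys)
        attn = F.softmax(scores / visual_query.shape[-1] ** 0.5, dim=-1)
        # Each head reads out one candidate audio feature -> (B, heads, dim).
        return torch.einsum('bhs,sd->bhd', attn, self.values)

mem = MultiHeadKeyMemory()
candidates = mem(torch.randn(2, 256))
print(candidates.shape)  # torch.Size([2, 4, 256])
```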