Search Results for author: Egor Lakomkin

Found 14 papers, 1 paper with code

AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs

no code implementations 12 Nov 2023 Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of original LLM capabilities, without using any carefully curated paired data.

Question Answering

End-to-End Speech Recognition Contextualization with Large Language Models

no code implementations 19 Sep 2023 Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen

Overall, we demonstrate that by adding only a handful of trainable parameters via adapters, we can unlock contextualized speech recognition capability for the pretrained LLM while keeping the same text-only input functionality.

Language Modelling speech-recognition +1

Prompting Large Language Models with Speech Recognition Abilities

no code implementations 21 Jul 2023 Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

Furthermore, we perform ablation studies investigating whether the LLM can be completely frozen during training to maintain its original capabilities, the effect of scaling up the audio encoder, and the effect of increasing the audio encoder stride to generate fewer embeddings.

Abstractive Text Summarization Automatic Speech Recognition +3

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

no code implementations CVPR 2023 Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Furthermore, when combined with large-scale pseudo-labeled audio-visual data, SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours).

Lip Reading speech-recognition +1

Egocentric Audio-Visual Noise Suppression

no code implementations 7 Nov 2022 Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu, Kaustubh Kalgaonkar

In this paper, we first demonstrate that egocentric visual information is helpful for noise suppression.

Action Classification Event Detection +3

KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos

1 code implementation EMNLP 2018 Egor Lakomkin, Sven Magg, Cornelius Weber, Stefan Wermter

In this paper, we describe KT-Speech-Crawler: an approach for automatic dataset construction for speech recognition by crawling YouTube videos.

Speech Recognition

Incorporating End-to-End Speech Recognition Models for Sentiment Analysis

no code implementations 28 Feb 2019 Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter

We argue that using ground-truth transcriptions during training and evaluation phases leads to a significant discrepancy in performance compared to real-world conditions, as the spoken text has to be recognized on the fly and can contain speech recognition mistakes.

Automatic Speech Recognition (ASR) +4

On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks

no code implementations 6 Apr 2018 Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter

Speech emotion recognition (SER) is an important aspect of effective human-robot collaboration and has received considerable attention from the research community.

Data Augmentation Speech Emotion Recognition

Reusing Neural Speech Representations for Auditory Emotion Recognition

no code implementations IJCNLP 2017 Egor Lakomkin, Cornelius Weber, Sven Magg, Stefan Wermter

Acoustic emotion recognition aims to categorize the affective state of the speaker and is still a difficult task for machine learning models.

Emotion Recognition General Classification +1

GradAscent at EmoInt-2017: Character- and Word-Level Recurrent Neural Network Models for Tweet Emotion Intensity Detection

no code implementations 30 Mar 2018 Egor Lakomkin, Chandrakant Bothe, Stefan Wermter

Given the text of a tweet and its emotion category (anger, joy, fear, and sadness), the participants were asked to build a system that assigns emotion intensity values.

The OMG-Emotion Behavior Dataset

no code implementations 14 Mar 2018 Pablo Barros, Nikhil Churamani, Egor Lakomkin, Henrique Siqueira, Alexander Sutherland, Stefan Wermter

This paper is the basis for the accepted IJCNN challenge One-Minute Gradual-Emotion Recognition (OMG-Emotion), by which we hope to foster long-term emotion classification using neural models for the benefit of the IJCNN community.

Human-Computer Interaction

GradAscent at EmoInt-2017: Character and Word Level Recurrent Neural Network Models for Tweet Emotion Intensity Detection

no code implementations WS 2017 Egor Lakomkin, Chandrakant Bothe, Stefan Wermter

Given the text of a tweet and its emotion category (anger, joy, fear, and sadness), the participants were asked to build a system that assigns emotion intensity values.

Language Modelling Machine Translation +2
