Search Results for author: Shaoshi Ling

Found 9 papers, 5 papers with code

Efficient Long-Form Speech Recognition for General Speech In-Context Learning

no code implementations · 29 Sep 2024 · Hao Yen, Shaoshi Ling, Guoli Ye

We propose a novel approach to end-to-end automatic speech recognition (ASR) to achieve efficient speech in-context learning (SICL) for (i) long-form speech decoding, (ii) test-time speaker adaptation, and (iii) test-time contextual biasing.

Tasks: Automatic Speech Recognition (ASR), +3 more
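
The entry above lists the three use cases rather than the mechanism. As a loose illustration of the general idea behind speech in-context learning (a sketch under assumptions, not the paper's implementation), example (audio, transcript) pairs can be packed into a decoding prompt so the model adapts at test time without parameter updates; every name below is hypothetical.

```python
# Minimal sketch (not the paper's method) of framing speech in-context learning:
# the decoder is conditioned on a prompt built from example (audio, transcript)
# pairs before decoding the target utterance. All names are hypothetical.
from typing import List, Tuple

def build_sicl_prompt(
    context_pairs: List[Tuple[list, str]],  # [(audio_features, transcript), ...]
    target_audio: list,
):
    """Concatenate context examples ahead of the target utterance.

    Each context pair supplies acoustic evidence plus its reference text,
    which lets the decoder adapt to the speaker/domain at test time
    without any parameter updates.
    """
    prompt_audio, prompt_text = [], []
    for audio, transcript in context_pairs:
        prompt_audio.extend(audio)      # acoustic context
        prompt_text.append(transcript)  # textual context (e.g. biasing phrases)
    return {
        "encoder_input": prompt_audio + target_audio,
        "decoder_prefix": " ".join(prompt_text),
    }

# Example: bias decoding toward a rare name seen in a previous segment.
prompt = build_sicl_prompt(
    context_pairs=[([0.1, 0.2, 0.3], "meeting with Shaoshi Ling")],
    target_audio=[0.4, 0.5, 0.6],
)
print(prompt["decoder_prefix"])
```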

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

no code implementations · 14 Sep 2023 · Shaoshi Ling, Guoli Ye, Rui Zhao, Yifan Gong

The attention-based encoder-decoder (AED) speech recognition model has achieved widespread success in recent years.

Tasks: Automatic Speech Recognition, Decoder, +4 more
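
For orientation, a generic AED ASR model looks roughly like the sketch below: an acoustic encoder whose outputs are attended over by an autoregressive decoder. This is a textbook-style PyTorch sketch with placeholder sizes, not the hybrid architecture the paper proposes.

```python
# Minimal generic attention-based encoder-decoder (AED) ASR model in PyTorch.
# Hyperparameters are placeholders; this is for illustration only.
import torch
import torch.nn as nn

class TinyAED(nn.Module):
    def __init__(self, feat_dim=80, vocab_size=1000, d_model=256):
        super().__init__()
        self.encoder_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)  # acoustic modeling
        self.embed = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)  # LM-like decoder
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, feats, prev_tokens):
        memory = self.encoder(self.encoder_proj(feats))        # (B, T, d_model)
        tgt = self.embed(prev_tokens)                          # (B, U, d_model)
        # Additive causal mask so each output token attends only to its past.
        causal = torch.triu(
            torch.full((tgt.size(1), tgt.size(1)), float("-inf")), diagonal=1
        )
        dec = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(dec)                                   # token logits

model = TinyAED()
logits = model(torch.randn(2, 120, 80), torch.randint(0, 1000, (2, 10)))
print(logits.shape)  # torch.Size([2, 10, 1000])
```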

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

1 code implementation · 17 Jul 2023 · Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong, Ed Lin, Michael Zeng

Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions.

Tasks: Decoder, Language Modeling, +4 more
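
A common way to realize this encoder/decoder split, and presumably the setting this line of work builds on, is to let a speech encoder handle acoustics while a decoder-only language model handles the text. The sketch below (an assumed pattern with placeholder modules, not the paper's code) projects audio features into the LM's embedding space and prepends them to the text embeddings.

```python
# Rough sketch of coupling a speech encoder with a decoder-only language model:
# acoustic features are encoded, projected into the LM's embedding space, and
# consumed by the LM as a prefix before the text tokens. All modules are stand-ins.
import torch
import torch.nn as nn

d_audio, d_lm, vocab = 512, 768, 32000

speech_encoder = nn.GRU(input_size=80, hidden_size=d_audio, batch_first=True)  # stand-in acoustic encoder
audio_to_lm = nn.Linear(d_audio, d_lm)                                          # modality adapter
text_embed = nn.Embedding(vocab, d_lm)                                          # stand-in for the LM embedding table

feats = torch.randn(1, 200, 80)              # ~2 s of log-mel frames (placeholder)
tokens = torch.randint(0, vocab, (1, 8))     # partial transcript tokens

audio_states, _ = speech_encoder(feats)      # (1, 200, d_audio)
audio_prefix = audio_to_lm(audio_states)     # (1, 200, d_lm)
lm_input = torch.cat([audio_prefix, text_embed(tokens)], dim=1)  # audio prefix + text
print(lm_input.shape)                        # torch.Size([1, 208, 768])
# In practice, lm_input would be passed through the (frozen or adapted) LM
# decoder blocks, so the LM performs the language-modeling function while the
# encoder handles acoustics.
```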

Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask

no code implementations · 8 Oct 2021 · Shaoshi Ling, Chen Shen, Meng Cai, Zejun Ma

In the recent trend of semi-supervised speech recognition, both self-supervised representation learning and pseudo-labeling have shown promising results.

Tasks: Pseudo Label, Representation Learning, +2 more
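
A generic form of pseudo-label training masks unreliable supervision so that low-confidence frames contribute no gradient. The sketch below illustrates that broad idea with a per-frame confidence threshold; the paper's specific gradient-mask mechanism may differ.

```python
# Generic sketch of pseudo-label training with a per-frame confidence mask.
# This illustrates the broad idea of masking unreliable supervision; it is not
# the paper's exact "gradient mask" method.
import torch
import torch.nn.functional as F

def masked_pseudo_label_loss(student_logits, teacher_logits, threshold=0.9):
    """Cross-entropy against teacher pseudo-labels, keeping only confident frames."""
    teacher_probs = teacher_logits.softmax(dim=-1)
    confidence, pseudo_labels = teacher_probs.max(dim=-1)           # (B, T)
    mask = (confidence >= threshold).float()                        # 1 = trusted frame
    per_frame = F.cross_entropy(
        student_logits.transpose(1, 2), pseudo_labels, reduction="none"
    )                                                               # (B, T)
    # Frames below the threshold contribute zero loss, so no gradient flows from them.
    return (per_frame * mask).sum() / mask.sum().clamp(min=1.0)

loss = masked_pseudo_label_loss(torch.randn(2, 50, 100), torch.randn(2, 50, 100))
print(loss.item())
```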

DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization

1 code implementation · 11 Dec 2020 · Shaoshi Ling, Yuzong Liu

In speech representation learning, a large amount of unlabeled data is used in a self-supervised manner to learn a feature representation.

Tasks: Diversity, Quantization, +4 more
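
Vector quantization in this setting typically means snapping each continuous frame representation to its nearest codebook entry and passing gradients through with a straight-through estimator. The sketch below shows a generic VQ layer of that kind, not DeCoAR 2.0's exact module.

```python
# Minimal sketch of vector quantization as used in many self-supervised speech
# models: continuous frame representations are mapped to their nearest codebook
# entries, with a straight-through estimator for gradients. Generic, illustrative only.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=320, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, x):                                # x: (B, T, dim)
        flat = x.reshape(-1, x.size(-1))                 # (B*T, dim)
        dists = torch.cdist(flat, self.codebook.weight)  # distance to each code
        codes = dists.argmin(dim=-1)                     # nearest code index per frame
        quantized = self.codebook(codes).view_as(x)
        # Straight-through: forward uses the discrete codes, backward passes
        # gradients to the continuous representations.
        quantized = x + (quantized - x).detach()
        return quantized, codes.view(x.shape[:-1])

vq = VectorQuantizer()
q, idx = vq(torch.randn(2, 100, 256))
print(q.shape, idx.shape)  # torch.Size([2, 100, 256]) torch.Size([2, 100])
```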

BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition

1 code implementation · 30 Jun 2019 · Shaoshi Ling, Julian Salazar, Yuzong Liu, Katrin Kirchhoff

We introduce BERTphone, a Transformer encoder trained on large speech corpora that outputs phonetically-aware contextual representation vectors that can be used for both speaker and language recognition.

Tasks: Avg, Representation Learning, +2 more
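
For speaker or language recognition, frame-level contextual vectors are usually pooled into a single utterance-level embedding and compared by cosine similarity. The sketch below shows one simple pooling and scoring scheme; it is illustrative rather than BERTphone's exact recipe.

```python
# Sketch of pooling frame-level contextual representations (e.g. from a
# phonetically-aware Transformer encoder) into an utterance-level vector and
# scoring pairs with cosine similarity. Pooling/scoring choices are illustrative.
import torch
import torch.nn.functional as F

def utterance_embedding(frame_reprs):          # (T, d) frame-level vectors
    mean = frame_reprs.mean(dim=0)
    std = frame_reprs.std(dim=0)
    return torch.cat([mean, std])              # simple statistics pooling -> (2d,)

enroll = utterance_embedding(torch.randn(300, 144))
test = utterance_embedding(torch.randn(250, 144))
score = F.cosine_similarity(enroll, test, dim=0)
print(score.item())                            # higher = more likely same speaker/language
```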
