Search Results for author: Shang-Wen Li

Found 51 papers, 24 papers with code

Self-supervised Representation Learning for Speech Processing

1 code implementation NAACL (ACL) 2022 Hung-Yi Lee, Abdelrahman Mohamed, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff

Due to the growing popularity of SSL, and the shared mission across these areas of bringing speech and language technologies to more use cases with better quality and scaling the technologies to under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievements in speech processing.

Representation Learning

Demystifying CLIP Data

2 code implementations 28 Sep 2023 Hu Xu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes, Vasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer

We believe that the main ingredient to the success of CLIP is its data and not the model architecture or pre-training objective.

SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks

1 code implementation 31 Mar 2022 Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li, Hung-Yi Lee

We report in this paper the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM).

Language Modelling · Self-Supervised Learning
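
Prompt tuning, in the general sense used here, freezes a pre-trained backbone and trains only a short sequence of continuous prompt vectors prepended to the input. Below is a minimal PyTorch sketch of that idea; the tiny transformer and all dimensions are illustrative stand-ins, not the paper's actual GSLM.

```python
import torch
import torch.nn as nn

# Toy stand-in for a frozen generative model over discrete speech units.
vocab_size, d_model, prompt_len = 100, 64, 8
embed = nn.Embedding(vocab_size, d_model)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
head = nn.Linear(d_model, vocab_size)
for module in (embed, backbone, head):
    for p in module.parameters():
        p.requires_grad = False  # the backbone stays frozen

# The only trainable parameters: continuous prompt vectors.
prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
optimizer = torch.optim.Adam([prompt], lr=1e-3)

tokens = torch.randint(0, vocab_size, (2, 20))   # batch of unit sequences
targets = torch.randint(0, vocab_size, (2, 20))  # task outputs, also as units

x = embed(tokens)
x = torch.cat([prompt.unsqueeze(0).expand(x.size(0), -1, -1), x], dim=1)
logits = head(backbone(x))[:, prompt_len:, :]    # drop the prompt positions
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()
optimizer.step()
```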

Pairwise Supervised Contrastive Learning of Sentence Representations

1 code implementation EMNLP 2021 Dejiao Zhang, Shang-Wen Li, Wei Xiao, Henghui Zhu, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

Many recent successes in sentence representation learning have been achieved by simply fine-tuning on the Natural Language Inference (NLI) datasets with triplet loss or siamese loss.

Contrastive Learning · Natural Language Inference · +4
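
The triplet-loss fine-tuning the abstract refers to can be sketched in a few lines: the NLI premise serves as the anchor, an entailed hypothesis as the positive, and a contradicted hypothesis as the negative. The random tensors below are placeholders for real sentence-encoder outputs.

```python
import torch
import torch.nn.functional as F

# Anchor = premise, positive = entailed hypothesis, negative = contradiction.
# Random embeddings stand in for the outputs of a sentence encoder.
anchor = torch.randn(4, 128, requires_grad=True)
positive = torch.randn(4, 128)
negative = torch.randn(4, 128)

# Pulls entailed pairs together and pushes contradicted pairs apart.
loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
loss.backward()
```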

QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition

1 code implementation 3 Mar 2022 Andy T. Liu, Wei Xiao, Henghui Zhu, Dejiao Zhang, Shang-Wen Li, Andrew Arnold

Recently, prompt-based learning for pre-trained language models has succeeded in few-shot Named Entity Recognition (NER) by exploiting prompts as task guidance to increase label efficiency.

Few-shot NER · Named Entity Recognition · +2
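
The prompting idea can be illustrated by recasting entity extraction as extractive QA: one question per entity type, which a QA model answers with a span from the sentence (or no span). The sentence and question templates below are hypothetical examples, not the paper's exact prompts.

```python
# Hypothetical illustration of casting NER as extractive question answering.
sentence = "Shang-Wen Li works at Meta in Menlo Park."

# One question per entity type; a QA model would extract the answer span.
prompts = {
    "PER": f"Who is the person mentioned in the text? {sentence}",
    "ORG": f"What is the organization mentioned in the text? {sentence}",
    "LOC": f"What is the location mentioned in the text? {sentence}",
}

for label, question in prompts.items():
    print(label, "->", question)
```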

Knowledge Grounded Conversational Symptom Detection with Graph Memory Networks

1 code implementation EMNLP (ClinicalNLP) 2020 Hongyin Luo, Shang-Wen Li, James Glass

Given a set of explicit symptoms provided by the patient to initiate a diagnostic dialog, the system is trained to collect implicit symptoms by asking questions, in order to gather more information for making an accurate diagnosis.

Goal-Oriented Dialog

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

1 code implementation 26 May 2023 Yung-Sung Chuang, Wei Fang, Shang-Wen Li, Wen-tau Yih, James Glass

We propose EAR, a query Expansion And Reranking approach for improving passage retrieval, with the application to open-domain question answering.

Open-Domain Question Answering · Passage Retrieval · +1
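
Schematically, such a pipeline expands the query into several variants, reranks the variants by expected retrieval quality, and retrieves with the winner. The sketch below captures only that control flow; every function is a hypothetical placeholder, not the paper's API.

```python
# Schematic expand-then-rerank-then-retrieve pipeline; all placeholders.

def expand(query: str, n: int = 5) -> list[str]:
    """Generate n query variants (e.g., with a query-expansion model)."""
    return [f"{query} variant {i}" for i in range(n)]

def rerank(query: str, candidates: list[str]) -> str:
    """Score each variant for expected retrieval quality; keep the best."""
    return max(candidates, key=len)  # placeholder scoring function

def retrieve(query: str, k: int = 10) -> list[str]:
    """Run the chosen query against a passage retriever such as BM25."""
    return [f"passage-{i} for '{query}'" for i in range(k)]

question = "who wrote the opera carmen"
best_query = rerank(question, expand(question))
passages = retrieve(best_query)
print(passages[:2])
```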

Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition

2 code implementations 27 Mar 2022 Guan-Ting Lin, Shang-Wen Li, Hung-Yi Lee

Although deep learning-based end-to-end Automatic Speech Recognition (ASR) has shown remarkable performance in recent years, it suffers severe performance regression on test samples drawn from different data distributions.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2
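
One common recipe for adapting to a single test utterance without source data is to take a few gradient steps that minimize the entropy of the model's output distribution, then decode with the adapted weights. The sketch below shows that generic recipe on a toy network; it is not necessarily the paper's exact objective.

```python
import torch
import torch.nn as nn

# Toy stand-in for an ASR acoustic model: 40-dim features -> 30 output units.
model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 30))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

utterance = torch.randn(1, 200, 40)  # one test utterance: (batch, frames, feats)

for _ in range(3):  # a few adaptation steps on this single utterance
    probs = model(utterance).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
# Decode with the adapted model, then discard the update before the next utterance.
```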

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

2 code implementations 19 May 2023 Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath

In this paper, we show that representations capturing syllabic units emerge when training a self-supervised speech model with a visually-grounded training objective.

Language Modelling · Masked Language Modeling · +3

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

1 code implementation 26 Oct 2020 Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass

Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

Language Modelling · Spoken Language Understanding

Cooperative Self-training of Machine Reading Comprehension

1 code implementation NAACL 2022 Hongyin Luo, Shang-Wen Li, Mingye Gao, Seunghak Yu, James Glass

Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings.

Extractive Question-Answering · Machine Reading Comprehension · +6
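
The standard extractive-QA setup alluded to here places a small span-prediction head on top of contextualized token embeddings: two logits per token mark candidate answer start and end positions. A minimal sketch, with a random tensor standing in for real encoder outputs:

```python
import torch
import torch.nn as nn

hidden = torch.randn(1, 50, 768)  # (batch, tokens, hidden) from a pretrained encoder
span_head = nn.Linear(768, 2)     # start and end logits for every token

start_logits, end_logits = span_head(hidden).split(1, dim=-1)
start = start_logits.squeeze(-1).argmax(-1)
end = end_logits.squeeze(-1).argmax(-1)
print(f"predicted answer span: tokens {start.item()}..{end.item()}")
```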

A Coarse-to-Fine Indoor Layout Estimation (CFILE) Method

1 code implementation 3 Jul 2016 Yuzhuo Ren, Chen Chen, Shang-Wen Li, C.-C. Jay Kuo

The task of estimating the spatial layout of cluttered indoor scenes from a single RGB image is addressed in this work.

Measuring and Predicting Tag Importance for Image Retrieval

no code implementations 28 Feb 2016 Shang-Wen Li, Sanjay Purushotham, Chen Chen, Yuzhuo Ren, C.-C. Jay Kuo

Textual data such as tags and sentence descriptions are combined with visual cues to reduce the semantic gap for image retrieval applications in today's Multimodal Image Retrieval (MIR) systems.

Image Retrieval · Retrieval · +2

GAL: A Global-Attributes Assisted Labeling System for Outdoor Scenes

no code implementations 3 Apr 2016 Yuzhuo Ren, Chen Chen, Shang-Wen Li, C.-C. Jay Kuo

The proposed Global-attributes Assisted Labeling (GAL) system exploits both local features and global attributes.

Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption

no code implementations 19 May 2020 Hongyin Luo, Shang-Wen Li, James Glass

Experiments showed that ProtoQN significantly outperformed the baseline DQN model in both supervised and few-shot learning scenarios, and achieved state-of-the-art few-shot learning performance.

Few-Shot Learning

Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding

no code implementations 9 Oct 2020 Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-Wen Li

The light encoder architecture separates the shared pre-trained networks from the mappings of generally encoded knowledge to specific SLU domains, allowing domain adaptation to be performed solely at the light encoder and thus increasing efficiency.

Domain Adaptation · Language Modelling · +1
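
A hedged reading of this design: the large shared pre-trained network stays frozen, and only a small per-domain light encoder is trained, so adapting to a new domain touches few parameters. The module sizes and domain names below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Large shared encoder, frozen after pre-training.
shared = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 512))
for p in shared.parameters():
    p.requires_grad = False

# One small trainable light encoder per SLU domain (hypothetical domains).
light_encoders = {
    "music": nn.Linear(512, 32),
    "weather": nn.Linear(512, 32),
}

features = torch.randn(8, 80)
logits = light_encoders["music"](shared(features))  # only the light encoder adapts
```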

Towards Semi-Supervised Semantics Understanding from Speech

no code implementations 11 Nov 2020 Cheng-I Lai, Jin Cao, Sravan Bodapati, Shang-Wen Li

Much recent work on Spoken Language Understanding (SLU) falls short in at least one of three ways: models were trained on oracle text input and neglected the Automatic Speech Recognition (ASR) outputs, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

speech-recognition · Speech Recognition · +1

Educational Content Linking for Enhancing Learning Need Remediation in MOOCs

no code implementations 31 Dec 2020 Shang-Wen Li

By linking and organizing pieces of learning content scattered in various course materials into an easily accessible structure, we hypothesize that this framework can provide learners guidance and improve content navigation.

Meta-learning for downstream aware and agnostic pretraining

no code implementations 6 Jun 2021 Hongyin Luo, Shuyan Dong, Yung-Sung Chuang, Shang-Wen Li

Neural network pretraining is gaining attention due to its outstanding performance in natural language processing applications.

Meta-Learning

Self-Supervised Speech Representation Learning: A Review

no code implementations 21 May 2022 Abdelrahman Mohamed, Hung-Yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +3

SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks

no code implementations 1 Mar 2023 Kai-Wei Chang, Yu-Kai Wang, Hua Shen, Iu-thing Kang, Wei-Cheng Tseng, Shang-Wen Li, Hung-Yi Lee

For speech processing, SpeechPrompt shows its high parameter efficiency and competitive performance on a few speech classification tasks.

Ranked #17 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Classification · Language Modelling · +1

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

no code implementations 18 May 2023 Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe

Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks.

Automatic Speech Recognition · Language Identification · +3

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

no code implementations 9 Oct 2023 Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe

The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification.

Language Identification · speech-recognition · +1

An Exploration of In-Context Learning for Speech Language Model

no code implementations 19 Oct 2023 Ming-Hao Hsu, Kai-Wei Chang, Shang-Wen Li, Hung-Yi Lee

Despite the success of ICL in NLP, little work has explored the possibility of ICL in speech processing.

Few-Shot Learning · In-Context Learning · +1

FLAP: Fast Language-Audio Pre-training

no code implementations 2 Nov 2023 Ching-Feng Yeh, Po-Yao Huang, Vasu Sharma, Shang-Wen Li, Gargi Ghosh

We propose Fast Language-Audio Pre-training (FLAP), a self-supervised approach that efficiently and effectively learns aligned audio and language representations through masking, contrastive learning and reconstruction.

AudioCaps · Contrastive Learning · +2
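
The contrastive-learning ingredient can be sketched as a symmetric InfoNCE loss over paired audio and text embeddings, CLIP-style: matching pairs sit on the diagonal of the similarity matrix. The embeddings and temperature below are illustrative placeholders for real encoder outputs.

```python
import torch
import torch.nn.functional as F

batch, dim = 16, 256
audio = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)
text = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)

temperature = 0.07
logits = audio @ text.t() / temperature  # pairwise audio-text similarities
labels = torch.arange(batch)             # matching pairs lie on the diagonal
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
loss.backward()
```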

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

no code implementations 24 Jan 2024 Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Lin-shan Lee

However, the real-world problem of Open-domain SQA (openSQA), in which the machine additionally needs to first retrieve from a spoken archive the passages that possibly contain the answer, had never been considered.

Passage Retrieval · Question Answering · +4
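
As the name suggests, retrieval in the DPR family scores passages by the dot product between a question embedding and precomputed passage embeddings. A minimal sketch of that bi-encoder scoring, with random vectors standing in for the two encoders:

```python
import torch

num_passages, dim = 1000, 768
passage_index = torch.randn(num_passages, dim)  # precomputed passage embeddings
question_vec = torch.randn(dim)                 # embedding of the (spoken) question

scores = passage_index @ question_vec           # dot-product relevance scores
topk = scores.topk(5).indices                   # indices of the top-5 passages
print("top passages:", topk.tolist())
```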
