Search Results for author: Shang-Wen Li

Found 51 papers, 24 papers with code

Self-supervised Representation Learning for Speech Processing

1 code implementation NAACL (ACL) 2022 Hung-Yi Lee, Abdelrahman Mohamed, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff

Due to the growing popularity of SSL, and the shared mission across these areas of bringing speech and language technologies to more use cases with better quality and scaling the technologies to under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievements in speech processing.

Representation Learning

Demystifying CLIP Data

2 code implementations 28 Sep 2023 Hu Xu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes, Vasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer

We believe that the main ingredient to the success of CLIP is its data and not the model architecture or pre-training objective.

SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks

1 code implementation 31 Mar 2022 Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li, Hung-Yi Lee

We report in this paper the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM).

Language Modelling · Self-Supervised Learning
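
Prompt tuning, in the general sense used here, freezes a pre-trained backbone and trains only a short sequence of continuous prompt vectors prepended to the input. Below is a minimal PyTorch sketch of that idea; the tiny transformer and all dimensions are illustrative stand-ins, not the paper's actual GSLM.

```python
import torch
import torch.nn as nn

# Toy stand-in for a frozen generative model over discrete speech units.
vocab_size, d_model, prompt_len = 100, 64, 8
embed = nn.Embedding(vocab_size, d_model)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
head = nn.Linear(d_model, vocab_size)
for module in (embed, backbone, head):
    for p in module.parameters():
        p.requires_grad = False  # the backbone stays frozen

# The only trainable parameters: continuous prompt vectors.
prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
optimizer = torch.optim.Adam([prompt], lr=1e-3)

tokens = torch.randint(0, vocab_size, (2, 20))   # batch of unit sequences
targets = torch.randint(0, vocab_size, (2, 20))  # task outputs, also as units

x = embed(tokens)
x = torch.cat([prompt.unsqueeze(0).expand(x.size(0), -1, -1), x], dim=1)
logits = head(backbone(x))[:, prompt_len:, :]    # drop the prompt positions
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()
optimizer.step()
```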

Pairwise Supervised Contrastive Learning of Sentence Representations

1 code implementation EMNLP 2021 Dejiao Zhang, Shang-Wen Li, Wei Xiao, Henghui Zhu, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

Many recent successes in sentence representation learning have been achieved by simply fine-tuning on the Natural Language Inference (NLI) datasets with triplet loss or siamese loss.

Contrastive Learning · Natural Language Inference · +4
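
The triplet-loss fine-tuning the abstract refers to can be sketched in a few lines: the NLI premise serves as the anchor, an entailed hypothesis as the positive, and a contradicted hypothesis as the negative. The random tensors below are placeholders for real sentence-encoder outputs.

```python
import torch
import torch.nn.functional as F

# Anchor = premise, positive = entailed hypothesis, negative = contradiction.
# Random embeddings stand in for the outputs of a sentence encoder.
anchor = torch.randn(4, 128, requires_grad=True)
positive = torch.randn(4, 128)
negative = torch.randn(4, 128)

# Pulls entailed pairs together and pushes contradicted pairs apart.
loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
loss.backward()
```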

QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition

1 code implementation 3 Mar 2022 Andy T. Liu, Wei Xiao, Henghui Zhu, Dejiao Zhang, Shang-Wen Li, Andrew Arnold

Recently, prompt-based learning for pre-trained language models has succeeded in few-shot Named Entity Recognition (NER) by exploiting prompts as task guidance to increase label efficiency.

Few-shot NER · Named Entity Recognition · +2
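
The prompting idea can be illustrated by recasting entity extraction as extractive QA: one question per entity type, which a QA model answers with a span from the sentence (or no span). The sentence and question templates below are hypothetical examples, not the paper's exact prompts.

```python
# Hypothetical illustration of casting NER as extractive question answering.
sentence = "Shang-Wen Li works at Meta in Menlo Park."

# One question per entity type; a QA model would extract the answer span.
prompts = {
    "PER": f"Who is the person mentioned in the text? {sentence}",
    "ORG": f"What is the organization mentioned in the text? {sentence}",
    "LOC": f"What is the location mentioned in the text? {sentence}",
}

for label, question in prompts.items():
    print(label, "->", question)
```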

Knowledge Grounded Conversational Symptom Detection with Graph Memory Networks

1 code implementation EMNLP (ClinicalNLP) 2020 Hongyin Luo, Shang-Wen Li, James Glass

Given a set of explicit symptoms provided by the patient to initiate a diagnostic dialog, the system is trained to collect implicit symptoms by asking questions, in order to gather more information for making an accurate diagnosis.

Goal-Oriented Dialog

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

1 code implementation 26 May 2023 Yung-Sung Chuang, Wei Fang, Shang-Wen Li, Wen-tau Yih, James Glass

We propose EAR, a query Expansion And Reranking approach for improving passage retrieval, with the application to open-domain question answering.

Open-Domain Question Answering · Passage Retrieval · +1
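
Schematically, such a pipeline expands the query into several variants, reranks the variants by expected retrieval quality, and retrieves with the winner. The sketch below captures only that control flow; every function is a hypothetical placeholder, not the paper's API.

```python
# Schematic expand-then-rerank-then-retrieve pipeline; all placeholders.

def expand(query: str, n: int = 5) -> list[str]:
    """Generate n query variants (e.g., with a query-expansion model)."""
    return [f"{query} variant {i}" for i in range(n)]

def rerank(query: str, candidates: list[str]) -> str:
    """Score each variant for expected retrieval quality; keep the best."""
    return max(candidates, key=len)  # placeholder scoring function

def retrieve(query: str, k: int = 10) -> list[str]:
    """Run the chosen query against a passage retriever such as BM25."""
    return [f"passage-{i} for '{query}'" for i in range(k)]

question = "who wrote the opera carmen"
best_query = rerank(question, expand(question))
passages = retrieve(best_query)
print(passages[:2])
```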

Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition

2 code implementations 27 Mar 2022 Guan-Ting Lin, Shang-Wen Li, Hung-Yi Lee

Although deep learning-based end-to-end Automatic Speech Recognition (ASR) has shown remarkable performance in recent years, it suffers severe performance regression on test samples drawn from different data distributions.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2
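
One common recipe for adapting to a single test utterance without source data is to take a few gradient steps that minimize the entropy of the model's output distribution, then decode with the adapted weights. The sketch below shows that generic recipe on a toy network; it is not necessarily the paper's exact objective.

```python
import torch
import torch.nn as nn

# Toy stand-in for an ASR acoustic model: 40-dim features -> 30 output units.
model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 30))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

utterance = torch.randn(1, 200, 40)  # one test utterance: (batch, frames, feats)

for _ in range(3):  # a few adaptation steps on this single utterance
    probs = model(utterance).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
# Decode with the adapted model, then discard the update before the next utterance.
```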

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

2 code implementations 19 May 2023 Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath

In this paper, we show that representations capturing syllabic units emerge when training a self-supervised speech model with a visually-grounded training objective.

Language Modelling · Masked Language Modeling · +3

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

1 code implementation 26 Oct 2020 Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass

Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

Language Modelling · Spoken Language Understanding

Cooperative Self-training of Machine Reading Comprehension

1 code implementation NAACL 2022 Hongyin Luo, Shang-Wen Li, Mingye Gao, Seunghak Yu, James Glass

Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings.

Extractive Question-Answering · Machine Reading Comprehension · +6
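
The standard extractive-QA setup alluded to here places a small span-prediction head on top of contextualized token embeddings: two logits per token mark candidate answer start and end positions. A minimal sketch, with a random tensor standing in for real encoder outputs:

```python
import torch
import torch.nn as nn

hidden = torch.randn(1, 50, 768)  # (batch, tokens, hidden) from a pretrained encoder
span_head = nn.Linear(768, 2)     # start and end logits for every token

start_logits, end_logits = span_head(hidden).split(1, dim=-1)
start = start_logits.squeeze(-1).argmax(-1)
end = end_logits.squeeze(-1).argmax(-1)
print(f"predicted answer span: tokens {start.item()}..{end.item()}")
```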

A Coarse-to-Fine Indoor Layout Estimation (CFILE) Method

1 code implementation 3 Jul 2016 Yuzhuo Ren, Chen Chen, Shang-Wen Li, C.-C. Jay Kuo

The task of estimating the spatial layout of cluttered indoor scenes from a single RGB image is addressed in this work.

Measuring and Predicting Tag Importance for Image Retrieval

no code implementations 28 Feb 2016 Shang-Wen Li, Sanjay Purushotham, Chen Chen, Yuzhuo Ren, C.-C. Jay Kuo

Textual data such as tags and sentence descriptions are combined with visual cues to reduce the semantic gap for image retrieval applications in today's Multimodal Image Retrieval (MIR) systems.

Image Retrieval · Retrieval · +2

GAL: A Global-Attributes Assisted Labeling System for Outdoor Scenes

no code implementations 3 Apr 2016 Yuzhuo Ren, Chen Chen, Shang-Wen Li, C.-C. Jay Kuo

The proposed Global-attributes Assisted Labeling (GAL) system exploits both local features and global attributes.

Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption

no code implementations 19 May 2020 Hongyin Luo, Shang-Wen Li, James Glass

Experiments showed that ProtoQN significantly outperformed the baseline DQN model in both supervised and few-shot learning scenarios, and achieved state-of-the-art few-shot learning performance.

Few-Shot Learning

Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding

no code implementations 9 Oct 2020 Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-Wen Li

The light encoder architecture separates the shared pre-trained networks from the mappings of generally encoded knowledge to specific SLU domains, allowing domain adaptation to be performed solely at the light encoder and thus increasing efficiency.

Domain Adaptation · Language Modelling · +1
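
A hedged reading of this design: the large shared pre-trained network stays frozen, and only a small per-domain light encoder is trained, so adapting to a new domain touches few parameters. The module sizes and domain names below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Large shared encoder, frozen after pre-training.
shared = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 512))
for p in shared.parameters():
    p.requires_grad = False

# One small trainable light encoder per SLU domain (hypothetical domains).
light_encoders = {
    "music": nn.Linear(512, 32),
    "weather": nn.Linear(512, 32),
}

features = torch.randn(8, 80)
logits = light_encoders["music"](shared(features))  # only the light encoder adapts
```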

Towards Semi-Supervised Semantics Understanding from Speech

no code implementations 11 Nov 2020 Cheng-I Lai, Jin Cao, Sravan Bodapati, Shang-Wen Li

Much recent work on Spoken Language Understanding (SLU) falls short in at least one of three ways: models were trained on oracle text input and neglected the Automatic Speech Recognition (ASR) outputs, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

speech-recognition · Speech Recognition · +1

Educational Content Linking for Enhancing Learning Need Remediation in MOOCs

no code implementations 31 Dec 2020 Shang-Wen Li

By linking and organizing pieces of learning content scattered in various course materials into an easily accessible structure, we hypothesize that this framework can provide learners guidance and improve content navigation.

Meta-learning for downstream aware and agnostic pretraining

no code implementations 6 Jun 2021 Hongyin Luo, Shuyan Dong, Yung-Sung Chuang, Shang-Wen Li

Neural network pretraining is gaining attention due to its outstanding performance in natural language processing applications.

Meta-Learning

Self-Supervised Speech Representation Learning: A Review

no code implementations 21 May 2022 Abdelrahman Mohamed, Hung-Yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +3

SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks

no code implementations 1 Mar 2023 Kai-Wei Chang, Yu-Kai Wang, Hua Shen, Iu-thing Kang, Wei-Cheng Tseng, Shang-Wen Li, Hung-Yi Lee

For speech processing, SpeechPrompt shows its high parameter efficiency and competitive performance on a few speech classification tasks.

Ranked #17 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Classification · Language Modelling · +1

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

no code implementations 18 May 2023 Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe

Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks.

Automatic Speech Recognition · Language Identification · +3

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

no code implementations 9 Oct 2023 Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe

The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification.

Language Identification · speech-recognition · +1

An Exploration of In-Context Learning for Speech Language Model

no code implementations 19 Oct 2023 Ming-Hao Hsu, Kai-Wei Chang, Shang-Wen Li, Hung-Yi Lee

Despite the success of ICL in NLP, little work has explored the possibility of ICL in speech processing.

Few-Shot Learning · In-Context Learning · +1

FLAP: Fast Language-Audio Pre-training

no code implementations 2 Nov 2023 Ching-Feng Yeh, Po-Yao Huang, Vasu Sharma, Shang-Wen Li, Gargi Ghosh

We propose Fast Language-Audio Pre-training (FLAP), a self-supervised approach that efficiently and effectively learns aligned audio and language representations through masking, contrastive learning and reconstruction.

AudioCaps · Contrastive Learning · +2
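
The contrastive-learning ingredient can be sketched as a symmetric InfoNCE loss over paired audio and text embeddings, CLIP-style: matching pairs sit on the diagonal of the similarity matrix. The embeddings and temperature below are illustrative placeholders for real encoder outputs.

```python
import torch
import torch.nn.functional as F

batch, dim = 16, 256
audio = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)
text = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)

temperature = 0.07
logits = audio @ text.t() / temperature  # pairwise audio-text similarities
labels = torch.arange(batch)             # matching pairs lie on the diagonal
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
loss.backward()
```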

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

no code implementations 24 Jan 2024 Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Lin-shan Lee

However, the real-world problem of Open-domain SQA (openSQA), in which the machine additionally needs to first retrieve from a spoken archive the passages that possibly contain the answer, had never been considered.

Passage Retrieval · Question Answering · +4
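
As the name suggests, retrieval in the DPR family scores passages by the dot product between a question embedding and precomputed passage embeddings. A minimal sketch of that bi-encoder scoring, with random vectors standing in for the two encoders:

```python
import torch

num_passages, dim = 1000, 768
passage_index = torch.randn(num_passages, dim)  # precomputed passage embeddings
question_vec = torch.randn(dim)                 # embedding of the (spoken) question

scores = passage_index @ question_vec           # dot-product relevance scores
topk = scores.topk(5).indices                   # indices of the top-5 passages
print("top passages:", topk.tolist())
```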
