Search Results for author: Michael Zeng

Found 71 papers, 34 papers with code

Modeling Entity Knowledge for Fact Verification

no code implementations • EMNLP (FEVER) 2021 • Yang Liu, Chenguang Zhu, Michael Zeng

Fact verification is a challenging task of identifying the truthfulness of given claims based on the retrieval of relevant evidence texts.

Descriptive Fact Verification +1

Paper
Add Code

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

no code implementations • 10 Apr 2024 • Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

CoVoMix is capable of first converting dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers.

Dialogue Generation

Paper
Add Code

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

no code implementations • 12 Feb 2024 • Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

In this work, we propose ELaTE, a zero-shot TTS that can generate natural laughing speech of any speaker based on a short audio prompt with precise control of laughter timing and expression.

Paper
Add Code

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

no code implementations • 10 Nov 2023 • Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.

Multi-Task Learning object-detection +1

Paper
Add Code

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

no code implementations • 25 Sep 2023 • Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng

Additionally, we introduce Regenerate-DCEM (R-DCEM) that can regenerate and optimize speech quality based on pre-processed speech from a discriminative model.

Speech Extraction

Paper
Add Code

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

1 code implementation • 17 Jul 2023 • Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong, Ed Lin, Michael Zeng

Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions.

Language Modelling Large Language Model +2

60,245

Paper
Code

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

no code implementations • 30 May 2023 • Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

State-of-the-art large-scale universal speech models (USMs) show a decent automatic speech recognition (ASR) performance across multiple domains and languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

1 code implementation • NeurIPS 2023 • Chenyang Le, Yao Qian, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng, Xuedong Huang

Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language.

Language Modelling Multi-Task Learning +2

Paper
Code

i-Code Studio: A Configurable and Composable Framework for Integrative AI

no code implementations • 23 May 2023 • Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, ZiYi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, Xuedong Huang

Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities.

Question Answering Retrieval +4

Paper
Add Code

InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT

no code implementations • 22 May 2023 • Yichong Xu, Ruochen Xu, Dan Iter, Yang Liu, Shuohang Wang, Chenguang Zhu, Michael Zeng

While large models such as GPT-3 demonstrate exceptional performance in zeroshot and fewshot summarization tasks, their extensive serving and fine-tuning costs hinder their utilization in various applications.

Paper
Add Code

LMGQS: A Large-scale Dataset for Query-focused Summarization

no code implementations • 22 May 2023 • Ruochen Xu, Song Wang, Yang Liu, Shuohang Wang, Yichong Xu, Dan Iter, Chenguang Zhu, Michael Zeng

We hypothesize that there is a hidden query for each summary sentence in a generic summarization annotation, and we utilize a large-scale pretrained language model to recover it.

Language Modelling Query-focused Summarization +1

Paper
Add Code

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

no code implementations • 21 May 2023 • ZiYi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence, however the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities.

Paper
Add Code

Any-to-Any Generation via Composable Diffusion

1 code implementation • NeurIPS 2023 • Zineng Tang, ZiYi Yang, Chenguang Zhu, Michael Zeng, Mohit Bansal

We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities.

Ranked #7 on Audio Generation on AudioCaps

Audio Generation

1,631

Paper
Code

Automatic Prompt Optimization with "Gradient Descent" and Beam Search

4 code implementations • 4 May 2023 • Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, Michael Zeng

Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-error effort.

3,177

Paper
Code

Code-Switching Text Generation and Injection in Mandarin-English ASR

no code implementations • 20 Mar 2023 • Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng

Code-switching speech refers to a means of expression by mixing two or more languages within a single utterance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

1 code implementation • 20 Mar 2023 • Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action.

Ranked #22 on Visual Question Answering on MM-Vet

Multimodal Reasoning Visual Question Answering

906

Paper
Code

Target Sound Extraction with Variable Cross-modality Clues

1 code implementation • 15 Mar 2023 • Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources.

AudioCaps Target Sound Extraction

Paper
Code

Unifying Vision, Text, and Layout for Universal Document Processing

2 code implementations • CVPR 2023 • Zineng Tang, ZiYi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal

UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.

Ranked #5 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)

document understanding Image Reconstruction +1

1,631

Paper
Code

ReCo: Region-Controlled Text-to-Image Generation

no code implementations • CVPR 2023 • Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

Human evaluation on PaintSkill shows that ReCo is +19. 28% and +17. 21% more accurate in generating images with correct object count and spatial relationship than the T2I model.

Ranked #2 on Conditional Text-to-Image Synthesis on COCO-MIG

Conditional Text-to-Image Synthesis Position

Paper
Add Code

UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization

1 code implementation • 17 Nov 2022 • Yulong Chen, Yang Liu, Ruochen Xu, ZiYi Yang, Chenguang Zhu, Michael Zeng, Yue Zhang

The high annotation costs and diverse demands of various summarization tasks motivate the development of few-shot summarization.

Paper
Code

MACSum: Controllable Summarization with Mixed Attributes

1 code implementation • 9 Nov 2022 • Yusen Zhang, Yang Liu, ZiYi Yang, Yuwei Fang, Yulong Chen, Dragomir Radev, Chenguang Zhu, Michael Zeng, Rui Zhang

We propose two simple and effective parameter-efficient approaches for the new task of mixed controllable summarization based on hard prompt tuning and soft prefix tuning.

Attribute Specificity

Paper
Code

Task Compass: Scaling Multi-task Pre-training with Task Prefix

1 code implementation • 12 Oct 2022 • Zhuosheng Zhang, Shuohang Wang, Yichong Xu, Yuwei Fang, Wenhao Yu, Yang Liu, Hai Zhao, Chenguang Zhu, Michael Zeng

Leveraging task-aware annotated data as supervised signals to assist with self-supervised learning on large-scale unlabeled data has become a new trend in pre-training language models.

Ranked #1 on Sentence Completion on HellaSwag

Common Sense Reasoning Data Augmentation +4

Paper
Code

Generate rather than Retrieve: Large Language Models are Strong Context Generators

1 code implementation • 21 Sep 2022 • Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng Jiang

We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextutal documents based on a given question, and then reads the generated documents to produce the final answer.

Language Modelling Large Language Model +1

265

Paper
Code

Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

1 code implementation • 21 Aug 2022 • Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang

Z-Code++ creates new state of the art on 9 out of 13 text summarization tasks across 5 languages.

Abstractive Text Summarization Language Modelling +1

1,842

Paper
Code

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning

no code implementations • 3 Jun 2022 • Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng

Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the image (e. g., image tags, object attributes / locations, captions) as a structured textual prompt, called visual clues, using a vision foundation model.

Image Paragraph Captioning Language Modelling +1

Paper
Add Code

Automatic Rule Induction for Interpretable Semi-Supervised Learning

1 code implementation • 18 May 2022 • Reid Pryzant, ZiYi Yang, Yichong Xu, Chenguang Zhu, Michael Zeng

Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data.

Relation Extraction

Paper
Code

i-Code: An Integrative and Composable Multimodal Learning Framework

no code implementations • 3 May 2022 • ZiYi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview.

Contrastive Learning Video Understanding

Paper
Add Code

Impossible Triangle: What's Next for Pre-trained Language Models?

no code implementations • 13 Apr 2022 • Chenguang Zhu, Michael Zeng

Recent development of large-scale pre-trained language models (PLM) have significantly improved the capability of models in various NLP tasks, in terms of performance after task-specific fine-tuning and zero-shot / few-shot learning.

Data Augmentation Few-Shot Learning +2

Paper
Add Code

Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data

1 code implementation • ACL 2022 • Shuohang Wang, Yichong Xu, Yuwei Fang, Yang Liu, Siqi Sun, Ruochen Xu, Chenguang Zhu, Michael Zeng

Surprisingly, we found that REtrieving from the traINing datA (REINA) only can lead to significant gains on multiple NLG and NLU tasks.

Language Modelling Machine Translation +3

116

Paper
Code

AdaPrompt: Adaptive Model Training for Prompt-based NLP

no code implementations • 10 Feb 2022 • Yulong Chen, Yang Liu, Li Dong, Shuohang Wang, Chenguang Zhu, Michael Zeng, Yue Zhang

However, for prompt learning, there are still two salient gaps between NLP tasks and pretraining.

Continual Pretraining Language Modelling +1

Paper
Add Code

Unsupervised Multi-Granularity Summarization

2 code implementations • 29 Jan 2022 • Ming Zhong, Yang Liu, Suyu Ge, Yuning Mao, Yizhu Jiao, Xingxing Zhang, Yichong Xu, Chenguang Zhu, Michael Zeng, Jiawei Han

In this paper, we propose the first unsupervised multi-granularity summarization framework, GranuSum.

Abstractive Text Summarization

Paper
Code

CLIP-Event: Connecting Text and Images with Event Structures

1 code implementation • CVPR 2022 • Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang

Vision-language (V+L) pretraining models have achieved great success in supporting multimedia applications by understanding the alignments between images and text.

Contrastive Learning Event Extraction +2

Paper
Code

Sequence-level self-learning with multiple hypotheses

no code implementations • 10 Dec 2021 • Kenichi Kumatani, Dimitrios Dimitriadis, Yashesh Gaur, Robert Gmyr, Sefik Emre Eskimez, Jinyu Li, Michael Zeng

For untranscribed speech data, the hypothesis from an ASR system must be used as a label.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

MLP Architectures for Vision-and-Language Modeling: An Empirical Study

1 code implementation • 8 Dec 2021 • Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang

Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?

Language Modelling Visual Question Answering (VQA)

Paper
Code

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

2 code implementations • 6 Dec 2021 • Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang

In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.

Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)

Common Sense Reasoning

106

Paper
Code

Florence: A New Foundation Model for Computer Vision

1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale dataset and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Ranked #1 on Action Recognition In Videos on Kinetics-600

Action Classification Action Recognition In Videos +12

367

Paper
Code

An Empirical Study of Training End-to-End Vision-and-Language Transformers

2 code implementations • CVPR 2022 • Zi-Yi Dou, Yichong Xu, Zhe Gan, JianFeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng

Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks.

Ranked #20 on Cross-Modal Retrieval on COCO 2014 (using extra training data)

Cross-Modal Retrieval Visual Question Answering (VQA) +1

350

Paper
Code

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

5 code implementations • 26 Oct 2021 • Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei

Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.

Denoising Self-Supervised Learning +3

18,315

Paper
Code

Leveraging Knowledge in Multilingual Commonsense Reasoning

no code implementations • Findings (ACL) 2022 • Yuwei Fang, Shuohang Wang, Yichong Xu, Ruochen Xu, Siqi Sun, Chenguang Zhu, Michael Zeng

Then we utilize a diverse of 4 English knowledge sources to provide more comprehensive coverage of knowledge in different formats.

Language Modelling Retrieval +2

Paper
Add Code

End-to-End Segmentation-based News Summarization

no code implementations • Findings (ACL) 2022 • Yang Liu, Chenguang Zhu, Michael Zeng

In this paper, we bring a new way of digesting news content by introducing the task of segmenting a news article into multiple sections and generating the corresponding summary to each section.

News Summarization Text Generation

Paper
Add Code

Dict-BERT: Enhancing Language Model Pre-training with Dictionary

1 code implementation • Findings (ACL) 2022 • Wenhao Yu, Chenguang Zhu, Yuwei Fang, Donghan Yu, Shuohang Wang, Yichong Xu, Michael Zeng, Meng Jiang

In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word and sentence-level alignment between input text sequence and rare word definitions to enhance language modeling representation with dictionary.

Language Modelling Masked Language Modeling +1

Paper
Code

KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering

no code implementations • ACL 2022 • Donghan Yu, Chenguang Zhu, Yuwei Fang, Wenhao Yu, Shuohang Wang, Yichong Xu, Xiang Ren, Yiming Yang, Michael Zeng

The recent proposed Fusion-in-Decoder (FiD), which is built on top of the pretrained generative model T5, achieves the state-of-the-art performance in the reading module.

Answer Generation Open-Domain Question Answering +3

Paper
Add Code

DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization

1 code implementation • 6 Sep 2021 • Ming Zhong, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng

For a dialogue, it corrupts a window of text with dialogue-inspired noise, and guides the model to reconstruct this window based on the content of the remaining conversation.

abstractive question answering Denoising +2

132

Paper
Code

Does Knowledge Help General NLU? An Empirical Study

no code implementations • 1 Sep 2021 • Ruochen Xu, Yuwei Fang, Chenguang Zhu, Michael Zeng

It is often observed in knowledge-centric tasks (e. g., common sense question and answering, relation classification) that the integration of external knowledge such as entity representation into language models can help provide useful information to boost the performance.

Common Sense Reasoning Language Modelling +2

Paper
Add Code

Want To Reduce Labeling Cost? GPT-3 Can Help

1 code implementation • Findings (EMNLP) 2021 • Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng

Data annotation is a time-consuming and labor-intensive process for many NLP tasks.

Few-Shot Learning Language Modelling

Paper
Code

A Joint and Domain-Adaptive Approach to Spoken Language Understanding

no code implementations • 25 Jul 2021 • Linhao Zhang, Yu Shi, Linjun Shou, Ming Gong, Houfeng Wang, Michael Zeng

In this paper, we attempt to bridge these two lines of research and propose a joint and domain adaptive approach to SLU.

Domain Adaptation Intent Detection +3

Paper
Add Code

Retrieval Enhanced Model for Commonsense Generation

1 code implementation • Findings (ACL) 2021 • Han Wang, Yang Liu, Chenguang Zhu, Linjun Shou, Ming Gong, Yichong Xu, Michael Zeng

Commonsense generation is a challenging task of generating a plausible sentence describing an everyday scenario using provided concepts.

Retrieval Sentence +1

Paper
Code

MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization

1 code implementation • NAACL 2021 • Chenguang Zhu, Yang Liu, Jie Mei, Michael Zeng

MediaSum, a large-scale media interview dataset consisting of 463. 6K transcripts with abstractive summaries.

Transfer Learning

Paper
Code

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

no code implementations • 22 Feb 2021 • Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

no code implementations • 12 Feb 2021 • Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

However, the performance of using multiple encoders and decoders on zero-shot translation still lags behind universal NMT.

Denoising Machine Translation +2

Paper
Add Code

Speech-language Pre-training for End-to-end Spoken Language Understanding

no code implementations • 11 Feb 2021 • Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng

End-to-end (E2E) spoken language understanding (SLU) can infer semantics directly from speech signal without cascading an automatic speech recognizer (ASR) with a natural language understanding (NLU) module.

Ranked #3 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Language Modelling Natural Language Understanding +1

Paper
Add Code

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

3 code implementations • 19 Jan 2021 • Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang

In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner.

Multi-Task Learning Representation Learning +3

389

Paper
Code

Fusing Context Into Knowledge Graph for Commonsense Question Answering

2 code implementations • Findings (ACL) 2021 • Yichong Xu, Chenguang Zhu, Ruochen Xu, Yang Liu, Michael Zeng, Xuedong Huang

However, although a KG contains rich structural information, it lacks the context to provide a more precise understanding of the concepts.

Ranked #4 on Common Sense Reasoning on CommonsenseQA (using extra training data)

Common Sense Reasoning Knowledge Graphs +3

106

Paper
Code

LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition

no code implementations • 21 Oct 2020 • Xie Chen, Sarangarajan Parthasarathy, William Gale, Shuangyu Chang, Michael Zeng

The context information is captured by the hidden states of LSTM-LMs across utterance and can be used to guide the first-pass search effectively.

speech-recognition Speech Recognition

Paper
Add Code

Mixed-Lingual Pre-training for Cross-lingual Summarization

1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Ruochen Xu, Chenguang Zhu, Yu Shi, Michael Zeng, Xuedong Huang

Cross-lingual Summarization (CLS) aims at producing a summary in the target language for an article in the source language.

Translation

Paper
Code

SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding

1 code implementation • NAACL 2021 • Yu-An Chung, Chenguang Zhu, Michael Zeng

Besides conducting a self-supervised masked language modeling task on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text.

Language Modelling Masked Language Modeling +1

Paper
Code

JAKET: Joint Pre-training of Knowledge Graph and Language Understanding

no code implementations • 2 Oct 2020 • Donghan Yu, Chenguang Zhu, Yiming Yang, Michael Zeng

Knowledge graphs (KGs) contain rich information about world knowledge, entities and relations.

Knowledge Graphs Language Modelling +1

Paper
Add Code

Mind The Facts: Knowledge-Boosted Coherent Abstractive Text Summarization

no code implementations • 27 Jun 2020 • Beliz Gunel, Chenguang Zhu, Michael Zeng, Xuedong Huang

In this work, we propose a novel architecture that extends Transformer encoder-decoder architecture in order to improve on these shortcomings.

Abstractive Text Summarization Language Modelling +1

Paper
Add Code

Meta Dialogue Policy Learning

no code implementations • 3 Jun 2020 • Yumo Xu, Chenguang Zhu, Baolin Peng, Michael Zeng

Dialog policy determines the next-step actions for agents and hence is central to a dialogue system.

Meta-Learning Transfer Learning

Paper
Add Code

Data Augmentation for Spoken Language Understanding via Pretrained Language Models

1 code implementation • 29 Apr 2020 • Baolin Peng, Chenguang Zhu, Michael Zeng, Jianfeng Gao

The training of spoken language understanding (SLU) models often faces the problem of data scarcity.

Data Augmentation Spoken Language Understanding +1

Paper
Code

Improving Readability for Automatic Speech Recognition Transcription

no code implementations • 9 Apr 2020 • Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

In this work, we propose a novel NLP task called ASR post-processing for readability (APR) that aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining

3 code implementations • Findings of the Association for Computational Linguistics 2020 • Chenguang Zhu, Ruochen Xu, Michael Zeng, Xuedong Huang

With the abundance of automatic meeting transcripts, meeting summarization is of great interest to both participants and other parties.

Abstractive Text Summarization Meeting Summarization

Paper
Code

Enhancing Factual Consistency of Abstractive Summarization

no code implementations • NAACL 2021 • Chenguang Zhu, William Hinthorn, Ruochen Xu, Qingkai Zeng, Michael Zeng, Xuedong Huang, Meng Jiang

Automatic abstractive summaries are found to often distort or fabricate facts in the article.

Abstractive Text Summarization Graph Attention

Paper
Add Code

Few-shot Natural Language Generation for Task-Oriented Dialog

2 code implementations • Findings of the Association for Computational Linguistics 2020 • Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng, Jianfeng Gao

It is pre-trained on a large set of annotated NLG corpus to acquire the controllable generation ability, and fine-tuned with only a few domain-specific labels to adapt to new domains.

Ranked #4 on Data-to-Text Generation on MULTIWOZ 2.1

Data-to-Text Generation Few-Shot Learning

188

Paper
Code

TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising

no code implementations • Findings of the Association for Computational Linguistics 2020 • Ziyi Yang, Chenguang Zhu, Robert Gmyr, Michael Zeng, Xuedong Huang, Eric Darve

Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.

Abstractive Text Summarization Denoising

Paper
Add Code

Leveraging Lead Bias for Zero-shot Abstractive News Summarization

no code implementations • 25 Dec 2019 • Chenguang Zhu, Ziyi Yang, Robert Gmyr, Michael Zeng, Xuedong Huang

A typical journalistic convention in news articles is to deliver the most salient information in the beginning, also known as the lead bias.

Domain Adaptation News Summarization

Paper
Add Code

Multi-task Learning for Natural Language Generation in Task-Oriented Dialogue

no code implementations • IJCNLP 2019 • Chenguang Zhu, Michael Zeng, Xuedong Huang

In this paper, we propose a novel multi-task learning framework, NLG-LM, for natural language generation.

Language Modelling Multi-Task Learning +1

Paper
Add Code

SIM: A Slot-Independent Neural Model for Dialogue State Tracking

no code implementations • WS 2019 • Chenguang Zhu, Michael Zeng, Xuedong Huang

In this paper, we put forward a slot-independent neural model (SIM) to track dialogue states while keeping the model complexity invariant to the number of dialogue slots.

Dialogue State Tracking Task-Oriented Dialogue Systems