no code implementations • 24 May 2023 • Chenyang Le, Yao Qian, Long Zhou, Shujie Liu, Michael Zeng, Xuedong Huang
Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language.
no code implementations • 23 May 2023 • Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, ZiYi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, Xuedong Huang
Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities.
no code implementations • 21 May 2023 • ZiYi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang
The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities.
no code implementations • 21 Aug 2022 • Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang
Z-Code++ creates new state of the art on 9 out of 13 text summarization tasks across 5 languages.
no code implementations • 3 May 2022 • ZiYi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview.
2 code implementations • 6 Dec 2021 • Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang
In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.
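The snippet above describes augmenting a transformer with an external attention mechanism that attends over retrieved knowledge. As a minimal illustrative sketch (not the paper's implementation — the knowledge memory, dimensions, and scoring here are assumptions), the core operation is scaled dot-product attention where the keys and values come from an external knowledge store rather than the input sequence itself:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def external_attention(queries, knowledge_keys, knowledge_values):
    """Toy external attention: input token representations (queries)
    attend over an external knowledge memory (keys/values), instead of
    over the tokens themselves as in standard self-attention."""
    d = queries.shape[-1]
    scores = queries @ knowledge_keys.T / np.sqrt(d)   # (T, M)
    weights = softmax(scores, axis=-1)                 # attention over memory
    return weights @ knowledge_values                  # (T, d)

# toy example: 3 tokens, 5 external knowledge entries, hidden dim 4
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 4))
out = external_attention(q, k, v)
```

The output can then be fused with the transformer's own hidden states (e.g. by addition or gating); the fusion choice here is left open, as the snippet does not specify it.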
Ranked #2 on Common Sense Reasoning on CommonsenseQA (using extra training data)
1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang
Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.
Ranked #1 on Action Recognition In Videos on Kinetics-600
no code implementations • 20 Oct 2021 • Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang
Experimental results show that the proposed geometry agnostic model outperforms the model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy.
Automatic Speech Recognition (ASR) +2
no code implementations • 18 Oct 2021 • Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang
Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.
3 code implementations • 19 Jan 2021 • Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang
In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner.
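The UniSpeech objective combines a supervised CTC loss on labeled speech with a contrastive self-supervised loss on unlabeled speech in a multi-task fashion. A minimal sketch of that combination is below; the InfoNCE-style contrastive term, the interpolation weight `alpha`, and all shapes are illustrative assumptions, not the paper's exact formulation or hyperparameters:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Toy contrastive (InfoNCE-style) loss: each anchor frame should
    match its own positive target against the others in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                      # (N, N) similarity
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                 # NLL of the matches

def multitask_loss(ctc_loss, contrastive_loss, alpha=0.5):
    # Interpolate the supervised CTC term (labeled data) with the
    # self-supervised contrastive term (unlabeled data).
    # `alpha` is an assumed weight, not the paper's hyperparameter.
    return alpha * ctc_loss + (1.0 - alpha) * contrastive_loss

# toy batch: 8 frames with near-identical positives
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))
targets = frames + 0.01 * rng.normal(size=(8, 16))
loss = multitask_loss(ctc_loss=1.5, contrastive_loss=info_nce(frames, targets))
```

In practice the CTC term would come from an ASR head over labeled utterances; here it is a placeholder scalar to keep the sketch self-contained.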
2 code implementations • Findings (ACL) 2021 • Yichong Xu, Chenguang Zhu, Ruochen Xu, Yang Liu, Michael Zeng, Xuedong Huang
However, although a KG contains rich structural information, it lacks the context to provide a more precise understanding of the concepts.
Ranked #3 on Common Sense Reasoning on CommonsenseQA (using extra training data)
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Ruochen Xu, Chenguang Zhu, Yu Shi, Michael Zeng, Xuedong Huang
Cross-lingual Summarization (CLS) aims at producing a summary in the target language for an article in the source language.
no code implementations • 27 Jun 2020 • Beliz Gunel, Chenguang Zhu, Michael Zeng, Xuedong Huang
In this work, we propose a novel architecture that extends Transformer encoder-decoder architecture in order to improve on these shortcomings.
3 code implementations • Findings of the Association for Computational Linguistics 2020 • Chenguang Zhu, Ruochen Xu, Michael Zeng, Xuedong Huang
With the abundance of automatic meeting transcripts, meeting summarization is of great interest to both participants and other parties.
no code implementations • NAACL 2021 • Chenguang Zhu, William Hinthorn, Ruochen Xu, Qingkai Zeng, Michael Zeng, Xuedong Huang, Meng Jiang
Automatic abstractive summaries are found to often distort or fabricate facts in the article.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Ziyi Yang, Chenguang Zhu, Robert Gmyr, Michael Zeng, Xuedong Huang, Eric Darve
Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
no code implementations • 25 Dec 2019 • Chenguang Zhu, Ziyi Yang, Robert Gmyr, Michael Zeng, Xuedong Huang
A typical journalistic convention in news articles is to deliver the most salient information in the beginning, also known as the lead bias.
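The lead bias described above is why a simple "lead-k" extractive baseline is hard to beat on news summarization. A minimal sketch of that baseline (the sentence-splitting regex and `k=3` default are assumptions for illustration):

```python
import re

def lead_baseline(article: str, k: int = 3) -> str:
    """Lead-k baseline: take the first k sentences as the summary,
    exploiting the journalistic lead bias."""
    # naive sentence split on whitespace following ., !, or ?
    sentences = re.split(r'(?<=[.!?])\s+', article.strip())
    return " ".join(sentences[:k])
```

The cited work goes further and uses this same bias as a free supervision signal for pretraining, rather than as a baseline only.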
no code implementations • 10 Dec 2019 • Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou
This increases marginally to 1.6% when 50% of the attendees are unknown to the system.
no code implementations • IJCNLP 2019 • Chenguang Zhu, Michael Zeng, Xuedong Huang
In this paper, we propose a novel multi-task learning framework, NLG-LM, for natural language generation.
no code implementations • WS 2019 • Chenguang Zhu, Michael Zeng, Xuedong Huang
In this paper, we put forward a slot-independent neural model (SIM) to track dialogue states while keeping the model complexity invariant to the number of dialogue slots.
no code implementations • 25 Sep 2019 • Chenguang Zhu, ZiYi Yang, Robert Gmyr, Michael Zeng, Xuedong Huang
For example, the pretrained model without finetuning outperforms the pointer-generator network on the CNN/DailyMail dataset.
no code implementations • 3 May 2019 • Takuya Yoshioka, Zhuo Chen, Dimitrios Dimitriadis, William Hinthorn, Xuedong Huang, Andreas Stolcke, Michael Zeng
The speaker-attributed WER (SAWER) is 26.7%.
3 code implementations • 10 Dec 2018 • Chenguang Zhu, Michael Zeng, Xuedong Huang
Conversational question answering (CQA) is a novel QA task that requires understanding of dialogue context.
Ranked #3 on Question Answering on CoQA (Overall metric)
2 code implementations • 15 Mar 2018 • Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dong-dong Zhang, Zhirui Zhang, Ming Zhou
Machine translation has made rapid advances in recent years.
Ranked #3 on Machine Translation on WMT 2017 English-Chinese