1 code implementation • 1 Sep 2024 • Justin Lovelace, Soham Ray, Kwangyoun Kim, Kilian Q. Weinberger, Felix Wu
This work introduces Sample-Efficient Speech Diffusion (SESD), an algorithm for effective speech synthesis in modest data regimes through latent diffusion.
no code implementations • 16 Jan 2024 • Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe
Compared to studies with similar motivations, the proposed loss operates directly on the cross attention weights and is easier to implement.
Automatic Speech Recognition (ASR)
1 code implementation • 23 Jul 2023 • Paloma Sodhi, Felix Wu, Ethan R. Elenberg, Kilian Q. Weinberger, Ryan McDonald
A common training technique for language models is teacher forcing (TF).
2 code implementations • 18 May 2023 • Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe
Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU).
Automatic Speech Recognition (ASR)
1 code implementation • 27 Feb 2023 • Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe
Self-supervised speech representation learning (SSL) has been shown to be effective in various downstream tasks, but SSL models are usually large and slow.
no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.
no code implementations • 16 Dec 2022 • Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe
During the fine-tuning stage, we introduce an auxiliary loss that encourages this context embedding vector to be similar to context vectors of surrounding segments.
Automatic Speech Recognition (ASR)
1 code implementation • 30 Sep 2022 • Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe
Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR).
Ranked #11 on Speech Recognition on LibriSpeech test-other
Automatic Speech Recognition (ASR)
1 code implementation • 2 May 2022 • Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
Ranked #3 on Named Entity Recognition (NER) on SLUE
Automatic Speech Recognition (ASR)
1 code implementation • NAACL 2022 • Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han
In this work, we focus on low-resource spoken named entity recognition (NER) and address the question: beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?
1 code implementation • 19 Nov 2021 • Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han
Historically, shared speech datasets and benchmarks have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks.
Ranked #1 on Named Entity Recognition (NER) on SLUE
Automatic Speech Recognition (ASR)
1 code implementation • 14 Sep 2021 • Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi
This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 17 Jun 2021 • Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu J. Han, Shinji Watanabe
A multi-mode ASR model can fulfill various latency requirements during inference: when a larger latency is acceptable, the model can process longer future context to achieve higher accuracy, and when the latency budget is not flexible, the model can depend less on future context while still achieving reliable accuracy.
Automatic Speech Recognition (ASR)
1 code implementation • 9 Feb 2021 • Ruihan Wu, Chuan Guo, Felix Wu, Rahul Kidambi, Laurens van der Maaten, Kilian Q. Weinberger
We develop a novel approach for paper bidding and assignment that is much more robust against such attacks.
1 code implementation • 22 Jun 2020 • Peter Cha, Paul Ginsparg, Felix Wu, Juan Carrasquilla, Peter L. McMahon, Eun-Ah Kim
Here we propose the "Attention-based Quantum Tomography" (AQT), a quantum state reconstruction using an attention mechanism-based generative network that learns the mixed state density matrix of a noisy quantum state.
1 code implementation • ICLR 2021 • Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q. Weinberger, Yoav Artzi
We empirically test the impact of these factors, and identify alternative practices that resolve the commonly observed instability of the process.
1 code implementation • CVPR 2021 • Boyi Li, Felix Wu, Ser-Nam Lim, Serge Belongie, Kilian Q. Weinberger
The moments (a.k.a. mean and standard deviation) of latent features are often removed as noise when training image recognition models, to increase stability and reduce training time.
Ranked #32 on Domain Generalization on ImageNet-A
no code implementations • 28 Sep 2019 • Felix Wu, Boyi Li, Lequn Wang, Ni Lao, John Blitzer, Kilian Q. Weinberger
This paper introduces Integrated Triaging, a framework that prunes almost all context in early layers of a network, leaving the remaining (deep) layers to scan only a tiny fraction of the full corpus.
2 code implementations • NeurIPS 2019 • Boyi Li, Felix Wu, Kilian Q. Weinberger, Serge Belongie
A popular method to reduce the training time of deep neural networks is to normalize activations at each layer.
18 code implementations • ICLR 2020 • Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi
We propose BERTScore, an automatic evaluation metric for text generation.
2 code implementations • 28 Feb 2019 • Felix Wu, Boyi Li, Lequn Wang, Ni Lao, John Blitzer, Kilian Q. Weinberger
In this technical report, we introduce FastFusionNet, an efficient variant of FusionNet [12].
7 code implementations • 19 Feb 2019 • Felix Wu, Tianyi Zhang, Amauri Holanda de Souza Jr., Christopher Fifty, Tao Yu, Kilian Q. Weinberger
Graph Convolutional Networks (GCNs) and their variants have experienced significant attention and have become the de facto methods for learning graph representations.
Ranked #3 on Text Classification on Ohsumed
3 code implementations • ICLR 2019 • Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli
We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements.
Ranked #1 on Machine Translation on WMT 2017 English-Chinese
no code implementations • ICLR 2019 • Xiaoyun Wang, Minhao Cheng, Joe Eaton, Cho-Jui Hsieh, Felix Wu
In this paper, we propose a new type of "fake node attacks" to attack GCNs by adding malicious fake nodes.
5 code implementations • ICLR 2018 • Qiantong Xu, Gao Huang, Yang Yuan, Chuan Guo, Yu Sun, Felix Wu, Kilian Weinberger
Evaluating generative adversarial networks (GANs) is inherently challenging.
2 code implementations • ICLR 2018 • Felix Wu, Ni Lao, John Blitzer, Guandao Yang, Kilian Weinberger
State-of-the-art deep reading comprehension models are dominated by recurrent neural nets.
1 code implementation • NeurIPS 2017 • Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, Kilian Q. Weinberger
The machine learning community has become increasingly concerned with the potential for bias and discrimination in predictive models.
7 code implementations • ICLR 2018 • Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, Kilian Q. Weinberger
In this paper we investigate image classification with computational resource limits at test time.
General Classification
Handwritten Mathematical Expression Recognition