no code implementations • 29 May 2024 • Vicky Zayats, Peter Chen, Melissa Ferrari, Dirk Padfield
In cross-modal tasks such as text-to-speech generation (TTS) where the output modality is speech, we show that using a pre-trained speech backbone results in superior performance to the baseline.
Automatic Speech Recognition (ASR) +4
no code implementations • 29 May 2024 • Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Pete Shaw, Jonathan Berant
Moreover, to account for uncertainty in the reward model we are distilling from, we optimize against a family of reward models that, as a whole, is likely to include at least one reasonable proxy for the preference distribution.
no code implementations • 22 Jun 2023 • Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank
AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.
1 code implementation • 19 May 2023 • Hua Shen, Vicky Zayats, Johann C. Rocholl, Daniel D. Walker, Dirk Padfield
Current disfluency detection models focus on individual utterances each from a single speaker.
no code implementations • NAACL 2022 • Angelica Chen, Vicky Zayats, Daniel D. Walker, Dirk Padfield
In modern interactive speech-based systems, speech is consumed and transcribed incrementally prior to having disfluencies removed.
no code implementations • EMNLP 2021 • Katrin Tomanek, Vicky Zayats, Dirk Padfield, Kara Vaillancourt, Fadi Biadsy
We demonstrate this on two speech adaptation tasks (atypical and accented speech) and for two state-of-the-art ASR architectures.
Automatic Speech Recognition (ASR) +1
no code implementations • 21 Apr 2021 • Johann C. Rocholl, Vicky Zayats, Daniel D. Walker, Noah B. Murad, Aaron Schneider, Daniel J. Liebling
However, little exploration has been done in improving the size and inference time of the model.
Automatic Speech Recognition (ASR) +3
no code implementations • EACL 2021 • Vicky Zayats, Kristina Toutanova, Mari Ostendorf
Tables in Web documents are pervasive and can be directly used to answer many of the queries searched on the Web, motivating their integration in question answering.
no code implementations • NAACL 2019 • Vicky Zayats, Mari Ostendorf
Disfluencies in spontaneous speech are known to be associated with prosodic disruptions.
no code implementations • 8 Apr 2019 • Vicky Zayats, Trang Tran, Richard Wright, Courtney Mansfield, Mari Ostendorf
This paper explores contexts associated with errors in transcription of spontaneous speech, shedding light on human perception of disfluencies and other conversational speech phenomena.
no code implementations • 17 Nov 2018 • Vicky Zayats, Mari Ostendorf
In this paper we introduce a novel pattern match neural network architecture that uses neighbor similarity scores as features, eliminating the need for feature engineering in a disfluency detection task.
no code implementations • TACL 2018 • Vicky Zayats, Mari Ostendorf
This paper presents a novel approach for modeling threaded discussions on social media using a graph-structured bidirectional LSTM which represents both hierarchical and temporal conversation structure.
no code implementations • 12 Apr 2016 • Vicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi
We introduce a new approach for disfluency detection using a Bidirectional Long Short-Term Memory neural network (BLSTM).