1 code implementation • NAACL (ACL) 2022 • Hung-Yi Lee, Abdelrahman Mohamed, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff
Given the growing popularity of SSL, and the shared mission of the speech and language communities to bring these technologies to more use cases with better quality and to scale them to under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievements in speech processing.
no code implementations • 11 Apr 2025 • Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, Yifan Peng, Haibin Wu, Yossi Adi, Emmanuel Dupoux, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
The field of spoken language processing is undergoing a shift from training custom-built, task-specific models toward using and optimizing spoken language models (SLMs) which act as universal speech processing systems.
no code implementations • 3 Feb 2025 • Martijn Bartelds, Ananjan Nandi, Moussa Koulako Bala Doumbouya, Dan Jurafsky, Tatsunori Hashimoto, Karen Livescu
This is common in domains like speech, where the widely used connectionist temporal classification (CTC) loss scales with input length and varies with linguistic and acoustic properties, leading to spurious differences between group losses.
Automatic Speech Recognition
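As a minimal sketch of the length-scaling issue described above (assuming PyTorch and illustrative group labels, not the paper's actual method), length-normalizing each utterance's CTC loss removes one source of spurious differences between group losses:

```python
# Minimal sketch (not the paper's method): per-group CTC losses can differ
# simply because utterance lengths differ; normalizing by input length
# removes one source of spurious group differences.
import torch
import torch.nn.functional as F

def per_group_ctc_loss(log_probs, targets, input_lens, target_lens, groups):
    # log_probs: (T, B, C) log-softmax outputs; groups: list of group ids, one per utterance
    losses = F.ctc_loss(log_probs, targets, input_lens, target_lens,
                        blank=0, reduction="none")   # one loss per utterance
    normalized = losses / input_lens.float()         # length-normalized
    out = {}
    for g in set(groups):
        idx = torch.tensor([i for i, gi in enumerate(groups) if gi == g])
        out[g] = normalized[idx].mean().item()
    return out
```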
no code implementations • 31 Dec 2024 • Yanhong Li, Karen Livescu, Jiawei Zhou
We introduce Chunk-Distilled Language Modeling (CD-LM), an approach to text generation that addresses two challenges in current large language models (LLMs): the inefficiency of token-level generation, and the difficulty of adapting to new data and knowledge.
no code implementations • 25 Nov 2024 • Shester Gueuwou, Xiaodan Du, Greg Shakhnarovich, Karen Livescu, Alexander H. Liu
Sign language processing has traditionally relied on task-specific models, limiting the potential for transfer learning across tasks.
no code implementations • 8 Nov 2024 • Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Fabian Ritter-Gutierrez, Ming To Chuang, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Eunjung Yeo, Kalvin Chang, Chung-Ming Chien, Kwanghee Choi, Cheng-Hsiu Hsieh, Yi-Cheng Lin, Chee-En Yu, I-Hsiang Chiu, Heitor R. Guimarães, Jionghao Han, Tzu-Quan Lin, Tzu-Yuan Lin, Homu Chang, Ting-Wu Chang, Chun Wei Chen, Shou-Jen Chen, Yu-Hua Chen, Hsi-Chun Cheng, Kunal Dhawan, Jia-Lin Fang, Shi-Xin Fang, Kuan-Yu Fang Chiang, Chi An Fu, Hsien-Fu Hsiao, Ching Yu Hsu, Shao-Syuan Huang, Lee Chen Wei, Hsi-Che Lin, Hsuan-Hao Lin, Hsuan-Ting Lin, Jian-Ren Lin, Ting-Chun Liu, Li-Chun Lu, Tsung-Min Pai, Ankita Pasad, Shih-Yun Shan Kuan, Suwon Shon, Yuxun Tang, Yun-Shao Tsai, Jui-Chiang Wei, Tzu-Chieh Wei, Chengxi Wu, Dien-Ruei Wu, Chao-Han Huck Yang, Chieh-Chi Yang, Jia Qi Yip, Shao-Xiang Yuan, Vahid Noroozi, Zhehuai Chen, Haibin Wu, Karen Livescu, David Harwath, Shinji Watanabe, Hung-Yi Lee
We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models.
1 code implementation • 21 Aug 2024 • David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew R. Walter
We propose an empirical approach centered on the spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to unify and clarify several phenomena in deep learning.
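A small sketch of how one might track such spectral dynamics, assuming a PyTorch model; the function name and the choice of top-k values are illustrative:

```python
# Track the top singular values of each 2-D weight matrix across training
# checkpoints, one way to observe the "spectral dynamics" studied above.
import torch

def top_singular_values(model, k=5):
    spectra = {}
    for name, p in model.named_parameters():
        if p.ndim == 2:  # restrict to weight matrices
            s = torch.linalg.svdvals(p.detach())
            spectra[name] = s[:k].tolist()
    return spectra

# Call this at each checkpoint and plot the values over training steps.
```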
no code implementations • 30 Jun 2024 • William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe
We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages, extending the language coverage of SSL models 4-fold.
no code implementations • 14 Jun 2024 • Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
To answer this, we perform an extensive evaluation of multiple supervised and self-supervised SFMs using several evaluation protocols: (i) frozen SFMs with a lightweight prediction head, (ii) frozen SFMs with a complex prediction head, and (iii) fine-tuned SFMs with a lightweight prediction head.
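Protocol (i) can be sketched as follows; `encoder`, its output shape, and the mean-pooling choice are assumptions for illustration, not the paper's exact setup:

```python
# Frozen speech foundation model (SFM) with a lightweight prediction head.
import torch
import torch.nn as nn

class FrozenProbe(nn.Module):
    def __init__(self, encoder, feat_dim, num_classes):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False               # freeze the foundation model
        self.head = nn.Linear(feat_dim, num_classes)  # lightweight head

    def forward(self, wav):
        with torch.no_grad():
            feats = self.encoder(wav)             # assumed shape: (B, T, feat_dim)
        return self.head(feats.mean(dim=1))       # pool over time, then classify
```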
no code implementations • 13 Jun 2024 • Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, Shinji Watanabe
The Open Whisper-style Speech Model (OWSM) series was introduced to achieve full transparency in building advanced speech-to-text (S2T) foundation models.
no code implementations • 13 Jun 2024 • Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu
This integration requires the use of a speech encoder, a speech adapter, and an LLM, trained on diverse tasks.
1 code implementation • 12 Jun 2024 • Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe
Self-supervised speech models (S3Ms) have become an effective backbone for speech applications.
no code implementations • 12 Jun 2024 • Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-Yi Lee, Shinji Watanabe
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches.
Automatic Speech Recognition
no code implementations • 11 Jun 2024 • Shester Gueuwou, Xiaodan Du, Greg Shakhnarovich, Karen Livescu
A persistent challenge in sign language video processing, including the task of sign language to written language translation, is how to learn representations of sign language effectively and efficiently while preserving the important attributes of these languages and remaining invariant to irrelevant visual differences.
1 code implementation • 21 Feb 2024 • Freda Shi, Kevin Gimpel, Karen Livescu
We present the structured average intersection-over-union ratio (STRUCT-IOU), a similarity metric between constituency parse trees motivated by the problem of evaluating speech parsers.
no code implementations • 15 Dec 2023 • Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu
Considering the recent advances in generative large language models (LLMs), we hypothesize that an LLM could generate useful context information using the preceding text.
no code implementations • 12 Oct 2023 • Ju-chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli
However, in the field of language modeling, very little effort has been made to model them jointly.
no code implementations • 11 Oct 2023 • Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass
We study phrase structure induction from visually-grounded speech.
no code implementations • 9 Oct 2023 • Chung-Ming Chien, Mingjiamei Zhang, Ju-chieh Chou, Karen Livescu
Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space.
no code implementations • 4 Oct 2023 • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe
Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models.
Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
Automatic Speech Recognition
no code implementations • 14 Sep 2023 • Ju-chieh Chou, Chung-Ming Chien, Karen Livescu
In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data.
no code implementations • 2 Sep 2023 • Marcelo Sandoval-Castaneda, Yanhong Li, Diane Brentari, Karen Livescu, Gregory Shakhnarovich
This paper presents an in-depth analysis of various self-supervision methods for isolated sign language recognition (ISLR).
1 code implementation • 30 Jun 2023 • Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu
Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks.
no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.
no code implementations • 16 Dec 2022 • Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe
During the fine-tuning stage, we introduce an auxiliary loss that encourages this context embedding vector to be similar to context vectors of surrounding segments.
Automatic Speech Recognition
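A hedged sketch of an auxiliary loss in this spirit, encouraging each segment's context embedding to stay close to its neighbors'; the exact formulation in the paper may differ:

```python
# Penalize dissimilarity (in cosine similarity) between the context
# embeddings of adjacent segments.
import torch
import torch.nn.functional as F

def context_consistency_loss(ctx):                       # ctx: (N, D), one vector per segment
    sim = F.cosine_similarity(ctx[1:], ctx[:-1], dim=-1) # similarity of neighbors
    return (1.0 - sim).mean()                            # 0 when neighbors are identical
```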
1 code implementation • 8 Nov 2022 • Ankita Pasad, Bowen Shi, Karen Livescu
We further investigate the utility of our analyses for downstream tasks by comparing the property trends with performance on speech recognition and spoken language understanding tasks.
5 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. 
Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. 
Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
1 code implementation • 25 May 2022 • Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu
Existing work on sign language translation - that is, translation from sign language videos into sentences in a written language - has focused mainly on (1) data collected in a controlled environment or (2) data in a specific domain, which limits the applicability to real-world settings.
Ranked #2 on Gloss-free Sign Language Translation on OpenASL
Sign Language Translation
no code implementations • 21 May 2022 • Abdelrahman Mohamed, Hung-Yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe
Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years.
Automatic Speech Recognition
no code implementations • ACL 2022 • Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu
This is an important task since significant content in sign language is often conveyed via fingerspelling, and to our knowledge the task has not been studied before.
1 code implementation • NAACL 2022 • Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han
In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?
1 code implementation • 19 Nov 2021 • Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han
Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks.
Ranked #1 on Named Entity Recognition (NER) on SLUE
Automatic Speech Recognition
no code implementations • ACL 2022 • Haoyue Shi, Kevin Gimpel, Karen Livescu
We present substructure distribution projection (SubDP), a technique that projects a distribution over structures in one domain to another, by projecting substructure distributions separately.
2 code implementations • CRAC (ACL) 2021 • Shubham Toshniwal, Patrick Xia, Sam Wiseman, Karen Livescu, Kevin Gimpel
While coreference resolution is defined independently of dataset domain, most models for performing coreference resolution do not transfer well to unseen domains.
Ranked #1 on Coreference Resolution on Quizbowl
1 code implementation • 10 Jul 2021 • Ankita Pasad, Ju-chieh Chou, Karen Livescu
Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models.
Automatic Speech Recognition
1 code implementation • CVPR 2021 • Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu
We propose a benchmark and a suite of evaluation metrics, some of which reflect the effect of detection on the downstream fingerspelling recognition task.
2 code implementations • 26 Feb 2021 • Shubham Toshniwal, Sam Wiseman, Karen Livescu, Kevin Gimpel
Motivated by this issue, we consider the task of language modeling for the game of chess.
no code implementations • Findings (ACL) 2021 • Haoyue Shi, Karen Livescu, Kevin Gimpel
We study a family of data augmentation methods, substructure substitution (SUB2), for natural language processing (NLP) tasks.
no code implementations • 1 Jan 2021 • Shubham Toshniwal, Sam Wiseman, Karen Livescu, Kevin Gimpel
Motivated by this issue, we consider the task of language modeling for the game of chess.
no code implementations • 3 Dec 2020 • Puyuan Peng, Herman Kamper, Karen Livescu
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
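As a generic illustration (not the paper's unsupervised model), one common way to map a variable-duration segment to a fixed-dimensional vector is to keep an RNN's final hidden state:

```python
# Variable-length frame sequence in, fixed-size embedding out.
import torch
import torch.nn as nn

class SegmentEncoder(nn.Module):
    def __init__(self, feat_dim=39, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, embed_dim, batch_first=True)

    def forward(self, frames):       # frames: (B, T, feat_dim), any T
        _, h = self.rnn(frames)
        return h[-1]                 # (B, embed_dim), fixed-dimensional
```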
1 code implementation • 24 Nov 2020 • Yushi Hu, Shane Settle, Karen Livescu
In this work, we generalize AWE training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to QbE with arbitrary-length queries in multiple unseen languages.
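A minimal sketch of embedding-based query-by-example retrieval, assuming the query and candidate segments have already been embedded by the same model:

```python
# Rank candidate segments by cosine similarity to the query embedding,
# instead of comparing frame sequences directly with DTW.
import numpy as np

def rank_candidates(query_emb, cand_embs):  # query_emb: (D,), cand_embs: (N, D)
    q = query_emb / np.linalg.norm(query_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores)               # best matches first
```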
3 code implementations • EMNLP 2020 • Shubham Toshniwal, Sam Wiseman, Allyson Ettinger, Karen Livescu, Kevin Gimpel
Long document coreference resolution remains a challenging task due to the large memory and runtime requirements of current models.
Ranked #10 on Coreference Resolution on CoNLL 2012
no code implementations • EMNLP 2020 • Haoyue Shi, Karen Livescu, Kevin Gimpel
We analyze several recent unsupervised constituency parsing models, which are tuned with respect to the parsing $F_1$ score on the Wall Street Journal (WSJ) development set (1,700 sentences).
1 code implementation • 1 Jul 2020 • Bowen Shi, Shane Settle, Karen Livescu
We find that word error rate can be reduced by a large margin by pre-training the acoustic segment representation with AWEs, and additional (smaller) gains can be obtained by pre-training the word prediction layer with AGWEs.
2 code implementations • 24 Jun 2020 • Yushi Hu, Shane Settle, Karen Livescu
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
1 code implementation • ACL 2020 • Shuning Jin, Sam Wiseman, Karl Stratos, Karen Livescu
While much work on deep latent variable models of text uses continuous latent variables, discrete latent variables are interesting because they are more interpretable and typically more space-efficient.
1 code implementation • WS 2020 • Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel
Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution.
1 code implementation • ACL 2020 • Shubham Toshniwal, Allyson Ettinger, Kevin Gimpel, Karen Livescu
We propose PeTra, a memory-augmented neural network designed to track entities in its memory slots.
Ranked #1 on Coreference Resolution on GAP (F1 metric)
2 code implementations • ICCV 2019 • Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Diane Brentari, Greg Shakhnarovich, Karen Livescu
In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media.
1 code implementation • EMNLP 2018 • Mingda Chen, Qingming Tang, Karen Livescu, Kevin Gimpel
Our model family consists of a latent-variable generative model and a discriminative labeler.
Ranked #73 on Named Entity Recognition (NER) on CoNLL 2003 (English)
no code implementations • ACL 2019 • Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu
We define concreteness of constituents by their matching scores with images, and use it to guide the parsing of text.
no code implementations • ICLR 2019 • Qingming Tang, Mingda Chen, Weiran Wang, Karen Livescu
Existing variational recurrent models typically use stochastic recurrent connections to model the dependence among neighboring latent variables, while generation assumes independence of generated data per time step given the latent sequence.
no code implementations • 24 Apr 2019 • Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu
Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision.
1 code implementation • 15 Apr 2019 • Herman Kamper, Aristotelis Anastassiou, Karen Livescu
A number of recent studies have started to investigate how speech systems can be trained on untranscribed speech by leveraging accompanying images at training time.
no code implementations • 29 Mar 2019 • Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny
Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are simpler to train, and more efficient to decode with, than sub-word systems.
Automatic Speech Recognition
no code implementations • 26 Oct 2018 • Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Jonathan Michaux, Diane Brentari, Greg Shakhnarovich, Karen Livescu
As the first attempt at fingerspelling recognition in the wild, this work is intended to serve as a baseline for future work on sign language recognition in realistic conditions.
1 code implementation • NAACL 2019 • Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater
Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.
Automatic Speech Recognition
1 code implementation • 27 Jul 2018 • Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N. Sainath, Karen Livescu
In this paper, we compare a suite of past methods and some of our own proposed methods for using unpaired text data to improve encoder-decoder models.
Automatic Speech Recognition
no code implementations • 17 Jul 2018 • Kalpesh Krishna, Shubham Toshniwal, Karen Livescu
Previous work has shown that neural encoder-decoder speech recognition can be improved with hierarchical multitask learning, where auxiliary tasks are added at intermediate layers of a deep encoder.
no code implementations • 24 Mar 2018 • Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater
We explore models trained on between 20 and 160 hours of data, and find that although models trained on less data have considerably lower BLEU scores, they can still predict words with relatively high precision and recall -- around 50% for a model trained on 50 hours of data, versus around 60% for the full 160-hour model.
no code implementations • 19 Mar 2018 • Qingming Tang, Weiran Wang, Karen Livescu
Previous work has shown that it is possible to improve speech recognition by learning acoustic features from paired acoustic-articulatory data, for example by using canonical correlation analysis (CCA) or its deep extensions.
no code implementations • 28 Oct 2017 • Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu
We explore whether deep convolutional neural networks (CNNs) can be used effectively instead of RNNs as the "encoder" in CTC.
no code implementations • 9 Oct 2017 • Bowen Shi, Karen Livescu
We introduce a model for fingerspelling recognition that addresses these issues.
2 code implementations • 5 Oct 2017 • Herman Kamper, Gregory Shakhnarovich, Karen Livescu
We introduce a newly collected data set of human semantic relevance judgements and an associated task, semantic speech retrieval, where the goal is to search for spoken utterances that are semantically relevant to a given text query.
no code implementations • 11 Aug 2017 • Qingming Tang, Weiran Wang, Karen Livescu
We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time.
no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.
1 code implementation • 12 Jun 2017 • Shane Settle, Keith Levin, Herman Kamper, Karen Livescu
Query-by-example search often uses dynamic time warping (DTW) for comparing queries and proposed matching segments.
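For reference, the classic DTW comparison the entry mentions can be sketched as follows:

```python
# Dynamic time warping: align two feature sequences and return the
# cumulative alignment cost under Euclidean frame distances.
import numpy as np

def dtw_cost(x, y):                                   # x: (Tx, D), y: (Ty, D)
    tx, ty = len(x), len(y)
    D = np.full((tx + 1, ty + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, tx + 1):
        for j in range(1, ty + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])   # local frame distance
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[tx, ty]
```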
no code implementations • WS 2017 • Lifu Tu, Kevin Gimpel, Karen Livescu
We present models for embedding words in the context of surrounding words.
1 code implementation • NAACL 2018 • Trang Tran, Shubham Toshniwal, Mohit Bansal, Kevin Gimpel, Karen Livescu, Mari Ostendorf
In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses.
no code implementations • 5 Apr 2017 • Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu
We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches.
1 code implementation • 23 Mar 2017 • Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech.
2 code implementations • 23 Mar 2017 • Herman Kamper, Karen Livescu, Sharon Goldwater
Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing.
no code implementations • 14 Nov 2016 • Wanjia He, Weiran Wang, Karen Livescu
Recent work has begun exploring neural acoustic word embeddings -- fixed-dimensional vector representations of arbitrary-length speech segments corresponding to words.
no code implementations • 8 Nov 2016 • Shane Settle, Karen Livescu
Acoustic word embeddings -- fixed-dimensional vector representations of variable-length spoken word segments -- have begun to be considered for tasks such as speech recognition and query-by-example search.
no code implementations • 21 Oct 2016 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
Similarly to hybrid HMM-neural network models, segmental models of this class can be trained in two stages (frame classifier training followed by linear segmental model weight training), end to end (joint training of both frame classifier and linear weights), or with end-to-end fine-tuning after two-stage training.
1 code implementation • 20 Oct 2016 • Shubham Toshniwal, Karen Livescu
We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion.
no code implementations • 11 Oct 2016 • Weiran Wang, Xinchen Yan, Honglak Lee, Karen Livescu
We present deep variational canonical correlation analysis (VCCA), a deep multi-view learning model that extends the latent variable model interpretation of linear CCA to nonlinear observation models parameterized by deep neural networks.
no code implementations • 26 Sep 2016 • Taehwan Kim, Jonathan Keane, Weiran Wang, Hao Tang, Jason Riggle, Gregory Shakhnarovich, Diane Brentari, Karen Livescu
Recognizing fingerspelling is challenging for a number of reasons: It involves quick, small motions that are often highly coarticulated; it exhibits significant variation between signers; and there has been a dearth of continuous fingerspelling data collected.
no code implementations • 2 Aug 2016 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition.
no code implementations • EMNLP 2016 • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu
We present Charagram embeddings, a simple approach for learning character-based compositional models to embed textual sequences.
no code implementations • 13 Feb 2016 • Taehwan Kim, Weiran Wang, Hao Tang, Karen Livescu
Previous work has shown that it is possible to achieve almost 90% accuracies on fingerspelling recognition in a signer-dependent setting.
Automatic Speech Recognition
1 code implementation • 2 Feb 2016 • Weiran Wang, Raman Arora, Karen Livescu, Jeff Bilmes
We consider learning representations (features) in the setting in which we have access to multiple unlabeled views of the data for learning while only one view is available for downstream tasks.
no code implementations • 25 Nov 2015 • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu
We again find that the word averaging models perform well for sentence similarity and entailment, outperforming LSTMs.
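A minimal sketch of the word-averaging model family, assuming a plain word-embedding lookup:

```python
# A sentence embedding is the mean of its word embeddings; sentence
# similarity is the cosine between the two sentence embeddings.
import numpy as np

def sentence_embedding(words, emb):          # emb: dict word -> np.ndarray
    vecs = [emb[w] for w in words if w in emb]  # assumes >= 1 in-vocab word
    return np.mean(vecs, axis=0)

def similarity(s1, s2, emb):
    a, b = sentence_embedding(s1, emb), sentence_embedding(s2, emb)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```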
no code implementations • 16 Nov 2015 • Tomer Michaeli, Weiran Wang, Karen Livescu
Several nonlinear extensions of the original linear CCA have been proposed, including kernel and deep neural network methods.
no code implementations • 15 Nov 2015 • Weiran Wang, Karen Livescu
Kernel canonical correlation analysis (KCCA) is a nonlinear multi-view representation learning technique with broad applicability in statistics and machine learning.
no code implementations • WS 2016 • Pranava Swaroop Madhyastha, Mohit Bansal, Kevin Gimpel, Karen Livescu
We consider the supervised training setting in which we learn task-specific word embeddings.
no code implementations • 7 Oct 2015 • Weiran Wang, Raman Arora, Karen Livescu, Nathan Srebro
Deep CCA is a recently proposed deep neural network extension to the traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains.
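For context, the linear CCA that deep CCA extends can be computed as a short numpy sketch (regularization added here for numerical stability):

```python
# Whiten each view, take the SVD of the whitened cross-covariance; the
# singular values are the canonical correlations.
import numpy as np

def linear_cca(X, Y, reg=1e-4):              # X: (N, dx), Y: (N, dy), mean-centered
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):                         # inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(T, compute_uv=False)  # canonical correlations
```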
1 code implementation • 5 Oct 2015 • Herman Kamper, Weiran Wang, Karen Livescu
Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units.
no code implementations • 22 Jul 2015 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
A typical solution is to use approximate decoding, either by beam pruning in a single pass or by beam pruning to generate a lattice followed by a second pass.
1 code implementation • TACL 2015 • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu, Dan Roth
The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates.