no code implementations • NAACL (GeBNLP) 2022 • Amanda Bertsch, Ashley Oh, Sanika Natu, Swetha Gangu, Alan W. Black, Emma Strubell
We extend our analysis to a longitudinal study of bias in film dialogue over the last 110 years and find that continued pre-training on OpenSubtitles encodes additional bias into BERT.
no code implementations • SEMEVAL 2020 • Sopan Khosla, Rishabh Joshi, Ritam Dutt, Alan W. Black, Yulia Tsvetkov
In this paper we describe our submission for the task of Propaganda Span Identification in news articles.
no code implementations • 25 Jul 2020 • Amrith Setlur, Barnabas Poczos, Alan W. Black
This paper extends recent work on nonlinear Independent Component Analysis (ICA) by introducing a theoretical framework for nonlinear Independent Subspace Analysis (ISA) in the presence of auxiliary variables.
no code implementations • WS 2020 • Elijah Mayfield, Alan W. Black
Most natural language processing research now recommends large Transformer-based models with fine-tuning for supervised classification tasks; older strategies like bag-of-words features and linear models have fallen out of favor.
1 code implementation • ACL 2020 • Vaibhav Kumar, Alan W. Black
In order to overcome these limitations, we devise a novel bootstrapping framework (based on self-supervision) that assists in the creation of a diverse, large-scale dataset of clarification questions based on post-comment tuples extracted from StackExchange.
no code implementations • ACL 2020 • Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W. Black, Jason Eisner
A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions.
1 code implementation • ACL 2020 • Elizabeth Salesky, Alan W. Black
End-to-end models for speech translation (ST) more tightly couple speech recognition (ASR) and machine translation (MT) than a traditional cascade of separate ASR and MT models, with simpler model architectures and the potential for reduced error propagation.
no code implementations • COLING 2020 • Shrimai Prabhumoye, Alan W. Black, Ruslan Salakhutdinov
In this work, we provide a new schema of the pipeline of the generation process by classifying it into five modules.
no code implementations • 1 May 2020 • Khyathi Raghavi Chandu, Alan W. Black
We believe this viewpoint of CS as style variations opens new perspectives for modeling various tasks in CS text.
2 code implementations • ACL 2020 • Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W. Black
Sentence ordering is the task of arranging the sentences of a given text in the correct order.
1 code implementation • ACL 2020 • Aman Madaan, Amrith Setlur, Tanmay Parekh, Barnabas Poczos, Graham Neubig, Yiming Yang, Ruslan Salakhutdinov, Alan W. Black, Shrimai Prabhumoye
This paper introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning.
no code implementations • LREC 2020 • David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig
While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.
1 code implementation • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze
Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages.
no code implementations • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. Black, Florian Metze
The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes.
no code implementations • 10 Jan 2020 • Yiyuan Li, Antonios Anastasopoulos, Alan W. Black
Current grammatical error correction (GEC) models typically treat the task as sequence generation, which requires large amounts of annotated data and limits their application in data-limited settings.
1 code implementation • LREC 2020 • Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo M. Vega, Antonios Anastasopoulos, Lori Levin, Alan W. Black
We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers.
1 code implementation • IJCNLP 2019 • Abhilasha Ravichander, Alan W. Black, Shomir Wilson, Thomas Norton, Norman Sadeh
The PrivacyQA corpus offers a challenging corpus for question answering, with genuine real-world utility.
no code implementations • WS 2019 • Shirley Anugrah Hayati, Aditi Chaudhary, Naoki Otani, Alan W. Black
Irony detection is an important task with applications in identification of online abuse and harassment.
no code implementations • WS 2019 • Isak Czeresnia Etinger, Alan W. Black
Typical datasets used for style transfer in NLP contain aligned pairs of two opposite extremes of a style.
no code implementations • WS 2019 • Wenchao Du, Alan W. Black
Recent advances in deep learning have shown promise in solving complex combinatorial optimization problems, such as sorting variable-sized sequences.
no code implementations • WS 2019 • James Route, Steven Hillis, Isak Czeresnia Etinger, Han Zhang, Alan W. Black
Grapheme-to-phoneme conversion (g2p) is the task of predicting the pronunciation of words from their orthographic representation.
no code implementations • WS 2019 • Yiheng Zhou, He He, Alan W. Black, Yulia Tsvetkov
We consider a bargaining scenario where a seller and a buyer negotiate the price of an item for sale through a text-based dialog.
no code implementations • ICLR 2020 • Yiheng Zhou, Yulia Tsvetkov, Alan W. Black, Zhou Yu
We train FSTs on a set of strategies and tactics used in negotiation dialogs.
no code implementations • 15 Sep 2019 • Ruo-Ping Dong, Khyathi Raghavi Chandu, Alan W. Black
We also conduct a human evaluation, which finds that the visual stories generated by our model are preferred 82% of the time.
no code implementations • 3 Sep 2019 • Shikib Mehri, Alan W. Black, Maxine Eskenazi
Voice-based technologies are typically developed for the average user, and thus generally not tailored to the specific needs of any subgroup of the population, like seniors.
1 code implementation • IJCNLP 2019 • Dongyeop Kang, Hiroaki Hayashi, Alan W. Black, Eduard Hovy
In order to produce a coherent flow of text, we explore two forms of intersentential relations in a paragraph: one is a human-created linguistic relation that forms a structure (e.g., a discourse tree) and the other is a relation from a latent representation learned from the sentences themselves.
no code implementations • 2 Aug 2019 • Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze
For example, the target corpus might benefit more from a corpus in the same domain or a corpus from a close language.
no code implementations • 2 Aug 2019 • Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze
In this work, we present SANTLR: Speech Annotation Toolkit for Low Resource Languages.
no code implementations • WS 2019 • Elijah Mayfield, Michael Madaio, Shrimai Prabhumoye, David Gerritsen, Brittany McLaughlin, Ezekiel Dixon-Román, Alan W. Black
There is a long record of research on equity in schools.
no code implementations • WS 2019 • Khyathi Chandu, Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W. Black
To this end, we propose five models which are incremental extensions to the baseline model to perform the task at hand.
2 code implementations • WS 2019 • Prakhar Gupta, Vinayshekhar Bannihatti Kumar, Mukul Bhutani, Alan W. Black
In this paper, we propose models which generate more diverse and interesting outputs by 1) training models to focus attention on important keyphrases of the story, and 2) promoting generation of non-generic words.
no code implementations • ACL 2019 • Wenchao Du, Alan W. Black
Neural models have become one of the most important approaches to dialog response generation.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Khyathi Chandu, Eric Nyberg, Alan W. Black
We introduce a dataset for sequential procedural (how-to) text generation from images in the cooking domain.
1 code implementation • WS 2019 • Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W. Black, Yulia Tsvetkov
Contextual word embeddings such as BERT have achieved state of the art performance in numerous NLP tasks.
no code implementations • WS 2019 • Shrimai Prabhumoye, Elijah Mayfield, Alan W. Black
We critique recent work on ethics in natural language processing.
no code implementations • 14 Jun 2019 • Shrimai Prabhumoye, Khyathi Raghavi Chandu, Ruslan Salakhutdinov, Alan W. Black
To this end, we propose five models which are incremental extensions to the baseline model to perform the task at hand.
no code implementations • ACL 2019 • Elizabeth Salesky, Matthias Sperber, Alan W. Black
Previous work on end-to-end translation from speech has primarily used frame-level features as speech representations, which creates longer, sparser sequences than text.
no code implementations • NAACL 2019 • Wenchao Du, Alan W. Black
We consider neural language generation under a novel problem setting: generating the words of a sentence according to the order of their first appearance in its lexicalized PCFG parse tree, in a depth-first, left-to-right manner.
no code implementations • 25 Apr 2019 • Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux
We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text).
1 code implementation • NAACL 2019 • Thomas Manzini, Yao Chong Lim, Yulia Tsvetkov, Alan W. Black
Online texts -- across genres, registers, domains, and styles -- are riddled with human stereotypes, expressed in overt or subtle ways.
no code implementations • 25 Mar 2019 • Sunayana Sitaram, Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Alan W. Black
Code-switching, the alternation of languages within a conversation or utterance, is a common communicative phenomenon that occurs in multilingual communities across the world.
no code implementations • 24 Feb 2019 • Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown
This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).
no code implementations • 20 Feb 2019 • Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze
Building multilingual and crosslingual models helps bring different languages together in a language-universal space.
2 code implementations • 31 Jan 2019 • Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W. Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston
We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots.
no code implementations • 24 Oct 2018 • Yulun Du, Chirag Raman, Alan W. Black, Louis-Philippe Morency, Maxine Eskenazi
Distracted driving is deadly, claiming 3,477 lives in the U.S. in 2015 alone.
3 code implementations • EMNLP 2018 • Kangyan Zhou, Shrimai Prabhumoye, Alan W. Black
We define "Document Grounded Conversations" as conversations that are about the contents of a specified document.
no code implementations • 17 Sep 2018 • Shrimai Prabhumoye, Yulia Tsvetkov, Alan W. Black, Ruslan Salakhutdinov
Style transfer is the task of transferring an attribute of a sentence (e.g., formality) while maintaining its semantic content.
no code implementations • 3 Sep 2018 • Wenchao Du, Alan W. Black
Data augmentation seeks to manipulate the available data for training to improve the generalization ability of models.
no code implementations • 28 Jul 2018 • Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. Black
We demonstrate the effectiveness of using a pre-trained English recognizer, which is robust to such mismatched conditions, as a domain normalizing feature extractor on a low resource language.
no code implementations • 4 Jul 2018 • Weidong Yuan, Alan W. Black
This paper models the fundamental frequency contours on both Mandarin and Cantonese speech with decision trees and DNNs (deep neural networks).
no code implementations • WS 2018 • SaiKrishna Rallabandi, Sunayana Sitaram, Alan W. Black
We hypothesize that it may be useful for an ASR system to be able to first detect the switching style of a particular utterance from acoustics, and then use specialized language models or other adaptation techniques for decoding the speech.
no code implementations • WS 2018 • Kyusong Lee, Tiancheng Zhao, Alan W. Black, Maxine Eskenazi
When creating a dialog system, developers need to test each version to ensure that it is performing correctly.
no code implementations • WS 2018 • Khyathi Chandu, Ekaterina Loginova, Vishal Gupta, Josef van Genabith, Günter Neumann, Manoj Chinnakotla, Eric Nyberg, Alan W. Black
As a first step towards fostering research which supports CM in NLP applications, we systematically crowd-sourced and curated an evaluation dataset for factoid question answering in three CM languages - Hinglish (Hindi+English), Tenglish (Telugu+English) and Tamlish (Tamil+English) which belong to two language families (Indo-Aryan and Dravidian).
no code implementations • WS 2018 • Abhilasha Ravichander, Alan W. Black
Self-disclosure is a key social strategy employed in conversation to build relations and increase conversational depth.
no code implementations • WS 2018 • Parvathy Geetha, Khyathi Chandu, Alan W. Black
In this paper we describe models that intuitively developed from the data for the shared task Named Entity Recognition on Code-switched Data.
no code implementations • WS 2018 • Khyathi Chandu, Thomas Manzini, Sumeet Singh, Alan W. Black
Code-switching (CS), the practice of alternating between two or more languages in conversations, is pervasive in most multi-lingual communities.
3 code implementations • ACL 2018 • Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W. Black
We first learn a latent representation of the input sentence which is grounded in a language translation model in order to better preserve the meaning of the sentence while reducing stylistic properties.
Ranked #10 on Unsupervised Text Style Transfer on Yelp
no code implementations • 21 Feb 2018 • Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black
Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains.
no code implementations • WS 2017 • Shrimai Prabhumoye, Samridhi Choudhary, Evangelia Spiliopoulou, Christopher Bogart, Carolyn Penstein Rose, Alan W. Black
There has been a long-standing interest in understanding 'Social Influence' both in Social Sciences and in Computational Linguistics.
no code implementations • 1 Mar 2017 • Zhou Yu, Alan W. Black, Alexander I. Rudnicky
These systems work well when users have clear and explicit intentions that are well-aligned to the systems' capabilities.
no code implementations • NAACL 2016 • Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W. Black, Lori Levin, Chris Dyer
We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted.
no code implementations • LREC 2016 • Sunayana Sitaram, Alan W. Black
Most Text to Speech (TTS) systems today assume that the input text is in a single language and is written in the same language that the text needs to be synthesized in.
no code implementations • 26 Jan 2016 • Prasanna Kumar Muthukumar, Alan W. Black
In the last two years, there have been numerous papers that have looked into using Deep Neural Networks to replace the acoustic model in traditional statistical parametric speech synthesis.
no code implementations • 14 Nov 2015 • Wang Ling, Isabel Trancoso, Chris Dyer, Alan W. Black
We introduce a neural machine translation model that views the input and output sentences as sequences of characters rather than words.
1 code implementation • EMNLP 2015 • Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso
We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs.
Ranked #4 on Part-Of-Speech Tagging on Penn Treebank
no code implementations • 30 Sep 2014 • Prasanna Kumar Muthukumar, Alan W. Black
Mel Cepstral coefficients were never intended to work in a parametric speech synthesis framework, but as yet, there has been little success in creating a better parameterization that is more suited to synthesis.