1 code implementation • 28 Oct 2024 • Mirac Suzgun, Tayfun Gur, Federico Bianchi, Daniel E. Ho, Thomas Icard, Dan Jurafsky, James Zou
These findings highlight significant concerns about current LMs' ability to reason about truth, belief, and knowledge while emphasizing the need for advancements in these areas before broad deployment in critical sectors.
1 code implementation • 21 Oct 2024 • Aryaman Arora, Dan Jurafsky, Christopher Potts, Noah D. Goodman
In all cases, Bayesian scaling laws accurately predict the conditions under which ICL will cause the suppressed behavior to reemerge, which sheds light on the ineffectiveness of post-training at increasing LLM safety.
1 code implementation • 27 Aug 2024 • Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky
We introduce Confidence-Driven Inference: a method that combines LLM annotations and LLM confidence indicators to strategically select which human annotations should be collected, with the goal of producing accurate statistical estimates and provably valid confidence intervals while reducing the number of human annotations needed.
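A minimal sketch of the selection idea only, not the paper's estimator or its coverage guarantees: under a fixed budget, route the items on which the LLM is least confident to human annotators (all names and numbers below are illustrative assumptions).

```python
import numpy as np

def select_for_human_annotation(llm_confidence, budget):
    """Illustrative heuristic: return indices of the `budget` items with the
    lowest LLM confidence, i.e. the items to send to human annotators."""
    return np.argsort(llm_confidence)[:budget]

# toy usage: 10 items, budget of 3 human annotations
confidence = np.array([0.95, 0.40, 0.88, 0.55, 0.99, 0.61, 0.30, 0.72, 0.83, 0.47])
print(select_for_human_annotation(confidence, budget=3))  # -> [6 1 9]
```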
no code implementations • 24 Aug 2024 • Antón de la Fuente, Dan Jurafsky
This study asks how self-supervised speech models represent suprasegmental categories like Mandarin lexical tone, English lexical stress, and English phrasal accents.
no code implementations • 9 Aug 2024 • Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia, Davide Ghilardi, Anna Goldie, Federico Bianchi, Dan Jurafsky, Christopher D. Manning
The safety of Large Language Models (LLMs) remains a critical concern due to a lack of adequate benchmarks for systematically evaluating their ability to resist generating harmful content.
1 code implementation • 6 Aug 2024 • Heidi C. Zhang, Shabnam Behzad, Kawin Ethayarajh, Dan Jurafsky
Model checklists (Ribeiro et al., 2020) have emerged as a useful tool for understanding the behavior of LLMs, analogous to unit-testing in software engineering.
no code implementations • 10 Jul 2024 • Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Nouha Dziri, Dan Jurafsky, Maarten Sap
The ability to communicate uncertainty, risk, and limitation is crucial for the safety of large language models.
no code implementations • 12 Jun 2024 • Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-Yi Lee, Shinji Watanabe
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches.
Automatic Speech Recognition (ASR) +4
2 code implementations • 4 Apr 2024 • Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT), and we identify an ablation of this method that trades some performance for increased efficiency.
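For orientation, one way the low-rank linear subspace intervention on a hidden representation $h \in \mathbb{R}^d$ can be written, reconstructed from the ReFT paper's description rather than quoted from this entry, is

$$\Phi_{\text{LoReFT}}(h) = h + R^{\top}\bigl(Wh + b - Rh\bigr),$$

where $R \in \mathbb{R}^{r \times d}$ has orthonormal rows spanning an $r$-dimensional subspace and $W \in \mathbb{R}^{r \times d}$, $b \in \mathbb{R}^{r}$ are learned; only these low-rank parameters are trained while the base model stays frozen, which is what makes the approach parameter-efficient.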
1 code implementation • 2 Apr 2024 • Kristina Gligoric, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky
The use of words to convey a speaker's intent is traditionally distinguished from the 'mention' of words for quoting what someone said or pointing out properties of a word.
1 code implementation • 1 Mar 2024 • Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, Sharese King
Here, we demonstrate that language models embody covert racism in the form of dialect prejudice. Extending research showing that Americans hold raciolinguistic stereotypes about speakers of African American English, we find that language models have the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, though closest to those from before the civil rights movement.
1 code implementation • 19 Feb 2024 • Aryaman Arora, Dan Jurafsky, Christopher Potts
Language models (LMs) have proven to be powerful tools for psycholinguistic research, but most prior work has focused on purely behavioural measures (e.g., surprisal comparisons).
1 code implementation • 8 Feb 2024 • Federico Bianchi, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, James Zou
We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents.
1 code implementation • 3 Feb 2024 • Myra Cheng, Kristina Gligoric, Tiziano Piccardi, Dan Jurafsky
Anthropomorphism, or the attribution of human-like characteristics to non-human entities, has shaped conversations about the impacts and possibilities of technology.
no code implementations • 3 Feb 2024 • Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky
Continued pre-training on 70-200 hours of untranscribed speech in these languages can help -- but what about languages without that much recorded data?
Automatic Speech Recognition (ASR) +1
3 code implementations • 2 Feb 2024 • Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela
Kahneman & Tversky's $\textit{prospect theory}$ tells us that humans perceive random variables in a biased but well-defined manner (1992); for example, humans are famously loss-averse.
no code implementations • 25 Nov 2023 • Tolúlopé Ògúnrèmí, Christopher D. Manning, Dan Jurafsky
While many speakers of low-resource languages regularly code-switch between their languages and other regional languages or English, datasets of code-switched speech are too small to train bespoke acoustic models from scratch or do language model rescoring.
no code implementations • 15 Nov 2023 • Omar Shaikh, Kristina Gligorić, Ashna Khetan, Matthias Gerstgrasser, Diyi Yang, Dan Jurafsky
To understand the roots of the identified grounding gap, we examine the role of instruction tuning and preference optimization, finding that training on contemporary preference data leads to a reduction in generated grounding acts.
no code implementations • 28 Sep 2023 • Garrett Tanzer, Mirac Suzgun, Eline Visser, Dan Jurafsky, Luke Melas-Kyriazi
In this paper, we introduce MTOB (Machine Translation from One Book), a benchmark for learning to translate between English and Kalamang -- a language with fewer than 200 speakers and therefore virtually no presence on the web -- using several hundred pages of field linguistics reference materials.
4 code implementations • 14 Sep 2023 • Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, James Zou
Training large language models to follow instructions makes them perform better on a wide range of tasks and generally become more helpful.
1 code implementation • 16 Aug 2023 • Eva Portelance, Michael C. Frank, Dan Jurafsky
Furthermore, we find that these models can learn the meanings of the logical connectives "and" and "or" without any prior knowledge of logical reasoning, and we find early evidence that they are sensitive to alternative expressions when interpreting language.
1 code implementation • 14 Jul 2023 • Yiwei Luo, Kristina Gligorić, Dan Jurafsky
Through careful linguistic analyses, we evaluate social theories about attitudes toward immigrant cuisine in a large-scale study of framing differences in 2.1M English-language Yelp reviews.
no code implementations • NeurIPS 2023 • Connor Toups, Rishi Bommasani, Kathleen A. Creel, Sarah H. Bana, Dan Jurafsky, Percy Liang
In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments.
no code implementations • 9 Jun 2023 • Anjalie Field, Prateek Verma, Nay San, Jennifer L. Eberhardt, Dan Jurafsky
We investigate the potential of large pre-trained speech models for facilitating reviews, focusing on ASR and officer speech detection in footage from traffic stops.
1 code implementation • 29 May 2023 • Myra Cheng, Esin Durmus, Dan Jurafsky
To recognize and mitigate harms from large language models (LLMs), we need to understand the prevalence and nuances of stereotypes in LLM outputs.
1 code implementation • 18 May 2023 • Martijn Bartelds, Nay San, Bradley McDonnell, Dan Jurafsky, Martijn Wieling
For Gronings, for which there was a pre-existing text-to-speech (TTS) system available, we also examined the use of TTS to generate ASR training data from text-only sources.
Automatic Speech Recognition (ASR) +3
2 code implementations • 27 Apr 2023 • Mirac Suzgun, Stuart M. Shieber, Dan Jurafsky
It includes traditional algorithmic solutions as well as recent advanced neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis -- along with several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods.
1 code implementation • 25 Apr 2023 • Isabel Papadimitriou, Dan Jurafsky
Our study leverages the capabilities of transformer models to run controlled language learning experiments that are not possible to run on humans, and surfaces hypotheses about the structures that facilitate language learning in both humans and machines.
no code implementations • 26 Feb 2023 • Kaitlyn Zhou, Dan Jurafsky, Tatsunori Hashimoto
The increased deployment of LMs for real-world tasks involving knowledge and facts makes it important to understand model epistemology: what LMs think they know, and how their attitudes toward that knowledge are affected by language use in their inputs.
no code implementations • 9 Feb 2023 • Nay San, Martijn Bartelds, Blaine Billings, Ella de Falco, Hendi Feriza, Johan Safri, Wawan Sahrozi, Ben Foley, Bradley McDonnell, Dan Jurafsky
We perform experiments using 10 minutes of transcribed speech from English (for replicating prior work) and two additional pairs of languages differing in the availability of supplemental text data: Gronings and Frisian (~7.5M-token corpora available), and Besemah and Nasal (only small lexica available).
Automatic Speech Recognition (ASR) +1
1 code implementation • 27 Nov 2022 • Peter Henderson, Eric Mitchell, Christopher D. Manning, Dan Jurafsky, Chelsea Finn
A growing ecosystem of large, open-source foundation models has reduced the labeled data and technical expertise necessary to apply machine learning to many new problems.
no code implementations • 25 Nov 2022 • Rishi Bommasani, Kathleen A. Creel, Ananya Kumar, Dan Jurafsky, Percy Liang
As the scope of machine learning broadens, we observe a recurring theme of algorithmic monoculture: the same systems, or systems that share components (e.g., training data), are deployed by multiple decision-makers.
1 code implementation • 14 Nov 2022 • Mirac Suzgun, Luke Melas-Kyriazi, Dan Jurafsky
In open-ended natural-language generation, existing text decoding methods typically struggle to produce text which is both diverse and high-quality.
1 code implementation • 7 Nov 2022 • Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan
For example, we find cases of prompting for basic traits or social roles resulting in images reinforcing whiteness as ideal, prompting for occupations resulting in amplification of racial and gender disparities, and prompting for objects resulting in reification of American norms.
no code implementations • 11 Oct 2022 • Isabel Papadimitriou, Kezia Lopez, Dan Jurafsky
Here we show another problem with multilingual models: grammatical structures in higher-resource languages bleed into lower-resource languages, a phenomenon we call grammatical structure bias.
1 code implementation • 4 Oct 2022 • Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, James Zou
ARO consists of Visual Genome Attribution, to test the understanding of objects' properties; Visual Genome Relation, to test for relational understanding; and COCO & Flickr30k-Order, to test for order sensitivity.
1 code implementation • NAACL (BEA) 2022 • Sterling Alic, Dorottya Demszky, Zid Mancenido, Jing Liu, Heather Hill, Dan Jurafsky
Responsive teaching is a highly effective strategy that promotes student learning.
1 code implementation • 1 Jul 2022 • Peter Henderson, Mark S. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho
One concern with the rise of large language models lies with their potential for significant harm, particularly from pretraining on biased, obscene, copyrighted, and private information.
no code implementations • 24 May 2022 • Kawin Ethayarajh, Dan Jurafsky
Human ratings are the gold standard in NLG evaluation.
1 code implementation • 23 May 2022 • Mirac Suzgun, Luke Melas-Kyriazi, Dan Jurafsky
We propose a method for arbitrary textual style transfer (TST), the task of transforming a text into any given style, using general-purpose pre-trained language models.
1 code implementation • Findings (ACL) 2022 • Kaitlyn Zhou, Kawin Ethayarajh, Dan Jurafsky
We examine whether some countries are more richly represented in embedding space than others.
2 code implementations • ACL 2022 • Kaitlyn Zhou, Kawin Ethayarajh, Dallas Card, Dan Jurafsky
Cosine similarity of contextual embeddings is used in many NLP tasks (e.g., QA, IR, MT) and metrics (e.g., BERTScore).
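As a reminder of the quantity under scrutiny, here is a minimal sketch of cosine similarity between two contextual embedding vectors (plain NumPy with toy vectors; in practice the vectors would come from a model such as BERT):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy stand-ins for contextual embeddings of two word occurrences
u = np.array([0.2, 0.7, -0.1, 0.4])
v = np.array([0.1, 0.6, 0.0, 0.5])
print(cosine_similarity(u, v))  # close to 1.0 for these similar toy vectors
```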
1 code implementation • Findings (ACL) 2022 • Junshen K. Chen, Dallas Card, Dan Jurafsky
Off-the-shelf models are widely used by computational social science researchers to measure properties of text, such as sentiment.
no code implementations • ComputEL (ACL) 2022 • Nay San, Martijn Bartelds, Tolúlopé Ògúnrèmí, Alison Mount, Ruben Thompson, Michael Higgins, Roy Barker, Jane Simpson, Dan Jurafsky
An even narrower bottleneck occurs for recordings with access constraints, such as language that must be vetted or filtered by authorised community members before annotation can begin.
1 code implementation • EMNLP 2021 • William Held, Dan Iter, Dan Jurafsky
We model the entities/events in a reader's focus as a neighborhood within a learned latent embedding space which minimizes the distance between mentions and the centroids of their gold coreference clusters.
Ranked #1 on Event Coreference Resolution on Gun Violence Corpus
Coreference Resolution • Entity Cross-Document Coreference Resolution +2
1 code implementation • CoNLL (EMNLP) 2021 • Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche
By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
1 code implementation • ACL 2021 • Dorottya Demszky, Jing Liu, Zid Mancenido, Julie Cohen, Heather Hill, Dan Jurafsky, Tatsunori Hashimoto
In conversation, uptake happens when a speaker builds on the contribution of their interlocutor by, for example, acknowledging, repeating or reformulating what they have said.
no code implementations • ACL 2021 • Kawin Ethayarajh, Dan Jurafsky
Shapley Values, a solution to the credit assignment problem in cooperative game theory, are a popular type of explanation in machine learning, having been used to explain the importance of features, embeddings, and even neurons.
1 code implementation • 21 Apr 2021 • Michael Hahn, Dan Jurafsky, Richard Futrell
We introduce a theoretical framework for understanding and predicting the complexity of sequence classification tasks, using a novel extension of the theory of Boolean function sensitivity.
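For readers new to the underlying notion: for a Boolean function $f:\{0,1\}^n \to \{0,1\}$, the sensitivity at an input counts the coordinates whose flip changes the output, and average sensitivity takes the expectation over inputs (these are the standard definitions the framework builds on, not the paper's novel extension):

$$s(f, x) = \bigl|\{\, i \in [n] : f(x) \neq f(x^{\oplus i}) \,\}\bigr|, \qquad \mathrm{as}(f) = \mathbb{E}_{x}\bigl[s(f, x)\bigr],$$

where $x^{\oplus i}$ denotes $x$ with its $i$-th bit flipped. For example, the parity function has $s(f, x) = n$ at every input, while a constant function has sensitivity 0.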
no code implementations • 17 Apr 2021 • Kaitlyn Zhou, Kawin Ethayarajh, Dan Jurafsky
How does word frequency in pre-training data affect the behavior of similarity metrics in contextualized BERT embeddings?
1 code implementation • 26 Mar 2021 • Nay San, Martijn Bartelds, Mitchell Browne, Lily Clifford, Fiona Gibson, John Mansfield, David Nash, Jane Simpson, Myfany Turpin, Maria Vollmer, Sasha Wilmoth, Dan Jurafsky
Surprisingly, the English model outperformed the multilingual model on 4 Australian language datasets, raising questions around how to optimally leverage self-supervised speech representations for QbE-STD.
Automatic Speech Recognition (ASR) +1
1 code implementation • NeurIPS 2020 • Alex Tamkin, Dan Jurafsky, Noah Goodman
Language exhibits structure at different scales, ranging from subwords to words, sentences, paragraphs, and documents.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yiwei Luo, Dallas Card, Dan Jurafsky
We release our stance dataset, model, and lexicons of framing devices for future work on opinion-framing and the automatic detection of GW stance.
1 code implementation • NAACL 2021 • Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, Dhanya Sridhar
Second, in practice, we only have access to noisy proxies for the linguistic properties of interest -- e.g., predictions from classifiers and lexicons.
3 code implementations • NAACL 2021 • Yasuhide Miura, Yuhao Zhang, Emily Bao Tsai, Curtis P. Langlotz, Dan Jurafsky
We further show via a human evaluation and a qualitative analysis that our system leads to generations that are more factually complete and consistent compared to the baselines.
2 code implementations • EMNLP 2020 • Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky
Despite its importance to experimental design, statistical power (the probability that, given a real effect, an experiment will reject the null hypothesis) has largely been ignored by the NLP community.
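A minimal Monte Carlo illustration of that definition, with an assumed effect size and test rather than anything from the paper's analyses: simulate many two-sample experiments under a known true effect and report the fraction whose t-test rejects the null at $\alpha = 0.05$.

```python
import numpy as np
from scipy import stats

def estimated_power(effect=0.3, n=50, alpha=0.05, n_sims=5000, seed=0):
    """Fraction of simulated two-sample experiments (true effect = `effect`
    in SD units, `n` items per group) whose t-test rejects the null at `alpha`."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n)
        treatment = rng.normal(effect, 1.0, n)
        _, p_value = stats.ttest_ind(treatment, control)
        rejections += p_value < alpha
    return rejections / n_sims

print(estimated_power())  # roughly 0.3: such an experiment is badly underpowered
```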
5 code implementations • ICLR 2021 • Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
We introduce $k$-nearest-neighbor machine translation ($k$NN-MT), which predicts tokens with a nearest neighbor classifier over a large datastore of cached examples, using representations from a neural translation model for similarity search.
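A minimal sketch of the retrieval-and-interpolation step that sentence describes, with toy NumPy arrays and placeholder names (the actual system builds the datastore from decoder representations and searches it with a fast nearest-neighbor index):

```python
import numpy as np

def knn_mt_distribution(query, keys, values, p_model, k=4, temperature=10.0, lam=0.5):
    """Blend a base MT distribution with a k-nearest-neighbour distribution
    built from a datastore of (representation, target-token) pairs.

    query:   decoder representation at the current step, shape (d,)
    keys:    datastore representations, shape (N, d)
    values:  datastore target-token ids, shape (N,)
    p_model: base model distribution over the vocabulary, shape (V,)
    """
    dists = np.linalg.norm(keys - query, axis=1)      # L2 distance to every stored key
    nearest = np.argsort(dists)[:k]                   # indices of the k closest keys
    weights = np.exp(-dists[nearest] / temperature)   # softmax over negative distances
    weights /= weights.sum()
    p_knn = np.zeros_like(p_model)
    for w, tok in zip(weights, values[nearest]):      # aggregate neighbour weight per token
        p_knn[tok] += w
    return lam * p_knn + (1.0 - lam) * p_model        # interpolated next-token distribution
```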
no code implementations • EMNLP 2020 • Kawin Ethayarajh, Dan Jurafsky
Benchmarks such as GLUE have helped drive advances in NLP by incentivizing the creation of more accurate models.
no code implementations • 16 Jun 2020 • Dorottya Demszky, László Kálmán, Dan Jurafsky, Beth Levin
We test the effect of lexical semantics on the ordering of verbs and their objects by grouping verbs into 11 semantic classes.
1 code implementation • ACL 2020 • Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky
Recent models for unsupervised representation learning of text have employed a number of techniques to improve contextual word representations but have put little focus on discourse-level representations.
2 code implementations • EMNLP 2020 • Isabel Papadimitriou, Dan Jurafsky
We propose transfer learning as a method for analyzing the encoding of grammatical structure in neural language models.
no code implementations • 6 Mar 2020 • Julia Mendelsohn, Yulia Tsvetkov, Dan Jurafsky
Dehumanization is a pernicious psychological process that often leads to extreme intergroup bias, hate speech, and violence aimed at targeted social groups.
2 code implementations • 31 Jan 2020 • Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau
Accurate reporting of energy and carbon usage is essential for understanding the potential climate impacts of machine learning research.
1 code implementation • 21 Nov 2019 • Reid Pryzant, Richard Diehl Martinez, Nathan Dass, Sadao Kurohashi, Dan Jurafsky, Diyi Yang
To address this issue, we introduce a novel testbed for natural language generation: automatically bringing inappropriately subjective text into a neutral point of view ("neutralizing" biased text).
no code implementations • ACL 2020 • Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, Yejin Choi
We introduce Social Bias Frames, a new conceptual formalism that aims to model the pragmatic frames in which people project social biases and stereotypes onto others.
5 code implementations • ICLR 2020 • Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
Applying this augmentation to a strong Wikitext-103 LM, with neighbors drawn from the original training set, our $k$NN-LM achieves a new state-of-the-art perplexity of 15.79, a 2.9-point improvement with no additional training.
Ranked #10 on Language Modelling on WikiText-103
1 code implementation • 4 Sep 2019 • Bas Hofstra, Vivek V. Kulkarni, Sebastian Munoz-Najar Galvez, Bryan He, Dan Jurafsky, Daniel A. McFarland
Are underrepresented groups more likely to generate scientific innovations?
no code implementations • WS 2019 • Yiwei Luo, Dan Jurafsky, Beth Levin
We introduce novel computational models of semantic bleaching, a widespread category of change in which words become more abstract or lose elements of meaning, like the development of "arrive" from its earlier meaning 'become at shore'.
no code implementations • WS 2019 • Joseph Lee, Ziang Xie, Cindy Wang, Max Drach, Dan Jurafsky, Andrew Ng
We introduce a simple method for text style transfer that frames style transfer as denoising: we synthesize a noisy corpus and treat the source style as a noisy version of the target style.
no code implementations • NAACL 2019 • Ignacio Cases, Clemens Rosenbaum, Matthew Riemer, Atticus Geiger, Tim Klinger, Alex Tamkin, Olivia Li, Sandhini Agarwal, Joshua D. Greene, Dan Jurafsky, Christopher Potts, Lauri Karttunen
The model jointly optimizes the parameters of the functions and the meta-learner's policy for routing inputs through those functions.
no code implementations • NAACL 2019 • Diyi Yang, Jiaao Chen, Zichao Yang, Dan Jurafsky, Eduard Hovy
Modeling what makes a request persuasive - eliciting the desired response from a reader - is critical to the study of propaganda, behavioral economics, and advertising.
2 code implementations • 21 May 2019 • Urvashi Khandelwal, Kevin Clark, Dan Jurafsky, Lukasz Kaiser
Language model (LM) pre-training has resulted in impressive performance and sample efficiency on a variety of language understanding tasks.
Ranked #1 on Text Summarization on DUC 2004 Task 1 (ROUGE-2 metric)
1 code implementation • IJCNLP 2019 • Julia Kruk, Jonah Lubin, Karan Sikka, Xiao Lin, Dan Jurafsky, Ajay Divakaran
Computing author intent from multimodal data like Instagram posts requires modeling a complex relationship between text and image.
1 code implementation • NAACL 2019 • Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Matthew Gentzkow, Jesse Shapiro, Dan Jurafsky
We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force.
2 code implementations • EMNLP 2018 • Matthew Lamm, Arun Tejasvi Chaganty, Christopher D. Manning, Dan Jurafsky, Percy Liang
To understand a sentence like "whereas only 10% of White Americans live at or below the poverty line, 28% of African Americans do," it is important not only to identify individual facts, e.g., poverty rates of distinct demographic groups, but also the higher-order relations between them, e.g., the disparity between them.
no code implementations • EMNLP 2018 • Anjalie Field, Doron Kliger, Shuly Wintner, Jennifer Pan, Dan Jurafsky, Yulia Tsvetkov
Amidst growing concern over media manipulation, NLP attention has focused on overt strategies like censorship and "fake news".
6 code implementations • NeurIPS 2018 • William L. Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, Jure Leskovec
Learning low-dimensional embeddings of knowledge graphs is a powerful approach used to predict unobserved or missing edges between entities.
Ranked #6 on Complex Query Answering on FB15k-237
no code implementations • WS 2018 • Dan Iter, Jong Yoon, Dan Jurafsky
Here, we present the first benchmark comparison of previously proposed coherence models for detecting symptoms of schizophrenia and evaluate their performance on a new dataset of recorded interviews between subjects and clinicians.
no code implementations • NAACL 2018 • Ziang Xie, Guillaume Genthial, Stanley Xie, Andrew Ng, Dan Jurafsky
Translation-based methods for grammar correction that directly map noisy, ungrammatical text to their clean counterparts are able to correct a broad range of errors; however, such techniques are bottlenecked by the need for a large parallel corpus of noisy and clean sentence pairs.
no code implementations • NAACL 2018 • Reid Pryzant, Kelly Shen, Dan Jurafsky, Stefan Wagner
The first uses a bifurcated architecture to separate the explanatory power of the text and confounds.
1 code implementation • ACL 2018 • Urvashi Khandelwal, He He, Peng Qi, Dan Jurafsky
We know very little about how neural language models (LMs) use prior linguistic context.
no code implementations • 9 Mar 2018 • Srijan Kumar, William L. Hamilton, Jure Leskovec, Dan Jurafsky
Here we study intercommunity interactions across 36,000 communities on Reddit, examining cases where users of one community are mobilized by negative sentiment to comment in another community.
no code implementations • TACL 2018 • Vinodkumar Prabhakaran, Camilla Griffiths, Hang Su, Prateek Verma, Nelson Morgan, Jennifer L. Eberhardt, Dan Jurafsky
We apply computational dialog methods to police body-worn camera footage to model conversations between police officers and community members in traffic stops.
1 code implementation • 22 Nov 2017 • Nikhil Garg, Londa Schiebinger, Dan Jurafsky, James Zou
Word embeddings use vectors to represent words such that the geometry between vectors captures semantic relationship between the words.
no code implementations • LREC 2018 • Reid Pryzant, Yongjoo Chung, Dan Jurafsky, Denny Britz
In this paper we describe the Japanese-English Subtitle Corpus (JESC).
no code implementations • EMNLP 2017 • Jiwei Li, Dan Jurafsky
In this paper, we describe domain-independent neural models of discourse coherence that are capable of measuring multiple aspects of coherence in existing sentences and can maintain coherence while generating new sentences.
no code implementations • ACL 2017 • David Jurgens, Yulia Tsvetkov, Dan Jurafsky
Language identification (LID) is a critical first step for processing multilingual text.
no code implementations • 26 May 2017 • Justine Zhang, William L. Hamilton, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, Jure Leskovec
To this end we introduce a quantitative, language-based typology reflecting two key aspects of a community's identity: how distinctive, and how temporally dynamic it is.
no code implementations • EACL 2017 • Grace Muzny, Michael Fang, Angel Chang, Dan Jurafsky
We present a deterministic sieve-based system for attributing quotations in literary text and a new dataset: QuoteLi3.
1 code implementation • 9 Mar 2017 • William L. Hamilton, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, Jure Leskovec
In this paper we operationalize loyalty as a user-community relation: users loyal to a community consistently prefer it over all others; loyal communities retain their loyal users over time.
no code implementations • 7 Mar 2017 • Ziang Xie, Sida I. Wang, Jiwei Li, Daniel Lévy, Aiming Nie, Dan Jurafsky, Andrew Y. Ng
Data noising is an effective technique for regularizing neural network models.
no code implementations • 22 Feb 2017 • Jiwei Li, Will Monroe, Dan Jurafsky
We show that from such a set of subsystems, one can use reinforcement learning to build a system that tailors its output to different input contexts at test time.
8 code implementations • EMNLP 2017 • Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky
In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances.
Ranked #1 on Dialogue Generation on Amazon-5
no code implementations • 23 Jan 2017 • Jiwei Li, Will Monroe, Dan Jurafsky
We introduce a simple, general strategy to manipulate the behavior of a neural decoder that enables it to generate outputs that have specific properties of interest (e.g., sequences of a pre-specified length).
no code implementations • 24 Dec 2016 • Jiwei Li, Will Monroe, Dan Jurafsky
While neural networks have been successfully applied to many natural language processing tasks, they come at the cost of interpretability.
1 code implementation • 25 Nov 2016 • Jiwei Li, Will Monroe, Dan Jurafsky
We further propose a variation that is capable of automatically adjusting its diversity decoding rates for different inputs using reinforcement learning (RL).
no code implementations • 2 Sep 2016 • David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, Dan Jurafsky
Citations are an important indicator of the state of a scientific field, reflecting how authors frame their work, and influencing uptake by future scholars.
no code implementations • EMNLP 2016 • William L. Hamilton, Jure Leskovec, Dan Jurafsky
Words shift in meaning for many reasons, including cultural factors like new technologies and regular linguistic processes like subjectification.
1 code implementation • EMNLP 2016 • William L. Hamilton, Kevin Clark, Jure Leskovec, Dan Jurafsky
A word's sentiment depends on the domain in which it is used.
8 code implementations • EMNLP 2016 • Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky
Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes.
1 code implementation • 5 Jun 2016 • Jiwei Li, Dan Jurafsky
In this paper, we describe domain-independent neural models of discourse coherence that are capable of measuring multiple aspects of coherence in existing sentences and can maintain coherence while generating new sentences.
6 code implementations • ACL 2016 • William L. Hamilton, Jure Leskovec, Dan Jurafsky
Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test.
3 code implementations • 31 Mar 2016 • Ziang Xie, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, Andrew Y. Ng
Motivated by these issues, we present a neural network-based approach to language correction.
1 code implementation • 4 Jan 2016 • Jiwei Li, Dan Jurafsky
We introduce an alternative objective function for neural MT that maximizes the mutual information between the source and target sentences, modeling the bi-directional dependency of sources and targets.
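Concretely, the pointwise mutual information between a source $x$ and a candidate translation $y$ decomposes as below; one common instantiation in this line of work reranks candidates with a weighted combination of forward and backward translation probabilities (the weighting $\lambda$ is a free parameter, shown here as an assumption rather than the paper's exact formulation):

$$\log \frac{p(x, y)}{p(x)\,p(y)} = \log p(y \mid x) - \log p(y), \qquad \hat{y} = \arg\max_{y}\,\bigl[(1-\lambda)\log p(y \mid x) + \lambda \log p(x \mid y)\bigr].$$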
no code implementations • 18 Oct 2015 • Jiwei Li, Alan Ritter, Dan Jurafsky
Inferring latent attributes of people online is an important social computing task, but requires integrating the many heterogeneous sources of information available on the web.
1 code implementation • NAACL 2016 • Jiwei Li, Xinlei Chen, Eduard Hovy, Dan Jurafsky
While neural networks have been successfully applied to many NLP tasks the resulting vector-based models are very difficult to interpret.
no code implementations • EMNLP 2015 • Jiwei Li, Dan Jurafsky
Learning a distinct representation for each sense of an ambiguous word could lead to more powerful and fine-grained models of vector-space representations.
6 code implementations • IJCNLP 2015 • Jiwei Li, Minh-Thang Luong, Dan Jurafsky
Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent network models.
no code implementations • EMNLP 2015 • Jiwei Li, Minh-Thang Luong, Dan Jurafsky, Eduard Hovy
Recursive neural models, which use syntactic parse trees to recursively generate representations bottom-up, are a popular architecture.
no code implementations • 11 Nov 2014 • Jiwei Li, Alan Ritter, Dan Jurafsky
We build a probabilistic model that reasons over user attributes (the user's location or gender) and the social network (the user's friends and spouse), via inferences like homophily (I am more likely to like sushi if my spouse or friends like sushi, and more likely to like the Knicks if I live in New York).
no code implementations • 13 May 2014 • Tim Althoff, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky
We present a case study of altruistic requests in an online community where all requests ask for the very same contribution and do not offer anything tangible in return, allowing us to disentangle what is requested from textual and social factors.
no code implementations • LREC 2014 • Heeyoung Lee, Mihai Surdeanu, Bill MacCartney, Dan Jurafsky
We investigate the importance of text analysis for stock price prediction.
no code implementations • ACL 2013 • Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, Christopher Potts
We propose a computational framework for identifying linguistic aspects of politeness.