Search Results for author: David Jurgens

Found 65 papers, 24 papers with code

Condolence and Empathy in Online Communities

no code implementations EMNLP 2020 Naitian Zhou, David Jurgens

Here, we develop computational tools to create a massive dataset of 11.4M expressions of distress and 2.8M corresponding offerings of condolence in order to examine the dynamics of condolence online.

Classification without (Proper) Representation: Political Heterogeneity in Social Media and Its Implications for Classification and Behavioral Analysis

no code implementations Findings (ACL) 2022 Kenan Alkiek, Bohan Zhang, David Jurgens

Reddit is home to a broad spectrum of political activity, and users signal their political affiliations in multiple ways—from self-declarations to community participation.

Classification

When it Rains, it Pours: Modeling Media Storms and the News Ecosystem

1 code implementation 4 Dec 2023 Benjamin Litterer, David Jurgens, Dallas Card

Most events in the world receive at most brief coverage by the news media.

Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

1 code implementation 16 Nov 2023 Huaman Sun, Jiaxin Pei, MinJe Choi, David Jurgens

We find that for both tasks, model predictions are closer to the labels from White and female participants.

Social Meme-ing: Measuring Linguistic Variation in Memes

1 code implementation 15 Nov 2023 Naitian Zhou, David Jurgens, David Bamman

Much work in the space of NLP has used computational methods to explore sociolinguistic variation in text.

RCT Rejection Sampling for Causal Estimation Evaluation

1 code implementation 27 Jul 2023 Katherine A. Keith, Sergey Feldman, David Jurgens, Jonathan Bragg, Rohit Bhattacharya

We contribute a new sampling algorithm, which we call RCT rejection sampling, and provide theoretical guarantees that causal identification holds in the observational data to allow for valid comparisons to the ground-truth RCT.

Causal Identification

Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships

1 code implementation 6 Jul 2023 David Jurgens, Agrima Seth, Jackson Sargent, Athena Aghighi, Michael Geraci

We introduce a new dataset of contextually-situated judgments of appropriateness and show that large language models can readily incorporate relationship information to accurately identify appropriateness in a given context.

When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset

1 code implementation 12 Jun 2023 Jiaxin Pei, David Jurgens

Further, our work shows that backgrounds not previously considered in NLP (e.g., education) are meaningful and should be considered.

Question Answering

Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark

1 code implementation 24 May 2023 MinJe Choi, Jiaxin Pei, Sagar Kumar, Chang Shu, David Jurgens

Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks.

POTATO: The Portable Text Annotation Tool

1 code implementation 16 Dec 2022 Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Jackson Sargent, Apostolos Dedeloudis, David Jurgens

We present POTATO, the Portable text annotation tool, a free, fully open-sourced annotation system that 1) supports labeling many types of text and multimodal data; 2) offers easy-to-configure features to maximize the productivity of both deployers and annotators (convenient templates for common ML/NLP tasks, active learning, keypress shortcuts, keyword highlights, tooltips); and 3) supports a high degree of customization (editable UI, inserting pre-screening questions, attention and qualification tests).

Active Learning text annotation

A Critical Reflection and Forward Perspective on Empathy and Natural Language Processing

no code implementations 29 Oct 2022 Allison Lahnala, Charles Welch, David Jurgens, Lucie Flek

We review the state of research on empathy in natural language processing and identify the following issues: (1) empathy definitions are absent or abstract, which (2) leads to low construct validity and reproducibility.

Modeling Information Change in Science Communication with Semantically Matched Paraphrases

no code implementations 24 Oct 2022 Dustin Wright, Jiaxin Pei, David Jurgens, Isabelle Augenstein

Whether the media faithfully communicate scientific information has long been a core issue to the science community.

Fact Checking Retrieval

SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis

no code implementations 3 Oct 2022 Jiaxin Pei, Vítor Silva, Maarten Bos, Yozon Liu, Leonardo Neves, David Jurgens, Francesco Barbieri

We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic.

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations 9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications

1 code implementation 21 Apr 2022 Sungjin Nam, David Jurgens, Gwen Frishkoff, Kevyn Collins-Thompson

Second, we show how our model identifies key contextual elements in a sentence that are likely to contribute most to a reader's understanding of the target word.

Informativeness Sentence

ByT5 model for massively multilingual grapheme-to-phoneme conversion

1 code implementation 6 Apr 2022 Jian Zhu, Cong Zhang, David Jurgens

In this study, we tackle massively multilingual grapheme-to-phoneme conversion through implementing G2P models based on ByT5.

Networks and Identity Drive Geographic Properties of the Diffusion of Linguistic Innovation

no code implementations 10 Feb 2022 Aparna Ananthasubramaniam, David Jurgens, Daniel M. Romero

In this study, we show that demographic identity and network topology are both required to model the diffusion of innovation, as they play complementary roles in producing its spatial properties.

Cultural Vocal Bursts Intensity Prediction

Phone-to-audio alignment without text: A Semi-supervised Approach

1 code implementation 8 Oct 2021 Jian Zhu, Cong Zhang, David Jurgens

The task of phone-to-audio alignment has many applications in speech research.

Contrastive Learning

Measuring Sentence-Level and Aspect-Level (Un)certainty in Science Communications

no code implementations EMNLP 2021 Jiaxin Pei, David Jurgens

Here, we introduce a new study of certainty that models both the level and the aspects of certainty in scientific findings.

Sentence

Using Sociolinguistic Variables to Reveal Changing Attitudes Towards Sexuality and Gender

no code implementations EMNLP 2021 Sky CH-Wang, David Jurgens

The linguistic choices in each variable allow us to study increased rates of acceptances of gay marriage and gender equality, respectively.

Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles

1 code implementation EMNLP 2021 Jian Zhu, David Jurgens

Furthermore, we provide a description of idiolects through measuring inter- and intra-author variation, showing that variation in idiolects is often distinctive yet consistent.

HamiltonDinggg at SemEval-2021 Task 5: Investigating Toxic Span Detection using RoBERTa Pre-training

no code implementations SEMEVAL 2021 Huiyang Ding, David Jurgens

In this paper, we demonstrate our system for detecting toxic spans, which includes expanding the toxic training set with Local Interpretable Model-Agnostic Explanations (LIME), fine-tuning RoBERTa model for detection, and error analysis.

Toxic Spans Detection

Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media

no code implementations WNUT (ACL) 2021 Sayan Ghosh, Dylan Baker, David Jurgens, Vinodkumar Prabhakaran

Online social media platforms increasingly rely on Natural Language Processing (NLP) techniques to detect abusive content at scale in order to mitigate the harms it causes to their users.

Bias Detection

The structure of online social networks modulates the rate of lexical change

1 code implementation NAACL 2021 Jian Zhu, David Jurgens

Using Poisson regression and survival analysis, our study demonstrates that the community's network structure plays a significant role in lexical change.

Survival Analysis

Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations

1 code implementation 16 Feb 2021 Jiajun Bao, Junjie Wu, Yiming Zhang, Eshwar Chandrasekharan, David Jurgens

Online conversations can go in many directions: some turn out poorly due to antisocial behavior, while others turn out positively to the benefit of all.

Quantifying Intimacy in Language

no code implementations EMNLP 2020 Jiaxin Pei, David Jurgens

Intimacy is a fundamental aspect of how we relate to others in social settings.

A Just and Comprehensive Strategy for Using NLP to Address Online Abuse

no code implementations ACL 2019 David Jurgens, Eshwar Chandrasekharan, Libby Hemphill

Online abusive behavior affects millions and the NLP community has attempted to mitigate this problem by developing technologies to detect abuse.

Position

Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

no code implementations 15 May 2019 Zijian Wang, Scott A. Hale, David Adelani, Przemyslaw A. Grabowicz, Timo Hartmann, Fabian Flöck, David Jurgens

In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media.

Attribute

Still out there: Modeling and Identifying Russian Troll Accounts on Twitter

2 code implementations 31 Jan 2019 Jane Im, Eshwar Chandrasekharan, Jackson Sargent, Paige Lighthammer, Taylor Denby, Ankit Bhargava, Libby Hemphill, David Jurgens, Eric Gilbert

In this work, we 1) develop machine learning models that predict whether a Twitter account is a Russian troll within a set of 170K control accounts; and 2) demonstrate that it is possible to use this model to find active accounts on Twitter still likely acting on behalf of the Russian state.

Social and Information Networks Computers and Society

It's going to be okay: Measuring Access to Support in Online Communities

no code implementations EMNLP 2018 Zijian Wang, David Jurgens

People use online platforms to seek out support for their informational and emotional needs.

Citation Classification for Behavioral Analysis of a Scientific Field

no code implementations 2 Sep 2016 David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, Dan Jurafsky

Citations are an important indicator of the state of a scientific field, reflecting how authors frame their work, and influencing uptake by future scholars.

Classification General Classification

Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel

no code implementations LREC 2016 Hardik Vala, Stefan Dimitrov, David Jurgens, Andrew Piper, Derek Ruths

To address the latter problem, this work presents three contributions: (1) a comprehensive scheme for manually resolving mentions to characters in texts.

An analysis of ambiguity in word sense annotations

no code implementations LREC 2014 David Jurgens

Word sense annotation is a challenging task where annotators distinguish which meaning of a word is present in a given context.

Word Sense Disambiguation
