no code implementations • insights (ACL) 2022 • Pedro Rodriguez, Phu Mon Htut, John Lalor, João Sedoc
In natural language processing, multi-dataset benchmarks for common tasks (e. g., SuperGLUE for natural language inference and MRQA for question answering) have risen in importance.
no code implementations • NAACL (GeBNLP) 2022 • Prasanna Parasurama, João Sedoc
Despite growing concerns around gender bias in NLP models used in algorithmic hiring, there is little empirical work studying the extent and nature of gendered language in resumes. Using a corpus of 709k resumes from IT firms, we train a series of models to classify the gender of the applicant, thereby measuring the extent of gendered information encoded in resumes. We also investigate whether it is possible to obfuscate gender from resumes by removing gender identifiers, hobbies, gender sub-space in embedding models, etc. We find that there is a significant amount of gendered information in resumes even after obfuscation. A simple Tf-Idf model can learn to classify gender with AUROC=0. 75, and more sophisticated transformer-based models achieve AUROC=0. 8. We further find that gender predictive values have low correlation with gender direction of embeddings – meaning that, what is predictive of gender is much more than what is “gendered” in the masculine/feminine sense. We discuss the algorithmic bias and fairness implications of these findings in the hiring context.
no code implementations • Findings (ACL) 2022 • Ann-Katrin Reuel, Sebastian Peralta, João Sedoc, Garrick Sherman, Lyle Ungar
Being able to reliably estimate self-disclosure – a key component of friendship and intimacy – from language is important for many psychology studies.
no code implementations • EMNLP (Eval4NLP) 2020 • João Sedoc, Lyle Ungar
Conversational agent quality is currently assessed using human evaluation, and often requires an exorbitant number of comparisons to achieve statistical significance.
no code implementations • WASSA (ACL) 2022 • Valentin Barriere, Shabnam Tafreshi, João Sedoc, Sawsan Alqahtani
This paper presents the results that were obtained from WASSA 2022 shared task on predicting empathy, emotion, and personality in reaction to news stories.
no code implementations • EACL (WASSA) 2021 • Darren Edmonds, João Sedoc
Song lyrics convey a multitude of emotions to the listener and powerfully portray the emotional state of the writer or singer.
no code implementations • EACL (WASSA) 2021 • Shabnam Tafreshi, Orphee De Clercq, Valentin Barriere, Sven Buechel, João Sedoc, Alexandra Balahur
This paper presents the results that were obtained from the WASSA 2021 shared task on predicting empathy and emotions.
no code implementations • EACL (Louhi) 2021 • Shuang Gao, Shivani Pandya, Smisha Agarwal, João Sedoc
This paper applies topic modeling to understand maternal health topics, concerns, and questions expressed in online communities on social networking sites.
1 code implementation • EMNLP (NLP-COVID19) 2020 • Adam Poliak, Max Fleming, Cash Costello, Kenton Murray, Mahsa Yarmohammadi, Shivani Pandya, Darius Irani, Milind Agarwal, Udit Sharma, Shuo Sun, Nicola Ivanov, Lingxi Shang, Kaushik Srinivasan, Seolhwa Lee, Xu Han, Smisha Agarwal, João Sedoc
We release a dataset of over 2, 100 COVID19 related Frequently asked Question-Answer pairs scraped from over 40 trusted websites.
2 code implementations • EMNLP (NLP-COVID19) 2020 • Seolhwa Lee, João Sedoc
To combat misinformation regarding COVID- 19 during this unprecedented pandemic, we propose a conversational agent that answers questions related to COVID-19.
no code implementations • 8 Apr 2025 • Vasant Dhar, João Sedoc
We compare DBOT to its analytic parent, Damodaran, and highlight the research challenges involved in raising its current capability to that of Damodaran's.
no code implementations • 18 Feb 2025 • Rubing Li, João Sedoc, Arun Sundararajan
When encountering increasingly frequent performance improvements or cost reductions from a new large language model (LLM), developers of applications leveraging LLMs must decide whether to take advantage of these improvements or stay with older tried-and-tested models.
no code implementations • 19 Dec 2024 • Xiang Cheng, Raveesh Mayya, João Sedoc
Unstructured text data annotation and analysis are fundamental to management research, often relying on human annotators through crowdsourcing platforms.
1 code implementation • 19 Nov 2024 • Tingting Liu, Salvatore Giorgi, Ankit Aich, Allison Lahnala, Brenda Curtis, Lyle Ungar, João Sedoc
As AI chatbots increasingly incorporate empathy, understanding user-centered perceptions of chatbot empathy and its impact on conversation quality remains essential yet under-explored.
no code implementations • 8 Sep 2024 • Andrew Smart, Ben Hutchinson, Lameck Mbangula Amugongo, Suzanne Dikker, Alex Zito, Amber Ebinama, Zara Wudiri, Ding Wang, Erin Van Liemt, João Sedoc, Seyi Olojo, Stanley Uwakwe, Edem Wornyo, Sonja Schmer-Galunder, Jamila Smith-Loud
There is growing interest in multilingual LLMs, and various efforts are striving for models to accommodate languages of communities outside of the Global North, which include many languages that have been historically underrepresented in digital realms.
no code implementations • 20 Jun 2024 • Salvatore Giorgi, Tingting Liu, Ankit Aich, Kelsey Isman, Garrick Sherman, Zachary Fried, João Sedoc, Lyle H. Ungar, Brenda Curtis
Thus, it may be the case that employing LLMs (which do not have such human factors) in these tasks results in a lack of variation in data, failing to reflect the diversity of human experiences.
no code implementations • 9 May 2024 • Aadesh Salecha, Molly E. Ireland, Shashanka Subrahmanya, João Sedoc, Lyle H. Ungar, Johannes C. Eichstaedt
As Large Language Models (LLMs) become widely used to model and simulate human behavior, understanding their biases becomes critical.
1 code implementation • 2 Apr 2024 • Marcel Nawrath, Agnieszka Nowak, Tristan Ratz, Danilo C. Walenta, Juri Opitz, Leonardo F. R. Ribeiro, João Sedoc, Daniel Deutsch, Simon Mille, Yixin Liu, Lining Zhang, Sebastian Gehrmann, Saad Mahamood, Miruna Clinciu, Khyathi Chandu, Yufang Hou
At the heart of the Pyramid evaluation method for text summarization lie human written summary content units (SCUs).
no code implementations • 9 Nov 2023 • Nikita Soni, H. Andrew Schwartz, João Sedoc, Niranjan Balasubramanian
As research in human-centered NLP advances, there is a growing recognition of the importance of incorporating human and social factors into NLP models.
1 code implementation • 25 Oct 2023 • Young Min Cho, Sunny Rai, Lyle Ungar, João Sedoc, Sharath Chandra Guntuku
Mental health conversational agents (a. k. a.
1 code implementation • 22 Jun 2023 • Mario Rodríguez-Cantelar, Chen Zhang, Chengguang Tang, Ke Shi, Sarik Ghazarian, João Sedoc, Luis Fernando D'Haro, Alexander Rudnicky
The advent and fast development of neural networks have revolutionized the research on dialogue systems and subsequently have triggered various challenges regarding their automatic evaluation.
no code implementations • 23 May 2023 • Huda Khayrallah, Zuhaib Akhtar, Edward Cohen, Jyothir S V, João Sedoc
We release MMSMR, a Massively Multi-System MultiReference dataset to enable future work on metrics and evaluation for dialog.
1 code implementation • 20 Dec 2022 • Lining Zhang, Simon Mille, Yufang Hou, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Saad Mahamood, Sebastian Gehrmann, Miruna Clinciu, Khyathi Chandu, João Sedoc
To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization.
no code implementations • 20 Nov 2022 • Li S. Yifei, Lyle Ungar, João Sedoc
We propose two methods of applying conceptors (1) bias subspace projection by post-processing by the conceptor NOT operation; and (2) a new architecture, conceptor-intervened BERT (CI-BERT), which explicitly incorporates the conceptor projection into all layers during training.
no code implementations • 20 Oct 2022 • Yukun Feng, Patrick Xia, Benjamin Van Durme, João Sedoc
Building pretrained language models is considered expensive and data-intensive, but must we increase dataset size to achieve better performance?
1 code implementation • 22 Jun 2022 • Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou
This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.
no code implementations • 25 May 2022 • Damilola Omitaomu, Shabnam Tafreshi, Tingting Liu, Sven Buechel, Chris Callison-Burch, Johannes Eichstaedt, Lyle Ungar, João Sedoc
Hence, we collected detailed characterization of the participants' traits, their self-reported empathetic response to news articles, their conversational partner other-report, and turn-by-turn third-party assessments of the level of self-disclosure, emotion, and empathy expressed.
1 code implementation • 24 May 2022 • Shai Gretz, Assaf Toledo, Roni Friedman, Dan Lahav, Rose Weeks, Naor Bar-Zeev, João Sedoc, Pooja Sangha, Yoav Katz, Noam Slonim
We use this framework to report baseline intent discovery results over VIRADialogs, that highlight the difficulty of this task.
1 code implementation • 24 May 2022 • Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra
It is widely accepted in the mode connectivity literature that when two neural networks are trained similarly on the same data, they are connected by a path through parameter space over which test set accuracy is maintained.
no code implementations • 24 May 2022 • Roni Friedman, João Sedoc, Shai Gretz, Assaf Toledo, Rose Weeks, Naor Bar-Zeev, Yoav Katz, Noam Slonim
Public trust in medical information is crucial for successful application of public health policies such as vaccine uptake.
no code implementations • 16 Dec 2021 • Prasanna Parasurama, João Sedoc
We investigate whether it is feasible to remove gendered information from resumes to mitigate potential bias in algorithmic resume screening.
no code implementations • 16 Dec 2021 • Qi He, João Sedoc, Jordan Rodu
To date, there are no theoretical analyses of the Transformer's ability to capture tree structures.
2 code implementations • 3 Nov 2021 • Chen Zhang, João Sedoc, Luis Fernando D'Haro, Rafael Banchs, Alexander Rudnicky
The development of Open-Domain Dialogue Systems (ODS)is a trending topic due to the large number of research challenges, large societal and business impact, and advances in the underlying technology.
no code implementations • 29 Sep 2021 • Seolhwa Lee, Kisu Yang, Chanjun Park, João Sedoc, Heuiseok Lim
To the best of our knowledge, our approach is the first method to apply multi-task learning to the dialogue summarization task.
no code implementations • ACL (GEM) 2021 • Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics.
Ranked #1 on
Extreme Summarization
on GEM-XSum
Abstractive Text Summarization
Cross-Lingual Abstractive Summarization
+5
no code implementations • Findings of the Association for Computational Linguistics 2020 • Huda Khayrallah, João Sedoc
Non-task-oriented dialog models suffer from poor quality and non-diverse responses.
no code implementations • 24 Oct 2020 • Seolhwa Lee, Heuiseok Lim, João Sedoc
These findings demonstrate the feasibility of our protocol to evaluate conversational agents and evaluation sets.
no code implementations • NAACL 2021 • Huda Khayrallah, João Sedoc
We consider the intrinsic evaluation of neural generative dialog models through the lens of Grice's Maxims of Conversation (1975).
2 code implementations • ACL (GEM) 2021 • Alexandra DeLucia, Aaron Mueller, Xiang Lisa Li, João Sedoc
Narrative generation is an open-ended NLP task in which a model generates a story given a prompt.
1 code implementation • EMNLP 2020 • Nathaniel Weir, João Sedoc, Benjamin Van Durme
We present COD3S, a novel method for generating semantically diverse sentences using neural sequence-to-sequence (seq2seq) models.
1 code implementation • EMNLP 2020 • Patrick Xia, João Sedoc, Benjamin Van Durme
We investigate modeling coreference resolution under a fixed memory constraint by extending an incremental clustering algorithm to utilize contextualized encoders and neural components.
no code implementations • LREC 2020 • João Sedoc, Sven Buechel, Yehonathan Nachmany, Anneke Buffone, Lyle Ungar
The underlying problem of learning word ratings from higher-level supervision has to date only been addressed in an ad hoc fashion and has not used deep learning methods.
1 code implementation • ACL 2019 • Daphne Ippolito, Reno Kriz, Maria Kustikova, João Sedoc, Chris Callison-Burch
While conditional language models have greatly improved in their ability to output high-quality natural language, many NLP applications benefit from being able to generate a diverse set of candidate sequences.
no code implementations • WS 2019 • Saket Karve, Lyle Ungar, João Sedoc
Bias in word embeddings such as Word2Vec has been widely investigated, and many efforts made to remove such bias.
1 code implementation • NAACL 2019 • Tianlin Liu, Lyle Ungar, João Sedoc
Distributed representations of sentences have become ubiquitous in natural language processing tasks.
2 code implementations • NAACL 2019 • Reno Kriz, João Sedoc, Marianna Apidianaki, Carolina Zheng, Gaurav Kumar, Eleni Miltsakaki, Chris Callison-Burch
Sentence simplification is the task of rewriting texts so they are easier to understand.
Ranked #4 on
Text Simplification
on Newsela
1 code implementation • 17 Nov 2018 • Tianlin Liu, Lyle Ungar, João Sedoc
Word vectors are at the core of many natural language processing tasks.
no code implementations • COLING (PEOPLES) 2020 • Sven Buechel, João Sedoc, H. Andrew Schwartz, Lyle Ungar
One of the major downsides of Deep Learning is its supposed need for vast amounts of training data.
1 code implementation • EMNLP 2018 • Sven Buechel, Anneke Buffone, Barry Slaff, Lyle Ungar, João Sedoc
Computational detection and understanding of empathy is an important factor in advancing human-computer interaction.
no code implementations • ICLR 2018 • João Sedoc, Jordan Rodu, Dean Foster, Lyle Ungar
This paper presents a novel variant of hierarchical hidden Markov models (HMMs), the multiscale hidden Markov model (MSHMM), and an associated spectral estimation and prediction scheme that is consistent, finds global optima, and is computationally efficient.
no code implementations • ICLR 2018 • João Sedoc, Dean Foster, Lyle Ungar
We introduce a novel approach to tree-to-tree learning, the neural tree transducer (NTT), a top-down depth first context-sensitive tree decoder, which is paired with recursive neural encoders.
1 code implementation • 2 Aug 2017 • Grishma Jena, Mansi Vashisht, Abheek Basu, Lyle Ungar, João Sedoc
In this work, we propose a design for a chatbot that captures the "style" of Star Trek by incorporating references from the show along with peculiar tones of the fictional characters therein.
no code implementations • 2 Aug 2017 • Sajal Choudhary, Prerna Srivastava, Lyle Ungar, João Sedoc
We investigate the task of building a domain aware chat system which generates intelligent responses in a conversation comprising of different domains.
1 code implementation • 20 Jan 2016 • João Sedoc, Jean Gallier, Lyle Ungar, Dean Foster
Vector space representations of words capture many aspects of word similarity, but such methods tend to make vector spaces in which antonyms (as well as synonyms) are close to each other.