Search Results for author: João Sedoc

Found 45 papers, 18 papers with code

Clustering Examples in Multi-Dataset Benchmarks with Item Response Theory

no code implementations insights (ACL) 2022 Pedro Rodriguez, Phu Mon Htut, John Lalor, João Sedoc

In natural language processing, multi-dataset benchmarks for common tasks (e.g., SuperGLUE for natural language inference and MRQA for question answering) have risen in importance.

Clustering Natural Language Inference

Gendered Language in Resumes and its Implications for Algorithmic Bias in Hiring

no code implementations NAACL (GeBNLP) 2022 Prasanna Parasurama, João Sedoc

Despite growing concerns around gender bias in NLP models used in algorithmic hiring, there is little empirical work studying the extent and nature of gendered language in resumes. Using a corpus of 709k resumes from IT firms, we train a series of models to classify the gender of the applicant, thereby measuring the extent of gendered information encoded in resumes. We also investigate whether it is possible to obfuscate gender from resumes by removing gender identifiers, hobbies, gender sub-space in embedding models, etc. We find that there is a significant amount of gendered information in resumes even after obfuscation. A simple Tf-Idf model can learn to classify gender with AUROC=0.75, and more sophisticated transformer-based models achieve AUROC=0.8. We further find that gender predictive values have low correlation with gender direction of embeddings – meaning that what is predictive of gender is much more than what is “gendered” in the masculine/feminine sense. We discuss the algorithmic bias and fairness implications of these findings in the hiring context.


WASSA 2022 Shared Task: Predicting Empathy, Emotion and Personality in Reaction to News Stories

no code implementations WASSA (ACL) 2022 Valentin Barriere, Shabnam Tafreshi, João Sedoc, Sawsan Alqahtani

This paper presents the results obtained from the WASSA 2022 shared task on predicting empathy, emotion, and personality in reaction to news stories.

Multi-Emotion Classification for Song Lyrics

no code implementations EACL (WASSA) 2021 Darren Edmonds, João Sedoc

Song lyrics convey a multitude of emotions to the listener and powerfully portray the emotional state of the writer or singer.

Classification Emotion Classification

Item Response Theory for Efficient Human Evaluation of Chatbots

no code implementations EMNLP (Eval4NLP) 2020 João Sedoc, Lyle Ungar

Conversational agent quality is currently assessed using human evaluation, and often requires an exorbitant number of comparisons to achieve statistical significance.

Chatbot Test

Measuring the Language of Self-Disclosure across Corpora

no code implementations Findings (ACL) 2022 Ann-Katrin Reuel, Sebastian Peralta, João Sedoc, Garrick Sherman, Lyle Ungar

Being able to reliably estimate self-disclosure – a key component of friendship and intimacy – from language is important for many psychology studies.

Using the Poly-encoder for a COVID-19 Question Answering System

2 code implementations EMNLP (NLP-COVID19) 2020 Seolhwa Lee, João Sedoc

To combat misinformation regarding COVID-19 during this unprecedented pandemic, we propose a conversational agent that answers questions related to COVID-19.

Misinformation Question Answering

Topic Modeling for Maternal Health Using Reddit

no code implementations EACL (Louhi) 2021 Shuang Gao, Shivani Pandya, Smisha Agarwal, João Sedoc

This paper applies topic modeling to understand maternal health topics, concerns, and questions expressed in online communities on social networking sites.

Knowledge Distillation

Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4

1 code implementation 22 Jun 2023 Mario Rodríguez-Cantelar, Chen Zhang, Chengguang Tang, Ke Shi, Sarik Ghazarian, João Sedoc, Luis Fernando D'Haro, Alexander Rudnicky

The advent and fast development of neural networks have revolutionized the research on dialogue systems and subsequently have triggered various challenges regarding their automatic evaluation.

Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization

1 code implementation 20 Dec 2022 Lining Zhang, Simon Mille, Yufang Hou, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Saad Mahamood, Sebastian Gehrmann, Miruna Clinciu, Khyathi Chandu, João Sedoc

To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization.

Conceptor-Aided Debiasing of Large Language Models

no code implementations 20 Nov 2022 Li S. Yifei, Lyle Ungar, João Sedoc

We propose two methods of applying conceptors: (1) bias subspace projection by post-processing with the conceptor NOT operation; and (2) a new architecture, conceptor-intervened BERT (CI-BERT), which explicitly incorporates the conceptor projection into all layers during training.

Language Modelling

Automatic Document Selection for Efficient Encoder Pretraining

no code implementations 20 Oct 2022 Yukun Feng, Patrick Xia, Benjamin Van Durme, João Sedoc

Building pretrained language models is considered expensive and data-intensive, but must we increase dataset size to achieve better performance?

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

no code implementations 22 Jun 2022 Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.

Benchmarking Text Generation

Empathic Conversations: A Multi-level Dataset of Contextualized Conversations

no code implementations 25 May 2022 Damilola Omitaomu, Shabnam Tafreshi, Tingting Liu, Sven Buechel, Chris Callison-Burch, Johannes Eichstaedt, Lyle Ungar, João Sedoc

Hence, we collected detailed characterization of the participants' traits, their self-reported empathetic response to news articles, their conversational partner other-report, and turn-by-turn third-party assessments of the level of self-disclosure, emotion, and empathy expressed.

Linear Connectivity Reveals Generalization Strategies

1 code implementation 24 May 2022 Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra

It is widely accepted in the mode connectivity literature that when two neural networks are trained similarly on the same data, they are connected by a path through parameter space over which test set accuracy is maintained.


VIRATrustData: A Trust-Annotated Corpus of Human-Chatbot Conversations About COVID-19 Vaccines

no code implementations 24 May 2022 Roni Friedman, João Sedoc, Shai Gretz, Assaf Toledo, Rose Weeks, Naor Bar-Zeev, Yoav Katz, Noam Slonim

Public trust in medical information is crucial for successful application of public health policies such as vaccine uptake.


Trees in transformers: a theoretical analysis of the Transformer's ability to represent trees

no code implementations 16 Dec 2021 Qi He, João Sedoc, Jordan Rodu

To date, there are no theoretical analyses of the Transformer's ability to capture tree structures.

Degendering Resumes for Fair Algorithmic Resume Screening

no code implementations 16 Dec 2021 Prasanna Parasurama, João Sedoc

We investigate whether it is feasible to remove gendered information from resumes to mitigate potential bias in algorithmic resume screening.


Automatic Evaluation and Moderation of Open-domain Dialogue Systems

2 code implementations 3 Nov 2021 Chen Zhang, João Sedoc, Luis Fernando D'Haro, Rafael Banchs, Alexander Rudnicky

The development of Open-Domain Dialogue Systems (ODS) is a trending topic due to the large number of research challenges, large societal and business impact, and advances in the underlying technology.

Chatbot Dialogue Evaluation

An Evaluation Protocol for Generative Conversational Systems

no code implementations 24 Oct 2020 Seolhwa Lee, Heuiseok Lim, João Sedoc

These findings demonstrate the feasibility of our protocol to evaluate conversational agents and evaluation sets.

Experimental Design

Measuring the 'I don't know' Problem through the Lens of Gricean Quantity

no code implementations NAACL 2021 Huda Khayrallah, João Sedoc

We consider the intrinsic evaluation of neural generative dialog models through the lens of Grice's Maxims of Conversation (1975).

COD3S: Diverse Generation with Discrete Semantic Signatures

1 code implementation EMNLP 2020 Nathaniel Weir, João Sedoc, Benjamin Van Durme

We present COD3S, a novel method for generating semantically diverse sentences using neural sequence-to-sequence (seq2seq) models.

Semantic Textual Similarity

Incremental Neural Coreference Resolution in Constant Memory

1 code implementation EMNLP 2020 Patrick Xia, João Sedoc, Benjamin Van Durme

We investigate modeling coreference resolution under a fixed memory constraint by extending an incremental clustering algorithm to utilize contextualized encoders and neural components.

Clustering coreference-resolution

Learning Word Ratings for Empathy and Distress from Document-Level User Responses

no code implementations LREC 2020 João Sedoc, Sven Buechel, Yehonathan Nachmany, Anneke Buffone, Lyle Ungar

The underlying problem of learning word ratings from higher-level supervision has to date only been addressed in an ad hoc fashion and has not used deep learning methods.

Clustering Emotion Recognition

Conceptor Debiasing of Word Representations Evaluated on WEAT

no code implementations WS 2019 Saket Karve, Lyle Ungar, João Sedoc

Bias in word embeddings such as Word2Vec has been widely investigated, and many efforts made to remove such bias.

Test Word Embeddings

Comparison of Diverse Decoding Methods from Conditional Language Models

1 code implementation ACL 2019 Daphne Ippolito, Reno Kriz, Maria Kustikova, João Sedoc, Chris Callison-Burch

While conditional language models have greatly improved in their ability to output high-quality natural language, many NLP applications benefit from being able to generate a diverse set of candidate sequences.

Modeling Empathy and Distress in Reaction to News Stories

1 code implementation EMNLP 2018 Sven Buechel, Anneke Buffone, Barry Slaff, Lyle Ungar, João Sedoc

Computational detection and understanding of empathy is an important factor in advancing human-computer interaction.

Neural Tree Transducers for Tree to Tree Learning

no code implementations ICLR 2018 João Sedoc, Dean Foster, Lyle Ungar

We introduce a novel approach to tree-to-tree learning, the neural tree transducer (NTT), a top-down depth first context-sensitive tree decoder, which is paired with recursive neural encoders.

Multiscale Hidden Markov Models For Covariance Prediction

no code implementations ICLR 2018 João Sedoc, Jordan Rodu, Dean Foster, Lyle Ungar

This paper presents a novel variant of hierarchical hidden Markov models (HMMs), the multiscale hidden Markov model (MSHMM), and an associated spectral estimation and prediction scheme that is consistent, finds global optima, and is computationally efficient.

Enterprise to Computer: Star Trek chatbot

1 code implementation 2 Aug 2017 Grishma Jena, Mansi Vashisht, Abheek Basu, Lyle Ungar, João Sedoc

In this work, we propose a design for a chatbot that captures the "style" of Star Trek by incorporating references from the show along with peculiar tones of the fictional characters therein.


Domain Aware Neural Dialog System

no code implementations 2 Aug 2017 Sajal Choudhary, Prerna Srivastava, Lyle Ungar, João Sedoc

We investigate the task of building a domain-aware chat system that generates intelligent responses in a conversation comprising different domains.

Semantic Word Clusters Using Signed Normalized Graph Cuts

1 code implementation 20 Jan 2016 João Sedoc, Jean Gallier, Lyle Ungar, Dean Foster

Vector space representations of words capture many aspects of word similarity, but such methods tend to make vector spaces in which antonyms (as well as synonyms) are close to each other.

Clustering Word Similarity
