Search Results for author: Paul Bennett

Found 22 papers, 11 papers with code

Say ‘YES’ to Positivity: Detecting Toxic Language in Workplace Communications

no code implementations Findings (EMNLP) 2021 Meghana Moorthy Bhat, Saghar Hosseini, Ahmed Hassan Awadallah, Paul Bennett, Weisheng Li

Specifically, the lack of a corpus, the sparsity of toxicity in enterprise emails, and the absence of well-defined criteria for annotating toxic conversations have prevented researchers from addressing the problem at scale.

Axiomatic Preference Modeling for Longform Question Answering

no code implementations 2 Dec 2023 Corby Rosset, Guoqing Zheng, Victor Dibia, Ahmed Awadallah, Paul Bennett

The remarkable abilities of large language models (LLMs) like GPT-4 partially stem from post-training processes like Reinforcement Learning from Human Feedback (RLHF) involving human preferences encoded in a reward model.

Question Answering

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

no code implementations 1 May 2023 Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g., GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world.

Mixed Reality Scene Generation +1

Understanding Causality with Large Language Models: Feasibility and Opportunities

no code implementations 11 Apr 2023 Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan

We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses against three types of causal question.

Decision Making

Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories

no code implementations 7 Feb 2023 Suyu Ge, Chenyan Xiong, Corby Rosset, Arnold Overwijk, Jiawei Han, Paul Bennett

In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memory at inference time.

Retrieval Zero-shot Generalization

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

no code implementations 13 Apr 2022 Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao

We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.


Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

1 code implementation ICLR 2022 Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators.

Neural Approaches to Conversational Information Retrieval

no code implementations 13 Jan 2022 Jianfeng Gao, Chenyan Xiong, Paul Bennett, Nick Craswell

A conversational information retrieval (CIR) system is an information retrieval (IR) system with a conversational interface which allows users to interact with the system to seek information via multi-turn conversations of natural language, in spoken or written form.

Information Retrieval Retrieval

Keep it Simple: Unsupervised Simplification of Multi-Paragraph Text

1 code implementation ACL 2021 Philippe Laban, Tobias Schnabel, Paul Bennett, Marti A. Hearst

This work presents Keep it Simple (KiS), a new approach to unsupervised text simplification which learns to balance a reward across three properties: fluency, salience and simplicity.

Reading Comprehension Text Simplification

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

2 code implementations NeurIPS 2021 Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics.

Contrastive Learning Language Modelling +1

Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision

1 code implementation ACL 2021 Si Sun, Yingzhuo Qian, Zhenghao Liu, Chenyan Xiong, Kaitao Zhang, Jie Bao, Zhiyuan Liu, Paul Bennett

To democratize the benefits of Neu-IR, this paper presents MetaAdaptRank, a domain adaptive learning method that generalizes Neu-IR models from label-rich source domains to few-shot target domains.

Information Retrieval Learning-To-Rank +1

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

5 code implementations ICLR 2021 Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk

In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in testing.

Contrastive Learning Passage Retrieval +3

Few-Shot Generative Conversational Query Rewriting

1 code implementation 9 Jun 2020 Shi Yu, Jiahua Liu, Jingqin Yang, Chenyan Xiong, Paul Bennett, Jianfeng Gao, Zhiyuan Liu

Conversational query rewriting aims to reformulate a concise conversational query to a fully specified, context-independent query that can be effectively handled by existing information retrieval systems.

Information Retrieval Retrieval +2

On Domain Transfer When Predicting Intent in Text

no code implementations NeurIPS Workshop Document_Intelligen 2019 Petar Stojanov, Ahmed Hassan Awadallah, Paul Bennett, Saghar Hosseini

In many domains, especially enterprise text analysis, there is an abundance of data which can be used for the development of new AI-powered intelligent experiences to improve people's productivity.

GATEtoGerManC: A GATE-based Annotation Pipeline for Historical German

no code implementations LREC 2012 Silke Scheible, Richard J. Whitt, Martin Durrell, Paul Bennett

We describe a new GATE-based linguistic annotation pipeline for Early Modern German, which can be used to annotate historical texts with word tokens, sentence boundaries, lemmas, and POS tags.

POS POS Tagging +1
