Search Results for author: Saurabh Tiwary

Found 15 papers, 5 papers with code

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

no code implementations 13 Apr 2022 Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao

We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.

Denoising

Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

1 code implementation ICLR 2022 Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators.

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

2 code implementations NeurIPS 2021 Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics.

Contrastive Learning Language Modelling +1

Pretrain Knowledge-Aware Language Models

no code implementations 1 Jan 2021 Corbin L Rosset, Chenyan Xiong, Minh Phan, Xia Song, Paul N. Bennett, Saurabh Tiwary

Rather, we simply signal the existence of entities to the input of the transformer in pretraining, with an entity-extended tokenizer; and at the output, with an additional entity prediction task.

Knowledge Probing Language Modelling +1

Generic Intent Representation in Web Search

no code implementations 24 Jul 2019 Hongfei Zhang, Xia Song, Chenyan Xiong, Corby Rosset, Paul N. Bennett, Nick Craswell, Saurabh Tiwary

This paper presents GEneric iNtent Encoder (GEN Encoder) which learns a distributed representation space for user intent in search.

Multi-Task Learning

An Axiomatic Approach to Regularizing Neural Ranking Models

no code implementations 15 Apr 2019 Corby Rosset, Bhaskar Mitra, Chenyan Xiong, Nick Craswell, Xia Song, Saurabh Tiwary

The training of these models involves a search for appropriate parameter values based on large quantities of labeled examples.

Information Retrieval Retrieval

Towards Language Agnostic Universal Representations

no code implementations ACL 2019 Armen Aghajanyan, Xia Song, Saurabh Tiwary

When a bilingual student learns to solve word problems in math, we expect the student to be able to solve these problems in both languages the student is fluent in, even if the math lessons were only taught in one language.

Math

Optimizing Query Evaluations using Reinforcement Learning for Web Search

no code implementations 12 Apr 2018 Corby Rosset, Damien Jose, Gargi Ghosh, Bhaskar Mitra, Saurabh Tiwary

In web search, typically a candidate generation step selects a small set of documents (from collections containing as many as billions of web pages) that are subsequently ranked and pruned before being presented to the user.

reinforcement-learning Reinforcement Learning (RL)

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

12 code implementations 28 Nov 2016 Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang

The size of the dataset and the fact that the questions are derived from real user search queries distinguish MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.

Benchmarking Machine Reading Comprehension +1
