no code implementations • 28 Sep 2022 • Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani
Retrieval-augmented generation models offer many benefits over standalone language models: besides a textual answer to a given query, they provide provenance items retrieved from an updateable knowledge base.
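A minimal sketch of the retrieve-then-generate setup described above. The `retrieve` and `generate` callables are hypothetical placeholders, not the paper's components; only the output shape (answer plus provenance) follows the passage.

```python
def answer_with_provenance(query, retrieve, generate, k=5):
    """Return a generated answer together with the passages that support it."""
    passages = retrieve(query, k)        # drawn from an updateable index
    answer = generate(query, passages)   # generation conditioned on retrievals
    return {"answer": answer, "provenance": passages}
```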
no code implementations • 7 Jul 2022 • Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani
This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks.
no code implementations • 16 Mar 2022 • Karthik Raman, Iftekhar Naim, Jiecao Chen, Kazuma Hashimoto, Kiran Yalasangi, Krishna Srinivasan
Large pretrained generative language models (LMs) have had great success in a wide range of sequence tagging and structured prediction tasks.
3 code implementations • 2 Mar 2021 • Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, Marc Najork
First, WIT is the largest multimodal dataset by number of image-text examples, roughly 3x larger than the next largest (at the time of writing).
no code implementations • 23 Oct 2020 • Aditi Chaudhary, Karthik Raman, Krishna Srinivasan, Jiecao Chen
In particular, by requiring the model to predict the language-specific token, the MLM objective disincentivizes learning a language-agnostic representation -- which is a key goal of multilingual pre-training.
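A minimal sketch of the standard MLM objective the passage critiques: the model must recover the exact language-specific token that was masked, which pushes its representations to encode language identity. The masking rate and label convention are the usual ones, but this is illustrative, not the paper's code.

```python
import random

def mask_for_mlm(token_ids, mask_id, mask_prob=0.15):
    """Return (inputs, labels) for masked language modeling.

    labels[i] holds the original token id where a mask was applied,
    and -100 (ignored by the loss) everywhere else.
    """
    inputs, labels = [], []
    for tok in token_ids:
        if random.random() < mask_prob:
            inputs.append(mask_id)
            labels.append(tok)   # target is the language-specific token
        else:
            inputs.append(tok)
            labels.append(-100)  # position not scored
    return inputs, labels
```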
no code implementations • Findings of the Association for Computational Linguistics 2020 • Jiecao Chen, Liu Yang, Karthik Raman, Michael Bendersky, Jung-Jung Yeh, Yun Zhou, Marc Najork, Danyang Cai, Ehsan Emadzadeh
Pre-trained models like BERT (Devlin et al., 2018) have dominated NLP/IR applications such as single-sentence classification, text-pair classification, and question answering.
no code implementations • NeurIPS 2019 • Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar
In settings where a large number of classes is involved, a common method to speed up training is to sample a subset of classes and use an estimate of the loss gradient based on those classes, a technique known as sampled softmax.
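A minimal sampled-softmax sketch, assuming uniform sampling over classes; practical implementations additionally correct the sampled logits for the sampling distribution (e.g. subtracting log q(y)), which is omitted here.

```python
import numpy as np

def sampled_softmax_loss(logits_fn, true_class, num_classes, num_samples, rng):
    """Approximate cross-entropy by scoring only a sampled subset of classes.

    logits_fn(class_ids) returns scores for just those classes, so the
    full output layer is never materialized.
    """
    negatives = rng.choice(num_classes, size=num_samples, replace=False)
    negatives = negatives[negatives != true_class]
    candidates = np.concatenate(([true_class], negatives))
    scores = logits_fn(candidates)
    # softmax over the sampled candidates only
    scores = scores - scores.max()
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[0]  # negative log-likelihood of the true class
```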
no code implementations • 29 Oct 2018 • Jiecao Chen, Qin Zhang
In this paper we study how to perform distinct sampling in the streaming model where data contain near-duplicates.
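A minimal sketch of distinct sampling via min-wise hashing, extended with a hypothetical `canonicalize` step that maps near-duplicates to a single key; the paper's notion of near-duplicate groups is more general than this toy normalization.

```python
import hashlib

def canonicalize(item: str) -> str:
    # toy normalization: lowercase and collapse whitespace
    return " ".join(item.lower().split())

def distinct_sample(stream):
    """Keep the item whose canonical key has the smallest hash value.

    Because the hash depends only on the canonical key, each group of
    near-duplicates is sampled (approximately) uniformly, no matter how
    many times its members appear in the stream.
    """
    best = None
    for item in stream:
        h = hashlib.sha1(canonicalize(item).encode()).hexdigest()
        if best is None or h < best[0]:
            best = (h, item)
    return best[1] if best else None
```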
no code implementations • 16 Oct 2018 • Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar
Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e., computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining.
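A minimal sketch of the stochastic negative mining idea: sample a subset of classes, keep only the highest-scoring ("hardest") sampled negatives, and compute a margin loss against them. The hinge form, parameter names, and `score_fn` interface are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def snm_loss(score_fn, positive, num_classes, sample_size, top_k, rng, margin=1.0):
    """Hinge-style loss on the top-k hardest negatives from a random sample."""
    sampled = rng.choice(num_classes, size=sample_size, replace=False)
    sampled = sampled[sampled != positive]
    neg_scores = score_fn(sampled)
    hardest = np.sort(neg_scores)[-top_k:]        # highest-scoring negatives
    pos_score = score_fn(np.array([positive]))[0]
    return np.maximum(0.0, margin - pos_score + hardest).mean()
```

Only `sample_size + 1` scores are ever computed, which is what makes the approach viable when the label space is huge.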
no code implementations • NeurIPS 2018 • Jiecao Chen, Erfan Sadeqi Azer, Qin Zhang
We study classic $k$-means/median clustering, fundamental problems in unsupervised learning, in the setting where data are partitioned across multiple sites and we are allowed to discard a small portion of the data by labeling them as outliers.
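For intuition, here is a minimal centralized sketch of $k$-means with outliers in the spirit of the k-means-- heuristic (Chawla & Gionis, 2013): each Lloyd iteration sets aside the $z$ farthest points before recomputing centers. The paper's distributed algorithm targets the same objective but is considerably more involved.

```python
import numpy as np

def kmeans_with_outliers(X, k, z, iters=20, seed=0):
    """Lloyd-style k-means that ignores the z farthest points each round."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)   # (n, k)
        nearest = d.min(axis=1)
        keep = np.argsort(nearest)[:-z] if z > 0 else np.arange(len(X))
        assign = d[keep].argmin(axis=1)
        for j in range(k):
            pts = X[keep][assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    outliers = np.argsort(d.min(axis=1))[-z:] if z > 0 else np.array([], dtype=int)
    return centers, outliers
```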
no code implementations • NeurIPS 2018 • Jiecao Chen, Qin Zhang, Yuan Zhou
We study the collaborative PAC learning problem recently proposed by Blum et al. (2017), in which $k$ players want to learn a target function collaboratively, such that the learned function approximates the target well on all players' distributions simultaneously.
no code implementations • ICML 2017 • Matthew Schlegel, Yangchen Pan, Jiecao Chen, Martha White
In this work, we develop an approximately submodular criterion for this setting and an efficient online greedy submodular maximization algorithm for optimizing the criterion.
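A minimal threshold-based streaming sketch of greedy submodular maximization under a cardinality constraint; the paper's criterion and algorithm are specific to its setting, so treat `gain` and the fixed-threshold rule as illustrative assumptions.

```python
def streaming_greedy(stream, gain, k, threshold):
    """Add an arriving item when its marginal gain clears the threshold.

    gain(S, x) must return f(S + {x}) - f(S) for a monotone submodular f.
    """
    S = []
    for x in stream:
        if len(S) < k and gain(S, x) >= threshold:
            S.append(x)
    return S
```

Submodularity is what makes a single greedy pass sensible here: marginal gains only shrink as `S` grows, so an item rejected now would fare no better later against the same threshold.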
no code implementations • ICML 2017 • Jiecao Chen, Xi Chen, Qin Zhang, Yuan Zhou
We study the problem of selecting $K$ arms with the highest expected rewards in a stochastic $n$-armed bandit game.
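As a baseline illustration of the problem setup, here is a minimal uniform-sampling sketch that pulls every arm equally often and returns the $K$ largest empirical means; the paper studies adaptive strategies with far better sample complexity.

```python
import numpy as np

def naive_top_k(pull, n, K, budget_per_arm):
    """pull(i) draws one stochastic reward from arm i."""
    means = np.array([
        np.mean([pull(i) for _ in range(budget_per_arm)]) for i in range(n)
    ])
    return np.argsort(means)[-K:][::-1]  # indices of the K largest means
```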
no code implementations • NeurIPS 2016 • Jiecao Chen, He Sun, David P. Woodruff, Qin Zhang
We would like the quality of the clustering in the distributed setting to match that in the centralized setting for which all the data resides on a single site.