no code implementations • 27 May 2023 • Kaize Ding, Albert Jiongqian Liang, Bryan Perozzi, Ting Chen, Ruoxi Wang, Lichan Hong, Ed H. Chi, Huan Liu, Derek Zhiyuan Cheng
Learning expressive representations for high-dimensional yet sparse features has been a longstanding problem in information retrieval.
no code implementations • 24 May 2023 • Konstantina Christakopoulou, Alberto Lalama, Cj Adams, Iris Qu, Yifat Amir, Samer Chucri, Pierce Vollucci, Fabio Soldo, Dina Bseiso, Sarah Scodel, Lucas Dixon, Ed H. Chi, Minmin Chen
We argue, and demonstrate through extensive experiments, that LLMs as foundation models can reason through user activities, and describe their interests in nuanced and interesting ways, similar to how a human would.
no code implementations • 22 May 2023 • Ananth Balashankar, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Jilin Chen, Ed H. Chi, Alex Beutel
We demonstrate that with a small amount of human-annotated counterfactual data (10%), we can generate a counterfactual augmentation dataset with learned labels that yields an 18-20% improvement in robustness and a 14-21% reduction in errors on 6 out-of-domain datasets, comparable to a fully human-annotated counterfactual dataset, for both sentiment classification and question paraphrase tasks.
no code implementations • 20 May 2023 • Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng
Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems.
no code implementations • 8 May 2023 • Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy
Once we have the Semantic IDs for all the items, a Transformer based sequence-to-sequence model is trained to predict the Semantic ID of the next item.
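As a loose sketch of the generative-retrieval setup described above: items are first mapped to short Semantic ID codeword tuples (the paper derives these with a quantization model), and a Transformer is trained over users' interaction histories to predict the next item's codewords. The sketch below simplifies to an encoder that predicts a single next codeword; the paper trains an encoder-decoder model that decodes the full Semantic ID autoregressively. All sizes here are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

CODEBOOK, CODE_LEN, D = 256, 4, 64  # illustrative codebook/ID/model sizes

class NextSemanticID(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(CODEBOOK, D)
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, CODEBOOK)

    def forward(self, codes):
        # codes: (batch, seq) of flattened codeword ids from the user's history
        h = self.encoder(self.embed(codes))
        return self.head(h[:, -1])  # logits over the next codeword

model = NextSemanticID()
history = torch.randint(CODEBOOK, (2, 3 * CODE_LEN))  # 3 items' Semantic IDs
print(model(history).shape)  # torch.Size([2, 256])
```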
no code implementations • 22 Feb 2023 • Yao Qin, Xuezhi Wang, Balaji Lakshminarayanan, Ed H. Chi, Alex Beutel
A broad body of research has devised data augmentation approaches that can improve both accuracy and generalization performance for neural networks.
1 code implementation • 17 Feb 2023 • Jiaxi Tang, Yoel Drori, Daryl Chang, Maheswaran Sathiamoorthy, Justin Gilmer, Li Wei, Xinyang Yi, Lichan Hong, Ed H. Chi
Recommender systems play an important role in many content platforms.
no code implementations • 17 Nov 2022 • Bo Chang, Alexandros Karatzoglou, Yuyan Wang, Can Xu, Ed H. Chi, Minmin Chen
We demonstrate the effectiveness of the latent user intent modeling via offline analyses as well as live experiments on a large-scale industrial recommendation platform.
no code implementations • 25 Oct 2022 • Yin Zhang, Ruoxi Wang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Lichan Hong, James Caverlee, Ed H. Chi
Most existing methods attempt to reduce the bias from the prior perspective, which ignores the discrepancy in the conditional probability.
2 code implementations • 20 Oct 2022 • Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei
We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation).
Ranked #1 on Coreference Resolution on WSC, Cross-Lingual Question Answering, Multi-task Language Understanding, +1
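As a loose illustration of the recipe, instruction finetuning mixes many tasks rendered through natural-language templates, some with chain-of-thought targets. Below is a hypothetical sketch of what such training records might look like; the field names and examples are illustrative, not drawn from the actual Flan collection.

```python
# Hypothetical instruction-finetuning records: each task is rendered into an
# (input, target) pair via a template; CoT templates put reasoning in the target.
examples = [
    {   # zero-shot instruction template
        "input": "Translate to German: How old are you?",
        "target": "Wie alt bist du?",
    },
    {   # chain-of-thought template: the target includes the reasoning steps
        "input": ("Q: A juggler has 16 balls. Half of the balls are golf balls. "
                  "How many golf balls are there? Let's think step by step."),
        "target": "Half of 16 is 8. The answer is 8.",
    },
]
```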
1 code implementation • 17 Oct 2022 • Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei
BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models.
no code implementations • 14 Oct 2022 • Flavien Prost, Ben Packer, Jilin Chen, Li Wei, Pierre Kremp, Nicholas Blumm, Susan Wang, Tulsee Doshi, Tonia Osadebe, Lukasz Heldt, Ed H. Chi, Alex Beutel
We reconcile these notions and show that the tension arises from differences in the distributions of users for whom items are relevant, and we break down the key factors shaping a user's recommendations.
no code implementations • 30 Sep 2022 • Konstantina Christakopoulou, Can Xu, Sai Zhang, Sriraj Badam, Trevor Potter, Daniel Li, Hao Wan, Xinyang Yi, Ya Le, Chris Berg, Eric Bencomo Dixon, Ed H. Chi, Minmin Chen
How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction?
no code implementations • 15 Jun 2022 • Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.
no code implementations • 19 May 2022 • Ziniu Hu, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, Yizhou Sun, Ed H. Chi
First, the risk of having non-causal knowledge is higher, as the shared MTL model needs to encode all knowledge from different tasks, and knowledge that is causal for one task can be spurious for another.
no code implementations • 2 Apr 2022 • Jianling Wang, Ya Le, Bo Chang, Yuyan Wang, Ed H. Chi, Minmin Chen
Users who come to recommendation platforms are heterogeneous in activity levels.
no code implementations • 1 Mar 2022 • Yun He, Huaixiu Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, Yaguang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi
Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way.
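For context, the basic prompt-tuning setup this line of work builds on trains only a small set of "soft prompt" vectors prepended to the input embeddings, with the pre-trained model frozen. A minimal sketch, with illustrative dimensions:

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_tokens=20, d_model=512):
        super().__init__()
        # Only these virtual-token embeddings receive gradients; the backbone
        # language model stays frozen.
        self.prompt = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)

    def forward(self, token_embeds):  # token_embeds: (batch, seq, d_model)
        prefix = self.prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
        return torch.cat([prefix, token_embeds], dim=1)

x = torch.randn(2, 7, 512)    # embedded input tokens
print(SoftPrompt()(x).shape)  # torch.Size([2, 27, 512])
```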
no code implementations • 2 Feb 2022 • Kiran Vodrahalli, Rakesh Shivanna, Maheswaran Sathiamoorthy, Sagar Jain, Ed H. Chi
We propose a novel low-rank initialization framework for training low-rank deep neural networks -- networks where the weight parameters are re-parameterized by products of two low-rank matrices.
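A minimal sketch of the reparameterization: each dense weight W is replaced by a product U V of two rank-r factors. The spectral-style initialization below (truncated SVD of a standard dense init, with singular values split between the factors) is one natural choice in the spirit of the paper, not necessarily its exact procedure.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        # Start from a standard dense initialization, then keep its top-rank
        # SVD components, splitting sqrt of the singular values between factors.
        w = torch.empty(d_out, d_in)
        nn.init.kaiming_uniform_(w)
        U, S, Vh = torch.linalg.svd(w, full_matrices=False)
        self.U = nn.Parameter(U[:, :rank] * S[:rank].sqrt())
        self.V = nn.Parameter(S[:rank].sqrt().unsqueeze(1) * Vh[:rank])

    def forward(self, x):
        return x @ (self.U @ self.V).T  # W is never materialized at full rank

layer = LowRankLinear(128, 64, rank=8)
print(layer(torch.randn(3, 128)).shape)  # torch.Size([3, 64])
```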
3 code implementations • NeurIPS 2021 • Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi
State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example.
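For context, a minimal sketch of such a trainable sparse gate: a linear router scores the experts, keeps the top-k, and combines their outputs with softmax weights. This is the baseline top-k design the paper improves on; the per-example loop is for clarity and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
        self.gate = nn.Linear(d, n_experts)  # trainable router
        self.k = k

    def forward(self, x):  # x: (batch, d)
        scores, idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(scores, dim=-1)  # renormalize over the kept experts
        out = torch.zeros_like(x)
        for b in range(x.size(0)):           # per-example loop for clarity
            for j in range(self.k):
                out[b] += weights[b, j] * self.experts[int(idx[b, j])](x[b])
        return out

moe = TopKMoE(d=16)
print(moe(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```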
no code implementations • 4 Jun 2021 • Yuyan Wang, Xuezhi Wang, Alex Beutel, Flavien Prost, Jilin Chen, Ed H. Chi
This presents a multi-dimensional Pareto frontier on (1) the trade-off between group fairness and accuracy with respect to each task, as well as (2) the trade-offs across multiple tasks.
no code implementations • 20 May 2021 • Flavien Prost, Pranjal Awasthi, Nick Blumm, Aditee Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H. Chi, Jilin Chen, Alex Beutel
In this work we study the problem of measuring the fairness of a machine learning model under noisy information.
no code implementations • 6 May 2021 • Ruohan Zhan, Konstantina Christakopoulou, Ya Le, Jayden Ooi, Martin Mladenov, Alex Beutel, Craig Boutilier, Ed H. Chi, Minmin Chen
We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the provider associated with the recommended content; under mild assumptions, we show this objective is equivalent to maximizing overall user utility and the utilities of all providers on the platform.
no code implementations • 12 Jan 2021 • Sirui Yao, Yoni Halpern, Nithum Thain, Xuezhi Wang, Kang Lee, Flavien Prost, Ed H. Chi, Jilin Chen, Alex Beutel
Using this simulation framework, we can (a) isolate the effect of the recommender system from the user preferences, and (b) examine how the system performs not just on average for an "average user" but also the extreme experiences under atypical user behavior.
no code implementations • 23 Dec 2020 • Hussam Abu-Libdeh, Deniz Altınbüken, Alex Beutel, Ed H. Chi, Lyric Doshi, Tim Kraska, Xiaozhou Li, Andy Ly, Christopher Olston
There is great excitement about learned index structures, but understandable skepticism about the practicality of a new method uprooting decades of research on B-Trees.
no code implementations • 29 Oct 2020 • Yin Zhang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Lichan Hong, Ed H. Chi
It is also very encouraging that our framework further improves head items and overall performance on top of the gains on tail items.
no code implementations • 21 Oct 2020 • Wang-Cheng Kang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Ting Chen, Lichan Hong, Ed H. Chi
Embedding learning of categorical features (e.g., user/item IDs) is at the core of various recommendation models including matrix factorization and neural collaborative filtering.
9 code implementations • 19 Aug 2020 • Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi
Learning effective feature crosses is the key behind building recommender systems.
Ranked #4 on Click-Through Rate Prediction on Criteo
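The core building block here is the cross layer, x_{l+1} = x_0 ⊙ (W_l x_l + b_l) + x_l, which constructs explicit bounded-degree feature crosses. A minimal sketch with illustrative dimensions:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One explicit feature-crossing step: x_{l+1} = x0 * (W x_l + b) + x_l."""
    def __init__(self, d):
        super().__init__()
        self.w = nn.Linear(d, d)  # full-matrix variant of the cross weight

    def forward(self, x0, xl):
        return x0 * self.w(xl) + xl  # elementwise product crosses features

d = 16
x0 = torch.randn(4, d)  # concatenated input feature embeddings
out = x0
for layer in [CrossLayer(d), CrossLayer(d)]:  # stack raises cross degree
    out = layer(x0, out)
print(out.shape)  # torch.Size([4, 16])
```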
no code implementations • 17 Aug 2020 • Zhe Chen, Yuyan Wang, Dong Lin, Derek Zhiyuan Cheng, Lichan Hong, Ed H. Chi, Claire Cui
Despite the impressive prediction performance of deep neural networks (DNNs) in various domains, it is now well known that a set of DNN models trained with the same model specification and the same data can produce very different prediction results.
no code implementations • 13 Aug 2020 • Yuyan Wang, Zhe Zhao, Bo Dai, Christopher Fifty, Dong Lin, Lichan Hong, Ed H. Chi
A delicate balance between multi-task generalization and multi-objective optimization is therefore needed for finding a better trade-off between efficiency and generalization.
no code implementations • 7 Aug 2020 • Tao Wu, Ellie Ka-In Chio, Heng-Tze Cheng, Yu Du, Steffen Rendle, Dima Kuzmin, Ritesh Agarwal, Li Zhang, John Anderson, Sarvjeet Singh, Tushar Chandra, Ed H. Chi, Wen Li, Ankit Kumar, Xiang Ma, Alex Soares, Nitin Jindal, Pei Cao
In light of these problems, we observed that most online content platforms have both a search and a recommender system that, while having heterogeneous input spaces, can be connected through their common output item space and a shared semantic representation.
no code implementations • 25 Jul 2020 • Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, Aditya Menon, Lichan Hong, Ed H. Chi, Steve Tjoa, Jieqi Kang, Evan Ettinger
Our online results also verify our hypothesis that our framework indeed improves model performance even more on slices that lack supervision.
no code implementations • NeurIPS 2021 • Yao Qin, Xuezhi Wang, Alex Beutel, Ed H. Chi
To this end, we propose Adversarial Robustness based Adaptive Label Smoothing (AR-AdaLS) that integrates the correlations of adversarial robustness and calibration into training by adaptively softening labels for an example based on how easily it can be attacked by an adversary.
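As a rough sketch of the idea (not the paper's exact procedure): smooth labels more aggressively for examples that are easy to attack. The robustness score below is a placeholder input; AR-AdaLS derives it from adversarial robustness during training and assigns smoothing per subgroup.

```python
import numpy as np

def adaptive_label_smoothing(one_hot, robustness, alpha_max=0.2):
    """Soften one-hot labels in proportion to adversarial fragility.

    robustness in [0, 1]: 1 = hard to attack, 0 = easy to attack (placeholder).
    """
    alpha = alpha_max * (1.0 - robustness)  # more smoothing for fragile examples
    k = one_hot.shape[-1]
    return (1.0 - alpha)[:, None] * one_hot + (alpha / k)[:, None]

labels = np.eye(10)[[3, 7]]  # two one-hot labels over 10 classes
soft = adaptive_label_smoothing(labels, robustness=np.array([0.9, 0.2]))
print(soft[:, 3], soft[:, 7])  # the fragile example is smoothed harder
```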
4 code implementations • NeurIPS 2020 • Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, Ed H. Chi
Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns.
no code implementations • 9 Jun 2020 • Jiaqi Ma, Xinyang Yi, Weijing Tang, Zhe Zhao, Lichan Hong, Ed H. Chi, Qiaozhu Mei
We investigate the Plackett-Luce (PL) model based listwise learning-to-rank (LTR) on data with partitioned preference, where a set of items are sliced into ordered and disjoint partitions, but the ranking of items within a partition is unknown.
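For reference, the standard PL log-likelihood of a fully ordered list is sketched below; with partitioned preference, the likelihood instead marginalizes over all rankings consistent with the partitions, which is the quantity the paper computes efficiently.

```python
import numpy as np

def pl_log_likelihood(scores, ranking):
    """Plackett-Luce log-likelihood of a full ranking (items best-to-worst):
    log prod_k exp(s_k) / sum_{j >= k} exp(s_j)."""
    s = scores[ranking]
    # Running logsumexp over the tail sum_{j >= k} exp(s_j), computed stably.
    tail_logsumexp = np.logaddexp.accumulate(s[::-1])[::-1]
    return float(np.sum(s - tail_logsumexp))

scores = np.array([2.0, 0.5, 1.0])             # model scores for 3 items
print(pl_log_likelihood(scores, np.array([0, 2, 1])))
```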
no code implementations • 16 Mar 2020 • Carole-Jean Wu, Robin Burke, Ed H. Chi, Joseph Konstan, Julian McAuley, Yves Raimond, Hao Zhang
Deep learning-based recommendation models are used pervasively and broadly, for example, to recommend movies, products, or other information most relevant to users, in order to enhance the user experience.
no code implementations • 20 Feb 2020 • Wang-Cheng Kang, Derek Zhiyuan Cheng, Ting Chen, Xinyang Yi, Dong Lin, Lichan Hong, Ed H. Chi
In this paper, we seek to learn highly compact embeddings for large-vocab sparse features in recommender systems (recsys).
no code implementations • 10 Feb 2020 • Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain
Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget.
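For context, a minimal sketch of the standard distillation loss this line of work starts from: a temperature-softened KL term between teacher and student logits, mixed with the usual supervised loss.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft term: match the teacher's temperature-softened distribution;
    # the T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # ordinary supervised loss
    return alpha * soft + (1 - alpha) * hard
```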
no code implementations • 5 Nov 2019 • Xuezhi Wang, Nithum Thain, Anu Sinha, Flavien Prost, Ed H. Chi, Jilin Chen, Alex Beutel
In addition to the theoretical results, we find on multiple datasets -- including a large-scale real-world recommender system -- that the overall system's end-to-end fairness is largely achievable by improving fairness in individual components.
no code implementations • 25 Oct 2019 • Flavien Prost, Hai Qian, Qiuwen Chen, Ed H. Chi, Jilin Chen, Alex Beutel
As recent literature has demonstrated how classifiers often carry unintended biases toward some subgroups, deploying machine learned models to users demands careful consideration of the social consequences.
no code implementations • 25 Sep 2019 • Dar Gilboa, Bo Chang, Minmin Chen, Greg Yang, Samuel S. Schoenholz, Ed H. Chi, Jeffrey Pennington
We demonstrate the efficacy of our initialization scheme on multiple sequence tasks, on which it enables successful training while a standard initialization either fails completely or is orders of magnitude slower.
no code implementations • 24 Jun 2019 • Candice Schumann, Xuezhi Wang, Alex Beutel, Jilin Chen, Hai Qian, Ed H. Chi
A model trained for one setting may be picked up and used in many others, particularly as is common with pre-training and cloud APIs.
no code implementations • 23 May 2019 • Francois Belletti, Minmin Chen, Ed H. Chi
Characterizing temporal dependence patterns is a critical step in understanding the statistical properties of sequential data.
no code implementations • 2 Mar 2019 • Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, Cristos Goodrow
Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information.
1 code implementation • ICLR 2019 • Bo Chang, Minmin Chen, Eldad Haber, Ed H. Chi
In this paper, we draw connections between recurrent networks and ordinary differential equations.
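One way to make the connection concrete: view the hidden-state update as a forward-Euler step of an ODE, h_{t+1} = h_t + ε f(h_t, x_t); using an antisymmetric recurrent matrix keeps the underlying dynamics stable, the construction this paper explores. A minimal sketch in that spirit, with illustrative hyperparameters:

```python
import torch
import torch.nn as nn

class ODEStepRNNCell(nn.Module):
    def __init__(self, d_in, d_h, eps=0.1, gamma=0.01):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_h, d_h) * 0.01)
        self.V = nn.Linear(d_in, d_h)
        self.eps, self.gamma = eps, gamma

    def forward(self, x, h):
        # Antisymmetric recurrent matrix (W - W^T) minus a small diffusion term
        # gamma*I for stability; one forward-Euler step of size eps.
        A = self.W - self.W.T - self.gamma * torch.eye(h.size(-1))
        return h + self.eps * torch.tanh(h @ A.T + self.V(x))

cell = ODEStepRNNCell(8, 16)
h = torch.zeros(4, 16)
for t in range(5):
    h = cell(torch.randn(4, 8), h)  # unroll over a short sequence
```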
no code implementations • 22 Feb 2019 • Jiaxi Tang, Francois Belletti, Sagar Jain, Minmin Chen, Alex Beutel, Can Xu, Ed H. Chi
Our approach employs a mixture of models, each with a different temporal range.
no code implementations • 14 Jan 2019 • Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Allison Woodruff, Christine Luu, Pierre Kreitmann, Jonathan Bischof, Ed H. Chi
In this paper we provide a case study on the application of fairness in machine learning research to a production classification system, and offer new insights into how to measure and address algorithmic fairness issues.
no code implementations • 27 Sep 2018 • Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi, Alex Beutel
In this paper, we study counterfactual fairness in text classification, which asks the question: How would the prediction change if the sensitive attribute referenced in the example were different?
7 code implementations • 4 Dec 2017 • Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not.
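A toy sketch of the key-to-position view: fit a model to the empirical CDF of the sorted keys, predict an approximate position, and correct with a bounded local search. The paper uses staged models with learned error bounds; the single linear fit and fixed window below are illustrative simplifications.

```python
import bisect
import numpy as np

# Toy sorted key array standing in for a B-Tree's leaf data.
keys = np.sort(np.random.randint(0, 1_000_000, size=10_000))
positions = np.arange(len(keys))
slope, intercept = np.polyfit(keys, positions, deg=1)  # "learn" the CDF

def lookup(key, max_err=512):
    guess = int(slope * key + intercept)        # model predicts a position
    lo = max(0, guess - max_err)                # bounded correction window
    hi = min(len(keys), guess + max_err)
    i = lo + bisect.bisect_left(keys[lo:hi].tolist(), key)
    return i if i < len(keys) and keys[i] == key else None

print(lookup(int(keys[1234])))  # recovers the key's position
```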
no code implementations • 1 Jul 2017 • Alex Beutel, Jilin Chen, Zhe Zhao, Ed H. Chi
How can we learn a classifier that is "fair" for a protected or sensitive group, when we do not know if the input to the classifier belongs to the protected group?