In recommendation settings, there is an apparent trade-off between the goals of accuracy (to recommend items a user is most likely to want) and diversity (to recommend items representing a range of categories).
There has been a steep recent increase in the number of large language model (LLM) papers, producing a dramatic shift in the scientific landscape which remains largely undocumented through bibliometric analysis.
Many recommender systems are based on optimizing a linear weighting of different user behaviors, such as clicks, likes, shares, etc.
Across outcomes and metrics, we show that the risk scores exhibit significant granular performance disparities within coarse race groups.
no code implementations • 22 Nov 2022 • Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson, Nihar B. Shah
In a top-tier computer science conference (NeurIPS 2021) with more than 23, 000 submitting authors and 9, 000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews.
Algorithms provide powerful tools for detecting and dissecting human bias and error.
Estimating the prevalence of a medical condition, or the proportion of the population in which it occurs, is a fundamental problem in healthcare and public health.
5 code implementations • 14 Dec 2020 • Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang
Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild.
The use of machine learning (ML) in health care raises numerous ethical concerns, especially as models can amplify existing health inequities.
We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis?
Predicting pregnancy has been a fundamental problem in women's health for more than 50 years.
Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data.
We find that black drivers are stopped more often than white drivers relative to their share of the driving-age population, but that Hispanic drivers are stopped less often than whites.
We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a sample-to-sample similarity measure from expression data observed for heterogenous samples.