As Machine Learning (ML) systems continue to grow, so does the demand for relevant and comprehensive training datasets.
Quantifying the impact of training data points is crucial for understanding the outputs of machine learning models and for improving the transparency of the AI pipeline.
Our work highlights the importance of understanding the nonlinear effects of model improvement on performance in different subpopulations, and has the potential to inform the development of more equitable and responsible machine learning models.
On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP better recapitulate the model's predictions than the features identified by the Shapley value.
Our systematic analysis demonstrates that this gap is caused by a combination of model initialization and contrastive learning optimization.
We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users.
Data Shapley has recently been proposed as a principled framework to quantify the contribution of individual data points in machine learning.
A service that is more often queried by users, perhaps because it more accurately anticipates user preferences, is also more likely to obtain additional user data (e.g., in the form of a Yelp review).
Distributional data Shapley value (DShapley) has recently been proposed as a principled framework to quantify the contribution of individual data points in machine learning.
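To make the idea concrete, below is a minimal Monte Carlo sketch of a data-valuation score in the Data Shapley spirit: each training point's value is its average marginal contribution to a utility function (here, test accuracy of a 1-nearest-neighbor classifier) over random permutations of the data. The toy dataset, the 1-NN utility, and the function names are illustrative assumptions, not code from the papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D training set; point 0 carries a flipped (noisy) label.
X_train = np.array([-1.0, -2.0, 1.0, 2.0, 1.5, -1.5])
y_train = np.array([1, 0, 1, 1, 1, 0])
X_test = np.array([-1.2, -0.8, 0.9, 1.8])
y_test = np.array([0, 0, 1, 1])

def utility(subset):
    """Test accuracy of a 1-NN classifier fit on the given training indices."""
    if not subset:
        return 0.5  # empty model: random-guess baseline
    idx = np.array(subset)
    preds = [y_train[idx[np.argmin(np.abs(X_train[idx] - x))]] for x in X_test]
    return float(np.mean(preds == y_test))

def data_shapley(n_perms=200):
    """Permutation-sampling estimate of each point's Shapley value."""
    n = len(X_train)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev, subset = utility([]), []
        for i in perm:
            subset.append(int(i))
            cur = utility(subset)
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return values / n_perms

vals = data_shapley()
```

Under this setup, the mislabeled point tends to receive the lowest estimated value, which is the behavior that makes such scores useful for flagging noisy or harmful data.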
We show that our approximation and risk consistency results naturally extend to the case in which the data are locally perturbed.
Deep neural networks have outperformed existing machine learning models in various molecular applications.
We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning).
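The key trick in PU learning is that the risk of a classifier can be estimated without any labeled negatives, by rewriting the negative-class term using the unlabeled mixture and the class prior. The sketch below, with an assumed synthetic setup (Gaussian classes, known prior `pi`, sigmoid loss, a fixed linear scorer), compares such an unbiased PU risk estimate against the fully supervised positive-negative (PN) risk; all names here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: positives ~ N(+2, 1), negatives ~ N(-2, 1), prior pi = 0.4.
pi = 0.4
x_p = rng.normal(2.0, 1.0, 5000)  # labeled positive sample
x_u = np.concatenate([rng.normal(2.0, 1.0, int(5000 * pi)),
                      rng.normal(-2.0, 1.0, int(5000 * (1 - pi)))])  # unlabeled mix

def loss(margin):
    """Sigmoid loss l(z) = 1 / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(margin))

def pu_risk(g, x_p, x_u, pi):
    """Unbiased PU risk estimate:
    R = pi * E_p[l(g(x))] + E_u[l(-g(x))] - pi * E_p[l(-g(x))],
    using E_u[.] = pi * E_p[.] + (1 - pi) * E_n[.] to cancel the unseen negatives."""
    return (pi * loss(g(x_p)).mean()
            + loss(-g(x_u)).mean()
            - pi * loss(-g(x_p)).mean())

g = lambda x: x  # fixed linear scorer; classify by sign(g(x))

# Reference: supervised PN risk computed with an actual negative sample.
x_n = rng.normal(-2.0, 1.0, 5000)
pn_risk = pi * loss(g(x_p)).mean() + (1 - pi) * loss(-g(x_n)).mean()
est = pu_risk(g, x_p, x_u, pi)
```

With enough samples the PU estimate closely tracks the PN risk, which is what lets a classifier be trained from positive and unlabeled data alone.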
In chemistry, deep neural network models have been increasingly utilized in a variety of applications such as molecular property prediction, novel molecule design, and chemical reaction planning.
Most recent research on neural networks in computer vision has focused on improving the accuracy of point predictions by developing new network architectures or learning algorithms.