# Data Valuation

29 papers with code • 0 benchmarks • 0 datasets

Data valuation in machine learning aims to quantify the worth of individual data points, or of whole datasets, for downstream tasks. Some methods are task-agnostic and treat datasets as a whole, mostly to support decisions in data markets; these typically rely on distributional distances between samples. More commonly, methods measure how individual points affect the performance of a specific machine learning model: they assign each element of a training set a scalar that reflects its contribution to the final performance of a model trained on it. Some notions of value depend on a specific model of interest; others are model-agnostic.
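As a minimal sketch of what "assigning a scalar to each training point" can look like, the snippet below computes leave-one-out values: the drop in test accuracy when a single point is removed. The 1-D toy data, the 1-nearest-neighbor utility, and all function names are assumptions made for illustration, not part of any particular method listed on this page.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label); a 1-D toy representation (assumption)


def nearest_neighbor_accuracy(train: Sequence[Point], test: Sequence[Point]) -> float:
    """Utility: test accuracy of a 1-nearest-neighbor classifier built on `train`."""
    if not train:
        return 0.0
    correct = 0
    for x, y in test:
        # Predict the label of the closest training point.
        _, predicted = min(train, key=lambda p: abs(p[0] - x))
        correct += predicted == y
    return correct / len(test)


def leave_one_out_values(
    train: Sequence[Point],
    test: Sequence[Point],
    utility: Callable[[Sequence[Point], Sequence[Point]], float] = nearest_neighbor_accuracy,
) -> List[float]:
    """Value of point i = utility(full training set) - utility(training set without i)."""
    full = utility(train, test)
    values = []
    for i in range(len(train)):
        without_i = list(train[:i]) + list(train[i + 1:])
        values.append(full - utility(without_i, test))
    return values


# Toy data: the last training point sits among class-0 points but carries label 1.
train = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1), (0.5, 1)]
test = [(0.4, 0), (0.6, 0), (9.5, 1)]
values = leave_one_out_values(train, test)
```

In this toy setup the mislabeled point receives a negative value (removing it improves accuracy), while every other point receives zero because a neighbor can stand in for it. That insensitivity to redundant points is one commonly cited limitation of leave-one-out.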

Notions of the usefulness of a datum, or of its influence on the outcome of a prediction, have a long history in statistics and ML, in particular through the influence function. Only recently, however, have rigorous and practical notions of value for data, and in particular for datasets, appeared in the ML literature. These are often based on concepts from cooperative game theory, but also on generalization estimates of neural networks or on optimal transport theory, among others.

## Most implemented papers

# Data Shapley: Equitable Valuation of Data for Machine Learning

As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions.
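One common way to approximate Shapley-style data values is plain permutation sampling: average each point's marginal contribution over random orderings of the training set. The sketch below illustrates that generic estimator only; it is not claimed to be the paper's exact procedure. The `utility` callback, the function names, and the toy game are hypothetical; in practice `utility` would train a model on the given subset and return its validation score.

```python
import random
from typing import Callable, List, Sequence


def monte_carlo_shapley(
    n_points: int,
    utility: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal contributions over random permutations."""
    rng = random.Random(seed)
    values = [0.0] * n_points
    indices = list(range(n_points))
    for _ in range(n_permutations):
        rng.shuffle(indices)
        coalition: List[int] = []
        previous = utility(coalition)
        for i in indices:
            coalition.append(i)
            current = utility(coalition)
            values[i] += current - previous  # marginal contribution of i in this ordering
            previous = current
    return [v / n_permutations for v in values]


def glove_utility(coalition: Sequence[int]) -> float:
    """Toy game (assumption): points 0 and 1 are only useful together; point 2 is useless."""
    return 1.0 if 0 in coalition and 1 in coalition else 0.0


estimates = monte_carlo_shapley(3, glove_utility, n_permutations=2000)
```

In the toy game, points 0 and 1 are symmetric and should each receive a value close to 0.5, while point 2 never changes the utility and gets exactly 0. The estimates always sum to the utility of the full set minus that of the empty set, because the marginal contributions telescope within each permutation.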

# Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms

The most surprising result is that for unweighted $K$NN classifiers and regressors, the Shapley value of all $N$ data points can be computed, exactly, in $O(N\log N)$ time -- an exponential improvement on computational complexity!
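The $O(N\log N)$ cost comes from sorting the training points by distance to a test point and then applying a linear-time recursion from the farthest point to the nearest. The sketch below shows that recursion for a single test point and an unweighted $K$NN classifier whose utility is the fraction of the $K$ nearest neighbors carrying the test label. The function name, the 1-D features, and the absolute-distance metric are assumptions for illustration.

```python
from typing import List, Sequence


def knn_shapley_single_test(
    train_x: Sequence[float],
    train_y: Sequence[int],
    test_x: float,
    test_y: int,
    k: int,
) -> List[float]:
    """Exact Shapley values of training points for one test point (unweighted KNN utility)."""
    n = len(train_x)
    # Sort training indices by distance to the test point, nearest first: O(N log N).
    order = sorted(range(n), key=lambda i: abs(train_x[i] - test_x))
    match = [1.0 if train_y[i] == test_y else 0.0 for i in order]

    sorted_values = [0.0] * n
    # Base case: the farthest point.
    sorted_values[n - 1] = match[n - 1] / n
    # Recursion from the second-farthest point down to the nearest: O(N).
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # 1-based rank in the sorted order
        sorted_values[pos] = (
            sorted_values[pos + 1]
            + (match[pos] - match[pos + 1]) / k * min(k, rank) / rank
        )

    # Map the values back to the original training order.
    values = [0.0] * n
    for pos, i in enumerate(order):
        values[i] = sorted_values[pos]
    return values


train_x = [0.1, 0.9, 0.4, 2.0, 1.5]
train_y = [1, 0, 1, 1, 0]
knn_values = knn_shapley_single_test(train_x, train_y, test_x=0.0, test_y=1, k=2)
```

For a full test set, one would average these per-test-point values over all test points.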

# Data Valuation using Reinforcement Learning

To adaptively learn data values jointly with the target task predictor model, we propose a meta learning framework which we name Data Valuation using Reinforcement Learning (DVRL).

# Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning

Data Shapley has recently been proposed as a principled framework to quantify the contribution of an individual datum in machine learning.

# The Shapley Value in Machine Learning

Over the last few years, the Shapley value, a solution concept from cooperative game theory, has found numerous applications in machine learning.
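For reference, the standard definition of the Shapley value can be written as follows. The notation is an assumption for this page: $N$ is the set of $n$ players (here, data points), $v$ maps a subset of players to a real-valued utility, and $\phi_i(v)$ is the value assigned to player $i$.

$$
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)
$$

Each term is the marginal contribution of $i$ to a subset $S$, weighted by the fraction of orderings of $N$ in which exactly the members of $S$ precede $i$.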

# Data Banzhaf: A Robust Data Valuation Framework for Machine Learning

To address the inconsistent data value rankings that noisy model performance scores can cause, we introduce the concept of safety margin, which measures the robustness of a data value notion.
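The Banzhaf value named in the title averages a point's marginal contribution over all subsets of the other points, giving every subset the same weight. The sketch below computes it by brute-force enumeration for a tiny game. It is an illustration of the definition only, exponential in the number of points, and not the paper's estimator; `exact_banzhaf` and `majority_utility` are hypothetical names.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List


def exact_banzhaf(n_points: int, utility: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Banzhaf value of i = mean of v(S + {i}) - v(S) over all 2^(n-1) subsets S without i."""
    values = []
    for i in range(n_points):
        others = [j for j in range(n_points) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += utility(s | {i}) - utility(s)
        values.append(total / 2 ** (n_points - 1))
    return values


def majority_utility(coalition: FrozenSet[int]) -> float:
    """Toy game (assumption): the utility is 1 once at least two points are present."""
    return 1.0 if len(coalition) >= 2 else 0.0


banzhaf_values = exact_banzhaf(3, majority_utility)
```

In this toy game each of the three points gets a Banzhaf value of 0.5. The values sum to 1.5 rather than to the utility of the full set, which shows that, unlike the Shapley value, the Banzhaf value does not satisfy the efficiency axiom.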

# CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification

Our theoretical analysis shows the proposed value function is (essentially) the unique function that satisfies two desirable properties for evaluating data values in classification.

# Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value

Shapley-based data valuation methods require training a large number of models. As a result, they have been recognized as infeasible to apply to large datasets.

# Towards Efficient Data Valuation Based on the Shapley Value

In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory.

# Improving Cooperative Game Theory-based Data Valuation via Data Utility Learning

The Shapley value (SV) and Least core (LC) are classic methods in cooperative game theory for cost/profit sharing problems.