Validation Free and Replication Robust Volume-based Data Valuation

NeurIPS 2021 · Xinyi Xu, Zhaoxuan Wu, Chuan Sheng Foo, Bryan Kian Hsiang Low

Data valuation is a non-trivial challenge in use cases such as collaborative data sharing and data markets. The value of data is often associated with the learning performance (e.g., validation accuracy) of the model trained on the data. This intuitive methodology tightly couples data valuation to validation, which may be undesirable because a validation set may not be available in practice, and it can be challenging for the data providers to agree on the choice of the validation set. A separate but practical issue is data replication: given the value of some data points, a dishonest data provider may replicate these data points to exploit the valuation for a higher reward or payment. We observe that the diversity of the data points is an inherent property of the dataset that is independent of validation. We formalize diversity via the volume of the data matrix (the determinant of its left Gram matrix). This allows us to formally connect the diversity of data to the learning performance without requiring validation. Furthermore, we propose a robust volume with theoretical replication robustness guarantees, following the intuition that copying the same data points does not increase the diversity of the data. We perform extensive experiments to demonstrate the consistency and practical advantages of our method over existing baselines, and show that it is model- and task-agnostic and flexibly adaptable to various neural networks.
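To make the volume idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes an (n, d) data matrix `X` with one data point per row and takes the left Gram matrix to be `X.T @ X`. The function name `gram_volume` and the synthetic datasets are hypothetical, and the paper's exact normalization and its robust volume are not reproduced here. The sketch compares a spread-out dataset with a nearly collinear one, then shows that stacking an exact copy of the data multiplies this determinant by 2^d, which is the kind of inflation a replication-robust variant has to counter.

```python
import numpy as np


def gram_volume(X):
    """Determinant of the left Gram matrix X^T X of an (n, d) data matrix.

    Hypothetical helper for illustration; the paper's exact normalization
    of the volume may differ.
    """
    X = np.asarray(X, dtype=float)
    return float(np.linalg.det(X.T @ X))


rng = np.random.default_rng(0)
n, d = 50, 3

# Diverse data: rows spread in all d directions.
diverse = rng.normal(size=(n, d))

# Low-diversity data: rows lie almost on a single line, plus tiny noise.
direction = np.ones(d)
redundant = rng.normal(size=(n, 1)) * direction + 1e-3 * rng.normal(size=(n, d))

vol_diverse = gram_volume(diverse)
vol_redundant = gram_volume(redundant)

# Exact replication: stacking X on itself gives a Gram matrix of 2 * X^T X,
# so the determinant grows by a factor of 2**d with no new information.
replicated = np.vstack([diverse, diverse])
vol_replicated = gram_volume(replicated)

print(f"diverse:    {vol_diverse:.4g}")
print(f"redundant:  {vol_redundant:.4g}")
print(f"replicated: {vol_replicated:.4g}")
```

For large or ill-conditioned matrices, `np.linalg.slogdet` is a numerically safer way to compare such determinants on a log scale.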

Tasks

Data Valuation