Data Summarization

19 papers with code • 0 benchmarks • 0 datasets

Data summarization is a central problem in machine learning: the goal is to compute a small summary of a data set.

Source: How to Solve Fair k-Center in Massive Data Models


Greatest papers with code

apricot: Submodular selection for data summarization in Python

jmschrei/apricot 8 Jun 2019

This paper presents an explanation of submodular selection, an overview of the features in apricot, and an application to several data sets.

Data Summarization
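As a rough illustration of the submodular selection mentioned in the apricot entry above, the sketch below greedily maximizes a facility-location objective, one common submodular function, over a precomputed similarity matrix. It is plain Python and does not use apricot's API; the function names `greedy_facility_location` and `coverage` and the toy matrix are assumptions made for this example, and similarities are assumed to be non-negative.

```python
def coverage(similarity, selected):
    """Facility-location value: each item is credited with its best
    similarity to any selected item (0 if nothing is selected)."""
    return sum(
        max((row[j] for j in selected), default=0.0) for row in similarity
    )


def greedy_facility_location(similarity, k):
    """Pick up to k column indices by greedy marginal-gain maximization.

    Hypothetical helper for illustration; assumes a square matrix of
    non-negative similarities.
    """
    n = len(similarity)
    selected = []
    best = [0.0] * n  # best similarity of each item to the selected set

    for _ in range(min(k, n)):
        def gain(j):
            # Marginal gain of adding j: total improvement over `best`.
            return sum(max(similarity[i][j] - best[i], 0.0) for i in range(n))

        candidates = [j for j in range(n) if j not in selected]
        j_star = max(candidates, key=gain)
        selected.append(j_star)
        best = [max(best[i], similarity[i][j_star]) for i in range(n)]

    return selected


# Toy data: two groups of three items, with items 1 and 4 most central.
sim = [
    [1.0, 0.8, 0.5, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.8, 0.0, 0.0, 0.0],
    [0.5, 0.8, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.7, 0.4],
    [0.0, 0.0, 0.0, 0.7, 1.0, 0.7],
    [0.0, 0.0, 0.0, 0.4, 0.7, 1.0],
]
summary = greedy_facility_location(sim, 2)
```

On this toy matrix the greedy pass selects one representative from each group. Because the objective is monotone and submodular, the greedy choice is known to be within a factor of 1 - 1/e of the optimum.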

Soft-Label Dataset Distillation and Text Dataset Distillation

ilia10000/LO-Shot 6 Oct 2019

We propose to simultaneously distill both images and their labels, thus assigning each synthetic sample a 'soft' label (a distribution of labels).

Data Summarization Image Classification +1

Fast and Accurate Least-Mean-Squares Solvers

ibramjub/Fast-and-Accurate-Least-Mean-Squares-Solvers NeurIPS 2019

Least-mean-squares (LMS) solvers such as Linear, Ridge, and Lasso Regression, SVD, and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations.

Data Summarization

Flexible Dataset Distillation: Learn Labels Instead of Images

ondrejbohdal/label-distillation 15 Jun 2020

In particular, we study the problem of label distillation - creating synthetic labels for a small set of real images, and show it to be more effective than the prior image-based approach to dataset distillation.

Data Summarization Meta-Learning

Semi-supervised Batch Active Learning via Bilevel Optimization

zalanborsos/bilevel_coresets 19 Oct 2020

Active learning is an effective technique for reducing the labeling cost by improving data efficiency.

Active Learning bilevel optimization +1

CO-Optimal Transport

PythonOT/COOT NeurIPS 2020

Optimal transport (OT) is a powerful geometric and probabilistic tool for finding correspondences and measuring similarity between two distributions.

Clustering Data Summarization +1

Sequential Estimation of Nonparametric Correlation using Hermite Series Estimators

MikeJaredS/hermiter 11 Dec 2020

To treat the non-stationary setting, we introduce a novel, exponentially weighted estimator for Spearman's rank correlation, which allows the local nonparametric correlation of a bivariate data stream to be tracked.

Clustering Data Summarization +2

Sequential Quantiles via Hermite Series Density Estimation

MikeJaredS/hermiter 17 Jul 2015

These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time.

Data Summarization Sequential Distribution Function Estimation +1

Fair k-Center Clustering for Data Summarization

matthklein/fair_k_center_clustering 24 Jan 2019

In data summarization we want to choose $k$ prototypes in order to summarize a data set.

Clustering Data Summarization +1
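To make the idea of choosing $k$ prototypes in the entry above concrete, here is a minimal sketch of the classic greedy farthest-first heuristic for the plain (unconstrained) k-center problem. It is not the fair variant studied in the paper and is not taken from the linked repository; the function name `greedy_k_center`, the `first` parameter, and the sample points are assumptions for illustration.

```python
import math


def greedy_k_center(points, k, first=0):
    """Farthest-first traversal for unconstrained k-center.

    Returns the indices of the chosen prototypes and the resulting
    radius (largest distance from any point to its nearest prototype).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [first]
    # Distance from each point to its nearest chosen center so far.
    nearest = [math.dist(p, points[first]) for p in points]

    while len(centers) < k:
        # The next prototype is the point farthest from all current ones.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))

    return centers, max(nearest)


pts = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 5)]
prototypes, radius = greedy_k_center(pts, 3)
```

With three prototypes the heuristic picks one point from each end of the line plus the outlying point, leaving every remaining point within distance 1 of a prototype. This greedy rule is known to give a 2-approximation for k-center in metric spaces; a fairness constraint on the prototypes would need additional machinery that this sketch omits.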

Very Fast Streaming Submodular Function Maximization

sbuschjaeger/SubmodularStreamingMaximization 20 Oct 2020

Data summarization has become a valuable tool in understanding even terabytes of data.

Data Summarization