# Data Summarization

32 papers with code • 0 benchmarks • 2 datasets

**Data Summarization** is a central problem in machine learning: computing a small summary of the data.

## Most implemented papers

### Soft-Label Dataset Distillation and Text Dataset Distillation

We propose to simultaneously distill both images and their labels, thus assigning each synthetic sample a 'soft' label (a distribution over labels).
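A minimal sketch of what training against a soft label means, assuming Python with NumPy and a standard cross-entropy objective (the function names are hypothetical, not taken from the paper's code). Each target row is a full distribution over classes; a one-hot row recovers ordinary cross-entropy.

```python
import numpy as np


def log_softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def soft_label_cross_entropy(logits, soft_labels):
    """Mean cross-entropy between logits and per-sample label distributions.

    logits:      (n_samples, n_classes) unnormalized scores
    soft_labels: (n_samples, n_classes), each row sums to 1
    """
    return float(-(soft_labels * log_softmax(logits)).sum(axis=-1).mean())


logits = np.array([[2.0, 0.5, -1.0]])
hard = np.array([[1.0, 0.0, 0.0]])  # ordinary one-hot label
soft = np.array([[0.6, 0.3, 0.1]])  # 'soft' label spread over three classes
print(soft_label_cross_entropy(logits, hard))
print(soft_label_cross_entropy(logits, soft))
```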

### Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision

In our algorithm, each iteration selects the one sample that captures the maximum information from the structure of the data; that captured information is then excluded from subsequent iterations by projecting the data onto the null space of the previously selected samples.
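The projection-and-matching loop can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the "structure" captured at each step is the leading right singular vector of the residual data, and that matching picks the residual sample with the largest absolute projection onto that vector. `ipm_select` is a hypothetical name.

```python
import numpy as np


def ipm_select(X, k):
    """Pick up to k representative rows of X (n_samples x n_features)."""
    R = np.array(X, dtype=float)  # residual data, one sample per row
    selected = []
    for _ in range(min(k, R.shape[0])):
        # Leading right singular vector: direction of maximal residual energy.
        _, _, vt = np.linalg.svd(R, full_matrices=False)
        v = vt[0]
        # Matching (assumed): score each residual sample by |<r_i, v>|.
        scores = np.abs(R @ v)
        scores[selected] = -np.inf  # never pick the same sample twice
        i = int(np.argmax(scores))
        norm_i = np.linalg.norm(R[i])
        if norm_i < 1e-12:  # nothing left to explain
            break
        selected.append(i)
        # Projection: remove the selected sample's direction from every row,
        # so R lives in the null space of all samples selected so far.
        u = R[i] / norm_i
        R = R - np.outer(R @ u, u)
    return selected


X = np.array([[5.0, 0.0, 0.0], [4.9, 0.1, 0.0],   # cluster along x
              [0.0, 3.0, 0.0], [0.1, 2.9, 0.0],   # cluster along y
              [0.0, 0.0, 1.0], [0.0, 0.1, 1.1]])  # cluster along z
print(ipm_select(X, 3))
```

Because each projection removes the direction of a selected sample, later picks are driven by structure that the earlier representatives do not already explain.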

### Flexible Dataset Distillation: Learn Labels Instead of Images

In particular, we study the problem of label distillation, i.e. creating synthetic labels for a small set of real images, and show it to be more effective than the prior image-based approach to dataset distillation.

### Sequential estimation of Spearman rank correlation using Hermite series estimators

To treat the non-stationary setting, we introduce a novel, exponentially weighted estimator for the Spearman rank correlation, which allows the local nonparametric correlation of a bivariate data stream to be tracked.
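The estimator described above tracks Spearman's rank correlation. For reference, a minimal batch (non-streaming) computation is sketched below using the standard definition: the Pearson correlation of the two rank vectors, with tied values sharing their average rank. This is not the paper's Hermite series estimator; it recomputes everything from scratch, which is the cost a sequential estimator avoids. Function names are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of values tied with sorted_vals[i].
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))


print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```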

### Sequential Quantiles via Hermite Series Density Estimation

The proposed algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified ones) to be estimated at any point in time.
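A minimal sketch of the general idea, assuming Python with NumPy: keep a running average of the first few normalized Hermite functions evaluated at each observation (these averages are the coefficients of a Gauss-Hermite series density estimate), then answer any quantile query by numerically inverting the estimated CDF on a grid. The truncation order, the grid, and the assumption that the stream is already roughly standardized are illustrative choices, not the paper's settings; the class name is hypothetical.

```python
import numpy as np


class HermiteQuantileSketch:
    """Sequential Gauss-Hermite density estimate with on-demand quantiles.

    Assumes a roughly standardized stream (mean 0, variance 1).
    """

    def __init__(self, order=10, lo=-6.0, hi=6.0, points=2001):
        self.order = order
        self.n = 0
        self.coef = np.zeros(order + 1)          # a_0 .. a_order
        self.grid = np.linspace(lo, hi, points)  # where the CDF is tabulated

    @staticmethod
    def hermite_functions(x, order):
        """Normalized Hermite functions h_0..h_order via the three-term recurrence."""
        x = np.asarray(x, dtype=float)
        h = np.zeros((order + 1,) + x.shape)
        h[0] = np.pi ** -0.25 * np.exp(-x ** 2 / 2)
        if order >= 1:
            h[1] = np.sqrt(2.0) * x * h[0]
        for k in range(1, order):
            h[k + 1] = np.sqrt(2.0 / (k + 1)) * x * h[k] - np.sqrt(k / (k + 1)) * h[k - 1]
        return h

    def update(self, x):
        """Fold one observation into the running means a_k = mean(h_k(x_i))."""
        self.n += 1
        self.coef += (self.hermite_functions(x, self.order) - self.coef) / self.n

    def quantile(self, p):
        """Estimate the p-quantile by inverting the tabulated CDF."""
        if self.n == 0:
            raise ValueError("no observations yet")
        dens = self.coef @ self.hermite_functions(self.grid, self.order)
        dens = np.clip(dens, 0.0, None)  # a truncated series can dip below zero
        # Cumulative trapezoidal integral, renormalized so the CDF ends at 1.
        steps = np.diff(self.grid) * (dens[1:] + dens[:-1]) / 2
        cdf = np.concatenate(([0.0], np.cumsum(steps)))
        cdf /= cdf[-1]
        return float(np.interp(p, cdf, self.grid))


rng = np.random.default_rng(0)
sketch = HermiteQuantileSketch(order=10)
for value in rng.standard_normal(10000):
    sketch.update(value)
print(sketch.quantile(0.5), sketch.quantile(0.9))
```

Each update touches only `order + 1` coefficients, so memory stays constant however long the stream runs, and the quantile level is chosen at query time rather than fixed in advance.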

### Scalable k-Means Clustering via Lightweight Coresets

Coresets have been successfully used to scale up clustering models to massive data sets.
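One lightweight-coreset construction for k-means samples points from a mixture of the uniform distribution and a distribution proportional to each point's squared distance from the data mean, then weights each sampled point by the inverse of its sampling probability. The sketch below follows that recipe under stated assumptions (sampling with replacement, one pass to compute the mean and distances); `lightweight_coreset` is a hypothetical name, and the paper's construction may differ in details.

```python
import numpy as np


def lightweight_coreset(X, m, rng):
    """Return m sampled rows of X and their importance weights."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq_dist = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    total = sq_dist.sum()
    if total > 0:
        # Half uniform, half proportional to squared distance from the mean.
        q = 0.5 / n + 0.5 * sq_dist / total
    else:
        q = np.full(n, 1.0 / n)  # all points identical: fall back to uniform
    idx = rng.choice(n, size=m, replace=True, p=q)
    # Inverse-probability weights make weighted sums unbiased estimates
    # of the corresponding sums over the full data set.
    weights = 1.0 / (m * q[idx])
    return X[idx], weights


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
C, w = lightweight_coreset(X, m=500, rng=rng)
```

The weighted sample can then be passed to any k-means solver that accepts per-point weights, for example scikit-learn's `KMeans.fit(C, sample_weight=w)`.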

### An Online Algorithm for Nonparametric Correlations

This paper investigates the problem of computing nonparametric correlations on the fly for streaming data.

### Fair and Diverse DPP-based Data Summarization

Sampling methods that choose a subset of the data with probability proportional to its diversity in the feature space are popular for data summarization.
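To illustrate what "proportional to its diversity" means for a determinantal point process, the sketch below enumerates every size-k subset of a tiny data set and assigns it probability proportional to the determinant of its kernel submatrix (a k-DPP). The RBF kernel and its bandwidth are assumptions, brute-force enumeration is only viable for a handful of points (practical samplers rely on an eigendecomposition of the kernel), and the fairness constraints studied in the paper are not modeled. Function names are hypothetical.

```python
import itertools

import numpy as np


def k_dpp_probabilities(features, k, bandwidth=1.0):
    """Exact k-DPP distribution over all size-k subsets (tiny n only)."""
    sq = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    L = np.exp(-sq / (2 * bandwidth ** 2))  # RBF similarity kernel (assumed)
    subsets = list(itertools.combinations(range(len(features)), k))
    dets = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    dets = np.clip(dets, 0.0, None)  # guard against tiny negative round-off
    return subsets, dets / dets.sum()


def sample_k_dpp(features, k, rng, bandwidth=1.0):
    """Draw one size-k subset with P(S) proportional to det(L_S)."""
    subsets, probs = k_dpp_probabilities(features, k, bandwidth)
    return subsets[rng.choice(len(subsets), p=probs)]


features = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
rng = np.random.default_rng(0)
print(sample_k_dpp(features, 2, rng))
```

Duplicate or near-duplicate points drive the determinant toward zero, so subsets containing both are almost never drawn; this is the diversity effect that DPP-based summarization relies on.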

### A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization

Structured data summarization involves generation of natural language summaries from structured input data.

### Coverage-Based Designs Improve Sample Mining and Hyper-Parameter Optimization

Sampling one or more effective solutions from large search spaces is a recurring idea in machine learning, and sequential optimization has become a popular solution.