Data Summarization
34 papers with code • 0 benchmarks • 2 datasets
Data Summarization is a central problem in the area of machine learning, where we want to compute a small summary of the data.
Benchmarks
These leaderboards are used to track progress in Data Summarization
Libraries
Use these libraries to find Data Summarization models and implementations
Most implemented papers
Soft-Label Dataset Distillation and Text Dataset Distillation
We propose to simultaneously distill both images and their labels, thus assigning each synthetic sample a 'soft' label (a distribution of labels).
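To make the idea of a soft label concrete, the sketch below computes a cross-entropy loss against a label distribution instead of a single class index. It is only an illustration of how a soft label enters a training loss, not the distillation procedure of the paper; the function name `soft_label_cross_entropy` and the example values are assumptions made here.

```python
import numpy as np

def soft_label_cross_entropy(logits, soft_labels):
    """Mean cross-entropy between model logits and soft-label distributions.

    logits:      array of shape (n_samples, n_classes)
    soft_labels: array of the same shape; each row is a probability distribution
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Each class contributes in proportion to its soft-label mass.
    return float(-(soft_labels * log_probs).sum(axis=1).mean())

# Hypothetical synthetic sample over three classes (values are illustrative only).
logits = np.array([[2.0, 0.5, -1.0]])
soft = np.array([[0.7, 0.2, 0.1]])   # soft label: a distribution over classes
hard = np.array([[1.0, 0.0, 0.0]])   # one-hot label, a special case of a soft label

soft_loss = soft_label_cross_entropy(logits, soft)
hard_loss = soft_label_cross_entropy(logits, hard)
```

A one-hot label is the special case in which all probability mass sits on one class, so the same loss covers both settings.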
Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision
In our algorithm, at each iteration, the maximum information from the structure of the data is captured by one selected sample, and the captured information is neglected in the next iterations by projection on the null-space of previously selected samples.
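The iteration described above can be sketched roughly as follows. This is a minimal reading of the description, assuming that "maximum information from the structure of the data" means the first right singular vector of the current data matrix and that matching uses cosine similarity; the function name `ipm_select` and the tolerance are assumptions, not the authors' implementation.

```python
import numpy as np

def ipm_select(X, k, eps=1e-12):
    """Select up to k row indices of X by iterative projection and matching (sketch)."""
    A = np.asarray(X, dtype=float).copy()
    selected = []
    for _ in range(k):
        # Direction that captures the most remaining structure in the data.
        _, _, vt = np.linalg.svd(A, full_matrices=False)
        v = vt[0]
        norms = np.linalg.norm(A, axis=1)
        # Matching: absolute cosine similarity between each sample and v.
        # Rows that are already fully explained (zero norm) score 0.
        scores = np.zeros(len(A))
        nonzero = norms > eps
        scores[nonzero] = np.abs(A[nonzero] @ v) / norms[nonzero]
        if selected:
            scores[selected] = -1.0  # never pick the same sample twice
        idx = int(np.argmax(scores))
        if norms[idx] <= eps:
            break  # nothing left to explain
        selected.append(idx)
        # Projection: remove the component along the selected sample, so later
        # iterations only see the null space of the samples chosen so far.
        a = A[idx] / norms[idx]
        A = A - np.outer(A @ a, a)
    return selected

# Two collinear samples and one orthogonal sample (illustrative data).
demo = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
picked = ipm_select(demo, 2)
```

After one of the two collinear samples is chosen, the projection zeroes out the other, so the second pick falls on the orthogonal sample.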
Flexible Dataset Distillation: Learn Labels Instead of Images
In particular, we study the problem of label distillation (creating synthetic labels for a small set of real images) and show it to be more effective than the prior image-based approach to dataset distillation.
Sequential estimation of Spearman rank correlation using Hermite series estimators
To treat the non-stationary setting, we introduce a novel, exponentially weighted estimator for the Spearman rank correlation, which allows the local nonparametric correlation of a bivariate data stream to be tracked.
Sequential Quantiles via Hermite Series Density Estimation
These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time.
Scalable k-Means Clustering via Lightweight Coresets
Coresets have been successfully used to scale up clustering models to massive data sets.
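A coreset for k-means can be built by importance sampling. The sketch below assumes the sampling distribution is an equal mixture of a uniform term and a term proportional to each point's squared distance from the data mean, with each sampled point weighted by the inverse of its sampling probability. The function name `lightweight_coreset` and its parameters are chosen here for illustration and are not taken from the listing.

```python
import numpy as np

def lightweight_coreset(X, m, seed=None):
    """Sample a weighted subset of m points from X (importance-sampling sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Squared distance of every point to the mean of the data.
    sq_dist = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    total = sq_dist.sum()
    if total > 0:
        # Half uniform mass, half mass proportional to squared distance.
        q = 0.5 / n + 0.5 * sq_dist / total
    else:
        q = np.full(n, 1.0 / n)  # all points identical: fall back to uniform
    idx = rng.choice(n, size=m, replace=True, p=q)
    # Inverse-probability weights keep weighted sums unbiased.
    weights = 1.0 / (m * q[idx])
    return X[idx], weights

# Illustrative data: 1000 random points in 5 dimensions.
data = np.random.default_rng(0).normal(size=(1000, 5))
points, weights = lightweight_coreset(data, m=50, seed=1)
```

The weighted subset can then be passed to any k-means implementation that accepts per-sample weights, in place of the full data set.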
An Online Algorithm for Nonparametric Correlations
This paper investigates the problem of computing nonparametric correlations on the fly for streaming data.
Fair and Diverse DPP-based Data Summarization
Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization.
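As a small illustration of diversity-proportional sampling, the sketch below enumerates all subsets of a fixed size and assigns each a probability proportional to the determinant of the corresponding kernel submatrix, which is how a determinantal point process (DPP) of fixed size scores diversity. It uses brute-force enumeration, so it only suits tiny examples, and it does not include the fairness constraints named in the title; the function name `kdpp_probabilities` and the feature values are assumptions.

```python
import itertools
import numpy as np

def kdpp_probabilities(L, k):
    """Probability of every size-k subset under a k-DPP with kernel L (brute force)."""
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    # det(L_S) is the squared volume spanned by the selected feature vectors:
    # near-duplicate items span little volume and get low probability.
    dets = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    return dict(zip(subsets, dets / dets.sum()))

# Items 0 and 1 are near-duplicates; item 2 is different (illustrative features).
features = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]])
kernel = features @ features.T
probs = kdpp_probabilities(kernel, k=2)
```

In this example the pair of near-duplicates receives far less probability than either pair containing the distinct item.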
A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization
Structured data summarization involves generating natural language summaries from structured input data.
Coverage-Based Designs Improve Sample Mining and Hyper-Parameter Optimization
Sampling one or more effective solutions from large search spaces is a recurring idea in machine learning, and sequential optimization has become a popular solution.