Data Summarization

33 papers with code • 0 benchmarks • 2 datasets

Data Summarization is a central problem in the area of machine learning, where we want to compute a small summary of the data.

Source: How to Solve Fair k-Center in Massive Data Models

Benchmarks

These leaderboards are used to track progress in Data Summarization

No evaluation results yet.

Libraries

Use these libraries to find Data Summarization models and implementations

MikeJaredS/hermiter

2 papers

Most implemented papers

Fair k-Center Clustering for Data Summarization

matthklein/fair_k_center_clustering • 24 Jan 2019

In data summarization we want to choose $k$ prototypes in order to summarize a data set.
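
As a rough illustration of choosing $k$ prototypes, the sketch below uses the classic greedy farthest-first traversal for the plain (unconstrained) k-center objective. It is not the fair variant studied in the paper, and the function name `greedy_k_center`, its `first` parameter, and the toy points are assumptions made for this example only.

```python
import math

def greedy_k_center(points, k, first=0):
    """Farthest-first traversal: return k prototype indices and the covering radius.

    Hypothetical helper for illustration; ignores any fairness constraint.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [first]
    # dist_to_center[i] = distance from point i to its nearest chosen prototype
    dist_to_center = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # The next prototype is the point that is currently worst served.
        nxt = max(range(len(points)), key=dist_to_center.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            dist_to_center[i] = min(dist_to_center[i], math.dist(p, points[nxt]))
    return centers, max(dist_to_center)

# Toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers, radius = greedy_k_center(points, k=3)
print(centers, radius)
```

With three prototypes the traversal picks one point from each region, so every point ends up within a small radius of some prototype.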

apricot: Submodular selection for data summarization in Python

jmschrei/apricot • 8 Jun 2019

This paper presents an explanation of submodular selection, an overview of the features in apricot, and an application to several data sets.
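
To make the idea of submodular selection concrete, here is a minimal pure-Python sketch of greedy maximization of a facility-location function over a precomputed similarity matrix. It does not use or reproduce apricot's API; the function name `facility_location_greedy` and the toy similarity values are assumptions for illustration, and the matrix is assumed to contain non-negative entries.

```python
def facility_location_greedy(similarity, k):
    """Greedily pick k items maximizing sum_i max_{j in S} similarity[i][j].

    Hypothetical helper for illustration; assumes non-negative similarities.
    """
    n = len(similarity)
    selected = []
    best = [0] * n  # best[i] = highest similarity of item i to any selected item
    for _ in range(k):
        gains = {}
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in coverage.
            gains[j] = sum(max(similarity[i][j] - best[i], 0) for i in range(n))
        j_star = max(gains, key=gains.get)
        selected.append(j_star)
        best = [max(best[i], similarity[i][j_star]) for i in range(n)]
    return selected, sum(best)

# Toy symmetric similarity matrix: items 0-1 form one group, items 2-4 another.
S = [
    [10, 9, 3, 3, 3],
    [9, 10, 1, 1, 1],
    [3, 1, 10, 6, 2],
    [3, 1, 6, 10, 6],
    [3, 1, 2, 6, 10],
]
selected, value = facility_location_greedy(S, k=2)
print(selected, value)
```

Because the marginal gain of an item shrinks as similar items are selected, the second pick comes from the group that the first pick covers poorly.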

Fast and Accurate Least-Mean-Squares Solvers

ibramjub/Fast-and-Accurate-Least-Mean-Squares-Solvers • NeurIPS 2019

Least-mean squares (LMS) solvers such as Linear / Ridge / Lasso-Regression, SVD and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations.

Scalability vs. Utility: Do We Have to Sacrifice One for the Other in Data Importance Quantification?

easeml/datascope • CVPR 2021

Quantifying the importance of each training point to a learning task is a fundamental problem in machine learning, and the estimated importance scores have been leveraged to guide a range of data workflows such as data summarization and domain adaptation.

Streaming Submodular Maximization under a $k$-Set System Constraint

ehsankazemi/streamingkextendible • 9 Feb 2020

In this paper, we propose a novel framework that converts streaming algorithms for monotone submodular maximization into streaming algorithms for non-monotone submodular maximization.

CO-Optimal Transport

PythonOT/COOT • NeurIPS 2020

Optimal transport (OT) is a powerful geometric and probabilistic tool for finding correspondences and measuring similarity between two distributions.

Deuteros 2.0: Peptide-level significance testing of data from hydrogen deuterium exchange mass spectrometry

andymlau/Deuteros_2.0 • 17 May 2020

There are currently very few software packages available that offer quick and informative comparison of HDX-MS datasets and even fewer which offer statistical analysis and advanced visualization.

Understanding collections of related datasets using dependent MMD coresets

sinead/dmmd • 24 Jun 2020

Understanding how two datasets differ can help us determine whether one dataset under-represents certain sub-populations, and provides insights into how well models will generalize across datasets.

$β$-Cores: Robust Large-Scale Bayesian Data Summarization in the Presence of Outliers

dionman/beta-cores • 31 Aug 2020

Modern machine learning applications should be able to address the intrinsic challenges arising over inference on massive real-world datasets, including scalability and robustness to outliers.

Fair and Representative Subset Selection from Data Streams

FraFabbri/fair-subset-datastream • 9 Oct 2020

We study the problem of extracting a small subset of representative items from a large data stream.
