Data Summarization

33 papers with code • 0 benchmarks • 2 datasets

Data summarization is a central problem in machine learning: the goal is to compute a small summary of the data.

Source: How to Solve Fair k-Center in Massive Data Models
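One classic way to build such a summary is k-center: choose k representative points so that every data point lies close to some representative. The sketch below is an illustration, not the cited paper's method. It uses Gonzalez's farthest-point greedy, a well-known 2-approximation for plain k-center, and it ignores the fairness constraint the source studies. Euclidean distance and the helper names `greedy_k_center` and `covering_radius` are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_k_center(points: Sequence[Point], k: int) -> List[int]:
    """Farthest-point greedy for plain (unconstrained, non-fair) k-center.

    Returns the indices of the chosen centers.
    """
    if not points or k <= 0:
        return []
    centers = [0]  # arbitrary starting center: the first point
    # dist[i] = distance from point i to its nearest chosen center
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Add the point worst served by the current summary.
        far = max(range(len(points)), key=lambda i: dist[i])
        if dist[far] == 0:
            break  # every point already coincides with a center
        centers.append(far)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[far]))
    return centers


def covering_radius(points: Sequence[Point], centers: List[int]) -> float:
    """Largest distance from any point to its nearest center (the k-center cost)."""
    return max(min(math.dist(p, points[c]) for c in centers) for p in points)
```

For example, with `pts = [(0, 0), (1, 0), (10, 0), (11, 0)]`, `greedy_k_center(pts, 2)` picks indices `[0, 3]`, which cover every point within radius 1.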


Most implemented papers

Semi-supervised Batch Active Learning via Bilevel Optimization

zalanborsos/bilevel_coresets 19 Oct 2020

Active learning is an effective technique for reducing the labeling cost by improving data efficiency.

Very Fast Streaming Submodular Function Maximization

sbuschjaeger/SubmodularStreamingMaximization 20 Oct 2020

Data summarization has become a valuable tool in understanding even terabytes of data.
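Streaming summarization algorithms see each item once and keep only a small buffer. The sketch below shows the basic threshold rule many of them build on: keep an arriving item if its marginal gain f(S ∪ {e}) − f(S) is at least a threshold tau, until k items are kept. This is a simplified illustration, not the algorithm from the paper above. The fixed `tau` and the set-coverage objective are assumptions, and practical streaming algorithms choose or adapt thresholds much more carefully.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def threshold_stream(
    stream: Iterable[Tuple[Hashable, Set[Hashable]]], k: int, tau: float
) -> List[Hashable]:
    """One-pass selection under a cardinality limit k using a fixed threshold.

    Each stream item is (item_id, elements_it_covers). The objective is
    set coverage f(S) = |union of covered elements|, which is monotone submodular.
    """
    chosen: List[Hashable] = []
    covered: Set[Hashable] = set()  # only the summary's state is kept in memory
    for item_id, elems in stream:
        if len(chosen) >= k:
            break
        gain = len(set(elems) - covered)  # marginal gain f(S + e) - f(S)
        if gain >= tau:
            chosen.append(item_id)
            covered |= set(elems)
    return chosen
```

For example, on the stream `[("a", {1, 2, 3}), ("b", {2, 3}), ("c", {4, 5}), ("d", {6})]` with `k=2` and `tau=2`, item `b` adds nothing new and is skipped, so the summary is `["a", "c"]`.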

Synthetic Dataset Generation of Driver Telematics

sstocksieker/dair 30 Jan 2021

This article describes techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset.

Submodlib: A Submodular Optimization Library

decile-team/submodlib 22 Feb 2022

Recent work has also leveraged submodular functions to define submodular information measures, which have proven very useful for guided subset selection and guided summarization.
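A commonly used submodular information measure is the submodular mutual information between a candidate summary A and a query set Q, I_f(A; Q) = f(A) + f(Q) − f(A ∪ Q). The plain-Python sketch below evaluates it for an assumed set-coverage function f. It is not submodlib's API, and the `concepts` mapping is hypothetical example data. For coverage, the measure reduces to the number of concepts that A and Q share, which is why maximizing it steers a summary toward the query.

```python
from typing import Dict, Hashable, Iterable, Set


def coverage(selection: Iterable[Hashable], concepts: Dict[Hashable, Set[Hashable]]) -> int:
    """Set-coverage function f(S): number of distinct concepts covered by S."""
    covered: Set[Hashable] = set()
    for item in selection:
        covered |= concepts[item]
    return len(covered)


def submodular_mi(
    A: Iterable[Hashable], Q: Iterable[Hashable], concepts: Dict[Hashable, Set[Hashable]]
) -> int:
    """Submodular mutual information I_f(A; Q) = f(A) + f(Q) - f(A ∪ Q)."""
    A, Q = set(A), set(Q)
    return coverage(A, concepts) + coverage(Q, concepts) - coverage(A | Q, concepts)
```

With `concepts = {"x": {1, 2}, "y": {2, 3}, "q": {3, 4}}`, the summary `{"x", "y"}` shares concept 3 with the query `{"q"}`, giving a value of 1, while `{"x"}` alone shares nothing and gives 0.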

Group Equality in Adaptive Submodular Maximization

j-yuan/gequality 7 Jul 2022

In this paper, we study the classic submodular maximization problem subject to a group equality constraint under both non-adaptive and adaptive settings.

Towards Neural Numeric-To-Text Generation From Temporal Personal Health Data

neato47/neural-numeric-to-text-generation 11 Jul 2022

We examine recurrent, convolutional, and Transformer-based encoder-decoder models to automatically generate natural language summaries from numeric temporal personal health data.

Streaming Algorithms for Diversity Maximization with Fairness Constraints

yhwang1990/code-fdm 30 Jul 2022

Given a set $X$ of $n$ elements, diversity maximization asks to select a subset $S$ of $k \ll n$ elements with maximum diversity, as quantified by the dissimilarities among the elements in $S$.
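To make the objective concrete, the sketch below assumes one common diversity measure, max-min diversity (the smallest pairwise distance inside $S$), with Euclidean distance. It finds the optimum by brute-force enumeration of all $\binom{n}{k}$ subsets. That is only feasible for tiny inputs, which is exactly why streaming approximation algorithms are needed at scale. The fairness constraints studied in the paper are not modeled here.

```python
import itertools
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def min_pairwise(points: Sequence[Point], subset: Sequence[int]) -> float:
    """Max-min diversity of a subset: its smallest pairwise distance."""
    return min(math.dist(points[i], points[j]) for i, j in itertools.combinations(subset, 2))


def max_min_diversity(points: Sequence[Point], k: int) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exact search over all size-k subsets (requires 2 <= k <= n)."""
    if not 2 <= k <= len(points):
        raise ValueError("need 2 <= k <= number of points")
    best: Optional[Tuple[int, ...]] = None
    best_val = -math.inf
    for subset in itertools.combinations(range(len(points)), k):
        val = min_pairwise(points, subset)
        if val > best_val:
            best, best_val = subset, val
    return best, best_val
```

For the points `[(0, 0), (1, 0), (5, 0), (10, 0)]` and `k=3`, the best subset is `(0, 2, 3)`, whose closest pair is 5 apart.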

Balancing Utility and Fairness in Submodular Maximization (Technical Report)

yhwang1990/code-bsm-release 2 Nov 2022

Submodular function maximization is a fundamental combinatorial optimization problem with many applications, including data summarization, influence maximization, and recommendation.
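The textbook baseline for this problem is the greedy algorithm: repeatedly add the element with the largest marginal gain. For a monotone submodular function under a cardinality constraint k, it achieves a (1 − 1/e) approximation (Nemhauser, Wolsey, and Fisher, 1978). The sketch below assumes a facility-location objective f(S) = Σ_i max_{j∈S} sim[i][j] with nonnegative similarities, a common choice for summarization. It does not implement the utility-fairness balancing studied in the paper above.

```python
from typing import List, Sequence


def facility_location(sim: Sequence[Sequence[float]], S: List[int]) -> float:
    """f(S) = sum over all points i of their best similarity to a member of S."""
    if not S:
        return 0.0
    return sum(max(row[j] for j in S) for row in sim)


def greedy_max(sim: Sequence[Sequence[float]], k: int) -> List[int]:
    """Standard greedy for monotone submodular maximization with |S| <= k."""
    n = len(sim)
    S: List[int] = []
    for _ in range(min(k, n)):
        base = facility_location(sim, S)
        # Pick the unselected element with the largest marginal gain.
        best = max(
            (j for j in range(n) if j not in S),
            key=lambda j: facility_location(sim, S + [j]) - base,
        )
        S.append(best)
    return S
```

With `sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]` and `k=2`, greedy first picks item 1 (it represents both 0 and 1 well) and then item 2 (the outlier), returning `[1, 2]`.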

Black-box Coreset Variational Inference

facebookresearch/blackbox-coresets-vi 4 Nov 2022

Recent advances in coreset methods have shown that a selection of representative datapoints can replace massive volumes of data for Bayesian inference, preserving the relevant statistical information and significantly accelerating subsequent downstream tasks.
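The simplest coreset baseline is uniform subsampling: keep m of the n points and give each weight n/m, so the weighted sum of per-point terms (for example, log-likelihood terms in Bayesian inference) is an unbiased estimate of the full-data sum. The sketch below shows only that baseline. It is not the black-box variational method of the paper above, and the function names and the unit-variance Gaussian log-likelihood are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def uniform_coreset(data: Sequence[float], m: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Sample m points uniformly without replacement, each weighted n/m."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(data)), m)
    w = len(data) / m
    return [(data[i], w) for i in idx]


def weighted_sum(coreset: List[Tuple[float, float]], f: Callable[[float], float]) -> float:
    """Coreset estimate of sum_i f(x_i) over the full dataset."""
    return sum(w * f(x) for x, w in coreset)


def gaussian_loglik(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Log-density of N(mu, sigma^2) at x, used as an example per-point term."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
```

For example, `weighted_sum(uniform_coreset(data, 100), gaussian_loglik)` approximates the full-data log-likelihood `sum(gaussian_loglik(x) for x in data)` while touching only 100 points. Learned coresets aim to reach the same accuracy with far fewer, better-chosen points and weights.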

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

huggingface/transformers 19 Dec 2022

Visual language data such as plots, charts, and infographics are ubiquitous in the human world.