Data Summarization
33 papers with code • 0 benchmarks • 2 datasets
Data summarization is a central problem in machine learning: the goal is to compute a small summary of the data.
Most implemented papers
Semi-supervised Batch Active Learning via Bilevel Optimization
Active learning is an effective technique for reducing the labeling cost by improving data efficiency.
Very Fast Streaming Submodular Function Maximization
Data summarization has become a valuable tool in understanding even terabytes of data.
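Streaming summarization means deciding, in one pass, which items to keep under a budget of k. The sketch below illustrates the acceptance rule from sieve-streaming (Badanidiyuru et al.). It is not necessarily the algorithm of the paper above. It assumes set coverage as the monotone submodular objective and a single guess `opt_guess` of the optimal value. The function names `coverage` and `sieve_threshold_pass` are hypothetical.

```python
from typing import Iterable, List, Set


def coverage(summary: List[Set[int]]) -> int:
    """f(S) = size of the union of the chosen sets (monotone submodular)."""
    covered: Set[int] = set()
    for s in summary:
        covered |= s
    return len(covered)


def sieve_threshold_pass(stream: Iterable[Set[int]], k: int, opt_guess: float) -> List[Set[int]]:
    """One pass over the stream for a single guess v of OPT.

    An item e is kept if its marginal gain f(S + e) - f(S) is at least
    (v / 2 - f(S)) / (k - |S|), the sieve-streaming acceptance rule.
    """
    summary: List[Set[int]] = []
    covered: Set[int] = set()  # union of the summary, so f(S) == len(covered)
    for item in stream:
        if len(summary) >= k:
            break  # budget exhausted; the rest of the stream is ignored
        gain = len(item - covered)
        threshold = (opt_guess / 2 - len(covered)) / (k - len(summary))
        if gain >= threshold:
            summary.append(item)
            covered |= item
    return summary


stream = [{1, 2, 3}, {3, 4}, {5, 6, 7, 8}, {1, 2}, {9}]
summary = sieve_threshold_pass(stream, k=2, opt_guess=7)
print(summary, coverage(summary))
```

With `opt_guess=7` (the true optimum for k=2), the pass keeps `{3, 4}` early and reaches coverage 4. That is at least half of 7, which illustrates the approximation guarantee rather than optimality. The full sieve-streaming method runs many guesses of OPT in parallel and returns the best summary.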
Synthetic Dataset Generation of Driver Telematics
This article describes techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset.
Submodlib: A Submodular Optimization Library
A recent work has also leveraged submodular functions to propose submodular information measures which have been found to be very useful in solving the problems of guided subset selection and guided summarization.
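Facility location is a standard submodular function for summarization. It is defined as f(S) = sum over all points i of the maximum similarity of i to any selected point j in S. The pure-NumPy sketch below is not Submodlib's API. It shows the classic greedy algorithm, which achieves a (1 - 1/e) approximation for monotone submodular maximization under a cardinality constraint. The input `sim` is assumed to be a non-negative n x n similarity matrix, and `facility_location_greedy` is a hypothetical name.

```python
import numpy as np


def facility_location_greedy(sim: np.ndarray, k: int) -> list:
    """Greedily pick up to k indices maximizing f(S) = sum_i max_{j in S} sim[i, j]."""
    n = sim.shape[0]
    selected: list = []
    best = np.zeros(n)  # best[i] = current max similarity of point i to the summary
    for _ in range(min(k, n)):
        # Marginal gain of adding column j: sum_i max(0, sim[i, j] - best[i]).
        gains = np.maximum(sim - best[:, None], 0.0).sum(axis=0)
        if selected:
            gains[selected] = -np.inf  # never re-pick an element
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    return selected


# Two well-separated clusters on a line; an RBF-style similarity.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1])
sim = np.exp(-np.abs(x[:, None] - x[None, :]))
print(facility_location_greedy(sim, k=2))
```

Greedy first picks the center of the larger cluster (index 1) and then one representative of the other cluster. Indices 3 and 4 have essentially equal gain, so either may be returned.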
Group Equality in Adaptive Submodular Maximization
In this paper, we study the classic submodular maximization problem subject to a group equality constraint under both non-adaptive and adaptive settings.
Towards Neural Numeric-To-Text Generation From Temporal Personal Health Data
We examine recurrent, convolutional, and Transformer-based encoder-decoder models to automatically generate natural language summaries from numeric temporal personal health data.
Streaming Algorithms for Diversity Maximization with Fairness Constraints
Diversity maximization asks, given a set $X$ of $n$ elements, to select a subset $S$ of $k \ll n$ elements with maximum diversity, as quantified by the dissimilarities among the elements in $S$.
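One common diversity measure is max-min (remote-edge) diversity, the smallest pairwise distance within $S$. For metric distances, the farthest-point greedy heuristic (often attributed to Gonzalez) is a well-known 2-approximation for it. The sketch below is an offline baseline, not the streaming or fairness-constrained algorithms of the paper above. It assumes distinct 2-D points with Euclidean distance, and `gmm_diverse_subset` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def gmm_diverse_subset(points: Sequence[Point], k: int) -> List[int]:
    """Farthest-point greedy for max-min diversity (assumes distinct points)."""
    if not points or k <= 0:
        return []
    chosen = [0]  # arbitrary starting element
    # d[j] = distance from point j to its nearest chosen point
    d = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        # Add the point farthest from the current selection (first index on ties).
        i = max(range(len(points)), key=lambda j: d[j])
        chosen.append(i)
        d = [min(d[j], math.dist(points[j], points[i])) for j in range(len(points))]
    return chosen


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.0, 10.0), (5.0, 5.0)]
print(gmm_diverse_subset(pts, k=3))
```

Each step only needs the distance of every point to its nearest selected point, so the loop costs O(nk) distance evaluations.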
Balancing Utility and Fairness in Submodular Maximization (Technical Report)
Submodular function maximization is a fundamental combinatorial optimization problem with many applications, including data summarization, influence maximization, and recommendation.
Black-box Coreset Variational Inference
Recent advances in coreset methods have shown that a selection of representative datapoints can replace massive volumes of data for Bayesian inference, preserving the relevant statistical information and significantly accelerating subsequent downstream tasks.
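The simplest coreset baseline replaces n datapoints with m uniformly sampled ones, each weighted by n/m. Every point is kept with probability m/n, so the weighted log-likelihood is an unbiased estimate of the full-data log-likelihood. This is not the black-box coreset variational inference method of the paper above. It is a minimal sketch that assumes a 1-D Gaussian model with known variance. The names `uniform_coreset` and `gaussian_loglik` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Weighted = List[Tuple[float, float]]  # (point, weight) pairs


def uniform_coreset(data: Sequence[float], m: int, seed: int = 0) -> Weighted:
    """Uniform subsample of size m; each kept point gets weight n / m so that
    weighted sums are unbiased estimates of full-data sums."""
    rng = random.Random(seed)
    n = len(data)
    sample = rng.sample(list(data), m)
    return [(x, n / m) for x in sample]


def gaussian_loglik(points: Weighted, mu: float, sigma: float = 1.0) -> float:
    """Weighted log-likelihood sum_i w_i * log N(x_i | mu, sigma^2)."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(w * (const - (x - mu) ** 2 / (2 * sigma ** 2)) for x, w in points)


rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(10_000)]
full = [(x, 1.0) for x in data]
core = uniform_coreset(data, m=200)
for mu in (1.0, 2.0, 3.0):
    print(mu, gaussian_loglik(full, mu), gaussian_loglik(core, mu))
```

Printing the two log-likelihoods side by side shows how closely 200 weighted points track 10,000. Methods like the one in the paper above aim to choose points and weights more carefully than uniform sampling does.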
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Visual language data such as plots, charts, and infographics are ubiquitous in the human world.