Search Results for author: Michal Klein

Found 6 papers, 1 paper with code

Careful with that Scalpel: Improving Gradient Surgery with an EMA

no code implementations • 5 Feb 2024 • Yu-Guan Hsieh, James Thornton, Eugene Ndiaye, Michal Klein, Marco Cuturi, Pierre Ablin

Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g., performance on another dataset, robustness, agreement with a prior).

Scalable Pre-training of Large Autoregressive Image Models

2 code implementations • 16 Jan 2024 • Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin

Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, (2) the value of the objective function correlates with the performance of the model on downstream tasks.

Ranked #333 on Image Classification on ImageNet (using extra training data)

Image Classification

Predicting Ordinary Differential Equations with Transformers

no code implementations • 24 Jul 2023 • Sören Becker, Michal Klein, Alexander Neitz, Giambattista Parascandolo, Niki Kilbertus

We develop a transformer-based sequence-to-sequence model that recovers scalar ordinary differential equations (ODEs) in symbolic form from irregularly sampled and noisy observations of a single solution trajectory.

Learning Costs for Structured Monge Displacements

no code implementations • 20 Jun 2023 • Michal Klein, Aram-Alexandre Pooladian, Pierre Ablin, Eugène Ndiaye, Jonathan Niles-Weed, Marco Cuturi

Because of such difficulties, existing approaches rarely depart from the default choice of estimating such maps with the simple squared-Euclidean distance as the ground cost, $c(x, y)=\|x-y\|^2_2$.

Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps

no code implementations • 8 Feb 2023 • Marco Cuturi, Michal Klein, Pierre Ablin

Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure onto another, on those that are the ``thriftiest'', i.e., such that the averaged cost $c(x, T(x))$ between $x$ and its image $T(x)$ be as small as possible.

Dimensionality Reduction, MORPH

Discovering ordinary differential equations that govern time-series

no code implementations • 5 Nov 2022 • Sören Becker, Michal Klein, Alexander Neitz, Giambattista Parascandolo, Niki Kilbertus

Natural laws are often described through differential equations, yet finding a differential equation that describes the governing law underlying observed data is a challenging and still mostly manual task.

Time Series, Time Series Analysis
