Search Results for author: Thomas Hofmann

Found 81 papers, 30 papers with code

Towards guarantees for parameter isolation in continual learning

no code implementations 2 Oct 2023 Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

Deep learning has proved to be a successful paradigm for solving many challenges in machine learning.

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

no code implementations 30 Jun 2023 Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width.

Deep Attention Learning Theory

Scaling MLPs: A Tale of Inductive Bias

1 code implementation 23 Jun 2023 Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

We show that the performance of MLPs drastically improves with scale (95% on CIFAR10, 82% on CIFAR100, 58% on ImageNet ReaL), highlighting that lack of inductive bias can indeed be compensated.

Inductive Bias Learning Theory

Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

no code implementations 4 Jun 2023 Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore.

Common Sense Reasoning Question Answering +2

The Hessian perspective into the Nature of Convolutional Neural Networks

no code implementations 16 May 2023 Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf

While Convolutional Neural Networks (CNNs) have long been investigated and applied, as well as theorized, we aim to provide a slightly different perspective into their nature -- through the perspective of their Hessian maps.

CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes

1 code implementation 12 Apr 2023 Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore.

Question Answering Visual Question Answering

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

1 code implementation CVPR 2023 Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann

In contrast to the natural capabilities of humans to learn new tasks in a sequential fashion, neural networks are known to suffer from catastrophic forgetting, where the model's performances on old tasks drop dramatically after being optimized for a new task.

Continual Learning

Random Teachers are Good Teachers

1 code implementation 23 Feb 2023 Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.

Data Augmentation Self-Supervised Learning

Cosmology from Galaxy Redshift Surveys with PointNet

no code implementations 22 Nov 2022 Sotiris Anagnostidis, Arne Thomsen, Tomasz Kacprzak, Tilman Tröster, Luca Biggio, Alexandre Refregier, Thomas Hofmann

In this work, we aim to improve upon two-point statistics by employing a PointNet-like neural network to regress the values of the cosmological parameters directly from point cloud data.

The Curious Case of Benign Memorization

no code implementations 25 Oct 2022 Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

While such a memorization capacity seems worrisome, in this work we show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way, i.e. they learn embeddings that lead to highly non-trivial performance under nearest neighbour probing.

Data Augmentation Memorization

Mastering Spatial Graph Prediction of Road Networks

no code implementations ICCV 2023 Sotiris Anagnostidis, Aurelien Lucchi, Thomas Hofmann

Accurately predicting road networks from satellite images requires a global understanding of the network topology.

Reinforcement Learning (RL)

OpenFilter: A Framework to Democratize Research Access to Social Media AR Filters

1 code implementation 19 Jul 2022 Piera Riccio, Bill Psomas, Francesco Galati, Francisco Escolano, Thomas Hofmann, Nuria Oliver

Augmented Reality or AR filters on selfies have become very popular on social media platforms for a variety of applications, including marketing, entertainment and aesthetics.


How Tempering Fixes Data Augmentation in Bayesian Neural Networks

no code implementations 27 May 2022 Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

While data augmentation has been empirically recognized as one of the main drivers of this effect, a theoretical account of its role, on the other hand, is largely missing.

Data Augmentation

Phenomenology of Double Descent in Finite-Width Neural Networks

no code implementations ICLR 2022 Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized.

Generalization Through The Lens Of Leave-One-Out Error

1 code implementation ICLR 2022 Gregor Bachmann, Thomas Hofmann, Aurélien Lucchi

Despite the tremendous empirical success of deep learning models to solve various learning tasks, our theoretical understanding of their generalization ability is very limited.

Generalization Bounds Transfer Learning

On the effectiveness of Randomized Signatures as Reservoir for Learning Rough Dynamics

no code implementations 2 Jan 2022 Enea Monzio Compagnoni, Anna Scampicchio, Luca Biggio, Antonio Orvieto, Thomas Hofmann, Josef Teichmann

Many finance, physics, and engineering phenomena are modeled by continuous-time dynamical systems driven by highly irregular (stochastic) inputs.

LEMMA Time Series +1

How to Query Language Models?

1 code implementation 4 Aug 2021 Leonard Adolphs, Shehzaad Dhuliawala, Thomas Hofmann

We apply this approach of querying by example to the LAMA probe and obtain substantial improvements of up to 37.8% for BERT-large on the T-REx data when providing only 10 demonstrations, even outperforming a baseline that queries the model with up to 40 paraphrases of the question.

Analytic Insights into Structure and Rank of Neural Network Hessian Maps

no code implementations NeurIPS 2021 Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann

Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks.

Precise characterization of the prior predictive distribution of deep ReLU networks

no code implementations NeurIPS 2021 Lorenzo Noci, Gregor Bachmann, Kevin Roth, Sebastian Nowozin, Thomas Hofmann

Recent works on Bayesian neural networks (BNNs) have highlighted the need to better understand the implications of using Gaussian priors in combination with the compositional structure of the network architecture.

Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect

no code implementations NeurIPS 2021 Lorenzo Noci, Kevin Roth, Gregor Bachmann, Sebastian Nowozin, Thomas Hofmann

The dataset curation hypothesis of Aitchison (2020): we show empirically that the CPE does not arise in a real curated data set but can be produced in a controlled experiment with varying curation strength.

Data Augmentation

Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks

no code implementations 7 Jun 2021 Antonio Orvieto, Jonas Kohler, Dario Pavllo, Thomas Hofmann, Aurelien Lucchi

This paper revisits the so-called vanishing gradient phenomenon, which commonly occurs in deep randomly initialized neural networks.

Uniform Convergence, Adversarial Spheres and a Simple Remedy

no code implementations 7 May 2021 Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Thomas Hofmann

By considering a specific dataset, it was observed that a neural network completely misclassifies a projection of the training data (adversarial set), rendering any existing generalization bound based on uniform convergence vacuous.

Learning Generative Models of Textured 3D Meshes from Real-World Images

1 code implementation ICCV 2021 Dario Pavllo, Jonas Kohler, Thomas Hofmann, Aurelien Lucchi

Recent advances in differentiable rendering have sparked an interest in learning generative models of textured 3D meshes from image collections.

Pose Estimation

Generative Minimization Networks: Training GANs Without Competition

no code implementations 23 Mar 2021 Paulina Grnarova, Yannic Kilcher, Kfir Y. Levy, Aurelien Lucchi, Thomas Hofmann

Among known problems experienced by practitioners is the lack of convergence guarantees or convergence to a non-optimum cycle.

SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German

no code implementations 21 Mar 2021 Pelin Dogan-Schönberger, Julian Mäder, Thomas Hofmann

Swiss German is a dialect continuum whose natively acquired dialects significantly differ from the formal variety of the language.

Speech Synthesis

Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization

no code implementations 23 Feb 2021 Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy Smith

Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers.

Numerical Integration

Batch normalization provably avoids ranks collapse for randomly initialised deep networks

no code implementations NeurIPS 2020 Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.

Convolutional Generation of Textured 3D Meshes

1 code implementation NeurIPS 2020 Dario Pavllo, Graham Spinks, Thomas Hofmann, Marie-Francine Moens, Aurelien Lucchi

A key contribution of our work is the encoding of the mesh and texture as 2D representations, which are semantically aligned and can be easily modeled by a 2D convolutional GAN.

BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward

no code implementations 5 Mar 2020 Florian Schmidt, Thomas Hofmann

Measuring the quality of a generated sequence against a set of references is a central problem in many learning frameworks, be it to compute a score, to assign a reward, or to perform discrimination.

Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

no code implementations 3 Mar 2020 Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.

Controlling Style and Semantics in Weakly-Supervised Image Generation

1 code implementation ECCV 2020 Dario Pavllo, Aurelien Lucchi, Thomas Hofmann

We propose a weakly-supervised approach for conditional image generation of complex scenes where a user has fine control over objects appearing in the scene.

Conditional Image Generation

Mixing of Stochastic Accelerated Gradient Descent

no code implementations 31 Oct 2019 Peiyuan Zhang, Hadi Daneshmand, Thomas Hofmann

We study the mixing properties for stochastic accelerated gradient descent (SAGD) on least-squares regression.

Stochastic Optimization

Adversarial Training Generalizes Data-dependent Spectral Norm Regularization

no code implementations 25 Sep 2019 Kevin Roth, Yannic Kilcher, Thomas Hofmann

We establish a theoretical link between adversarial training and operator norm regularization for deep neural networks.

LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games

no code implementations 4 Sep 2019 Leonard Adolphs, Thomas Hofmann

We, however, consider the task of designing an agent that not just succeeds in a single game, but performs well across a whole family of games, sharing the same theme.

Atari Games Hierarchical Reinforcement Learning +3

Autoregressive Text Generation Beyond Feedback Loops

1 code implementation IJCNLP 2019 Florian Schmidt, Stephan Mandt, Thomas Hofmann

Autoregressive state transitions, where predictions are conditioned on past predictions, are the predominant choice for both deterministic and stochastic sequential models.

Text Generation

Cosmological N-body simulations: a challenge for scalable generative models

1 code implementation 15 Aug 2019 Nathanaël Perraudin, Ankit Srivastava, Aurelien Lucchi, Tomasz Kacprzak, Thomas Hofmann, Alexandre Réfrégier

Our results show that the proposed model produces samples of high visual quality, although the statistical analysis reveals that capturing rare features in the data poses significant problems for the generative models.

Cosmological constraints with deep learning from KiDS-450 weak lensing maps

no code implementations 7 Jun 2019 Janis Fluri, Tomasz Kacprzak, Aurelien Lucchi, Alexandre Refregier, Adam Amara, Thomas Hofmann, Aurel Schneider

We present the cosmological results with a CNN from the KiDS-450 tomographic weak lensing dataset, constraining the total matter density $\Omega_m$, the fluctuation amplitude $\sigma_8$, and the intrinsic alignment amplitude $A_{\rm{IA}}$.

Cosmology and Nongalactic Astrophysics

Adversarial Training is a Form of Data-dependent Operator Norm Regularization

no code implementations NeurIPS 2020 Kevin Roth, Yannic Kilcher, Thomas Hofmann

We establish a theoretical link between adversarial training and operator norm regularization for deep neural networks.

Evaluating GANs via Duality

no code implementations ICLR 2019 Paulina Grnarova, Kfir Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Thomas Hofmann, Andreas Krause

Generative Adversarial Networks (GANs) have shown great results in accurately modeling complex distributions, but their training is known to be difficult due to instabilities caused by a challenging minimax optimization problem.

The Odds are Odd: A Statistical Test for Detecting Adversarial Examples

1 code implementation 13 Feb 2019 Kevin Roth, Yannic Kilcher, Thomas Hofmann

We investigate conditions under which test statistics exist that can reliably detect examples, which have been adversarially manipulated in a white-box attack.

A domain agnostic measure for monitoring and evaluating GANs

1 code implementation NeurIPS 2019 Paulina Grnarova, Kfir Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Ian Goodfellow, Thomas Hofmann, Andreas Krause

Evaluations are essential for: (i) relative assessment of different models and (ii) monitoring the progress of a single model throughout training.

Learning and Evaluating Sparse Interpretable Sentence Embeddings

no code implementations WS 2018 Valentin Trifonov, Octavian-Eugen Ganea, Anna Potapenko, Thomas Hofmann

Previous research on word embeddings has shown that sparse representations, which can be either learned on top of existing dense embeddings or obtained through model constraints during training time, have the benefit of increased interpretability properties: to some degree, each dimension can be understood by a human and associated with a recognizable feature in the data.

Sentence Embedding Sentence-Embedding +1

Cosmological constraints from noisy convergence maps through deep learning

no code implementations 23 Jul 2018 Janis Fluri, Tomasz Kacprzak, Aurelien Lucchi, Alexandre Refregier, Adam Amara, Thomas Hofmann

We find that, for a shape noise level corresponding to 8.53 galaxies/arcmin$^2$ and the smoothing scale of $\sigma_s = 2.34$ arcmin, the network is able to generate 45% tighter constraints.

Cosmology and Nongalactic Astrophysics

A Distributed Second-Order Algorithm You Can Trust

no code implementations ICML 2018 Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi

Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years.

Distributed Optimization Second-order methods

Deep State Space Models for Unconditional Word Generation

no code implementations NeurIPS 2018 Florian Schmidt, Thomas Hofmann

Autoregressive feedback is considered a necessity for successful unconditional text generation using stochastic sequence models.

Text Generation Variational Inference

Zero-Shot Dual Machine Translation

1 code implementation 25 May 2018 Lierni Sestorain, Massimiliano Ciaramita, Christian Buck, Thomas Hofmann

Our method can obtain improvements also on the setting where a small amount of parallel data for the zero-shot language pair is available.

Machine Translation NMT +1

Hyperbolic Neural Networks

3 code implementations NeurIPS 2018 Octavian-Eugen Ganea, Gary Bécigneul, Thomas Hofmann

However, the representational power of hyperbolic geometry is not yet on par with Euclidean geometry, mostly because of the absence of corresponding hyperbolic neural network layers.

Graph Representation Learning Natural Language Inference +1

Adversarially Robust Training through Structured Gradient Regularization

no code implementations 22 May 2018 Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations.

Local Saddle Point Optimization: A Curvature Exploitation Approach

1 code implementation 15 May 2018 Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Gradient-based optimization methods are the most popular choice for finding local optima for classical minimization and saddle point problems.

Hyperbolic Entailment Cones for Learning Hierarchical Embeddings

3 code implementations ICML 2018 Octavian-Eugen Ganea, Gary Bécigneul, Thomas Hofmann

Learning graph representations via low-dimensional embeddings that preserve relevant network properties is an important class of problems in machine learning.

Graph Embedding Hypernym Discovery +2

Escaping Saddles with Stochastic Gradients

no code implementations ICML 2018 Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions.

Fast cosmic web simulations with generative adversarial networks

no code implementations 27 Jan 2018 Andres C. Rodriguez, Tomasz Kacprzak, Aurelien Lucchi, Adam Amara, Raphael Sgier, Janis Fluri, Thomas Hofmann, Alexandre Réfrégier

Computational models of the underlying physical processes, such as classical N-body simulations, are extremely resource intensive, as they track the action of gravity in an expanding universe using billions of particles as tracers of the cosmic matter distribution.

The best defense is a good offense: Countering black box attacks by predicting slightly wrong labels

no code implementations 15 Nov 2017 Yannic Kilcher, Thomas Hofmann

Black-Box attacks on machine learning models occur when an attacker, despite having no access to the inner workings of a model, can successfully craft an attack by means of model theft.

Parametrizing filters of a CNN with a GAN

no code implementations ICLR 2018 Yannic Kilcher, Gary Becigneul, Thomas Hofmann

It is commonly agreed that the use of relevant invariances as a good statistical bias is important in machine-learning.

Semantic Interpolation in Implicit Models

no code implementations ICLR 2018 Yannic Kilcher, Aurelien Lucchi, Thomas Hofmann

In implicit models, one often interpolates between sampled points in latent space.

Flexible Prior Distributions for Deep Generative Models

no code implementations ICLR 2018 Yannic Kilcher, Aurelien Lucchi, Thomas Hofmann

We consider the problem of training generative models with deep neural networks as generators, i.e. to map latent codes to data points.

Generator Reversal

no code implementations 28 Jul 2017 Yannic Kilcher, Aurélien Lucchi, Thomas Hofmann

We consider the problem of training generative models with deep neural networks as generators, i.e. to map latent codes to data points.

Learning Aerial Image Segmentation from Online Maps

2 code implementations 21 Jul 2017 Pascal Kaiser, Jan Dirk Wegner, Aurelien Lucchi, Martin Jaggi, Thomas Hofmann, Konrad Schindler

We adapt a state-of-the-art CNN architecture for semantic segmentation of buildings and roads in aerial images, and compare its performance when using different training data sets, ranging from manually labeled, pixel-accurate ground truth of the same city to automatic training data derived from OpenStreetMap data from distant locations.

General Classification Image Segmentation +1

Cosmological model discrimination with Deep Learning

no code implementations 17 Jul 2017 Jorit Schmelzle, Aurelien Lucchi, Tomasz Kacprzak, Adam Amara, Raphael Sgier, Alexandre Réfrégier, Thomas Hofmann

We find that our implementation of DCNN outperforms the skewness and kurtosis statistics, especially for high noise levels.

Accelerated Dual Learning by Homotopic Initialization

no code implementations 13 Jun 2017 Hadi Daneshmand, Hamed Hassani, Thomas Hofmann

Gradient descent and coordinate descent are well understood in terms of their asymptotic behavior, but less so in a transient regime often used for approximations in machine learning.

Stabilizing Training of Generative Adversarial Networks through Regularization

1 code implementation NeurIPS 2017 Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

Deep generative models based on Generative Adversarial Networks (GANs) have demonstrated impressive sample quality but in order to work they require a careful choice of architecture, parameter initialization, and selection of hyper-parameters.

Image Generation

Deep Joint Entity Disambiguation with Local Neural Attention

3 code implementations EMNLP 2017 Octavian-Eugen Ganea, Thomas Hofmann

We propose a novel deep learning model for joint document-level entity disambiguation, which leverages learned neural representations.

Entity Disambiguation

A Semi-supervised Framework for Image Captioning

1 code implementation 16 Nov 2016 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

We here propose a novel way of using such textual data by artificially generating missing visual information.

Image Captioning Word Embeddings

Fully Character-Level Neural Machine Translation without Explicit Segmentation

2 code implementations TACL 2017 Jason Lee, Kyunghyun Cho, Thomas Hofmann

We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses the models specifically trained on that language pair alone, both in terms of BLEU score and human judgment.

Machine Translation NMT +1

DynaNewton - Accelerating Newton's Method for Machine Learning

no code implementations 20 May 2016 Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Solutions on this path are tracked such that the minimizer of the previous objective is guaranteed to be within the quadratic convergence region of the next objective to be optimized.

BIG-bench Machine Learning

Starting Small -- Learning with Adaptive Sample Sizes

no code implementations 9 Mar 2016 Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set.

BIG-bench Machine Learning

Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

1 code implementation 8 Sep 2015 Octavian-Eugen Ganea, Marina Ganea, Aurelien Lucchi, Carsten Eickhoff, Thomas Hofmann

We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods.

Entity Disambiguation Entity Linking +3

Variance Reduced Stochastic Gradient Descent with Neighbors

no code implementations NeurIPS 2015 Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

As a side-product we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms.


A Variance Reduced Stochastic Newton Method

no code implementations 28 Mar 2015 Aurelien Lucchi, Brian McWilliams, Thomas Hofmann

Quasi-Newton methods are widely used in practice for convex loss minimization problems.

Probabilistic Latent Semantic Analysis

3 code implementations 23 Jan 2013 Thomas Hofmann

Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas.

Information Retrieval Retrieval
