A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior.
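In common notation this definition can be written out explicitly (the symbols below are illustrative, not taken verbatim from the paper):

```latex
\mathrm{EU}(x) \;=\; \int_{\Theta} D\bigl(p(y \mid x, \theta),\, \bar{p}(y \mid x)\bigr)\, p(\theta \mid \mathcal{D})\, \mathrm{d}\theta,
```

where $D$ is a divergence (e.g. the KL divergence), $p(\theta \mid \mathcal{D})$ is the posterior over parameters, and $\bar{p}(y \mid x)$ denotes the posterior predictive distribution.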
However, fine-tuning a pre-trained model often suffers from catastrophic forgetting, that is, the performance on the pre-training tasks deteriorates when fine-tuning on new tasks.
Then we feed these tokens to a pretrained language model that serves the agent as memory and provides it with a coherent and interpretable representation of the past.
1 code implementation • 2 May 2023 • Marius-Constantin Dinu, Markus Holzleitner, Maximilian Beck, Hoan Duc Nguyen, Andrea Huber, Hamid Eghbal-zadeh, Bernhard A. Moser, Sergei Pereverzyev, Sepp Hochreiter, Werner Zellinger
Our method outperforms deep embedded validation (DEV) and importance weighted validation (IWV) on all datasets, setting a new state-of-the-art performance for solving parameter choice issues in unsupervised domain adaptation with theoretical error guarantees.
Our novel concept for molecule representation enrichment is to associate molecules from both the support set and the query set with a large set of reference (context) molecules through a Modern Hopfield Network.
In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data.
Ranked #1 on Image Clustering on ImageNet
To quantify uncertainty, conformal prediction methods are attracting growing interest and have already been applied successfully to various domains.
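As a minimal illustration of the split-conformal recipe (the function name and the synthetic Gaussian residuals are illustrative, not from the paper): hold out a calibration set, score it with nonconformity values such as absolute residuals, and take a finite-sample-corrected quantile as the prediction-interval half-width.

```python
import numpy as np

def split_conformal_radius(cal_residuals, alpha=0.1):
    """Split conformal prediction: turn calibration residuals |y - f(x)|
    into an interval half-width with finite-sample 1 - alpha coverage."""
    r = np.sort(np.asarray(cal_residuals))
    n = len(r)
    # Conformal quantile rank with the (n + 1) finite-sample correction.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    return r[min(k, n) - 1]

# Illustrative usage with synthetic residuals from some point predictor f.
rng = np.random.default_rng(0)
residuals = np.abs(rng.normal(0.0, 1.0, size=500))
q = split_conformal_radius(residuals, alpha=0.1)
# Prediction set for a new input x: [f(x) - q, f(x) + q]
```

The same recipe works for any underlying model, which is what makes conformal prediction attractive as a wrapper method.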
1 code implementation • 14 Mar 2023 • Moritz Neun, Christian Eichenberger, Henry Martin, Markus Spanring, Rahul Siripurapu, Daniel Springer, Leyan Deng, Chenwang Wu, Defu Lian, Min Zhou, Martin Lumiste, Andrei Ilie, Xinhua Wu, Cheng Lyu, Qing-Long Lu, Vishal Mahajan, Yichao Lu, Jiezhang Li, Junjun Li, Yue-Jiao Gong, Florian Grötschla, Joël Mathys, Ye Wei, He Haitao, Hui Fang, Kevin Malm, Fei Tang, Michael Kopp, David Kreil, Sepp Hochreiter
We only provide vehicle count data from spatially sparse stationary vehicle detectors in these three cities as model input for this task.
Activity and property prediction models are the central workhorses in drug discovery and materials sciences, but currently they have to be trained or fine-tuned for new tasks.
Graph neural networks (GNNs) have evolved into one of the most popular deep learning architectures.
To better evaluate the realism and semantic consistency of the generated images, we further conduct zero-shot classification on real remote sensing data using the classification model trained on synthesized images.
Therefore, exploration strategies and learning methods are required that can track these steady domain shifts and adapt to them.
1 code implementation • 7 Jun 2022 • Martin Gauch, Maximilian Beck, Thomas Adler, Dmytro Kotsur, Stefan Fiel, Hamid Eghbal-zadeh, Johannes Brandstetter, Johannes Kofler, Markus Holzleitner, Werner Zellinger, Daniel Klotz, Sepp Hochreiter, Sebastian Lehner
We introduce SubGD, a novel few-shot learning method which is based on the recent finding that stochastic gradient descent updates tend to live in a low-dimensional parameter subspace.
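A minimal sketch of that idea (the subspace construction and names are illustrative, not the authors' code): collect update vectors from earlier training runs, take their top singular directions, and restrict later gradient steps to that low-dimensional span.

```python
import numpy as np

def subspace_from_updates(updates, k):
    """Build a low-dimensional update subspace from observed SGD update
    vectors (one flattened parameter update per row). Returns an
    orthonormal basis of the top-k directions via SVD."""
    _, _, Vt = np.linalg.svd(np.asarray(updates), full_matrices=False)
    return Vt[:k]  # shape (k, n_params), orthonormal rows

def project_gradient(grad, basis):
    """Restrict a gradient to the learned subspace before applying it."""
    return basis.T @ (basis @ grad)

rng = np.random.default_rng(0)
true_dirs = rng.normal(size=(2, 10))                # updates live in a 2-D subspace
updates = rng.normal(size=(50, 2)) @ true_dirs      # observed update vectors
basis = subspace_from_updates(updates, k=2)
g = rng.normal(size=10)
g_sub = project_gradient(g, basis)                  # component inside the subspace
```

Because the projection is idempotent, repeated fine-tuning steps stay inside the identified subspace, which is the effect the few-shot method exploits.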
Residual mappings have been shown to perform representation learning in the first layers and iterative feature refinement in higher layers.
In experiments on small tabular datasets with fewer than 1,000 samples, Hopular surpasses Gradient Boosting, Random Forests, SVMs, and in particular several Deep Learning methods.
Ranked #1 on General Classification on Shrutime
We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency.
1 code implementation • 31 Mar 2022 • Christian Eichenberger, Moritz Neun, Henry Martin, Pedro Herruzo, Markus Spanring, Yichao Lu, Sungbin Choi, Vsevolod Konyakhin, Nina Lukashina, Aleksei Shpilman, Nina Wiedemann, Martin Raubal, Bo wang, Hai L. Vu, Reza Mohajerpoor, Chen Cai, Inhi Kim, Luca Hermes, Andrew Melnik, Riza Velioglu, Markus Vieth, Malte Schilling, Alabi Bojesomo, Hasan Al Marzouqi, Panos Liatsis, Jay Santokhi, Dylan Hillier, Yiming Yang, Joned Sarwar, Anna Jordan, Emil Hewage, David Jonietz, Fei Tang, Aleksandra Gruca, Michael Kopp, David Kreil, Sepp Hochreiter
The IARAI Traffic4cast competitions at NeurIPS 2019 and 2020 showed that neural networks can successfully predict traffic conditions 1 hour into the future from GPS probe data simply aggregated into time and space bins.
Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials.
Ranked #11 on Single-step retrosynthesis on USPTO-50k
The dataset characteristics are determined by the behavioral policy that samples this dataset.
1 code implementation • 21 Oct 2021 • Andreas Fürst, Elisabeth Rumetshofer, Johannes Lehner, Viet Tran, Fei Tang, Hubert Ramsauer, David Kreil, Michael Kopp, Günter Klambauer, Angela Bitto-Nemling, Sepp Hochreiter
We suggest using modern Hopfield networks to tackle the problem of explaining away.
However, it is notoriously difficult to integrate them into machine learning approaches due to their heterogeneity with respect to size and orientation.
Recently, the application of machine learning models has gained momentum in natural sciences and engineering, which is a natural fit due to the abundance of data in these fields.
Finding synthesis routes for molecules of interest is an essential step in the discovery of new drugs and materials.
Artificial Intelligence is one of the fastest growing technologies of the 21st century and accompanies us in our daily lives when interacting with technical applications.
MC-LSTMs set a new state of the art for neural arithmetic units at learning arithmetic operations, such as addition, which obey a strong conservation law: the sum is constant over time.
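The conservation mechanism can be sketched in a few lines (a simplified illustration under assumed gating, not the authors' implementation): incoming mass is split across cells by normalized weights, and stored mass is moved by a column-stochastic redistribution matrix, so total mass is conserved exactly.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def conserving_step(c, x, w_in, R_logits):
    """Mass-conserving cell update (sketch): the scalar input mass x is
    distributed over cells by softmax weights, and stored mass c is moved
    by a column-stochastic matrix R, so sum(c_new) == sum(c) + x exactly
    (output gating omitted for brevity)."""
    R = np.apply_along_axis(softmax, 0, R_logits)  # columns sum to 1
    return R @ c + softmax(w_in) * x

rng = np.random.default_rng(0)
c = rng.uniform(size=4)                             # current cell storages
c_new = conserving_step(c, 2.5, rng.normal(size=4), rng.normal(size=(4, 4)))
# Total stored mass grew by exactly the input mass 2.5.
```

The conservation property holds by construction, independent of the learned logits, which is why such cells are well suited to quantities like mass in addition tasks.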
Deep Learning is becoming an increasingly important way to produce accurate hydrological predictions across a wide range of spatial and temporal scales.
We prove under commonly used assumptions the convergence of actor-critic reinforcement learning algorithms, which simultaneously learn a policy function, the actor, and a value function, the critic.
Compared to naive prediction with a distinct LSTM per timescale, the multi-timescale architectures are computationally more efficient with no loss in accuracy.
On the few-shot datasets miniImagenet and tieredImagenet with small domain shifts, CHEF is competitive with state-of-the-art methods.
For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks.
1 code implementation • Michael Widrich, Bernhard Schäfl, Hubert Ramsauer, Milena Pavlović, Lukas Gruber, Markus Holzleitner, Johannes Brandstetter, Geir Kjetil Sandve, Victor Greiff, Sepp Hochreiter, Günter Klambauer
We show that the attention mechanism of transformer architectures is actually the update rule of modern Hopfield networks that can store exponentially many patterns.
2 code implementations • Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Thomas Adler, Lukas Gruber, Markus Holzleitner, Milena Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter
The new update rule is equivalent to the attention mechanism used in transformers.
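That equivalence is easy to see in code: one retrieval step is a softmax-weighted sum of the stored patterns, i.e. attention with query ξ and the patterns as keys and values. A minimal NumPy sketch (pattern count, dimension, and the inverse temperature β are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hopfield_update(xi, X, beta=1.0):
    """One modern-Hopfield retrieval step: move the state xi toward a
    convex combination of the stored patterns X (rows), weighted by
    softmax(beta * X @ xi) -- formally attention with query xi,
    keys X, and values X."""
    return X.T @ softmax(beta * (X @ xi))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))            # 5 stored patterns
xi = X[2] + 0.1 * rng.normal(size=16)   # noisy query near pattern 2
for _ in range(3):                      # retrieval converges in a few steps
    xi = hopfield_update(xi, X, beta=8.0)
```

For large β the softmax becomes nearly one-hot and the noisy query snaps to the closest stored pattern, which is the exponential-storage retrieval behavior the paper analyzes.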
1 code implementation • 25 Mar 2020 • Markus Hofmarcher, Andreas Mayr, Elisabeth Rumetshofer, Peter Ruch, Philipp Renz, Johannes Schimunek, Philipp Seidl, Andreu Vall, Michael Widrich, Sepp Hochreiter, Günter Klambauer
Due to the current severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, there is an urgent need for novel therapies and drugs.
The aim of this study is to evaluate whether it is possible to detect basal cell carcinomas in histological sections using attention-based deep learning models and to overcome the ultra-high resolution and the weak labels of whole slide images.
Climate change affects occurrences of floods and droughts worldwide.
In this work, we show that machine learning models can provide significant improvement over random search.
We introduce Patch Refinement, a two-stage model for accurate 3D object detection and localization from point cloud data.
Ranked #1 on Object Detection on KITTI Cars Hard
While neural networks have acted as a strong unifying force in the design of modern AI systems, the neural network architectures themselves remain highly heterogeneous due to the variety of tasks to be solved.
We propose a GAN-based approach to solve inverse problems with non-differentiable or non-continuous forward relations.
Currently, the problem is that traditional hydrological models degrade significantly in performance when calibrated for multiple basins together instead of for a single basin alone.
We present the largest comparison of CNN architectures including GapNet-PL for protein localization in HTI images of human cells.
Surprisingly, we could predict 29% of the 209 pharmacological assays at high predictive performance (AUC > 0.9).
LSTMs are particularly well-suited for this problem since memory cells can represent dynamic reservoirs and storages, which are essential components in state-space modelling approaches of the hydrological system.
Without any means of interpretation, neural networks that predict molecular properties and bioactivities are merely black boxes.
In MDPs the Q-values are equal to the expected immediate reward plus the expected future rewards.
Ranked #9 on Atari Games on Atari 2600 Bowling
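The Bellman relation behind those Q-values can be sketched with a tabular TD update (a toy single-state MDP, not the paper's Atari setup):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular TD update moving Q(s, a) toward the Bellman target
    r + gamma * max_a' Q(s_next, a')."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy MDP: a single state with a self-loop paying reward 1 each step.
# The Bellman fixed point is Q* = 1 / (1 - gamma) = 10 for gamma = 0.9.
Q = np.zeros((1, 1))
for _ in range(2000):
    Q = q_learning_update(Q, 0, 0, 1.0, 0, alpha=0.5, gamma=0.9)
```

With delayed rewards the expected-future-reward term dominates, which is exactly the situation reward redistribution methods such as RUDDER are designed to simplify.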
We propose a novel distance measure between two sets of molecules, called Fréchet ChemNet distance (FCD), that can be used as an evaluation metric for generative models.
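The metric follows the Fréchet distance between two Gaussians fitted to hidden activations (here of ChemNet, for real vs. generated molecules). A minimal sketch, with the activation statistics as placeholders:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2)).
    FCD applies this to means/covariances of ChemNet activations."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2).real  # drop tiny imaginary parts
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

The same formula underlies the FID score for images; only the feature extractor differs.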
To formally describe an optimal update direction, we introduce a theoretical framework which allows the derivation of requirements on both the divergence and corresponding method for determining an update direction, with these requirements guaranteeing unbiased mini-batch updates in the direction of steepest descent.
Ranked #2 on Image Generation on LSUN Bedroom 64 x 64
While drug combination therapies are a well-established concept in cancer treatment, identifying novel synergistic combinations is challenging due to the size of combinatorial space.
We prove that Coulomb GANs possess only one Nash equilibrium which is optimal in the sense that the model distribution equals the target distribution.
Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible.
Ranked #1 on Image Generation on LSUN Bedroom 64 x 64
We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations.
Ranked #7 on Drug Discovery on Tox21
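The activation behind SNNs is SELU; with its two fixed constants, a standard-normal input is mapped to an output with approximately zero mean and unit variance, so activations self-normalize without batch normalization. A minimal sketch:

```python
import numpy as np

# Fixed SELU constants from the self-normalizing-networks derivation.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """SELU: scaled identity for x > 0, scaled exponential for x <= 0."""
    x = np.asarray(x, dtype=float)
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

rng = np.random.default_rng(0)
y = selu(rng.normal(size=200_000))  # mean ~ 0, variance ~ 1
```

The constants are chosen so that mean and variance are a stable fixed point of the layer map, which is what keeps deep fully-connected networks trainable without explicit normalization layers.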
1 code implementation • 1 Jun 2016 • Michael Treml, Jose A. Arjona-Medina, Thomas Unterthiner, Rupesh Durgesh, Felix Friedmann, Peter Schuberth, Andreas Mayr, Martin Heusel, Markus Hofmarcher, Michael Widrich, Bernhard Nessler, Sepp Hochreiter
We propose a novel deep network architecture for image segmentation that keeps the high accuracy while being efficient enough for embedded devices.
In contrast to ReLUs, ELUs have negative values which allows them to push mean unit activations closer to zero like batch normalization but with lower computational complexity.
Ranked #142 on Image Classification on CIFAR-100 (using extra training data)
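The ELU itself is a one-liner; a minimal sketch showing the negative saturation described above:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) for x <= 0.
    Unlike ReLU, the negative branch saturates at -alpha, which pulls
    mean activations toward zero without batch-norm statistics."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

elu(2.0)     # 2.0 (identity on the positive side)
elu(-50.0)   # close to -1.0 (saturated at -alpha)
```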