Channel decoding, channel detection, channel assessment, and resource management for wireless multiple-input multiple-output (MIMO) systems are all examples of problems where machine learning (ML) can be successfully applied.
To the best of our knowledge, this is the first work in the literature to describe the construction of precoding matrices and their distribution for the spectral efficiency (SE) objective function using VAE and CVAE methods.
Structured latent variables allow incorporating meaningful prior knowledge into deep learning models.
Bias correction techniques are used by most of the high-performing methods for off-policy reinforcement learning.
While quantization is well established for discriminative models, the performance of modern quantization techniques in application to GANs remains unclear.
Training neural networks with batch normalization and weight decay has become a common practice in recent years.
Averaging predictions over a set of models -- an ensemble -- is widely used to improve predictive performance and uncertainty estimation of deep learning models.
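A minimal sketch of this averaging step, assuming each ensemble member exposes a hypothetical predict_proba(x) method returning class probabilities (an assumed interface, not a specific library API):

```python
import numpy as np

def ensemble_predict(models, x):
    """Average class-probability predictions over an ensemble of models."""
    probs = np.stack([m.predict_proba(x) for m in models])  # (n_models, n_classes)
    return probs.mean(axis=0)  # the ensemble's averaged predictive distribution
```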
Based on this exploration, we present a new algorithm, Credit-Constrained Advantage Actor-Critic (C2A2C), which skips policy updates for actions that, judged by credit assigned in hindsight, do not affect future outcomes, while updating the policy as normal for those that do.
In this work, we consider a fixed memory budget setting and investigate which is more effective: to train a single wide network, or to perform a memory split -- to train an ensemble of several thinner networks with the same total number of parameters.
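A toy parameter count makes the trade-off concrete; the two-hidden-layer MLP and the widths below are illustrative, not taken from the paper:

```python
def mlp_params(d_in, width, d_out):
    """Parameters of a 2-hidden-layer MLP (weights + biases per layer)."""
    return (d_in * width + width) + (width * width + width) + (width * d_out + d_out)

wide = mlp_params(784, 512, 10)  # one wide network: 669,706 parameters
thin = mlp_params(784, 256, 10)  # one thinner network: 269,322 parameters
# Two 256-wide nets fit in less memory than one 512-wide net, because the
# hidden-to-hidden term grows quadratically with width.
print(wide, 2 * thin)            # 669706 538644
```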
The overestimation bias is one of the major impediments to accurate off-policy learning.
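A toy numpy experiment (not from the paper) shows where the bias comes from: taking a max over noisy value estimates systematically overestimates the true maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(10)                                   # every action is equally good
noisy_q = true_q + rng.normal(0, 1, size=(10_000, 10))  # independent noisy estimates
print(noisy_q.max(axis=1).mean())  # ~1.54, far above the true maximum of 0
```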
Stochastic regularization of neural networks (e.g., dropout) is a widespread technique in deep learning that allows for better generalization.
Test-time data augmentation -- averaging the predictions of a machine learning model across multiple augmented samples of data -- is a widely used technique that improves predictive performance.
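A minimal sketch of the procedure, assuming a hypothetical model with a predict_proba method and a list of augmentation functions (both placeholders, not a specific library API):

```python
import numpy as np

def tta_predict(model, x, augmentations):
    """Average the model's predictions over several augmented copies of x."""
    # The identity transform is usually included so the clean input contributes too.
    probs = np.stack([model.predict_proba(aug(x)) for aug in augmentations])
    return probs.mean(axis=0)
```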
This approach is an order of magnitude faster than state-of-the-art methods for spectral visualization, and can be generically used to investigate the spectral properties of matrices in deep learning.
Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradient estimates.
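A toy score-function (REINFORCE) estimator illustrates the variance problem; the Bernoulli latent and objective below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3      # parameter of the Bernoulli latent variable
f = lambda z: z  # toy objective; the true gradient d/dtheta E[f(z)] is 1

z = rng.binomial(1, theta, size=100_000).astype(float)
score = z / theta - (1 - z) / (1 - theta)  # d/dtheta log Bern(z; theta)
grads = f(z) * score                       # per-sample REINFORCE estimates
print(grads.mean(), grads.std())           # mean ~1 (unbiased), std ~1.5 (high variance)
```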
Recently, many techniques have been developed to sparsify the weights of neural networks and to remove structural units, e.g., neurons.
Previous works show that a richer family of prior distributions may help to avoid the mode collapse problem in GANs and to improve the evidence lower bound in VAEs.
Reducing the number of parameters is one of the most important goals in deep learning.
Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well calibrated uncertainty.
For any implicit probabilistic model and a target distribution represented by a set of samples, implicit Metropolis-Hastings operates by learning a discriminator to estimate the density ratio and then generating a chain of samples.
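A self-contained toy version of that accept step with an independent proposal: here the "model" is N(0, 2), the target is N(0, 1), and d(x) is an oracle stand-in for the density ratio that the discriminator would estimate.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d = lambda x: norm.pdf(x, 0, 1) / norm.pdf(x, 0, 2)  # density ratio p_target / p_model

x = rng.normal(0, 2)  # initialize the chain from the model
chain = [x]
for _ in range(10_000):
    y = rng.normal(0, 2)                       # independent proposal drawn from the model
    if rng.uniform() < min(1.0, d(y) / d(x)):  # ratio test via the density ratio
        x = y                                  # accept the proposal
    chain.append(x)
print(np.std(chain))  # ~1.0, i.e., the chain matches the target N(0, 1)
```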
Variational Inference is a powerful tool in the Bayesian modeling toolkit, however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution.
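The textbook identity behind this claim: the gap between the log-evidence and the ELBO is exactly the KL divergence from the variational distribution q to the true posterior, so a more expressive family can shrink the gap.

```latex
\log p(x) = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z)}{q(z)}\right]}_{\text{ELBO}}
          + \mathrm{KL}\!\left(q(z) \,\Vert\, p(z \mid x)\right)
```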
This paper proposes a semi-conditional normalizing flow model for semi-supervised learning.
We propose a novel multi-texture synthesis model based on generative adversarial networks (GANs) with a user-controllable mechanism.
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning.
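A minimal sketch of the diagonal variant of this idea as commonly described: fit a Gaussian to weight snapshots collected along the SGD trajectory, then sample weights from it at test time; the snapshot collection itself is omitted here.

```python
import numpy as np

def swag_diagonal(weight_snapshots):
    """Fit a diagonal Gaussian to a list of flattened weight vectors."""
    w = np.stack(weight_snapshots)             # (n_snapshots, n_params)
    mean = w.mean(axis=0)                      # the SWA mean
    var = np.clip(w.var(axis=0), 1e-12, None)  # diagonal covariance estimate
    return mean, var

def sample_weights(mean, var, rng=np.random.default_rng(0)):
    """Draw one set of network weights from the fitted Gaussian."""
    return mean + np.sqrt(var) * rng.normal(size=mean.shape)
```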
Bayesian methods have been successfully applied to sparsify weights of neural networks and to remove structural units from the networks, e.g., neurons.
Neural networks are powerful machine learning tools that show outstanding performance in computer vision, natural language processing, and artificial intelligence.
In natural language processing, many tasks are successfully solved with recurrent neural networks, but such models have a huge number of parameters.
From this point of view, the problem of constructing a sampler can be reduced to the question of how to choose a proposal for the Metropolis-Hastings (MH) algorithm.
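For reference, the standard MH acceptance probability, which makes the role of the proposal q explicit:

```latex
\alpha(x, y) = \min\!\left(1, \frac{p(y)\, q(x \mid y)}{p(x)\, q(y \mid x)}\right)
```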
Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via carefully choosing a prior distribution.
We experimentally demonstrate that our model generates samples and reconstructions of quality competitive with the state of the art on the MNIST, CIFAR10, and CelebA datasets, and achieves good quantitative results on CIFAR10.
Unlike discriminator-based and kernel-based approaches to implicit variational inference, DSIVI optimizes a proper lower bound on the ELBO that is asymptotically exact.
We explore the recently introduced definition modeling technique, which provides a tool for evaluating different distributed vector representations of words through modeling their dictionary definitions.
We propose a single neural probabilistic model based on variational autoencoder that can be conditioned on an arbitrary subset of observed features and then sample the remaining features in "one shot".
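A sketch of the interface this describes, with a binary mask marking the observed features; the model object and its decode method are hypothetical placeholders, not the paper's architecture.

```python
import numpy as np

def impute(model, x, observed_mask, rng=np.random.default_rng(0)):
    """Condition on observed features and sample the rest in a single pass."""
    x_in = np.where(observed_mask, x, 0.0)        # hide the unobserved features
    z = rng.normal(size=model.latent_dim)         # one latent sample -> "one shot"
    x_hat = model.decode(z, x_in, observed_mask)  # generate all features at once
    return np.where(observed_mask, x, x_hat)      # keep observed values unchanged
```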
Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence.
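A minimal PyTorch sketch of that standard recipe; the model, synthetic data, and schedule values are illustrative only:

```python
import torch

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)  # x0.1 every 30 epochs

for epoch in range(90):
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))  # synthetic batch
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()  # decay the learning rate over the course of training
```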
Ordinary stochastic neural networks mostly rely on the expected values of their weights to make predictions, whereas the induced noise is mainly used to capture uncertainty, prevent overfitting, and slightly boost performance through test-time averaging.
The loss functions of deep neural networks are complex and their geometric properties are not well understood.
Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights.
In this paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g., removes neurons and/or convolutional channels in CNNs.
This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image.
Convolutional neural networks excel in image recognition tasks, but this comes at the cost of high computational and memory complexity.
We describe GTApprox, a new tool for medium-scale surrogate modeling in industrial design.
We propose a novel approach to reduce the computational cost of evaluation of convolutional neural networks, a factor that has hindered their deployment in low-power devices such as mobile phones.
The recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words.
In this paper we address the problem of finding the most probable state of a discrete Markov random field (MRF), also known as the MRF energy minimization problem.
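For reference, the standard pairwise MRF energy whose minimizer is the most probable (MAP) state; the notation is the conventional one rather than the paper's:

```latex
E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \theta_i(x_i)
              + \sum_{(i,j) \in \mathcal{E}} \theta_{ij}(x_i, x_j),
\qquad
\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x})
```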
Structured-output learning is a challenging problem; particularly so because of the difficulty in obtaining large datasets of fully labelled instances for training.