Inspired by recent research that recommends starting neural network training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail.
In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed effective learning rate (ELR).
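To make this setup concrete, below is a minimal sketch of one way spherical training can be realized: take a gradient step, keep only its tangential component, and project the weights back onto the unit sphere, with the fixed ELR acting as the step size on the sphere. The function name spherical_sgd_step and the toy objective are illustrative assumptions, not the procedure from this work.

```python
import numpy as np

def spherical_sgd_step(w, grad, elr):
    """One projected-SGD step for a scale-invariant parameter vector.

    w    : current weights, assumed to lie on the unit sphere (||w|| = 1)
    grad : gradient of the loss at w
    elr  : fixed effective learning rate, i.e. the step size taken on the sphere
    """
    # Keep only the tangential component of the gradient; for a truly
    # scale-invariant loss the radial part is zero anyway.
    tangent = grad - np.dot(grad, w) * w
    w_new = w - elr * tangent
    # Re-normalize so the next iterate lies on the unit sphere again.
    return w_new / np.linalg.norm(w_new)

# Toy usage: align w with a fixed unit direction, a scale-invariant objective.
rng = np.random.default_rng(0)
target = np.ones(8) / np.sqrt(8)
w = rng.normal(size=8)
w /= np.linalg.norm(w)
for _ in range(200):
    grad = -target  # surrogate gradient of -<w, target>; the radial part is projected out above
    w = spherical_sgd_step(w, grad, elr=0.1)
print(round(float(np.dot(w, target)), 3))  # approaches 1.0 as w aligns with the target direction
```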
Channel decoding, channel detection, channel assessment, and resource management for wireless multiple-input multiple-output (MIMO) systems are all examples of problems where machine learning (ML) can be successfully applied.
Memorization studies of deep neural networks (DNNs) help to understand what patterns DNNs learn and how they learn them, and motivate improvements to DNN training approaches.
Training neural networks with batch normalization and weight decay has become a common practice in recent years.
In this work, we consider a fixed memory budget setting and investigate which is more effective: training a single wide network, or performing a memory split, i.e., training an ensemble of several thinner networks with the same total number of parameters.
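As a rough illustration of what "the same total number of parameters" means here, the sketch below matches the parameter budget of one wide fully connected network against ensembles of thinner ones. The architecture, the helper mlp_param_count, and the concrete widths are assumptions made for the example, not the networks studied in this work.

```python
def mlp_param_count(width, depth=3, d_in=784, d_out=10):
    """Rough parameter count of a fully connected net with `depth` hidden layers."""
    sizes = [d_in] + [width] * depth + [d_out]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

budget = mlp_param_count(width=512)        # memory budget set by one wide network
print("single wide net:", budget)

for k in (2, 4, 8):                        # memory split into k ensemble members
    # Hidden-layer parameters grow roughly quadratically with width, so each
    # member must be markedly thinner; search for the largest width that fits.
    w = max(w for w in range(1, 513) if k * mlp_param_count(w) <= budget)
    print(f"{k} members of width {w}: {k * mlp_param_count(w)} params total")
```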
Recently, many techniques have been developed to sparsify the weights of neural networks and to remove structural units, e.g. neurons, from them.
Bayesian methods have been successfully applied to sparsify the weights of neural networks and to remove structural units, e.g. neurons, from them.
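To illustrate the two kinds of sparsification mentioned above, here is a minimal magnitude-based sketch (a simple stand-in, not the Bayesian procedures these works use): zeroing out individual weights versus removing whole neurons. The layer shape and pruning thresholds are arbitrary assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))   # weight matrix of one layer: 16 inputs -> 64 neurons

# Unstructured sparsification: zero out individual weights with small magnitude.
weight_mask = np.abs(W) >= np.quantile(np.abs(W), 0.8)   # keep the largest 20%
W_sparse = W * weight_mask

# Structured sparsification: remove whole neurons (output columns) whose
# weight vectors have small norm, so the layer itself becomes narrower.
col_norms = np.linalg.norm(W, axis=0)
keep_cols = col_norms >= np.quantile(col_norms, 0.5)     # drop the weakest half
W_pruned = W[:, keep_cols]

print(W_sparse.size, int(weight_mask.sum()))   # same shape, mostly zeros
print(W.shape, "->", W_pruned.shape)           # physically smaller layer
```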
In natural language processing, many tasks are successfully solved with recurrent neural networks, but such models have a huge number of parameters.
Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights.
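To put this memory cost in perspective, the sketch below counts the weights of a small hypothetical LSTM text classifier; the vocabulary size, dimensions, and the helper lstm_layer_params are assumptions made for illustration, and the embedding table alone dominates the footprint.

```python
def lstm_layer_params(d_in, d_hidden):
    """Weights of one LSTM layer: 4 gates, each with input, recurrent and bias terms."""
    return 4 * (d_in * d_hidden + d_hidden * d_hidden + d_hidden)

vocab, d_emb, d_hidden, n_classes = 50_000, 300, 512, 10
params = (vocab * d_emb                        # embedding table
          + lstm_layer_params(d_emb, d_hidden)
          + d_hidden * n_classes + n_classes)  # output classifier
print(f"{params:,} parameters, ~{params * 4 / 2**20:.0f} MiB in float32")
```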