It is a meta-method in the sense that any seeding algorithm can be used when clustering the individual subsets.
This lets us derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them.
We apply digitized Quantum Annealing (QA) and the Quantum Approximate Optimization Algorithm (QAOA) to a paradigmatic task of supervised learning in artificial neural networks: the optimization of synaptic weights for the binary perceptron.
Although exponentially rare compared to typical solutions (which are narrower and extremely difficult to sample), entropic solutions are accessible to the algorithms used in learning.
The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems.
We analyze the connection between minimizers with good generalization properties and high-local-entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function.
Simulated Annealing is the crowning glory of Markov chain Monte Carlo methods for the solution of NP-hard optimization problems in which the cost function is known.
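As a point of reference, a minimal sketch of simulated annealing in Python on a toy cost function follows; the geometric cooling schedule, the Gaussian proposal move, and all parameter values are illustrative assumptions, not details taken from the text.

    import math
    import random

    def simulated_annealing(cost, neighbor, x0, t0=1.0, alpha=0.95,
                            steps_per_t=200, t_min=1e-3):
        """Minimize `cost` with Metropolis moves under geometric cooling."""
        x, e = x0, cost(x0)
        best_x, best_e = x, e
        t = t0
        while t > t_min:
            for _ in range(steps_per_t):
                y = neighbor(x)
                de = cost(y) - e
                # Always accept downhill moves; accept uphill moves with
                # Boltzmann probability exp(-de / t).
                if de <= 0 or random.random() < math.exp(-de / t):
                    x, e = y, e + de
                    if e < best_e:
                        best_x, best_e = x, e
            t *= alpha  # geometric cooling
        return best_x, best_e

    # Toy usage: a one-dimensional multimodal cost function.
    f = lambda x: x * x + 10 * math.sin(3 * x)
    move = lambda x: x + random.gauss(0.0, 0.5)
    x_star, e_star = simulated_annealing(f, move, x0=5.0)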
The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time.
We introduce an algorithmic decision process for multialternative choice that combines binary comparisons and Markovian exploration.
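A generic illustration of such a process (a sketch only, not the exact decision process of the text) is a random walk over the set of alternatives driven by noisy binary comparisons; the latent-value model, the comparison noise, and the visit-count readout below are all assumptions of the sketch.

    import random

    def choose(values, noise=0.1, steps=200, seed=0):
        """Markov chain over alternatives driven by noisy binary comparisons.
        `values[i]` is the latent value of option i; each comparison reports
        the better of two options, but errs with probability `noise`."""
        rnd = random.Random(seed)
        n = len(values)
        current = rnd.randrange(n)
        visits = [0] * n
        for _ in range(steps):
            candidate = rnd.randrange(n)  # Markovian exploration step
            better = candidate if values[candidate] > values[current] else current
            if rnd.random() < noise:      # comparison error flips the outcome
                better = candidate if better == current else current
            current = better
            visits[current] += 1
        return max(range(n), key=visits.__getitem__)

    # Toy usage: five alternatives with hidden values.
    best = choose([0.2, 0.8, 0.5, 0.9, 0.1])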
The geometrical features of the (non-convex) loss landscape of neural network models are crucial in ensuring successful optimization and, most importantly, the capability to generalize well.
Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features.
Rectified Linear Units (ReLU) have become the main model for the neural units in current deep learning systems.
In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to wide flat minima (WFM).
We compare this scheme with a state-of-the-art alternative: a more standard genetic algorithm with deterministic pairwise-nearest-neighbor crossover and an elitist selection policy, of which we also provide an augmented and efficient implementation.
Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes.
We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4x faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters.
Their energy landscapes are dominated by local minima that cause exponential slowdown of classical thermal annealers, while simulated quantum annealing converges efficiently to rare dense regions of optimal solutions.
This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape.
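A minimal sketch of the nested structure of such an entropy-driven update follows, written for a toy deterministic loss with full gradients rather than minibatches; the inner-loop length, step sizes, noise scale, and averaging constant are illustrative assumptions.

    import numpy as np

    def entropy_sgd(grad_f, x, gamma=1.0, eta=0.1, eta_inner=0.01,
                    inner_steps=20, outer_steps=100, noise=1e-3, alpha=0.75):
        """The outer loop follows an estimate of the gradient of the local
        entropy; the inner Langevin loop samples weights x' from the Gibbs
        measure of f(x') - (gamma/2) ||x - x'||^2 around the current x."""
        rng = np.random.default_rng(0)
        for _ in range(outer_steps):
            xp = x.copy()   # inner (sampled) weights
            mu = x.copy()   # running estimate of the local average <x'>
            for _ in range(inner_steps):
                g = grad_f(xp) - gamma * (x - xp)
                xp = (xp - eta_inner * g
                      + np.sqrt(eta_inner) * noise * rng.standard_normal(x.shape))
                mu = (1 - alpha) * mu + alpha * xp
            # The local-entropy gradient is proportional to gamma * (x - mu).
            x = x - eta * gamma * (x - mu)
        return x

    # Toy usage: a rugged two-dimensional loss.
    def grad_f(w):
        return 2 * w + 3 * np.cos(3 * w)  # gradient of ||w||^2 + sum(sin(3 w))

    w = entropy_sgd(grad_f, x=np.array([2.0, -2.0]))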
Our code for the Ising case is publicly available [https://github.com/carlobaldassi/RRRMC.jl] and extensible to user-defined models: it provides efficient implementations of standard Metropolis, the RRR method, the BKL method (extended to the case of continuous energy spectra), and the waiting time method [Dall and Sibani, Comput. Phys. Commun.].
We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions.
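In the same spirit, the following toy sketch shows one way such a measure can be exploited algorithmically: several replicas descend their own loss while an elastic term pulls them toward their common center, biasing the search toward dense regions of minima. The number of replicas, the fixed coupling (in practice a coupling that grows over time is typically used), and the toy loss are assumptions of the sketch.

    import numpy as np

    def replicated_descent(grad_f, y=5, dim=2, gamma=0.1, eta=0.05,
                           steps=500, seed=0):
        """Gradient descent on y elastically coupled replicas: each replica
        follows its own gradient plus a pull toward the replica center."""
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((y, dim)) * 3.0  # one row per replica
        for _ in range(steps):
            center = W.mean(axis=0)
            for a in range(y):
                W[a] -= eta * (grad_f(W[a]) + gamma * (W[a] - center))
        return W.mean(axis=0)

    # Toy usage: a rugged loss whose widest valley lies near the origin.
    def grad_f(w):
        return 2 * w + 4 * np.cos(4 * w)  # gradient of ||w||^2 + sum(sin(4 w))

    w_star = replicated_descent(grad_f)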
Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states.
We introduce a novel Entropy-driven Monte Carlo (EdMC) strategy to efficiently sample solutions of random Constraint Satisfaction Problems (CSPs).
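A toy illustration of the idea on a small random 3-SAT instance follows, with the Belief Propagation estimate of the local entropy replaced by brute-force counting of solutions in a Hamming ball; the instance size, the radius d, and the inverse temperature beta are assumptions of the sketch.

    import itertools, math, random

    def make_random_ksat(n, m, k=3, seed=0):
        rnd = random.Random(seed)
        return [[(v, rnd.choice([1, -1])) for v in rnd.sample(range(n), k)]
                for _ in range(m)]

    def satisfied(clauses, x):
        return all(any(x[v] == s for v, s in c) for c in clauses)

    def local_entropy(clauses, x, d):
        """Log of the number of solutions within Hamming distance d of x
        (a brute-force stand-in for a message-passing estimate)."""
        n, count = len(x), 0
        for r in range(d + 1):
            for flips in itertools.combinations(range(n), r):
                y = x[:]
                for i in flips:
                    y[i] = -y[i]
                count += satisfied(clauses, y)
        return math.log(count) if count else -math.inf

    def edmc(clauses, n, d=2, beta=2.0, steps=2000, seed=1):
        rnd = random.Random(seed)
        x = [rnd.choice([1, -1]) for _ in range(n)]
        s = local_entropy(clauses, x, d)
        for _ in range(steps):
            i = rnd.randrange(n)
            x[i] = -x[i]  # propose a single spin flip
            s_new = local_entropy(clauses, x, d)
            # Metropolis acceptance on the local entropy, not the energy.
            if s_new >= s or rnd.random() < math.exp(beta * (s_new - s)):
                s = s_new
            else:
                x[i] = -x[i]  # reject: flip back
        return x, s

    # Toy usage: 20 variables, 40 random 3-clauses.
    clauses = make_random_ksat(n=20, m=40)
    x, s = edmc(clauses, n=20)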
We also show that the dense regions are surprisingly accessible to simple learning protocols, and that these synaptic configurations are robust to perturbations and generalize better than typical solutions.
The algorithm we present performs as well as Belief Propagation (BP) on binary perceptron learning problems, and may be better suited to address the problem on fully connected two-layer networks, since the inherent symmetries of two-layer networks are naturally broken using the Max-Sum (MS) approach.