no code implementations • 28 Dec 2024 • Qianli Liao, Liu Ziyin, Yulu Gan, Brian Cheung, Mark Harnett, Tomaso Poggio
Over the last four decades, the amazing success of deep learning has been driven by the use of Stochastic Gradient Descent (SGD) as the main optimization technique.
no code implementations • 31 Dec 2020 • Tomaso Poggio, Qianli Liao
Deep ReLU networks trained with the square loss have been observed to perform well in classification tasks.
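For concreteness (our notation, not a quote from the abstract): "square loss in classification" means fitting the $\pm 1$ labels directly with a regression objective and reading out the class as the sign of the output,

$$
L(f) \;=\; \frac{1}{N}\sum_{i=1}^{N}\big(f(x_i) - y_i\big)^{2}, \qquad y_i \in \{-1, +1\}, \qquad \hat{y}(x) = \mathrm{sign}\big(f(x)\big).
$$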
no code implementations • 24 Jun 2020 • Arturo Deza, Qianli Liao, Andrzej Banburski, Tomaso Poggio
For object recognition we find, as expected, that scrambling does not affect the performance of shallow or deep fully connected networks, in contrast to convolutional networks, which otherwise outperform them.
no code implementations • 25 Aug 2019 • Tomaso Poggio, Andrzej Banburski, Qianli Liao
In approximation theory, both shallow and deep networks have been shown to approximate any continuous function on a bounded domain, at the expense of an exponential number of parameters (exponential in the dimensionality of the function).
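To make the exponential dependence concrete (an illustrative order-of-magnitude statement from standard approximation theory, not a quote from this abstract): approximating a generic $d$-dimensional function with smoothness $m$ to uniform accuracy $\epsilon$ requires a number of parameters growing like

$$
N(\epsilon) \;=\; O\!\big(\epsilon^{-d/m}\big),
$$

i.e., exponential in the dimensionality $d$ for fixed smoothness.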
no code implementations • 12 Mar 2019 • Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Fernanda De La Torre, Jack Hidary, Tomaso Poggio
In particular, gradient descent induces dynamics of the normalized weights that converge, for $t \to \infty$, to an equilibrium corresponding to a minimum-norm (or maximum-margin) solution.
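One way to write the claimed limit (a hedged sketch under standard assumptions such as a separable dataset and an exponential-type loss; the notation $f(x; v)$ for the network with unit-norm weights $v$ is ours, not the paper's):

$$
\dot{w} = -\nabla_w L(w), \qquad \tilde{w}(t) = \frac{w(t)}{\lVert w(t)\rVert}, \qquad \lim_{t \to \infty} \tilde{w}(t) \;\in\; \arg\max_{\lVert v\rVert = 1}\, \min_i \, y_i\, f(x_i; v).
$$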
2 code implementations • ICLR 2019 • Will Xiao, Honglin Chen, Qianli Liao, Tomaso Poggio
These results complement the study by Bartunov et al. (2018), and establish a new benchmark for future biologically plausible learning algorithms on more difficult datasets and more complex architectures.
3 code implementations • 25 Jul 2018 • Qianli Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, Tomaso Poggio
Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors?
no code implementations • 29 Jun 2018 • Tomaso Poggio, Qianli Liao, Brando Miranda, Andrzej Banburski, Xavier Boix, Jack Hidary
Here we prove a similar result for nonlinear multilayer DNNs near zero minima of the empirical loss.
no code implementations • 7 Jan 2018 • Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio
In Theory IIb we characterize, with a mix of theory and experiments, the optimization of deep convolutional networks by Stochastic Gradient Descent.
no code implementations • 30 Dec 2017 • Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar
In this note, we show that the dynamics associated with gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to a linear gradient system in a quadratic potential with a degenerate Hessian (for the square loss) or an almost degenerate Hessian (for the logistic or cross-entropy loss).
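Concretely (a sketch of the standard linearization, in our notation): near an asymptotically stable minimum $w^{*}$ of the empirical loss $L$, the gradient-descent dynamics are approximated by a linear gradient system whose potential is the quadratic form of the Hessian,

$$
\dot{w} \;=\; -\nabla L(w) \;\approx\; -H\,(w - w^{*}), \qquad H = \nabla^{2} L(w^{*}) \succeq 0.
$$

For the square loss at an interpolating (zero-error) minimum of an overparametrized network, $H$ has a nontrivial null space, which is the degeneracy referred to above; for the logistic or cross-entropy loss the corresponding eigenvalues are small but nonzero.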
no code implementations • 28 Mar 2017 • Qianli Liao, Tomaso Poggio
Previous theoretical work on deep learning and neural network optimization tends to focus on avoiding saddle points and local minima.
no code implementations • 18 Jan 2017 • Vijay Chandrasekhar, Jie Lin, Qianli Liao, Olivier Morère, Antoine Veillard, Ling-Yu Duan, Tomaso Poggio
One major drawback of CNN-based global descriptors is that uncompressed deep neural network models require hundreds of megabytes of storage, making them inconvenient to deploy in mobile applications or in custom hardware.
no code implementations • 2 Nov 2016 • Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao
The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning.
no code implementations • 19 Oct 2016 • Qianli Liao, Kenji Kawaguchi, Tomaso Poggio
We systematically explore a spectrum of normalization algorithms related to Batch Normalization (BN) and propose a generalized formulation that simultaneously addresses two major limitations of BN: (1) online learning and (2) recurrent learning.
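As a toy illustration of the online and recurrent settings (a minimal sketch assuming a running-statistics scheme; this is a stand-in for the general idea, not the paper's exact algorithm), a normalization layer that maintains running moments can be applied to single samples and reused at every time step, where per-mini-batch statistics would be unavailable or unstable:

```python
import numpy as np

class RunningNorm:
    """Normalization with running statistics instead of per-batch statistics,
    so it works with batch size 1 (online) and repeated time steps (recurrent).
    Illustrative stand-in only, not the formulation from the paper."""

    def __init__(self, num_features, momentum=0.99, eps=1e-5):
        self.mean = np.zeros(num_features)
        self.var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training=True):
        # x: (batch, num_features); batch may be 1 in the online setting.
        if training:
            self.mean = self.momentum * self.mean + (1 - self.momentum) * x.mean(axis=0)
            self.var = self.momentum * self.var + (1 - self.momentum) * x.var(axis=0)
        # Normalize with the running statistics, not the current batch's.
        return (x - self.mean) / np.sqrt(self.var + self.eps)

norm = RunningNorm(8)
x_t = np.random.randn(1, 8)   # a single online sample
y_t = norm(x_t)               # works even though the "batch" has one element
```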
no code implementations • 5 Jun 2016 • Joel Z. Leibo, Qianli Liao, Winrich Freiwald, Fabio Anselmi, Tomaso Poggio
The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and relatively robust against identity-preserving transformations like depth-rotations.
1 code implementation • 13 Apr 2016 • Qianli Liao, Tomaso Poggio
We discuss relations between Residual Networks (ResNet), Recurrent Neural Networks (RNNs) and the primate visual cortex.
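This view can be illustrated in a few lines (a sketch in our own notation with arbitrary shapes, not code from the paper): a residual block whose weights are shared across depth is formally a recurrent update, with unrolling over time playing the role of depth.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32)) * 0.05   # weights shared across all "layers"/time steps

def shared_residual_step(h):
    # h_{t+1} = h_t + f(h_t): a residual update reused at every step,
    # i.e., a recurrent network; a standard ResNet would use a new f per depth.
    return h + np.maximum(0, W @ h)

h = rng.normal(size=32)
for t in range(10):                    # unrolling over time plays the role of depth
    h = shared_residual_step(h)
```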
no code implementations • 3 Mar 2016 • Hrushikesh Mhaskar, Qianli Liao, Tomaso Poggio
While the universal approximation property holds for both hierarchical and shallow networks, we prove that deep (hierarchical) networks can approximate the class of compositional functions with the same accuracy as shallow networks but with an exponentially smaller number of training parameters and VC dimension.
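A standard example of the compositional class in question (illustrative; the precise constants and bounds should be checked against the paper) is a function with a binary-tree structure in which every constituent depends on only two variables:

$$
f(x_1,\dots,x_8) \;=\; h_{3}\Big( h_{21}\big(h_{11}(x_1,x_2),\, h_{12}(x_3,x_4)\big),\; h_{22}\big(h_{13}(x_5,x_6),\, h_{14}(x_7,x_8)\big) \Big).
$$

A deep network that mirrors this graph only has to solve a sequence of low-dimensional approximation problems, one per node, which is where the exponential gap over shallow networks comes from.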
2 code implementations • 17 Oct 2015 • Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Gradient backpropagation (BP) requires symmetric feedforward and feedback connections -- the same weights must be used for forward and backward passes.
Ranked #1 on Handwritten Digit Recognition on MNIST (Percentage error metric)
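To make the symmetry requirement above concrete (a minimal sketch in our own notation; the fixed random feedback matrix `B2` is an illustrative stand-in for the asymmetric-feedback variants studied in this line of work, not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network: x -> h = relu(W1 x) -> y = W2 h
W1 = rng.normal(size=(16, 8)) * 0.1
W2 = rng.normal(size=(4, 16)) * 0.1
B2 = rng.normal(size=(4, 16)) * 0.1   # fixed random feedback weights (hypothetical)

def forward(x):
    h = np.maximum(0, W1 @ x)
    return h, W2 @ h

def backward(x, h, dy, symmetric=True):
    # Symmetric BP propagates the output error through W2.T; the asymmetric
    # variant uses the separate feedback matrix B2, breaking weight symmetry.
    feedback = W2.T if symmetric else B2.T
    dh = (feedback @ dy) * (h > 0)
    return np.outer(dh, x), np.outer(dy, h)   # gradients for W1, W2

x = rng.normal(size=8)
h, y = forward(x)
target = np.zeros(4)
target[1] = 1.0
dy = y - target                               # error signal at the output
grads_bp = backward(x, h, dy, symmetric=True)   # exact backprop
grads_fa = backward(x, h, dy, symmetric=False)  # asymmetric feedback
```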
no code implementations • 12 Sep 2014 • Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance, e.g., position, scale, viewing angle [1, 2, 3].
no code implementations • NeurIPS 2013 • Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Next, we apply the model to non-affine transformations: as expected, it performs well on face verification tasks requiring invariance to the relatively smooth transformations of 3D rotation-in-depth and changes in illumination direction.
no code implementations • 16 Nov 2013 • Qianli Liao, Joel Z. Leibo, Youssef Mroueh, Tomaso Poggio
The standard approach to unconstrained face recognition in natural photographs is via a detection, alignment, recognition pipeline.
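Schematically (a hypothetical sketch of that pipeline; all three stages are placeholders named only to show the structure, not a real face-recognition API):

```python
import numpy as np

def detect_faces(image):
    # Placeholder detector: pretend the whole image is a single face box.
    h, w = image.shape[:2]
    return [(0, 0, w, h)]

def align_face(image, box):
    # Placeholder alignment: crop the box without any geometric normalization.
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]

def recognize(face, gallery):
    # Placeholder recognizer: nearest gallery template by mean pixel distance.
    dists = {name: abs(face.mean() - tpl.mean()) for name, tpl in gallery.items()}
    return min(dists, key=dists.get)

def pipeline(image, gallery):
    # Detection -> alignment -> recognition, applied to every detected face.
    return [recognize(align_face(image, box), gallery) for box in detect_faces(image)]

gallery = {"alice": np.zeros((32, 32)), "bob": np.ones((32, 32))}
print(pipeline(np.random.rand(64, 64), gallery))
```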