no code implementations • 7 Feb 2025 • Liu Ziyin, Yizhou Xu, Tomaso Poggio, Isaac Chuang
The dynamics of learning in modern large AI systems is hierarchical, often characterized by abrupt, qualitative shifts akin to phase transitions observed in physical systems.
1 code implementation • 25 Jan 2025 • Kushagra Tiwary, Aaron Young, Zaid Tasneem, Tzofi Klinghoffer, Akshat Dave, Tomaso Poggio, Dan-Eric Nilsson, Brian Cheung, Ramesh Raskar
First, we demonstrate computational evidence that task-specific selection drives a bifurcation in eye evolution: orientation tasks such as navigating a maze lead to distributed compound-type eyes, while an object-discrimination task leads to the emergence of high-acuity camera-type eyes.
no code implementations • 28 Dec 2024 • Qianli Liao, Liu Ziyin, Yulu Gan, Brian Cheung, Mark Harnett, Tomaso Poggio
Over the last four decades, the amazing success of deep learning has been driven by the use of Stochastic Gradient Descent (SGD) as the main optimization technique.
no code implementations • 20 Nov 2024 • Andrea Pinto, Akshay Rangamani, Tomaso Poggio
While previous optimization results have suggested that deep neural networks tend to favour low-rank weight matrices, the implications of this inductive bias on generalization bounds remain underexplored.
no code implementations • 26 Oct 2024 • Vighnesh Subramaniam, David Mayo, Colin Conwell, Tomaso Poggio, Boris Katz, Brian Cheung, Andrei Barbu
If the guide is trained, this transfers part of the guide's architectural prior and knowledge to the target.
no code implementations • 3 Oct 2024 • Liu Ziyin, Isaac Chuang, Tomer Galanti, Tomaso Poggio
We then show that the breaking of CRH leads to the emergence of reciprocal power-law relations between R, W, and G, which we refer to as the Polynomial Alignment Hypothesis (PAH).
no code implementations • 27 Sep 2024 • Yulu Gan, Tomer Galanti, Tomaso Poggio, Eran Malach
This research reveals the unique computational abilities of ARDTs, aiming to broaden the architectural diversity in language model development.
no code implementations • 17 Jun 2024 • Pierfrancesco Beneventano, Andrea Pinto, Tomaso Poggio
In contrast, we demonstrate that while vanilla GD also approximates the target function, it requires an explicit regularization term to learn the support in the first layer.
no code implementations • 13 Feb 2023 • Yena Han, Tomaso Poggio, Brian Cheung
The networks are compared to recordings of biological neurons, and good performance in reproducing neural responses is considered to support the model's validity.
no code implementations • 28 Jan 2023 • Tomer Galanti, Mengjia Xu, Liane Galanti, Tomaso Poggio
In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs.
no code implementations • 24 Dec 2022 • Vassilis Apidopoulos, Tomaso Poggio, Lorenzo Rosasco, Silvia Villa
In this paper, we focus on iterative regularization in the context of classification.
no code implementations • 12 Jun 2022 • Tomer Galanti, Zachary S. Siegel, Aparna Gupte, Tomaso Poggio
We investigate the inherent bias of Stochastic Gradient Descent (SGD) toward learning low-rank weight matrices during the training of deep neural networks.
no code implementations • 22 Oct 2021 • Simon Alford, Anshula Gandhi, Akshay Rangamani, Andrzej Banburski, Tony Wang, Sylee Dandekar, John Chin, Tomaso Poggio, Peter Chin
More specifically, we extend existing execution-guided program synthesis approaches with deductive reasoning based on function inverse semantics to enable a neural-guided bidirectional search algorithm.
no code implementations • 21 Jul 2021 • Andrzej Banburski, Fernanda De La Torre, Nishka Pant, Ishana Shastri, Tomaso Poggio
Recent theoretical results show that gradient descent on deep neural networks under exponential loss functions locally maximizes classification margin, which is equivalent to minimizing the norm of the weight matrices under margin constraints.
no code implementations • 21 Feb 2021 • Owen Kunhardt, Arturo Deza, Tomaso Poggio
In this paper, we propose an adaptation to the area under the curve (AUC) metric to measure the adversarial robustness of a model over a particular $\epsilon$-interval $[\epsilon_0, \epsilon_1]$ (interval of adversarial perturbation strengths) that facilitates unbiased comparisons across models when they have different initial $\epsilon_0$ performance.
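Not the paper's exact definition, but a minimal sketch of what an interval-restricted robustness AUC can look like: integrate a precomputed adversarial accuracy-versus-$\epsilon$ curve over $[\epsilon_0, \epsilon_1]$ and normalize by the interval length. The curves, grid, and function name below are illustrative assumptions.

```python
import numpy as np

def robustness_auc(epsilons, accuracies, eps0, eps1):
    """Approximate area under the accuracy-vs-epsilon curve on [eps0, eps1],
    normalized by the interval length so models with different clean-accuracy
    starting points can be compared on the same scale. Illustrative only."""
    eps = np.asarray(epsilons, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    mask = (eps >= eps0) & (eps <= eps1)
    area = np.trapz(acc[mask], eps[mask])      # trapezoidal rule
    return area / (eps1 - eps0)                # normalize to a [0, 1] score

# Hypothetical adversarial-accuracy curves for two models.
eps_grid = np.linspace(0.0, 0.3, 31)
acc_model_a = np.exp(-10 * eps_grid)           # accurate but decays quickly
acc_model_b = 0.9 * np.exp(-4 * eps_grid)      # lower clean accuracy, more robust
print(robustness_auc(eps_grid, acc_model_a, 0.05, 0.25))
print(robustness_auc(eps_grid, acc_model_b, 0.05, 0.25))
```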
no code implementations • 31 Dec 2020 • Tomaso Poggio, Qianli Liao
Deep ReLU networks trained with the square loss have been observed to perform well in classification tasks.
1 code implementation • NeurIPS Workshop SVRHM 2020 • Elian Malkin, Arturo Deza, Tomaso Poggio
The spatially-varying field of the human visual system has recently received a resurgence of interest with the development of virtual reality (VR) and neural networks.
2 code implementations • NeurIPS 2020 • Manish V. Reddy, Andrzej Banburski, Nishka Pant, Tomaso Poggio
A convolutional neural network strongly robust to adversarial perturbations at reasonable computational and performance cost has not yet been demonstrated.
no code implementations • 28 Jun 2020 • Akshay Rangamani, Lorenzo Rosasco, Tomaso Poggio
We study the average $\mbox{CV}_{loo}$ stability of kernel ridge-less regression and derive corresponding risk bounds.
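A minimal sketch of kernel ridge-less regression, i.e., minimum-norm interpolation via the pseudo-inverse of the kernel matrix, with leave-one-out errors estimated by refitting without each point. The Gaussian kernel and the toy data are assumptions for illustration; this is not the authors' stability analysis.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def ridgeless_fit(X, y, gamma=1.0):
    # Minimum-norm interpolant: alpha = K^+ y (pseudo-inverse, no ridge term).
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.pinv(K) @ y

def predict(X_train, alpha, X_test, gamma=1.0):
    return gaussian_kernel(X_test, X_train, gamma) @ alpha

# Leave-one-out (CV_loo) errors by refitting without each point.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
loo_errors = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    alpha = ridgeless_fit(X[mask], y[mask])
    loo_errors.append((predict(X[mask], alpha, X[i:i + 1])[0] - y[i]) ** 2)
print("average CV_loo error:", np.mean(loo_errors))
```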
no code implementations • 24 Jun 2020 • Arturo Deza, Qianli Liao, Andrzej Banburski, Tomaso Poggio
For object recognition we find, as expected, that scrambling does not affect the performance of shallow or deep fully connected networks, in contrast to convolutional networks, which otherwise outperform them.
no code implementations • 12 Dec 2019 • Tomaso Poggio, Gil Kur, Andrzej Banburski
In solving a system of $n$ linear equations in $d$ variables, $Ax=b$, the condition number of the $n \times d$ matrix $A$ measures how much errors in the data $b$ affect the solution $x$.
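A small numerical illustration of the point: for an ill-conditioned $A$, a tiny relative perturbation of $b$ produces a much larger relative change in the solution $x$. The matrix and perturbation below are made up for illustration.

```python
import numpy as np

# A nearly singular (ill-conditioned) 2x2 system.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)

# Perturb b slightly and re-solve.
b_pert = b + np.array([0.0, 1e-4])
x_pert = np.linalg.solve(A, b_pert)

print("condition number:", np.linalg.cond(A))   # large: errors in b get amplified
print("relative change in b:", np.linalg.norm(b_pert - b) / np.linalg.norm(b))
print("relative change in x:", np.linalg.norm(x_pert - x) / np.linalg.norm(x))
```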
no code implementations • 25 Aug 2019 • Tomaso Poggio, Andrzej Banburski, Qianli Liao
In approximation theory both shallow and deep networks have been shown to approximate any continuous function on a bounded domain at the expense of an exponential number of parameters (exponential in the dimensionality of the function).
no code implementations • 12 Mar 2019 • Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Fernanda De La Torre, Jack Hidary, Tomaso Poggio
In particular, gradient descent induces a dynamics of the normalized weights which converges for $t \to \infty$ to an equilibrium corresponding to a minimum-norm (or maximum-margin) solution.
2 code implementations • ICLR 2019 • Will Xiao, Honglin Chen, Qianli Liao, Tomaso Poggio
These results complement the study by Bartunov et al. (2018), and establish a new benchmark for future biologically plausible learning algorithms on more difficult datasets and more complex architectures.
3 code implementations • 25 Jul 2018 • Qianli Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, Tomaso Poggio
Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors?
no code implementations • 29 Jun 2018 • Tomaso Poggio, Qianli Liao, Brando Miranda, Andrzej Banburski, Xavier Boix, Jack Hidary
Here we prove a similar result for nonlinear multilayer DNNs near zero minima of the empirical loss.
no code implementations • 12 Jun 2018 • Charlie Frogner, Tomaso Poggio
We present a novel approximate inference method for diffusion processes, based on the Wasserstein gradient flow formulation of the diffusion.
no code implementations • 17 Feb 2018 • Hrushikesh Mhaskar, Tomaso Poggio
We argue that the minimal expected value of the square loss is an inappropriate measure of the generalization error when approximating compositional functions, if one wants to take full advantage of the compositional structure.
no code implementations • 7 Jan 2018 • Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio
In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent.
no code implementations • 30 Dec 2017 • Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar
In this note, we show that the dynamics associated with gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to a linear gradient system in a quadratic potential with a degenerate (for square loss) or almost degenerate (for logistic or cross-entropy loss) Hessian.
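A one-line reminder of the linearization behind this statement, in generic notation not taken from the paper: near an asymptotically stable minimum $w^*$ of the empirical loss $L$,

$$\dot{w} = -\nabla L(w) \approx -\nabla^2 L(w^*)\,(w - w^*),$$

so the flat (degenerate) directions are exactly the zero eigenvectors of the Hessian $\nabla^2 L(w^*)$, which is singular for the square loss and nearly singular for logistic or cross-entropy loss.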
1 code implementation • 5 Nov 2017 • Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, James Stokes
We study the relationship between geometry and capacity measures for deep neural networks from an invariance viewpoint.
no code implementations • 18 Jul 2017 • Gaurav Manek, Jie Lin, Vijay Chandrasekhar, Ling-Yu Duan, Sateesh Giduthuri, Xiao-Li Li, Tomaso Poggio
In this work, we focus on the problem of image instance retrieval with deep descriptors extracted from pruned Convolutional Neural Networks (CNN).
2 code implementations • NeurIPS 2017 • Anna Volokitin, Gemma Roig, Tomaso Poggio
Also, for all tested networks trained on targets in isolation, we find that recognition accuracy decreases the closer the flankers are to the target and the more flankers there are.
no code implementations • 28 Mar 2017 • Qianli Liao, Tomaso Poggio
Previous theoretical work on deep learning and neural network optimization tends to focus on avoiding saddle points and local minima.
no code implementations • 18 Jan 2017 • Vijay Chandrasekhar, Jie Lin, Qianli Liao, Olivier Morère, Antoine Veillard, Ling-Yu Duan, Tomaso Poggio
One major drawback of CNN-based {\it global descriptors} is that uncompressed deep neural network models require hundreds of megabytes of storage, making them inconvenient to deploy in mobile applications or in custom hardware.
no code implementations • 2 Nov 2016 • Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao
The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning.
no code implementations • 19 Oct 2016 • Qianli Liao, Kenji Kawaguchi, Tomaso Poggio
We systematically explore a spectrum of normalization algorithms related to Batch Normalization (BN) and propose a generalized formulation that simultaneously addresses two major limitations of BN: (1) online learning and (2) recurrent learning.
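One simple way to make normalization batch-free, sketched below with running statistics, illustrates why those two limitations matter; this is an assumption-laden toy, not the generalized formulation proposed in the paper.

```python
import numpy as np

class RunningNorm:
    """Normalize each feature using running (streaming) estimates of mean and
    variance instead of per-batch statistics, so it can be applied online
    (batch size 1) and at every time step of a recurrent network.
    Illustrative sketch only; not the algorithm proposed in the paper."""
    def __init__(self, dim, momentum=0.99, eps=1e-5):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x):
        # Update running statistics from a single sample or time step.
        self.mean = self.momentum * self.mean + (1 - self.momentum) * x
        self.var = self.momentum * self.var + (1 - self.momentum) * (x - self.mean) ** 2
        return (x - self.mean) / np.sqrt(self.var + self.eps)

norm = RunningNorm(dim=8)
rng = np.random.default_rng(0)
for t in range(100):              # successive samples in an online stream
    h_t = norm(rng.standard_normal(8))
```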
no code implementations • 10 Aug 2016 • Hrushikesh Mhaskar, Tomaso Poggio
The paper announces new results for a non-smooth activation function, the ReLU, used in present-day neural networks, as well as for Gaussian networks.
no code implementations • 5 Jun 2016 • Joel Z. Leibo, Qianli Liao, Winrich Freiwald, Fabio Anselmi, Tomaso Poggio
The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and relatively robust against identity-preserving transformations like depth-rotations.
1 code implementation • 13 Apr 2016 • Qianli Liao, Tomaso Poggio
We discuss relations between Residual Networks (ResNet), Recurrent Neural Networks (RNNs) and the primate visual cortex.
no code implementations • 15 Mar 2016 • Olivier Morère, Jie Lin, Antoine Veillard, Vijay Chandrasekhar, Tomaso Poggio
The first one is Nested Invariance Pooling (NIP), a method inspired by i-theory, a mathematical theory for computing group-invariant transformations with feed-forward neural networks.
no code implementations • 3 Mar 2016 • Hrushikesh Mhaskar, Qianli Liao, Tomaso Poggio
While the universal approximation property holds both for hierarchical and shallow networks, we prove that deep (hierarchical) networks can approximate the class of compositional functions with the same accuracy as shallow networks but with an exponentially smaller number of training parameters and VC-dimension.
no code implementations • 9 Jan 2016 • Olivier Morère, Antoine Veillard, Jie Lin, Julie Petta, Vijay Chandrasekhar, Tomaso Poggio
Based on a thorough empirical evaluation using several publicly available datasets, we show that our method is able to significantly and consistently improve retrieval results every time a new type of invariance is incorporated.
no code implementations • 19 Nov 2015 • Yan Luo, Xavier Boix, Gemma Roig, Tomaso Poggio, Qi Zhao
To see this, we first report results on ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting as linear classifiers: CNNs act locally linearly with respect to changes in image regions containing objects recognized by the CNN, whereas in other regions the CNN may act non-linearly.
2 code implementations • 17 Oct 2015 • Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Gradient backpropagation (BP) requires symmetric feedforward and feedback connections -- the same weights must be used for forward and backward passes.
Ranked #1 on Handwritten Digit Recognition on MNIST (percentage error metric)
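The sketch below illustrates the kind of asymmetry the entry above refers to: a two-layer network trained with a fixed random feedback matrix in place of the transpose of the forward weights (a feedback-alignment-style update). The architecture, toy data, and learning rate are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 16, 3

W1 = rng.standard_normal((n_hid, n_in)) * 0.1
W2 = rng.standard_normal((n_out, n_hid)) * 0.1
B2 = rng.standard_normal((n_out, n_hid)) * 0.1   # fixed random feedback, used instead of W2 in the backward pass

def relu(x):
    return np.maximum(x, 0.0)

def step(x, y, lr=0.01):
    global W1, W2
    h = relu(W1 @ x)                  # forward pass
    yhat = W2 @ h
    e = yhat - y                      # output error (squared-loss gradient)
    dW2 = np.outer(e, h)
    dh = (B2.T @ e) * (h > 0)         # backward pass uses B2, not W2: no weight symmetry
    dW1 = np.outer(dh, x)
    W1 -= lr * dW1
    W2 -= lr * dW2
    return 0.5 * float(e @ e)

x = rng.standard_normal(n_in)
y = rng.standard_normal(n_out)
for t in range(200):
    loss = step(x, y)
print("final loss:", loss)
```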
4 code implementations • 16 Oct 2015 • Maximilian Nickel, Lorenzo Rosasco, Tomaso Poggio
Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs.
Ranked #9 on Link Prediction on FB15k
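A minimal sketch of holographic-style triple scoring, assuming the circular-correlation compositional operator computed with FFTs; the embedding dimension and the random vectors are illustrative, and training is omitted.

```python
import numpy as np

def circular_correlation(a, b):
    # [a * b]_k = sum_i a_i b_{(i + k) mod d}, computed efficiently with FFTs.
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def score(e_subj, r_rel, e_obj):
    # Plausibility of the triple (subject, relation, object) via a sigmoid.
    return 1.0 / (1.0 + np.exp(-r_rel @ circular_correlation(e_subj, e_obj)))

rng = np.random.default_rng(0)
d = 64                                    # embedding dimension (illustrative)
e_s, e_o, r_p = (rng.standard_normal(d) / np.sqrt(d) for _ in range(3))
print(score(e_s, r_p, e_o))
```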
no code implementations • 5 Aug 2015 • Fabio Anselmi, Lorenzo Rosasco, Cheston Tan, Tomaso Poggio
In i-theory a typical layer of a hierarchical architecture consists of HW modules pooling the dot products of the inputs to the layer with the transformations of a few templates under a group.
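A toy sketch of such a module, assuming the group is cyclic shifts and the pooling statistic is a max over each template's orbit; it is meant only to show how pooling dot products over an orbit yields an invariant signature, not to reproduce the paper's construction.

```python
import numpy as np

def invariant_signature(x, templates, pool=np.max):
    """For each template t, compute the dot products of x with all cyclic
    shifts of t (the group orbit of t), then pool them into one statistic.
    The pooled values form a shift-invariant signature of x."""
    sig = []
    for t in templates:
        orbit = np.stack([np.roll(t, k) for k in range(len(t))])  # orbit of the template
        sig.append(pool(orbit @ x))                               # pool dot products over the orbit
    return np.array(sig)

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
templates = rng.standard_normal((3, 16))
print(np.allclose(invariant_signature(x, templates),
                  invariant_signature(np.roll(x, 5), templates)))  # True: invariant to shifts
```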
no code implementations • NeurIPS 2015 • Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya-Polo, Tomaso Poggio
In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance.
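A minimal sketch of an entropically smoothed Wasserstein (Sinkhorn) distance between a predicted and a target label histogram, which is one common way such a loss is computed in practice; the ground metric, regularization strength, and toy distributions are assumptions, not the paper's exact setup.

```python
import numpy as np

def sinkhorn_wasserstein(p, q, C, reg=0.5, n_iters=200):
    """Entropically smoothed Wasserstein distance between histograms p and q
    with ground cost matrix C, via Sinkhorn iterations. Illustrative sketch."""
    K = np.exp(-C / reg)
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)
        u = p / (K @ v)
    T = u[:, None] * K * v[None, :]      # approximate optimal transport plan
    return float((T * C).sum())

# Toy example: predicted vs. target distributions over 5 ordered labels,
# with |i - j| as the ground metric between labels.
labels = np.arange(5)
C = np.abs(labels[:, None] - labels[None, :]).astype(float)
p = np.array([0.7, 0.2, 0.1, 0.0, 0.0]) + 1e-9   # prediction
q = np.array([0.0, 0.0, 0.1, 0.2, 0.7]) + 1e-9   # target
p, q = p / p.sum(), q / q.sum()
print(sinkhorn_wasserstein(p, q, C))
```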
no code implementations • NeurIPS 2015 • Youssef Mroueh, Stephen Voinea, Tomaso Poggio
Our analysis bridges invariant feature learning with kernel methods, as we show that this feature map defines an expected Haar integration kernel that is invariant to the specified group action.
1 code implementation • 13 Apr 2015 • Carlo Ciliberto, Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco
In this context a fundamental question is how to incorporate the task structure in the learning problem. We tackle this question by studying a general computational framework that allows encoding a priori knowledge of the task structure in the form of a convex penalty; in this setting a variety of previously proposed methods can be recovered as special cases, including linear and non-linear approaches.
no code implementations • 19 Mar 2015 • Fabio Anselmi, Lorenzo Rosasco, Tomaso Poggio
We discuss data representations which can be learned automatically from data, are invariant to transformations, and are at the same time selective, in the sense that two points have the same representation only if one is a transformation of the other.
no code implementations • 12 Sep 2014 • Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance, e.g., position, scale, viewing angle [1, 2, 3].
no code implementations • 16 Jun 2014 • Georgios Evangelopoulos, Stephen Voinea, Chiyuan Zhang, Lorenzo Rosasco, Tomaso Poggio
Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input.
no code implementations • 15 Jun 2014 • Cheston Tan, Tomaso Poggio
The main aim of this work is to further the fundamental understanding of what causes the visual processing of faces to be different from that of objects.
no code implementations • 6 Jun 2014 • Tomaso Poggio, Jim Mutch, Leyla Isik
From the slope of the inverse of the magnification factor, M-theory predicts a cortical "fovea" in V1 on the order of $40$ by $40$ basic units at each receptive field size -- corresponding to a foveola of size around $26$ minutes of arc at the highest resolution, $\approx 6$ degrees at the lowest resolution.
no code implementations • 1 Apr 2014 • Chiyuan Zhang, Georgios Evangelopoulos, Stephen Voinea, Lorenzo Rosasco, Tomaso Poggio
We present the main theoretical and computational aspects of a framework for unsupervised learning of invariant audio representations, empirically evaluated on music genre classification.
no code implementations • NeurIPS 2013 • Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Next, we apply the model to non-affine transformations: as expected, it performs well on face verification tasks requiring invariance to the relatively smooth transformations of 3D rotation-in-depth and changes in illumination direction.
no code implementations • NeurIPS 2013 • Cheston Tan, Jedediah M. Singer, Thomas Serre, David Sheinberg, Tomaso Poggio
The macaque Superior Temporal Sulcus (STS) is a brain area that receives and integrates inputs from both the ventral and dorsal visual processing streams (thought to specialize in form and motion processing respectively).
no code implementations • 17 Nov 2013 • Fabio Anselmi, Joel Z. Leibo, Lorenzo Rosasco, Jim Mutch, Andrea Tacchetti, Tomaso Poggio
It also suggests that the main computational goal of the ventral stream of visual cortex is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable, and discriminative for recognition---and that this representation may be continuously learned in an unsupervised way during development and visual experience.
no code implementations • 16 Nov 2013 • Qianli Liao, Joel Z. Leibo, Youssef Mroueh, Tomaso Poggio
The standard approach to unconstrained face recognition in natural photographs is via a detection, alignment, recognition pipeline.
no code implementations • 24 Mar 2013 • Silvia Villa, Lorenzo Rosasco, Tomaso Poggio
We consider the fundamental question of learnability of a hypotheses class in the supervised learning setting and in the general learning setting introduced by Vladimir Vapnik.
no code implementations • NeurIPS 2012 • Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco, Jean-Jacques Slotine
In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, which allows a relaxation approach commonly used in binary classification to be generalized to multiple classes.
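A small sketch of one standard simplex-code construction, assuming centered and normalized standard basis vectors (which live in a $(K-1)$-dimensional hyperplane of $\mathbb{R}^K$); this is illustrative and not necessarily the paper's exact construction.

```python
import numpy as np

def simplex_codes(K):
    """K unit-norm code vectors with pairwise inner product -1/(K-1),
    i.e., the vertices of a regular simplex (here embedded in R^K)."""
    C = np.eye(K) - np.ones((K, K)) / K    # center the standard basis vectors
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    return C

C = simplex_codes(4)
print(np.round(C @ C.T, 3))   # 1 on the diagonal, -1/3 off the diagonal
```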
no code implementations • NeurIPS 2012 • Guillermo Canas, Tomaso Poggio, Lorenzo Rosasco
We study the problem of estimating a manifold from random samples.
no code implementations • NeurIPS 2011 • Joel Z. Leibo, Jim Mutch, Tomaso Poggio
Many studies have uncovered evidence that visual cortex contains specialized regions involved in processing faces but not other object classes.
no code implementations • NeurIPS 2009 • Jake Bouvrie, Lorenzo Rosasco, Tomaso Poggio
A goal of central importance in the study of hierarchical models for object recognition -- and indeed the visual cortex -- is that of understanding quantitatively the trade-off between invariance and selectivity, and how invariance and discrimination properties contribute towards providing an improved representation useful for learning from data.