Search Results for author: Brando Miranda

Found 13 papers, 2 papers with code

Is Pre-training Truly Better Than Meta-Learning?

no code implementations • 24 Jun 2023 • Brando Miranda, Patrick Yu, Saumya Goyal, Yu-Xiong Wang, Sanmi Koyejo

Using this analysis, we demonstrate the following: (1) when the formal diversity of a data set is low, pre-training (PT) beats MAML on average, and (2) when the formal diversity is high, MAML beats PT on average.

Few-Shot Learning
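The "formal diversity" referenced in the entry above is, in this line of work, a diversity coefficient computed over task embeddings (e.g. Task2Vec-style vectors). Purely as an illustration, here is a minimal sketch assuming such embeddings are already available; the function name and the cosine-distance choice are assumptions, not the authors' code.

```python
import numpy as np

def diversity_coefficient(task_embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance between task embeddings.

    task_embeddings: shape (num_tasks, dim), one vector per task; assumes
    at least two tasks. Higher values indicate a more formally diverse benchmark.
    """
    # Normalize each embedding to unit length.
    norms = np.linalg.norm(task_embeddings, axis=1, keepdims=True)
    unit = task_embeddings / np.clip(norms, 1e-12, None)
    # Cosine similarity matrix, converted to a distance matrix.
    dist = 1.0 - unit @ unit.T
    # Average over distinct pairs (exclude the diagonal).
    n = dist.shape[0]
    return float(dist[~np.eye(n, dtype=bool)].mean())
```

For random high-dimensional embeddings this value is close to 1, while near-duplicate tasks push it toward 0, which is the low-diversity regime where, per the result above, PT tends to beat MAML.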

Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data

no code implementations • 24 Jun 2023 • Alycia Lee, Brando Miranda, Sudharsan Sundar, Sanmi Koyejo

Current trends in pre-training capable Large Language Models (LLMs) mostly focus on scaling model and dataset size.

Are Emergent Abilities of Large Language Models a Mirage?

no code implementations • NeurIPS 2023 • Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

Recent work claims that large language models display emergent abilities: abilities that are not present in smaller-scale models but appear in larger-scale models.
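The paper's argument is that apparent emergence can come from the researcher's choice of metric rather than from the model. A toy numerical sketch of that argument (the power-law per-token accuracy curve and the 32-token answer length below are illustrative assumptions, not numbers from the paper):

```python
import numpy as np

# Toy version of the argument: a smooth, gradual gain in per-token accuracy
# looks "emergent" once it is scored with an all-or-nothing exact-match
# metric over a whole answer. All numbers below are illustrative only.
params = np.logspace(6, 11, 11)                       # hypothetical model sizes
per_token_acc = 1.0 - 0.5 * (params / 1e6) ** -0.3    # assumed smooth power law
seq_len = 32                                          # assumed answer length

exact_match = per_token_acc ** seq_len                # every token must be correct

for n, p, em in zip(params, per_token_acc, exact_match):
    print(f"params={n:9.1e}  per-token acc={p:.3f}  exact match={em:.4f}")
```

Per-token accuracy improves smoothly across the whole range, yet the exact-match score stays near zero and then rises sharply only at the largest scales, which is the pattern usually read as emergence.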

Transformer Models for Type Inference in the Simply Typed Lambda Calculus: A Case Study in Deep Learning for Code

no code implementations • 15 Mar 2023 • Brando Miranda, Avi Shinnar, Vasily Pestun, Barry Trager

Despite a growing body of work at the intersection of deep learning and formal languages, there has been relatively little systematic exploration of transformer models for reasoning about typed lambda calculi.
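For context, the target task (type inference in the simply typed lambda calculus) has a small syntax-directed algorithm when binders are type-annotated. A minimal sketch under that assumption; the term and type representations here are chosen for illustration only and are not from the paper:

```python
from dataclasses import dataclass

# Types: base types and arrow types (T1 -> T2).
@dataclass(frozen=True)
class Base:
    name: str

@dataclass(frozen=True)
class Arrow:
    src: object
    dst: object

# Terms: variables, annotated lambdas, and applications.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    var_type: object   # Church-style: the binder carries its type
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def infer(term, ctx=None):
    """Syntax-directed type inference for annotated STLC terms."""
    ctx = ctx or {}
    if isinstance(term, Var):
        return ctx[term.name]
    if isinstance(term, Lam):
        body_ty = infer(term.body, {**ctx, term.var: term.var_type})
        return Arrow(term.var_type, body_ty)
    if isinstance(term, App):
        fn_ty = infer(term.fn, ctx)
        arg_ty = infer(term.arg, ctx)
        if isinstance(fn_ty, Arrow) and fn_ty.src == arg_ty:
            return fn_ty.dst
        raise TypeError("ill-typed application")
    raise TypeError("unknown term")

# Example: \x:A. x has type A -> A
A = Base("A")
print(infer(Lam("x", A, Var("x"))))   # Arrow(src=Base(name='A'), dst=Base(name='A'))
```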

The Curse of Low Task Diversity: On the Failure of Transfer Learning to Outperform MAML and Their Empirical Equivalence

no code implementations • 2 Aug 2022 • Brando Miranda, Patrick Yu, Yu-Xiong Wang, Sanmi Koyejo

This novel insight contextualizes claims that transfer learning solutions are better than meta-learned solutions in the regime of low diversity under a fair comparison.

Few-Shot Learning • Transfer Learning

The Curse of Zero Task Diversity: On the Failure of Transfer Learning to Outperform MAML and their Empirical Equivalence

no code implementations • 24 Dec 2021 • Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo

We hypothesize that the diversity coefficient of the few-shot learning benchmark is predictive of whether meta-learning solutions will succeed or not.

Few-Shot Learning • Transfer Learning

Does MAML Only Work via Feature Re-use? A Data Centric Perspective

1 code implementation • 24 Dec 2021 • Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo

Recent work has suggested that a good embedding is all we need to solve many few-shot learning benchmarks.

Few-Shot Learning
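The question in the title contrasts full MAML-style inner-loop adaptation with pure feature re-use (a frozen body plus an adapted head). Below is a schematic, first-order sketch of just that contrast on a toy regression task; it omits MAML's outer meta-update over many tasks and is not the authors' code.

```python
import copy
import torch
import torch.nn as nn

def inner_adapt(model, x, y, lr=0.01, steps=5, head_only=False):
    """Adapt a copy of `model` to one task, either fully (MAML-style inner
    loop) or head-only (feature re-use: the body stays frozen)."""
    adapted = copy.deepcopy(model)
    params = (list(adapted[-1].parameters()) if head_only
              else list(adapted.parameters()))
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(adapted(x), y)
        loss.backward()
        opt.step()
    return adapted

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = torch.sin(3 * x)                                  # one toy "task"

full = inner_adapt(model, x, y, head_only=False)      # full adaptation
head = inner_adapt(model, x, y, head_only=True)       # feature re-use only
print(nn.functional.mse_loss(full(x), y).item(),
      nn.functional.mse_loss(head(x), y).item())
```

Whether the head-only variant matches full adaptation on standard few-shot benchmarks is essentially the feature re-use question the paper examines.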

Theory III: Dynamics and Generalization in Deep Networks

no code implementations • 12 Mar 2019 • Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Fernanda De La Torre, Jack Hidary, Tomaso Poggio

In particular, gradient descent induces dynamics of the normalized weights that converge, for $t \to \infty$, to an equilibrium corresponding to a minimum-norm (or maximum-margin) solution.
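A schematic restatement of the excerpt above, assuming a positively homogeneous network $f(x; w)$ and an exponential-type binary-classification loss as in the paper's setting:

```latex
% Gradient flow on an exponential-type loss:
\dot{w} = -\nabla_w L(w), \qquad
L(w) = \sum_{n=1}^{N} e^{-y_n f(x_n;\, w)} .

% While \|w(t)\| grows without bound, the normalized weights
% \tilde{w}(t) = w(t)/\|w(t)\| converge as t \to \infty to an equilibrium
% \tilde{w}^\ast that is a critical point of the constrained
% margin-maximization (equivalently, minimum-norm) problem
\max_{\|\tilde{w}\| = 1} \; \min_{n} \; y_n f(x_n;\, \tilde{w}) .
```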

A Surprising Linear Relationship Predicts Test Performance in Deep Networks

3 code implementations • 25 Jul 2018 • Qianli Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, Tomaso Poggio

Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors?

General Classification • Generalization Bounds

Theory IIIb: Generalization in Deep Networks

no code implementations • 29 Jun 2018 • Tomaso Poggio, Qianli Liao, Brando Miranda, Andrzej Banburski, Xavier Boix, Jack Hidary

Here we prove a similar result for nonlinear multilayer DNNs near zero minima of the empirical loss.

Binary Classification

Theory of Deep Learning IIb: Optimization Properties of SGD

no code implementations • 7 Jan 2018 • Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio

In Theory IIb we characterize, with a mix of theory and experiments, the optimization of deep convolutional networks by Stochastic Gradient Descent.

Theory of Deep Learning III: explaining the non-overfitting puzzle

no code implementations • 30 Dec 2017 • Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar

In this note, we show that the dynamics associated with gradient-descent minimization of nonlinear networks are topologically equivalent, near the asymptotically stable minima of the empirical error, to a linear gradient system in a quadratic potential with a degenerate (for square loss) or almost degenerate (for logistic or cross-entropy loss) Hessian.

General Classification
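Written out, the linearization described in the entry above is the standard one around a minimum $w^\ast$ of the empirical loss $L$:

```latex
% Near an asymptotically stable minimum w^* of the empirical loss L,
% gradient descent is topologically equivalent to the linear gradient system
\dot{w} = -\nabla L(w) \;\approx\; -H\,(w - w^\ast),
\qquad H = \nabla^2 L(w^\ast),

% where the Hessian H is degenerate for the square loss and almost
% degenerate for the logistic / cross-entropy losses.
```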

Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review

no code implementations • 2 Nov 2016 • Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao

The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning.
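The separation the review establishes is, roughly, the following (stated for compositional functions built from two-variable constituents; treat the exact exponents as indicative rather than authoritative):

```latex
% Approximating a generic function of n variables with smoothness m to
% accuracy \varepsilon with a shallow network requires on the order of
N_{\text{shallow}}(\varepsilon) = O\!\left(\varepsilon^{-n/m}\right)
% units (the curse of dimensionality), whereas a deep network whose graph
% matches the compositional (binary-tree) structure of
% f(x_1,\dots,x_n) = h\big(\dots h_{11}(x_1,x_2),\, h_{12}(x_3,x_4)\dots\big)
% needs only about
N_{\text{deep}}(\varepsilon) = O\!\left((n-1)\,\varepsilon^{-2/m}\right).
```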
