Search Results for author: Michael C. Mozer

Found 46 papers, 17 papers with code

On the Foundations of Shortcut Learning

no code implementations · 24 Oct 2023 · Katherine L. Hermann, Hossein Mobahi, Thomas Fel, Michael C. Mozer

Deep-learning models can extract a rich assortment of features from data.

Can Neural Network Memorization Be Localized?

1 code implementation · 18 Jul 2023 · Pratyush Maini, Michael C. Mozer, Hanie Sedghi, Zachary C. Lipton, J. Zico Kolter, Chiyuan Zhang

Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks $\textit{memorize}$ "hard" examples in the final few layers of the model.


Layer-Stack Temperature Scaling

no code implementations · 18 Nov 2022 · Amr Khalifa, Michael C. Mozer, Hanie Sedghi, Behnam Neyshabur, Ibrahim Alabdulmohsin

Inspired by this, we show that extending temperature scaling across all layers improves both calibration and accuracy.
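
The paper extends temperature scaling across layers; as background, here is a minimal sketch of the standard single-temperature version it builds on, where one scalar T is fit on a validation set to minimize negative log-likelihood (the grid-search range here is illustrative, not from the paper):

```python
import math

def softmax(logits, T=1.0):
    # Scale logits by temperature T before normalizing.
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def nll(logits, label, T):
    # Negative log-likelihood of the true label at temperature T.
    return -math.log(softmax(logits, T)[label])

def fit_temperature(val_logits, val_labels, grid=None):
    # Standard temperature scaling: pick the single T > 0 that
    # minimizes validation NLL. Since all logits are divided by the
    # same T, the argmax (and hence accuracy) is unchanged.
    grid = grid or [0.5 + 0.05 * i for i in range(91)]  # 0.5 .. 5.0
    return min(grid, key=lambda T: sum(nll(z, y, T)
                                       for z, y in zip(val_logits, val_labels)))
```

For an overconfident model (confidently wrong on some validation examples), the fitted T comes out greater than 1, softening the predicted probabilities.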

An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better?

no code implementations · 9 Nov 2022 · Tyler R. Scott, Ting Liu, Michael C. Mozer, Andrew C. Gallagher

Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods -- including $k$-means and hierarchical agglomerative clustering -- underperform supervised, deep, inductive methods.
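
For concreteness, one of the shallow, unsupervised baselines the abstract mentions ($k$-means) can be sketched as plain Lloyd's algorithm over embedding vectors (this is a generic implementation, not the paper's experimental setup):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Lloyd's algorithm on an (n, d) array of embeddings.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest center.
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its cluster;
        # keep the old center if a cluster goes empty.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

On well-separated embedding clusters this recovers the grouping; the paper's point is that on real face embeddings such heuristics lag behind supervised, deep, inductive methods.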


Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

1 code implementation · 10 Jan 2022 · Utku Evci, Vincent Dumoulin, Hugo Larochelle, Michael C. Mozer

We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target domain.

Transfer Learning

Online Unsupervised Learning of Visual Representations and Categories

1 code implementation · 13 Sep 2021 · Mengye Ren, Tyler R. Scott, Michael L. Iuzzolino, Michael C. Mozer, Richard Zemel

Real world learning scenarios involve a nonstationary distribution of classes with sequential dependencies among the samples, in contrast to the standard machine learning formulation of drawing samples independently from a fixed, typically uniform distribution.

Few-Shot Learning, Representation Learning, +1

Soft Calibration Objectives for Neural Networks

no code implementations · NeurIPS 2021 · Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs

When incorporated into training, these soft calibration losses achieve state-of-the-art single-model ECE across multiple datasets with less than 1% decrease in accuracy.

Decision Making

von Mises-Fisher Loss: An Exploration of Embedding Geometries for Supervised Learning

1 code implementation · ICCV 2021 · Tyler R. Scott, Andrew C. Gallagher, Michael C. Mozer

Recent work has argued that classification losses utilizing softmax cross-entropy are superior not only for fixed-set classification tasks but also for open-set tasks such as few-shot learning and retrieval, outperforming losses developed specifically for those settings.
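
The von Mises-Fisher geometry explored in the paper amounts to logits built from cosine similarities between L2-normalized embeddings and class prototypes, scaled by a concentration parameter. A minimal sketch (the kappa value is illustrative, and the full vMF loss also involves normalization terms not shown here):

```python
import numpy as np

def vmf_logits(embeddings, prototypes, kappa=10.0):
    # Cosine similarity between unit-normalized embeddings and
    # unit-normalized class prototypes, scaled by concentration
    # kappa; these logits then feed standard softmax cross-entropy.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    mu = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return kappa * z @ mu.T
```

Because both sides are normalized, only angular position on the hypersphere matters, not embedding magnitude.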

Classification, Few-Shot Learning, +3

Understanding Invariance via Feedforward Inversion of Discriminatively Trained Classifiers

no code implementations · 15 Mar 2021 · Piotr Teterwak, Chiyuan Zhang, Dilip Krishnan, Michael C. Mozer

We use our reconstruction model as a tool for exploring the nature of representations, including: the influence of model architecture and training objectives (specifically robust losses), the forms of invariance that networks achieve, representational differences between correctly and incorrectly classified images, and the effects of manipulating logits and images.

Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss

1 code implementation · NeurIPS 2021 · Michael L. Iuzzolino, Michael C. Mozer, Samy Bengio

Although deep feedforward neural networks share some characteristics with the primate visual system, a key distinction is their dynamics.

Mitigating Bias in Calibration Error Estimation

1 code implementation · 15 Dec 2020 · Rebecca Roelofs, Nicholas Cain, Jonathon Shlens, Michael C. Mozer

We find that binning-based estimators with bins of equal mass (number of instances) have lower bias than estimators with bins of equal width.

Transforming Neural Network Visual Representations to Predict Human Judgments of Similarity

no code implementations · 13 Oct 2020 · Maria Attarian, Brett D. Roads, Michael C. Mozer

Deep-learning vision models have shown intriguing similarities and differences with respect to human vision.

Wandering Within a World: Online Contextualized Few-Shot Learning

1 code implementation · ICLR 2021 · Mengye Ren, Michael L. Iuzzolino, Michael C. Mozer, Richard S. Zemel

We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting.

Few-Shot Learning

Compositional Embeddings for Multi-Label One-Shot Learning

no code implementations · 11 Feb 2020 · Zeqian Li, Michael C. Mozer, Jacob Whitehill

We present a compositional embedding framework that infers not just a single class per input image, but a set of classes, in the setting of one-shot learning.

Object Detection, Object Recognition, +2

Characterizing Structural Regularities of Labeled Data in Overparameterized Models

1 code implementation · 8 Feb 2020 · Ziheng Jiang, Chiyuan Zhang, Kunal Talwar, Michael C. Mozer

We obtain empirical estimates of this score for individual instances in multiple data sets, and we show that the score identifies out-of-distribution and mislabeled examples at one end of the continuum and strongly regular examples at the other end.

Density Estimation, Out-of-Distribution Detection, +1

Learning Neural Causal Models from Unknown Interventions

2 code implementations · 2 Oct 2019 · Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C. Mozer, Chris Pal, Yoshua Bengio

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data.


Stochastic Prototype Embeddings

no code implementations · 25 Sep 2019 · Tyler R. Scott, Karl Ridgeway, Michael C. Mozer

We propose a probabilistic method that treats embeddings as random variables.

Few-Shot Learning

Convolutional Bipartite Attractor Networks

no code implementations · 8 Jun 2019 · Michael Iuzzolino, Yoram Singer, Michael C. Mozer

In human perception and cognition, a fundamental operation that brains perform is interpretation: constructing coherent neural states from noisy, incomplete, and intrinsically ambiguous evidence.

Image Denoising, Imputation, +1

Scaling characteristics of sequential multitask learning: Networks naturally learn to learn

no code implementations · ICML Workshop Deep_Phenomen 2019 · Guy Davidson, Michael C. Mozer

We explore the behavior of a standard convolutional neural net in a setting that introduces classification tasks sequentially and requires the net to master new tasks while preserving mastery of previously learned tasks.

Sequential mastery of multiple visual tasks: Networks naturally learn to learn and forget to forget

no code implementations · CVPR 2020 · Guy Davidson, Michael C. Mozer

Through simulations involving sequences of ten related visual tasks, we find reason for optimism that nets will scale well as they advance from having a single skill to becoming multi-skill domain experts.

Continual Learning

Neural Networks Trained on Natural Scenes Exhibit Gestalt Closure

1 code implementation · 4 Mar 2019 · Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio, Michael C. Mozer

The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity.

Image Classification

Identity Crisis: Memorization and Generalization under Extreme Overparameterization

no code implementations · ICLR 2020 · Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer

We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task.


Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

no code implementations · NeurIPS 2018 · Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio

We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state.

Temporal Sequences

Open-Ended Content-Style Recombination Via Leakage Filtering

no code implementations · ICLR 2019 · Karl Ridgeway, Michael C. Mozer

We present a domain-independent method that permits the open-ended recombination of style of one image with the content of another.

Few-Shot Learning, Metric Learning

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

no code implementations · 11 Sep 2018 · Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio

We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state.

Temporal Sequences

Adapted Deep Embeddings: A Synthesis of Methods for $k$-Shot Inductive Transfer Learning

2 code implementations · 22 May 2018 · Tyler R. Scott, Karl Ridgeway, Michael C. Mozer

We hope our results will motivate a unification of research in weight transfer, deep metric learning, and few-shot learning.

Few-Shot Learning, Metric Learning, +1

State-Denoised Recurrent Neural Networks

no code implementations · ICLR 2019 · Michael C. Mozer, Denis Kazakov, Robert V. Lindsey

Attractor dynamics are incorporated into the hidden state to "clean up" representations at each step of a sequence.


Learning Deep Disentangled Embeddings with the F-Statistic Loss

3 code implementations · NeurIPS 2018 · Karl Ridgeway, Michael C. Mozer

Deep-embedding methods aim to discover representations of a domain that make explicit the domain's class structure and thereby support few-shot learning.
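
The loss is built around the classic one-way ANOVA F-statistic, which measures how well each embedding dimension separates classes. A minimal sketch of the statistic itself (the paper's differentiable loss built on top of it is not shown):

```python
import numpy as np

def f_statistic(embeddings, labels):
    # Per-dimension F-statistic: between-class variance divided by
    # within-class variance. High values mean a dimension separates
    # the classes well.
    X = np.asarray(embeddings, float)
    y = np.asarray(labels)
    classes = np.unique(y)
    k, n = len(classes), len(y)
    grand = X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2
                  for c in classes) / (k - 1)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes) / (n - k)
    return between / within
```

A dimension that cleanly separates two classes scores orders of magnitude higher than one carrying only noise, which is the signal a disentangled embedding should make explicit.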

Few-Shot Learning

Discrete Event, Continuous Time RNNs

1 code implementation · 11 Oct 2017 · Michael C. Mozer, Denis Kazakov, Robert V. Lindsey

The CT-GRU arises by interpreting the gates of a GRU as selecting a time scale of memory, and the CT-GRU generalizes the GRU by incorporating multiple time scales of memory and performing context-dependent selection of time scales for information storage and retrieval.
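
The multiple-time-scale idea can be illustrated with a bank of exponentially decaying memory traces, each with its own half-life, where new information is written into gate-selected scales. This is a heavily simplified sketch of the CT-GRU's memory mechanism; in the actual model the write weights come from learned gates and the traces live inside a recurrent cell:

```python
import math

def decay_factors(half_lives, dt):
    # Per-scale exponential decay over an elapsed interval dt:
    # a trace with half-life h retains 0.5 ** (dt / h) of its value.
    return [0.5 ** (dt / h) for h in half_lives]

def store(traces, value, weights, half_lives, dt):
    # Decay every trace according to the time since the last event,
    # then write `value` into each scale in proportion to `weights`
    # (the stand-in here for the CT-GRU's context-dependent gates).
    decayed = [t * d for t, d in zip(traces, decay_factors(half_lives, dt))]
    return [t + w * value for t, w in zip(decayed, weights)]
```

Because events carry real-valued timestamps, the same update handles irregularly spaced inputs: a long gap simply decays fast traces far more than slow ones.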

Inductive Bias, Retrieval

Improving Human-Machine Cooperative Visual Search With Soft Highlighting

no code implementations · 24 Dec 2016 · Ronald T. Kneusel, Michael C. Mozer

We describe a human-machine cooperative approach to visual search, the aim of which is to outperform either human or machine acting alone.

How deep is knowledge tracing?

no code implementations · 14 Mar 2016 · Mohammad Khajah, Robert V. Lindsey, Michael C. Mozer

In theoretical cognitive science, there is a tension between highly structured models whose parameters have a direct psychological interpretation and highly complex, general-purpose models whose parameters and representations are difficult to interpret.

Knowledge Tracing

Learning to Generate Images with Perceptual Similarity Metrics

1 code implementation · 19 Nov 2015 · Jake Snell, Karl Ridgeway, Renjie Liao, Brett D. Roads, Michael C. Mozer, Richard S. Zemel

We propose instead to use a loss function that is better calibrated to human perceptual judgments of image quality: the multiscale structural-similarity score (MS-SSIM).

Image Classification, Image Generation, +3

Automatic Discovery of Cognitive Skills to Improve the Prediction of Student Learning

no code implementations · NeurIPS 2014 · Robert V. Lindsey, Mohammad Khajah, Michael C. Mozer

First, in three of the five datasets, the skills inferred by our technique support significantly improved predictions of student performance over the expert-provided skills.


Optimizing Instructional Policies

no code implementations · NeurIPS 2013 · Robert V. Lindsey, Michael C. Mozer, William J. Huggins, Harold Pashler

For example, in the domain of concept learning, a policy might specify the nature of exemplars chosen over a training sequence.

An Unsupervised Decontamination Procedure For Improving The Reliability Of Human Judgments

no code implementations · NeurIPS 2011 · Michael C. Mozer, Benjamin Link, Harold Pashler

Psychologists have long been struck by individuals' limitations in expressing their internal sensations, impressions, and evaluations via rating scales.

Informativeness, Test

Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

no code implementations · NeurIPS 2009 · Harold Pashler, Nicholas Cepeda, Robert V. Lindsey, Ed Vul, Michael C. Mozer

MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, Shadmehr, 2007), yet MCM is better able to account for human declarative memory.

Sequential effects reflect parallel learning of multiple environmental regularities

no code implementations · NeurIPS 2009 · Matthew Wilder, Matt Jones, Michael C. Mozer

The Dynamic Belief Model (DBM) (Yu & Cohen, 2008) explains sequential effects in 2AFC tasks as a rational consequence of a dynamic internal representation that tracks second-order statistics of the trial sequence (repetition rates) and predicts whether the upcoming trial will be a repetition or an alternation of the previous trial.
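
The core tracked quantity (a running estimate of the repetition rate) can be illustrated with a simple exponentially discounted estimator. This is only a stand-in for the DBM, which uses full Bayesian updating with a change-point prior rather than a fixed decay; the decay value here is illustrative:

```python
def dbm_repetition_estimate(trials, decay=0.9, prior=0.5):
    # Leaky running estimate of P(repetition): for each consecutive
    # pair of trials, nudge the estimate toward 1 on a repetition and
    # toward 0 on an alternation, discounting older evidence.
    p = prior
    for prev, cur in zip(trials, trials[1:]):
        rep = 1.0 if cur == prev else 0.0
        p = decay * p + (1 - decay) * rep
    return p
```

An observer holding such an estimate would predict a repetition when p > 0.5 and an alternation otherwise, producing the sequential speedups and slowdowns seen in 2AFC data.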

Temporal Dynamics of Cognitive Control

no code implementations · NeurIPS 2008 · Jeremy Reynolds, Michael C. Mozer

We show that our model provides a parsimonious account of behavioral and neuroimaging data, and suggest that it offers an elegant conceptualization of control in which behavior can be cast as optimal, subject to limitations on learning and the rate of information processing.

Optimal Response Initiation: Why Recent Experience Matters

no code implementations · NeurIPS 2008 · Matt Jones, Sachiko Kinoshita, Michael C. Mozer

We propose a rationally motivated mathematical model of this sequential adaptation of control, based on a diffusion model of the decision process in which difficulty corresponds to the drift rate for the correct response.
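
A single trial of the underlying diffusion process can be sketched as a noisy random walk that accumulates evidence until it crosses a decision threshold, with difficulty mapped to the drift rate as the abstract describes (threshold, noise, and step-size values here are illustrative, not the paper's fitted parameters):

```python
import math
import random

def diffusion_trial(drift, threshold=1.0, noise=1.0, dt=0.01,
                    seed=None, max_steps=100000):
    # Evidence x accumulates at mean rate `drift` with Gaussian noise
    # until it reaches +threshold (correct response) or -threshold
    # (error). Returns (was_correct, decision_time).
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    for _ in range(max_steps):
        x += drift * dt + noise * math.sqrt(dt) * rng.gauss(0, 1)
        t += dt
        if abs(x) >= threshold:
            return x > 0, t
    return x > 0, t
```

Higher drift (an easier item) yields faster and more often correct decisions, which is the lever the model's sequential adaptation of control acts on.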

Decision Making
