Memorization
391 papers with code • 1 benchmark • 4 datasets
Most implemented papers
mixup: Beyond Empirical Risk Minimization
We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
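mixup trains on convex combinations of pairs of inputs and their labels, which is what discourages memorization of corrupt labels. A minimal sketch of that batch construction (the Beta parameter alpha=0.2 is an illustrative default, not fixed by the paper):

```python
import torch

def mixup_batch(x, y_onehot, alpha=0.2):
    """Return convex combinations of a batch with a shuffled copy of itself.

    lam ~ Beta(alpha, alpha); alpha is a tunable hyperparameter.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```

The mixed batch is then trained on with an ordinary cross-entropy loss against the soft labels.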
Wide & Deep Learning for Recommender Systems
Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort.
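Concretely, the "wide" component is a linear model over sparse cross-product features (memorization) and the "deep" component is an MLP over dense embeddings (generalization); their logits are summed and trained jointly. A minimal sketch, with hypothetical feature and layer sizes:

```python
import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    def __init__(self, n_wide, n_sparse, emb_dim=8, hidden=64):
        super().__init__()
        self.wide = nn.Linear(n_wide, 1)            # linear over cross-product features
        self.emb = nn.Embedding(n_sparse, emb_dim)  # embeddings for sparse ids
        self.deep = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, wide_x, sparse_ids):
        # Joint training: the logits of both components are added before the sigmoid.
        return torch.sigmoid(self.wide(wide_x) + self.deep(self.emb(sparse_ids)))
```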
Neural Machine Translation in Linear Time
The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence.
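Both halves are stacks of dilated 1-D convolutions; the decoder's convolutions are masked (causal) so each position only sees earlier target tokens. A rough sketch of one causal dilated layer (channel count and kernel size are assumptions, and the full model stacks many such layers with growing dilation):

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedConv1d(nn.Module):
    """Dilated 1-D convolution that only looks at current and past positions."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-pad so the conv is causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))
```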
PaLM: Scaling Language Modeling with Pathways
To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated Transformer language model, which we call Pathways Language Model (PaLM).
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
In this paper we propose to study generalization of neural networks on small algorithmically generated datasets.
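The datasets in question are exhaustive tables of a binary operation over a small modulus, randomly split into train and validation sets. A sketch of generating one such dataset (modular division and a 50% split are one of the paper's example settings; treat the details as assumptions):

```python
import random

def modular_division_dataset(p=97, train_frac=0.5, seed=0):
    """All equations a / b = c (mod p), i.e. c = a * b^-1 (mod p), with b != 0."""
    pairs = [(a, b) for a in range(p) for b in range(1, p)]
    data = [((a, b), (a * pow(b, -1, p)) % p) for a, b in pairs]
    random.Random(seed).shuffle(data)
    cut = int(train_frac * len(data))
    return data[:cut], data[cut:]
```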
Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels
Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they will eventually memorize the noisy labels entirely during training.
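Co-teaching counters this by training two networks side by side: each selects the small-loss fraction of every mini-batch (treated as likely clean) and its peer updates on that subset. A minimal sketch of one such update (the keep-ratio schedule is simplified to a single argument; treat it as an assumption):

```python
import torch
import torch.nn.functional as F

def coteach_step(net_a, net_b, opt_a, opt_b, x, y, keep_ratio):
    """One co-teaching update: each net picks small-loss samples for its peer."""
    loss_a = F.cross_entropy(net_a(x), y, reduction="none")
    loss_b = F.cross_entropy(net_b(x), y, reduction="none")
    k = int(keep_ratio * len(y))
    idx_a = loss_a.argsort()[:k]  # samples net A considers clean
    idx_b = loss_b.argsort()[:k]  # samples net B considers clean

    opt_a.zero_grad()
    F.cross_entropy(net_a(x[idx_b]), y[idx_b]).backward()  # A learns from B's picks
    opt_a.step()

    opt_b.zero_grad()
    F.cross_entropy(net_b(x[idx_a]), y[idx_a]).backward()  # B learns from A's picks
    opt_b.step()
```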
Generalization through Memorization: Nearest Neighbor Language Models
Applying this augmentation to a strong Wikitext-103 LM, with neighbors drawn from the original training set, our $k$NN-LM achieves a new state-of-the-art perplexity of 15.79, a 2.9-point improvement with no additional training.
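The augmentation interpolates the base model's next-token distribution with a distribution induced by the k nearest neighbors of the current context embedding in a datastore of (context, next-token) pairs. A minimal numpy sketch following that general recipe (the squared-L2 distance kernel and the values of k and lam are assumptions):

```python
import numpy as np

def knn_lm_probs(p_lm, query, keys, values, vocab_size, k=8, lam=0.25):
    """Interpolate LM probabilities with a kNN distribution over a datastore.

    keys:   (N, d) context embeddings from the training set
    values: (N,)   the token that followed each stored context
    """
    d2 = ((keys - query) ** 2).sum(axis=1)  # squared L2 distances to all keys
    nn = np.argsort(d2)[:k]
    w = np.exp(-d2[nn])                     # softmax over negative distance
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], w)         # aggregate neighbor weight per token
    return lam * p_knn + (1 - lam) * p_lm
```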
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
DAS3H: Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills
In this article, we first frame the research problem of optimizing an adaptive and personalized spaced repetition scheduler when memorization involves the application of multiple underlying skills.
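DAS3H scores recall with a logistic model whose features include, per skill and per time window, log-scaled counts of past attempts and successes. A simplified sketch of that feature construction (the window boundaries and exact parameterization here are assumptions, not the paper's verbatim specification):

```python
import math

WINDOWS = [1, 7, 30]  # days; illustrative choice, not the paper's exact windows

def das3h_features(history, skills, now):
    """history: list of (timestamp_days, skill, correct) events for one student."""
    feats = {}
    for skill in skills:
        for w in WINDOWS:
            attempts = sum(1 for t, s, _ in history if s == skill and now - t <= w)
            wins = sum(1 for t, s, ok in history if s == skill and now - t <= w and ok)
            feats[(skill, w, "attempts")] = math.log(1 + attempts)
            feats[(skill, w, "wins")] = math.log(1 + wins)
    return feats  # fed, together with student/item parameters, into a logistic model
```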
Learning explanations that are hard to vary
In this paper, we investigate the principle that 'good explanations are hard to vary' in the context of deep learning.
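Operationally, the paper turns this principle into a gradient rule: keep only those gradient components whose sign agrees across training environments (a logical AND of per-environment evidence) and zero the rest. A minimal sketch of that masking (requiring unanimous agreement is one choice; the paper also allows partial agreement thresholds):

```python
import torch

def and_mask_gradient(env_grads):
    """env_grads: list of per-environment gradient tensors of the same shape.

    A component survives only if its sign is identical (and nonzero) in
    every environment; everything else is zeroed out.
    """
    signs = torch.stack([g.sign() for g in env_grads])
    mask = signs.sum(dim=0).abs().eq(len(env_grads))  # unanimous sign agreement
    return torch.stack(env_grads).mean(dim=0) * mask
```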