155 papers with code • 0 benchmarks • 3 datasets

Most implemented papers

mixup: Beyond Empirical Risk Minimization

facebookresearch/mixup-cifar10 ICLR 2018

We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
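
The excerpt above lists the effects of mixup but not the operation itself. As a minimal sketch (not the code from the listed repository): mixup trains on convex combinations of pairs of examples and their labels, with the mixing weight drawn from a Beta(α, α) distribution. The function name `mixup_pair`, the list-based inputs, and the default `alpha` are assumptions made for illustration.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Blend two examples and their one-hot labels (hypothetical helper).

    x_i, x_j: feature vectors as equal-length lists of floats.
    y_i, y_j: one-hot (or soft) label vectors as equal-length lists.
    alpha:    Beta(alpha, alpha) parameter; the value here is an assumption.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam


# Usage: mix a class-0 example with a class-1 example.
x_mix, y_mix, lam = mixup_pair(
    [1.0, 0.0], [1.0, 0.0],
    [0.0, 1.0], [0.0, 1.0],
    alpha=0.4, rng=random.Random(0),
)
```

The mixed label stays a valid distribution because both inputs sum to 1 and the weights `lam` and `1 - lam` sum to 1.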

Wide & Deep Learning for Recommender Systems

microsoft/recommenders 24 Jun 2016

Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort.
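
To make "cross-product feature transformation" concrete, here is a small sketch for binary features: the crossed feature is 1 only when every constituent feature is 1, i.e. a logical AND. The function name `cross_product` and the dictionary encoding of features are assumptions for illustration, not the API of the listed repository.

```python
def cross_product(features, crossed_names):
    """Return 1 if every named binary feature is active, else 0.

    features:      dict mapping feature name -> 0 or 1 (hypothetical encoding).
    crossed_names: names of the features combined into one crossed feature.
    """
    return int(all(features.get(name, 0) == 1 for name in crossed_names))


# Usage: one user with three one-hot encoded attributes.
user = {"gender=female": 1, "language=en": 1, "language=es": 0}

female_and_en = cross_product(user, ["gender=female", "language=en"])
female_and_es = cross_product(user, ["gender=female", "language=es"])
```

A linear model over such crossed features can memorize that a specific combination co-occurs with the label, which is the "wide" half of the model described in the excerpt.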

Neural Machine Translation in Linear Time

paarthneekhara/byteNet-tensorflow 31 Oct 2016

The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence.

Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels

bhanML/Co-teaching NeurIPS 2018

Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training.

Generalization through Memorization: Nearest Neighbor Language Models

urvashik/knnlm ICLR 2020

Applying this augmentation to a strong Wikitext-103 LM, with neighbors drawn from the original training set, our $k$NN-LM achieves a new state-of-the-art perplexity of 15.79 - a 2.9 point improvement with no additional training.
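
The augmentation referred to above interpolates the language model's next-token distribution with a distribution built from retrieved nearest neighbors. The following is a minimal sketch of that interpolation, assuming the neighbors have already been retrieved as (distance, target token id) pairs; the function names, the toy vocabulary of size 3, and the value of `lam` are illustrative assumptions, not the listed repository's code.

```python
import math


def knn_distribution(neighbors, vocab_size):
    """Turn retrieved neighbors into a distribution over the vocabulary.

    neighbors: non-empty list of (distance, target_token_id) pairs.
    Each neighbor contributes weight exp(-distance) to its target token.
    """
    d_min = min(d for d, _ in neighbors)  # shift distances for numerical stability
    weights = [math.exp(-(d - d_min)) for d, _ in neighbors]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for w, (_, token) in zip(weights, neighbors):
        probs[token] += w / total
    return probs


def interpolate(p_lm, p_knn, lam):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_lm."""
    return [lam * k + (1.0 - lam) * m for k, m in zip(p_knn, p_lm)]


# Usage with a toy 3-token vocabulary.
p_lm = [0.2, 0.3, 0.5]
p_knn = knn_distribution([(1.0, 2), (1.0, 0)], vocab_size=3)
p_final = interpolate(p_lm, p_knn, lam=0.25)
```

Because only the retrieval store and the mixing weight are involved, the base model's parameters are left unchanged, which is consistent with the "no additional training" claim in the excerpt.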

PaLM: Scaling Language Modeling with Pathways

lucidrains/CoCa-pytorch Google Research 2022

To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).

Associative Long Short-Term Memory

mohammadpz/Associative_LSTM 9 Feb 2016

We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters.

How does Disagreement Help Generalization against Label Corruption?

xingruiyu/coteaching_plus 14 Jan 2019

Learning with noisy labels is one of the hottest problems in weakly-supervised learning.

DAS3H: Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills

BenoitChoffin/das3h 14 May 2019

In this article, we first frame the research problem of optimizing an adaptive and personalized spaced repetition scheduler when memorization concerns the application of multiple underlying skills.

Searching to Exploit Memorization Effect in Learning from Corrupted Labels

bhanML/Co-teaching 6 Nov 2019

Sample selection approaches are popular in robust learning from noisy labels.