Pushing the bounds of dropout

We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling...
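As background for the abstract above, here is a minimal sketch of standard (inverted) dropout, the training procedure the paper reinterprets as concurrent MAP estimation over a family of conditional models. This is not the paper's method, only the baseline layer it builds on; the function name and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, train=True):
    """Inverted dropout: zero each unit with probability p during training
    and rescale the survivors by 1/(1-p), so the expected activation is
    unchanged. At test time the layer is the identity, i.e. the commonly
    "picked" member of the family of subnetworks averaged over masks."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

x = np.ones(8)
y_train = dropout(x, p=0.5, train=True)   # stochastic: entries are 0.0 or 2.0
y_test = dropout(x, p=0.5, train=False)   # deterministic: identical to x
```

Each random mask defines one conditional model from the family; inverted rescaling keeps the test-time identity layer consistent with the training-time expectation.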
