24 papers with code • 1 benchmark • 1 dataset

Most implemented papers

Mastering 2048 with Delayed Temporal Coherence Learning, Multi-Stage Weight Promotion, Redundant Encoding and Carousel Shaping

aszczepanski/2048 18 Apr 2016

With the aim to develop a strong 2048 playing program, we employ temporal difference learning with systematic n-tuple networks.

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

ofirpress/attention_with_linear_biases ICLR 2022

Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training?

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

facebookresearch/fairscale ICLR 2021

Neural network scaling has been critical for improving the model quality in many real-world machine learning applications with vast amounts of training data and compute.

ImageNet Training in Minutes

fuentesdt/livermask 14 Sep 2017

If we can make full use of the supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in one minute.

Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes

deepsense-ai/Distributed-BA3C 9 Jan 2018

We present a study in Distributed Deep Reinforcement Learning (DDRL) focused on the scalability of a state-of-the-art Deep Reinforcement Learning algorithm known as Batch Asynchronous Advantage Actor-Critic (BA3C).

Improving Electron Micrograph Signal-to-Noise with an Atrous Convolutional Encoder-Decoder

Jeffrey-Ede/Electron-Micrograph-Denoiser 30 Jul 2018

Our neural network was trained end-to-end to remove Poisson noise applied to low-dose ($\ll$ 300 counts ppx) micrographs created from a new dataset of 17267 2048$\times$2048 high-dose ($>$ 2500 counts ppx) micrographs and then fine-tuned for ordinary doses (200-2500 counts ppx).

Genetic Algorithm-based Polar Code Construction for the AWGN Channel

AhmedElkelesh/Genetic-Algorithm-based-Polar-Code-Construction 19 Jan 2019

We propose a new polar code construction framework (i.e., selecting the frozen bit positions) for the additive white Gaussian noise (AWGN) channel, tailored to a given decoding algorithm, rather than based on the (not necessarily optimal) assumption of successive cancellation (SC) decoding.

Decoder-tailored Polar Code Design Using the Genetic Algorithm

AhmedElkelesh/Genetic-Algorithm-based-Polar-Code-Construction 28 Jan 2019

We propose a new framework for constructing polar codes (i.e., selecting the frozen bit positions) for arbitrary channels, tailored to a given decoding algorithm, rather than based on the (not necessarily optimal) assumption of successive cancellation (SC) decoding.

Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping

wushidonguc/distributed_ffn 13 May 2019

Mapping all the neurons in the brain requires automatic reconstruction of entire cells from volume electron microscopy data.

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning


Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters.