# 2048

24 papers with code • 1 benchmark • 1 dataset

## Most implemented papers

# Mastering 2048 with Delayed Temporal Coherence Learning, Multi-Stage Weight Promotion, Redundant Encoding and Carousel Shaping

To develop a strong 2048-playing program, we employ temporal difference learning with systematic n-tuple networks.
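As a rough illustration of the idea named in this abstract, the sketch below shows a plain TD(0) update on an n-tuple network for a 4x4 board. It is not the paper's method: the tuple layout (rows and columns), the class name `NTupleNetwork`, the learning rate, and the board encoding (16 tile exponents) are all assumptions made for the example, and none of the title's refinements (delayed temporal coherence learning, weight promotion, redundant encoding, carousel shaping) are included.

```python
from collections import defaultdict

# Hypothetical tuple layout: the four rows and four columns of a 4x4 board.
# Cells are indexed 0..15 in row-major order.
TUPLES = [tuple(range(r * 4, r * 4 + 4)) for r in range(4)] + \
         [tuple(range(c, 16, 4)) for c in range(4)]


class NTupleNetwork:
    """Value function stored as one lookup table per tuple (illustrative only)."""

    def __init__(self, tuples=TUPLES):
        self.tuples = tuples
        # Each table maps the tile exponents seen on a tuple's cells to a weight.
        self.tables = [defaultdict(float) for _ in tuples]

    def _keys(self, board):
        # board is a flat list of 16 tile exponents (0 = empty, 1 = tile 2, ...).
        return [tuple(board[i] for i in cells) for cells in self.tuples]

    def value(self, board):
        # The state value is the sum of the weights selected by every tuple.
        return sum(t[k] for t, k in zip(self.tables, self._keys(board)))

    def td_update(self, board, reward, next_board, alpha=0.1):
        # TD(0) error: reward plus next-state value minus current value.
        error = reward + self.value(next_board) - self.value(board)
        # Spread the correction evenly over the active weights.
        step = alpha * error / len(self.tuples)
        for table, key in zip(self.tables, self._keys(board)):
            table[key] += step
        return error
```

A call such as `NTupleNetwork().td_update(board, reward, next_board)` would nudge the weights for `board` toward `reward + value(next_board)`; a real player would also need move generation and a training loop, which are omitted here.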

# Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training?

# GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Neural network scaling has been critical for improving the model quality in many real-world machine learning applications with vast amounts of training data and compute.

# ImageNet Training in Minutes

If we can make full use of the supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in one minute.

# Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes

We present a study in Distributed Deep Reinforcement Learning (DDRL) focused on the scalability of a state-of-the-art Deep Reinforcement Learning algorithm known as Batch Asynchronous Advantage Actor-Critic (BA3C).

# Improving Electron Micrograph Signal-to-Noise with an Atrous Convolutional Encoder-Decoder

Our neural network was trained end-to-end to remove Poisson noise applied to low-dose ($\ll$ 300 counts ppx) micrographs created from a new dataset of 17267 2048$\times$2048 high-dose ($>$ 2500 counts ppx) micrographs and then fine-tuned for ordinary doses (200-2500 counts ppx).
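To make the data-generation step in this abstract concrete, here is a minimal sketch of how a low-dose image might be simulated from a high-dose one by rescaling the counts and sampling Poisson noise. The function names, the target mean, and the use of Knuth's sampling method are assumptions for illustration; the abstract does not say how the authors implemented this step.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson sample with Knuth's method.

    Suitable only for modest means: exp(-lam) underflows to zero
    once lam is above roughly 700.
    """
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_low_dose(image, target_mean, rng):
    """Rescale a high-dose image to target_mean counts per pixel, then add Poisson noise.

    image is a list of rows of non-negative counts (hypothetical format).
    """
    pixels = [v for row in image for v in row]
    scale = target_mean / (sum(pixels) / len(pixels))
    return [[poisson_sample(v * scale, rng) for v in row] for row in image]


rng = random.Random(0)                       # fixed seed for repeatability
high_dose = [[3000.0] * 8 for _ in range(8)]  # toy 8x8 stand-in for a micrograph
low_dose = simulate_low_dose(high_dose, 50.0, rng)
```

A denoising network would then be trained to map images like `low_dose` back to their high-dose counterparts; for the 2048$\times$2048 images mentioned above, a vectorized sampler would be a more practical choice than this pure-Python loop.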

# Genetic Algorithm-based Polar Code Construction for the AWGN Channel

We propose a new polar code construction framework (i.e., selecting the frozen bit positions) for the additive white Gaussian noise (AWGN) channel, tailored to a given decoding algorithm, rather than based on the (not necessarily optimal) assumption of successive cancellation (SC) decoding.

# Decoder-tailored Polar Code Design Using the Genetic Algorithm

We propose a new framework for constructing polar codes (i.e., selecting the frozen bit positions) for arbitrary channels, tailored to a given decoding algorithm, rather than based on the (not necessarily optimal) assumption of successive cancellation (SC) decoding.

# Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping

Mapping all the neurons in the brain requires automatic reconstruction of entire cells from volume electron microscopy data.

# MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning

Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters.