Playing the Game of 2048
24 papers with code • 1 benchmark • 1 dataset
Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time to sequences that are longer than the ones it saw during training?
Mastering 2048 with Delayed Temporal Coherence Learning, Multi-Stage Weight Promotion, Redundant Encoding and Carousel Shaping
With the aim of developing a strong 2048-playing program, we employ temporal difference learning with systematic n-tuple networks.
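For readers unfamiliar with the technique, the sketch below shows the core of TD(0) learning with an n-tuple network: each tuple indexes a small lookup table by the tile exponents it covers, the board value is the sum of the selected entries, and the TD error nudges those entries. The tuple layout and learning rate are illustrative assumptions, not the paper's configuration.

```python
# Board: flat list of 16 cells holding tile exponents (0 = empty, 1 = 2, 2 = 4, ...).
TUPLES = [
    (0, 1, 2, 3),    # first row
    (4, 5, 6, 7),    # second row
    (0, 1, 4, 5),    # top-left 2x2 square
    (1, 2, 5, 6),    # neighbouring 2x2 square
]
ALPHA = 0.0025  # assumed learning rate
weights = [[0.0] * (16 ** len(t)) for t in TUPLES]  # one lookup table per tuple

def tuple_index(board, positions):
    """Pack the exponents at `positions` into a single base-16 table index."""
    idx = 0
    for p in positions:
        idx = idx * 16 + board[p]
    return idx

def value(board):
    """V(s): sum of the table entries selected by each tuple."""
    return sum(w[tuple_index(board, t)] for w, t in zip(weights, TUPLES))

def td_update(board, reward, next_board):
    """TD(0): move every active weight toward the target r + V(s')."""
    delta = reward + value(next_board) - value(board)
    for w, t in zip(weights, TUPLES):
        w[tuple_index(board, t)] += ALPHA * delta
```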
Neural network scaling has been critical for improving model quality in many real-world machine learning applications with vast amounts of training data and compute.
We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.
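As a rough illustration of the long-/short-term memory idea (not LSTR's actual architecture), the sketch below keeps a short FIFO of recent frame features, spills evicted features into a bounded long-term buffer, and lets the short-term tokens cross-attend over the full history; all sizes, the class count, and the single attention layer are assumptions.

```python
import torch
import torch.nn as nn

class LongShortMemory(nn.Module):
    """Minimal sketch of a long-/short-term memory mechanism for online
    action detection, loosely inspired by LSTR."""
    def __init__(self, dim=256, long_len=512, short_len=32, num_classes=21):
        super().__init__()
        self.long_len, self.short_len = long_len, short_len
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)
        self.long_mem, self.short_mem = [], []

    def step(self, frame_feat):
        # frame_feat: (1, dim) feature of the newest video frame
        self.short_mem.append(frame_feat)
        if len(self.short_mem) > self.short_len:
            # evict the oldest short-term token into the bounded long-term store
            self.long_mem.append(self.short_mem.pop(0))
            self.long_mem = self.long_mem[-self.long_len:]
        short = torch.stack(self.short_mem, dim=1)                    # (1, S, dim)
        context = torch.stack(self.long_mem + self.short_mem, dim=1)  # (1, L+S, dim)
        out, _ = self.attn(short, context, context)  # short tokens query full history
        return self.classifier(out[:, -1])           # logits for the current frame

model = LongShortMemory()
logits = model.step(torch.randn(1, 256))  # one online step per incoming frame
```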
We present a study in Distributed Deep Reinforcement Learning (DDRL) focused on the scalability of a state-of-the-art Deep Reinforcement Learning algorithm known as Batch Asynchronous Advantage Actor-Critic (BA3C).
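BA3C belongs to the A3C family; as a generic reference point (not the paper's distributed implementation), the following sketch computes the batched advantage actor-critic objective that such agents optimize on each rollout batch. The coefficients are conventional assumed values.

```python
import torch

def a2c_loss(logits, values, actions, returns, entropy_coef=0.01):
    """Advantage actor-critic loss on one batch: policy gradient weighted by
    the advantage, a value regression term, and an entropy bonus."""
    advantages = returns - values.detach()            # stop gradient through V(s)
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    entropy = -(log_probs.exp() * log_probs).sum(dim=1).mean()
    return policy_loss + 0.5 * value_loss - entropy_coef * entropy
```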
Our neural network was trained end-to-end to remove Poisson noise applied to low-dose (≪ 300 counts ppx) micrographs created from a new dataset of 17267 2048×2048 high-dose (> 2500 counts ppx) micrographs and then fine-tuned for ordinary doses (200-2500 counts ppx).
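Such training pairs are typically built by synthetically degrading the high-dose micrographs. The sketch below shows one plausible way to do this with per-pixel Poisson sampling; it is a generic illustration, not the dataset's exact pipeline, and the dose parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_low_dose(clean_counts, target_dose, clean_dose=2500.0):
    """Turn a high-dose micrograph (expected counts per pixel) into a
    synthetic low-dose one: rescale the expected dose, draw per-pixel
    Poisson counts, then rescale back so intensities stay comparable."""
    rate = np.clip(clean_counts, 0.0, None) * (target_dose / clean_dose)
    noisy = rng.poisson(rate).astype(np.float64)
    return noisy * (clean_dose / target_dose)

# Example: degrade a 2048x2048 high-dose micrograph to ~200 counts per pixel.
high_dose = rng.uniform(1500.0, 2500.0, size=(2048, 2048))
low_dose = simulate_low_dose(high_dose, target_dose=200.0)
```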
We propose a new framework for constructing polar codes (i.e., selecting the frozen bit positions) for arbitrary channels, including the additive white Gaussian noise (AWGN) channel, tailored to a given decoding algorithm rather than based on the (not necessarily optimal) assumption of successive cancellation (SC) decoding.
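For contrast, the conventional reliability-based construction that such decoder-tailored methods improve on can be sketched in a few lines: polarize a design-channel Bhattacharyya parameter and freeze the least reliable positions. The recursion below is exact only for the binary erasure channel and the design parameter is an assumption; it is a baseline illustration, not the paper's method.

```python
def bhattacharyya_reliabilities(n_bits, design_z=0.5):
    """Per-channel Bhattacharyya parameters for a length-2^n polar code via
    the BEC polarization recursion: Z(W-) = 2Z - Z^2, Z(W+) = Z^2."""
    z = [design_z]
    for _ in range(n_bits):
        z = [v for pair in ((2 * x - x * x, x * x) for x in z) for v in pair]
    return z

def select_frozen(n_bits, k):
    """Freeze the N - k least reliable (largest-Z) bit positions."""
    z = bhattacharyya_reliabilities(n_bits)
    order = sorted(range(len(z)), key=lambda i: z[i], reverse=True)
    return sorted(order[: (1 << n_bits) - k])

# Example: frozen set for a rate-1/2 length-2048 polar code.
frozen = select_frozen(11, 1024)
```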