NetHack
16 papers with code • 0 benchmarks • 0 datasets
Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details.
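The protocol above (mean in-game score over many episodes, each seeded with a seed never used during training) can be sketched as follows. This is a minimal illustration: `DummyEnv`, `make_env`, and `mean_score` are invented stand-ins, not the actual NetHack Learning Environment API.

```python
import random

class DummyEnv:
    """Illustrative stand-in for a seeded NetHack environment (not the real NLE API)."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return "obs"

    def step(self, action):
        self.steps += 1
        reward = self.rng.randint(0, 10)  # stand-in for in-game score delta
        done = self.steps >= 5            # stand-in for episode termination
        return "obs", reward, done

def mean_score(policy, make_env, eval_seeds):
    """Average total episode score over seeds held out from training."""
    scores = []
    for seed in eval_seeds:
        env = make_env(seed)
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        scores.append(total)
    return sum(scores) / len(scores)

# e.g. evaluate a trivial constant policy on 1000 held-out seeds
avg = mean_score(lambda obs: 0, DummyEnv, eval_seeds=range(1000))
```

The key point is that `eval_seeds` is disjoint from the training seeds, so the reported mean measures generalization rather than memorized dungeon layouts.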
Benchmarks
These leaderboards are used to track progress in NetHack
Libraries
Use these libraries to find NetHack models and implementations
Latest papers
Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents
In contrast, agents tested in dynamic robot environments face limitations due to simplistic environments with only a few objects and interactions.
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning
Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen.
diff History for Neural Language Agents
On NetHack, an unsolved video game that requires long-horizon reasoning for decision-making, LMs tuned with diff history match state-of-the-art performance for neural agents while needing 1800x fewer training examples compared to prior work.
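The core idea described here is to feed the language model textual diffs between consecutive observations rather than each full observation. A minimal illustration using Python's `difflib` (the observation strings are invented examples, not real NetHack output, and the paper's actual preprocessing may differ):

```python
import difflib

def obs_diff(prev_obs: str, curr_obs: str) -> str:
    """Unified diff between two consecutive text observations."""
    return "\n".join(
        difflib.unified_diff(prev_obs.splitlines(), curr_obs.splitlines(), lineterm="")
    )

prev = "You see here a door.\nHP: 12/12"
curr = "The door opens.\nHP: 12/12"
diff_text = obs_diff(prev, curr)
```

Because most of a NetHack screen is unchanged between steps, the diff is far shorter than the full observation, which is what allows a long interaction history to fit in the model's context.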
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging.
LuckyMera: a Modular AI Framework for Building Hybrid NetHack Agents
In the last few decades we have witnessed a significant development in Artificial Intelligence (AI) thanks to the availability of a variety of testbeds, mostly based on simulated environments and video games.
Katakomba: Tools and Benchmarks for Data-Driven NetHack
NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions.
Dungeons and Data: A Large-Scale NetHack Dataset
Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA, have relied on both simulated environments and large-scale datasets.
Improving Policy Learning via Language Dynamics Distillation
Recent work has shown that augmenting environments with language descriptions improves policy learning.
Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning
In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards.
Insights From the NeurIPS 2021 NetHack Challenge
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge.