| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| Atari Games | Atari 2600 Gravitar | RND | Score | 3906 | #11 |
| Atari Games | Atari 2600 Montezuma's Revenge | RND | Score | 8152 | #5 |
| Atari Games | Atari 2600 Pitfall! | RND | Score | -3 | #20 |
| Atari Games | Atari 2600 Private Eye | RND | Score | 8666 | #11 |
| Atari Games | Atari 2600 Solaris | RND | Score | 3282 | #17 |
| Atari Games | Atari 2600 Venture | RND | Score | 1859 | #9 |
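For context on the model behind these rows: RND (Random Network Distillation, Burda et al. 2018) drives exploration with an intrinsic reward equal to the prediction error between a frozen, randomly initialized target network and a trained predictor network. The sketch below is a minimal illustration of that mechanism in PyTorch; the MLP sizes, embedding dimension, optimizer settings, and `obs_dim` are illustrative assumptions, not the paper's configuration (which used convolutional encoders on Atari frames and normalized the intrinsic reward).

```python
import torch
import torch.nn as nn

class RND(nn.Module):
    """Intrinsic reward = prediction error against a frozen random network."""

    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        # Hypothetical small MLPs; the original paper used CNN encoders.
        make = lambda: nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim)
        )
        self.target = make()      # frozen random embedding of observations
        self.predictor = make()   # trained to imitate the target
        for p in self.target.parameters():
            p.requires_grad_(False)

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # High error on rarely visited states -> exploration bonus.
        with torch.no_grad():
            target_feat = self.target(obs)
        return (self.predictor(obs) - target_feat).pow(2).mean(dim=-1)

# Usage: the same prediction error doubles as the predictor's training loss,
# so the bonus shrinks on states the agent has already visited often.
rnd = RND(obs_dim=8)                      # obs_dim=8 is a stand-in value
opt = torch.optim.Adam(rnd.predictor.parameters(), lr=1e-4)
obs = torch.randn(32, 8)                  # placeholder observation batch
loss = rnd.intrinsic_reward(obs).mean()
opt.zero_grad(); loss.backward(); opt.step()
```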
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| Unsupervised Reinforcement Learning | URLB (pixels, 10^5 frames) | RND | Walker (mean normalized return) | 23.87±10.21 | #3 |
| Unsupervised Reinforcement Learning | URLB (pixels, 10^5 frames) | RND | Quadruped (mean normalized return) | 24.37±8.70 | #4 |
| Unsupervised Reinforcement Learning | URLB (pixels, 10^5 frames) | RND | Jaco (mean normalized return) | 26.22±4.83 | #1 |
| Unsupervised Reinforcement Learning | URLB (pixels, 10^5 frames) | APT | Walker (mean normalized return) | 7.71±7.39 | #7 |
| Unsupervised Reinforcement Learning | URLB (pixels, 10^5 frames) | APT | Quadruped (mean normalized return) | 21.22±5.14 | #7 |
| Unsupervised Reinforcement Learning | URLB (pixels, 10^5 frames) | APT | Jaco (mean normalized return) | 0.37±0.64 | #9 |
| Unsupervised Reinforcement Learning | URLB (pixels, 5×10^5 frames) | RND | Walker (mean normalized return) | 25.44±9.92 | #4 |
| Unsupervised Reinforcement Learning | URLB (pixels, 5×10^5 frames) | RND | Quadruped (mean normalized return) | 36.02±10.27 | #1 |
| Unsupervised Reinforcement Learning | URLB (pixels, 5×10^5 frames) | RND | Jaco (mean normalized return) | 26.62±2.75 | #2 |
| Unsupervised Reinforcement Learning | URLB (pixels, 10^6 frames) | RND | Walker (mean normalized return) | 30.46±14.18 | #4 |
| Unsupervised Reinforcement Learning | URLB (pixels, 10^6 frames) | RND | Quadruped (mean normalized return) | 41.89±11.72 | #1 |
| Unsupervised Reinforcement Learning | URLB (pixels, 10^6 frames) | RND | Jaco (mean normalized return) | 24.38±3.92 | #3 |
| Unsupervised Reinforcement Learning | URLB (pixels, 2×10^6 frames) | RND | Walker (mean normalized return) | 32.80±13.19 | #3 |
| Unsupervised Reinforcement Learning | URLB (pixels, 2×10^6 frames) | RND | Quadruped (mean normalized return) | 42.57±11.65 | #2 |
| Unsupervised Reinforcement Learning | URLB (pixels, 2×10^6 frames) | RND | Jaco (mean normalized return) | 27.51±7.12 | #4 |
| Unsupervised Reinforcement Learning | URLB (states, 10^5 frames) | RND | Walker (mean normalized return) | 82.57±31.22 | #2 |
| Unsupervised Reinforcement Learning | URLB (states, 10^5 frames) | RND | Quadruped (mean normalized return) | 35.34±11.16 | #3 |
| Unsupervised Reinforcement Learning | URLB (states, 10^5 frames) | RND | Jaco (mean normalized return) | 72.84±6.87 | #2 |
| Unsupervised Reinforcement Learning | URLB (states, 5×10^5 frames) | RND | Walker (mean normalized return) | 87.15±27.65 | #1 |
| Unsupervised Reinforcement Learning | URLB (states, 5×10^5 frames) | RND | Quadruped (mean normalized return) | 59.90±12.95 | #1 |
| Unsupervised Reinforcement Learning | URLB (states, 5×10^5 frames) | RND | Jaco (mean normalized return) | 65.08±5.45 | #3 |
| Unsupervised Reinforcement Learning | URLB (states, 10^6 frames) | RND | Walker (mean normalized return) | 84.93±29.64 | #1 |
| Unsupervised Reinforcement Learning | URLB (states, 10^6 frames) | RND | Quadruped (mean normalized return) | 69.12±11.95 | #2 |
| Unsupervised Reinforcement Learning | URLB (states, 10^6 frames) | RND | Jaco (mean normalized return) | 60.68±8.49 | #4 |
| Unsupervised Reinforcement Learning | URLB (states, 2×10^6 frames) | RND | Walker (mean normalized return) | 79.28±30.91 | #2 |
| Unsupervised Reinforcement Learning | URLB (states, 2×10^6 frames) | RND | Quadruped (mean normalized return) | 75.14±16.23 | #3 |
| Unsupervised Reinforcement Learning | URLB (states, 2×10^6 frames) | RND | Jaco (mean normalized return) | 56.05±8.73 | #5 |
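The URLB metric values above are mean normalized returns, reported with standard deviations across runs. Our assumption about the convention (per the URLB benchmark, Laskin et al. 2021): each downstream task's return is divided by that of a supervised expert trained with extrinsic reward on the same task, so a value near 100 means expert parity. The sketch below illustrates that arithmetic; the returns and the expert score are placeholder numbers, not published benchmark constants.

```python
def mean_normalized_return(returns: list[float], expert_return: float) -> float:
    """Mean of per-run returns normalized by an expert reference score.

    Assumes URLB-style normalization: agent return / expert return on the
    same task, averaged over runs and scaled to a 0-100 range.
    """
    return 100.0 * sum(r / expert_return for r in returns) / len(returns)

# Hypothetical example: three evaluation seeds on one Walker task.
seed_returns = [450.0, 520.0, 390.0]   # illustrative raw episodic returns
expert = 950.0                          # placeholder expert return
print(f"{mean_normalized_return(seed_returns, expert):.2f}")  # 47.72
```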