2 code implementations • 1 Mar 2019 • Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski
We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting.
Ranked #12 on the Atari Games 100k task (Atari 100k benchmark)
1 code implementation • 12 Feb 2021 • Piotr Kozakowski, Mikołaj Pacek, Piotr Miłoś
We present Adaptive Entropy Tree Search (ANTS), a novel algorithm combining planning and learning in the maximum entropy paradigm.
1 code implementation • 19 Dec 2019 • Piotr Miłoś, Łukasz Kuciński, Konrad Czechowski, Piotr Kozakowski, Maciek Klimek
The former manifests itself through the use of a value function, while the latter is powered by a tree search planner.
1 code implementation • NeurIPS Workshop LMCA 2020 • Piotr Kozakowski, Piotr Januszewski, Konrad Czechowski, Łukasz Kuciński, Piotr Miłoś
Planning in large state spaces inevitably needs to balance depth and breadth of the search.
1 code implementation • 25 Sep 2019 • Piotr Miłoś, Łukasz Kuciński, Konrad Czechowski, Piotr Kozakowski, Maciej Klimek
Notably, our method performs well in environments with sparse rewards where standard $TD(1)$ backups fail.
1 code implementation • 12 Feb 2021 • Piotr Kozakowski, Łukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, Katarzyna Kańska
QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, including in the offline setting, but has low sample efficiency and struggles with high-dimensional observation spaces.
no code implementations • ICLR 2020 • Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłos, Błażej Osiński, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski
We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting.
no code implementations • 25 Sep 2019 • Piotr Kozakowski, Łukasz Kaiser, Afroz Mohiuddin
Concretely, we introduce a forecasting model that, given a hyperparameter schedule (e.g., learning rate, weight decay) and a history of training observations (such as loss and accuracy), predicts how the training will continue.
no code implementations • 11 Oct 2023 • Mikołaj Sacha, Michał Sadowski, Piotr Kozakowski, Ruard van Workum, Stanisław Jastrzębski
Retrosynthesis involves determining a sequence of reactions to synthesize complex molecules from simpler precursors.
no code implementations • 8 Mar 2024 • Łukasz Kuciński, Witold Drzewakowski, Mateusz Olko, Piotr Kozakowski, Łukasz Maziarka, Marta Emilia Nowakowska, Łukasz Kaiser, Piotr Miłoś
Time series methods are of fundamental importance in virtually any field of science that deals with temporally structured data.