Search Results for author: Bilal Kartal

Found 11 papers, 1 paper with code

Llama-Nemotron: Efficient Reasoning Models

no code implementations • 2 May 2025 • Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, Jiaqi Zeng, Soumye Singhal, Alexander Bukharin, Yian Zhang, Tugrul Konuk, Gerald Shen, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Yoshi Suhara, Olivier Delalleau, Zijia Chen, Zhilin Wang, David Mosallanezhad, Adi Renduchintala, Haifeng Qian, Dima Rekesh, Fei Jia, Somshubra Majumdar, Vahid Noroozi, Wasi Uddin Ahmad, Sean Narenthiran, Aleksander Ficek, Mehrzad Samadi, Jocelyn Huang, Siddhartha Jain, Igor Gitman, Ivan Moshkov, Wei Du, Shubham Toshniwal, George Armstrong, Branislav Kisacanin, Matvei Novikov, Daria Gitman, Evelina Bakhturina, Jane Polak Scowcroft, John Kamalu, Dan Su, Kezhi Kong, Markus Kliegl, Rabeeh Karimi, Ying Lin, Sanjeev Satheesh, Jupinder Parmar, Pritam Gundecha, Brandon Norick, Joseph Jennings, Shrimai Prabhumoye, Syeda Nahida Akter, Mostofa Patwary, Abhinav Khattar, Deepak Narayanan, Roger Waleffe, Jimmy Zhang, Bor-Yiing Su, Guyue Huang, Terry Kong, Parth Chadha, Sahil Jain, Christine Harvey, Elad Segal, Jining Huang, Sergey Kashirsky, Robert McQueen, Izzy Putterman, George Lam, Arun Venkatesan, Sherry Wu, Vinh Nguyen, Manoj Kilaru, Andrew Wang, Anna Warno, Abhilash Somasamudramath, Sandip Bhaskar, Maka Dong, Nave Assaf, Shahar Mor, Omer Ullman Argov, Scot Junkin, Oleksandr Romanenko, Pedro Larroy, Marco Rovinelli, Viji Balas, Nicholas Edelman, Anahita Bhiwandiwalla, Muthu Subramaniam, Smita Ithape, Karthik Ramamoorthy, Yuting Wu, Suguna Varshini Velury, Omri Almog, Joyjit Daw, Denys Fridman, Erick Galinkin, Michael Evans, Shaona Ghosh, Katherine Luna, Leon Derczynski, Nikki Pope, Eileen Long, Seth Schneider, Guillermo Siman, Tomasz Grzegorzek, Pablo Ribalta, Monika Katariya, Chris Alexiuk, Joey Conway, Trisha Saar, Ann Guan, Krzysztof Pawelec, Shyamala Prayaga, Oleksii Kuchaiev, Boris Ginsburg, Oluwatobi Olabiyi, Kari Briski, Jonathan Cohen, Bryan Catanzaro, Jonah Alben, Yonatan Geifman, Eric Chung

We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use.

Knowledge Distillation • Neural Architecture Search

Work in Progress: Temporally Extended Auxiliary Tasks

no code implementations • 1 Apr 2020 • Craig Sherstan, Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Our overall conclusions are that TD-AE increases the robustness of the A2C algorithm to the trajectory length, and that, while promising, further study is required to fully understand the relationship between the auxiliary task prediction timescale and the agent's performance.

Prediction • Reinforcement Learning

On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

no code implementations • 26 Jul 2019 • Chao Gao, Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

In this paper, we illuminate the reasons behind this failure by providing a thorough analysis of the hardness of random exploration in Pommerman.

reinforcement-learning • Reinforcement Learning +1

Action Guidance with MCTS for Deep Reinforcement Learning

no code implementations • 25 Jul 2019 • Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Deep reinforcement learning has achieved great successes in recent years; however, one main challenge is sample inefficiency.

Deep Reinforcement Learning • reinforcement-learning +1

Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

no code implementations • 24 Jul 2019 • Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Deep reinforcement learning has achieved great successes in recent years, but there are still open challenges, such as convergence to locally optimal policies and sample inefficiency.

Atari Games • Deep Reinforcement Learning +3

Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

no code implementations • 22 Jul 2019 • Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor

In this paper we explore how actor-critic methods in deep reinforcement learning, in particular Asynchronous Advantage Actor-Critic (A3C), can be extended with agent modeling.

Deep Reinforcement Learning • reinforcement-learning +2
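
The entry above describes extending an actor-critic learner (A3C) with agent modeling as an auxiliary task. As a rough illustration only, the sketch below shows one common way such an extension can be wired up: a shared torso feeds the usual policy and value heads plus an auxiliary head trained to predict another agent's action, with the auxiliary loss added to a synchronous (A2C-style) objective. The layer sizes, the `aux_weight` coefficient, and the loss form are illustrative assumptions, not the paper's exact architecture or training setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AgentModelingActorCritic(nn.Module):
    """Actor-critic with an auxiliary head that predicts another agent's action.

    Hypothetical layer sizes; the published architecture may differ.
    """

    def __init__(self, obs_dim, n_actions, n_opponent_actions, hidden=128):
        super().__init__()
        # Shared torso used by all three heads.
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)              # actor
        self.value_head = nn.Linear(hidden, 1)                       # critic
        self.opponent_head = nn.Linear(hidden, n_opponent_actions)   # auxiliary agent model

    def forward(self, obs):
        h = self.torso(obs)
        return self.policy_head(h), self.value_head(h), self.opponent_head(h)


def loss_fn(model, obs, actions, returns, opponent_actions, aux_weight=0.1):
    """A2C-style loss plus a supervised agent-modeling term (illustrative weights)."""
    logits, values, opp_logits = model(obs)
    values = values.squeeze(-1)
    advantages = returns - values.detach()

    log_probs = F.log_softmax(logits, dim=-1)
    policy_loss = -(log_probs.gather(1, actions.unsqueeze(1)).squeeze(1) * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    # Auxiliary task: cross-entropy against the observed actions of the other agent.
    aux_loss = F.cross_entropy(opp_logits, opponent_actions)
    return policy_loss + 0.5 * value_loss + aux_weight * aux_loss


if __name__ == "__main__":
    # Toy batch to show the forward/backward path end to end.
    model = AgentModelingActorCritic(obs_dim=16, n_actions=6, n_opponent_actions=6)
    obs = torch.randn(8, 16)
    actions = torch.randint(0, 6, (8,))
    opponent_actions = torch.randint(0, 6, (8,))
    returns = torch.randn(8)
    loss = loss_fn(model, obs, actions, returns, opponent_actions)
    loss.backward()
    print(loss.item())
```
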

Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition

1 code implementation • 20 Apr 2019 • Chao Gao, Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor

The Pommerman Team Environment is a recently proposed benchmark which involves a multi-agent domain with challenges such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards.

Deep Reinforcement Learning • Reinforcement Learning (RL)

Safer Deep RL with Shallow MCTS: A Case Study in Pommerman

no code implementations • 10 Apr 2019 • Bilal Kartal, Pablo Hernandez-Leal, Chao Gao, Matthew E. Taylor

In this paper, we shed light on the reasons behind this failure by exemplifying and analyzing the high rate of catastrophic events (i.e., suicides) that happen under random exploration in this domain.

Deep Reinforcement Learning • reinforcement-learning +2

Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

no code implementations • 30 Nov 2018 • Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and higher compute power.

Deep Reinforcement Learning • reinforcement-learning +1

A Survey and Critique of Multiagent Deep Reinforcement Learning

no code implementations • 12 Oct 2018 • Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor

The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature.

Deep Reinforcement Learning • reinforcement-learning +2
