Search Results for author: Markel Sanz Ausin

Found 4 papers, 1 paper with code

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

1 code implementation • 2 May 2024 • Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev

Building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs, which often contain tens or hundreds of billions of parameters.

HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare

no code implementations • 18 Feb 2023 • Ge Gao, Song Ju, Markel Sanz Ausin, Min Chi

Reinforcement learning (RL) has been extensively researched for enhancing human-environment interactions in various human-centric tasks, including e-learning and healthcare.

Off-policy evaluation
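
To make the tagged task concrete, here is a minimal sketch of ordinary importance-sampling off-policy evaluation, the general problem this paper addresses. It illustrates the task only, not the HOPE estimator itself; the trajectory format and policy-probability functions are hypothetical.

from typing import Callable, List, Tuple

import numpy as np

Step = Tuple[int, int, float]  # one logged transition: (state, action, reward)

def importance_sampling_ope(
    trajectories: List[List[Step]],
    behavior_prob: Callable[[int, int], float],  # pi_b(a | s) of the logging policy (assumed known)
    target_prob: Callable[[int, int], float],    # pi_e(a | s) of the policy being evaluated
    gamma: float = 0.99,
) -> float:
    # Estimate the target policy's value from trajectories collected by the behavior policy.
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= target_prob(s, a) / behavior_prob(s, a)  # cumulative importance weight
            ret += (gamma ** t) * r                            # discounted return of the trajectory
        estimates.append(weight * ret)                         # ordinary (unweighted) IS estimate
    return float(np.mean(estimates))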

InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem

no code implementations • 2 May 2021 • Markel Sanz Ausin, Hamoon Azizsoltani, Song Ju, Yeo Jin Kim, Min Chi

Overall, our results show that the effectiveness of InferNet is robust against noisy reward functions and that it is an effective add-on mechanism for solving temporal CAP in a wide range of RL tasks, from classic RL simulation environments to a real-world RL problem, and for both online and offline learning.

Atari Games • Offline RL • +1
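
As an illustration of the temporal credit assignment setting this paper targets, the sketch below redistributes a single delayed episodic return into inferred per-step rewards with a small network. It follows the general idea suggested by the title; the architecture, loss, and training step are assumptions, not the authors' exact InferNet formulation.

import torch
import torch.nn as nn

class RewardInferenceNet(nn.Module):
    # Small MLP that predicts an immediate reward for each (state, action) pair.
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)  # per-step rewards, shape (T,)

def train_step(model, optimizer, obs, act, episode_return):
    # One update: push the sum of inferred per-step rewards toward the delayed
    # episodic return, so a downstream RL agent can learn from dense rewards.
    predicted = model(obs, act)
    loss = (predicted.sum() - episode_return) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()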
