Value prediction
16 papers with code • 1 benchmark • 0 datasets
Most implemented papers
On the Estimation Bias in Double Q-Learning
Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation.
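The overestimation effect described above can be illustrated with a small simulation. This is a minimal sketch, not the paper's method: it assumes a toy setting in which every action has a true value of 0 and each estimate is corrupted by independent Gaussian noise. The function names (`single_estimator_target`, `double_estimator_target`, `average_targets`) and all parameters are hypothetical, chosen only for this illustration. It compares the standard max-based bootstrap value with a double-estimator value, where one set of estimates selects the action and a second, independent set evaluates it.

```python
import random


def single_estimator_target(q_next):
    """Max-based bootstrap value: the max over one set of noisy estimates."""
    return max(q_next)


def double_estimator_target(q_a, q_b):
    """Double-estimator bootstrap value: select the action with q_a, evaluate it with q_b."""
    best_action = max(range(len(q_a)), key=lambda a: q_a[a])
    return q_b[best_action]


def average_targets(num_actions=10, noise_std=1.0, trials=20000, seed=0):
    """Average both bootstrap values over many trials of the toy setting."""
    rng = random.Random(seed)
    single_sum = 0.0
    double_sum = 0.0
    for _ in range(trials):
        # Assumption: the true value of every action is 0, so any nonzero
        # average below is pure estimation bias.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        single_sum += single_estimator_target(q_a)
        double_sum += double_estimator_target(q_a, q_b)
    return single_sum / trials, double_sum / trials


if __name__ == "__main__":
    single_mean, double_mean = average_targets()
    print(f"max-based mean target:        {single_mean:.3f}")
    print(f"double-estimator mean target: {double_mean:.3f}")
```

In this toy setting the max-based average should come out clearly positive even though every true value is 0, while the double-estimator average should stay close to 0. That outcome depends on all true values being equal; when they differ, the double estimator can be biased as well.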
Learning, Fast and Slow: A Goal-Directed Memory-Based Approach for Dynamic Environments
To address these challenges, we do the following: i) Instead of a neural network, we do model-based planning using a parallel memory retrieval system (which we term the slow mechanism); ii) Instead of learning state values, we guide the agent's actions using goal-directed exploration, by using a neural network to choose the next action given the current state and the goal state (which we term the fast mechanism).
Reinforcement Learning from Passive Data via Latent Intentions
Passive observational data, such as human videos, is abundant and rich in information, yet remains largely untapped by current RL methods.
A Multi-Granularity-Aware Aspect Learning Model for Multi-Aspect Dense Retrieval
Dense retrieval methods have mostly focused on unstructured text, and less attention has been paid to structured data with various aspects, e.g., products with aspects such as category and brand.
ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast
Data-driven weather forecast based on machine learning (ML) has experienced rapid development and demonstrated superior performance in the global medium-range forecast compared to traditional physics-based dynamical models.
WorldValuesBench: A Large-Scale Benchmark Dataset for Multi-Cultural Value Awareness of Language Models
The awareness of multi-cultural human values is critical to the ability of language models (LMs) to generate safe and personalized responses.