Value prediction

16 papers with code • 1 benchmark • 0 datasets


Most implemented papers

On the Estimation Bias in Double Q-Learning

stilwell-git/doubly-bounded-q-learning NeurIPS 2021

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation.
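The overestimation described above can be illustrated with a small simulation. This is a minimal sketch, not code from the paper or its repository: the function name `max_bias_demo`, its parameters, and the toy setup (every action has a true value of 0, and the estimates are corrupted by Gaussian noise) are all assumptions chosen for illustration. Taking the maximum of noisy estimates gives a positive average even though every true value is 0, while selecting the action with one set of estimates and evaluating it with an independent set, as double Q-learning does, removes the bias in this toy setting.

```python
import random


def max_bias_demo(num_actions=10, num_trials=5000, noise_std=1.0, seed=0):
    """Compare single and double estimators when every true action value is 0.

    Hypothetical toy setup: since all true values are 0, any nonzero
    average returned here is pure estimation bias.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(num_trials):
        # Two independent noisy estimates of the same (all-zero) true values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # Single estimator: select and evaluate with the same estimates.
        single_total += max(q_a)
        # Double estimator: select with q_a, evaluate with q_b.
        best_action = max(range(num_actions), key=lambda a: q_a[a])
        double_total += q_b[best_action]
    return single_total / num_trials, double_total / num_trials


single_bias, double_bias = max_bias_demo()
print(f"single estimator average: {single_bias:.3f}")
print(f"double estimator average: {double_bias:.3f}")
```

In this setting the single-estimator average should come out clearly positive and the double-estimator average close to zero. When the true action values differ, the double estimator is no longer unbiased and can underestimate.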

Learning, Fast and Slow: A Goal-Directed Memory-Based Approach for Dynamic Environments

tanchongmin/learning-fast-and-slow 31 Jan 2023

To address these challenges, we do the following: i) Instead of a neural network, we do model-based planning using a parallel memory retrieval system (which we term the slow mechanism); ii) Instead of learning state values, we guide the agent's actions using goal-directed exploration, by using a neural network to choose the next action given the current state and the goal state (which we term the fast mechanism).

Reinforcement Learning from Passive Data via Latent Intentions

dibyaghosh/icvf_release 10 Apr 2023

Passive observational data, such as human videos, is abundant and rich in information, yet remains largely untapped by current RL methods.

A Multi-Granularity-Aware Aspect Learning Model for Multi-Aspect Dense Retrieval

sunxiaojie99/mural 5 Dec 2023

Dense retrieval methods have been mostly focused on unstructured text and less attention has been drawn to structured data with various aspects, e.g., products with aspects such as category and brand.

ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast

black-yt/ExtremeCast 2 Feb 2024

Data-driven weather forecast based on machine learning (ML) has experienced rapid development and demonstrated superior performance in the global medium-range forecast compared to traditional physics-based dynamical models.

WorldValuesBench: A Large-Scale Benchmark Dataset for Multi-Cultural Value Awareness of Language Models

demon702/worldvaluesbench 25 Apr 2024

The awareness of multi-cultural human values is critical to the ability of language models (LMs) to generate safe and personalized responses.