Search Results for author: Kenshi Abe

Found 19 papers, 7 papers with code

Filtered Direct Preference Optimization

1 code implementation · 22 Apr 2024 · Tetsuro Morimura, Mitsuki Sakamoto, Yuu Jinnai, Kenshi Abe, Kaito Ariu

This paper addresses the issue of text quality within the preference dataset by focusing on Direct Preference Optimization (DPO), an increasingly adopted reward-model-free RLHF method.
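
For context, the reward-model-free objective that DPO optimizes is shown below in the usual notation; this is the standard DPO loss, not the data-filtering procedure the paper proposes. Here $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ a frozen reference policy, and $(x, y_w, y_l)$ a prompt with preferred and dispreferred responses.

```latex
% Standard DPO loss (background only; the paper's filtering step is not shown).
% beta controls how far the trained policy may deviate from the reference policy.
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```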

Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment

1 code implementation · 1 Apr 2024 · Yuu Jinnai, Tetsuro Morimura, Kaito Ariu, Kenshi Abe

Best-of-N (BoN) sampling with a reward model has been shown to be an effective strategy for aligning Large Language Models (LLMs) to human preferences at the time of decoding.
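
As background for the reward hacking the paper mitigates, a minimal sketch of plain (unregularized) Best-of-N sampling is given below; `generate` and `reward_model` are hypothetical stand-ins for an LLM sampler and a trained reward model, and the proximity-regularization term the paper adds is not included.

```python
# Minimal sketch of plain Best-of-N (BoN) sampling, without the paper's regularization.
# `generate` and `reward_model` are hypothetical callables, not APIs from the paper.
def best_of_n(prompt, generate, reward_model, n=16):
    """Sample n candidate responses and return the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt, c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]
```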

Language Modelling

Scalable and Provably Fair Exposure Control for Large-Scale Recommender Systems

1 code implementation · 22 Feb 2024 · Riku Togashi, Kenshi Abe, Yuta Saito

Typical recommendation and ranking methods aim to optimize the satisfaction of users, but they are often oblivious to their impact on the items (e.g., products, jobs, news, video) and their providers.

Collaborative Filtering · Exposure Fairness · +1

Return-Aligned Decision Transformer

no code implementations · 6 Feb 2024 · Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra

Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return.
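
For reference, the return of a trajectory with rewards $r_0, \dots, r_T$ is the (possibly discounted) cumulative reward; setting $\gamma = 1$ gives the undiscounted, finite-horizon case.

```latex
% Return of a trajectory; gamma in (0, 1] is the discount factor.
G = \sum_{t=0}^{T} \gamma^{t} r_t
```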

Learning Fair Division from Bandit Feedback

no code implementations · 15 Nov 2023 · Hakuei Yamada, Junpei Komiyama, Kenshi Abe, Atsushi Iwasaki

This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents' values or utilities.

Model-Based Minimum Bayes Risk Decoding

no code implementations · 9 Nov 2023 · Yuu Jinnai, Tetsuro Morimura, Ukyo Honda, Kaito Ariu, Kenshi Abe

MBR decoding selects, from a pool of hypotheses, the hypothesis with the least expected risk under a probability model, according to a given utility function.
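
A minimal sketch of the generic MBR selection rule is given below; `utility` is a hypothetical pairwise utility (e.g., a text-similarity metric), and the candidate pool itself is reused as the pseudo-reference set, which is the common sampling-based approximation rather than the model-based estimate proposed in the paper.

```python
# Minimal sketch of sampling-based MBR decoding (not the paper's model-based estimate).
# `utility` is a hypothetical pairwise utility function, e.g. a similarity metric.
def mbr_decode(candidates, utility):
    """Return the candidate with the highest average utility against the pool."""
    def expected_utility(hyp):
        refs = [r for r in candidates if r is not hyp]
        return sum(utility(hyp, r) for r in refs) / max(len(refs), 1)
    return max(candidates, key=expected_utility)
```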

Text Generation

Slingshot Perturbation to Learning in Monotone Games

no code implementations · 26 May 2023 · Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Atsushi Iwasaki

This paper addresses the problem of learning Nash equilibria in monotone games, where the gradient of the payoff functions is monotone in the strategy profile space, potentially containing additive noise.
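
Under the convention that players maximize their payoffs and $V(\pi)$ stacks the individual payoff gradients, the monotonicity condition referred to here is usually written as follows (the inequality flips if losses are minimized instead):

```latex
% Monotonicity of the stacked payoff-gradient operator V (players are maximizers).
\langle V(\pi) - V(\pi'),\; \pi - \pi' \rangle \le 0
\quad \text{for all strategy profiles } \pi, \pi'
```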

Exploration of Unranked Items in Safe Online Learning to Re-Rank

no code implementations · 2 May 2023 · Hiroaki Shiino, Kaito Ariu, Kenshi Abe, Togashi Riku

In this paper, we propose a safe OLTR algorithm that efficiently exchanges one of the items in the current ranking with an item outside the ranking (i.e., an unranked item) to perform exploration.

Learning-To-Rank · Safe Exploration

Fair Matrix Factorisation for Large-Scale Recommender Systems

no code implementations · 9 Sep 2022 · Riku Togashi, Kenshi Abe

However, the intrinsic nature of fairness destroys the separability of optimisation subproblems for users and items, which is an essential property of conventional scalable algorithms, such as implicit alternating least squares (iALS).
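
For context on the separability mentioned here: classical iALS solves an independent ridge regression per user (and, symmetrically, per item) with the other side's factors held fixed. The formulation below follows the common implicit-feedback ALS notation and is not taken from this paper.

```latex
% Per-user subproblem of classical implicit ALS (background, not the paper's method).
% Y: item-factor matrix, C^u: diagonal confidence matrix for user u,
% p_u: binary preference vector of user u, lambda: regularization weight.
x_u = \left( Y^{\top} C^{u} Y + \lambda I \right)^{-1} Y^{\top} C^{u} p_u
```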

Collaborative Filtering · Fairness · +1

Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games

1 code implementation · 21 Aug 2022 · Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Kentaro Toyoshima, Atsushi Iwasaki

This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings.
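
For reference, the vanilla multiplicative weights update that M2WU builds on is shown below; the mutation term the paper adds to obtain last-iterate convergence is specific to the paper and is not reproduced here.

```latex
% Vanilla multiplicative weights update (background only; M2WU adds a mutation term).
% x^t: mixed strategy at iteration t, q_i^t: payoff of action i, eta: learning rate.
x_i^{t+1} = \frac{x_i^{t} \exp\!\left(\eta\, q_i^{t}\right)}
                 {\sum_{j} x_j^{t} \exp\!\left(\eta\, q_j^{t}\right)}
```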

Multi-agent Reinforcement Learning

Mutation-Driven Follow the Regularized Leader for Last-Iterate Convergence in Zero-Sum Games

1 code implementation · 18 Jun 2022 · Kenshi Abe, Mitsuki Sakamoto, Atsushi Iwasaki

In this study, we consider a variant of the Follow the Regularized Leader (FTRL) dynamics in two-player zero-sum games.
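
The baseline FTRL dynamics being modified can be written, for player $i$ with strategy set $X_i$, observed payoff vectors $q_i^s$, a strongly convex regularizer $\psi$, and learning rate $\eta$, as follows; the mutation-driven variant studied in the paper perturbs this update and is not shown.

```latex
% Standard FTRL update (background; the paper studies a mutation-driven variant).
x_i^{t+1} = \operatorname*{arg\,max}_{x \in X_i}
  \left\{ \eta \sum_{s=1}^{t} \langle x,\, q_i^{s} \rangle - \psi(x) \right\}
```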

Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes

no code implementations · 2 Jun 2022 · Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang

However, since the standard MCTS does not have the ability to learn state representation, the size of the tree-search space can be too large to search.

Reinforcement Learning (RL)

Anytime Capacity Expansion in Medical Residency Match by Monte Carlo Tree Search

1 code implementation · 14 Feb 2022 · Kenshi Abe, Junpei Komiyama, Atsushi Iwasaki

Constructing a good search tree representation significantly boosts the performance of the proposed method.

Thresholded Lasso Bandit

1 code implementation · 22 Oct 2020 · Kaito Ariu, Kenshi Abe, Alexandre Proutière

In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends on a few, say $s_0\ll d$, of these features only.
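
A rough sketch of a thresholded-Lasso estimation step (support recovery by thresholding a Lasso fit, then ordinary least squares on the estimated support) is given below; the `alpha` and `threshold` values are illustrative placeholders, and the bandit's arm-selection rule is not shown.

```python
# Rough sketch of a thresholded Lasso estimate: fit Lasso, keep coefficients whose
# magnitude exceeds a threshold, then refit by least squares on the estimated support.
# alpha and threshold are illustrative placeholders, not the paper's choices.
import numpy as np
from sklearn.linear_model import Lasso

def thresholded_lasso(X, y, alpha=0.1, threshold=0.05):
    lasso = Lasso(alpha=alpha, fit_intercept=False).fit(X, y)
    support = np.flatnonzero(np.abs(lasso.coef_) > threshold)  # estimated support
    theta = np.zeros(X.shape[1])
    if support.size > 0:
        coef_ls, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        theta[support] = coef_ls
    return theta, support
```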

Mean-Variance Efficient Reinforcement Learning by Expected Quadratic Utility Maximization

no code implementations · 3 Oct 2020 · Masahiro Kato, Kei Nakagawa, Kenshi Abe, Tetsuro Morimura

To achieve this purpose, we train an agent to maximize the expected quadratic utility function, a common objective of risk management in finance and economics.
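
One common parameterization of the quadratic utility, showing how its expectation reduces to a mean-variance trade-off, is given below; the exact form used in the paper may differ.

```latex
% A quadratic utility with risk-aversion parameter lambda > 0 and its expectation.
U(Y) = Y - \frac{\lambda}{2} Y^{2},
\qquad
\mathbb{E}\!\left[U(Y)\right]
  = \mathbb{E}[Y] - \frac{\lambda}{2}\left( \mathrm{Var}(Y) + \mathbb{E}[Y]^{2} \right)
```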

Decision Making · Decision Making Under Uncertainty · +3

Off-Policy Exploitability-Evaluation in Two-Player Zero-Sum Markov Games

no code implementations · 4 Jul 2020 · Kenshi Abe, Yusuke Kaneko

The proposed estimators estimate exploitability, which is often used as a metric for determining how close a policy profile (i.e., a tuple of policies) is to a Nash equilibrium in two-player zero-sum games.
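
In a two-player zero-sum game with payoffs $u_1 = -u_2$, exploitability of a policy profile $(\pi_1, \pi_2)$ is typically defined as the total gain available to best-responding deviators (sometimes divided by two); it is non-negative and equals zero exactly at a Nash equilibrium.

```latex
% Exploitability of a profile (pi_1, pi_2) in a two-player zero-sum game (u_1 = -u_2).
\mathrm{expl}(\pi_1, \pi_2)
  = \max_{\pi_1'} u_1(\pi_1', \pi_2) + \max_{\pi_2'} u_2(\pi_1, \pi_2')
```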

Off-policy evaluation · Vocal Bursts Valence Prediction

A Simple Heuristic for Bayesian Optimization with A Low Budget

no code implementations · 18 Nov 2019 · Masahiro Nomura, Kenshi Abe

The aim of black-box optimization is to optimize an objective function within the constraints of a given evaluation budget.

Bayesian Optimization · Hyperparameter Optimization
