Search Results for author: Tetsuro Morimura

Found 15 papers, 4 papers with code

Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment

1 code implementation • 1 Apr 2024 • Yuu Jinnai, Tetsuro Morimura, Kaito Ariu, Kenshi Abe

Best-of-N (BoN) sampling with a reward model has been shown to be an effective strategy for aligning Large Language Models (LLMs) to human preferences at the time of decoding.

Language Modelling
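The Best-of-N strategy described above can be sketched in a few lines: draw N candidates, score each with the reward model, and keep the argmax. The sampler and reward function below are toy stand-ins for illustration, not the paper's models:

```python
import random

def best_of_n(prompt, generate, reward, n=8):
    """Best-of-N: draw n candidates and return the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins (assumed): a random sampler and a length-based reward.
random.seed(0)
def toy_generate(prompt):
    return prompt + " " + random.choice(["ok", "good answer", "bad"])

def toy_reward(text):
    return len(text)  # longer = "better", purely for the demo

best = best_of_n("Q:", toy_generate, toy_reward, n=4)
```

The paper's point is that maximizing an imperfect reward this way invites reward hacking, which their regularized variant mitigates.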

Return-Aligned Decision Transformer

no code implementations • 6 Feb 2024 • Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra

Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return.
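The return named in the snippet is the (optionally discounted) cumulative reward; a minimal sketch:

```python
def discounted_return(rewards, gamma=0.99):
    """G = sum_t gamma^t * r_t, the quantity offline RL typically maximizes."""
    g = 0.0
    for r in reversed(rewards):  # backward accumulation: g_t = r_t + gamma * g_{t+1}
        g = r + gamma * g
    return g

g = discounted_return([1.0, 0.0, 2.0], gamma=0.5)  # 1 + 0.5*0 + 0.25*2 = 1.5
```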

Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding

1 code implementation • 10 Jan 2024 • Yuu Jinnai, Ukyo Honda, Tetsuro Morimura, Peinan Zhang

We propose two variants of MBR, Diverse MBR (DMBR) and $k$-medoids MBR (KMBR), methods to generate a set of sentences with high quality and diversity.

Language Modelling · Large Language Model +1

Model-Based Minimum Bayes Risk Decoding

no code implementations • 9 Nov 2023 • Yuu Jinnai, Tetsuro Morimura, Ukyo Honda, Kaito Ariu, Kenshi Abe

MBR decoding selects a hypothesis from a pool of hypotheses that has the least expected risk under a probability model according to a given utility function.

Text Generation
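The selection rule described above can be sketched directly: score each hypothesis by its average utility against pseudo-references sampled from the model, then keep the argmax (minimizing expected risk equals maximizing expected utility). The token-overlap utility below is an assumed toy metric, not the paper's:

```python
def mbr_decode(hypotheses, samples, utility):
    """MBR: pick the hypothesis with the highest expected utility
    against model samples used as pseudo-references."""
    def expected_utility(h):
        return sum(utility(h, s) for s in samples) / len(samples)
    return max(hypotheses, key=expected_utility)

# Assumed toy utility: Jaccard overlap of token sets.
def overlap(h, r):
    hs, rs = set(h.split()), set(r.split())
    return len(hs & rs) / max(len(hs | rs), 1)

hyps = ["the cat sat", "a dog ran", "the cat ran"]
refs = ["the cat sat down", "the cat sat"]
best = mbr_decode(hyps, refs, overlap)  # "the cat sat" has the highest average overlap
```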

Policy Gradient with Kernel Quadrature

no code implementations • 23 Oct 2023 • Satoshi Hayakawa, Tetsuro Morimura

Reward evaluation of episodes becomes a bottleneck in a broad range of reinforcement learning tasks.

Causal Discovery

On the Depth between Beam Search and Exhaustive Search for Text Generation

no code implementations • 25 Aug 2023 • Yuu Jinnai, Tetsuro Morimura, Ukyo Honda

To this end, we introduce Lookahead Beam Search (LBS), a multi-step lookahead search that optimizes the objective considering a fixed number of future steps.

Machine Translation · Text Generation +2
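One step of the lookahead idea can be sketched as follows: each one-token extension of a beam is ranked by its own score plus the best score attainable over a fixed number of future steps (here found by exhaustive enumeration; the vocabulary and scorer are toy assumptions, and LBS itself is more involved):

```python
import itertools

def lbs_step(beams, vocab, logp, beam_width, lookahead):
    """One step of lookahead beam search (sketch): rank each one-token
    extension by its score plus the best `lookahead`-step continuation."""
    scored = []
    for seq, s in beams:
        for tok in vocab:
            ext = seq + [tok]
            ext_s = s + logp(seq, tok)
            # exhaustive search over all continuations of length `lookahead`
            best_future = max(
                (sum(logp(ext + list(f[:i]), f[i]) for i in range(len(f)))
                 for f in itertools.product(vocab, repeat=lookahead)),
                default=0.0,
            )
            scored.append((ext, ext_s, ext_s + best_future))
    scored.sort(key=lambda t: t[2], reverse=True)
    return [(seq, s) for seq, s, _ in scored[:beam_width]]

# Assumed toy scorer: token "a" is always the likelier continuation.
def toy_logp(seq, tok):
    return 0.0 if tok == "a" else -1.0

beams = lbs_step([([], 0.0)], ["a", "b"], toy_logp, beam_width=2, lookahead=1)
```

With `lookahead=0` this reduces to ordinary beam search scoring.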

Safe Collaborative Filtering

1 code implementation • 8 Jun 2023 • Riku Togashi, Tatsushi Oka, Naoto Ohsaka, Tetsuro Morimura

Excellent tail performance is crucial for modern machine learning tasks, such as algorithmic fairness, class imbalance, and risk-sensitive decision making, as it ensures the effective handling of challenging samples within a dataset.

Collaborative Filtering · Computational Efficiency +3
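Tail performance of the kind the snippet mentions is commonly formalized with Conditional Value-at-Risk (CVaR), the mean of the worst fraction of losses; whether this matches the paper's exact risk measure is an assumption here:

```python
def cvar(losses, alpha=0.9):
    """CVaR_alpha: mean of the worst (1 - alpha) fraction of losses."""
    srt = sorted(losses, reverse=True)           # worst losses first
    k = max(1, int(round(len(srt) * (1 - alpha))))
    return sum(srt[:k]) / k

c = cvar([1.0, 2.0, 3.0, 10.0], alpha=0.75)  # worst 25% is [10.0] -> 10.0
```

Optimizing CVaR instead of the mean forces the model to handle the hardest samples well, which is the "safe" aspect.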

Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes

no code implementations • 2 Jun 2022 • Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang

However, since the standard MCTS does not have the ability to learn state representation, the size of the tree-search space can be too large to search.

Reinforcement Learning (RL)

Mean-Variance Efficient Reinforcement Learning by Expected Quadratic Utility Maximization

no code implementations • 3 Oct 2020 • Masahiro Kato, Kei Nakagawa, Kenshi Abe, Tetsuro Morimura

To achieve this purpose, we train an agent to maximize the expected quadratic utility function, a common objective of risk management in finance and economics.

Decision Making · Decision Making Under Uncertainty +3
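The expected quadratic utility objective makes the mean-variance link explicit: with U(x) = x − (λ/2)x², E[U(R)] = E[R] − (λ/2)E[R²] = E[R] − (λ/2)(Var[R] + E[R]²). A minimal sketch (λ and the sample returns are illustrative):

```python
def expected_quadratic_utility(returns, lam=0.1):
    """Sample estimate of E[R - (lam/2) R^2]; maximizing it
    trades expected return against its second moment."""
    n = len(returns)
    mean = sum(returns) / n
    second_moment = sum(r * r for r in returns) / n
    return mean - 0.5 * lam * second_moment

u = expected_quadratic_utility([1.0, 3.0], lam=0.5)  # 2 - 0.25*5 = 0.75
```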

Visual analytics for team-based invasion sports with significant events and Markov reward process

no code implementations • 2 Jul 2019 • Kun Zhao, Takayuki Osogami, Tetsuro Morimura

To solve this problem, we consider a whole match as a Markov chain of significant events, so that event values can be estimated with a continuous parameter space by solving the Markov chain with a machine learning model.
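Solving the Markov chain for event values, as the snippet describes, amounts to the fixed point v = r + γPv of a Markov reward process; a pure-Python value-iteration sketch with an invented two-event example (the transition matrix and rewards are not from the paper):

```python
def mrp_values(P, r, gamma=0.9, iters=500):
    """Event values of a Markov reward process: iterate v <- r + gamma * P v."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v

# Assumed two-event example: "attack" leads to "goal" (reward 1) half the time.
P = [[0.5, 0.5],
     [0.0, 1.0]]
r = [0.0, 1.0]
v = mrp_values(P, r, gamma=0.5)  # closed form: v = [2/3, 2]
```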

Sampler for Composition Ratio by Markov Chain Monte Carlo

no code implementations • 16 Jun 2019 • Yachiko Obara, Tetsuro Morimura, Hiroki Yanagisawa

The key points of our approach are (1) designing an appropriate target distribution by using a condition on the number of nonzero elements, and (2) changing values only between a certain pair of elements in each iteration.
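The two points above can be sketched as a Metropolis sampler over nonnegative integer vectors with a fixed total: each proposal moves one unit between a random pair of elements, so the composition constraint is preserved by construction. The target below, penalizing more than two nonzero elements, is an assumed stand-in for the paper's distribution:

```python
import random

def sample_composition(total, dim, target, steps=1000, seed=0):
    """Metropolis sampler over nonnegative integer vectors summing to `total`:
    each move shifts one unit between a random pair (sum is invariant)."""
    rng = random.Random(seed)
    x = [total] + [0] * (dim - 1)
    for _ in range(steps):
        i, j = rng.sample(range(dim), 2)
        if x[i] == 0:
            continue  # cannot move a unit out of an empty element
        y = list(x)
        y[i] -= 1
        y[j] += 1
        if rng.random() < min(1.0, target(y) / target(x)):
            x = y
    return x

# Assumed toy target: strongly prefer at most 2 nonzero elements.
def toy_target(x):
    return 1.0 if sum(v > 0 for v in x) <= 2 else 0.01

x = sample_composition(6, 3, toy_target, steps=2000)
```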

Solving inverse problem of Markov chain with partial observations

no code implementations • NeurIPS 2013 • Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide

The Markov chain is a convenient tool to represent the dynamics of complex systems such as traffic and social systems, where probabilistic transition takes place between internal states.

A Generalized Natural Actor-Critic Algorithm

no code implementations • NeurIPS 2009 • Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya

In this paper, we describe a generalized Natural Gradient (gNG) by linearly interpolating the two FIMs and propose an efficient implementation for the gNG learning based on a theory of the estimating function, generalized Natural Actor-Critic (gNAC).

Reinforcement Learning (RL)
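The linear interpolation of the two Fisher information matrices can be written G(κ) = κF₁ + (1−κ)F₂, with update direction G(κ)⁻¹∇. A 2×2 sketch (the matrices and gradient are illustrative, and the estimating-function machinery of gNAC is not reproduced here):

```python
def gng_direction(F1, F2, grad, kappa=0.5):
    """Generalized natural gradient (sketch): invert the interpolated FIM
    G = kappa*F1 + (1-kappa)*F2 and apply it to the vanilla gradient (2x2 case)."""
    a = kappa * F1[0][0] + (1 - kappa) * F2[0][0]
    b = kappa * F1[0][1] + (1 - kappa) * F2[0][1]
    c = kappa * F1[1][0] + (1 - kappa) * F2[1][0]
    d = kappa * F1[1][1] + (1 - kappa) * F2[1][1]
    det = a * d - b * c  # explicit 2x2 inverse: [[d, -b], [-c, a]] / det
    g0, g1 = grad
    return [(d * g0 - b * g1) / det, (-c * g0 + a * g1) / det]

# kappa = 0 and kappa = 1 recover the two endpoint FIMs.
step = gng_direction([[2.0, 0.0], [0.0, 2.0]],
                     [[1.0, 0.0], [0.0, 1.0]],
                     [1.0, 1.0], kappa=1.0)  # (2I)^-1 applied to [1, 1]
```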
