Search Results for author: Tetsuro Morimura

Found 15 papers, 4 papers with code

Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment

1 code implementation • 1 Apr 2024 • Yuu Jinnai, Tetsuro Morimura, Kaito Ariu, Kenshi Abe

Best-of-N (BoN) sampling with a reward model has been shown to be an effective strategy for aligning Large Language Models (LLMs) to human preferences at the time of decoding.

Language Modelling
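The Best-of-N strategy described above can be sketched in a few lines: draw N candidates, score each with the reward model, and keep the argmax. The sampler and reward function below are toy stand-ins for illustration, not the paper's models:

```python
import random

def best_of_n(prompt, generate, reward, n=8):
    """Best-of-N: draw n candidates and return the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins (assumed): a random sampler and a length-based reward.
random.seed(0)
def toy_generate(prompt):
    return prompt + " " + random.choice(["ok", "good answer", "bad"])

def toy_reward(text):
    return len(text)  # longer = "better", purely for the demo

best = best_of_n("Q:", toy_generate, toy_reward, n=4)
```

The paper's point is that maximizing an imperfect reward this way invites reward hacking, which their regularized variant mitigates.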

Return-Aligned Decision Transformer

no code implementations • 6 Feb 2024 • Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra

Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return.
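The return named in the snippet is the (optionally discounted) cumulative reward; a minimal sketch:

```python
def discounted_return(rewards, gamma=0.99):
    """G = sum_t gamma^t * r_t, the quantity offline RL typically maximizes."""
    g = 0.0
    for r in reversed(rewards):  # backward accumulation: g_t = r_t + gamma * g_{t+1}
        g = r + gamma * g
    return g

g = discounted_return([1.0, 0.0, 2.0], gamma=0.5)  # 1 + 0.5*0 + 0.25*2 = 1.5
```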

Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding

1 code implementation • 10 Jan 2024 • Yuu Jinnai, Ukyo Honda, Tetsuro Morimura, Peinan Zhang

We propose two variants of MBR, Diverse MBR (DMBR) and $k$-medoids MBR (KMBR), methods to generate a set of sentences with high quality and diversity.

Language Modelling · Large Language Model +1

Model-Based Minimum Bayes Risk Decoding

no code implementations • 9 Nov 2023 • Yuu Jinnai, Tetsuro Morimura, Ukyo Honda, Kaito Ariu, Kenshi Abe

MBR decoding selects a hypothesis from a pool of hypotheses that has the least expected risk under a probability model according to a given utility function.

Text Generation
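The selection rule described above can be sketched directly: score each hypothesis by its average utility against pseudo-references sampled from the model, then keep the argmax (minimizing expected risk equals maximizing expected utility). The token-overlap utility below is an assumed toy metric, not the paper's:

```python
def mbr_decode(hypotheses, samples, utility):
    """MBR: pick the hypothesis with the highest expected utility
    against model samples used as pseudo-references."""
    def expected_utility(h):
        return sum(utility(h, s) for s in samples) / len(samples)
    return max(hypotheses, key=expected_utility)

# Assumed toy utility: Jaccard overlap of token sets.
def overlap(h, r):
    hs, rs = set(h.split()), set(r.split())
    return len(hs & rs) / max(len(hs | rs), 1)

hyps = ["the cat sat", "a dog ran", "the cat ran"]
refs = ["the cat sat down", "the cat sat"]
best = mbr_decode(hyps, refs, overlap)  # "the cat sat" has the highest average overlap
```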

Policy Gradient with Kernel Quadrature

no code implementations • 23 Oct 2023 • Satoshi Hayakawa, Tetsuro Morimura

Reward evaluation of episodes becomes a bottleneck in a broad range of reinforcement learning tasks.

Causal Discovery

On the Depth between Beam Search and Exhaustive Search for Text Generation

no code implementations • 25 Aug 2023 • Yuu Jinnai, Tetsuro Morimura, Ukyo Honda

To this end, we introduce Lookahead Beam Search (LBS), a multi-step lookahead search that optimizes the objective considering a fixed number of future steps.

Machine Translation · Text Generation +2
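One step of the lookahead idea can be sketched as follows: each one-token extension of a beam is ranked by its own score plus the best score attainable over a fixed number of future steps (here found by exhaustive enumeration; the vocabulary and scorer are toy assumptions, and LBS itself is more involved):

```python
import itertools

def lbs_step(beams, vocab, logp, beam_width, lookahead):
    """One step of lookahead beam search (sketch): rank each one-token
    extension by its score plus the best `lookahead`-step continuation."""
    scored = []
    for seq, s in beams:
        for tok in vocab:
            ext = seq + [tok]
            ext_s = s + logp(seq, tok)
            # exhaustive search over all continuations of length `lookahead`
            best_future = max(
                (sum(logp(ext + list(f[:i]), f[i]) for i in range(len(f)))
                 for f in itertools.product(vocab, repeat=lookahead)),
                default=0.0,
            )
            scored.append((ext, ext_s, ext_s + best_future))
    scored.sort(key=lambda t: t[2], reverse=True)
    return [(seq, s) for seq, s, _ in scored[:beam_width]]

# Assumed toy scorer: token "a" is always the likelier continuation.
def toy_logp(seq, tok):
    return 0.0 if tok == "a" else -1.0

beams = lbs_step([([], 0.0)], ["a", "b"], toy_logp, beam_width=2, lookahead=1)
```

With `lookahead=0` this reduces to ordinary beam search scoring.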

Safe Collaborative Filtering

1 code implementation • 8 Jun 2023 • Riku Togashi, Tatsushi Oka, Naoto Ohsaka, Tetsuro Morimura

Excellent tail performance is crucial for modern machine learning tasks, such as algorithmic fairness, class imbalance, and risk-sensitive decision making, as it ensures the effective handling of challenging samples within a dataset.

Collaborative Filtering · Computational Efficiency +3
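Tail performance of the kind the snippet mentions is commonly formalized with Conditional Value-at-Risk (CVaR), the mean of the worst fraction of losses; whether this matches the paper's exact risk measure is an assumption here:

```python
def cvar(losses, alpha=0.9):
    """CVaR_alpha: mean of the worst (1 - alpha) fraction of losses."""
    srt = sorted(losses, reverse=True)           # worst losses first
    k = max(1, int(round(len(srt) * (1 - alpha))))
    return sum(srt[:k]) / k

c = cvar([1.0, 2.0, 3.0, 10.0], alpha=0.75)  # worst 25% is [10.0] -> 10.0
```

Optimizing CVaR instead of the mean forces the model to handle the hardest samples well, which is the "safe" aspect.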

Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes

no code implementations • 2 Jun 2022 • Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang

However, since the standard MCTS does not have the ability to learn state representation, the size of the tree-search space can be too large to search.

Reinforcement Learning (RL)

Mean-Variance Efficient Reinforcement Learning by Expected Quadratic Utility Maximization

no code implementations • 3 Oct 2020 • Masahiro Kato, Kei Nakagawa, Kenshi Abe, Tetsuro Morimura

To achieve this purpose, we train an agent to maximize the expected quadratic utility function, a common objective of risk management in finance and economics.

Decision Making · Decision Making Under Uncertainty +3
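The expected quadratic utility objective makes the mean-variance link explicit: with U(x) = x − (λ/2)x², E[U(R)] = E[R] − (λ/2)E[R²] = E[R] − (λ/2)(Var[R] + E[R]²). A minimal sketch (λ and the sample returns are illustrative):

```python
def expected_quadratic_utility(returns, lam=0.1):
    """Sample estimate of E[R - (lam/2) R^2]; maximizing it
    trades expected return against its second moment."""
    n = len(returns)
    mean = sum(returns) / n
    second_moment = sum(r * r for r in returns) / n
    return mean - 0.5 * lam * second_moment

u = expected_quadratic_utility([1.0, 3.0], lam=0.5)  # 2 - 0.25*5 = 0.75
```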

Visual analytics for team-based invasion sports with significant events and Markov reward process

no code implementations • 2 Jul 2019 • Kun Zhao, Takayuki Osogami, Tetsuro Morimura

To solve this problem, we consider a whole match as a Markov chain of significant events, so that event values can be estimated with a continuous parameter space by solving the Markov chain with a machine learning model.
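Solving the Markov chain for event values, as the snippet describes, amounts to the fixed point v = r + γPv of a Markov reward process; a pure-Python value-iteration sketch with an invented two-event example (the transition matrix and rewards are not from the paper):

```python
def mrp_values(P, r, gamma=0.9, iters=500):
    """Event values of a Markov reward process: iterate v <- r + gamma * P v."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v

# Assumed two-event example: "attack" leads to "goal" (reward 1) half the time.
P = [[0.5, 0.5],
     [0.0, 1.0]]
r = [0.0, 1.0]
v = mrp_values(P, r, gamma=0.5)  # closed form: v = [2/3, 2]
```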

Sampler for Composition Ratio by Markov Chain Monte Carlo

no code implementations • 16 Jun 2019 • Yachiko Obara, Tetsuro Morimura, Hiroki Yanagisawa

The key points of our approach are (1) designing an appropriate target distribution by using a condition on the number of nonzero elements, and (2) changing values only between a certain pair of elements in each iteration.
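The two points above can be sketched as a Metropolis sampler over nonnegative integer vectors with a fixed total: each proposal moves one unit between a random pair of elements, so the composition constraint is preserved by construction. The target below, penalizing more than two nonzero elements, is an assumed stand-in for the paper's distribution:

```python
import random

def sample_composition(total, dim, target, steps=1000, seed=0):
    """Metropolis sampler over nonnegative integer vectors summing to `total`:
    each move shifts one unit between a random pair (sum is invariant)."""
    rng = random.Random(seed)
    x = [total] + [0] * (dim - 1)
    for _ in range(steps):
        i, j = rng.sample(range(dim), 2)
        if x[i] == 0:
            continue  # cannot move a unit out of an empty element
        y = list(x)
        y[i] -= 1
        y[j] += 1
        if rng.random() < min(1.0, target(y) / target(x)):
            x = y
    return x

# Assumed toy target: strongly prefer at most 2 nonzero elements.
def toy_target(x):
    return 1.0 if sum(v > 0 for v in x) <= 2 else 0.01

x = sample_composition(6, 3, toy_target, steps=2000)
```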

Solving inverse problem of Markov chain with partial observations

no code implementations • NeurIPS 2013 • Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide

The Markov chain is a convenient tool to represent the dynamics of complex systems such as traffic and social systems, where probabilistic transition takes place between internal states.

A Generalized Natural Actor-Critic Algorithm

no code implementations • NeurIPS 2009 • Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya

In this paper, we describe a generalized Natural Gradient (gNG) by linearly interpolating the two FIMs and propose an efficient implementation for the gNG learning based on a theory of the estimating function, generalized Natural Actor-Critic (gNAC).

Reinforcement Learning (RL)
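The linear interpolation of the two Fisher information matrices can be written G(κ) = κF₁ + (1−κ)F₂, with update direction G(κ)⁻¹∇. A 2×2 sketch (the matrices and gradient are illustrative, and the estimating-function machinery of gNAC is not reproduced here):

```python
def gng_direction(F1, F2, grad, kappa=0.5):
    """Generalized natural gradient (sketch): invert the interpolated FIM
    G = kappa*F1 + (1-kappa)*F2 and apply it to the vanilla gradient (2x2 case)."""
    a = kappa * F1[0][0] + (1 - kappa) * F2[0][0]
    b = kappa * F1[0][1] + (1 - kappa) * F2[0][1]
    c = kappa * F1[1][0] + (1 - kappa) * F2[1][0]
    d = kappa * F1[1][1] + (1 - kappa) * F2[1][1]
    det = a * d - b * c  # explicit 2x2 inverse: [[d, -b], [-c, a]] / det
    g0, g1 = grad
    return [(d * g0 - b * g1) / det, (-c * g0 + a * g1) / det]

# kappa = 0 and kappa = 1 recover the two endpoint FIMs.
step = gng_direction([[2.0, 0.0], [0.0, 2.0]],
                     [[1.0, 0.0], [0.0, 1.0]],
                     [1.0, 1.0], kappa=1.0)  # (2I)^-1 applied to [1, 1]
```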
