Photonic accelerators, which harness the physical nature of light for information processing, have recently attracted intense interest.
In this study, we explore the application of a laser network, acting as a photonic accelerator, to the competitive multi-armed bandit problem.
3 May 2023 • Honoka Shiratori, Hiroaki Shinkawa, André Röhm, Nicolas Chauvet, Etsuo Segawa, Jonathan Laurent, Guillaume Bachelier, Tomoki Yamagami, Ryoichi Horisaki, Makoto Naruse
Quantum processes can realize conflict-free joint decisions among two agents using the entanglement of photons or quantum interference of orbital angular momentum (OAM).
Quantum walks (QWs) have a property that classical random walks (RWs) do not possess -- the coexistence of linear spreading and localization -- and this property has been exploited to implement a wide range of applications.
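The linear (ballistic) spreading can be seen in a minimal simulation. The sketch below is a textbook one-dimensional Hadamard walk, not a model from any of the works listed here; after 100 steps its standard deviation grows proportionally to the step count, far outpacing the square-root spread of a classical random walk.

```python
import numpy as np

def hadamard_walk(steps):
    """Discrete-time quantum walk on a line with a Hadamard coin."""
    n = 2 * steps + 1                        # positions -steps..steps
    # amp[pos, coin]; coin 0 moves left, coin 1 moves right
    amp = np.zeros((n, 2), dtype=complex)
    # symmetric initial coin state so the distribution has no drift
    amp[steps, 0] = 1 / np.sqrt(2)
    amp[steps, 1] = 1j / np.sqrt(2)
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    for _ in range(steps):
        amp = amp @ H.T                      # apply the coin at every site
        new = np.zeros_like(amp)
        new[:-1, 0] = amp[1:, 0]             # coin 0 shifts one site left
        new[1:, 1] = amp[:-1, 1]             # coin 1 shifts one site right
        amp = new
    prob = (np.abs(amp) ** 2).sum(axis=1)
    x = np.arange(-steps, steps + 1)
    return x, prob

x, p = hadamard_walk(100)
qw_std = np.sqrt((p * x**2).sum() - (p * x).sum() ** 2)  # ~0.5 * steps
rw_std = np.sqrt(100)                        # classical RW: sqrt(steps)
```

The quantum walk's standard deviation after 100 steps is roughly five times the classical value, illustrating the linear-spreading regime.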
In recent years, reservoir computing has expanded to new functions such as the autonomous generation of chaotic time series, as well as time series prediction and classification.
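As a software analogue of such a reservoir (all names and parameters below are illustrative, not taken from any photonic implementation), a minimal echo state network with a fixed random reservoir and a ridge-trained linear readout can perform one-step-ahead time series prediction:

```python
import numpy as np

def esn_predict(series, n_res=100, leak=0.3, seed=0):
    """Minimal echo state network: fixed random reservoir, ridge-trained
    linear readout, one-step-ahead prediction of a scalar series."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, n_res)
    w = rng.uniform(-0.5, 0.5, (n_res, n_res))
    w *= 0.9 / np.abs(np.linalg.eigvals(w)).max()   # spectral radius < 1
    states = np.zeros((len(series), n_res))
    x = np.zeros(n_res)
    for t, u in enumerate(series):                   # leaky-integrator update
        x = (1 - leak) * x + leak * np.tanh(w_in * u + w @ x)
        states[t] = x
    # ridge-regression readout: map state(t) -> series(t+1), discard washout
    X, y = states[50:-1], series[51:]
    w_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
    return states[-1] @ w_out                        # predict the next value

t = np.linspace(0, 20 * np.pi, 1000)
s = np.sin(t)
pred = esn_predict(s)                                # ~ sin of the next sample
```

Only the readout weights are trained; the reservoir itself stays fixed, which is what makes physical substrates such as lasers attractive for the role.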
In addition, we propose a multi-agent architecture in which agents are indirectly connected through quantum interference of light, and quantum principles ensure the conflict-free property of state-action pair selections among agents.
We solve a 512-armed bandit problem online, two orders of magnitude larger than in previous experiments.
Second, to derive the optimal joint selection probability matrix, all players must disclose their probabilistic preferences.
Q-learning is a well-known approach in reinforcement learning that can deal with many states.
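The tabular form of Q-learning can be sketched in a few lines. The chain environment below is a hypothetical toy example, not one used in these works; it shows the temporal-difference update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) driving the agent toward the rewarding terminal state.

```python
import random

def step(s, a):
    """Toy 5-state chain MDP (illustrative only): action 1 moves right,
    action 0 moves left; reaching state 4 yields reward 1 and terminates."""
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

def q_learning(episodes=300, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(5)]
    for _ in range(episodes):
        s = 0
        for _ in range(200):                 # cap the episode length
            # epsilon-greedy action choice; break ties randomly
            explore = rng.random() < eps or Q[s][0] == Q[s][1]
            a = rng.randrange(2) if explore else (0 if Q[s][0] > Q[s][1] else 1)
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])   # temporal-difference update
            s = s2
            if done:
                break
    return Q

Q = q_learning()
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(4)]  # greedy policy
```

After training, the greedy policy moves right in every non-terminal state, and each Q-value approximates the discounted distance to the reward.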
In this paper, we propose a method for controlling the chaotic itinerancy in a multi-mode semiconductor laser to solve a machine learning task known as the multi-armed bandit problem, which is fundamental to reinforcement learning.
Here, we theoretically derive conflict-free joint decision-making that can satisfy the probabilistic preferences of all individual players.
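A special case illustrates the idea. For two players with uniform preferences over k arms, spreading all probability mass uniformly off the diagonal yields a joint selection matrix whose marginals match each player's preference while the probability of conflict (both choosing the same arm) is exactly zero. The construction below covers only this uniform case, not the general derivation for arbitrary preferences.

```python
import numpy as np

def uniform_conflict_free(k):
    """Joint selection-probability matrix for two players with uniform
    preferences over k arms: entry (i, j) is the probability that player 1
    picks arm i and player 2 picks arm j. Zero diagonal => no conflicts."""
    m = np.full((k, k), 1.0 / (k * (k - 1)))
    np.fill_diagonal(m, 0.0)
    return m

M = uniform_conflict_free(4)
row_marginal = M.sum(axis=1)   # player 1's selection probabilities (all 1/4)
col_marginal = M.sum(axis=0)   # player 2's selection probabilities (all 1/4)
```

Each player still selects every arm with probability 1/k, yet the two players never collide, which is the conflict-free property in its simplest form.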
In this study, we present a theoretical model that accounts for the acceleration of decision-making by correlated time sequences.
In recent cross-disciplinary studies involving both optics and computing, single-photon-based decision-making has been demonstrated by utilizing the wave-particle duality of light to solve multi-armed bandit problems.
By exploiting ultrafast and irregular time series generated by lasers with delayed feedback, we have previously demonstrated a scalable algorithm to solve multi-armed bandit (MAB) problems utilizing the time-division multiplexing of laser chaos time series.
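For reference, the MAB task these systems solve can be stated with a conventional software learner. The epsilon-greedy baseline below operates on Bernoulli arms; it is a generic sketch, not the laser-chaos time-division method described here.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, eps=0.1, seed=1):
    """Baseline epsilon-greedy learner for a Bernoulli multi-armed bandit.
    Returns the index of the arm estimated to be best."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    values = [0.0] * k                       # running mean reward per arm
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(k)           # explore a random arm
        else:
            arm = max(range(k), key=values.__getitem__)   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return max(range(k), key=values.__getitem__)

best = epsilon_greedy_bandit([0.2, 0.5, 0.8, 0.4])   # arm 2 pays off most
```

The physical implementations replace this software exploration with sampling of a chaotic laser signal, but the underlying explore-exploit objective is the same.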
Here, we propose a scheme of adaptive model selection in photonic reservoir computing using reinforcement learning.
Decision making is a fundamental capability of living organisms, and has recently been gaining importance in many engineering applications.
Here we utilize chaotic time series generated experimentally by semiconductor lasers for the latent variables of GAN whereby the inherent nature of chaos can be reflected or transformed into the generated output data.
Our proposed model is inspired by the viewpoint that a decision is affected by its local environment, which is referred to as a local reservoir.
The competitive multi-armed bandit (CMAB) problem relates to social issues such as maximizing total social benefit while preserving equality among individuals, since conflicts between individual decisions can seriously decrease that benefit.
In this study, we demonstrated a scalable, pipelined principle for resolving the multi-armed bandit problem by introducing time-division multiplexing of chaotically oscillating ultrafast time series.
Reinforcement learning involves decision making in dynamic and uncertain environments, and constitutes one important element of artificial intelligence (AI).
In a past study, we successfully used the wave-particle duality of single photons to solve the two-armed bandit problem, which constitutes the foundation of reinforcement learning and decision making.
Decision making is a vital function in this age of machine learning and artificial intelligence, yet its physical realization and theoretical fundamentals are still not completely understood.
Our society comprises a collection of such individuals and is expected to maximize the total reward, while the individuals compete for common rewards.