Bounded Regret for Finitely Parameterized Multi-Armed Bandits

abstract

We consider finitely parameterized multi-armed bandits, in which the model of the underlying stochastic environment is characterized by a common unknown parameter. The true parameter is unknown to the learning agent, but the finite set of possible parameters is known a priori. We propose a simple, easy-to-implement algorithm, which we call the Finitely Parameterized Upper Confidence Bound (FP-UCB) algorithm, that uses the information about the underlying parameter set for faster learning. In particular, we show that the FP-UCB algorithm achieves bounded regret under a structural condition on the underlying parameter set. We also show that, if the parameter set does not satisfy this structural condition, the FP-UCB algorithm achieves logarithmic regret, but with a smaller leading constant than the standard UCB algorithm. We also validate the superior performance of the FP-UCB algorithm through extensive numerical simulations.

authors

Kalathil, Dileep

published proceedings

IEEE CONTROL SYSTEMS LETTERS

altmetric score

1.75

author list (cited authors)

Panaganti, K., & Kalathil, D.

citation count

0

complete list of authors

Panaganti, Kishan||Kalathil, Dileep

publication date

July 2021

publisher

Institute of Electrical and Electronics Engineers (IEEE)

published in

IEEE Control Systems Letters

keywords

Multi-armed Bandits
Online Learning
Reinforcement Learning

Digital Object Identifier (DOI)

10.1109/LCSYS.2020.3008798

start page

1073

end page

1078

volume

5

issue

3

URL

http://dx.doi.org/10.1109/lcsys.2020.3008798

Bounded Regret for Finitely Parameterized Multi-Armed Bandits Academic Article
