1990

Dynamic programming

Full-Text Articles in Numerical Analysis and Computation

On The Optimal Reward Function Of The Continuous Time Multiarmed Bandit Problem, José Luis Menaldi, Maurice Robin Jan 1990

Mathematics Faculty Research Publications

The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple expression (product form) in terms of individual stopping problems, without assuming any smoothness of the optimal reward function, either for the global problem or for the individual stopping problems. Some results for a related problem with switching cost are also obtained.