Prediction with Limited Advice and Multiarmed Bandits with Paid Observations
Authors: Yevgeny Seldin, Peter Bartlett, Koby Crammer, and Yasin Abbasi-Yadkori
Conference: Proceedings of the 31st International Conference on Machine Learning (ICML-14)
Year: 2014
Pages: 280-287
Abstract: We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of $M$ out of $N$ experts. We present an algorithm that achieves $O(\sqrt{(N/M)T\ln N})$ regret on $T$ rounds of this game. The second problem, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial $N$-armed bandit game, where on round $t$ of the game we can observe the reward of any number of arms, but each observation has a cost $c$. We present an algorithm that achieves $O((cN\ln N)^{1/3} T^{2/3} + \sqrt{T \ln N})$ regret on $T$ rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.
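To make the first setting concrete, below is a minimal Python simulation of a limited-advice strategy in the spirit of the abstract: exponential weights, where on each round the learner follows one expert drawn from its weight distribution, queries $M-1$ further experts uniformly at random, and updates with importance-weighted loss estimates. The function name, the uniform choice of the extra queries, and the learning rate are illustrative assumptions, not the paper's exact algorithm.

```python
import math
import random

def limited_advice_hedge(losses, M, eta):
    """Toy simulation of prediction with limited advice (a simplified
    sketch, not the authors' exact algorithm).

    losses: T x N matrix of expert losses in [0, 1]; the full matrix is
    available to the simulator, but the learner only uses queried entries.
    """
    N = len(losses[0])
    log_w = [0.0] * N  # log-weights, for numerical stability
    total_loss = 0.0
    for loss_t in losses:
        # Exponential-weights distribution over the N experts.
        mx = max(log_w)
        w = [math.exp(v - mx) for v in log_w]
        Z = sum(w)
        p = [v / Z for v in w]
        # Follow (and query) one expert drawn from p; incur its loss.
        i = random.choices(range(N), weights=p)[0]
        total_loss += loss_t[i]
        # Query M - 1 further experts uniformly without replacement.
        rest = [j for j in range(N) if j != i]
        observed = [i] + random.sample(rest, M - 1)
        for j in observed:
            # Probability that expert j ends up in the observed set.
            q = p[j] + (1.0 - p[j]) * (M - 1) / (N - 1)
            # Importance-weighted estimate: unbiased for loss_t[j].
            log_w[j] -= eta * loss_t[j] / q
    return total_loss
```

With $\eta$ tuned appropriately, a scheme of this shape is what the abstract's $O(\sqrt{(N/M)T\ln N})$ bound concerns: each extra query raises the observation probability $q$ and so lowers the variance of the loss estimates, trading query budget for regret.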

Brought to you by the WUSTL Machine Learning Group.