Search Machine Learning Repository: Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
Authors: Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li and Robert Schapire
Conference: Proceedings of the 31st International Conference on Machine Learning (ICML-14)
Year: 2014
Pages: 1638-1646
Abstract: We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ \emph{actions} in response to the observed \emph{context}, and observes the \emph{reward} only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only $\otil(\sqrt{KT})$ oracle calls across all $T$ rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.
[pdf] [BibTeX]

authors venues years
Suggest Changes to this paper.
Brought to you by the WUSTL Machine Learning Group. We have open faculty positions (tenured and tenure-track).