Search Machine Learning Repository: Better Rates for Any Adversarial Deterministic MDP
Authors: Ofer Dekel and Elad Hazan
Conference: Proceedings of the 30th International Conference on Machine Learning (ICML-13)
Year: 2013
Pages: 675-683
Abstract: We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of $O(T^{2/3})$ with respect to the best fixed policy in hindsight, whereas the previous best regret bound was $O(T^{3/4})$. Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.
[pdf] [BibTeX]

authors venues years
Suggest Changes to this paper.
Brought to you by the WUSTL Machine Learning Group. We have open faculty positions (tenured and tenure-track).