Exact Regret Minimization in Repeated Games
The regret-minimization paradigm has emerged as an effective technique for designing robust algorithms for online decision-making in uncertain environments. So far, however, designing exact minimax-optimal policies that minimize the worst-case regret has proven difficult in general. In this talk, I will describe a new set-valued dynamic programming approach for characterizing and approximating such regret-optimal policies in repeated games with discounted losses and finite action sets.
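To make the dynamic-programming viewpoint concrete, the following is a brute-force sketch of the minimax-regret recursion for a two-expert toy instance: it truncates the discounted game at a small horizon and searches a coarse grid of mixed actions. The discount factor, horizon, and grid are illustrative assumptions, and this scalar recursion is only a simplified stand-in, not the set-valued construction from the talk.

BETA = 0.8                        # discount factor (illustrative)
HORIZON = 4                       # truncation depth standing in for the infinite game
GRID = [i / 5 for i in range(6)]  # coarse grid over the learner's mixed actions

def minimax_regret(r1=0.0, r2=0.0, t=0, disc=1.0):
    # r_i tracks the learner's discounted loss minus expert i's;
    # the regret at the end of play is max(r1, r2).
    if t == HORIZON:
        return max(r1, r2)
    best = float("inf")
    for p in GRID:  # probability the learner places on expert 1
        worst = max(
            minimax_regret(r1 + disc * (p * l1 + (1 - p) * l2 - l1),
                           r2 + disc * (p * l1 + (1 - p) * l2 - l2),
                           t + 1, disc * BETA)
            for l1 in (0, 1) for l2 in (0, 1))  # adversary's {0,1} loss assignment
        best = min(best, worst)
    return best

print(f"truncated minimax discounted regret: {minimax_regret():.3f}")

Backward induction over the accumulated regret vector is what the set-valued method generalizes; the grid and truncation here simply make the recursion finitely computable.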
I will illustrate this approach on the problem of prediction with expert advice under discounted {0,1} losses. Our numerical evaluations demonstrate the sub-optimality of well-known off-the-shelf online learning algorithms such as Hedge, and show significantly improved performance when using our approximately optimal policies in this setting.
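For reference, here is a minimal sketch of the Hedge (exponential-weights) baseline adapted to discounted {0,1} losses; the learning rate, discount factor, discounting of past losses in the weight update, and the random loss sequence are all illustrative assumptions, not the evaluation setup from the talk.

import numpy as np

def discounted_hedge(losses, beta=0.9, eta=1.0):
    # losses: T x K array of {0,1} expert losses.
    T, K = losses.shape
    cum = np.zeros(K)       # cumulative expert losses (discounted here, an illustrative choice)
    total, disc = 0.0, 1.0  # learner's discounted loss and the current beta**t
    for t in range(T):
        w = np.exp(-eta * cum)
        p = w / w.sum()                  # exponential-weights distribution over experts
        total += disc * (p @ losses[t])  # expected learner loss this round
        cum += disc * losses[t]
        disc *= beta
    best = (losses * (beta ** np.arange(T))[:, None]).sum(axis=0).min()
    return total, total - best           # discounted loss and regret vs. the best expert

rng = np.random.default_rng(0)
losses = rng.integers(0, 2, size=(50, 2)).astype(float)
loss, regret = discounted_hedge(losses)
print(f"Hedge: discounted loss {loss:.3f}, regret {regret:.3f}")

Running a baseline of this kind against the approximately optimal policies is the comparison the numerical evaluations refer to.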
Based on joint work with Jean Walrand (Berkeley) and Patrick Loiseau (INRIA).