In March 2016, Google DeepMind's AlphaGo, a computer Go-playing program, defeated the reigning human world champion of Go 4 games to 1, a feat far more impressive than the previous victories of computer programs in chess (IBM's Deep Blue) and Jeopardy! (IBM's Watson). In December 2017, AlphaZero, a successor to AlphaGo designed for more general game playing, dominated Stockfish, the well-known computer chess program (rated far higher than any human), in a 100-game match, winning 28 games and losing none (with 72 draws). The main engine behind these programs combines machine learning approaches with a technique called Monte Carlo tree search. After reviewing the history and background of AlphaGo and AlphaZero, we trace the origins of Monte Carlo tree search (in particular, its relationship to simulation-based algorithms for Markov decision processes), discuss its present role in AlphaZero, and propose a substantial modification of the algorithm based on results from the well-established field of statistical ranking and selection. We present preliminary computational results based on simulations of an inventory control problem.