Arthur Guez
Arthur Guez, a Canadian computer scientist and neuroscientist, currently a researcher at Google DeepMind with expertise in machine learning, in particular deep learning, and involved in the AlphaGo and AlphaZero projects. He received an M.Sc. in machine learning from McGill University in 2010 and a Ph.D. from the Gatsby Computational Neuroscience Unit at University College London in 2015, with a thesis titled Sample-based Search Methods for Bayes-Adaptive Planning, supervised by Peter Dayan and David Silver.
Ph.D. Thesis
In his Ph.D. thesis, Arthur Guez studies search and planning methods for agents that are uncertain about their environment, a setting that gives rise to the exploration versus exploitation trade-off. A Bayes-optimal agent maximizes its expected return by maintaining a posterior distribution over possible environments and taking all possible future paths into account. This optimization is equivalent to solving a Markov decision process (MDP) whose hyperstate comprises the agent's beliefs about the environment as well as its current state in that environment; the corresponding process is called a Bayes-Adaptive MDP (BAMDP). Because solving a BAMDP exactly is generally intractable, the thesis develops sample-based search methods, among them a tailored Monte-Carlo tree search, to approximate the Bayes-optimal solution. In historical notes on Bayes-adaptive control, Arthur Guez mentions Abraham Wald's Sequential Probability Ratio Test (SPRT) [2], and notes that Alan Turing, assisted by Jack Good, used a similar sequential testing technique to help decipher Enigma codes at Bletchley Park [3] [4].
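To make the hyperstate idea concrete, the following is a minimal sketch, not taken from the thesis, that solves a tiny BAMDP exactly. It assumes a two-armed Bernoulli bandit with independent Beta priors on each arm's success probability; because a bandit has only one physical state, the hyperstate here reduces to the posterior parameters alone. The names `Belief` and `bayes_value` are hypothetical. Exhaustive enumeration like this only works for very small horizons, which is exactly why sample-based search methods such as the tailored Monte-Carlo tree search described above are needed in practice.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical hyperstate for a k-armed Bernoulli bandit (toy assumption):
# one (alpha, beta) Beta-posterior pair per arm. A bandit has a single
# physical state, so the belief alone makes up the hyperstate here.
Belief = Tuple[Tuple[int, int], ...]


@lru_cache(maxsize=None)
def bayes_value(belief: Belief, horizon: int) -> float:
    """Bayes-optimal expected total reward over `horizon` remaining pulls.

    Exact finite-horizon dynamic programming over BAMDP hyperstates:
    each pull earns reward and also updates the posterior, so the value
    of information (exploration) is accounted for automatically.
    """
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for arm, (a, b) in enumerate(belief):
        p = a / (a + b)  # posterior mean probability that this arm pays 1
        # Successor hyperstates after observing a success or a failure.
        win = belief[:arm] + ((a + 1, b),) + belief[arm + 1:]
        lose = belief[:arm] + ((a, b + 1),) + belief[arm + 1:]
        q = (p * (1.0 + bayes_value(win, horizon - 1))
             + (1.0 - p) * bayes_value(lose, horizon - 1))
        best = max(best, q)
    return best


if __name__ == "__main__":
    prior: Belief = ((1, 1), (1, 1))  # uniform Beta(1, 1) prior on both arms
    for h in (1, 2, 5):
        print(h, bayes_value(prior, h))
```

With a uniform prior and two pulls left, the sketch yields 13/12: the first pull earns 1/2 in expectation, and the agent then either repeats a successful arm (posterior mean 2/3) or switches to the untried arm (mean 1/2).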
Selected Publications
2010 ...
- Arthur Guez (2010). Adaptive control of epileptic seizures using reinforcement learning. M.Sc. thesis, McGill University, pdf
- Arthur Guez, David Silver, Peter Dayan (2012). Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search. NIPS 2012, pdf
- Arthur Guez, David Silver, Peter Dayan (2013). Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search. Journal of Artificial Intelligence Research, Vol. 48, pdf
- Arthur Guez, David Silver, Peter Dayan (2014). Better Optimism By Bayes: Adaptive Planning with Rich Models. arXiv:1402.1958v1
- Arthur Guez, Nicolas Heess, David Silver, Peter Dayan (2014). Bayes-Adaptive Simulation-based Search with Value Function Approximation. NIPS 2014, pdf
2015 ...
- Hado van Hasselt, Arthur Guez, David Silver (2015). Deep Reinforcement Learning with Double Q-learning. arXiv:1509.06461
- Arthur Guez (2015). Sample-based Search Methods for Bayes-Adaptive Planning. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, pdf
- David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, Vol. 529 » AlphaGo
- Hado van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver (2016). Learning values across many orders of magnitude. arXiv:1602.07714v2, NIPS 2016
- David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis (2017). Mastering the game of Go without human knowledge. Nature, Vol. 550 [6]
- David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815 » AlphaZero
- Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, David Silver (2018). Learning to Search with MCTSnets. arXiv:1802.04697
- David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, Vol. 362, No. 6419 [7]
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver (2019). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. arXiv:1911.08265
2020 ...
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, Vol. 588 [8]
References
- [1] Image clipped from aguez.jpg, Arthur Guez's Homepage
- [2] Abraham Wald (1945). Sequential Tests of Statistical Hypotheses. Annals of Mathematical Statistics, Vol. 16, No. 2, doi: 10.1214/aoms/1177731118
- [3] Jack Good (1979). Studies in the history of probability and statistics. XXXVII. A. M. Turing's statistical work in World War II. Biometrika, Vol. 66, No. 2
- [4] Arthur Guez (2015). Sample-based Search Methods for Bayes-Adaptive Planning. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, pdf
- [5] DBLP: Arthur Guez
- [6] AlphaGo Zero: Learning from scratch by Demis Hassabis and David Silver, DeepMind, October 18, 2017
- [7] AlphaZero: Shedding new light on the grand games of chess, shogi and Go by David Silver, Thomas Hubert, Julian Schrittwieser and Demis Hassabis, DeepMind, December 3, 2018
- [8] MuZero: Mastering Go, chess, shogi and Atari without rules