TY - JOUR
T1 - Towards a General Transfer Approach for Policy-Value Networks
AU - Soemers, Dennis J.N.J.
AU - Mella, Vegard
AU - Piette, Éric
AU - Stephenson, Matthew
AU - Browne, Cameron
AU - Teytaud, Olivier
PY - 2023
Y1 - 2023
N2 - Transferring trained policies and value functions from one task to another, such as one game to another with a different board size, board shape, or more substantial rule changes, is a challenging problem. Popular benchmarks for reinforcement learning (RL), such as Atari games and ProcGen, have limited variety, especially in terms of action spaces. Due to a focus on such benchmarks, the development of transfer methods that can also handle changes in action spaces has received relatively little attention. Furthermore, we argue that progress towards more general methods should include benchmarks where new problem instances can be described by domain experts, rather than machine learning experts, using convenient, high-level domain-specific languages (DSLs). In addition to enabling end users to more easily describe their problems, user-friendly DSLs also contain relevant task information which can be leveraged to make effective zero-shot transfer plausibly achievable. As an example, we use the Ludii general game system, which includes a highly varied set of over 1000 distinct games described in such a language. We propose a simple baseline approach for transferring fully convolutional policy-value networks, which are used to guide search agents similar to AlphaZero, between any pair of games modelled in this system. Extensive results—including various cases of highly successful zero-shot transfer—are provided for a wide variety of source and target games.
KW - game theory
KW - machine learning
KW - artificial intelligence
UR - http://www.scopus.com/inward/record.url?scp=86000610981&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:86000610981
SN - 2835-8856
VL - 12
SP - 1
EP - 39
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -