

StarCraft II poses a grand challenge for reinforcement learning. Its main difficulties include a huge state and action space and a long time horizon. In this paper, we investigate a hierarchical reinforcement learning approach for StarCraft II. The hierarchy involves two levels of abstraction. One is the macro-action, automatically extracted from expert trajectories, which reduces the action space by an order of magnitude yet remains effective. The other is a two-layer hierarchical architecture, which is modular and easy to scale, enabling a curriculum that transfers from simpler tasks to more complex tasks. The reinforcement training algorithm for this architecture is also investigated. On a 64x64 map and with a restricted set of units, we achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve over a 93% winning rate for Protoss against the most difficult non-cheating built-in AI (level-7) of Terran, training within two days on a single machine with only 48 CPU cores and 8 K40 GPUs. Generalization performance is also strong when tested against never-seen opponents, including cheating-level built-in AI and all levels of Zerg and Protoss built-in AI. We hope this study sheds some light on future research into large-scale reinforcement learning.
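To make the macro-action idea above more concrete, here is a minimal sketch of one way to derive macro-actions from expert trajectories by counting frequent action n-grams. The function name, the n-gram criterion, and the thresholds are illustrative assumptions, not the paper's exact extraction procedure.

```python
from collections import Counter
from typing import Iterable

def extract_macro_actions(trajectories: Iterable[list[int]],
                          length: int = 3,
                          min_count: int = 50) -> list[tuple[int, ...]]:
    """Collect action n-grams that occur frequently in expert trajectories.

    Each trajectory is a list of discrete action IDs; every n-gram that appears
    at least `min_count` times becomes a candidate macro-action, shrinking the
    effective action space the high-level policy has to search.
    """
    counts = Counter()
    for actions in trajectories:
        for i in range(len(actions) - length + 1):
            counts[tuple(actions[i:i + length])] += 1
    return [seq for seq, c in counts.most_common() if c >= min_count]

# Toy example: three short "replays" over a small discrete action space.
replays = [[1, 4, 4, 2, 1, 4, 4, 2], [1, 4, 4, 2, 0], [3, 1, 4, 4, 2]]
print(extract_macro_actions(replays, length=3, min_count=2))
# [(1, 4, 4), (4, 4, 2)]
```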

Games are ideal environments for reinforcement learning research. RL problems on real-time strategy (RTS) games are far more difficult than problems on Go due to the complexity of states, the diversity of actions, and the long time horizon. Traditionally, research on real-time strategy games has been based on search and planning approaches. In recent years, more studies have applied RL algorithms to RTS games, and one of the most prominent RTS research environments is StarCraft. Previous works on StarCraft mostly focus on local battles or part-length games and often obtain features directly from the game engine. One line of work presents a heuristic reinforcement learning algorithm combining exploration in policy space with back-propagation; another introduces BiCNet, based on multi-agent reinforcement learning combined with actor-critic methods.

In recent years, some novel hierarchical reinforcement learning algorithms have been proposed. Option-Critic uses policy-gradient theorems to learn the options and the policy over options simultaneously, reducing the effort of manually designing options. However, the automatically learned options do not perform as well as non-hierarchical algorithms on certain tasks.

FeUdal Networks designed a hierarchical architecture that includes a Manager module and a Worker module. The Manager assigns sub-goals to the Worker, and the Worker is responsible for carrying out specific actions. The paper proposes a gradient-transfer strategy to learn the parameters of the Manager and the Worker in an end-to-end manner. However, due to its complexity, this architecture is hard to tune.
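The Manager/Worker split can be pictured with a small PyTorch-style sketch. This is a generic two-level policy under assumed shapes (a flattened state vector, a discrete action space) and is not FeUdal Networks' exact formulation; it only illustrates how a sub-goal produced by the Manager conditions the Worker and how gradients can flow end-to-end between the two modules.

```python
import torch
import torch.nn as nn

class Manager(nn.Module):
    """Maps the state to a low-dimensional sub-goal vector for the Worker."""
    def __init__(self, state_dim: int, goal_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, goal_dim))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class Worker(nn.Module):
    """Conditions on the state and the Manager's goal to pick a primitive action."""
    def __init__(self, state_dim: int, goal_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + goal_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, goal], dim=-1))

state = torch.randn(1, 32)                  # toy flattened observation
manager, worker = Manager(32, 8), Worker(32, 8, 10)
goal = manager(state)                       # sub-goal assigned by the Manager
action_logits = worker(state, goal)         # Worker scores 10 primitive actions
# Because `goal` stays in the computation graph, a loss on the Worker's output
# also back-propagates into the Manager, giving a simple end-to-end
# "gradient transfer" between the two levels.
```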
MLSH proposes a hierarchical learning approach based on meta-learning, which improves transfer to new tasks through sub-policies learned across multiple tasks. MLSH has achieved better results than the PPO algorithm on some tasks, but because its setting is multi-task, it is more difficult to apply to our environment.
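As a rough structural sketch, the MLSH idea can be written as a per-task master policy that periodically selects one of several sub-policies shared across tasks. The class, the switching period, and the random master choice below are illustrative stand-ins for the learned components, not MLSH's actual training procedure.

```python
import random

class MLSHAgent:
    """Structural sketch: a per-task master policy switches between
    sub-policies that are shared (and reused) across tasks."""
    def __init__(self, sub_policies, period: int = 10):
        self.sub_policies = sub_policies   # shared across all tasks
        self.period = period               # master re-decides every `period` steps
        self.current = 0

    def master_choice(self, state):
        # Placeholder for the task-specific master policy; here it is random.
        return random.randrange(len(self.sub_policies))

    def act(self, state, t: int):
        if t % self.period == 0:           # master re-selects a sub-policy
            self.current = self.master_choice(state)
        return self.sub_policies[self.current](state)

# Toy sub-policies over a dummy state.
subs = [lambda s: "attack", lambda s: "harvest", lambda s: "build"]
agent = MLSHAgent(subs, period=5)
print([agent.act(state=None, t=t) for t in range(10)])
```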
