Abstract
This paper investigates the application of multi-agent deep reinforcement learning to the fighting game Samurai Shodown using the Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C) algorithms. Initially, agents are trained separately for 200,000 timesteps using a Convolutional Neural Network (CNN) and a Multi-Layer Perceptron (MLP) with LSTM networks. PPO demonstrates superior performance early on with stable policy updates, while A2C shows better adaptation and higher rewards over extended training, ultimately outperforming PPO after 1,000,000 timesteps. These findings highlight PPO's effectiveness for short-term training and A2C's advantages in long-term learning scenarios, emphasizing the importance of selecting an algorithm based on training duration and task complexity. The code is available at https://github.com/Lexer04/Samurai-Shodown-with-Reinforcement-Learning-PPO.
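The abstract describes separate PPO and A2C training runs over 200,000 and 1,000,000 timesteps. Below is a minimal sketch of how such runs might be set up with stable-baselines3 and stable-retro; the game identifier "SamuraiShodown-Genesis", the frame-stacking wrapper, and the save-file names are assumptions for illustration, not details taken from the paper or its repository.

# Minimal sketch (assumption: the game is exposed by stable-retro under the
# id "SamuraiShodown-Genesis"; the paper's actual wrappers and reward shaping may differ).
import retro
from stable_baselines3 import A2C, PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

def make_env():
    # Single-player retro environment; observations are raw RGB frames.
    return retro.make(game="SamuraiShodown-Genesis")

# Stack consecutive frames so the CNN policy can infer motion.
env = VecFrameStack(DummyVecEnv([make_env]), n_stack=4)

# Short-horizon run (200,000 timesteps) with PPO and a CNN policy.
ppo_model = PPO("CnnPolicy", env, verbose=1)
ppo_model.learn(total_timesteps=200_000)
ppo_model.save("ppo_samurai_200k")

# Longer-horizon run (1,000,000 timesteps) with A2C for comparison.
a2c_model = A2C("CnnPolicy", env, verbose=1)
a2c_model.learn(total_timesteps=1_000_000)
a2c_model.save("a2c_samurai_1m")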
Table of Contents
1. Introduction
2. Related Work
2.1 Multi-Agent Reinforcement Learning
2.2 Proximal Policy Optimization
2.3 Advantage Actor-Critic
3. Experiment Setup and Methodology
3.1 Environment Setup
3.2 Independent Learning
4. Results and Discussion
5. Conclusion
Acknowledgement
References