Article Information
Abstract
English
This paper studies the ε-greedy and softmax algorithms in the context of obstacle avoidance and of balancing exploration against utilisation (exploitation). In the experiment, the Sarsa and Q-Learning algorithms were used to build an appropriately simplified model of obstacle avoidance; the softmax algorithm was used to balance exploration and utilisation; and these two classical reinforcement learning algorithms were adopted to solve the obstacle-avoidance task. The simulation results show that both Sarsa and Q-Learning can handle obstacle avoidance and the exploration-utilisation balance within a limited number of time steps, allowing the intelligent agent to improve value-function estimates that are not currently maximal while still choosing the best action it has tried so far. In addition, both algorithms enable the agent to try new actions and find the optimal one.
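As a rough illustration of the techniques named in the abstract, the following sketch shows ε-greedy and softmax action selection together with the standard tabular Sarsa and Q-Learning update rules. It is not the authors' implementation: the function names, the list-of-lists table `Q`, and the parameters `epsilon`, `temperature`, `alpha` and `gamma` are all assumptions chosen for this example.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature=1.0):
    """Boltzmann distribution over actions; a lower temperature is greedier."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy action in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
```

The two update functions differ only in the bootstrap term: Sarsa uses the value of the next action the agent actually takes, whereas Q-Learning uses the maximum value available in the next state.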
Table of Contents
1. Introduction
2. Reinforcement Learning
2.1. Theory Framework of Reinforcement Learning
2.2. Key Elements of Reinforcement Learning
2.3. Exploration and Utilization
2.4. Sarsa Algorithm
2.5. Q-Learning Algorithm
3. Obstacle Avoidance Model
4. The Results of Simulation and its Analysis
5. Conclusions
Acknowledgments
References
