Reconnaissance Drone Path Planning under Hostile Threat Environment Based on Reinforcement Learning

Yong-Chan Choi; Seong-Su Park

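International Next-generation Convergence Technology Association (국제차세대융합기술학회), The Journal of Next-generation Convergence Technology Association (차세대융합기술학회논문지), Vol. 6, No. 4, April 2022, pp. 624-631, KCI-listed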

Abstract

English

The military utility of reconnaissance drones has been proven in modern warfare, such as the Nagorno-Karabakh War. However, path planning for reconnaissance drones is still a difficult problem to solve because of its high complexity. Effective real-time interaction with the environment also requires considering the types of sensor data that the drones collect. In this study, we tried to solve the path planning problem using deep reinforcement learning, a methodology in which the agent learns by itself through interaction with the environment. The information collected by the drone was divided into the location of the target, the current location of the drone, and the drone's sensor data, and the experiments covered two sensor types, camera and LiDAR. We used the PPO (Proximal Policy Optimization) algorithm, a high-performing policy-based deep reinforcement learning method, and built the experimental environment with Unity, a platform for building 3D game environments. After training for 2 million episodes, both types of reconnaissance drone showed high performance, including a 100% mission success rate, which confirms the feasibility of using deep reinforcement learning for reconnaissance drone path planning missions under hostile threats.

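Korean (translated)

The military utility of reconnaissance drones has already been proven in modern warfare, such as the Nagorno-Karabakh War. However, real-time path planning or autonomous navigation for reconnaissance drones remains a difficult problem because of its high complexity. Effective real-time interaction with the environment also requires considering the types of sensors the reconnaissance drone uses to collect data. In this study, we sought to solve the path planning problem using deep reinforcement learning, a methodology in which an agent learns on its own through interaction with the environment and makes sequential decisions. The information collected by the reconnaissance drone was divided into the location of the reconnaissance target, the drone's current location, and the drone's sensors, and experiments were conducted with two types of drone sensor: camera and LiDAR. We used the PPO (Proximal Policy Optimization) algorithm, a high-performing policy-based deep reinforcement learning method, and built the experimental environment with Unity, a platform for building 3D intelligent game environments. After training for 2 million episodes, both types of reconnaissance drone showed high performance, including a 100% mission success rate, confirming the feasibility of using deep reinforcement learning for reconnaissance drone path planning missions under hostile threats.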
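For readers unfamiliar with PPO, the clipped surrogate objective at the core of the algorithm can be sketched as below. This is a minimal illustration of the standard PPO objective, not the authors' implementation: the function name `ppo_clipped_objective` and the clipping value `clip_eps=0.2` are assumptions chosen for illustration, and the abstract does not state the hyperparameters actually used.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    new_log_probs / old_log_probs: log-probabilities of the taken actions
    under the current and the data-collecting policy, respectively.
    advantages: advantage estimates for the same actions.
    clip_eps: clipping range; 0.2 is an assumed value, not from the paper.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The minimum over the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data in a single update, which is the property that makes PPO comparatively stable to train.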

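Summary
Abstract
Ⅰ. Introduction
Ⅱ. Theoretical Background
Ⅲ. Reconnaissance Drone Path Planning Model
3.1 Implementation of the Unity-Based Reconnaissance Drone Mission Environment
3.2 State Definition and Network Structure of the Reconnaissance Drone
3.3 Action Space of the Reconnaissance Drone
3.4 Reward Design for the Reconnaissance Drone
3.5 Hyperparameters
Ⅳ. Experimental Results
Ⅴ. Conclusion
REFERENCES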

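Author Information

Yong-Chan Choi (최용찬). Lecturer in Economics and Business Administration, Korea Army Academy at Yeongcheon
Seong-Su Park (박성수). Assistant Professor of Law and Politics, Korea Army Academy at Yeongcheon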
