
Multi-armed Bandit Online Learning Based on POMDP in Cognitive Radio

Abstract

English

In cognitive radio, most existing research on spectrum sharing has two weaknesses. First, the problem is largely formulated as a Markov decision process (MDP), which requires complete knowledge of the channel. Second, most studies perform online learning based on the sensed channel. To solve these problems, this paper proposes a new algorithm: if the authorized user occupies the current channel, the secondary user transmits conservatively at a low rate; otherwise it transmits aggressively. When transmitting conservatively, the state of the channel is not directly observable, so the problem becomes a Partially Observable Markov Decision Process (POMDP). We first establish the optimal threshold when the channel is known, then consider optimal transmission when the channel is unknown and model it as a multi-armed bandit. We obtain the optimal K-conservative policy through the UCB algorithm and improve the convergence speed with the UCB-TUNED algorithm. Simulation and analysis results show that multi-armed bandit online learning under a not fully known channel yields the same K-conservative policy as the optimal threshold policy under a known channel. At the same time, the UCB-TUNED algorithm improves the convergence speed.
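
The abstract names the UCB algorithm without stating its form. The following is a minimal sketch, assuming the standard UCB1 index (empirical mean plus sqrt(2 ln t / n)) and treating each arm as one candidate K-conservative policy. The function name `ucb1`, the Bernoulli reward model, and the success probabilities are illustrative assumptions and are not taken from the paper; UCB-TUNED, which replaces the exploration term with a variance-based one, is not shown.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; `pull(arm)` returns a reward in [0, 1]."""
    counts = [0] * n_arms      # number of times each arm has been played
    means = [0.0] * n_arms     # running average reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # play every arm once to initialise the estimates
        else:
            # choose the arm with the largest upper confidence bound
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # incremental update of the empirical mean
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Hypothetical setup: arm k stands for one candidate value of K, and its
# reward is a Bernoulli draw with a made-up success probability.
rng = random.Random(0)
success_prob = [0.2, 0.5, 0.8, 0.4]
counts, means = ucb1(
    lambda k: float(rng.random() < success_prob[k]),
    len(success_prob),
    5000,
)
best_arm = max(range(len(counts)), key=counts.__getitem__)
print(best_arm, counts)
```

In this sketch the most frequently played arm serves as the learned policy; the exploration term shrinks as an arm is played more often, so play concentrates on the arm with the highest average reward.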

Table of Contents

Abstract
 1. Introduction
 2. The System Model
  2.1. POMDP Model
  2.2. Channel Modeling based on POMDP
 3. The known Channel State of the Optimal Transmission Threshold Strategy
  3.1. K Conservative Strategy Structure Modeling
  3.2. The Challenge of the K Conservative Strategy
  3.3. UCB Algorithm
 4. Simulation Results
  4.1. The off-line Algorithm for Optimal Transmission Threshold Strategy
  4.2. Online Learning Algorithm of K Arm Gambling Machine in the Unknown Channel State
 5. Conclusions
 Acknowledgements
 References

Author Information

  • Juan Zhang the Open Fund of Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, School of Information Engineering, Southwest University of Science and Technology, Mianyang, China
  • Hesong-Jiang the Open Fund of Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, School of Information Engineering, Southwest University of Science and Technology, Mianyang, China
  • Hong Jiang the Open Fund of Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, School of Information Engineering, Southwest University of Science and Technology, Mianyang, China
  • Chunmei Chen the Open Fund of Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, School of Information Engineering, Southwest University of Science and Technology, Mianyang, China

