Source Information
한국차세대컴퓨팅학회
한국차세대컴퓨팅학회 학술대회 (society conference proceedings)
The 11th International Conference on Next Generation Computing (ICNGC 2025)
December 2025
pp. 59-60
Citations: 0 (data provided by NAVER Academic)
Abstract (English)
We investigate whether Large Language Models (LLMs) can learn strategic reasoning and social deception abilities through Reinforcement Learning (RL) fine-tuning in a multi-agent “Mafia Game” simulation environment. We fine-tune a baseline 7B model using Proximal Policy Optimization (PPO) with sparse binary rewards based on game outcomes. Training samples are collected against an opponent pool consisting of different versions of the fine-tuned model. Our experimental results show that the fine-tuned model outperforms the baseline by a significant margin and suggest that strategic capabilities not seen in the baseline model emerge.
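
The abstract names two mechanisms but gives no implementation details: an opponent pool built from earlier versions of the fine-tuned model, and a sparse binary reward credited only from the final game outcome. Below is a minimal Python sketch of that data-collection loop under stated assumptions; OpponentPool, play_mafia_game, the uniform sampling rule, and the checkpoint names are all hypothetical stand-ins, not the paper's code, and the PPO update itself is left as a placeholder where a real trainer and multi-agent Mafia environment would plug in.

```python
import random
from dataclasses import dataclass, field


@dataclass
class OpponentPool:
    """Pool of frozen checkpoints of the fine-tuned model.

    Hypothetical sketch: the paper's pool composition and
    sampling rule are not given in the abstract.
    """
    checkpoints: list = field(default_factory=list)

    def add(self, checkpoint_id: str) -> None:
        self.checkpoints.append(checkpoint_id)

    def sample(self, n_opponents: int) -> list:
        # Assumed rule: sample opponents uniformly over all stored versions.
        return [random.choice(self.checkpoints) for _ in range(n_opponents)]


def play_mafia_game(learner_id: str, opponent_ids: list) -> bool:
    """Placeholder for one full Mafia game; returns True if the learner's team wins.

    A real environment would run multi-agent LLM dialogue,
    day-phase voting, and night actions.
    """
    return random.random() < 0.5


def collect_episode(pool: OpponentPool, learner_id: str = "policy-current",
                    n_opponents: int = 5) -> dict:
    opponents = pool.sample(n_opponents)
    won = play_mafia_game(learner_id, opponents)
    # Sparse binary reward from the game outcome only: +1 win, -1 loss,
    # credited at the terminal step with no intermediate shaping.
    reward = 1.0 if won else -1.0
    return {"opponents": opponents, "reward": reward}


if __name__ == "__main__":
    pool = OpponentPool()
    pool.add("checkpoint-0-baseline")
    for iteration in range(3):
        batch = [collect_episode(pool) for _ in range(8)]
        # A real run would feed these trajectories to a PPO trainer here
        # (policy/value updates on the 7B LLM); omitted in this sketch.
        pool.add(f"checkpoint-{iteration + 1}")
        mean_r = sum(e["reward"] for e in batch) / len(batch)
        print(f"iter {iteration}: mean reward {mean_r:+.2f}, "
              f"pool size {len(pool.checkpoints)}")
```

Uniform sampling over all past checkpoints is one common self-play choice and purely an assumption here; the abstract does not specify how opponents are drawn from the pool.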
Table of Contents
Abstract
I. INTRODUCTION
II. METHODOLOGY
A. Game Environment Setup
B. Training Setup
C. Opponent Pool Design
III. EXPERIMENTS AND RESULTS
IV. CONCLUSION
ACKNOWLEDGMENT
REFERENCES
