Abstract
With the rapid advancement of generative artificial intelligence and virtual human technologies, AI-driven educational systems have demonstrated significant potential for personalized instruction and intelligent interaction. However, existing systems commonly suffer from limited emotional expressiveness, inadequate knowledge response mechanisms, and weak multimodal integration. This study proposes a multimodal AI tutor system based on Unreal Engine, integrating key technologies such as the GPT-4 language model, VITS-based speech synthesis, and NVIDIA Audio2Face for facial animation. To enhance content accuracy and adaptive responsiveness, a dual knowledge graph framework is introduced, comprising a structured teaching knowledge graph and a student cognitive intent graph. The system employs MetaHuman for high-fidelity avatar modeling and leverages Live Link to synchronize speech with facial expression. Experimental results validate the feasibility of the proposed AI tutor system in virtual education environments, providing a practical foundation for the development of future intelligent educational platforms.
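
To make the pipeline described in the abstract concrete, the following is a minimal sketch of one tutor turn: knowledge-graph-grounded prompt construction, an LLM response, speech synthesis, and audio-driven avatar animation. Every interface here (DualKnowledgeGraph, ask_llm, synthesize_speech, drive_avatar) is a hypothetical placeholder for illustration only, not the authors' implementation or any real GPT-4, VITS, Audio2Face, or Live Link API.

```python
# Hypothetical pipeline sketch; all interfaces are placeholders, not real APIs.
from dataclasses import dataclass, field


@dataclass
class DualKnowledgeGraph:
    """Placeholder for the dual knowledge graph framework: a structured
    teaching knowledge graph plus a student cognitive intent graph."""
    teaching_facts: dict = field(default_factory=dict)   # concept -> explanation
    student_intents: dict = field(default_factory=dict)  # student_id -> inferred intent

    def build_context(self, student_id: str, query: str) -> str:
        # Retrieve teaching facts mentioned in the query and the student's intent.
        facts = [v for k, v in self.teaching_facts.items() if k in query.lower()]
        intent = self.student_intents.get(student_id, "unknown")
        return f"Known facts: {facts}\nStudent intent: {intent}"


def ask_llm(prompt: str) -> str:
    """Stand-in for a GPT-4 call that returns the tutor's reply text."""
    return f"[LLM reply to: {prompt[:40]}...]"


def synthesize_speech(text: str) -> bytes:
    """Stand-in for VITS-based TTS; returns raw audio for the reply."""
    return text.encode("utf-8")  # placeholder audio payload


def drive_avatar(audio: bytes) -> None:
    """Stand-in for streaming audio to Audio2Face, whose blendshape output
    would drive the MetaHuman avatar over Live Link."""
    print(f"Streaming {len(audio)} bytes of audio to the avatar...")


def tutor_turn(kg: DualKnowledgeGraph, student_id: str, query: str) -> None:
    context = kg.build_context(student_id, query)           # 1. knowledge grounding
    reply = ask_llm(f"{context}\n\nStudent asks: {query}")  # 2. LLM response
    audio = synthesize_speech(reply)                        # 3. speech synthesis
    drive_avatar(audio)                                     # 4. facial animation


if __name__ == "__main__":
    kg = DualKnowledgeGraph(
        teaching_facts={"photosynthesis": "Plants convert light into chemical energy."},
        student_intents={"s01": "seeking a definition"},
    )
    tutor_turn(kg, "s01", "What is photosynthesis?")
```

The sketch only illustrates the ordering of the four stages named in the abstract; in the actual system each stage would be an asynchronous service inside Unreal Engine rather than a sequential function call.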
Table of Contents
1. Introduction
2. Related Work
3. Research Objectives and Methodology
3.1 Research Objectives
3.2 System Architecture Overview
3.3 Methodology
4. System Design Process
4.1 AI Tutor Character Modeling — Construction Based on MetaHuman
4.2 AI Tutor Voice Activation Mechanism and GPT Integration
4.3 Implementation of Knowledge Base Enhancement and Semantic Interaction for the AI Tutor Based on the Qianfan Platform
4.4 Speech Synthesis for the AI Tutor
4.5 Integration of Audio-to-Face and Lip-Sync Implementation
4.6 Module Integration Logic and Data Flow Description
4.7 Comparative Analysis with Existing Intelligent Tutoring Systems
5. Discussion
6. Conclusion
References
