Article Information
Abstract (English)
The advancement of generative AI technologies has significantly impacted various domains of software engineering, particularly the automation of test case generation. As software systems grow increasingly complex, manual test case creation faces limitations in efficiency and coverage. This study analyzes the capabilities and limitations of major generative AI models (ChatGPT, Copilot, and Gemini) in generating software test cases, focusing on their performance in boundary value analysis, exception handling, and property-based testing. Using the ArrayUtils.indexOf() function from the Apache Commons Lang library as the test subject, we conducted experiments to compare the quality and effectiveness of the test cases generated by each model. Our findings indicate that while generative AI can efficiently produce a substantial number of high-quality test cases, it also produces incorrect test cases and test code. To address these issues, we propose guidelines that help developers improve the reliability and consistency of test case generation with generative AI. Future research will apply these models to more complex software systems and explore methods to further improve their test generation capabilities.
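For illustration only, a minimal sketch of boundary-value test cases for ArrayUtils.indexOf(int[], int) written with JUnit 5 is shown below; it is not taken from the paper, and the class name, test method names, and input values are assumptions made for this example.

// Illustrative boundary-value tests for ArrayUtils.indexOf(int[], int) using JUnit 5.
// This sketch is an assumption for illustration, not the paper's actual test suite.
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.apache.commons.lang3.ArrayUtils;
import org.junit.jupiter.api.Test;

class ArrayUtilsIndexOfBoundaryTest {

    @Test
    void nullArrayReturnsNotFound() {
        // A null input array is defined to return INDEX_NOT_FOUND (-1).
        assertEquals(ArrayUtils.INDEX_NOT_FOUND, ArrayUtils.indexOf((int[]) null, 1));
    }

    @Test
    void emptyArrayReturnsNotFound() {
        assertEquals(ArrayUtils.INDEX_NOT_FOUND, ArrayUtils.indexOf(new int[0], 1));
    }

    @Test
    void valueAtFirstIndex() {
        // Lower boundary: match at index 0.
        assertEquals(0, ArrayUtils.indexOf(new int[] {7, 8, 9}, 7));
    }

    @Test
    void valueAtLastIndex() {
        // Upper boundary: match at the last index.
        assertEquals(2, ArrayUtils.indexOf(new int[] {7, 8, 9}, 9));
    }

    @Test
    void absentValueReturnsNotFound() {
        assertEquals(ArrayUtils.INDEX_NOT_FOUND, ArrayUtils.indexOf(new int[] {7, 8, 9}, 4));
    }
}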
Table of Contents
1. Introduction
2. Related Works
3. Generative AI's Test Case Generation Capabilities
3.1 Test Case Generation According to Boundary Value Analysis
3.2 Test Case Generation with Exception Objects
3.3 Property-Based Testing Technique
4. Comparative Analysis
4.1 Code Coverage Measurement
4.2 Mutation Testing
4.3 Analysis Results and Guidelines
5. Conclusion
Acknowledgement
References
