Extracting High-Level Concepts from Open-Source Systems

Abstract

The analysis of unstructured information in source code (that is, comments and identifiers) rests on the idea that this information reveals, to some extent, the concepts of the software's problem domain. It adds a new layer of semantic information to the source code and captures the domain semantics of the software. Developers use identifiers, method names, and comments to incorporate components of the solution domain of the software. Topic models analyze words that frequently co-occur in a corpus to reveal topics, which embody real-world concepts. These topics have been found to be effective mechanisms for describing the major themes spanning a corpus. Recently, software engineering researchers established that topic models can be effective in structuring various software artifacts, such as bug reports and requirements documents. In this paper, we extract topic models from the textual content of source code by conducting a case study on four Java-based open-source systems: ArgoUML, Checkstyle, JHotDraw, and jEdit. The paper investigates the effectiveness of Latent Dirichlet Allocation (LDA) in comprehending large open-source software systems.
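As a rough illustration of what "unstructured information" means in practice, the sketch below turns a Java source string into the bag of words a topic model such as LDA would consume. It is a minimal sketch under stated assumptions, not the preprocessing used in the paper: the function names `split_identifier` and `extract_terms`, the abbreviated keyword and stop-word lists, and the minimum word length are all hypothetical choices made for the example.

```python
import re

# Assumption: abbreviated lists for illustration; a real study would use fuller ones.
JAVA_KEYWORDS = {
    "public", "private", "protected", "class", "void", "int", "return",
    "new", "static", "final", "if", "else", "for", "while", "import",
    "package", "this", "null", "true", "false",
}
STOP_WORDS = {"the", "a", "an", "on", "of", "to", "and", "is", "in"}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    words = []
    # First break on underscores, digits, and any other non-letter characters.
    for part in re.split(r"[_\W\d]+", identifier):
        # Then break on case changes, keeping acronyms such as "XML" together.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words]


def extract_terms(java_source):
    """Return the words found in the comments and identifiers of a Java source string.

    Simplification: every word-like token is treated alike, so words inside
    string literals are included as well.
    """
    terms = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", java_source):
        for word in split_identifier(token):
            if word in JAVA_KEYWORDS or word in STOP_WORDS or len(word) < 3:
                continue
            terms.append(word)
    return terms
```

Calling `extract_terms` on each source file yields one word list per file. Those lists form the corpus on which a topic model counts co-occurring words.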

Table of Contents

Abstract
 1. Introduction
 2. Latent Dirichlet Allocation (LDA)
 3. Study Methodology
  3.1. Unstructured Source Code
  3.2. Systems Under Study
  3.3. Study Setup
  3.4. Choosing K
 4. The Discovered Topics
 5. Threats to Validity
 6. Related Work
 7. Conclusion
 References

Author Information

  • Mamdouh Alenezi, College of Computer and Information Science, Prince Sultan University, Riyadh 11586, Saudi Arabia
