Article Information
Abstract
English
Datasets are foundational to the development of any Artificial Intelligence (AI)-powered solution. In cybersecurity, and especially in malware detection and mitigation, AI datasets focused on malware can play a critical role in improving the accuracy and efficiency of AI models. In this paper, we explore several recent techniques used in the construction of malware AI datasets, identify gaps, and recommend practical solutions to address them. Specifically, we explore various frameworks and techniques for improving data collection, preprocessing, and dataset validation. Furthermore, we explore recent approaches applied in AI-based malware detection. In particular, we examine shallow learning, deep learning, bio-inspired computing, behavior-based detection, heuristic-based approaches, and hybrid approaches. We then present our observations and recommend specific strategies for improving both the process of malware AI dataset construction and the detection techniques. Through our research, we also contribute to the ongoing, much-needed efforts to combat malware attacks by providing a framework for building quality malware-focused cybersecurity AI datasets, thereby improving the current state-of-the-art AI-powered malware detection systems.
Table of Contents
1. Introduction
2. Background and Motivation
3. Cybersecurity AI Dataset Construction Frameworks
3.1 Hybrid Framework
3.2 Crowdsourcing
3.3 Transfer Learning
3.4 Active Learning
3.5 Semi-Supervised Learning
3.6 Weakly Supervised Learning
4. Cybersecurity AI Dataset Construction Process
4.1 Data Collection
4.2 Data Preprocessing
4.3 Feature Extraction
4.4 Dataset Validation
5. AI Applications for Malware Detection and Analysis Using the Cybersecurity AI Dataset
5.1 Shallow Learning
5.2 Deep Learning
5.3 Bio-Inspired Computing
5.4 Behavior-Based Detection
5.5 Heuristic-Based Approaches
5.6 Hybrid Approaches
6. Observations and Recommendations
6.1 Observations
6.2 Recommendations
7. Conclusion
Acknowledgement
References
