Article Information
Abstract
English
Recording server log data files is now common practice. Server log data files capture useful information about the interaction of users with an online site, as well as the interaction among users during online sessions. Extracting such information can be helpful for many purposes, such as user modelling, user activity analysis, data analytics, security and monitoring. However, server log data files raise two main issues: (1) they tend to be very large due to the considerable number of online users, and (2) the data is not ready for analysis, because log files must first be pre-processed and cleaned of redundant and irrelevant information before data mining and analysis. In this work we consider a real case study of the massive processing and analysis of the log data files of the Virtual Campus of the Open University of Catalonia. We demonstrate the need for, and the feasibility of, massive processing using different distributed infrastructures. We then analyse the navigation patterns of users’ activity within the Virtual Campus. This work is a real example of the convergence of different technologies and paradigms, such as massive processing, data mining and online learning, for studying and modelling user behaviour in online sites.
Table of Contents
I. INTRODUCTION
II. THE VIRTUAL CAMPUS AND THE NEED FOR MASSIVE PROCESSING
III. LOG DATA FILES OF THE VIRTUAL CAMPUS
IV. MASSIVE PROCESSING
A. Local processing
B. Cluster processing
C. PlanetLab processing
V. MINING NAVIGATION PATTERNS
A. Data mining methods
B. WEKA Application
VI. CONCLUSIONS
REFERENCES
BIOGRAPHIES