Source Information
Abstract
English
The growth of websites and the Internet has opened up new opportunities in research, social interaction, entertainment, education, and business. With the rapid growth of the Internet, the digital data generated by websites has become so massive that traditional text-processing software and relational database technology face a bottleneck when processing it, and the results these technologies produce are unsatisfactory. Cloud computing offers a good solution to this problem: it is capable not only of storing such massive data but also of processing and analyzing it quickly, by making use of distributed storage and distributed computing technology. A weblog is a group of connected web pages consisting of a log or daily record of information on particular topics or views, which is altered from time to time by the owner of the site, by other websites, or by website users. An enterprise weblog analysis system based on the Hadoop architecture, with the Hadoop Distributed File System (HDFS), the Hadoop MapReduce software framework, and the Pig Latin language, aids the business decision-making process of system administrators and helps them collect and identify the potential value hidden within the huge data generated by websites. Such a weblog analysis covers an Internet site's entry log and provides information about the number of visitors, busy days of the week and rush hours, views, hits, the most frequently accessed pages, application-server traffic trends, performance reports at varying intervals, and statistical reports that indicate the performance of the program.
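The hit-count analysis described above can be sketched as a map-and-reduce pattern in plain Python. This is a minimal, hypothetical illustration only: the sample log lines and the regular expression are assumptions, not taken from the paper, and a real deployment would run the equivalent logic on HDFS via Hadoop MapReduce or a Pig Latin script rather than in a single process.

```python
import re
from collections import Counter

# Hypothetical sketch of the MapReduce-style page-hit count the paper
# applies to weblogs; sample log entries below are invented.

LOG_PATTERN = re.compile(r'"(?:GET|POST) (\S+)')  # extract the requested page

def map_phase(log_lines):
    """Map: emit a (page, 1) pair for every request found in the log."""
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match:
            yield match.group(1), 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each page (stand-in for shuffle/sort + reduce)."""
    hits = Counter()
    for page, count in pairs:
        hits[page] += count
    return dict(hits)

sample_log = [
    '10.0.0.1 - - [12/May/2015:10:00:01] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/May/2015:10:00:03] "GET /products.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [12/May/2015:10:00:07] "GET /index.html HTTP/1.1" 200 512',
]

page_hits = reduce_phase(map_phase(sample_log))
print(page_hits)  # {'/index.html': 2, '/products.html': 1}
```

The same per-page aggregation over "rush hours" or days of the week follows by changing the key the map phase emits (e.g. the hour extracted from the timestamp instead of the page path).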
Table of Contents
1. Introduction
1.1. History of Weblog Analysis using Hadoop
1.2. Various Steps to Perform a Weblog Analysis using Hadoop
2. Literature Review
2.1. Various Indicators derived through Weblog Analysis
2.2. Hadoop
2.3. Hadoop Distributed File System (HDFS)
2.4. Hadoop MapReduce
2.5. Pig Programming Language
3. Results and Comparisons
4. Conclusion
5. Future Scope
References
