I. Overview
In the log collection part of this website log traffic analysis ("Log collection"), the data was collected and landed on HDFS. According to the architecture diagram of the website traffic log analysis system, the next step is offline analysis: cleaning the data on HDFS, either by writing MapReduce (MR) programs or by hand-writing HQL. Here I chose Hive to clean the data on HDFS.