First, the log files generated by the Nginx server are collected by Flume and written into HDFS.
Second, developers write a custom MapReduce program, based on the format of the raw log files and the required target data format, to clean the data.
Then, the core data analysis is performed with Hive.
Next, the analysis results are exported to a relational MySQL database using the Sqoop tool.
Finally, the analysis results are displayed in a web system.
2. System Overview
Flume collects the site logs on the virtual machine and stores them into HDFS on the virtual machine.
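The collection step above could use a Flume agent configured along these lines. This is a minimal sketch; the agent name, the log path, and the HDFS URL are assumptions, not taken from the original:

```properties
# Hypothetical Flume agent: tail the Nginx access log and sink to HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Exec source: follow the Nginx access log (path is an assumption)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/nginx/access.log
a1.sources.r1.channels = c1

# Buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# HDFS sink: write raw text, partitioned by day
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flume/nginx/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```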
The log data is then copied from HDFS on the virtual machine to the D:/input folder on Windows.
A MapReduce program written in Eclipse on Windows cleans the log data in D:/input and writes the cleaned output to D:/output.
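The core of the cleaning step is parsing each raw log line into the target fields and dropping malformed records. Below is a minimal, self-contained sketch of that parsing logic; the field set and the Nginx "combined" log format are assumptions, and in the real program this logic would sit inside the map() method of a Hadoop Mapper:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for one line of an Nginx "combined" access log.
// In the actual system this would run inside a Hadoop Mapper that
// filters out malformed records during cleaning.
public class LogCleaner {
    // remote_addr - remote_user [time_local] "request" status bytes ...
    private static final Pattern LOG_PATTERN = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    /** Returns a tab-separated record (ip, time, request, status, bytes),
     *  or null when the line does not match the expected format. */
    public static String clean(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.find()) {
            return null; // malformed line: drop it during cleaning
        }
        return String.join("\t", m.group(1), m.group(2),
                           m.group(3), m.group(4), m.group(5));
    }

    public static void main(String[] args) {
        String sample = "192.168.1.10 - - [10/Oct/2023:13:55:36 +0800] "
                + "\"GET /index.html HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0\"";
        // Prints the cleaned tab-separated record
        System.out.println(LogCleaner.clean(sample));
    }
}
```

In a Mapper, returning null would simply mean the record is skipped, which is how the cleaning step discards lines that do not match the log format.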
The cleaned data in D:/output is then uploaded back to HDFS on the virtual machine.
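The upload can be done with the standard HDFS shell; the target directory and output file name here are assumptions:

```shell
# Hypothetical paths; requires a Hadoop client that can reach the VM's HDFS
hadoop fs -mkdir -p /weblog/cleaned
hadoop fs -put D:/output/part-r-00000 /weblog/cleaned/
```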
In Hive on the virtual machine, a data warehouse table is created whose fields correspond to those of the cleaned log data, and the cleaned data in HDFS is loaded into the table. HQL statements (similar to SQL statements) are then written to aggregate the data for statistical analysis. Because Hive tables are stored in HDFS, the analysis results also end up in HDFS.
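The warehouse step might look like the following HQL; the table name, columns, HDFS location, and the aggregation itself are illustrative assumptions:

```sql
-- Hypothetical external table over the cleaned, tab-separated log data
CREATE EXTERNAL TABLE IF NOT EXISTS weblog_clean (
    ip          STRING,
    time_local  STRING,
    request     STRING,
    status      INT,
    bytes       BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/weblog/cleaned';

-- Example aggregation: page views (pv) and unique visitors (uv) per status
SELECT status, COUNT(*) AS pv, COUNT(DISTINCT ip) AS uv
FROM weblog_clean
GROUP BY status;
```

Because the table is external, Hive reads the cleaned files in place in HDFS, and the result of an `INSERT ... SELECT` would likewise land in HDFS, as the original notes.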
Sqoop then exports the results of the Hive statistical analysis to MySQL.
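The export could use a Sqoop command along these lines; the connection string, database, table, and export directory are assumptions:

```shell
# Hypothetical: export the Hive result files from HDFS into a MySQL table
sqoop export \
  --connect jdbc:mysql://localhost:3306/weblog \
  --username root --password '***' \
  --table stats_by_status \
  --export-dir /user/hive/warehouse/stats_by_status \
  --input-fields-terminated-by '\t'
```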
Finally, a web application built with the SSM framework (Spring, Spring MVC, MyBatis) uses ECharts to visually display the data stored in MySQL.