Notes on a Website Traffic Log Analysis System (Hadoop Big Data: Principles and Applications)

1. System Architecture Design

[Figure: system architecture diagram]

  • First, Flume collects the access log files generated by the Nginx server and writes them into HDFS;
  • Second, developers write a custom MapReduce program to clean the raw log files according to the agreed data format;
  • Then, Hive performs the core statistical analysis on the cleaned data;
  • Next, the analysis results are exported to a MySQL relational database with the Sqoop tool;
  • Finally, a web application visualizes the analysis results.

2. System Overview

  1. Flume collects the website logs on the virtual machine and stores them into HDFS on that machine.
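The collection step can be sketched as a Flume agent configuration. This is a minimal example; the agent name, log path, and HDFS address are assumptions for illustration, not taken from the original post:

```properties
# Hypothetical Flume agent: tail the Nginx access log and sink it to HDFS.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow the Nginx access log (path is an assumption)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/nginx/access.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into HDFS, partitioned by day
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/weblog/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```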

  2. The log data is downloaded from HDFS on the virtual machine and stored in the D:/input folder on Windows.
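Pulling the logs out of HDFS is a single HDFS shell command; the paths here are assumptions (the files would then be copied to D:/input on the Windows host):

```shell
# Download the collected logs from HDFS on the VM to a local folder
# (source and destination paths are assumptions).
hdfs dfs -get /weblog/2013-09-18 /tmp/weblog
```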

  3. A MapReduce program written in Eclipse on Windows cleans the log data in D:/input and outputs the result to D:/output.
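The heart of the cleaning program is the per-line parsing logic. The following is a framework-free sketch of just that logic in plain Java (the real program would call something like this from a Hadoop Mapper); the log format regex, the fields kept, and the drop rules are assumptions for illustration, not the post's actual code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineCleaner {
    // Matches the default Nginx access-log layout (an assumption):
    // remote_addr - remote_user [time_local] "METHOD /path PROTO" status bytes
    private static final Pattern LOG = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]*\" (\\d{3}) (\\S+)");

    /**
     * Cleans one raw log line into tab-separated fields: ip, time, url, status.
     * Returns null for lines that should be dropped (malformed or HTTP errors).
     */
    public static String clean(String line) {
        Matcher m = LOG.matcher(line);
        if (!m.find()) {
            return null;                 // drop unparseable lines
        }
        int status = Integer.parseInt(m.group(5));
        if (status >= 400) {
            return null;                 // drop client/server error hits
        }
        return m.group(1) + "\t" + m.group(2) + "\t" + m.group(4) + "\t" + status;
    }

    public static void main(String[] args) {
        String sample = "194.237.142.21 - - [18/Sep/2013:06:49:18 +0000] "
            + "\"GET /wp-content/uploads/a.png HTTP/1.1\" 200 1127";
        System.out.println(clean(sample));
    }
}
```

In the actual MapReduce job, the Mapper would emit the cleaned line as the key (with a NullWritable value), so that D:/output contains one cleaned record per line.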

  4. The cleaned data in D:/output is then uploaded back to HDFS on the virtual machine.
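The upload is again the HDFS shell; the target directory and file name are assumptions:

```shell
# Upload the cleaned MapReduce output back to HDFS on the VM
# (directory and file names are assumptions).
hdfs dfs -mkdir -p /weblog_cleaned
hdfs dfs -put part-r-00000 /weblog_cleaned/
```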

  5. In Hive on the virtual machine, create data warehouse tables whose columns correspond to the fields of the cleaned log data, and map the cleaned data in HDFS to those tables. Then write HQL statements (similar to SQL) to aggregate the data for statistical analysis. Because Hive tables are stored in HDFS, the analysis results also end up in HDFS.
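This step might look like the following HQL. The table names, columns, and the statistics computed are assumptions that match the tab-separated cleaned fields described above, not the post's actual schema:

```sql
-- External table over the cleaned data in HDFS (schema is an assumption)
CREATE EXTERNAL TABLE weblog_cleaned (
    ip     STRING,
    ts     STRING,
    url    STRING,
    status INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/weblog_cleaned';

-- Example statistic: page views (PV) and unique visitors (UV) per URL
CREATE TABLE url_stats AS
SELECT url,
       COUNT(*)           AS pv,
       COUNT(DISTINCT ip) AS uv
FROM weblog_cleaned
GROUP BY url;
```

Because `url_stats` is a Hive-managed table, its data files live under the Hive warehouse directory in HDFS, which is what Sqoop exports in the next step.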

  6. Use Sqoop to export the statistical results from Hive into MySQL.
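The export is a single Sqoop command; the connection string, credentials, and table names here are assumptions:

```shell
# Export the Hive result table (stored as files in HDFS) into MySQL.
# The MySQL table url_stats must already exist with matching columns.
sqoop export \
  --connect jdbc:mysql://localhost:3306/weblog \
  --username root --password 123456 \
  --table url_stats \
  --export-dir /user/hive/warehouse/url_stats \
  --input-fields-terminated-by '\001'
```

The `\001` delimiter is Hive's default field separator for tables created with `CREATE TABLE ... AS SELECT`.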

  7. A web application built on the SSM framework uses ECharts to visualize the data in MySQL.
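On the web side, the data read from MySQL is handed to ECharts as an option object, roughly like the configuration below. The element id, field names, and data values are made-up placeholders; in the real system the arrays would come from an SSM controller rather than being hard-coded:

```javascript
// Minimal ECharts bar chart of page views per URL (values are placeholders).
var chart = echarts.init(document.getElementById('pv-chart'));
chart.setOption({
    title: { text: 'Page views per URL' },
    xAxis: { type: 'category', data: ['/index', '/about', '/blog'] },
    yAxis: { type: 'value' },
    series: [{ type: 'bar', data: [532, 210, 97] }]
});
```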

3. Final Results

[Figure: screenshot of the final visualization]



Origin blog.csdn.net/qq_30693057/article/details/96052930