I previously used Docker to build a fully distributed Hadoop cluster (see: Using Docker to Build a Hadoop Cluster (Pseudo-Distributed and Fully Distributed)). This post records how to build a Spark cluster. Together, the two clusters will be used to finish a project that is still incomplete: a web log traffic analysis system. The system's offline analysis module currently runs on virtual machines, but the real-time analysis module has not been finished because of limited resources; this Spark cluster is intended for that real-time analysis.
First, set up the base environment as shown in the chart:
① Install Scala (version 2.13). Download: https://www.scala-lang.org/download/
② Install Docker (version 19.03.5). Download: https://docs.docker.com/install/linux/docker-ce/centos/
③ Build a ZooKeeper cluster (version 3.4.14). Download: http://mirror.bit.edu.cn/apache/zookeeper/
④ Build a Hadoop cluster (version 2.7.7). Download: https://archive.apache.org/dist/hadoop/common/
⑤ Install Flume (version 1.9.0). Download: http://flume.apache.org/download.html
⑥ Build a Kafka cluster (version 2.4.0). Download: http://kafka.apache.org/downloads
⑦ Build an HBase cluster (version 0.98.17). Download: https://archive.apache.org/dist/hbase/
⑧ Build a Spark cluster (version 2.4.4). Download: https://www.apache.org/dyn/closer.lua/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
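To keep the pinned versions from the list above in one place, a small helper can build stable download links from archive.apache.org instead of a mirror. This is a minimal sketch: the version numbers come from the list, but the path templates are assumptions based on the archive's usual naming (including the Scala 2.12 build of Kafka and the "hadoop2" build of HBase 0.98), so check them against the archive listing before use. Scala and Docker are left out because they are not distributed through the Apache archive.

```python
# Versions pinned in the component list above.
VERSIONS = {
    "zookeeper": "3.4.14",
    "hadoop": "2.7.7",
    "flume": "1.9.0",
    "kafka": "2.4.0",
    "hbase": "0.98.17",
    "spark": "2.4.4",
}

ARCHIVE = "https://archive.apache.org/dist"

# ASSUMPTION: path templates follow archive.apache.org's usual layout;
# {v} is replaced by the component's version.
TEMPLATES = {
    "zookeeper": "zookeeper/zookeeper-{v}/zookeeper-{v}.tar.gz",
    "hadoop": "hadoop/common/hadoop-{v}/hadoop-{v}.tar.gz",
    "flume": "flume/{v}/apache-flume-{v}-bin.tar.gz",
    # ASSUMPTION: Kafka binaries are published per Scala build; 2.12 chosen here.
    "kafka": "kafka/{v}/kafka_2.12-{v}.tgz",
    # ASSUMPTION: HBase 0.98 ships separate builds; the hadoop2 one fits Hadoop 2.7.7.
    "hbase": "hbase/{v}/hbase-{v}-hadoop2-bin.tar.gz",
    # Prebuilt Spark package for the Hadoop 2.7 line used above.
    "spark": "spark/spark-{v}/spark-{v}-bin-hadoop2.7.tgz",
}


def download_url(component: str) -> str:
    """Return the assumed archive URL for one pinned component."""
    path = TEMPLATES[component].format(v=VERSIONS[component])
    return f"{ARCHIVE}/{path}"


if __name__ == "__main__":
    # Print one link per component, e.g. to feed into wget inside a Dockerfile.
    for name in VERSIONS:
        print(f"{name:10s} {download_url(name)}")
```

Keeping versions and templates separate means that upgrading a component is a one-line change to `VERSIONS`.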
On top of the environment above, build the Spark cluster.
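Because a ZooKeeper cluster is already part of the environment, the Spark standalone master can use it for high availability. The sketch below generates the two files a Spark 2.4 standalone cluster reads from `conf/`: `spark-env.sh` and `slaves` (the worker host list). The Spark properties `spark.deploy.recoveryMode`, `spark.deploy.zookeeper.url` and `spark.deploy.zookeeper.dir` and the environment variables used are real Spark settings. The container hostnames (`master`, `slave1`, `slave2`), the JDK and Hadoop paths, the ZooKeeper client port 2181, and the worker resources are hypothetical placeholders for whatever your Docker containers actually use.

```python
from typing import List


def spark_env(zk_hosts: List[str], java_home: str, hadoop_conf_dir: str,
              worker_cores: int = 1, worker_memory: str = "1g") -> str:
    """Render conf/spark-env.sh with ZooKeeper-based master recovery."""
    # ASSUMPTION: every ZooKeeper node listens on the default client port 2181.
    zk_url = ",".join(f"{host}:2181" for host in zk_hosts)
    daemon_opts = " ".join([
        "-Dspark.deploy.recoveryMode=ZOOKEEPER",
        f"-Dspark.deploy.zookeeper.url={zk_url}",
        "-Dspark.deploy.zookeeper.dir=/spark",  # znode where Spark stores recovery state
    ])
    lines = [
        f"export JAVA_HOME={java_home}",
        f"export HADOOP_CONF_DIR={hadoop_conf_dir}",  # lets Spark find HDFS/YARN configs
        f"export SPARK_WORKER_CORES={worker_cores}",
        f"export SPARK_WORKER_MEMORY={worker_memory}",
        f'export SPARK_DAEMON_JAVA_OPTS="{daemon_opts}"',
    ]
    return "\n".join(lines) + "\n"


def slaves_file(workers: List[str]) -> str:
    """Render conf/slaves: one worker hostname per line."""
    return "\n".join(workers) + "\n"


if __name__ == "__main__":
    # Hypothetical container hostnames and paths; replace with your own.
    nodes = ["master", "slave1", "slave2"]
    print(spark_env(nodes,
                    java_home="/usr/local/jdk1.8",
                    hadoop_conf_dir="/usr/local/hadoop-2.7.7/etc/hadoop"))
    print(slaves_file(["slave1", "slave2"]))
```

After copying both files into `$SPARK_HOME/conf` on every node, `sbin/start-all.sh` on the master starts the master and the workers listed in `slaves`; running `sbin/start-master.sh` on a second node adds a standby master that ZooKeeper can promote if the first one fails.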