Building a Spark cluster with Docker (for the real-time analysis module of a web traffic analysis system)

  The previous post, "Using Docker to build a Hadoop cluster (pseudo-distributed and fully distributed)", set up a fully distributed Hadoop cluster with Docker. This post records how to build a Spark cluster. The two clusters will be used together to finish a project that is not yet complete: a web log traffic analysis system. That system currently runs its offline analysis module on virtual machines; the real-time analysis module was left unfinished because of resource constraints, and this Spark cluster is intended for that real-time analysis.

First, set up the base environment as follows

  ① Scala (version: 2.13), Download: https://www.scala-lang.org/download/

  ② Docker (version: 19.03.5), Download: https://docs.docker.com/install/linux/docker-ce/centos/

  ③ Build a ZooKeeper cluster (version: 3.4.14), Download: http://mirror.bit.edu.cn/apache/zookeeper/

  ④ Build a Hadoop cluster (version: 2.7.7), Download: https://archive.apache.org/dist/hadoop/common/

  ⑤ Install Flume (version: 1.9.0), Download: http://flume.apache.org/download.html

  ⑥ Build a Kafka cluster (version: 2.4.0), Download: http://kafka.apache.org/downloads

  ⑦ Build an HBase cluster (version: 0.98.17), Download: https://archive.apache.org/dist/hbase/

  ⑧ Build a Spark cluster (version: 2.4.4), Download: https://www.apache.org/dyn/closer.lua/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
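
  Before moving on, it can help to confirm that each component in the list is actually installed on the node or container. The following is a minimal sketch in Python, not part of the original setup. It assumes that every component's bin/ directory has been added to PATH, and it only looks for the standard launcher scripts each project ships (scala, docker, zkServer.sh, hadoop, flume-ng, kafka-topics.sh, hbase, spark-submit). The names REQUIRED_TOOLS and missing_tools are made up for this illustration, and the script does not check version numbers.

```python
import shutil

# Launcher commands shipped by each component in the list above.
# Assumption: each component's bin/ directory has been added to PATH.
REQUIRED_TOOLS = {
    "Scala": "scala",
    "Docker": "docker",
    "ZooKeeper": "zkServer.sh",
    "Hadoop": "hadoop",
    "Flume": "flume-ng",
    "Kafka": "kafka-topics.sh",
    "HBase": "hbase",
    "Spark": "spark-submit",
}


def missing_tools(which=shutil.which):
    """Return the component names whose launcher is not found on PATH."""
    return [name for name, cmd in REQUIRED_TOOLS.items() if which(cmd) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All components found on PATH.")
```

  Run it inside each container. If a component is reported as missing, the usual cause is that its bin/ directory was not exported in PATH in the image or shell profile.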

  With the above in place, build the Spark cluster environment.

Origin www.cnblogs.com/rmxd/p/12103447.html