2020 winter vacation (1)

Earlier in the holiday I wrote a short summary of last semester's big data course and worked through a few sections on the visualization tool ECharts, following tutorial videos on Bilibili.

For the rest of the holiday I will study Spark.

Today I finished installing Spark and finished watching the videos of the Spark chapter in the MOOC course. The chapter is divided into 6 sections:

Spark overview, the Spark ecosystem, the Spark runtime architecture, Spark SQL, Spark deployment and application modes, and Spark programming practice.

 Spark installed successfully.

  1. Spark is an in-memory parallel computing framework for big data. Compared with Hadoop's disk-based computing framework, it has the advantages of low latency and high speed.
  2. The Spark ecosystem includes Spark Core (in-memory computing), Spark SQL (interactive query analysis), Spark Streaming (stream computing), MLlib (a library of machine learning algorithms), GraphX (graph computing), and other components.
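
To make the in-memory versus disk-based difference in point 1 concrete, here is a minimal sketch in plain Python. It is only an analogy and does not use the Spark or Hadoop APIs: the classes `DiskBackedJob` and `InMemoryJob` and the helper `compare` are hypothetical names made up for this illustration. It counts how often the input file is read when a job makes several passes over the same data.

```python
import os
import tempfile


def write_dataset(path, n):
    """Write the integers 0..n-1 to a text file, one per line."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i}\n")


class DiskBackedJob:
    """Re-reads the input file on every pass (analogy for disk-based computing)."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0  # how many times the file was actually read

    def load(self):
        self.disk_reads += 1
        with open(self.path) as f:
            return [int(line) for line in f]

    def run(self, passes):
        # Each pass computes a weighted sum over the whole dataset.
        return [sum(x * (p + 1) for x in self.load()) for p in range(passes)]


class InMemoryJob(DiskBackedJob):
    """Reads the input once and keeps it in memory (analogy for in-memory computing)."""

    def __init__(self, path):
        super().__init__(path)
        self._cache = None

    def load(self):
        if self._cache is None:
            self._cache = super().load()
        return self._cache


def compare(n=1000, passes=5):
    """Run both jobs on the same data and report results and disk read counts."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        write_dataset(path, n)
        disk, mem = DiskBackedJob(path), InMemoryJob(path)
        disk_result, mem_result = disk.run(passes), mem.run(passes)
    return disk_result, mem_result, disk.disk_reads, mem.disk_reads
```

Both jobs return the same results, but the disk-backed job reads the file once per pass while the in-memory job reads it only once. Iterative workloads that reuse the same data are where this difference matters most.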

 

 

 3. Runtime architecture

Spark execution flow:

 Advantages of Spark's Executor:

① It uses multiple threads to execute tasks, which reduces task startup overhead.

② It uses the BlockManager storage module to reduce I/O overhead.
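
The two advantages above can be sketched in plain Python. This is an analogy under stated assumptions, not Spark's actual implementation: `BlockStore`, `load_block`, and `run_tasks` are hypothetical names, and the real BlockManager API is different. A long-lived thread pool runs all the tasks, and a shared in-memory block cache means each block is loaded only once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BlockStore:
    """Toy in-memory block cache shared by all task threads.

    Hypothetical stand-in for the idea behind BlockManager, not its API.
    """

    def __init__(self):
        self._blocks = {}
        self._lock = threading.Lock()
        self.loads = 0  # number of simulated slow reads

    def get_or_load(self, block_id, loader):
        # The lock keeps the sketch simple: a block is loaded at most once.
        with self._lock:
            if block_id not in self._blocks:
                self.loads += 1
                self._blocks[block_id] = loader(block_id)
            return self._blocks[block_id]


def load_block(block_id):
    """Stand-in for an expensive read from disk or the network."""
    return list(range(block_id * 10, block_id * 10 + 10))


def run_tasks(num_tasks=8, num_blocks=2, num_threads=4):
    store = BlockStore()

    def task(task_id):
        # Tasks that need the same block share the cached copy.
        block = store.get_or_load(task_id % num_blocks, load_block)
        return sum(block)

    # One pool of threads runs every task; no new process is started per task.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(task, range(num_tasks)))
    return results, store.loads
```

Calling `run_tasks()` runs eight tasks on four threads while loading each of the two blocks only once, which is the kind of saving the two points describe.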

 

4. Spark SQL: for Hive compatibility, it depends only on HiveQL parsing and Hive metadata; the rest of query processing is handled by Spark SQL itself.

Tomorrow I will get ready for the Spark experiments.

Origin www.cnblogs.com/zjl-0217/p/12231339.html