Spark Overview

Spark is a big-data computing framework written in Scala and built around in-memory computing.

Built on top of Spark Core, it provides several major functional components: Spark SQL, Spark Streaming, and MLlib.

Chinese documentation: https://spark.apachecn.org/#/

GitHub repository: https://github.com/apache/spark

Spark Core

Spark supports a variety of resource-scheduling frameworks. Through in-memory computing, DAG-based execution management, and RDD lineage, it achieves fast and highly fault-tolerant computation. The RDD (Resilient Distributed Dataset) is Spark's core abstraction: each transformation records its parent RDDs, so a lost partition can be recomputed from its lineage rather than requiring full data replication.
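As a minimal sketch of the RDD concept, the following Scala program (assuming a local Spark installation; the app name and values are illustrative) builds a short chain of transformations and prints the lineage that Spark records for fault recovery:

```scala
import org.apache.spark.sql.SparkSession

object RddLineageSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would run on a cluster
    val spark = SparkSession.builder()
      .appName("rdd-lineage-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Each transformation is lazy and records its parent RDD,
    // forming the lineage (DAG) used to recompute lost partitions
    val nums    = sc.parallelize(1 to 10)
    val doubled = nums.map(_ * 2)
    val evens   = doubled.filter(_ % 4 == 0)

    // toDebugString shows the recorded lineage of this RDD
    println(evens.toDebugString)
    // collect() is an action: only now is the DAG actually executed
    println(evens.collect().mkString(", "))

    spark.stop()
  }
}
```

Note that nothing is computed until the `collect()` action runs; the preceding `map` and `filter` calls only extend the lineage graph.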

Spark SQL

Spark SQL adds SQL query processing on top of Spark Core: it parses a SQL query, optimizes it, and translates it into the corresponding RDD operations (exposed through the DataFrame abstraction), simplifying development and improving the efficiency of data cleaning.
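A small sketch of this workflow (the table name and sample rows are made up for illustration): a DataFrame is registered as a temporary view, and the SQL text is optimized by Spark before being executed as RDD operations under the hood.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A small in-memory DataFrame standing in for a real data source
    val users = Seq(("alice", 34), ("bob", 29), ("carol", 41))
      .toDF("name", "age")
    users.createOrReplaceTempView("users")

    // The SQL query is parsed and optimized, then executed
    // as distributed operations over the underlying data
    val adults = spark.sql("SELECT name FROM users WHERE age > 30")
    adults.show()

    spark.stop()
  }
}
```

The same query could be written with the DataFrame API (`users.filter($"age" > 30).select("name")`); both forms go through the same optimizer.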

Spark Streaming

Spark Streaming is a stream-processing framework built on Spark Core. It implements stream processing (the DStream abstraction) through micro-batching: the input stream is cut into small batches that are processed as a series of Spark jobs. This keeps latency as low as roughly 500 ms while retaining Spark's high throughput and fault tolerance.
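The micro-batch model above can be sketched as follows (the socket source on localhost:9999 is an assumed test input, e.g. fed by `nc -lk 9999`): every second, the lines received in that interval form one batch, and the word-count job runs on that batch.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // Two local threads: one to receive data, one to process it
    val conf = new SparkConf()
      .setAppName("streaming-sketch")
      .setMaster("local[2]")

    // Batch interval of 1 second: the stream is cut into micro-batches
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed text source for illustration (e.g. `nc -lk 9999`)
    val lines = ssc.socketTextStream("localhost", 9999)

    // A classic word count, recomputed for each micro-batch
    val counts = lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Shrinking the batch interval lowers latency but increases scheduling overhead, which is why micro-batch latency bottoms out around the half-second mark rather than reaching true per-record streaming.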



Source: https://blog.csdn.net/qq_45765882/article/details/105522676