Big data: Hadoop and Spark, Spark's features, functions, architecture, modules and roles

Big data: Hadoop and Spark

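In 2022, landing a job takes a powerful combination of education, ability and luck. With the hiring freeze, the big tech companies are not hiring, so many students in algorithms may have to look for development or test-development (QA engineering) jobs instead.
For test development you have to learn databases: SQL and Oracle. SQL above all is a must, and many financial firms, security agencies and similar organizations are required to use Oracle databases.
Oracle is considerably more secure and powerful than the common SQL databases, so you need to learn it. Most importantly, if you plan to take the civil service exam for the internet police, don't even bother registering without it; it will only waste your time!
At the same time, if you are sitting the internet police exam for the data analysis and application post, data mining fundamentals will certainly be tested, so starting today we will cover data mining properly. The most important topic of all is big data: the Administrative Aptitude Test and the interview are minor hurdles, and the hardest and most important part is the written exam on big data technology.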


Big data: Hadoop Spark, one of the world's best-known distributed computing frameworks

Like MapReduce, Spark is a distributed computing framework.
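To make the comparison concrete, here is a toy, single-process Python model of the MapReduce programming model (map, shuffle, reduce) counting words. The function names are hypothetical and this is not Hadoop's Java API; in a real cluster the shuffle step moves data across the network between machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Map: emit (word, 1) for every word of every input line
    return [(word, 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all values by key (on a cluster this crosses the network)
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce: combine the values of each key into a single result
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["spark hadoop", "spark yarn"]))
    # {'spark': 2, 'hadoop': 1, 'yarn': 1}
```

Only two user-supplied steps (map and reduce) are available here, which is the limitation Spark's richer operator set addresses.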

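Spark is used for analytical computing, and it iterates in memory.
In-memory computing is very cool.
What is in-memory computing? Intermediate results are kept in memory between steps instead of being written to disk, as MapReduce does between jobs.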
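A toy Python sketch of the difference, assuming a simple iterative job that halves every value N times. The disk version persists its state to a temporary JSON file after every iteration, roughly like chained MapReduce jobs writing to HDFS; the in-memory version keeps it in RAM, like Spark. All function names are hypothetical.

```python
import json
import os
import tempfile
from typing import List, Tuple


def step(values: List[float]) -> List[float]:
    # One iteration of a hypothetical iterative algorithm: halve every value
    return [v / 2 for v in values]


def iterate_on_disk(values: List[float], iterations: int) -> Tuple[List[float], int]:
    # Disk-based style: state is written out after every iteration
    # and read back at the start of the next one
    disk_writes = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "intermediate.json")
        with open(path, "w") as f:
            json.dump(values, f)
        disk_writes += 1
        for _ in range(iterations):
            with open(path) as f:
                current = json.load(f)
            with open(path, "w") as f:
                json.dump(step(current), f)
            disk_writes += 1
        with open(path) as f:
            result = json.load(f)
    return result, disk_writes


def iterate_in_memory(values: List[float], iterations: int) -> Tuple[List[float], int]:
    # In-memory style: intermediate results never leave RAM
    current = values
    for _ in range(iterations):
        current = step(current)
    return current, 0
```

With input [8.0, 4.0] and 3 iterations both return [1.0, 0.5], but the disk version performs 4 writes and 4 reads. That overhead grows with the number of iterations, which is why iterative workloads such as machine learning gain the most from in-memory computing.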

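Unified: one engine covers a wide range of workloads, including batch processing, SQL queries, stream processing, machine learning and graph computation.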

PySpark is Spark's Python interface.
Awesome.

RDDs provide a large number of operators, far richer than MapReduce's map and reduce alone.
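To show what chaining richer operators looks like, here is a hypothetical ToyRDD class. Its method names (parallelize, map, flatMap, filter, reduceByKey, collect) mirror PySpark operators of the same names (in PySpark, parallelize lives on SparkContext), but it is only a single-machine sketch with no partitioning or cluster. Like real RDDs, its transformations are lazy and nothing runs until the collect action.

```python
from typing import Any, Callable, Iterable, List


class ToyRDD:
    """Hypothetical single-machine stand-in for a Spark RDD (not PySpark)."""

    def __init__(self, compute: Callable[[], Iterable[Any]]):
        self._compute = compute  # deferred computation, i.e. the "lineage"

    @staticmethod
    def parallelize(data: Iterable[Any]) -> "ToyRDD":
        items = list(data)
        return ToyRDD(lambda: iter(items))

    # Transformations: each returns a new ToyRDD and executes nothing yet
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(lambda: (f(x) for x in self._compute()))

    def flatMap(self, f: Callable[[Any], Iterable[Any]]) -> "ToyRDD":
        return ToyRDD(lambda: (y for x in self._compute() for y in f(x)))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    def reduceByKey(self, f: Callable[[Any, Any], Any]) -> "ToyRDD":
        def run() -> Iterable[Any]:
            acc = {}
            for k, v in self._compute():
                acc[k] = f(acc[k], v) if k in acc else v
            return iter(acc.items())
        return ToyRDD(run)

    # Action: triggers the whole chain of transformations
    def collect(self) -> List[Any]:
        return list(self._compute())
```

For example, a word count that drops the word "is" becomes one chain of four operators, where plain MapReduce only offers map and reduce to express it:

```python
lines = ToyRDD.parallelize(["spark is fast", "spark is unified", "hadoop"])
counts = (lines.flatMap(str.split)
               .filter(lambda w: w != "is")
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b)
               .collect())
```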

Spark avoids network transfer as much as possible and computes in memory.
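One way Spark cuts network traffic is pre-aggregating data inside each partition before a shuffle, which reduceByKey does and groupByKey does not. The toy below counts how many records would cross the network with and without that local combine step; partitions are just Python lists and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Partition = List[Tuple[str, int]]


def shuffle_without_combine(partitions: List[Partition]) -> Tuple[Dict[str, int], int]:
    # groupByKey-style: every (key, value) record is sent over the network
    sent = [record for part in partitions for record in part]
    totals: Counter = Counter()
    for key, value in sent:
        totals[key] += value
    return dict(totals), len(sent)


def shuffle_with_combine(partitions: List[Partition]) -> Tuple[Dict[str, int], int]:
    # reduceByKey-style: aggregate inside each partition first,
    # so at most one record per key per partition is sent
    sent: List[Tuple[str, int]] = []
    for part in partitions:
        local: Counter = Counter()
        for key, value in part:
            local[key] += value
        sent.extend(local.items())
    totals: Counter = Counter()
    for key, value in sent:
        totals[key] += value
    return dict(totals), len(sent)
```

With two partitions of three records each over keys "a" and "b", both functions produce the same totals, but the combine step sends 4 records instead of 6; on real data with many repeated keys the saving is much larger.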

Spark is a computing framework only (it replaces MapReduce within Hadoop).
Hadoop, by contrast, bundles all three: computing (MapReduce), storage (HDFS) and resource scheduling (YARN).
In-memory computing trades space for time.
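Space for time in practice: in Spark, a derived dataset is normally recomputed from its lineage by every action, unless you cache it with RDD.cache() or persist(). The hypothetical Lineage class below models only that trade-off, counting how often the expensive computation runs.

```python
from typing import Callable, List, Optional


class Lineage:
    """Hypothetical toy dataset: recomputed by every action unless cached,
    loosely mirroring RDD.cache() / persist() in Spark."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute
        self._use_cache = False
        self._cached: Optional[List[int]] = None
        self.computations = 0  # how many times the expensive work actually ran

    def cache(self) -> "Lineage":
        # Only marks the dataset; it is stored on the first action
        self._use_cache = True
        return self

    def _materialize(self) -> List[int]:
        if self._use_cache and self._cached is not None:
            return self._cached  # spend memory, save time
        self.computations += 1
        data = self._compute()
        if self._use_cache:
            self._cached = data
        return data

    # Two different actions on the same dataset
    def count(self) -> int:
        return len(self._materialize())

    def total(self) -> int:
        return sum(self._materialize())
```

Running count() and then total() on an uncached dataset does the work twice; after cache() it is done once, at the cost of holding the result in memory.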
Programming Spark, especially in Python, is simple and easy to use.
It can do all kinds of things, even machine learning,
and reading from data sources is easy.
It's amazing.

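Spark can run in the following modes:
local: a single process on one machine, suitable for development and testing
standalone: a cluster of Linux machines managed by Spark's own Master/Worker processes
Hadoop YARN: inside YARN containers
Kubernetes: inside Kubernetes containers (pods)
cloud: managed services such as Alibaba Cloud, Amazon, etc.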
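Which mode you get is chosen by the master URL passed to spark-submit --master (or the spark.master setting), for example local[*], spark://host:7077, yarn or k8s://https://host:port. The hypothetical helper below classifies such strings into the modes listed above; it does not cover every form Spark accepts (such as local-cluster), and it does not model managed cloud services.

```python
import re


def deployment_mode(master: str) -> str:
    """Hypothetical helper: map a Spark master URL to a run mode."""
    # local, local[4], local[*], local[4,2] (threads, max task failures)
    if re.fullmatch(r"local(\[(\*|\d+)(,\d+)?\])?", master):
        return "local"        # single process on this machine
    if master.startswith("spark://"):
        return "standalone"   # Spark's own Master/Worker cluster manager
    if master == "yarn":
        return "yarn"         # cluster address comes from the Hadoop config
    if master.startswith("k8s://"):
        return "kubernetes"   # driver and executors run in pods
    raise ValueError(f"unrecognized master URL: {master!r}")
```

For example, deployment_mode("local[*]") returns "local" and deployment_mode("spark://node1:7077") returns "standalone".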

Simply put, Spark's roles can be compared to a company's hierarchy:


chairman,
department head,
team leader,
staff member

Spark's four roles have different names from YARN's, but they do the same work as their YARN counterparts.

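Chairman: Master, manages the resources of the whole cluster
Department head: Worker, manages the resources of a single machine
Team leader: Driver, manages a single application (job)
Staff member: Executor, does the actual computation by running tasks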
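A toy, sequential Python model of how the four roles cooperate, assuming one application that sums a list: the Master hands out Executors round-robin across the Workers, and the Driver splits the data into tasks, runs one per Executor and combines the partial results. The classes are hypothetical illustrations of the division of labor, not Spark's internals; real executors are separate processes that run tasks in parallel.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Executor:
    # Staff member: does the actual work by running tasks
    worker: str  # host of the Worker that launched it

    def run(self, task: Callable[[], int]) -> int:
        return task()


@dataclass
class Worker:
    # Department head: manages the resources of one machine
    host: str

    def launch_executor(self) -> Executor:
        return Executor(worker=self.host)


@dataclass
class Master:
    # Chairman: manages resources for the whole cluster
    workers: List[Worker] = field(default_factory=list)

    def allocate(self, n: int) -> List[Executor]:
        # Spread n executors round-robin over the registered Workers
        return [self.workers[i % len(self.workers)].launch_executor() for i in range(n)]


class Driver:
    # Team leader: plans one application, farms out tasks, combines results
    def __init__(self, master: Master):
        self.master = master

    def run_job(self, data: List[int], num_tasks: int) -> int:
        executors = self.master.allocate(num_tasks)
        chunks = [data[i::num_tasks] for i in range(num_tasks)]  # one task per chunk
        partials = [ex.run(lambda c=chunk: sum(c)) for ex, chunk in zip(executors, chunks)]
        return sum(partials)
```

With two Workers and three tasks, the Executors land on node1, node2, node1, and Driver(master).run_job(list(range(10)), 3) returns 45.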

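Spark's roles play the same parts as YARN's; only the names differ:
Master corresponds to the ResourceManager, Worker to the NodeManager, Driver to the ApplicationMaster, and Executor to the tasks running in YARN containers.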
These four roles run through everything you will learn about Spark;
together they make up its architecture.



Summary

Tip: key lessons:

1)
2) Learn Oracle well: even in an economic downturn, a test-development offer will be no problem! It is also a must if you want to pass the civil service exam for the internet police.
3) In a written coding test, when you only need AC (all test cases accepted), you can ignore space complexity; in an interview, however, you must aim for both optimal time complexity and optimal space complexity.


Source: blog.csdn.net/weixin_46838716/article/details/131022051