Big data: Distributed resource scheduling framework YARN, core architecture, master-slave structure, auxiliary structure, YARN and MapReduce deployment and configuration, Monte Carlo method for pi


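Finding a job in 2022 takes a strong combination of education, ability, and luck. In this hiring winter the big tech companies are not hiring, so many algorithm students may have to look for development or test-development jobs instead.
For test development you have to learn databases: SQL and Oracle, and SQL in particular. Of course, many financial companies, security agencies, and similar organizations must use Oracle databases.
Oracle is much more secure and powerful than other SQL databases, so you need to learn it. Most importantly, if you plan to take the civil service exam for cyber police, don't even bother applying if you don't know it; it will only waste your time!
At the same time, since you are taking the cyber police exam for the data analysis and application position, you will certainly be tested on data-mining fundamentals, so starting today we will cover data mining properly. The most important thing of all is big data: the aptitude test and the interview are minor issues, and the hardest and most important part is the written exam on big-data technology.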


Big Data: Distributed Resource Scheduling Framework YARN

YARN manages cluster resources and scheduling; MapReduce programs run within that scheduling process.

An analogy: a school with tens of thousands of students can manage them all as long as it has classrooms to assign. With a central allocator, scheduling is easy and efficient.

YARN is the best cluster manager.

Resource allocation, distributed computation, and result aggregation are all supervised by YARN.

Applications apply to YARN, and YARN schedules the resources for them.
The remaining three components are used by other systems; a basic understanding of them is enough.

Hadoop's three core components: storage (HDFS), computing (MapReduce), and resource scheduling (YARN).

YARN's architecture: core architecture and auxiliary architecture

HDFS handles storage and YARN handles resource scheduling, and each has its own master-slave structure.
YARN is organized like a group of factories: the ResourceManager is the general chairman (the master),

and each NodeManager is a factory director (a slave).

The chairman makes the overall arrangements, and each factory director then arranges the work inside their own factory.

Clients only need to ask the ResourceManager for resources; they are then granted containers, and the program runs inside those containers.
A container is a fixed slice of one NodeManager's resources (memory and CPU cores): a task can use only what its container holds, no more.
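To make the chairman and factory-director analogy concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not YARN's real scheduler: the class names mirror YARN's roles, but the first-fit allocation policy and the node sizes are hypothetical (real YARN delegates this decision to pluggable schedulers such as the Capacity Scheduler or the Fair Scheduler).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """A worker node (slave) with a fixed pool of memory (MB) and vcores."""
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class Container:
    """A slice of one node's resources granted to a task."""
    node: str
    mb: int
    vcores: int


@dataclass
class ResourceManager:
    """Toy master that hands out containers; first-fit is a simplification."""
    nodes: list[NodeManager] = field(default_factory=list)

    def allocate(self, mb: int, vcores: int) -> Container | None:
        # Grant the request on the first node that still has room for it.
        for nm in self.nodes:
            if nm.free_mb >= mb and nm.free_vcores >= vcores:
                nm.free_mb -= mb
                nm.free_vcores -= vcores
                return Container(nm.name, mb, vcores)
        return None  # no node can fit the request: it has to wait


# Hypothetical cluster: node1 has 4 GB / 4 cores, node2 has 2 GB / 2 cores.
rm = ResourceManager([NodeManager("node1", 4096, 4), NodeManager("node2", 2048, 2)])
first = rm.allocate(3072, 2)   # fits on node1
second = rm.allocate(2048, 2)  # node1 has only 1024 MB left, so node2
third = rm.allocate(2048, 1)   # nothing left that fits
print(first, second, third)
```

The last request shows the "you can only load so much" rule: once no node has enough free memory, the ResourceManager cannot grant a container.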





Auxiliary architecture of YARN

The web application proxy server is an auxiliary component that improves security: it provides a security safeguard for YARN's web interfaces.

The JobHistoryServer keeps a record of the programs that have run; this is auxiliary work too.

Because containers isolate resources, each program's logs end up scattered across the nodes, so the simple solution is to set up one server that records and collects the logs in a unified way.



In summary, YARN's roles fall into two groups:
Master-slave roles: ResourceManager (master) and NodeManager (slave)
Auxiliary roles: the web application proxy server (ProxyServer) and the JobHistoryServer

MapReduce and YARN deployment

Deploying YARN means starting its master, slave, and auxiliary processes.

MapReduce runs on YARN, so it has no processes of its own to start; it only needs its configuration modified.

Why give node1 so much memory?
Because it takes on many roles at once.

MapReduce can be configured in various ways, and YARN likewise needs its environment configured. The YARN settings cover the ResourceManager, the NodeManager's local and log directories, the history server's port and log location, the proxy server, and security.
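As a sketch of what such settings look like, the Python snippet below renders Hadoop's configuration XML layout (a configuration element holding property/name/value entries) for mapred-site.xml and yarn-site.xml. The property names are standard Hadoop ones, but every value (the host node1, the /data paths, ports 10020, 19888 and 8089) is a hypothetical placeholder for your own cluster, and this is not a complete configuration.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(props: dict[str, str]) -> str:
    """Render a dict as Hadoop's <configuration><property>... XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# mapred-site.xml: run MapReduce on YARN and locate the history server.
# All hosts and ports below are hypothetical placeholders.
MAPRED_SITE = {
    "mapreduce.framework.name": "yarn",
    "mapreduce.jobhistory.address": "node1:10020",
    "mapreduce.jobhistory.webapp.address": "node1:19888",
}

# yarn-site.xml: ResourceManager host, NodeManager dirs, log aggregation,
# history-server log URL and proxy server (placeholder values).
YARN_SITE = {
    "yarn.resourcemanager.hostname": "node1",
    "yarn.nodemanager.local-dirs": "/data/nm-local",
    "yarn.nodemanager.log-dirs": "/data/nm-log",
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    "yarn.log-aggregation-enable": "true",
    "yarn.log.server.url": "http://node1:19888/jobhistory/logs",
    "yarn.web-proxy.address": "node1:8089",
}

print(hadoop_site_xml(YARN_SITE))
```

Generating the files from one dict per file keeps the values in one place; in practice you would write each string into the matching file under Hadoop's configuration directory and copy it to every node.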



The history server is started with the mapred command: mapred --daemon start historyserver

The HDFS web UI is on port 9870, and port 8088 serves the monitoring interface of the YARN cluster.
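After startup, a quick way to confirm that those web UIs are up is to check whether their ports accept TCP connections. This is a minimal Python sketch: 9870 and 8088 come from the text above, while 19888 (the default JobHistoryServer web port) and the host name you pass in are assumptions about your cluster.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unknown host
        return False


WEB_UIS = {
    "HDFS NameNode web UI": 9870,
    "YARN ResourceManager web UI": 8088,
    "JobHistoryServer web UI": 19888,  # assumed default port
}


def check_web_uis(host: str) -> dict[str, bool]:
    """Probe every known web UI port on one host."""
    return {name: port_open(host, port) for name, port in WEB_UIS.items()}
```

For example, check_web_uis("node1") from a machine that can resolve node1 returns one True/False entry per UI. An open port only shows that something is listening; opening the page in a browser is still the real check.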
Running init 0 shuts the machine down.
MapReduce has no process that needs to be started separately.

First experience with MapReduce and YARN

Running start-yarn.sh starts the master-slave roles (ResourceManager and NodeManagers) together with the auxiliary proxy server.
The history server needs to be started separately.

Submitting MapReduce jobs to YARN for execution

Hive uses MapReduce under the hood, so you don't need to write code yourself.
Spark and Flink require you to write code, but their performance is fast.
In the command, jar means running a program whose code is packaged in the jar file.
Next comes the program to run from that jar, here the wordcount example.
Then come the input file and the output directory (wc); the output directory must not already exist, or the job will fail.
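Conceptually, the wordcount program splits each input line into words (map), groups identical words together (shuffle), and sums their counts (reduce). The single-process Python sketch below only mimics those three phases to show the data flow; it is not Hadoop's WordCount class, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reducer: add up the 1s emitted for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence; keys come out sorted."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in sorted(shuffle(pairs).items()))


print(word_count(["hello yarn", "hello hadoop hello"]))
# prints {'hadoop': 1, 'hello': 3, 'yarn': 1}
```

On a real cluster the map calls run in parallel containers on different NodeManagers, and the shuffle moves data between machines; that distribution is exactly the part YARN schedules.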

Check the results.

The JobHistoryServer records the job's history.

The pi example takes two arguments: the number of map tasks and the number of samples each map task draws. From these it calculates pi.
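The two arguments map naturally onto MapReduce: each map task throws its own batch of sample points, and one reduce step adds up the counts. The Python sketch below mirrors that split. Two assumptions to note: it uses plain pseudo-random points, whereas Hadoop's bundled example (QuasiMonteCarlo) uses a quasi-random Halton sequence, and map_task, reduce_task and estimate_pi are hypothetical names, not Hadoop APIs.

```python
import random


def map_task(task_id: int, samples: int, seed: int = 0) -> tuple[int, int]:
    """Hypothetical map task: draw `samples` uniform points in the unit square
    and count those inside the quarter circle x^2 + y^2 <= 1."""
    rng = random.Random(seed + task_id)  # separate, reproducible stream per task
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_task(results: list[tuple[int, int]]) -> float:
    """Hypothetical reduce step: pool every map's counts and scale by 4."""
    inside = sum(hit for hit, _ in results)
    total = sum(n for _, n in results)
    return 4.0 * inside / total


def estimate_pi(n_maps: int, n_samples: int) -> float:
    """Run n_maps map tasks of n_samples points each, then one reduce."""
    return reduce_task([map_task(i, n_samples) for i in range(n_maps)])


print(estimate_pi(3, 1000))
```

Giving each task its own seeded generator keeps the tasks independent and the result reproducible; more maps or more samples per map both shrink the error, which is why the example exposes both knobs.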

Monte Carlo method for finding pi

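The idea is to scatter random points uniformly over a square. The fraction of points that land inside the circle approximates the ratio of the circle's area to the square's area, and multiplying that ratio by the square's area gives the circle's area, which is a neat trick.
Take the unit square with a quarter circle of radius 1 centered at one corner: a point is inside the quarter circle if its distance from that corner is less than 1. The fraction of points counted inside is approximately pi/4.

Finally, multiplying by 4 gives pi easily.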
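The ratio argument can be written out as a short derivation, assuming points are drawn uniformly from the unit square [0,1]^2 and the quarter circle has radius 1 centered at the origin. The standard-error line is the usual binomial estimate, added here for orientation rather than taken from the original.

```latex
\begin{aligned}
P(x^2 + y^2 \le 1) &= \frac{\text{area of quarter circle}}{\text{area of unit square}}
  = \frac{\pi \cdot 1^2 / 4}{1^2} = \frac{\pi}{4}, \\
\hat{\pi} &= 4 \cdot \frac{N_{\text{inside}}}{N_{\text{total}}}, \\
\operatorname{SE}(\hat{\pi}) &= 4\sqrt{\frac{p(1-p)}{N_{\text{total}}}}
  \approx \frac{1.64}{\sqrt{N_{\text{total}}}}, \qquad p = \frac{\pi}{4}.
\end{aligned}
```

So a million samples give an error on the order of 0.0016, and each extra correct decimal digit costs about 100 times more samples.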


Summary

Tip: key lessons:

1)
2) Learn Oracle well: even when the economy is cold, landing a test-development offer is definitely not a problem! It is also a must if you want to take the civil service exam for cyber police.
3) When aiming for AC (all test cases accepted) in a written test, you may ignore space complexity, but in an interview you must consider both the optimal time complexity and the optimal space complexity.


Origin blog.csdn.net/weixin_46838716/article/details/130984436