Big data: Distributed resource scheduling framework YARN, core architecture, master-slave structure, auxiliary structure, yarn and MapReduce deployment and configuration, Monte Carlo method for PI
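Finding a job in 2022 takes a combination of academic background, ability, and luck. In a hiring freeze the big companies stop recruiting, so many students who specialized in algorithms may have to look for development or test development roles instead.
For test development you need to learn databases: SQL and Oracle. SQL in particular is a must, and many financial companies and security agencies are required to use Oracle databases.
Oracle is regarded as more secure and more powerful, so you need to learn it. Most importantly, if you plan to sit the civil service exam for the cyber police, do not bother registering without it; you would only waste your time!
Since the cyber-police data analysis application post is certain to test the basics of data mining, starting today we will go through data mining topics properly. The most important part is big data: the aptitude test and the interview are minor hurdles, while the written test on big data technology is the hardest and matters most.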
Article Directory
- Big data: Distributed resource scheduling framework YARN, core architecture, master-slave structure, auxiliary structure, yarn and MapReduce deployment and configuration, Monte Carlo method for PI
- Big Data: Distributed Resource Scheduling Framework YARN
- Yarn's architecture, core architecture and auxiliary architecture
- Auxiliary structure of yarn
- MapReduce and yarn deployment
- First experience with MapReduce and yarn
- Submit MapReduce tasks to yarn for execution
- Monte Carlo algorithm to find pi
- Summary
Big Data: Distributed Resource Scheduling Framework YARN
YARN manages cluster resources and schedules MapReduce jobs onto them.
Think of a school: tens of thousands of people can be managed as long as classrooms are assigned centrally, and scheduling stays easy and efficient.
YARN plays that role as the cluster manager.
Resource allocation, distributed computation, and the aggregation of results are all supervised by YARN.
A program that needs resources applies to YARN, and YARN schedules the resources for it.
YARN is not limited to MapReduce; other computing frameworks can use it as well.
Together the Hadoop components cover three jobs: storage (HDFS), computing (MapReduce), and resource scheduling (YARN).
Yarn's architecture, core architecture and auxiliary architecture
Storage (HDFS) and resource scheduling (YARN) are separate systems, and each has its own master-slave structure.
YARN works like a group of factories.
The ResourceManager is the chairman of the whole group.
Each NodeManager is the manager of one factory.
The chairman makes the overall arrangements, and each factory manager handles the arrangements inside its own factory.
A client only needs to ask the ResourceManager for resources, and it receives containers in return.
The client then runs its work in those containers, one by one.
A container has a fixed capacity: it holds only what was allocated to it and cannot be loaded with more.
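The chairman and factory-manager picture can be sketched as a toy model. This is only an illustration in plain Python, not YARN's API: the class names mirror the roles, but the memory sizes, the `allocate` method, and the first-fit policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node with a fixed amount of memory to hand out."""
    name: str
    total_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.total_mb - self.used_mb


@dataclass
class ResourceManager:
    """Cluster-wide view: decides which node hosts each container."""
    nodes: list = field(default_factory=list)

    def allocate(self, app: str, mb: int):
        # Hypothetical first-fit policy: take the first node with enough
        # free memory. Real YARN schedulers are far more elaborate.
        for node in self.nodes:
            if node.free_mb() >= mb:
                node.used_mb += mb
                return {"app": app, "node": node.name, "mb": mb}
        return None  # no node can fit a container of this size


# Made-up cluster: two nodes with 4 GB and 2 GB of memory.
rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 2048)])
first = rm.allocate("wordcount", 3072)
second = rm.allocate("wordcount", 2048)
third = rm.allocate("wordcount", 2048)
print(first, second, third)
```

The third request comes back empty because neither node has 2048 MB left, which is the "you can only load so much" rule in miniature.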
Auxiliary structure of yarn
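The auxiliary roles support the core ones.
The proxy server is there to improve security: its job is simply to provide a security guarantee for YARN.
The history server keeps the history and records of jobs.
Both feel like supporting work rather than scheduling itself.
Because resources are isolated from one another, the logs end up scattered, so a server is set up to record and collect the logs in one unified place.
That is the auxiliary architecture.
To sum up, YARN has two kinds of roles:
Master-slave roles: ResourceManager and NodeManager
Auxiliary roles: proxy server and history server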
MapReduce and yarn deployment
Deploying YARN means starting the master, slave, and auxiliary processes.
MapReduce runs on YARN, so it has no process of its own to start; only its configuration needs to be modified.
Why is so much memory configured on node1? Because it takes on a lot of roles.
Various settings can be made for MapReduce, and YARN also needs to be configured for its environment:
the ResourceManager and NodeManager,
local log directories,
the history server port and logs,
the proxy server,
and security.
The history server is started with the mapred command (in Hadoop 3: mapred --daemon start historyserver).
Port 9870 is the HDFS web interface, and port 8088 is the monitoring interface of the YARN cluster.
init 0 shuts the machine down.
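As a rough sketch of what these settings look like, the script below renders a few properties into the XML layout that Hadoop configuration files use. The property names are real Hadoop keys, but the values are assumptions for illustration: the host is taken to be node1, 10020 and 19888 are the usual history server defaults, and the proxy port 8089 is an arbitrary choice. `render_site_xml` is a hypothetical helper, not part of Hadoop.

```python
from xml.sax.saxutils import escape


def render_site_xml(props: dict) -> str:
    """Render a dict as a Hadoop-style <configuration> document."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


HOST = "node1"  # assumption: the ResourceManager and helpers live on node1

# Candidate entries for yarn-site.xml
yarn_site = {
    "yarn.resourcemanager.hostname": HOST,
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    "yarn.log-aggregation-enable": "true",
    "yarn.web-proxy.address": f"{HOST}:8089",  # port chosen arbitrarily
}

# Candidate entries for mapred-site.xml
mapred_site = {
    "mapreduce.framework.name": "yarn",  # run MapReduce on YARN
    "mapreduce.jobhistory.address": f"{HOST}:10020",
    "mapreduce.jobhistory.webapp.address": f"{HOST}:19888",
}

yarn_xml = render_site_xml(yarn_site)
mapred_xml = render_site_xml(mapred_site)
print(yarn_xml)
print(mapred_xml)
```

A real cluster needs more than this (memory limits, log directories, environment settings), so treat the output as a starting point to compare against your own files.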
First experience with MapReduce and yarn
Starting YARN brings up the master-slave roles together with the proxy server, which is one of the auxiliary roles.
The history server needs to be started separately.
Submit MapReduce tasks to yarn for execution
Hive runs on top of MapReduce, so with Hive you do not need to write code yourself.
Spark and Flink do require you to write code, but they run fast.
To submit the example job, use hadoop jar.
jar means run a program, and the program code is inside the jar file.
The Java class named next is the wordcount class we want to run from that program.
It is followed by the input file and the output location; the output result must not already exist in the wc folder.
When the job finishes, the results are written there.
The Job History Server records the history of the run.
The pi example takes the number of maps and the number of samples, and calculates pi.
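To see what the wordcount job computes, here is a minimal in-process sketch of the map, shuffle, and reduce steps. It is plain Python running on one machine, not the Hadoop example's code; the function names and the three input lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts):
    """Reduce: add up the counts collected for one word."""
    return word, sum(counts)


# Made-up input standing in for the lines of the input file.
lines = ["hello hadoop", "hello yarn", "hello mapreduce yarn"]

pairs = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(word, counts)
              for word, counts in shuffle(pairs).items())
print(result)
```

On a cluster the map calls run in separate containers on different nodes and the shuffle moves data across the network, but the per-word arithmetic is the same.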
Monte Carlo algorithm to find pi
Pi comes out of a ratio: the share of the whole square's area that the circle covers.
Scatter random points over a unit square.
A point whose distance from the origin is less than 1 falls inside the quarter circle of radius 1.
Count the points that fall inside and divide by the total number of points; that ratio approximates pi/4.
Finally, multiply the ratio by 4 and pi is found easily.
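The idea fits in a few lines. This is a minimal sketch in plain Python, not the code of Hadoop's pi example; the parameters `num_maps` and `samples_per_map` only imitate the two arguments mentioned earlier, and the fixed seed is an assumption added so the run is repeatable.

```python
import random


def count_inside(samples: int, rng: random.Random) -> int:
    """Count random points of the unit square that land inside the
    quarter circle of radius 1 centred on the origin."""
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:  # distance from the origin is below 1
            inside += 1
    return inside


def estimate_pi(num_maps: int, samples_per_map: int, seed: int = 0) -> float:
    """Run num_maps independent batches and combine their counts."""
    rng = random.Random(seed)
    inside = sum(count_inside(samples_per_map, rng) for _ in range(num_maps))
    total = num_maps * samples_per_map
    # inside / total approximates pi/4, so multiply by 4.
    return 4.0 * inside / total


pi_estimate = estimate_pi(num_maps=3, samples_per_map=100_000)
print(pi_estimate)
```

More samples give a closer estimate, but the error shrinks slowly: roughly, quadrupling the number of points halves it.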
Summary
Tip: important lessons learned:
1)
2) Learn Oracle well. Even in a cold job market, landing a test development offer should then not be a problem. It is also a required step if you want to take the civil service exam for the cyber police.
3) When you are only aiming for AC (an accepted solution) in a written test, you may ignore space complexity, but in an interview you must consider both the optimal time complexity and the optimal space complexity.