Big Data: Birth of Big Data, Overview, Big Data Software Ecosystem, Apache Hadoop Overview

Big Data:

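In 2022, landing a job took a strong combination of academic credentials, ability, and luck. In a hiring freeze where the big tech companies are not recruiting, many students who trained for algorithm roles may have to look for development or test-development jobs instead.
For test development you need to learn databases: SQL and Oracle. SQL in particular is a must, and many financial companies and security agencies require Oracle databases.
Oracle is considered more secure and more powerful than lighter-weight SQL databases, so it is worth learning. Most importantly, if you plan to take the civil service exam for the internet police, do not bother registering without it; you would be wasting your time!
Likewise, since the goal is the internet police's data analysis and application post, the exam will certainly cover data mining fundamentals, so starting today we will go through data mining properly. The most important topic of all is big data: the aptitude test and the interview are minor hurdles, while the hardest and most important part is the written exam on big data technology.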


Big Data

Big data starts with records of all kinds of user actions.
What kind of person is the user?
What the user wants to buy can largely be inferred from the data.

Birth of big data

Before computers were invented, records were kept on paper; later they were kept on computers.
In the last century, these were all standalone computers.
Later came small-scale interconnection, and then global interconnection.
As the global Internet developed, the number of users kept growing and the data kept getting bigger. Big data is exactly that: big.
With this much data, can you still process it? A single computer cannot solve this problem.


Distributed processing technology

When the volume of data is large, clusters of many servers are used to handle it: the data has to be both stored and computed.
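
The store-and-compute idea can be illustrated with a minimal sketch. It assumes a handful of hypothetical "nodes" simulated by threads on one machine; the function names `split_into_shards` and `distributed_sum` are made up for illustration and are not part of any big data framework.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(records, num_nodes):
    """Spread records round-robin across num_nodes hypothetical servers (storage)."""
    shards = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        shards[i % num_nodes].append(record)
    return shards


def distributed_sum(records, num_nodes=4):
    """Each simulated node sums its own shard; the partial results are then combined."""
    shards = split_into_shards(records, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_sums = list(pool.map(sum, shards))  # computation happens per shard
    return sum(partial_sums)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))
```

Real clusters add networking, fault tolerance, and scheduling on top of this, but the pattern of splitting the data and then combining partial results is the same.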
Before 2008, small companies could not afford this, and only large companies could do it.

Later, Aliyun appeared, and Hadoop appeared as open source.

Awesome: the ecosystem gradually blossomed and bore fruit.
The core is distributed computing, storage, and resource scheduling.

Apache Hadoop is super impressive

Big Data Overview

The essence of big data is processing massive distributed data and mining the value behind it. In the digital age this data is characterized by:
Volume: the data is very large.
Variety: the data sources are diverse.
Value: the value density is low, so the value has to be mined.
Velocity: the data grows fast, is acquired fast, and has to be used fast, which calls for high performance.
Veracity: the data quality must support accurate, credible, and reliable conclusions.
In short, useful, high-quality results are mined from massive, fast-growing, multi-category, low-information-density big data.



Big data software ecosystem

This part is the theoretical focus of the internet police exam.
The 2023 special recruitment exam for the internet police covers the following:
HDFS is a distributed storage technology.
HBase is a NoSQL database technology.
HBase is built on top of HDFS.
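
To make the idea of distributed storage concrete, here is a toy sketch of how a large file can be cut into fixed-size blocks and each block copied to several machines. It is not HDFS code: the function names, the node names, and the round-robin placement are assumptions for illustration. Real HDFS uses a 128 MB default block size and 3 replicas in recent versions, and chooses replica locations with a rack-aware policy rather than round-robin.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks that a file of file_size bytes is cut into."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))  # last block may be smaller
        remaining -= block_size
    return blocks


def place_replicas(num_blocks, nodes, replication=3):
    """Assign every block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = split_into_blocks(300 * mb, 128 * mb)  # 128 MB, 128 MB, 44 MB
    print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"]))
```

Because every block lives on several nodes, losing one machine does not lose the data, which is the main reason storage is replicated in this way.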

These are the storage technologies.

Next come the computing technologies.
The core technology is MapReduce, and Hive is a SQL-style data warehouse computing technology built on MapReduce.
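
The MapReduce model can be sketched in plain Python with the classic word-count example. This is a single-process illustration of the map, shuffle, and reduce phases, not the Hadoop API; the function names are chosen here for readability.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

In a real cluster the map and reduce functions run on many machines in parallel. Hive lets you express the same kind of job as a SQL-style query (conceptually, counting rows grouped by word) instead of writing these functions by hand.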

This is a mandatory topic in the special recruitment exam for the internet police.

What about data transfer?

The ecosystem offers a rich set of tools for storage, computing, and transmission.

Apache here means the Apache Software Foundation, a non-profit organization rather than a company.

Apache Hadoop Overview

Hadoop is an Apache Software Foundation project.
It provides distributed storage (HDFS), distributed computing (MapReduce), and resource scheduling (YARN).

The greatest truths are the simplest: simple, yet important.

Resource scheduling is handled by YARN, a forward-looking design. It is very important.
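
As a rough illustration of what resource scheduling means, the sketch below assigns memory requests from applications to nodes with a first-fit rule. It is a toy model, not YARN's actual scheduler: in YARN a ResourceManager grants containers on nodes managed by NodeManagers, using pluggable policies. The function name `schedule`, the application names, and the memory figures are all hypothetical.

```python
def schedule(requests, node_capacity):
    """First-fit scheduling sketch.

    requests: list of (app_name, memory_needed) in arrival order.
    node_capacity: dict mapping node name to free memory.
    Returns (assignments, pending), where pending lists apps that did not fit.
    """
    free = dict(node_capacity)  # copy so the caller's dict is not modified
    assignments = {}
    pending = []
    for app, memory in requests:
        for node, available in free.items():
            if available >= memory:
                free[node] = available - memory  # reserve the memory on this node
                assignments[app] = node
                break
        else:
            pending.append(app)  # no node has enough free memory right now
    return assignments, pending


if __name__ == "__main__":
    apps = [("app_a", 4), ("app_b", 4), ("app_c", 4), ("app_d", 2)]
    print(schedule(apps, {"node1": 8, "node2": 2}))
```

The point of a scheduler is exactly this bookkeeping: knowing which machine still has room, so that storage and computing jobs can share one cluster.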

GFS describes distributed storage, and MapReduce describes distributed computing.
Based on these three Google papers, Hadoop was designed and released as open source.
Awesome. This is the work of real masters.

Hadoop comes in an open-source community version and in commercial distributions.
Google remains impressive: it developed this technology in-house.


Summary

Tip: key lessons:

1)
2) Learn Oracle well: even in a weak economy, landing a test-development offer should not be a problem. It is also a prerequisite for taking the civil service exam for the internet police.
3) In written tests, where the goal is simply to get AC (accepted), you may ignore space complexity, but in interviews you must consider both the optimal time complexity and the optimal space complexity.


Origin blog.csdn.net/weixin_46838716/article/details/130940720