Hadoop: Building a single-machine pseudo-cluster for big data (this one article is enough)

Distributed

A complete application can be formed by processes (programs) running on different hosts and working together.

Browser / web server: the browser acts as a thin client program.

Big data 4V characteristics

1. Volume: large amounts of data

2. Velocity: data is generated and processed quickly

3. Variety: many data formats

4. Value: low value density

Hadoop

Open-source software for reliable, scalable, distributed computing.

It is a framework that allows large data sets to be processed across clusters of computers using a simple programming model (MapReduce).

It scales from a single server to thousands of hosts, with each node providing both computing and storage. Rather than relying on highly available hardware, failure detection and handling are implemented at the application layer.

Hadoop modules

1. Hadoop Common: shared utility libraries

2. HDFS: Hadoop Distributed File System

3. Hadoop YARN: job scheduling and cluster resource management framework

4. Hadoop MapReduce: YARN-based parallel processing of large data sets

How MapReduce works
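
In the map phase each input record is turned into key/value pairs, the framework then shuffles and sorts the pairs by key, and in the reduce phase all values sharing a key are aggregated. A quick way to watch this end to end, once the pseudo-cluster built below is running, is the word-count example jar that ships with Hadoop (the jar path below follows the usual tarball layout; adjust it to wherever the examples jar sits in your install):

#Run the built-in wordcount job (assumes HDFS and YARN are already started)
hdfs dfs -mkdir -p /input
hdfs dfs -put /etc/hosts /input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output
#Each line of the result is "word <tab> count", written by the reducer
hdfs dfs -cat /output/part-r-00000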

 

Hadoop installation

1. JDK (JDK 1.8.11 recommended)

Prerequisite: prepare the Linux environment

Big data and HBase

1 File systems

Linux: ext4, XFS
Windows: NTFS
HBase requires HDFS to be installed first

2 Logos

HBase: killer whale (orca)
Hive: elephant head, bee tail
Hadoop: elephant

3 Big data ecosystem

 

Elasticsearch: search engine
Implementation languages: Java / Scala

Hadoop: 3 versions

1 Community version: Apache Hadoop (free; fewer convenience features)
2 Distribution: CDH (Cloudera; the one used here)
3 Distribution: HDP (Hortonworks; paid, feature-rich)

 

OLTP: traditional databases
OLAP: big data (analytical processing)

HDFS
MapReduce
YARN

Building the stand-alone Hadoop (pseudo-cluster) environment

1 Copy the base machine as hadoop01

hostnamectl set-hostname hadoop01

vim /etc/sysconfig/network-scripts/ifcfg-ens33

vim  /etc/hosts 

#Copy the Hadoop tarball into /opt
cd /opt
tar -zxf hadoop-2.6.0-cdh5.14.2.tar.gz
mv hadoop-2.6.0-cdh5.14.2 soft/hadoop260
cd soft/hadoop260
cd etc/hadoop
pwd
vim hadoop-env.sh
1=============================
export JAVA_HOME=/opt/soft/jdk180
:wq
1=============================
vim core-site.xml
2============================
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.64.210:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/soft/hadoop260/tmp</value>
    </property>
</configuration>
:wq
2============================
vim hdfs-site.xml
3============================
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
:wq
3============================
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
4============================
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
:wq
4============================
vim yarn-site.xml
5============================
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
:wq 
5============================

#Configure hadoop environment variables. Please use your own hadoop260 path
vim /etc/profile
6============================
# Hadoop ENV
export HADOOP_HOME=/opt/soft/hadoop260
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
:wq
6============================
#Activate the above configuration 
source /etc/profile 
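
A quick sanity check that the new PATH and HADOOP_HOME are picked up (this only prints the version banner and needs no daemons running):
#Should report Hadoop 2.6.0-cdh5.14.2
hadoop version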
#Log in without password 
ssh-keygen -t rsa -P '' 
cd /root/.ssh/ 
ls 
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.64.210
yes 
ok 
ls 
ll 
ssh 192.168.64.210 
exit 
#To log in remotely with the host name hadoop210, map it in /etc/hosts or set it with hostnamectl set-hostname hadoop210
ssh hadoop210 
yes 
exit  
#Log in directly without password
ssh hadoop210 
exit 
#Format the NameNode
hdfs namenode -format
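
To confirm the format succeeded, the NameNode metadata directory should now exist; the path below assumes the hadoop.tmp.dir set in core-site.xml above and Hadoop's default name directory underneath it:
#VERSION and fsimage files appear here after a successful format
ls /opt/soft/hadoop260/tmp/dfs/name/current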

 

Read

1 The client contacts the NameNode and asks for the file's block locations
2 The NameNode looks them up in its metadata (fsimage / edit log) and returns the addresses to the client
3 The client uses those addresses to read the data from the corresponding DataNodes
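
Any HDFS read goes through exactly this path, for example (assuming the file uploaded in the word-count example earlier):
#The client asks the NameNode for block locations, then streams the blocks from the DataNodes
hdfs dfs -cat /input/hosts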

Write

1 The client contacts the NameNode and asks where to store the file
2 The NameNode returns a DataNode address to the client
3 The client writes to that DataNode, and the DataNode asks the NameNode for the addresses of the backup (replica) nodes
4 The NameNode returns those DataNode addresses
5 The DataNode transmits the data to the backup nodes through a pipeline (channel)
6 When the backup nodes finish writing, they acknowledge back to the first DataNode, which acknowledges to the client
7 The client reports completion to the NameNode
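
The same pipeline runs under any upload command, for example (here /input is just an example target directory):
#The client receives target DataNodes from the NameNode, then streams the file to them block by block
hdfs dfs -put /etc/profile /input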

2 Start hadoop01

start-all.sh 
yes 
yes 
jps 
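
If everything started correctly, jps on a pseudo-distributed node typically lists these five daemons (plus Jps itself; the process IDs will differ on your machine):
#Expected daemon names in the jps output
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps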

#Open the NameNode web UI in a browser to confirm the single-machine cluster is up
192.168.64.210:50070

3 Shut down the cluster

stop-all.sh


Origin blog.csdn.net/just_learing/article/details/126129255