Distributed
An entire application can be formed by the collaboration of processes (programs) distributed across different hosts.
Browser/web server: the browser acts as a thin client program.
Big data 4V characteristics
1. Volume: large in size
2. Velocity: generated and processed at high speed
3. Variety: many forms and formats
4. Value: low value density
Hadoop
Open source software for reliable, scalable, distributed computing.
A framework that allows distributed processing of large data sets across clusters of computers, using a simple programming model (MapReduce).
It scales from a single server to thousands of hosts, with each node providing local computation and storage.
Rather than relying on highly available hardware, failures are detected and handled at the application layer.
Hadoop modules
1. Hadoop Common: shared utility classes and libraries
2. HDFS: the Hadoop Distributed File System
3. Hadoop YARN: a framework for job scheduling and cluster resource management
4. Hadoop MapReduce: a YARN-based system for parallel processing of large data sets
How MapReduce works
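A MapReduce job splits the input, runs a map function on each split to emit key/value pairs, shuffles and sorts those pairs by key, and runs a reduce function over each key's values to produce the final output. As a minimal sketch, once the single-node cluster built below is running, the bundled examples jar can run a word count (somefile.txt is a hypothetical local file; /output must not exist yet):
hdfs dfs -mkdir -p /input
hdfs dfs -put somefile.txt /input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000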
Hadoop installation
Prerequisites: prepare the Linux environment
- JDK (JDK 1.8.11 recommended)
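A quick check that the JDK is in place (this guide assumes it is unpacked at /opt/soft/jdk180, matching the JAVA_HOME set below):
/opt/soft/jdk180/bin/java -version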
Big data and HBase
1. File systems
Linux: ext4/XFS; Windows: NTFS. HBase stores its data on HDFS, so install HDFS first.
2. Logos
HBase: a killer whale; Hive: an elephant head with a bee tail; Hadoop: an elephant.
3. Big data ecosystem
Elasticsearch: search engine (Java/Scala).
Hadoop comes in three flavors:
1. Community edition: Apache Hadoop (free, but with fewer conveniences)
2. Distribution: CDH (the one used here)
3. Distribution: HDP (paid, with a strong feature set)
Databases are typically OLTP; big data systems are typically OLAP.
Hadoop core components: HDFS, MapReduce, YARN.
Building a stand-alone (single-node) Hadoop environment
1. Copy the base machine to hadoop01
hostnamectl set-hostname hadoop01
vim /etc/sysconfig/network-scripts/ifcfg-ens33
vim /etc/hosts
#Drag the Hadoop-related packages into /opt
cd /opt
tar -zxf hadoop-2.6.0-cdh5.14.2.tar.gz
mv hadoop-2.6.0-cdh5.14.2 soft/hadoop260
cd soft/hadoop260
cd etc/hadoop
pwd
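#pwd should print /opt/soft/hadoop260/etc/hadoop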
vim hadoop-env.sh
1=============================
export JAVA_HOME=/opt/soft/jdk180
:wq
1=============================
vim core-site.xml
2============================
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.64.210:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/soft/hadoop260/tmp</value>
</property>
</configuration>
:wq
2============================
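fs.defaultFS is the NameNode's RPC address; hadoop.tmp.dir is the base directory under which HDFS keeps its data. Creating it up front avoids surprises (same path as configured above):
mkdir -p /opt/soft/hadoop260/tmp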
vim hdfs-site.xml
3============================
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
:wq
3============================
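dfs.replication is 1 because a single-node cluster has only one DataNode to hold each block. Once the environment variables below are sourced, the effective value can be double-checked with:
hdfs getconf -confKey dfs.replication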
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
4============================
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
:wq
4============================
vim yarn-site.xml
5============================
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
:wq
5============================
vim /etc/profile
6============================
# Hadoop ENV
# Configure the Hadoop environment variables; use your own hadoop260 path
export HADOOP_HOME=/opt/soft/hadoop260
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_INSTALL=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
:wq
6============================
#Activate the above configuration
source /etc/profile
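#Sanity check: should print the Hadoop version (2.6.0-cdh5.14.2 here)
hadoop version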
#Log in without password
ssh-keygen -t rsa -P ''
cd /root/.ssh/
ls
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
yes
ls
ll
ssh 192.168.64.210
exit
#To log in remotely by host name, map the name in /etc/hosts or set it with: hostnamectl set-hostname hadoop210
ssh hadoop210
yes
exit
#Log in directly without password
ssh hadoop210
exit
#Format the NameNode
hdfs namenode -format
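#If formatting succeeds, the output should contain a line like (path follows from hadoop.tmp.dir above):
#Storage directory /opt/soft/hadoop260/tmp/dfs/name has been successfully formatted.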
Read
1. The client asks the NameNode for the file's block locations.
2. The NameNode looks them up in its metadata (fsimage/edit log) and returns the addresses to the client.
3. The client fetches the data directly from the corresponding DataNodes.
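From the client side, this whole read path is triggered by an ordinary HDFS shell read (hypothetical path /input/somefile.txt):
hdfs dfs -cat /input/somefile.txt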
Write
1. The client asks the NameNode where to store the file.
2. The NameNode returns the address of a target DataNode to the client.
3. The client writes to that DataNode, which asks the NameNode for the addresses of the replica (backup) nodes.
4. The NameNode gives the replica addresses to the DataNode.
5. The DataNode streams the data to the replica nodes through a pipeline (channel).
6. Once the replicas finish writing, the acknowledgment flows back through the first DataNode to the client.
7. The client reports completion to the NameNode.
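Likewise, the write path is triggered by an ordinary upload (somefile.txt is a hypothetical local file):
hdfs dfs -put somefile.txt /
hdfs dfs -ls /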
2. Start hadoop01
start-all.sh
yes
yes
jps
#Open 192.168.64.210:50070 in a browser to confirm the single-node cluster is up
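#If startup succeeded, jps should list five Hadoop daemons besides Jps itself:
#NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager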
3. Shut down the system
stop-all.sh