A Hadoop cluster is mainly divided into a NameNode, a SecondaryNameNode (the auxiliary name node), and DataNodes (data node 1, data node 2, ..., data node n).
The NameNode records which DataNodes each file's blocks are placed on, while the DataNodes store the actual data; the relationship is like that between a pointer and the memory it points to.
After downloading Hadoop, edit the configuration files in the xx/hadoop-2.7.3/etc/hadoop folder.
core-site.xml configures the address of the NameNode.
hdfs-site.xml configures the replication factor of the data, that is, how many nodes each block of data is stored on.
mapred-site.xml configures the resource framework that MapReduce depends on.
yarn-site.xml configures the address of the YARN ResourceManager.
The slaves file is configured with the addresses of the DataNodes.
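As a sketch, the key properties in those files might look like the following for Hadoop 2.7.x (hostnames such as master, slave1, and slave2 are placeholders for your own cluster; ports and replication counts are example values):

```xml
<!-- core-site.xml: address of the NameNode (the HDFS entry point) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: how many DataNodes each block is replicated to -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

<!-- mapred-site.xml: run MapReduce on the YARN resource framework -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml: address of the YARN ResourceManager -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>
```

The slaves file is just a plain list of DataNode hostnames, one per line (e.g. slave1 and slave2).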
After configuration, scp these configuration files to the other nodes.
That completes the configuration; the next step is to start the cluster.
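For example, the copy step might look like this (a sketch assuming the other nodes are named slave1 and slave2, Hadoop is installed at the same xx/hadoop-2.7.3 path on every node, and passwordless SSH is set up):

```shell
# Copy the whole configuration directory to each worker node
# (hostnames and the xx/ install prefix are placeholders)
for host in slave1 slave2; do
  scp -r xx/hadoop-2.7.3/etc/hadoop "$host":xx/hadoop-2.7.3/etc/
done
```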
hadoop namenode -format formats the Hadoop filesystem space; this is done first, and only once, before the initial start.
Then use jps to check whether any Java processes are already running; after that you can start the cluster with:
start-dfs.sh and start-yarn.sh, or start-all.sh
Then query with jps on the master node; the results are as follows:
4099 SecondaryNameNode
4359 Jps
4283 ResourceManager
3918 NameNode
Running the jps command remotely on a data node gives the following:
3657 Jps
3513 NodeManager
3386 DataNode
hadoop fs -mkdir -p /usr/test creates a folder in the Hadoop file system; -p means the parent directories of test are created as well.
hadoop fs -ls / queries the root directory and yields the following:
drwxr-xr-x - ubuntu supergroup 0 2017-03-25 19:17 /usr
hadoop fs -ls -R / lists recursively (-R means recursive) and yields the following:
drwxr-xr-x - ubuntu supergroup 0 2017-03-25 19:17 /usr
drwxr-xr-x - ubuntu supergroup 0 2017-03-25 19:17 /usr/test
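Once the directory exists, a few more common hadoop fs commands round out the basics (localfile.txt is a hypothetical file name used only for illustration):

```shell
# Upload a local file into the HDFS directory created above
hadoop fs -put localfile.txt /usr/test/

# Print the file's contents from HDFS
hadoop fs -cat /usr/test/localfile.txt

# Remove the directory and everything under it
hadoop fs -rm -r /usr/test
```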