1. Basic Concepts
NameNode: stores HDFS metadata (the directory tree, file-to-block mapping, etc.)
DataNode: stores the actual data blocks
JournalNode: synchronizes metadata between NameNodes (used in HA deployments)
dfs: distributed file system
mapred: MapReduce
ResourceManager: the cluster-wide entry point and top-level scheduler for applications
ApplicationMaster: per-application job scheduling (supports frameworks other than MapReduce)
NodeManager: the per-node management daemon
Container: the execution environment (a bundle of resources) allocated on a node
Job History Server (API + RPC): collects and displays job log information
WebAppProxy: a relay between internal and external web access
yarn.nodemanager.health-checker.script.path: script used to monitor node health
Rack awareness: lets HDFS and the scheduler take rack topology into account, improving scheduling performance and fault tolerance
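To illustrate the health-checker property above: YARN runs the configured script periodically and marks the node unhealthy when the script's output begins with the string ERROR. A minimal sketch (the 90% disk threshold and the check itself are assumptions, not part of the original notes):

```shell
#!/bin/bash
# Hypothetical node health-check script for
# yarn.nodemanager.health-checker.script.path.
# YARN treats the node as unhealthy if stdout starts with "ERROR".

check_disk() {
  local used=$1   # percent of the root partition in use
  if [ "$used" -gt 90 ]; then
    echo "ERROR: root partition ${used}% full"
  else
    echo "healthy: disk usage ${used}%"
  fi
}

# In a real script the number would come from df, e.g.:
# check_disk "$(df / --output=pcent | tail -1 | tr -d ' %')"
check_disk 50
```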
2. Installation: configuration + startup
1. Configuration:
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
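After editing the four files above, a quick way to confirm that the values are actually picked up is hdfs getconf (a sketch; it assumes the commands are run from the Hadoop install root, as in the steps below):

```shell
# Print the effective value of a configuration key.
bin/hdfs getconf -confKey fs.defaultFS      # should show hdfs://localhost:9000
bin/hdfs getconf -confKey dfs.replication   # should show 1
```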
2. Make sure you can ssh to localhost without a passphrase
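If passphraseless ssh is not yet set up, it can be configured with the usual key-generation steps (a sketch; paths assume the default ~/.ssh layout for the current user):

```shell
# Generate a passphrase-less key and authorize it for localhost logins.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```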
3. Start:
bin/hdfs namenode -format    # format the NameNode on first run
sbin/start-dfs.sh
sbin/start-yarn.sh
4. URLs
http://localhost:50070/ # dfs
http://localhost:8088/ # yarn
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/root # create the home directory for user root
$ bin/hdfs dfs -put etc/hadoop input
# run the example jar
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
bin/hdfs dfs -get output output
cat output/*
5. Stop
$ sbin/stop-yarn.sh
$ sbin/stop-dfs.sh
3. Commands
hadoop archive -archiveName zoo.har -p /foo/bar -r 3 /outputdir
hadoop classpath --glob
hadoop jar *.jar # run a jar
hadoop fs -appendToFile localfile /user/hadoop/hadoopfile # fs command
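The archive created by the hadoop archive command above can be addressed through the har:// filesystem scheme (a sketch; it assumes the zoo.har archive from the example exists, and the inner path is hypothetical):

```shell
# List and read files inside a Hadoop archive via the har:// scheme.
hadoop fs -ls har:///outputdir/zoo.har
hadoop fs -cat har:///outputdir/zoo.har/somefile.txt  # hypothetical inner path
```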
4. Common file system commands
bin/hadoop fs -cat /user/root/output/*
hdfs dfsadmin -disallowSnapshot <path>
hdfs dfs -createSnapshot <path> [<snapshotName>]
hdfs dfs -df /user/hadoop/dir1
bin/hadoop fs -ls /user/root/output/*
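The two snapshot commands above fit into a larger workflow; a typical sequence looks like this (a sketch; it requires a running HDFS, and /user/hadoop/dir1 is reused from the df example):

```shell
# Typical HDFS snapshot workflow.
hdfs dfsadmin -allowSnapshot /user/hadoop/dir1   # enable snapshots on the dir
hdfs dfs -createSnapshot /user/hadoop/dir1 s0    # take a snapshot named s0
hdfs dfs -ls /user/hadoop/dir1/.snapshot         # snapshots appear under .snapshot
hdfs dfs -deleteSnapshot /user/hadoop/dir1 s0    # remove the snapshot again
```

Note that -disallowSnapshot (shown above) fails while snapshots still exist on the path, so they must be deleted first.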
5. Others
1. CLI MiniCluster: starts a single-process cluster from command-line parameters, without needing configuration files
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT
2. Rack awareness requires a topology script (configured via net.topology.script.file.name) that maps each host to a rack path such as /myrack/myhost
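Such a topology script receives one or more host names or IPs as arguments and must print one rack path per argument. A minimal sketch (the host-to-rack mapping here is entirely hypothetical):

```shell
#!/bin/bash
# Hypothetical topology script for net.topology.script.file.name.
# HDFS passes host names/IPs as arguments; print one rack per argument.

resolve_rack() {
  case "$1" in
    host1|10.0.0.1) echo "/rack1/$1" ;;     # hypothetical mapping
    host2|10.0.0.2) echo "/rack2/$1" ;;
    *)              echo "/default-rack" ;; # fallback for unknown hosts
  esac
}

for host in "$@"; do
  resolve_rack "$host"
done
```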