Hadoop Installation and Maintenance

1. Basic Concepts

NameNode: holds HDFS metadata such as the directory tree and block locations

DataNode: stores the actual data blocks

JournalNode: synchronizes metadata between NameNodes (used for HA)

dfs: distributed file system

mapred: MapReduce

 

ResourceManager: the cluster-wide entry point and overall scheduler (at the application level)

ApplicationMaster: handles scheduling for a specific job (also supports non-MapReduce frameworks)

NodeManager: the per-node management daemon

Container: the execution environment (a resource allocation) on a node

JobHistoryServer (API + RPC): collects and displays job history and log information

WebAppProxy: a proxy that relays between external users and internal application web UIs

yarn.nodemanager.health-checker.script.path: script used to monitor node health (see the sketch after this list)

Rack Awareness: rack topology awareness; improves scheduling and replica placement
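The health-checker hook above is wired up in yarn-site.xml; a minimal sketch, assuming a script at /etc/hadoop/health-check.sh checked every 10 minutes (both choices are illustrative, not from this article). The NodeManager treats any output line starting with ERROR as an unhealthy report:

<configuration>
    <property>
        <name>yarn.nodemanager.health-checker.script.path</name>
        <value>/etc/hadoop/health-check.sh</value>
    </property>
    <property>
        <name>yarn.nodemanager.health-checker.interval-ms</name>
        <value>600000</value>
    </property>
</configuration>

#!/bin/bash
# /etc/hadoop/health-check.sh (hypothetical): report ERROR when the root
# filesystem is over 90% full, which marks this node as unhealthy.
usage=$(df / | awk 'NR==2 { sub(/%/, ""); print $5 }')
if [ "$usage" -gt 90 ]; then
    echo "ERROR: root filesystem is ${usage}% full"
fi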

 

2. Installation: configuration + start

1. Configuration:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

 

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

 

etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

 

etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
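Hadoop also needs to know where the JDK lives before the daemons can start; a one-line sketch (the path is an example for your machine):

# etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/latest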

2. Make sure you can ssh localhost
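If ssh localhost prompts for a password, set up passphraseless ssh first with the standard commands from the Hadoop single-node guide:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys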

3. Start:

bin/hdfs namenode -format # format the NameNode

sbin/start-dfs.sh

sbin/start-yarn.sh
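A quick sanity check that the daemons came up is jps (a JDK tool that lists running Java processes):

jps
# expected: NameNode, DataNode, SecondaryNameNode (from start-dfs.sh)
#           ResourceManager, NodeManager (from start-yarn.sh)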

4. Web UIs

http://localhost:50070/  # dfs

http://localhost:8088/ # yarn

 

$ bin/hdfs dfs -mkdir /user

$ bin/hdfs dfs -mkdir /user/root # create the user's home directory

$ bin/hdfs dfs -put etc/hadoop input

# execute jar

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

 

bin/hdfs dfs -get output output

cat output/*
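Alternatively, read the results straight off HDFS without copying them to the local disk:

bin/hdfs dfs -cat output/*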

 

5. Stop

$ sbin/stop-yarn.sh

$ sbin/stop-dfs.sh

 

3. Commands

hadoop archive -archiveName zoo.har -p /foo/bar -r 3 /outputdir # create an archive (read it back as shown below)

hadoop classpath --glob # print the classpath with wildcards expanded

hadoop jar *.jar # run a jar

hadoop fs -appendToFile localfile /user/hadoop/hadoopfile # fs commands
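The archive created by hadoop archive above is addressable through the har:// scheme; an example assuming the zoo.har from that command:

hadoop fs -ls har:///outputdir/zoo.har # list the files inside the archive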

 

4. Common file system commands

bin/hadoop fs -cat /user/root/output/*

hdfs dfsadmin -disallowSnapshot <path> # disable snapshots on a directory

hdfs dfs -createSnapshot <path> [<snapshotName>] # take a snapshot (walk-through below)

hadoop fs -df /user/hadoop/dir1 # show capacity and free space

bin/hadoop fs -ls /user/root/output/*
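A short snapshot walk-through tying the snapshot commands above together (the path and snapshot name are examples; a directory must be made snapshottable before a snapshot can be taken):

hdfs dfsadmin -allowSnapshot /user/root/data # enable snapshots on the directory
hdfs dfs -createSnapshot /user/root/data s1 # take a snapshot named s1
hdfs dfs -ls /user/root/data/.snapshot # snapshots appear under .snapshot
hdfs dfs -deleteSnapshot /user/root/data s1 # delete it when no longer needed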

 

5. Others

1. CLI MiniCluster: starts a single-process cluster from command-line parameters, without writing configuration files

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT

2. Rack Awareness requires an external script that maps each host to a rack path such as /myrack/myhost (a sketch follows)
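A minimal sketch of such a topology script, with an invented subnet-to-rack mapping and an assumed location of /etc/hadoop/topology.sh; it is registered via net.topology.script.file.name in core-site.xml and must print one rack path per host argument:

<property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/topology.sh</value>
</property>

#!/bin/bash
# /etc/hadoop/topology.sh (hypothetical): map each host/IP argument to a rack.
# Unknown hosts fall back to the default rack.
for host in "$@"; do
    case "$host" in
        10.0.1.*) echo "/myrack/myhost" ;;
        *)        echo "/default-rack" ;;
    esac
done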

 
