Install Hadoop on Windows 10

Part One: Install Hadoop

  1. Download Hadoop 3.0.0 from http://archive.apache.org/dist/hadoop/core/ ; the file you need is the binary package hadoop-3.0.0.tar.gz.
  2. From https://github.com/steveloughran/winutils, download winutils, the set of Hadoop fixes for Windows environments (for Hadoop 3.0.0 the corresponding path is https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0). If you do not want to use Git, you can directly download the archive https://github.com/steveloughran/winutils/archive/master.zip and unzip it.
  3. Make sure that a Java development and runtime environment of JDK 1.8 or above is correctly installed on your computer. (You can verify this by running the java -version command at the command line.)
  4. Unzip hadoop-3.0.0.tar.gz into the C:\Hadoop subdirectory (you can change the Hadoop installation directory according to your preferences).
  5. Add the environment variable HADOOP_HOME (as a system variable) and set its value to "C:\Hadoop". (To set it, open "Control Panel > System and Security > System", click "Advanced System Settings", then click the "Environment Variables..." button.)
  6. Check whether the JAVA_HOME variable is set correctly. (On my computer, JAVA_HOME is set to C:\Program Files\Java\jdk1.8.0_192.)
  7. Add the "C:\Hadoop\bin" and "C:\Hadoop\sbin" paths to the Path environment variable. (A quick sketch for verifying steps 5-7 appears after the core-site.xml block below.)
  8. Paste the following content into the C:\Hadoop\etc\hadoop\core-site.xml file:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
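With the variables from steps 5-7 in place, here is a quick verification sketch from a newly opened Command Prompt (note that hadoop version will only run cleanly once steps 13-15 below are also done):

echo %HADOOP_HOME%
echo %JAVA_HOME%
java -version
hadoop version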

  9. Paste the following content into the C:\Hadoop\etc\hadoop\mapred-site.xml file:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

  10. Create a "data" subdirectory under the C:\Hadoop path; then create a "namenode" subdirectory and a "datanode" subdirectory under the C:\Hadoop\data path, for example with the commands shown below.

  11. Paste the following into the C:\Hadoop\etc\hadoop\hdfs-site.xml file:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:\hadoop\data\namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:\hadoop\data\datanode</value>
  </property>
</configuration>

  12. Paste the following content into the C:\Hadoop\etc\hadoop\yarn-site.xml file:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

  13. Edit the file C:\Hadoop\etc\hadoop\hadoop-env.cmd and change the line

set JAVA_HOME=%JAVA_HOME%

to

set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_192

(Note: there is a pitfall here. If you set JAVA_HOME in hadoop-env.cmd to "C:\Program Files\Java\jdk1.8.0_192", it will fail, because the path must not contain spaces. C:\PROGRA~1 is the 8.3 short name of C:\Program Files; a sketch of how to look it up follows step 15.)

  14. Delete the contents of the C:\Hadoop\bin directory.
  15. Unzip the "winutils-master.zip" file downloaded in step 2, then copy the contents of the "..\winutils-master\hadoop-3.0.0\bin" directory from the unzipped files into the "C:\Hadoop\bin" directory.
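To find the 8.3 short name referenced in step 13, you can list the root directory with short names shown (a minimal sketch; PROGRA~1 is the usual short name for "Program Files", but it can differ between machines):

dir /x C:\

In the listing, the short name (e.g. PROGRA~1) appears next to "Program Files".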

Part Two: Verify That Hadoop Was Installed Successfully

After the previous 15 steps, the Hadoop installation is complete.

Run the following commands to test whether Hadoop works correctly:

  1. Run the command "hdfs namenode -format" to format the HDFS distributed file system first.
  2. Go to the "C:\Hadoop\sbin" directory and run the command "start-dfs". If everything is normal, this starts one "hdfs namenode" process and one "hdfs datanode" process, forming an "HDFS distributed file system cluster" with one master node and one slave node. You can monitor the HDFS system at "http://localhost:9870". (The jps command shows all JVM-related processes.)
  3. Once the HDFS distributed file system has started normally, you can use the "hadoop fs" or "hdfs dfs" commands to browse directories, create and delete subdirectories, create, copy, move, and delete files, view file contents, and upload local files on the distributed file system:

hadoop fs -ls /

List all files and directories under the root directory.

hadoop fs -mkdir /test

Create the subdirectory /test (add the -p option to create multi-level directories).

hadoop fs -rm /test1.txt

Delete a file.

hadoop fs -rm -r /test

Delete a subdirectory (the -r parameter is required).

hadoop fs -put C:\tmp\test.txt /test

Upload a local file to the HDFS distributed file system.

hadoop fs -cat /test/test.txt

View the contents of a file.

hadoop fs -cp URI [URI …] <dest>

Copy files within the file system.

hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>

Download a file to the local file system.

hadoop fs -mv URI [URI …] <dest>

Move files from the source path to the destination path.

hadoop fs -du URI [URI …]

Display file sizes.
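Putting a few of these together, an end-to-end check might look like this (a minimal sketch; C:\tmp\test.txt stands for any local text file):

hadoop fs -mkdir /test
hadoop fs -put C:\tmp\test.txt /test
hadoop fs -ls /test
hadoop fs -cat /test/test.txt
hadoop fs -rm -r /test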

  4. At http://localhost:9870, under the "Utilities" menu there is a "Browse the File System" tool that can be used to browse and operate on the HDFS distributed file system.
  5. Go to the "C:\Hadoop\sbin" directory and run the command "stop-dfs" to shut down the HDFS distributed file system.

Part Three: Run a MapReduce Job

Note: the following operations must be performed as administrator.

  1. Go to the "C:\Hadoop\sbin" directory and run the command "start-all". If everything is normal, this starts one "hdfs namenode" process, one "hdfs datanode" process, one "yarn resourcemanager" process, and one "yarn nodemanager" process.
  2. Use the command "hadoop fs -mkdir /input" to create the "/input" subdirectory in HDFS.
  3. Use the command "hadoop fs -put c:\source\input_file.txt /input" to upload the local file "c:\source\input_file.txt" to the "/input" subdirectory in HDFS. The input_file.txt file will serve as the input for the WordCount MapReduce job.
  4. Use the command "yarn jar c:\source\mrtest.jar WordCount /input /output" to start the MapReduce job. After the job finishes successfully, look for the result files under the "/output" directory. (A sketch of such a WordCount class follows this list.)
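The source of mrtest.jar is not shown here; for reference, below is a minimal sketch of what a WordCount class built against the standard Hadoop MapReduce API typically looks like. The class and jar names come from the command in step 4; the body is the conventional word-count pattern, not necessarily the code actually packaged in mrtest.jar.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in each input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum the counts collected for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // the combiner reuses the reducer
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

After the job succeeds, the result can typically be read with "hadoop fs -cat /output/part-r-00000".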