About vim: install it with sudo apt-get install vim.
Plain vi often misbehaves and is rarely used; vim is recommended instead.
Part One: local mode ===> the official Grep example
Create an input folder under hadoop-2.7.7:
mkdir /opt/software/hadoop-2.7.7/input
Copy the Hadoop configuration files into input:
cp /opt/software/hadoop-2.7.7/etc/hadoop/*.xml /opt/software/hadoop-2.7.7/input
Command format: cp <file to copy> <destination path>
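The cp format and the *.xml glob used above can be rehearsed locally (the demo/ paths here are made up for illustration, not part of the Hadoop tree):

```shell
# Hypothetical demo tree; only files matching the *.xml glob get copied.
mkdir -p demo/input
printf '<configuration/>\n' > demo/a.xml
printf 'not xml\n' > demo/notes.txt
cp demo/*.xml demo/input/
ls demo/input
```

Only a.xml ends up in demo/input, just as only the .xml configuration files end up in the Hadoop input directory above.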
Run the Grep example:
Syntax: hadoop jar <path to the examples jar> grep <input directory> <output directory> <regex filter>
hadoop jar /opt/software/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep input output 'd[a-z.]+'
What the command does: it counts every string in the input files that matches the regular expression 'd[a-z.]+', i.e. a 'd' followed by one or more lowercase letters or dots.
* Note: do not create the output directory beforehand;
otherwise the job throws a file-already-exists exception:
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /opt/software/hadoop-2.7.7/output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
View the results:
First list the files in the output directory:
ls -al /opt/software/hadoop-2.7.7/output
Then cat an output file, for example:
cat /opt/software/hadoop-2.7.7/output/part-r-00000
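What the Grep job counts can be approximated locally with plain grep (a rough sketch of the counting, not the MapReduce job itself; sample.txt is a made-up stand-in for the copied .xml files):

```shell
# Sample input standing in for the files copied into input/.
printf 'hadoop dfs data dfs\n' > sample.txt
# Extract every match of the example's regex, then tally duplicates,
# which is essentially what the grep example's MapReduce jobs compute.
grep -oE 'd[a-z.]+' sample.txt | sort | uniq -c | sort -rn
```

Note that the regex matches a 'd' anywhere inside a word (e.g. "doop" inside "hadoop"), not only words that begin with d.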
Part Two: the wordcount example ===> counting words
- Step 1: create a folder
mkdir /opt/software/hadoop-2.7.7/wcinput
- Step 2: create a wc.input file in that folder and write some content into it
Enter the folder:
cd /opt/software/hadoop-2.7.7/wcinput
Create the file:
touch /opt/software/hadoop-2.7.7/wcinput/wc.input
Write data to the file:
vim /opt/software/hadoop-2.7.7/wcinput/wc.input
For example, the contents of wc.input:
hadoop yarn hadoop mapreduce abcdabc admin a
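Instead of typing the content in vim, the file can be created non-interactively with a heredoc (shown with a relative wcinput directory so it can be tried anywhere; on the real machine use the /opt/software path from above):

```shell
# Create the folder and write the sample words in one go.
mkdir -p wcinput
cat > wcinput/wc.input <<'EOF'
hadoop yarn hadoop mapreduce abcdabc admin a
EOF
cat wcinput/wc.input
```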
- Step 3: execute the command
* Note: the output path must not exist in advance!!!
Command format: hadoop jar hadoop-mapreduce-examples-2.7.7.jar wordcount <input to count> <output location>
Explanation: run the examples jar with the hadoop jar command, choose the wordcount example inside it, and give it the input to compute and the output location:
hadoop jar /opt/software/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount wcinput wcoutput
- Step 4: view the results with cat:
cat /opt/software/hadoop-2.7.7/wcoutput/part-r-00000
Part Three: pseudo-distributed mode, starting HDFS and running MapReduce
* Note: once pseudo-distributed mode is enabled, local mode can no longer be used!!
The reason is that local mode uses the file:// protocol while pseudo-distributed mode uses the hdfs:// protocol. Pseudo-distributed mode is set up the same way as a cluster; it is effectively fully distributed mode confined to one machine, which suits developers who have only a single computer.
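The protocol difference comes down to the fs.defaultFS setting in core-site.xml: out of the box it points at the local file system, and pseudo-distributed mode repoints it at HDFS (a sketch; the values shown are the Hadoop 2.x default and the value this guide configures below):

```xml
<!-- Local (standalone) mode: the default file system is the local disk -->
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
</property>

<!-- Pseudo-distributed mode: the default file system is HDFS -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
```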
cd /opt/software/hadoop-2.7.7/etc/hadoop
Step 1: configure the environment
- Cluster configuration
1.1 Configure hadoop-env.sh ===> set the JAVA_HOME environment for Hadoop
You can open another window and run echo $JAVA_HOME to get the JDK location, then:
vim /opt/software/hadoop-2.7.7/etc/hadoop/hadoop-env.sh
/opt/software/jdk1.8.0_211/ # Do not just copy and paste this; read the explanation above. This path is the value of JAVA_HOME found with the query above.
1.2 Configure core-site.xml:
vim /opt/software/hadoop-2.7.7/etc/hadoop/core-site.xml
Add the following inside the <configuration></configuration> tags:
<!-- Address of the NameNode in HDFS -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value><!-- change this to the appropriate hostname when building a real cluster -->
</property>
<!-- Storage directory for the files Hadoop generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/software/hadoop-2.7.7/data/tmp</value>
</property>
==============================================
1.3 Configure hdfs-site.xml:
vim /opt/software/hadoop-2.7.7/etc/hadoop/hdfs-site.xml
As above, place the following inside the <configuration> tags:
<!-- Number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
=============================================
Step 2: start the cluster
1. Format the NameNode (note: format it only before the first start; do not format it again afterwards!) See https://blog.csdn.net/qq_41813208/article/details/100753659 for what happens when it has not been formatted.
hdfs namenode -format
2. Start the NameNode
/opt/software/hadoop-2.7.7/sbin/hadoop-daemon.sh start namenode
3. Start the DataNode
/opt/software/hadoop-2.7.7/sbin/hadoop-daemon.sh start datanode
Step 3: check the cluster
Run jps. If the output lists both NameNode and DataNode, the daemons started normally!!!
Note: jps is a JDK command, not a Linux command; if the JDK is not installed, jps is unavailable.
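The jps check can be turned into a small script; for easy testing the helper below reads the process list from stdin, so on the real machine you would pipe jps into it (the function name is made up for this sketch):

```shell
# Report whether each named daemon appears in a `jps`-style process list
# read from stdin; returns non-zero if any daemon is missing.
check_daemons() {
  procs=$(cat)
  status=0
  for d in "$@"; do
    if printf '%s\n' "$procs" | grep -qw "$d"; then
      echo "$d: running"
    else
      echo "$d: NOT running"
      status=1
    fi
  done
  return $status
}

# Demo with a canned process list; on the real machine use:
#   jps | check_daemons NameNode DataNode
printf '2130 NameNode\n2254 DataNode\n2419 Jps\n' | check_daemons NameNode DataNode
```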
Inside the virtual machine, open a browser and enter <server ip>:50070, for example http://localhost:50070
Entering <virtual machine ip>:50070 on the real host machine reaches the same page.
If the page is unreachable, check whether the firewall has been turned off:
Disabling the firewall on CentOS 6:
================
service iptables status      # check the firewall status
chkconfig iptables off       # permanently disable the firewall
service iptables stop        # temporarily stop the firewall
Disabling the firewall on CentOS 7:
=============
systemctl status firewalld   # check the status
systemctl stop firewalld     # stop the firewall
systemctl start firewalld    # start the firewall
systemctl disable firewalld  # disable the firewall at boot
systemctl enable firewalld   # enable the firewall at boot
Step 4: operate the cluster
Create a directory:
Under /opt/software/hadoop-2.7.7, run the following command:
hdfs dfs -mkdir -p /home/hadoop/user/atgnnu/input
Take note here!!! The command prints nothing on success, and the directory does not appear on the real machine's disk: it is created inside HDFS.
Once created, you can view it in the HDFS web UI.
Upload a test file to the file system.
For example, upload the file examples.desktop from the home directory:
hdfs dfs -put examples.desktop /home
Note the command format:
hdfs dfs -put <file to upload> <destination directory> (the directory must have been created with the command above, because it must be one that Hadoop already manages). On success, the home folder appears under the root directory in the web UI, and clicking into it shows its subdirectories.
So you can substitute any file you like for examples.desktop above.
After a successful upload you can find the file in the browser, and you can download it to inspect its contents.
Block Size is the minimum storage unit for a file; one block holds 128 MB. It can be changed later to save space.
Command reference:
-put   upload a file to /user/atgnnu/input/:
hdfs dfs -put wcinput/wc.input /user/atgnnu/input/
-ls    check that the file is there:
hdfs dfs -ls /user/atgnnu/input/
Run MapReduce:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/atgnnu/input/ /user/atgnnu/output
-cat   view the output:
hdfs dfs -cat /user/atgnnu/output/*
-get   download the output to the local machine:
hdfs dfs -get /user/atgnnu/output/part-r-00000 ./wcoutput/
-rm -r delete the output:
hdfs dfs -rm -r /user/atgnnu/output
Pseudo-distributed mode, part two: starting the cluster with YARN
Step 1: configure the YARN cluster environment
- Modify yarn-env.sh
vim /opt/software/hadoop-2.7.7/etc/hadoop/yarn-env.sh
Configure the JAVA_HOME environment variable as before; obtain it with echo $JAVA_HOME, which prints the JDK path.
/opt/software/jdk1.8.0_211/ # Do not just copy and paste this; read the explanation above. This path is the value of JAVA_HOME found with the query above.
- Modify yarn-site.xml
vim /opt/software/hadoop-2.7.7/etc/hadoop/yarn-site.xml
<!-- How the Reducer fetches data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
- Modify mapred-env.sh, configuring JAVA_HOME the same way as before
vim /opt/software/hadoop-2.7.7/etc/hadoop/mapred-env.sh
/opt/software/jdk1.8.0_211/ # Do not just copy and paste this; read the explanation above. This path is the value of JAVA_HOME found with the query above.
- Modify mapred-site.xml
First copy the template file with cp, renaming it to drop the .template suffix:
cp /opt/software/hadoop-2.7.7/etc/hadoop/mapred-site.xml.template /opt/software/hadoop-2.7.7/etc/hadoop/mapred-site.xml
vim /opt/software/hadoop-2.7.7/etc/hadoop/mapred-site.xml
Copy the following into the <configuration></configuration> tags:
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
Step 2: start the cluster
First make sure the NameNode and DataNode are up (check with jps).
If the Hadoop environment variables are configured, you can type the short commands below directly; otherwise use the full paths shown earlier.
If they are already running, there is no need to run these again:
hadoop-daemon.sh start namenode; hadoop-daemon.sh start datanode
With the environment variables configured, the short form is:
yarn-daemon.sh start resourcemanager; yarn-daemon.sh start nodemanager
Or run with absolute paths:
/opt/software/hadoop-2.7.7/sbin/yarn-daemon.sh start resourcemanager
/opt/software/hadoop-2.7.7/sbin/yarn-daemon.sh start nodemanager
Operate the cluster: in a browser, enter <the virtual machine's external ip>:8088; if the page loads, YARN is running normally.
- Delete the output files from the file system (skip this step if they do not exist!)
Enter the command in this format:
hdfs dfs -rm -R <location of the output files>
- Run the MapReduce program (the following is a single command)
Command format: hadoop jar <path to the MapReduce examples jar> wordcount <input path (one the browser can reach, like the URL shown earlier)> <output path (must not already exist)>
hadoop jar /opt/software/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /home/hadoop/user/atgnnu/input /home/hadoop/user/atgnnu/output
- View the results:
hdfs dfs -cat /home/hadoop/user/atgnnu/output/*
Configure the history server:
- Modify mapred-site.xml:
vim /opt/software/hadoop-2.7.7/etc/hadoop/mapred-site.xml
<!-- History server RPC address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
</property>
<!-- History server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
</property>
After adding these, start it with /opt/software/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver and open http://localhost:19888 to browse job history.
Fully distributed mode: