Operations on HDFS:
Note: in this setup the NameNode must be formatted before each start of HDFS and YARN, otherwise the NameNode will not come up. (Strictly speaking, formatting is only required on first setup; reformatting erases all data in HDFS. The repeated need to format usually means hadoop.tmp.dir points at /tmp, which is cleared on reboot, so for real use configure it to a persistent directory.)
That is:
hadoop namenode -format   # format first; you may be prompted to confirm (answer y)
start-dfs.sh
start-yarn.sh             # start-all.sh also works, but is deprecated
Use jps to check whether the NameNode, DataNode, and the other daemons have all started.
hadoop fs -mkdir /input                 # create the input directory on HDFS
echo "hello adu hello world" > file     # create a local file
hadoop fs -put file /input              # upload it to HDFS
You can inspect the state of HDFS at localhost:50070 (Utilities > Browse the file system, at the top right of the page).
Contents of pom.xml in IDEA:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>hadoop</groupId>
    <artifactId>com.adu</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.2</version>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <excludeTransitive>false</excludeTransitive>
                    <stripVersion>true</stripVersion>
                    <outputDirectory>./lib</outputDirectory>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
The map and reduce functions are the same as the code in the previous chapters.
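For reference, the word-count logic those functions implement can be sketched in plain Java, without the Hadoop API (the class and method names here are illustrative; the real job implements Mapper and Reducer from org.apache.hadoop.mapreduce):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the word-count map/reduce logic.
public class WordCountLogic {
    // "map" phase: split a line into whitespace-separated tokens and
    // emit (word, 1); "reduce" phase: sum the 1s emitted per word.
    // Here both phases are collapsed into one accumulation over a map.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same sample line the tutorial uploads to HDFS
        System.out.println(count("hello adu hello world"));
    }
}
```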
Click Run >> Edit Configurations.
Set Program arguments to: hdfs://localhost:9000/input/file hdfs://localhost:9000/output
Everything else is the same as in the previous chapter. Note that there is a single space between the two paths above; the output directory must not be created in advance (the job creates it and fails if it already exists); and file is the file uploaded earlier.
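IDEA passes those two space-separated Program arguments to the driver's main() as args[0] and args[1]. A minimal sketch of reading them (class and method names are illustrative; in the real driver the two paths go to FileInputFormat.addInputPath and FileOutputFormat.setOutputPath):

```java
// Illustrative: how the driver receives the two Program arguments.
public class DriverArgs {
    // Validate and return {input path, output path}.
    public static String[] parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "expected: <input path> <output path>");
        }
        return new String[]{args[0], args[1]}; // input first, output second
    }

    public static void main(String[] args) {
        String[] io = parse(new String[]{
            "hdfs://localhost:9000/input/file",  // existing input file
            "hdfs://localhost:9000/output"});    // must not exist yet
        System.out.println("input=" + io[0] + " output=" + io[1]);
    }
}
```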