Hadoop Development, Step 3: HDFS

Operations on HDFS:

 Note: the NameNode must be formatted before HDFS and YARN are started for the first time, otherwise the NameNode will not come up. Formatting wipes all HDFS metadata, so do it on the initial setup only, not before every start.

 That is:

hadoop namenode -format    # format first; you may be asked to confirm re-formatting (answer Y)
start-yarn.sh              # start-all.sh is deprecated but still works
start-dfs.sh

Use jps to check whether the NameNode, DataNode, and the other daemons are all up.
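In a pseudo-distributed setup, a healthy jps listing looks roughly like this (the PIDs will differ on your machine):

3021 NameNode
3154 DataNode
3347 SecondaryNameNode
3512 ResourceManager
3655 NodeManager
3720 Jps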

hadoop fs -mkdir /input                  # create the input directory on HDFS
echo "hello adu  hello world" > file     # create a local test file
hadoop fs -put file /input               # upload it to HDFS
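The same two operations can also be done from Java through the HDFS FileSystem API, which the hadoop-client dependency in the pom.xml below pulls in. A minimal sketch, assuming fs.defaultFS is hdfs://localhost:9000 as in the run configuration further down (the class name HdfsPutExample is invented here for illustration):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect to the NameNode; hdfs://localhost:9000 is an assumption
        // matching the addresses used later in this post
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        fs.mkdirs(new Path("/input"));                                    // same as: hadoop fs -mkdir /input
        fs.copyFromLocalFile(new Path("file"), new Path("/input/file"));  // same as: hadoop fs -put file /input
        fs.close();
    }
}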

You can inspect the state of HDFS at localhost:50070 (Utilities > Browse the file system, at the top right of the page).

pom.xml contents in IDEA:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.adu</groupId>
    <artifactId>hadoop</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.2</version>
        </dependency>
    </dependencies>
    <repositories>
        <repository>
            <id>apache</id>
            <url>https://repo.maven.apache.org/maven2</url>
        </repository>
    </repositories>
    <build>
        <plugins>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <excludeTransitive>false</excludeTransitive>
                    <stripVersion>true</stripVersion>
                    <outputDirectory>./lib</outputDirectory>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
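Note that the maven-dependency-plugin block above binds no execution to a build phase, so its settings (include transitive dependencies, strip version numbers, output to ./lib) only apply when the copy-dependencies goal is run by hand:

mvn dependency:copy-dependencies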

The map and reduce functions are the same as the code in the previous chapters; a sketch is included below for reference.
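Since those chapters are not reproduced here, the following is a minimal sketch of the classic WordCount job that Hadoop ships as its standard example; it is an assumption about what the earlier chapters used, not a copy of them. The driver reads the input and output paths from args[0] and args[1], which matches the Program arguments set below.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);   // emit (word, 1) for every token
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();           // add up the counts for this word
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // hdfs://localhost:9000/input/file
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // hdfs://localhost:9000/output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}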

Click Run > Edit Configurations.

Set Program arguments to: hdfs://localhost:9000/input/file   hdfs://localhost:9000/output

Everything else is the same as in the previous chapter. There is a single space between the two paths above; the output directory must not be created beforehand (the job creates it itself and fails if it already exists); and file is the file uploaded earlier.
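When the job has finished, the word counts can be read straight from HDFS. With the default output format each reducer writes a part-r-NNNNN file, so with a single reducer:

hadoop fs -cat /output/part-r-00000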



Reposted from blog.csdn.net/douzhenwen/article/details/80140259