Integrating Hadoop (Alibaba Cloud intranet environment) with Spring Boot, and the pitfalls along the way

Add the dependencies, excluding slf4j-log4j12 (it clashes with Spring Boot's own SLF4J binding) and the old servlet-api:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
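All three artifacts must stay on the same Hadoop version. A Maven property (a common convention, not part of the original post) keeps them in sync; shown here for hadoop-client only:

```xml
<properties>
    <hadoop.version>2.7.3</hadoop.version>
</properties>

<!-- then in each of the three dependencies -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
```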

Write the integration configuration

import java.net.URI;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HadoopConfig {

    // The address of your Hadoop NameNode, e.g.
    // hadoop.node=hdfs://<Alibaba Cloud public IP>:9000
    @Value("${hadoop.node}")
    private String hadoopNode;

    @Bean("fileSystem")
    public FileSystem createFs() throws Exception {
        // Build the client configuration
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("fs.defaultFS", hadoopNode); // note the spelling: "defaultFS"
        // Ask the NameNode to return DataNode hostnames instead of intranet IPs
        conf.set("dfs.client.use.datanode.hostname", "true");
        conf.set("dfs.replication", "1");
        // Return the configured file system; when testing locally, obtain it
        // this way so the connection user ("root") is set explicitly
        return FileSystem.get(new URI(hadoopNode), conf, "root");
    }

}
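With the bean registered, other components can simply inject it. Below is a minimal sketch of such a consumer (the class name and methods are illustrative, not from the original post); the same calls also work against a local file system, which is handy for testing without a cluster:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Thin wrapper around the FileSystem bean; in a Spring app this would be a
// @Service with the FileSystem supplied through constructor injection.
public class HdfsService {

    private final FileSystem fileSystem;

    public HdfsService(FileSystem fileSystem) {
        this.fileSystem = fileSystem;
    }

    // Create the directory (including parents) if it does not exist yet.
    public boolean ensureDirectory(String dir) throws IOException {
        Path path = new Path(dir);
        return fileSystem.exists(path) || fileSystem.mkdirs(path);
    }

    // Upload a local file, overwriting any existing copy at the destination.
    public void upload(String localFile, String hdfsDir) throws IOException {
        fileSystem.copyFromLocalFile(false, true,
                new Path(localFile), new Path(hdfsDir));
    }
}
```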

Test code

@SpringBootTest
class SpringbootIntegrationTestApplicationTests {

    @Autowired
    private FileSystem fileSystem;

    @Test
    void contextLoads() {
    }

    @Test
    void hadoopTest() throws Exception {
        // Upload a local file to HDFS
        fileSystem.copyFromLocalFile(
                new Path("/Users/wuxinxin/IdeaProjects/springboot-integration-test/src/main/resources/application.properties"),
                new Path("/user"));
    }

}
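To confirm the upload actually landed, the file can be read back. The sketch below is not from the original post; it assumes the hostname hadoop from the hosts setup described in the pitfalls section, and that copying into the /user directory produced /user/application.properties — adjust both to your environment:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Standalone read-back check for the uploaded file.
public class ReadBack {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.client.use.datanode.hostname", "true");
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop:9000"), conf, "root");
        // Open the file on HDFS and print it line by line
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/user/application.properties"))))) {
            reader.lines().forEach(System.out::println);
        }
    }
}
```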

Pitfalls and solutions

1. Take the NameNode out of safe mode

Error: Name node is in safe mode
Fix:   hadoop dfsadmin -safemode leave

2. Expose the NameNode via a hostname; otherwise it hands out the Alibaba Cloud intranet address and the DataNode cannot be reached

Error: Connection refused
       Excluding datanode DatanodeInfoWithStorage
       [172.24.55.121:50010,DS-2b89f9a8-037a-499b-807b-7ea54bc99205,DISK]
Fix:
1. Use a hostname in etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop:9000</value>
    </property>
</configuration>
2. On the Alibaba Cloud instance, map the hostname in /etc/hosts:
172.24.55.121 (Alibaba Cloud intranet IP) hadoop
3. Ask the NameNode to return the DataNode's hostname rather than its intranet address, otherwise the DataNode cannot be reached:
Configuration conf = new Configuration();
conf.set("dfs.client.use.datanode.hostname","true");

3. The DataNode hostname returned by the NameNode cannot be resolved

Error: DataStreamer Exception
       UnresolvedAddressException
Fix: edit the hosts file on the local machine to map the DataNode hostname to the public IP:
39.10.62.158 (Alibaba Cloud public IP) hadoop
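A quick way to check that the mapping took effect before rerunning the test (this snippet is an addition; the hostname hadoop is the one configured above):

```java
import java.net.InetAddress;

public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        // After editing the local hosts file this should print the
        // Alibaba Cloud public IP instead of throwing UnknownHostException
        System.out.println(InetAddress.getByName("hadoop").getHostAddress());
    }
}
```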

Verify the result

(Screenshots omitted.)

Origin blog.csdn.net/weixin_38312719/article/details/114647902