Add the Maven dependencies
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
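All three artifacts must stay on the same Hadoop version. One way to keep them aligned is to pull the version into a Maven property (a minimal sketch; the property name hadoop.version is just a convention, adjust to your own POM):
<properties>
    <hadoop.version>2.7.3</hadoop.version>
</properties>
<!-- then each of the three dependencies references it: -->
<version>${hadoop.version}</version>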
Write the integration configuration
import java.net.URI;

import org.apache.hadoop.fs.FileSystem;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HadoopConfig {

    // The address of your Hadoop NameNode,
    // e.g. hadoop.node=hdfs://<Alibaba Cloud public IP>:9000
    @Value("${hadoop.node}")
    private String hadoopNode;

    @Bean("fileSystem")
    public FileSystem createFs() throws Exception {
        // Build the client-side HDFS configuration
        // (fully qualified to avoid a clash with Spring's @Configuration)
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("fs.defaultFS", hadoopNode);
        conf.set("dfs.client.use.datanode.hostname", "true");
        conf.set("dfs.replication", "1");
        // Return the file system for the given URI; when testing from a local
        // machine, obtain it this way, with an explicit URI and user ("root" here)
        return FileSystem.get(new URI(hadoopNode), conf, "root");
    }
}
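The matching entry in src/main/resources/application.properties then points at the NameNode; the address below reuses the public IP from the pitfalls section and stands in for your own server:
hadoop.node=hdfs://39.10.62.158:9000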
Test code
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class SpringbootIntegrationTestApplicationTests {

    @Autowired
    private FileSystem fileSystem;

    @Test
    void contextLoads() {
    }

    @Test
    void hadoopTest() throws Exception {
        // Upload a local file to HDFS
        fileSystem.copyFromLocalFile(
                new Path("/Users/wuxinxin/IdeaProjects/springboot-integration-test/src/main/resources/application.properties"),
                new Path("/user"));
    }
}
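To confirm the upload from the same test class, the target directory can be listed back through the injected FileSystem. A minimal sketch; the hadoopListTest method below is not part of the original test and would be added next to hadoopTest:
    @Test
    void hadoopListTest() throws Exception {
        // List everything under /user and print each entry's path and size
        for (org.apache.hadoop.fs.FileStatus status : fileSystem.listStatus(new Path("/user"))) {
            System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
        }
    }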
Pitfalls and solutions
1. Take the NameNode out of safe mode
Error:    Name node is in safe mode
Solution: hadoop dfsadmin -safemode leave
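The safe-mode state can also be checked before forcing it off; on Hadoop 2.x the hdfs script is the non-deprecated form of the same admin command:
hdfs dfsadmin -safemode get     # reports whether safe mode is ON or OFF
hdfs dfsadmin -safemode leave   # same effect as the hadoop dfsadmin form above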
2. Expose the NameNode by hostname; otherwise it hands out the Alibaba Cloud private IP and the DataNode cannot be reached
Error:    Connection refused
          Excluding datanode DatanodeInfoWithStorage
          [172.24.55.121:50010,DS-2b89f9a8-037a-499b-807b-7ea54bc99205,DISK]
Solution:
1. Configure fs.defaultFS with a hostname in etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop:9000</value>
    </property>
</configuration>
2. On the Alibaba Cloud server, map that hostname to the private IP in /etc/hosts:
172.24.55.121 hadoop    # Alibaba Cloud private IP
3. Make the NameNode return the DataNode's hostname instead of its private IP, which is unreachable from outside the cloud network; the client then resolves the hostname itself:
Configuration conf = new Configuration();
conf.set("dfs.client.use.datanode.hostname", "true");
3. The DataNode hostname that is returned cannot be resolved
Error:    DataStreamer Exception
          UnresolvedAddressException
Solution: edit the hosts file on your local machine and map the DataNode hostname to the public IP:
39.10.62.158 hadoop    # Alibaba Cloud public IP
View the result
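Listing the target directory on the server should now show the uploaded file:
hdfs dfs -ls /user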