1. First, create a Configuration instance
Configuration config = new Configuration();
2. Then create a DatasetRepositoryFactory
DatasetRepositoryFactory datasetRepositoryFactory = new DatasetRepositoryFactory();
It has three setters:
setConf() — takes the Configuration instance
setNamespace() — takes the data directory
setBasePath() — takes the HDFS address, hdfs://......
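Wired together it looks like this (the values are taken from the test at the bottom; adjust the namespace and base path for your own cluster):
Configuration config = new Configuration();
DatasetRepositoryFactory datasetRepositoryFactory = new DatasetRepositoryFactory();
datasetRepositoryFactory.setConf(config);                       // the Configuration instance
datasetRepositoryFactory.setNamespace("files");                 // data directory
datasetRepositoryFactory.setBasePath("hdfs://localhost:9000/"); // HDFS address
datasetRepositoryFactory.afterPropertiesSet();                  // needed when wiring the bean manually, outside a Spring context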
3. The DatasetDefinition class
DatasetDefinition definition = new DatasetDefinition();
definition.setFormat(Formats.AVRO.getName()); // file format
definition.setTargetClass(Entity.class);      // your entity class
definition.setAllowNullValues(false);         // whether null values are allowed
definition.setPartitionStrategy(
        new PartitionStrategy.Builder()
                .identity("partition field in the entity", "partition column in the table")
                .build());
4. The DataStoreWriter<Entity> class
DataStoreWriter<ImportPOJO> dataStoreWriter =
        new AvroPojoDatasetStoreWriter<ImportPOJO>(ImportPOJO.class,
                datasetRepositoryFactory,  // the factory created above
                definition);               // the DatasetDefinition created above
5. The DatasetOperations class
DatasetTemplate datasetTemplate = new DatasetTemplate();
datasetTemplate.setDatasetDefinitions(Collections.singletonList(definition)); // the DatasetDefinition created above
datasetTemplate.setDatasetRepositoryFactory(datasetRepositoryFactory);        // the factory created above
DatasetOperations datasetOperations = (DatasetOperations) datasetTemplate;    // a plain cast works; DatasetTemplate implements the interface
After this whole chain you end up with a DatasetOperations instance.
Its write(Collection<T> records) method takes a Collection (a List<XXX>) and writes the data to HDFS in the format configured above; I set Avro.
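A minimal write call, reusing the objects from the steps above (the ImportPOJO entity comes from the full test below):
ImportPOJO pojo = new ImportPOJO();
pojo.setId(4);
pojo.setName("rihanna");
pojo.setAge(24);
List<ImportPOJO> records = new ArrayList<ImportPOJO>();
records.add(pojo);
datasetOperations.write(records); // written under the "files" namespace as Avro, partitioned by pre_id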
Because of Kerberos I haven't yet been able to verify that this succeeds; will test it again later.
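If the cluster does enforce Kerberos, the usual approach is to log in through Hadoop's UserGroupInformation before touching HDFS. A minimal sketch (untested here; the principal and keytab path are placeholders):
import org.apache.hadoop.security.UserGroupInformation;

Configuration config = new Configuration();
config.set("hadoop.security.authentication", "kerberos"); // enable Kerberos auth
UserGroupInformation.setConfiguration(config);
// placeholder principal and keytab path, replace with your own
UserGroupInformation.loginUserFromKeytab("hadoop@EXAMPLE.COM", "/etc/security/keytabs/hadoop.keytab");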
==================================================
The notes above are a bit messy, so below is a simple test that writes an entity out as an Avro file and uploads it to HDFS.
package com.avro;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.junit.Test;
import org.kitesdk.data.Formats;
import org.kitesdk.data.PartitionStrategy;
import org.springframework.data.hadoop.store.DataStoreWriter;
import org.springframework.data.hadoop.store.dataset.AvroPojoDatasetStoreWriter;
import org.springframework.data.hadoop.store.dataset.DatasetDefinition;
import org.springframework.data.hadoop.store.dataset.DatasetOperations;
import org.springframework.data.hadoop.store.dataset.DatasetRepositoryFactory;
import org.springframework.data.hadoop.store.dataset.DatasetTemplate;

import com.avro.pojo.ImportPOJO;

public class AVROTest {

    @Test
    public void test() throws Exception {
        // Hadoop configuration; nothing extra needed for a local single-node setup
        Configuration config = new Configuration();

        // Repository factory: where on HDFS the datasets live
        DatasetRepositoryFactory datasetRepositoryFactory = new DatasetRepositoryFactory();
        datasetRepositoryFactory.setConf(config);
        datasetRepositoryFactory.setNamespace("files");
        datasetRepositoryFactory.setBasePath("hdfs://localhost:9000/");
        datasetRepositoryFactory.afterPropertiesSet();

        // Dataset definition: Avro format, no nulls, partitioned by the id field
        DatasetDefinition definition = new DatasetDefinition();
        definition.setFormat(Formats.AVRO.getName());
        definition.setTargetClass(ImportPOJO.class);
        definition.setAllowNullValues(false);
        definition.setPartitionStrategy(
                new PartitionStrategy.Builder()
                        .identity("id", "pre_id")
                        .build());

        // Writer for the POJO (not used below; DatasetTemplate does the writing)
        DataStoreWriter<ImportPOJO> dataStoreWriter =
                new AvroPojoDatasetStoreWriter<ImportPOJO>(ImportPOJO.class,
                        datasetRepositoryFactory, definition);

        // DatasetTemplate implements DatasetOperations, so a plain cast is enough
        DatasetTemplate datasetTemplate = new DatasetTemplate();
        datasetTemplate.setDatasetDefinitions(Collections.singletonList(definition));
        datasetTemplate.setDatasetRepositoryFactory(datasetRepositoryFactory);
        DatasetOperations datasetOperations = (DatasetOperations) datasetTemplate;

        // Build one record and write it to HDFS as Avro
        ImportPOJO pojo = new ImportPOJO();
        pojo.setId(4);
        pojo.setName("rihanna");
        pojo.setAge(24);
        List<ImportPOJO> list = new ArrayList<ImportPOJO>();
        list.add(pojo);
        datasetOperations.write(list);
    }
}
I didn't add anything else to the Configuration, since the environment is this local machine. I'm on Linux as user root, while the HDFS user is hadoop, so when running the test in Eclipse I just passed the JVM argument -DHADOOP_USER_NAME=hadoop for convenience, and that was enough. Test it first.
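For reference, the ImportPOJO entity isn't shown above; here is a plausible version, with the fields inferred from the setters used in the test:
package com.avro.pojo;

public class ImportPOJO {

    private int id;   // "id" is the partition source field in the test
    private String name;
    private int age;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}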