Organizing data with Spark and writing it into MySQL

I am new to Spark and not particularly familiar with it yet, but after processing my data I needed to write the results into a database table, and I found two different ways to do it.

1. Generate a List<Row>

2. Generate a JavaRDD<Row>

The first way (this continues from my previous article, which produced the Map<Integer, List>):

List<Row> rows = new ArrayList<>();
Iterator<Entry<Integer, List>> itMap = map.entrySet().iterator();
while (itMap.hasNext()) { // iterate over the map
    Entry<Integer, List> entry = itMap.next();
    int key = entry.getKey();
    List list2 = entry.getValue();
    for (int i = 0; i + 2 < list2.size(); i += 3) { // walk the list three elements at a time
        rows.add(RowFactory.create(key, list2.get(i), list2.get(i + 1), list2.get(i + 2),
                new Timestamp(new java.util.Date().getTime()))); // add the fields each row needs
    }
}
StructType schema = createSchema(); // build the table schema
Dataset<Row> df_target = sparkSession.createDataFrame(rows, schema);
df_target.show(); // print the contents
df_target.write().mode(SaveMode.Append)
        .jdbc(properties.getProperty("url_jdbc_master"), "analyze_point", this.properties);
// connect to the database, create the table if needed, and append the rows
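The inner loop groups the flat value list into rows of three values each. A minimal pure-Java sketch of that grouping logic (no Spark involved; the input numbers are hypothetical sample data):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupByThree {
    // Split a flat list into consecutive 3-element chunks, mirroring
    // how three list values plus a key and timestamp form one Row.
    static List<List<Integer>> chunkBy3(List<Integer> flat) {
        List<List<Integer>> rows = new ArrayList<>();
        for (int i = 0; i + 2 < flat.size(); i += 3) {
            rows.add(Arrays.asList(flat.get(i), flat.get(i + 1), flat.get(i + 2)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> flat = Arrays.asList(5, 3, 7, 2, 9, 4);
        System.out.println(chunkBy3(flat)); // [[5, 3, 7], [2, 9, 4]]
    }
}
```

Any trailing elements that do not fill a complete group of three are skipped, which avoids the IndexOutOfBoundsException that reading i + 1 and i + 2 past the end would otherwise throw.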

Create the table schema (the field types declared here must match the types of the values added to each row):

StructType createSchema() {
    StructType ret = new StructType() // column names, field types, and field comments for the table
        .add(new StructField("orgId", DataTypes.IntegerType, true, Metadata.empty()).withComment("orgId"))
        .add(new StructField("morePhoneLogin", DataTypes.IntegerType, true, Metadata.empty()).withComment("xx"))
        .add(new StructField("openAccountMore", DataTypes.IntegerType, true, Metadata.empty()).withComment("xx"))
        .add(new StructField("remoteLogin", DataTypes.IntegerType, true, Metadata.empty()).withComment("xx"))
        .add(new StructField("time", DataTypes.TimestampType, true, Metadata.empty()).withComment("time"));
    return ret;
}


The second way (dsAccountCount is a Dataset<Row> holding the results of a MySQL query; we aggregate it and write it into a new table):

JavaRDD<Row> rdd_target = dsAccountCount.toJavaRDD().map(t2 -> {
    return RowFactory.create(t2.get(0), t2.get(1), t2.get(2),
            new Timestamp(new java.util.Date().getTime())); // fetch values by column index
});
StructType schema = createSchema(); // the schema-building method shown above
Dataset<Row> df_target = sparkSession.createDataFrame(rdd_target, schema);
df_target.show();
df_target.write().mode(SaveMode.Append)
        .jdbc(this.properties.getProperty(PropertyKeys.URL_JDBC_MASTER), "analyze_point_operator_login", this.properties);
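Both snippets pass a Properties object to df.write().jdbc(...). The original article never shows how it is built, so the sketch below is an assumption: the key "url_jdbc_master" matches the code above, but the host, database name, credentials, and driver class are placeholders to substitute with your own:

```java
import java.util.Properties;

public class JdbcConfig {
    // Hypothetical construction of the JDBC Properties object used by
    // both write paths. All values here are placeholder assumptions.
    static Properties buildJdbcProperties() {
        Properties props = new Properties();
        props.setProperty("url_jdbc_master", "jdbc:mysql://localhost:3306/analyze_db");
        props.setProperty("user", "root");
        props.setProperty("password", "secret");
        props.setProperty("driver", "com.mysql.cj.jdbc.Driver");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildJdbcProperties();
        System.out.println(props.getProperty("url_jdbc_master"));
    }
}
```

Spark's JDBC writer reads the "user", "password", and "driver" keys from this object, while the URL itself is passed separately as the first argument of jdbc().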

Origin blog.csdn.net/fearlessnesszhang/article/details/80402521