1. Avro primitive data types

Type      Description                                      Schema example
null      The absence of a value                           "null"
boolean   A binary value                                   "boolean"
int       32-bit signed integer                            "int"
long      64-bit signed integer                            "long"
float     32-bit single-precision floating-point number    "float"
double    64-bit double-precision floating-point number    "double"
bytes     Sequence of 8-bit unsigned bytes                 "bytes"
string    Sequence of Unicode characters                   "string"

[Avro primitive types can also be written in a more verbose form using the "type" attribute, e.g. {"type": "null"}.]
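The two forms are interchangeable wherever a schema is expected. As an illustration (the field name "count" is hypothetical, not from the source), a record field could declare its type either way:

```json
{"name": "count", "type": "int"}
```

```json
{"name": "count", "type": {"type": "int"}}
```

The verbose form matters in practice because it is the only place where additional attributes (such as logical types) can be attached to a primitive type.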
2. Avro complex data types

array: An ordered collection of objects. All objects in a particular array must have the same schema.
{
  "type": "array",
  "items": "long"
}

map: An unordered collection of key-value pairs. Keys must be strings and values may be any type, although within a particular map all values must have the same schema.
{
  "type": "map",
  "values": "string"
}

record: A collection of named fields of any type.
{
  "type": "record",
  "name": "WeatherRecord",
  "doc": "A weather reading.",
  "fields": [
    {"name": "year", "type": "int"},
    {"name": "temperature", "type": "int"},
    {"name": "stationId", "type": "string"}
  ]
}

enum: A set of named values.
{
  "type": "enum",
  "name": "Cutlery",
  "doc": "An eating utensil.",
  "symbols": ["KNIFE", "FORK", "SPOON"]
}

fixed: A fixed number of 8-bit unsigned bytes.
{
  "type": "fixed",
  "name": "Md5Hash",
  "size": 16
}

union: A union of schemas. A union is represented by a JSON array, where each element in the array is a schema. Data represented by a union must match one of the schemas in the union.
[
  "null",
  "string",
  {"type": "map", "values": "string"}
]
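In practice, unions are most often used to make a record field optional. A minimal sketch (the field name "stationName" is hypothetical, not from the source):

```json
{"name": "stationName", "type": ["null", "string"], "default": null}
```

Note that when a union field carries a default value, the default must match the first branch of the union, which is why "null" is conventionally listed first.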
As shown in the figure above, the program packages many local small files into one large file saved in HDFS, with each small file becoming a single Avro record. The procedure is shown in the following code:
//Write to Avro data file
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class AVRO_WRITE {
    public static final String FIELD_FILENAME = "filename";
    public static final String FIELD_CONTENTS = "contents";
    // Each small file becomes one record with two fields: its name and its raw bytes.
    public static final String SCHEMA_JSON = "{\"type\": \"record\",\"name\": \"SmallFilesTest\", "
            + "\"fields\": ["
            + "{\"name\":\"" + FIELD_FILENAME + "\",\"type\":\"string\"},"
            + "{\"name\":\"" + FIELD_CONTENTS + "\",\"type\":\"bytes\"}]}";
    public static final Schema SCHEMA = new Schema.Parser().parse(SCHEMA_JSON);

    public static void writeToAvro(File srcPath, OutputStream outputStream) throws IOException {
        DataFileWriter<Object> writer =
                new DataFileWriter<Object>(new GenericDatumWriter<Object>()).setSyncInterval(100);
        writer.setCodec(CodecFactory.snappyCodec()); // compress data blocks with Snappy
        writer.create(SCHEMA, outputStream);
        // Append one record per file in the source directory (non-recursive).
        for (Object obj : FileUtils.listFiles(srcPath, null, false)) {
            File file = (File) obj;
            String filename = file.getAbsolutePath();
            byte[] content = FileUtils.readFileToByteArray(file);
            GenericRecord record = new GenericData.Record(SCHEMA);
            record.put(FIELD_FILENAME, filename);
            record.put(FIELD_CONTENTS, ByteBuffer.wrap(content));
            writer.append(record);
            // Print an MD5 digest so the reader's output can be checked against it.
            System.out.println(file.getAbsolutePath() + ": " + DigestUtils.md5Hex(content));
        }
        IOUtils.cleanup(null, writer);
        IOUtils.cleanup(null, outputStream);
    }

    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        FileSystem hdfs = FileSystem.get(config);
        File sourceDir = new File(args[0]);  // local directory of small files
        Path destFile = new Path(args[1]);   // destination Avro file in HDFS
        OutputStream os = hdfs.create(destFile);
        writeToAvro(sourceDir, os);
    }
}
//Read from the Avro data file
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class AVRO_READ {
    private static final String FIELD_FILENAME = "filename";
    private static final String FIELD_CONTENTS = "contents";

    public static void readFromAvro(InputStream is) throws IOException {
        DataFileStream<Object> reader =
                new DataFileStream<Object>(is, new GenericDatumReader<Object>());
        for (Object o : reader) {
            GenericRecord r = (GenericRecord) o;
            // Copy only the bytes between position and limit; ByteBuffer.array()
            // may expose bytes outside the buffer's bounds.
            ByteBuffer buf = (ByteBuffer) r.get(FIELD_CONTENTS);
            byte[] content = new byte[buf.remaining()];
            buf.get(content);
            System.out.println(r.get(FIELD_FILENAME) + ": " + DigestUtils.md5Hex(content));
        }
        IOUtils.cleanup(null, reader);
        IOUtils.cleanup(null, is);
    }

    public static void main(String... args) throws Exception {
        Configuration config = new Configuration();
        FileSystem hdfs = FileSystem.get(config);
        Path destFile = new Path(args[0]);  // Avro file in HDFS written by AVRO_WRITE
        InputStream is = hdfs.open(destFile);
        readFromAvro(is);
    }
}
Hadoop_Avro data types and schemas