[Introduction to the serialization tool Avro]

1. Introduction to Avro

Avro is a data serialization system. It began as a sub-project of Hadoop and is now an independent Apache project. Avro is a high-performance middleware based on binary data transmission, and it is also used in other Hadoop projects such as HBase and Hive for data transmission between the client and the server. Avro converts data structures or objects into a format that is convenient to store or transmit. It was originally designed to support data-intensive applications and is suitable for large-scale data storage and exchange, whether remote or local.




 
 

 

2. Avro features

1 Rich data structure types

2 A fast, compact binary data format

3 A container file for storing persistent data

4 Remote procedure call (RPC)

5 Simple integration with dynamic languages. Code generation is not required to read or write data files or to use the RPC protocol. Code generation is an optional optimization that is only worth implementing for statically typed languages.

 

 

Avro has implementations in multiple programming languages (C, C++, C#, Java, Python, Ruby, PHP), much like Thrift. Its distinguishing feature is that it relies on schemas, which are loaded dynamically together with the data they describe.

 

Avro specifies two serialization encodings: binary and JSON. Binary encoding is efficient and produces relatively small output, while JSON encoding is generally used for debugging or for web-based applications.
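To make the size difference concrete, here is a minimal pure-Python sketch of how the binary encoding handles a few primitive types: an int or long is zigzag-encoded and written in variable-length form, and a string is written as its UTF-8 byte length followed by the bytes. The `encode_user` helper and its two-field record (`name`, `age`) are hypothetical and exist only for illustration; a real application would use an Avro library rather than hand-written encoders.

```python
import json


def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Encode an int/long: zigzag value, 7 bits per byte, least significant group first."""
    value = zigzag(n)
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # high bit set means more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its UTF-8 byte length (a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(name: str, age: int) -> bytes:
    """Encode a hypothetical record (name: string, age: int).

    A record is just its field values in schema order; no field names
    or type tags are written, because the reader has the schema.
    """
    return encode_string(name) + encode_long(age)


binary = encode_user("Ann", 30)
as_json = json.dumps({"name": "Ann", "age": 30})
print(len(binary), "bytes in binary vs", len(as_json), "characters in JSON")
```

The record bytes contain only the values. The field names and types live in the schema, which is why the binary form stays small.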

 

 

 

Avro relies on schemas. When Avro data is read, the schema that was used to write it is always present. Because of this, each datum can be written with no per-value overhead, which makes serialization both fast and compact. Since data together with its schema is fully self-describing, Avro is also convenient to use from dynamic scripting languages.

When Avro data is stored in a file, its schema is stored with it, so that any program can process the file later. If the program reading the data expects a different schema, the difference is easy to resolve, since both schemas are known.

When Avro is used for RPC, the client and server exchange schemas during the connection handshake. Because each side then has the other's full schema, correspondence issues such as same-named fields, missing fields, and extra fields can all be resolved easily.
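The handshake can be sketched roughly as follows. This is a simplified in-memory model, not a working RPC implementation: the `Server` class, its `handshake` method, and the two protocol strings are hypothetical. The response field names (`match`, `serverProtocol`, `serverHash`) and the use of MD5 hashes of the protocol text follow the Avro specification's handshake records. The point is that the full protocol text only needs to be sent when one side does not recognize the other's hash.

```python
import hashlib

# Hypothetical protocol declarations, for illustration only.
CLIENT_PROTOCOL = '{"protocol": "Mail", "namespace": "example.client"}'
SERVER_PROTOCOL = '{"protocol": "Mail", "namespace": "example.server"}'


def protocol_hash(protocol_text: str) -> bytes:
    """16-byte MD5 hash of the protocol's JSON text."""
    return hashlib.md5(protocol_text.encode("utf-8")).digest()


class Server:
    """Toy server that remembers the client protocols it has already seen."""

    def __init__(self, protocol_text: str):
        self.protocol_text = protocol_text
        self.hash = protocol_hash(protocol_text)
        self.known_clients = {}  # client hash -> client protocol text

    def handshake(self, client_hash, server_hash, client_protocol=None):
        if client_protocol is not None:
            self.known_clients[client_hash] = client_protocol
        client_known = client_hash in self.known_clients
        server_hash_ok = server_hash == self.hash
        if client_known and server_hash_ok:
            match = "BOTH"
        elif client_known:
            match = "CLIENT"
        else:
            match = "NONE"
        # Send the server's protocol only if the client's hash of it was wrong.
        send_server = not server_hash_ok
        return {
            "match": match,
            "serverProtocol": self.protocol_text if send_server else None,
            "serverHash": self.hash if send_server else None,
        }


server = Server(SERVER_PROTOCOL)
client_hash = protocol_hash(CLIENT_PROTOCOL)

# First contact: the client guesses the server hash and sends no protocol text.
first = server.handshake(client_hash, server_hash=b"\x00" * 16)
# The client retries with its full protocol text and the hash it just learned.
second = server.handshake(client_hash, first["serverHash"], client_protocol=CLIENT_PROTOCOL)
# Later connections need only the two hashes.
third = server.handshake(client_hash, first["serverHash"])
print(first["match"], second["match"], third["match"])  # prints: NONE BOTH BOTH
```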

Also, Avro schemas are defined in JSON (a lightweight data interchange format), which makes Avro easy to implement in languages that already have a JSON library.
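As an illustration, a record schema is an ordinary JSON document, so a standard JSON library is enough to load and inspect it. The `User` schema below is a hypothetical example, not one taken from this article.

```python
import json

# Hypothetical record schema, written as plain JSON.
USER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["null", "int"], "default": null},
    {"name": "favorite_color", "type": ["null", "string"], "default": null}
  ]
}
"""

# Any JSON parser can read the schema; no Avro-specific tooling is needed here.
schema = json.loads(USER_SCHEMA_JSON)
field_types = {field["name"]: field["type"] for field in schema["fields"]}

for name, avro_type in field_types.items():
    print(name, "->", avro_type)
```

A type written as a JSON array, such as `["null", "int"]`, is a union: the field may hold either a null or an int.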

 

 

Avro provides functionality similar to systems such as Thrift and Protocol Buffers, but differs in some fundamental ways, mainly:

1 Dynamic typing: Avro does not require code generation. Data is always accompanied by its schema, which allows the data to be fully processed without generated code or static data types. This makes it easier to build generic data processing systems and languages.

2 Untagged data: Since the schema is known when the data is read, very little type information needs to be encoded with the data, so the serialized size is small.

3 No manually assigned field numbers: When a schema changes, both the old and new schemas are known when processing data, so differences can be resolved by field name.
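Name-based resolution of record fields can be sketched as below. This is a simplified model that covers record fields only: fields are matched by name regardless of order, a writer field the reader does not know is ignored, and a reader field the writer never wrote takes the reader's default. The `resolve_record` function and both schemas are hypothetical, and real Avro libraries perform this resolution while decoding.

```python
def resolve_record(writer_schema: dict, reader_schema: dict, written_values: list) -> dict:
    """Project values written with writer_schema onto reader_schema, matching by field name.

    written_values holds one value per writer field, in writer field order.
    """
    by_name = {
        field["name"]: value
        for field, value in zip(writer_schema["fields"], written_values)
    }
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in by_name:
            result[name] = by_name[name]      # same-named field: take the written value
        elif "default" in field:
            result[name] = field["default"]   # missing in writer: use the reader's default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result                              # extra writer fields are simply dropped


# Hypothetical old (writer) and new (reader) versions of the same record.
WRITER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}
READER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "age", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "country", "type": "string", "default": "unknown"},
    ],
}

resolved = resolve_record(WRITER_SCHEMA, READER_SCHEMA, ["Ann", "ann@example.com", 30])
print(resolved)  # {'age': 30, 'name': 'Ann', 'country': 'unknown'}
```

Because matching is done on names, neither schema needs numeric field identifiers, and reordering fields between versions is harmless.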
