Avro serialization and deserialization

Avro is an Apache project that started as a sub-project of Hadoop; it provides serialization, deserialization, and RPC. Its serialization is more efficient than JDK serialization, comparable to Google's Protocol Buffers, and better than Facebook's open-source Thrift (later managed by Apache).

Because Avro uses a schema, serializing a large number of objects of the same type only requires storing one copy of the class structure (the schema) plus the data, which greatly reduces the amount of data sent over the network or written to storage.

Example:

Create a new Maven project

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.jv</groupId>
	<artifactId>avro</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>avro</name>
	<url>http://maven.apache.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<compiler-plugin.version>2.3.2</compiler-plugin.version>
		<avro.version>1.7.5</avro.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.10</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-simple</artifactId>
			<version>1.6.4</version>
			<scope>compile</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.avro</groupId>
			<artifactId>avro</artifactId>
			<version>1.7.5</version>
		</dependency>
		<dependency>
			<groupId>org.apache.avro</groupId>
			<artifactId>avro-ipc</artifactId>
			<version>1.7.5</version>
		</dependency>
	</dependencies>
	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>${compiler-plugin.version}</version>
			</plugin>
			<plugin>
				<groupId>org.apache.avro</groupId>
				<artifactId>avro-maven-plugin</artifactId>
				<version>1.7.5</version>
				<executions>
					<execution>
						<id>schemas</id>
						<phase>generate-sources</phase>
						<goals>
							<goal>schema</goal>
							<goal>protocol</goal>
							<goal>idl-protocol</goal>
						</goals>
						<configuration>
							<sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
							<outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
						</configuration>
					</execution>

				</executions>
			</plugin>
		</plugins>
	</build>
</project>

The build section above configures the code-generation plugin; for it to work, you need to create a src/main/avro source directory in the project (this is where the schema files go).
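
With the pom.xml above and the schema file described next, the project layout looks roughly like this (the src/main/avro and src/main/java paths come from the plugin configuration):

avro/
    pom.xml
    src/main/avro/user.avsc                  <- schema files go here
    src/main/java/com/jv/avro/User.java      <- generated by the plugin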

The specific steps are:

Write the schema file user.avsc:

{
  "namespace": "com.jv.avro",
  "type": "record",
  "name": "User",
  "fields": [
      {"name": "username", "type": "string"},
      {"name": "age", "type": ["int", "null"]},
      {"name": "address", "type": ["string", "null"]}
  ]
}

namespace: the namespace; when the plugin generates code, this becomes the Java package of the User class

type: one of record, enum, array, map, union, or fixed; a record is equivalent to an ordinary class

name: the class name; the fully qualified class name is namespace + name

doc: a documentation comment

aliases: alternative names; other schemas can refer to this type by an alias

fields: the record's fields (attributes)

    name: the field name

    type: the field type; a union such as ["int", "null"] allows the field to hold either an int or null

    default: specifies a default value for the field; for a union type the default must match the first branch of the union (see the fragment after this list)

    doc: a documentation comment
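
As an illustration (this fragment is not part of the user.avsc above), the age and address fields could be given explicit defaults like this:

{"name": "age", "type": ["int", "null"], "default": 0},
{"name": "address", "type": ["null", "string"], "default": null}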

Generate code from schema definitions

Run the Maven build (for example mvn generate-sources, or mvn compile / a build from your IDE); the avro-maven-plugin then writes the generated classes under src/main/java.

Check that BUILD SUCCESS is printed in the console; if it is, the code generation succeeded.

Test code

package com.jv.test;

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

import com.jv.avro.User;

public class TestAvro {
	public static void main(String[] args) throws IOException {
		// Instantiation method 1: no-arg constructor + setters
		User user1 = new User();
		user1.setUsername("Messi");
		user1.setAddress("Barcelona");
		user1.setAge(30);
		
		// Instantiation method 2: all-args constructor
		User user2 = new User("Messi", 30, "Barcelona");
		// Instantiation method 3: builder
		User user3 = User.newBuilder().setUsername("Havi").setAge(34).setAddress("Qatar").build();
		
		// Serialize the objects and write them to a file
		DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
		DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
		dataFileWriter.create(user1.getSchema(), new File("users.avro"));
		dataFileWriter.append(user1);
		dataFileWriter.append(user2);
		dataFileWriter.append(user3);
		dataFileWriter.close();
		
		// Deserialize the objects from the file and print them
		DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
		DataFileReader<User> dataFileReader = new DataFileReader<User>(new File("users.avro"), userDatumReader);
		User user = null;
		while (dataFileReader.hasNext()) {
			user = dataFileReader.next(user);
			System.out.println(user);
		}
		dataFileReader.close();
	}
}

Output after running
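
Assuming the three records above, the console prints roughly the following (the generated class's toString() renders each record as JSON):

{"username": "Messi", "age": 30, "address": "Barcelona"}
{"username": "Messi", "age": 30, "address": "Barcelona"}
{"username": "Havi", "age": 34, "address": "Qatar"}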

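Besides writing to a container file, a single object can also be serialized to a byte array for network transfer. The sketch below assumes the same generated User class; only the field values are encoded, without field names or the schema, which is why the payload stays small, as noted at the beginning of this article.

package com.jv.test;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

import com.jv.avro.User;

public class TestAvroBinary {
	public static void main(String[] args) throws IOException {
		User user = User.newBuilder().setUsername("Messi").setAge(30).setAddress("Barcelona").build();

		// Serialize to a byte array: only the field values are written, not the schema
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
		new SpecificDatumWriter<User>(User.class).write(user, encoder);
		encoder.flush();
		byte[] bytes = out.toByteArray();
		System.out.println("serialized size: " + bytes.length + " bytes");

		// Deserialize from the byte array; the reader obtains the schema from the generated class
		BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
		User decoded = new SpecificDatumReader<User>(User.class).read(null, decoder);
		System.out.println(decoded);
	}
}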