Avro and protobuf serialization

Serialization:
    Used for interprocess communication and persistent storage

    Features:
        compact
        fast
        extensible
        interoperable (cross-language)


    Java serialization:
        ObjectInputStream / ObjectOutputStream
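
A minimal sketch of built-in Java serialization, round-tripping one object through memory. The `Person` bean and the `JavaSerialDemo` class are hypothetical names used only for illustration; they do not appear in the original notes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class JavaSerialDemo {
    // Hypothetical bean; Serializable is the marker interface Java serialization requires.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serialize one object to a byte array with ObjectOutputStream.
    static byte[] serialize(Person p) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(p);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bos.toByteArray();
    }

    // Rebuild the object from the byte array with ObjectInputStream.
    static Person deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Person) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Person("tom", 20));
        Person copy = deserialize(bytes);
        System.out.println(copy.name + " " + copy.age + " (" + bytes.length + " bytes)");
    }
}
```

The stream embeds the class description along with the field values, so only another JVM that has the same class can read it back.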
    
    Hadoop Writable:
        PersonWritable         // Java only, not cross-language
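
A sketch of what a class such as PersonWritable might look like. The fields (`name`, `age`) and the helper methods are assumptions. To stay runnable without Hadoop on the classpath, the class only mirrors the two methods of `org.apache.hadoop.io.Writable` (`write` and `readFields`); in a real Hadoop project it would declare `implements Writable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class PersonWritable {
    String name; // assumed field
    int age;     // assumed field

    // Same signature as Writable.write: emit the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(age);
    }

    // Same signature as Writable.readFields: read the fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        age = in.readInt();
    }

    // Helper (not part of the Writable contract): serialize to a byte array.
    static byte[] toBytes(PersonWritable p) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            p.write(out);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bos.toByteArray();
    }

    // Helper (not part of the Writable contract): rebuild from a byte array.
    static PersonWritable fromBytes(byte[] bytes) {
        PersonWritable p = new PersonWritable();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            p.readFields(in);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        PersonWritable p = new PersonWritable();
        p.name = "tom";
        p.age = 20;
        PersonWritable copy = fromBytes(toBytes(p));
        System.out.println(copy.name + " " + copy.age);
    }
}
```

The byte layout is defined only by the order of the calls in `write`, with no schema stored alongside the data, which is why a non-Java reader cannot easily consume it.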

    Avro:
        Created by Doug Cutting, the creator of Hadoop


Avro and Hadoop Writable serialization comparison:
===============================
    Writable: not cross-language
    Avro: cross-language; the supported languages are as follows

    c/    
    cpp/    
    c#/    
    java/    
    js/    
    perl/
    php/    
    py/    
    py3/    
    ruby/    

1. Create the emp.avsc file with the following contents

{
    "namespace": "tutorialspoint.com",
    "type": "record",
    "name": "Emp",
    "fields": [
    {"name": "name", "type": "string"},
    {"name": "id", "type": "int"},
    {"name": "salary", "type": "int"},
    {"name": "age", "type": "int"},
    {"name": "address", "type": "string"}
    ]
}

2. Put the avro-1.8.2.jar and avro-tools-1.8.2.jar files in the same directory as emp.avsc

3. Compile the schema file
    java -jar avro-tools-1.8.2.jar compile schema emp.avsc .

4. View the generated file
    tutorialspoint\com\Emp.java

    Contents include:
        constructors
        builder
        getters and setters
        serialization and deserialization methods

5. Load this file into the IDE
    1. Add the dependencies to the pom file
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.8.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-tools</artifactId>
            <version>1.8.2</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    2. Create a new package named tutorialspoint.com

    3. Copy the Emp.java file into the package

    4. Resolve any remaining compile errors

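6. Write the serialization code
    import java.io.File;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.io.DatumWriter;
    import org.apache.avro.specific.SpecificDatumWriter;
    import org.junit.Test;
    import tutorialspoint.com.Emp;

    // The class name TestAvro is illustrative; the notes show only the test method.
    public class TestAvro {
        @Test
        public void testAvroSerial() throws Exception {
            Emp e = new Emp();
            e.setId(10);
            e.setName("tom");
            e.setAge(20);
            e.setSalary(1000);
            e.setAddress("shahe");
            // Initialize the datum writer for the generated Emp class
            DatumWriter<Emp> dw = new SpecificDatumWriter<Emp>(Emp.class);
            // Initialize the file writer
            DataFileWriter<Emp> dfw = new DataFileWriter<Emp>(dw);
            // Create the output file and write the schema into its header
            dfw.create(Emp.SCHEMA$, new File("F:/avro/emp.avro"));
            // Append the object to the Avro data file
            dfw.append(e);
            dfw.close();
            System.out.println("ok");
        }
    }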


7. Compare the serialization speed and output size of Java, Hadoop Writable, and Avro for 1,000,000 objects

            java        writable    avro
-------------------------------------------------------------
size        4,883kb     23,438kb    13,677kb
serial      3025ms      29410ms     1384ms
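
The notes do not show the harness that produced these numbers. As a rough sketch, the Java-serialization column could be measured as below. `EmpBean` is a hypothetical plain bean with the same five fields as Emp, and `SerialBenchmark` and `measureJava` are illustrative names. The output goes to memory rather than a file, so the results will not match the table.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerialBenchmark {
    // Hypothetical bean mirroring the fields of the Emp schema.
    static class EmpBean implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int id;
        final int salary;
        final int age;
        final String address;

        EmpBean(String name, int id, int salary, int age, String address) {
            this.name = name;
            this.id = id;
            this.salary = salary;
            this.age = age;
            this.address = address;
        }
    }

    // Serialize n objects with Java serialization.
    // Returns {elapsed milliseconds, total size in bytes}.
    static long[] measureJava(int n) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        long start = System.currentTimeMillis();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            for (int i = 0; i < n; i++) {
                oos.writeObject(new EmpBean("tom" + i, i, 1000, 20, "shahe"));
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        long elapsed = System.currentTimeMillis() - start;
        return new long[] {elapsed, bos.size()};
    }

    public static void main(String[] args) {
        // The notes use 1,000,000 objects; a smaller count keeps this sketch light on memory.
        long[] result = measureJava(100_000);
        System.out.println("serial " + result[0] + "ms, size " + result[1] / 1024 + "kb");
    }
}
```

The same loop shape (start a timer, append n objects, close the writer, read the output size) applies to the Writable and Avro writers.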


8. Write the deserialization code
    @Test
    public void testAvroDeSerial() throws Exception {
        long start = System.currentTimeMillis();
        // Initialize the datum reader
        DatumReader<Emp> dr = new SpecificDatumReader<Emp>(Emp.class);
        // Initialize the file reader
        DataFileReader<Emp> dfr = new DataFileReader<Emp>(new File("F:/avro/emp.avro"), dr);
        while (dfr.hasNext()) {
            Emp emp = dfr.next();
            // System.out.println(emp.toString());
        }
        dfr.close();
        // Print the elapsed time in milliseconds
        System.out.println(System.currentTimeMillis() - start);
    }
    


9. Compare the deserialization speed of Java, Hadoop Writable, and Avro for 1,000,000 objects

            java        writable    avro
-------------------------------------------------------------
size        4,883kb     23,438kb    13,677kb
serial      3025ms      29410ms     1384ms      1802ms
deserial    3860ms      26232ms     1972ms      1689ms



10. Avro can also serialize objects directly from the schema, without generating code
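
The notes leave this step empty. A minimal sketch of the schema-only approach, using Avro's generic API (`GenericRecord`, `GenericDatumWriter`, `GenericDatumReader`), might look like the following. The schema is the same as emp.avsc but inlined as a string, and the record is written to memory instead of F:/avro so the sketch is self-contained. The class and method names (`AvroGenericDemo`, `sampleEmp`, and so on) are illustrative. It needs the avro jar on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

class AvroGenericDemo {
    // Same schema as emp.avsc, inlined so the sketch is self-contained.
    static final String EMP_SCHEMA =
        "{\"namespace\": \"tutorialspoint.com\", \"type\": \"record\", \"name\": \"Emp\","
        + " \"fields\": ["
        + " {\"name\": \"name\", \"type\": \"string\"},"
        + " {\"name\": \"id\", \"type\": \"int\"},"
        + " {\"name\": \"salary\", \"type\": \"int\"},"
        + " {\"name\": \"age\", \"type\": \"int\"},"
        + " {\"name\": \"address\", \"type\": \"string\"}]}";

    static final Schema SCHEMA = new Schema.Parser().parse(EMP_SCHEMA);

    // Build a record by field name; no generated Emp class is involved.
    static GenericRecord sampleEmp() {
        GenericRecord e = new GenericData.Record(SCHEMA);
        e.put("name", "tom");
        e.put("id", 10);
        e.put("salary", 1000);
        e.put("age", 20);
        e.put("address", "shahe");
        return e;
    }

    // Write the record to memory in the Avro data-file format.
    static byte[] serialize(GenericRecord record) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> dfw =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            dfw.create(SCHEMA, bos);
            dfw.append(record);
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
        return bos.toByteArray();
    }

    // Read the first record back; the schema is taken from the data-file header.
    static GenericRecord deserialize(byte[] bytes) {
        try (DataFileStream<GenericRecord> dfs = new DataFileStream<>(
                 new ByteArrayInputStream(bytes), new GenericDatumReader<GenericRecord>())) {
            return dfs.next();
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        GenericRecord copy = deserialize(serialize(sampleEmp()));
        System.out.println(copy);
    }
}
```

To read the schema from disk instead, `new Schema.Parser().parse(new File("emp.avsc"))` can replace the inlined string. String fields come back as Avro `Utf8` values, so call `toString()` before comparing them with Java strings.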
    



Google Protobuf
================================    
    Simple and efficient serialization technology, open-sourced by Google in 2008

    Cross-language support:
        Java, C++, and Python
        C, C#, Erlang, Perl, PHP, Ruby

    

How each technique defines the data structure:

    java            avro            pb (protobuf)
    javaBean        schema (json)   proto


1. Create the emp.proto definition file (not a Java file)

    package tutorial; 
    option java_package = "tutorialspoint.com"; 
    option java_outer_classname = "Emp2"; 
    message Emp {
        required int32 id = 1; 
        required string name = 2; 
        required int32 age = 3; 
        required int32 salary = 4; 
        required string address = 5; 
    }

2. Put emp.proto and protobuf\src\protoc.exe in the same folder (F:/avro)

3. Open cmd in that folder and compile emp.proto
    protoc --java_out=. emp.proto

4. Copy the generated Emp2.java from F:\avro\tutorialspoint\com into the IDEA project, under the package tutorialspoint.com

 
