Protobuffer vs. JSON: An In-Depth Comparison

Reprinted from: http://cxshun.iteye.com/blog/1974498

1. The data transfer protocol used in distributed Spring Boot deployments.

I assume everyone knows what JSON is. If you don't, you are really out of the loop, so Google it. I won't introduce it here.

Protobuffer is probably less familiar, but once you hear that it comes from Google, I expect you'll want to try it out. After all, most of what Google ships is high quality.

Protobuffer is a data interchange format similar to JSON. Strictly speaking it is not a protocol, just a way of encoding data for transmission.

So how is it different from JSON?

Being cross-language is one of its advantages. It ships with a compiler, protoc, which compiles a definition into Java, Python, or C++ code. Only these three are supported for now, so don't count on any others for the time being. You can then use the generated code directly without writing anything else; even the parsing code is included. JSON is cross-language too, of course, but only in the sense that you write the code yourself in each language.

If you want to know more, you can go to:

https://developers.google.com/protocol-buffers/docs/overview

Without further ado, let's look at why it is worth comparing Protobuffer (GPB from here on) with JSON.

1. JSON has a fixed syntax and is stored as text, so its data volume still leaves room for compression. With large amounts of data, GPB takes much less space than JSON, as the example later shows.

2. JSON libraries differ considerably in efficiency. Between Jackson and Gson the gap is about 5-10 (I only tested this once, so go easy on me if it's wrong). GPB needs only one library, so there is no such variation between implementations. Admittedly, this point is here mostly to pad the list and can be ignored.

Talk is cheap, just show me the code.

In the programming world, code is always king, so let's go directly to the code.

Before diving into the code, download Protobuffer first, from here:

https://code.google.com/p/protobuf/downloads/list

Note that you need two downloads: the compiler and the source code. I trust that's not difficult, so I'll skip the details.

1. First, GPB needs a file that plays the role of a class definition, called a proto file.

Let's use students and teachers as the example.

We have the following two files: student.proto and teacher.proto.

teacher.proto:

import "student.proto";
option java_package = "com.shun";
option java_outer_classname = "TeacherProto";

message Teacher {
	required int32 id = 1;
	optional string name = 2;

	repeated Student student_list = 3;
}

Here we run into a few unfamiliar keywords:

import, int32, repeated, required, optional, option, and so on.

Let's go through them one by one:

1) import pulls in another proto file.

2) required and optional indicate whether a field must be set, which determines how Protobuffer handles a missing value. If a field is marked required but no value is given when the message is processed, an error is reported; if it is marked optional, leaving it unset causes no problem.

3) repeated should be easy to guess: the field can occur multiple times, similar to a List in Java.

4) message is the equivalent of a class.

5) option sets options. java_package is the package name used for the generated Java code, and java_outer_classname is the name of the generated outer class. Note that this class name must not be the same as the name of any message defined below it.

For the other options and the available types, see the official documentation.

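2. What can we do with these files?

Remember the compiler we downloaded above? Unpack it and you get protoc.exe. That is the Windows build, of course; I haven't tried other systems, so anyone interested can tinker with those.

Add it to your PATH (optional; it's only a matter of convenience), and you can then generate the class files we need from the proto files above.

protoc --java_out=<path for the generated source> --proto_path=<path to the proto files> <the proto files themselves>

--proto_path specifies the directory containing the proto files, not a single file. It is mainly used to resolve imports and can be omitted.

For example, I want the generated source in D:\protobufferVsJson\src, and my proto files are in D:\protoFiles.

So my compile command is:

protoc --java_out=D:\protobufferVsJson\src D:\protoFiles\teacher.proto D:\protoFiles\student.proto

Note that all the files to be compiled must be listed at the end of the command.

After compiling, you can see the generated files.

I won't paste the generated code here; there is too much of it. Take a look on your own: it is full of Builder classes, so you'll recognize the builder pattern at a glance.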
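To give a feel for the builder pattern without pasting the generated code, here is a much-simplified, hand-written sketch. It is not what protoc emits: the class BuilderSketch is hypothetical, and IllegalStateException is only a stand-in for the error the real generated code reports when a required field is missing. It assumes a Teacher with a required id and an optional name, as in teacher.proto.

```java
public class BuilderSketch {

    // Immutable message object, loosely mirroring the shape of a generated Teacher.
    static final class Teacher {
        private final int id;
        private final String name;

        private Teacher(Builder b) {
            this.id = b.id;
            this.name = b.name;
        }

        int getId() { return id; }
        String getName() { return name; }

        static Builder newBuilder() { return new Builder(); }

        // Mutable builder: collect values first, then validate once in build().
        static final class Builder {
            private Integer id;        // required: null means "never set"
            private String name = "";  // optional: falls back to a default

            Builder setId(int id) { this.id = id; return this; }
            Builder setName(String name) { this.name = name; return this; }

            Teacher build() {
                if (id == null) {
                    // Stand-in for the error raised on a missing required field.
                    throw new IllegalStateException("Missing required field: id");
                }
                return new Teacher(this);
            }
        }
    }

    public static void main(String[] args) {
        Teacher t = Teacher.newBuilder().setId(1).setName("testTea").build();
        System.out.println("Teacher ID:" + t.getId() + ",Name:" + t.getName());
    }
}
```

The point of the pattern is that the message object itself never changes after build(), and the required/optional check happens in exactly one place.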

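You can now copy the code into your project. Of course, it will be full of errors.

Remember the source code we downloaded earlier? Unpack it, don't be shy. Then find src/main/java/ and copy its contents into your project. You could also build it with Ant or Maven, but I'm not familiar with either, so I won't embarrass myself; I'm used to copying the source straight into the project.

More errors in the code. Ha, that's normal. I don't know why Google insists on leaving this trap for us.

Go back to the \java folder under the protobuffer directory. See the readme.txt there? Find this line:

Looking it over, that command seemed a bit odd to me, as if something were off. In any case I didn't run it as written; my command was:

protoc --java_out=<the same output directory as above> <path to the proto file (here, the path to descriptor.proto)>

After running it, the errors in the code are gone.

3. Next, of course, comes testing.

Let's start with a GPB write test: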

package com.shun.test;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.shun.StudentProto.Student;
import com.shun.TeacherProto.Teacher;

public class ProtoWriteTest {

	public static void main(String[] args) throws IOException {
		
		Student.Builder stuBuilder = Student.newBuilder();
		stuBuilder.setAge(25);
		stuBuilder.setId(11);
		stuBuilder.setName("shun");
		
		// Build the list of students
		List<Student> stuBuilderList = new ArrayList<Student>();
		stuBuilderList.add(stuBuilder.build());
		
		Teacher.Builder teaBuilder = Teacher.newBuilder();
		teaBuilder.setId(1);
		teaBuilder.setName("testTea");
		teaBuilder.addAllStudentList(stuBuilderList);
		
		// Write the GPB message to a file; try-with-resources closes the stream even if writing fails
		try (FileOutputStream fos = new FileOutputStream("C:\\Users\\shun\\Desktop\\test\\test.protoout")) {
			teaBuilder.build().writeTo(fos);
		}
	}

}

Check the file; barring surprises, it should have been generated.

Now that it exists, we naturally want to read it back.

package com.shun.test;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import com.shun.StudentProto.Student;
import com.shun.TeacherProto.Teacher;

public class ProtoReadTest {

	public static void main(String[] args) throws FileNotFoundException, IOException {
		
		// Parse the message back from the file written by ProtoWriteTest; the stream is closed automatically
		try (FileInputStream in = new FileInputStream("C:\\Users\\shun\\Desktop\\test\\test.protoout")) {
			Teacher teacher = Teacher.parseFrom(in);
			System.out.println("Teacher ID:" + teacher.getId() + ",Name:" + teacher.getName());
			for (Student stu:teacher.getStudentListList()) {
				System.out.println("Student ID:" + stu.getId() + ",Name:" + stu.getName() + ",Age:" + stu.getAge());
			}
		}
	}

}

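The code is simple because the code that GPB generated does the work for us.

Now that we know the basic usage, let's focus on the difference in file size between GPB and JSON. I won't paste the full JSON code here; I'll post the sample project afterwards, and anyone interested can download it.

Here we use Gson to handle JSON. Below is only the code that converts the objects to JSON and writes the result to a file.

I won't bother with the basic definitions of the two classes Student and Teacher; write them however you like. The code is as follows: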

package com.shun.test;

import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.gson.Gson;
import com.shun.Student;
import com.shun.Teacher;

public class GsonWriteTest {

	public static void main(String[] args) throws IOException {
		Student stu = new Student();
		stu.setAge(25);
		stu.setId(22);
		stu.setName("shun");
		
		List<Student> stuList = new ArrayList<Student>();
		stuList.add(stu);
		
		Teacher teacher = new Teacher();
		teacher.setId(22);
		teacher.setName("shun");
		teacher.setStuList(stuList);
		
		String result = new Gson().toJson(teacher);
		// Write the JSON text to a file; try-with-resources closes the writer even if writing fails
		try (FileWriter fw = new FileWriter("C:\\Users\\shun\\Desktop\\test\\json")) {
			fw.write(result);
		}
	}

}
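The Student and Teacher classes imported by GsonWriteTest are not shown in the original. A minimal sketch might look like the following, assuming int ids and ages, a String name, and a List<Student> field named stuList; these types are inferred only from the setters called above. The wrapper class PojoSketch is hypothetical and exists so the example compiles as a single file; in the project the two classes would be public classes in package com.shun.

```java
import java.util.ArrayList;
import java.util.List;

public class PojoSketch {

    // Plain Java counterpart of the Student message; field types are assumptions.
    static class Student {
        private int id;
        private String name;
        private int age;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Plain Java counterpart of the Teacher message; stuList plays the role of the repeated field.
    static class Teacher {
        private int id;
        private String name;
        private List<Student> stuList = new ArrayList<Student>();

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public List<Student> getStuList() { return stuList; }
        public void setStuList(List<Student> stuList) { this.stuList = stuList; }
    }

    public static void main(String[] args) {
        Student stu = new Student();
        stu.setId(22);
        stu.setName("shun");
        stu.setAge(25);

        List<Student> stuList = new ArrayList<Student>();
        stuList.add(stu);

        Teacher teacher = new Teacher();
        teacher.setId(22);
        teacher.setName("shun");
        teacher.setStuList(stuList);

        System.out.println(teacher.getName() + " has " + teacher.getStuList().size() + " student(s)");
    }
}
```

By default Gson uses the field names as JSON keys, so with this sketch the keys would be id, name, age, and stuList.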

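Now for the real test. So far we have put only one object in the list; next we test the size of the files that GPB and JSON produce for 100, 1000, 10000, 100000, 1000000, and 5000000 objects in turn.

Modify the earlier GPB code so that it builds lists of different sizes and then writes the file: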

package com.shun.test;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.shun.StudentProto.Student;
import com.shun.TeacherProto.Teacher;

public class ProtoWriteTest {

	public static final int SIZE = 100;
	
	public static void main(String[] args) throws IOException {
		
		// Build a list of SIZE students
		List<Student> stuBuilderList = new ArrayList<Student>();
		for (int i = 0; i < SIZE; i ++) {
			Student.Builder stuBuilder = Student.newBuilder();
			stuBuilder.setAge(25);
			stuBuilder.setId(11);
			stuBuilder.setName("shun");
			
			stuBuilderList.add(stuBuilder.build());
		}
		
		Teacher.Builder teaBuilder = Teacher.newBuilder();
		teaBuilder.setId(1);
		teaBuilder.setName("testTea");
		teaBuilder.addAllStudentList(stuBuilderList);
		
		// Write the GPB message to a file named after SIZE; the stream is closed automatically
		try (FileOutputStream fos = new FileOutputStream("C:\\Users\\shun\\Desktop\\test\\proto-" + SIZE)) {
			teaBuilder.build().writeTo(fos);
		}
	}

}

Set SIZE to each of the test counts mentioned above in turn, and you get the following:

Then let's look at the JSON test code:

package com.shun.test;

import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.gson.Gson;
import com.shun.Student;
import com.shun.Teacher;

public class GsonWriteTest {

	public static final int SIZE = 100;
	
	public static void main(String[] args) throws IOException {
		
		List<Student> stuList = new ArrayList<Student>();
		for (int i = 0; i < SIZE; i ++) {
			Student stu = new Student();
			stu.setAge(25);
			stu.setId(22);
			stu.setName("shun");
			
			stuList.add(stu);
		}
		
		
		Teacher teacher = new Teacher();
		teacher.setId(22);
		teacher.setName("shun");
		teacher.setStuList(stuList);
		
		String result = new Gson().toJson(teacher);
		// Write the JSON text to a file named after SIZE; the writer is closed automatically
		try (FileWriter fw = new FileWriter("C:\\Users\\shun\\Desktop\\test\\json" + SIZE)) {
			fw.write(result);
		}
	}

}

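Change SIZE in the same way and run the corresponding tests.

You can clearly see that as the amount of data grows, the JSON files and the GPB files diverge considerably in size; the JSON files are markedly larger.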
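Where the gap comes from can be seen by encoding a single student by hand. The sketch below is not the protobuf library; it is a hypothetical WireSizeSketch class that writes the Protobuffer wire format directly (a one-byte key per field, varint integers, length-prefixed strings) and compares the result with the equivalent JSON text. It assumes student.proto numbers its fields id = 1, name = 2, age = 3, and that the JSON keys appear in that order; neither is confirmed by the original.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class WireSizeSketch {

    // Writes a non-negative int as a varint: 7 bits per byte, high bit set while more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // A field key is (field number << 3) | wire type; 0 = varint, 2 = length-delimited.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, (fieldNumber << 3) | wireType);
    }

    // Hand-encodes one Student. Field numbers 1, 2, 3 are assumptions.
    static byte[] encodeStudent(int id, String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTag(out, 1, 0);
        writeVarint(out, id);
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        writeTag(out, 2, 2);
        writeVarint(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        writeTag(out, 3, 0);
        writeVarint(out, age);
        return out.toByteArray();
    }

    // The same record as compact JSON text; the key order is an assumption.
    static String studentJson(int id, String name, int age) {
        return "{\"id\":" + id + ",\"name\":\"" + name + "\",\"age\":" + age + "}";
    }

    public static void main(String[] args) {
        byte[] gpb = encodeStudent(11, "shun", 25);
        byte[] json = studentJson(11, "shun", 25).getBytes(StandardCharsets.UTF_8);
        System.out.println("GPB bytes:  " + gpb.length);
        System.out.println("JSON bytes: " + json.length);
    }
}
```

Under these assumptions the record takes 10 bytes in the wire format and 32 bytes as JSON. JSON repeats every field name and its punctuation as text in each record, while the wire format spends one byte per field key here, so the difference is paid again for every element of the list.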

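The table above should make this fairly clear: GPB has a big advantage with large data. In general, though, a client and a server don't exchange that much data directly; large transfers mostly happen between servers. If your requirement is to ship several hundred MB of log files to another server every day, GPB may be a great help.

Although this is billed as an in-depth comparison, it mainly compares size. Timing is not very comparable, and the difference isn't large anyway.

This article uses Gson as the parser. Interested readers can choose Jackson, fastjson, or something else; the generated file size will be the same, and only the parsing time will differ.

This godlike iteye blog editor leaves me speechless: it adds stray tags after you insert code, so please bear with it. The code is packaged below.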
