Reprint: FastJSON implementation details

http://www.csdn.net/article/2014-09-25/2821866

 

==================

[Question] Jingxing: FastJSON Implementation Details

Published on 2014-09-26 11:38 | source: CSDN

Abstract: "Fast" is one of the ultimate goals programmers pursue, and FastJSON is good proof of it. In this issue of "Question", Jingxing walks through how its serialization and deserialization are implemented, so that readers can experience its speed for themselves.

I still remember what the Huoyun Evil God says in the movie "Kung Fu": of all the martial arts in the world, any strength can be broken; only speed cannot. In the programmer's world, "fast" has likewise always been one of the ultimate goals that everyone works hard for and chases, even to the point of being "unscrupulous" and "haggling over every penny".

I have long used JSON to move freely between programming languages and systems. I came across FastJSON by chance and was drawn to its independence, ease of use, and wide adoption; at the same time I was struck by how surprisingly fast it is. A natural question followed: why is FastJSON so fast? So I decided to dig into it: first, to admire a master's work; second, to pick up some skills with an open mind; and third, to make learning easier for those who come after.

The rest of this article is based on FastJSON 1.1.40. It focuses on the serialization and deserialization implementations and closes with an analysis of why FastJSON is so "fast".

1. Serialization

Serialization means converting Java objects into JSON strings. Without further ado, a diagram first.

 

 

Serialization entry

We usually serialize with the static method JSON.toJSONString(). JSON is in fact an abstract class. It implements the JSONAware interface (convert to a JSON string) and the JSONStreamAware interface (write a JSON string into an Appendable), and it is the parent class of JSONArray (internally a List) and JSONObject (internally a Map). The overloads of JSON.toJSONString() share essentially the same internal implementation; they differ only in the configuration options they expose to callers. The actual work is delegated to the JSONSerializer class.

Serializer

The JSONSerializer class acts as a serialization combiner: it brings together upper-layer invocation, serialization configuration, type-specific serialization, and string splicing behind a single entry point for callers. Its important members include SerializeConfig, SerializeWriter, the various Filter lists, DateFormat, and SerialContext. In addition there is ObjectSerializer, which is not a member variable of JSONSerializer but performs the actual serialization of each specific object. Their roles are described one by one below.

1. SerializeConfig

SerializeConfig is globally unique. It extends IdentityHashMap, which is an array of hash buckets with a default length of 1024. Each bucket holds a singly linked list of the Entry objects that share the same hash (an Entry can be seen as a linked-list node containing key, value, next pointer, and hash value). IdentityHashMap provides the functionality of HashMap but avoids the infinite loop that HashMap can fall into under concurrent access.

The main job of SerializeConfig is to configure and record the serialization class (an implementation of the ObjectSerializer interface) for each Java type. For example, Boolean.class has its own serialization implementation class, and float[].class uses FloatArraySerializer. Some of these implementation classes ship with FastJSON by default (for Java's basic classes), some are generated with the ASM framework (for user-defined classes), and some are serialization classes written by the user (for example, the framework converts Date to milliseconds by default, while an application may need seconds). This raises the question of whether to serialize with an ASM-generated class or with the JavaBean serialization class. The decision here depends on whether the code runs on Android (the system property "java.vm.name" being "dalvik" or "lemur" indicates Android), but this is not the only check; more specific checks follow later.
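The type-to-serializer registry described above can be sketched as follows. This is a minimal illustration, not FastJSON's code: `SketchSerializer` and `SketchSerializeConfig` are hypothetical names, `java.util.IdentityHashMap` stands in for the IdentityHashMap the article describes, and string escaping is left out.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical interface; stands in for the role ObjectSerializer plays in the article.
interface SketchSerializer {
    void write(StringBuilder out, Object value);
}

// Hypothetical registry: one serializer per Java type, looked up by Class identity.
final class SketchSerializeConfig {
    // A Class object is unique per class loader, so identity (==) comparison is sufficient.
    private final Map<Class<?>, SketchSerializer> serializers = new IdentityHashMap<>();

    SketchSerializeConfig() {
        // Serializers for basic types are registered up front.
        put(Boolean.class, (out, v) -> out.append(v));
        put(Integer.class, (out, v) -> out.append(v));
        put(String.class, (out, v) -> out.append('"').append(v).append('"')); // no escaping here
    }

    // Users can register or override a serializer for any type.
    synchronized void put(Class<?> type, SketchSerializer serializer) {
        serializers.put(type, serializer);
    }

    synchronized SketchSerializer get(Class<?> type) {
        return serializers.get(type);
    }

    static String toJson(SketchSerializeConfig config, Object value) {
        SketchSerializer serializer = config.get(value.getClass());
        if (serializer == null) {
            throw new IllegalArgumentException("no serializer for " + value.getClass());
        }
        StringBuilder out = new StringBuilder();
        serializer.write(out, value);
        return out.toString();
    }
}
```

A user-defined entry, such as writing a Date in seconds rather than milliseconds, would be registered with `config.put(java.util.Date.class, (out, v) -> out.append(((java.util.Date) v).getTime() / 1000))`.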

2. SerializeWriter

SerializeWriter extends Java's Writer. It is essentially a StringBuilder tailored for FastJSON that performs high-performance string splicing. Its members include:

 

  • char buf[]
This can be understood as the in-memory buffer that holds the string produced by each serialization.

 

 

  • static ThreadLocal<SoftReference<char[]>> bufLocal
Without caching, every serialization would have to allocate the buf[] memory again. bufLocal keeps the buf[] memory in a ThreadLocal after each serialization, with its contents cleared, to avoid frequent memory allocation and GC.

 

 

  • int features
The feature configuration used when generating a JSON string. The default configuration is:
QuoteFieldNames | SkipTransientField | WriteEnumUsingToString | SortField

That is: wrap field names in double quotes, skip transient fields, write enum values using toString(), and sort the output fields. All supported features are defined in the SerializerFeature class. Users can set them explicitly at the call site, or inject them through the JSONField or JSONType annotations.

 

 

  • Writer writer
The user sets this to have the generated JSON string written directly into a Writer, as the JSONWriter class does.
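The `int features` member packs on/off switches into the bits of a single int. A minimal sketch of that idea follows; the enum `SketchFeature`, its constants, and the bit order are assumptions made for illustration and do not reproduce SerializerFeature.

```java
// Hypothetical feature enum: each constant owns one bit of an int.
enum SketchFeature {
    QUOTE_FIELD_NAMES, SKIP_TRANSIENT_FIELD, WRITE_ENUM_USING_TO_STRING, SORT_FIELD, PRETTY_FORMAT;

    // The bit for this constant, derived from its position in the enum.
    final int mask = 1 << ordinal();

    // Combines several features into one int with bitwise OR.
    static int of(SketchFeature... features) {
        int value = 0;
        for (SketchFeature feature : features) {
            value |= feature.mask;
        }
        return value;
    }

    // Tests a single bit.
    static boolean isEnabled(int features, SketchFeature feature) {
        return (features & feature.mask) != 0;
    }

    // Returns a copy of the configuration with one feature switched on or off.
    static int config(int features, SketchFeature feature, boolean on) {
        return on ? (features | feature.mask) : (features & ~feature.mask);
    }
}
```

Checking a feature then costs one AND and one comparison, which matters when the check sits on the hot path of every field write.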

 

For example, writeStringWithDoubleQuote() writes a string wrapped in double quotes, and it is a good place to see how the class splices strings.
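A rough sketch of how a char[] buffer kept in a ThreadLocal can be reused across serializations, together with a simplified double-quoted write. `SketchWriter` is a hypothetical class; the initial size, the growth factor, and the lack of escaping are assumptions of this sketch rather than a description of SerializeWriter.

```java
import java.lang.ref.SoftReference;
import java.util.Arrays;

// Hypothetical writer that recycles its char[] through a ThreadLocal.
final class SketchWriter implements AutoCloseable {
    private static final ThreadLocal<SoftReference<char[]>> BUF_LOCAL = new ThreadLocal<>();

    private char[] buf;
    private int count;

    SketchWriter() {
        SoftReference<char[]> ref = BUF_LOCAL.get();
        char[] cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            buf = cached;          // reuse the array left behind by the previous writer on this thread
            BUF_LOCAL.set(null);   // take ownership so that a nested writer cannot share it
        } else {
            buf = new char[1024];
        }
    }

    // Writes "text" surrounded by double quotes (escaping is omitted in this sketch).
    void writeStringWithDoubleQuote(String text) {
        int len = text.length();
        int newCount = count + len + 2;
        if (newCount > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(newCount, buf.length * 3 / 2 + 1));
        }
        buf[count] = '"';
        text.getChars(0, len, buf, count + 1);   // one bulk copy instead of per-character appends
        buf[newCount - 1] = '"';
        count = newCount;
    }

    @Override
    public String toString() {
        return new String(buf, 0, count);
    }

    // Hands the array back so the next writer on this thread can reuse it.
    @Override
    public void close() {
        BUF_LOCAL.set(new SoftReference<>(buf));
    }

    // Exposed only so the reuse can be observed in a test.
    char[] bufferForTest() {
        return buf;
    }
}
```

The SoftReference lets the garbage collector reclaim an idle buffer under memory pressure, so the cache does not pin memory forever.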

3. Filter list

JSONSerializer holds several Filter lists, which can be seen as hooks for customizing serialization at different stages and places of JSON string generation, roughly as follows:

 

  • BeforeFilter: adds content at the front during serialization
  • AfterFilter: adds content at the end during serialization
  • PropertyFilter: decides whether to serialize a property based on its name and value
  • ValueFilter: modifies the value
  • NameFilter: modifies the key
  • PropertyPreFilter: decides whether to serialize a property based on its name
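To make the roles of three of these filters concrete, here is a minimal sketch in which a property filter, a name filter, and a value filter are applied while a flat map is written out. The interfaces and `SketchFilterChain` are hypothetical, and the order in which the filters run is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical filter interfaces modeled on the roles listed above.
interface SketchPropertyFilter { boolean apply(Object owner, String name, Object value); }
interface SketchNameFilter { String process(Object owner, String name, Object value); }
interface SketchValueFilter { Object process(Object owner, String name, Object value); }

final class SketchFilterChain {
    final List<SketchPropertyFilter> propertyFilters = new ArrayList<>();
    final List<SketchNameFilter> nameFilters = new ArrayList<>();
    final List<SketchValueFilter> valueFilters = new ArrayList<>();

    // Writes a flat map as JSON, consulting the filters for every entry.
    String write(Map<String, Object> bean) {
        StringBuilder out = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> entry : bean.entrySet()) {
            String name = entry.getKey();
            Object value = entry.getValue();
            if (!accept(bean, name, value)) {
                continue;                                   // a property filter vetoed this entry
            }
            for (SketchNameFilter f : nameFilters) name = f.process(bean, name, value);
            for (SketchValueFilter f : valueFilters) value = f.process(bean, name, value);
            if (!first) out.append(',');
            first = false;
            out.append('"').append(name).append("\":");
            if (value instanceof String) out.append('"').append(value).append('"');
            else out.append(value);
        }
        return out.append('}').toString();
    }

    private boolean accept(Object owner, String name, Object value) {
        for (SketchPropertyFilter f : propertyFilters) {
            if (!f.apply(owner, name, value)) return false;
        }
        return true;
    }
}
```

A typical use would be hiding a sensitive property with a property filter while renaming or rewriting the remaining ones.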

 

4.  DateFormat

Specifies the date format. If not specified, FastJSON will automatically recognize the following date formats:

 

  • ISO-8601 date format
  • yyyy-MM-dd
  • yyyy-MM-dd HH:mm:ss
  • yyyy-MM-dd HH:mm:ss.SSS
  • millisecond value
  • millisecond string
  • .Net Json date format
  • new Date()

 

5. SerialContext

The serialization context, used for references and circular references. The values are cached in the references hash buckets (an IdentityHashMap).
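The idea of tracking already-visited objects by identity can be sketched like this. `SketchRefWriter` is hypothetical and only handles nested maps; the `$ref` placeholder and its path format are a simplification chosen for this sketch, not a statement about FastJSON's exact output.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical writer that replaces an already-seen map with a reference to its path.
final class SketchRefWriter {
    // Identity-based, so lookups never call hashCode() or equals() on the (possibly cyclic) value.
    private final Map<Object, String> references = new IdentityHashMap<>();

    String write(Object value) {
        StringBuilder out = new StringBuilder();
        write(out, value, "$");
        return out.toString();
    }

    private void write(StringBuilder out, Object value, String path) {
        if (!(value instanceof Map)) {
            if (value instanceof String) out.append('"').append(value).append('"');
            else out.append(value);
            return;
        }
        String seenAt = references.get(value);
        if (seenAt != null) {
            // Already written (or being written): emit a reference instead of recursing.
            out.append("{\"$ref\":\"").append(seenAt).append("\"}");
            return;
        }
        references.put(value, path);
        out.append('{');
        boolean first = true;
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
            if (!first) out.append(',');
            first = false;
            out.append('"').append(entry.getKey()).append("\":");
            write(out, entry.getValue(), path + "." + entry.getKey());
        }
        out.append('}');
    }
}
```

An identity map is the natural fit here: a map that contains itself cannot safely be used as a key in an ordinary HashMap, because computing its hash code would recurse without end.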

6. ObjectSerializer  

ObjectSerializer has only one interface method, as follows:

void write(JSONSerializer serializer, Object object, Object fieldName, Type fieldType);

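As shown, the JSONSerializer is passed into the ObjectSerializer. Since JSONSerializer has a SerializeWriter member, each concrete ObjectSerializer implementation can splice strings directly with SerializeWriter. object is the object to be serialized; fieldName is mainly used to set the serialization context when a composed class is referenced; and fieldType is mainly there for handling generics.

JSONSerializer obtains the serialization class for a given class (that is, a class implementing the ObjectSerializer interface) through public ObjectSerializer getObjectWriter(Class clazz). The rough logic is as follows: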

 

 

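The whole process first looks up the serialization class for the built-in basic types, then loads custom AutowiredObjectSerializer serialization classes through the class loader, and finally falls back to a serialization class created by createJavaBeanSerializer(). That method can yield two kinds of serialization class: a plain JavaBeanSerializer (which serializes according to JavaBean traits such as the class's getter methods and public fields), or one from createASMSerializer (serialization bytecode generated with the ASM framework). The second is preferred. JavaBeanSerializer is chosen when any of the following holds:

  • clazz is a non-public class
  • clazz's class loader is outside ASMClassLoader, or clazz is Serializable.class, or clazz is Object.class
  • the JSONType annotation specifies that ASM should not be used
  • createASMSerializer fails to load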

 

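Combining this with the earlier discussion, the conditions for using ASM are: not an Android system, not a basic class, no custom AutowiredObjectSerializer, and none of the JavaBeanSerializer conditions listed above.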
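The decision between the two serializer kinds can be summarized as a small predicate. The sketch below is hypothetical: `SketchSerializerChooser` and its boolean parameters are stand-ins for checks that FastJSON performs internally, and the fallback for a failed createASMSerializer (which would be a try/catch around generation) is only noted in a comment.

```java
import java.io.Serializable;
import java.lang.reflect.Modifier;

// Hypothetical decision helper mirroring the conditions listed above.
final class SketchSerializerChooser {
    enum Strategy { REFLECTION_BEAN, GENERATED_BYTECODE }

    // asmEnabled: false on Android-like VMs.
    // asmDisabledByAnnotation: the type opted out of bytecode generation.
    // loadedOutsideAsmLoader: the class is not reachable from the generating class loader.
    static Strategy choose(Class<?> clazz, boolean asmEnabled,
                           boolean asmDisabledByAnnotation, boolean loadedOutsideAsmLoader) {
        if (!asmEnabled) return Strategy.REFLECTION_BEAN;
        if (!Modifier.isPublic(clazz.getModifiers())) return Strategy.REFLECTION_BEAN;
        if (loadedOutsideAsmLoader) return Strategy.REFLECTION_BEAN;
        if (clazz == Serializable.class || clazz == Object.class) return Strategy.REFLECTION_BEAN;
        if (asmDisabledByAnnotation) return Strategy.REFLECTION_BEAN;
        // A real implementation would also fall back to reflection if generation throws.
        return Strategy.GENERATED_BYTECODE;
    }

    // The Android check described earlier: inspect the "java.vm.name" system property.
    static boolean looksLikeAndroid(String vmName) {
        if (vmName == null) return false;
        String lower = vmName.toLowerCase();
        return lower.contains("dalvik") || lower.contains("lemur");
    }
}
```

In use, `asmEnabled` would be computed once as `!looksLikeAndroid(System.getProperty("java.vm.name"))`.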

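For the serialization methods of the individual basic classes, of JavaBeanSerializer, and of the ASM-generated classes, see the code; they are not covered one by one here.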

2. Deserialization

Deserialization means converting a JSON string into the corresponding Java object. Again, a diagram first.

 

 

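The JSON class is again the entry point for deserialization. It implements static methods such as parse(), parseObject(), and parseArray() that convert a JSON string into Java objects. The implementation of these methods is actually delegated to the DefaultJSONParser class.

DefaultJSONParser is the counterpart of JSONSerializer on the serialization side. It is a functional combiner that brings together upper-layer invocation, deserialization configuration, deserialization implementations, and lexical analysis, much like the Facade design pattern, for unified external use. As before, let us look at several important members of this class to see how it implements its many deserialization features.

1.  ParserConfig

Like SerializeConfig, this class is a globally unique parsing configuration, and its boolean asmEnable likewise depends on whether the environment is Android. Unlike SerializeConfig, the IdentityHashMap that maps classes to their deserialization classes is a private member of this class, and the basic deserialization classes are loaded into it in the constructor.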

2.  JSONLexer 

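JSONLexer is an interface that defines the various current-state accessors and operations. JSONLexerBase is an abstract class implementing JSONLexer; like SerializeWriter on the serialization side, it is dedicated to parsing JSON strings and contains many optimizations. In practice, two subclasses of JSONLexerBase are used, JSONScanner and JSONReaderScanner: the former deserializes a whole string, while the latter deserializes directly from a Reader. A brief look at some members of JSONLexerBase: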

 

  • int token

 

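Because a JSON string follows a fixed format, certain specific characters determine the meaning of what is being read. In FastJSON these specific characters, or the characters found at particular positions, are called tokens, for example "(", "{", "[", ",", ":", key, and value. They are all defined in the JSONToken class.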

 

  • char[] sbuf

 

As the parser scans the input string, the finest-grained keys and values it matches are placed in sbuf.

 

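  • static ThreadLocal<SoftReference<char[]>> SBUF_REF_LOCAL

The space held by sbuf above is not released; it is taken out and reused the next time it is needed, which avoids frequent memory allocation and GC.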

 

  • features

 

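The configuration of deserialization features. As with the serialization features, each feature is switched on or off through a bitwise OR on an int. The default configuration is: AutoCloseSource | UseBigDecimal | AllowUnQuotedFieldNames | AllowSingleQuotes | AllowArbitraryCommas | SortFeidFastMatch | IgnoreNotMatch. This means: check the completeness of the JSON string, use BigDecimal when converting numeric values, accept field names without quotes, accept keys and values in single quotes, accept JSON strings with several consecutive ",", use sorted fields for fast matching, and ignore key/value pairs that do not match. These parameters can of course also be configured by other means.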

 

  • hasSpecial

 

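Handling of escape characters, such as '\0' and '\\'.

The lexer is based on a predictive algorithm and makes a single left-to-right pass. JSON strings have their own regularities: for example, the token after a key is most likely ":", and after ":" comes a value token, a "{" token, or a "[" token, and so on. The next token can therefore be predicted from the previous one, which tells the lexer what each token means. Once the tokens are identified, the concrete values can be read: scanString obtains a key, scanFieldString obtains a field value by field name, scanTrue obtains Java's true, and so on. Keys are generally cached in a SymbolTable (similar to IdentityHashMap). My guess at the purpose: the JSON strings an application parses usually contain only a limited set of keys, and creating them anew each time costs too much, so they are simply cached and fetched when needed. It is again the technique of trading space for time.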
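The predictive, single-pass scanning described above can be illustrated with a toy lexer. `SketchLexer` is hypothetical and deliberately tiny: it assumes flat objects with string keys and string or integer values, and it does not handle escapes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mini-lexer: at every step it knows which tokens may legally come next.
final class SketchLexer {
    private final String text;
    private int pos;

    SketchLexer(String text) { this.text = text; }

    Map<String, Object> parseObject() {
        Map<String, Object> result = new LinkedHashMap<>();
        expect('{');
        if (peek() == '}') { pos++; return result; }
        while (true) {
            String key = scanString();      // after '{' or ',' only a key can follow
            expect(':');                    // after a key only ':' can follow
            Object value;
            if (peek() == '"') value = scanString();   // the first char decides the value type
            else value = scanInt();
            result.put(key, value);
            char c = next();                // after a value only ',' or '}' can follow
            if (c == '}') return result;
            if (c != ',') throw error("',' or '}'");
        }
    }

    private char peek() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= text.length()) throw error("more input");
        return text.charAt(pos);
    }

    private char next() { char c = peek(); pos++; return c; }

    private void expect(char wanted) {
        if (next() != wanted) throw error("'" + wanted + "'");
    }

    private String scanString() {
        expect('"');
        int end = text.indexOf('"', pos);   // escapes are not handled in this sketch
        if (end < 0) throw error("closing quote");
        String s = text.substring(pos, end);
        pos = end + 1;
        return s;
    }

    private int scanInt() {
        int start = pos;
        if (pos < text.length() && text.charAt(pos) == '-') pos++;
        int digits = pos;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
        if (pos == digits) throw error("a digit");
        return Integer.parseInt(text.substring(start, pos));
    }

    private IllegalStateException error(String expected) {
        return new IllegalStateException("expected " + expected + " near position " + pos);
    }
}
```

Because each step checks for the expected token directly, there is no generic "read a token, then switch on its kind" loop on the common path.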

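3.  List< ExtraTypeProvider > and List< ExtraProcessor >

These are extension points left for handling other types and other custom processing; users can implement the corresponding interfaces themselves.

4.  DateFormat

Same as the DateFormat used in serialization; nothing more to add.

5.  ParseContext and List< ResolveTask >

ParseContext corresponds to SerialContext on the serialization side; it prepares for references and even circular references.

List< ResolveTask > is, naturally, the list that handles the records of such multi-level, or even multiple, references.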

6.  SymbolTable

The key cache mentioned above.
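A key cache of this kind can be sketched as a small table that hands back one shared String per distinct key. `SketchSymbolTable` is hypothetical; the hash function and the replace-on-collision policy are simplifications chosen for the sketch.

```java
// Hypothetical key cache: returns a shared String for a key found in a char buffer.
final class SketchSymbolTable {
    private final String[] symbols;
    private final int indexMask;

    SketchSymbolTable(int powerOfTwoSize) {
        if (Integer.bitCount(powerOfTwoSize) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        symbols = new String[powerOfTwoSize];
        indexMask = powerOfTwoSize - 1;
    }

    // Looks up the key stored in buffer[offset, offset + len) without allocating on a hit.
    String addSymbol(char[] buffer, int offset, int len) {
        int hash = 0;
        for (int i = 0; i < len; i++) {
            hash = 31 * hash + buffer[offset + i];
        }
        int bucket = hash & indexMask;
        String cached = symbols[bucket];
        if (cached != null && matches(cached, buffer, offset, len)) {
            return cached;                       // hit: reuse the existing String
        }
        String created = new String(buffer, offset, len);
        symbols[bucket] = created;               // miss or collision: the newer key takes the slot
        return created;
    }

    private static boolean matches(String s, char[] buffer, int offset, int len) {
        if (s.length() != len) return false;
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) != buffer[offset + i]) return false;
        }
        return true;
    }
}
```

When the same key appears in thousands of parsed objects, only the first occurrence allocates a String.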

7.  ObjectDeserializer

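This is similar to ObjectSerializer. The parser is looked up in this order: first the cached parser for fieldType; if there is none, the cached parser for fieldClass; otherwise the parser specified by the JSONType annotation; otherwise a search among the AutowiredObjectDeserializer implementations loaded by the current thread's class loader; otherwise a check for a few common generic types (such as Collection and Map); and finally createJavaBeanDeserializer creates the corresponding parser. Here again there are two variants: JavaBeanDeserializer and asmFactory.createJavaBeanDeserializer. The conditions for using ASM are as follows:

  • not an Android system
  • the class and all of its superclasses other than Object are public
  • the generic type parameters are non-empty
  • the class is not loaded by a class loader outside the asmFactory loader
  • not an interface
  • the class has no more than 200 setter methods
  • the class has a default constructor
  • the class has no field that has only a getter
  • the class has no non-public field
  • the class has no non-static member class
  • the class itself is not a non-static member class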

 

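Deserializers generated with ASM offer higher deserialization performance. For example, a JSON string with sorted fields can be matched and parsed in order, which reduces the number of tokens read. The requirements listed above are quite strict, however. In summary, FastJSON deserialization likewise supports basic deserializers, the JavaBeanDeserializer, and ASM-built deserializers, which are not covered one by one here.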
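Matching sorted fields in order can be illustrated as follows. The sketch is hypothetical: `SketchFastMatch` and its `Point` class are invented for the example, the input is assumed to be compact (no whitespace), and a mismatch simply throws where a real parser would fall back to a generic path.

```java
// Hypothetical parser for one known class whose fields arrive in sorted order.
final class SketchFastMatch {
    static final class Point { int x; int y; }

    private final String text;
    private int pos;

    private SketchFastMatch(String text) { this.text = text; }

    static Point parse(String json) {
        SketchFastMatch p = new SketchFastMatch(json);
        Point point = new Point();
        p.expect('{');
        point.x = p.scanFieldInt("\"x\":");   // the first field must be x
        p.expect(',');
        point.y = p.scanFieldInt("\"y\":");   // the second field must be y
        p.expect('}');
        return point;
    }

    // Compares the expected "name": prefix in place, then reads the integer that follows.
    private int scanFieldInt(String prefix) {
        if (!text.startsWith(prefix, pos)) {
            throw new IllegalStateException("field order mismatch at position " + pos);
        }
        pos += prefix.length();
        int start = pos;
        if (pos < text.length() && text.charAt(pos) == '-') pos++;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
        return Integer.parseInt(text.substring(start, pos));
    }

    private void expect(char wanted) {
        if (pos >= text.length() || text.charAt(pos) != wanted) {
            throw new IllegalStateException("expected '" + wanted + "' at position " + pos);
        }
        pos++;
    }
}
```

No key String is built and no field lookup by name happens; the key, the colon, and the position in the field list are all verified by one prefix comparison.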

3. Why So Fast

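FastJSON really is fast, and I learned a great deal from reading it. Here is my personal summary of why it is fast (not necessarily complete):

1.  Specialized work done with a specialist's care

For both serialization and deserialization, FastJSON has a dedicated method for each type. Working on just that one type naturally makes performance optimization more targeted. It writes its own JSON-specific SerializeWriter and JSONLexer, and it even trims the ASM framework down to only the parts it needs. One cannot help admiring the effort.

2.  Caching everywhere

Trading space for time is an idea that never fails programmers, and the author applies it down to the smallest details: the serializers and deserializers for each class are all stored for easy retrieval; parsed keys are stored to avoid repeated memory allocation; and so on.

3.  Tirelessly repeated code

I do not know whether the author did this on purpose, but a lot of similar code appears in the program, such as special-character handling and the handling of the same token in different functions. This goes against a programmer's desire for tidiness, but the compiled code likes it, since it quietly removes many function calls.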

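4.  Taking the unusual path

For JavaBeans, serialization and deserialization can be implemented with reflection (FastJSON has such an implementation), but by default it uses bytecode generated with the ASM framework. For performance, no means is left untried.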
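The difference between the two approaches can be seen in a small comparison. Both methods below are hypothetical illustrations: `viaReflection` discovers getters at run time, while `viaDirectCalls` is hand-written to resemble the straight-line code that a generated serializer for one known class amounts to. No bytecode is actually generated here.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical comparison of reflective access and direct getter calls.
final class SketchBeanAccess {
    public static final class User {
        private final int id;
        private final String name;
        public User(int id, String name) { this.id = id; this.name = name; }
        public int getId() { return id; }
        public String getName() { return name; }
    }

    // Reflection path: works for any bean, but pays for Method.invoke and boxing per property.
    static String viaReflection(Object bean) {
        Map<String, Object> props = new TreeMap<>();   // sorted by property name
        try {
            for (Method m : bean.getClass().getMethods()) {
                String n = m.getName();
                if (n.length() > 3 && n.startsWith("get") && !n.equals("getClass")
                        && m.getParameterCount() == 0) {
                    props.put(Character.toLowerCase(n.charAt(3)) + n.substring(4), m.invoke(bean));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        StringBuilder out = new StringBuilder("{");
        for (Map.Entry<String, Object> e : props.entrySet()) {
            if (out.length() > 1) out.append(',');
            out.append('"').append(e.getKey()).append("\":");
            appendValue(out, e.getValue());
        }
        return out.append('}').toString();
    }

    // Direct path: fixed field order and plain method calls for one known class.
    static String viaDirectCalls(User u) {
        StringBuilder out = new StringBuilder("{\"id\":").append(u.getId()).append(",\"name\":");
        appendValue(out, u.getName());
        return out.append('}').toString();
    }

    private static void appendValue(StringBuilder out, Object v) {
        if (v instanceof String) out.append('"').append(v).append('"');
        else out.append(v);
    }
}
```

Both methods produce the same JSON; the direct version simply skips method discovery, reflective invocation, and boxing of the int property.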

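5.  A small change makes a big difference

Sorting is only a tiny change to the output and does not affect how the JSON is used at all, yet the author exploits it for fast matching during parsing, so keys do not have to be picked out one by one.

6.  Finding performance in regularity

As mentioned above, FastJSON reads tokens predictively. The author seizes on the regularity of JSON strings: predicting the next token and handling it is faster than blindly fetching a token and then branching on what it turned out to be.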

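Closing remarks

I do not like anticlimactic endings. But having written this far, apart from admitting that there are parts of the FastJSON code I still have not understood, or have understood imperfectly, I dare not say much more.

About the author: Senior engineer on the CDO data development platform at Alibaba Group. Joined Alibaba in 2010 and has long worked on real-time and offline data synchronization between storage systems (mysql, oracle, odps, hadoop, sqlserver, rds, drds, hbase, oceanbase, db2, ots, tair, etc.), building Alibaba's data synchronization channels on and off the cloud.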

 
