TCP sticky packets / packet splitting: problem description

Suppose the client sends two data packets, D1 and D2, to the server. Because the number of bytes the server reads at a time is not fixed, there are four possible cases:

1: the server reads two independent packets, D1 and D2, in two reads; no sticking or splitting occurs.

2: the server receives both packets in a single read, with D1 and D2 stuck together. This is called a TCP sticky packet.

3: the server reads the two packets in two reads. The first read returns all of D1 plus part of D2, and the second read returns the rest of D2. This is called TCP packet splitting.

4: the server reads the two packets in two reads. The first read returns part of D1, and the second read returns the rest of D1 plus all of D2.

If the server's TCP receive window is very small and packets D1 and D2 are relatively large, a fifth case is likely: the server needs many reads to receive D1 and D2 completely, and splitting occurs several times along the way.
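
The cases above can be reproduced without a network. The following is a minimal sketch in plain JDK Java that treats an in-memory byte array as the stream; the class name StreamBoundaryDemo, the method readInChunks, and the sample payloads are assumptions made for illustration only. It shows that the read size, not the sender's write calls, decides where the chunks are cut.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Simulates how a byte stream loses message boundaries (all names are illustrative). */
class StreamBoundaryDemo {

    /** Reads the whole stream in chunks of at most readSize bytes and returns each chunk as text. */
    static List<String> readInChunks(byte[] wire, int readSize) {
        if (readSize <= 0) {
            throw new IllegalArgumentException("readSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        ByteArrayInputStream in = new ByteArrayInputStream(wire);
        byte[] buf = new byte[readSize];
        int n;
        // Each read returns whatever is available, up to readSize bytes.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            chunks.add(new String(buf, 0, n, StandardCharsets.US_ASCII));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // D1 = "HELLO" and D2 = "WORLD!" are written back to back;
        // the stream itself stores no boundary between them.
        byte[] wire = "HELLOWORLD!".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readInChunks(wire, 64)); // case 2: one read returns D1 and D2 together
        System.out.println(readInChunks(wire, 8));  // case 3: all of D1 plus part of D2, then the rest
        System.out.println(readInChunks(wire, 3));  // fifth case: many reads, several splits
    }
}
```

A real socket behaves less predictably than this simulation, since the kernel and the network also influence how many bytes each read returns.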

Causes

1: the application writes more bytes in a single write than the socket send buffer can hold.

2: the data exceeds the MSS, so TCP splits it into segments.

3: the payload of the Ethernet frame exceeds the MTU, so IP fragmentation occurs.

Solution strategies

Because the underlying TCP layer cannot understand the upper-layer business data, the lower layer cannot guarantee that packets will not be split and recombined.

The problem can only be solved in the design of the upper-layer application protocol. Mainstream protocols in the industry use the following approaches:

1: fixed message length, for example every message is exactly 200 bytes; shorter messages are padded with spaces;

2: append a carriage return and line feed to the end of each message as a delimiter, as the FTP protocol does;

3: divide the message into a header and a body, where the header contains a field giving the total length of the message (or the length of the body).

A typical design uses the first field of the header, a 32-bit int, for the message length;

4: a more complex application-layer protocol.
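
Strategy 3 can be sketched in plain JDK Java. This is a minimal illustration under stated assumptions, not a library implementation: the class LengthFieldFramer, its methods encode and feed, the 4-byte big-endian length prefix that counts only the body, and the 1 MiB limit are all choices made for this example. The decoder keeps leftover bytes between reads, so it copes with both stuck and split packets.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch of length-prefixed framing: [4-byte big-endian body length][body]. Names are illustrative. */
class LengthFieldFramer {
    private static final int MAX_BODY_LENGTH = 1 << 20; // assumed safety limit (1 MiB)

    // Bytes received but not yet decoded; always left in read mode.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    /** Encodes one message body as a frame. */
    static byte[] encode(byte[] body) {
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }

    /** Feeds one arbitrary chunk from the stream and returns every complete body now available. */
    List<byte[]> feed(byte[] chunk) {
        // Append the new chunk to whatever was left over from earlier reads.
        ByteBuffer merged = ByteBuffer.allocate(pending.remaining() + chunk.length);
        merged.put(pending).put(chunk);
        merged.flip();

        List<byte[]> bodies = new ArrayList<>();
        while (merged.remaining() >= 4) {
            merged.mark();
            int length = merged.getInt();
            if (length < 0 || length > MAX_BODY_LENGTH) {
                throw new IllegalStateException("invalid frame length: " + length);
            }
            if (merged.remaining() < length) {
                merged.reset(); // body incomplete: rewind to the length field and wait for more bytes
                break;
            }
            byte[] body = new byte[length];
            merged.get(body);
            bodies.add(body);
        }
        pending = merged; // the unread remainder is carried over to the next call
        return bodies;
    }

    public static void main(String[] args) {
        byte[] d1 = encode("D1".getBytes(StandardCharsets.UTF_8));
        byte[] d2 = encode("D2-longer".getBytes(StandardCharsets.UTF_8));
        byte[] wire = ByteBuffer.allocate(d1.length + d2.length).put(d1).put(d2).array();

        LengthFieldFramer framer = new LengthFieldFramer();
        // One read delivers both frames stuck together; the framer separates them again.
        for (byte[] body : framer.feed(wire)) {
            System.out.println(new String(body, StandardCharsets.UTF_8));
        }
    }
}
```

The upper bound on the length field is a design choice: without it, a corrupt or hostile length value could make the receiver wait for, or allocate, an arbitrarily large body.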

How LineBasedFrameDecoder and StringDecoder work

LineBasedFrameDecoder works by iterating over the readable bytes in the ByteBuf and checking for \n or \r\n.

If it finds one, that position is the end of the line, and the bytes from the reader index up to that position form one line. It is a decoder that uses the line break as the end marker. It supports decoding with or without the terminator included, and it supports configuring a maximum length for a single line. If no line break has been found after reading the maximum length, it throws an exception and discards the bytes read so far.

StringDecoder is very simple: it converts the received object into a string and then continues to invoke the subsequent handlers.

The combination of LineBasedFrameDecoder and StringDecoder is a line-based text decoder, designed to handle TCP sticky packets and packet splitting.
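
The line-splitting idea can be illustrated in plain JDK Java. The sketch below is a simplified stand-in and not Netty's implementation: the class SimpleLineDecoder, its feed method, and the droppedLines counter are assumptions made for this example. Unlike the decoder described above, it counts over-long lines in a counter instead of throwing, and its limit applies to the bytes buffered before the line feed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Simplified line-based frame splitting plus string conversion (illustrative, not Netty's code). */
class SimpleLineDecoder {
    private final int maxLength;
    private final ByteArrayOutputStream line = new ByteArrayOutputStream();
    private boolean discarding; // true while skipping the rest of an over-long line
    private int droppedLines;

    SimpleLineDecoder(int maxLength) {
        this.maxLength = maxLength;
    }

    /** Number of lines dropped because they exceeded maxLength. */
    int droppedLines() {
        return droppedLines;
    }

    /** Feeds one chunk and returns the complete lines found, without their terminators. */
    List<String> feed(byte[] chunk) {
        List<String> lines = new ArrayList<>();
        for (byte b : chunk) {
            if (b == '\n') {
                if (!discarding) {
                    byte[] raw = line.toByteArray();
                    int len = raw.length;
                    if (len > 0 && raw[len - 1] == '\r') {
                        len--; // treat \r\n as a single terminator
                    }
                    lines.add(new String(raw, 0, len, StandardCharsets.UTF_8));
                }
                line.reset();
                discarding = false;
            } else if (!discarding) {
                line.write(b);
                if (line.size() > maxLength) {
                    // Too long: drop what was buffered and skip up to the next line feed.
                    line.reset();
                    discarding = true;
                    droppedLines++;
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        SimpleLineDecoder decoder = new SimpleLineDecoder(1024);
        // The first chunk ends in the middle of a line; the second completes it.
        System.out.println(decoder.feed("QUERY TI".getBytes(StandardCharsets.UTF_8)));
        System.out.println(decoder.feed("ME\r\nQUERY DATE\n".getBytes(StandardCharsets.UTF_8)));
    }
}
```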

Java serialization has two main purposes:

1: network transmission

2: object persistence

Java serialization has been available since JDK 1.1. A class only needs to implement java.io.Serializable and generate a serial version ID.

Disadvantages:

1: it cannot cross languages. Java serialization is a proprietary protocol internal to the Java language. Other languages do not support it, and to their users it is a complete black box. Byte arrays produced by Java serialization cannot be deserialized by other languages.

2: the serialized stream is too large. The byte array produced by the JDK serialization mechanism is several times larger than a plain binary encoding of the same data. Under the same conditions, the larger the encoded byte array, the more storage space it takes, the higher the storage hardware cost, and the more network bandwidth it consumes in transit, which reduces system throughput.

3: serialization performance is too low. Under the same conditions, Java serialization is many times slower than binary encoding.
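
The size difference in point 2 can be checked with a small experiment. The sketch below assumes a hypothetical UserInfo class with two fields and compares JDK serialization against a hand-written ByteBuffer encoding; the class names and the manual layout (4-byte id, 4-byte name length, UTF-8 name bytes) are illustrative choices, not part of any framework.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Compares the size of JDK serialization with a hand-written binary encoding (illustrative). */
class SerializationSizeDemo {

    /** Hypothetical sample type used only for this comparison. */
    static class UserInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final int userId;
        final String userName;

        UserInfo(int userId, String userName) {
            this.userId = userId;
            this.userName = userName;
        }
    }

    /** Serializes with the JDK mechanism (ObjectOutputStream). */
    static byte[] jdkSerialize(UserInfo user) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(user);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Encodes by hand: 4-byte id, 4-byte name length, then the UTF-8 name bytes. */
    static byte[] manualEncode(UserInfo user) {
        byte[] name = user.userName.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + 4 + name.length)
                .putInt(user.userId)
                .putInt(name.length)
                .put(name)
                .array();
    }

    public static void main(String[] args) {
        UserInfo user = new UserInfo(100, "Welcome to Netty");
        System.out.println("JDK serialization: " + jdkSerialize(user).length + " bytes");
        System.out.println("manual encoding:   " + manualEncode(user).length + " bytes");
    }
}
```

The JDK output is larger because it carries a stream header and a class descriptor (class name, serial version ID, field names and types) in addition to the field values.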

Choosing a codec framework:

Google's Protobuf. Its features are:

1: structured data storage format (comparable to XML, JSON, etc.)

2: high performance codec

3: language-independent, platform-independent, and good scalability

4: official support for three languages: Java, C++ and Python

Protobuf has data description files and a code generation mechanism. The advantages of describing data structures with a description file are:

1: a text-based data structure description language that is independent of language and platform, particularly suitable for integration between heterogeneous systems

2: forward compatibility of the protocol can be achieved through the ordering of field identifiers

3: automatic code generation; there is no need to hand-write C++ and Java versions of the same data structure

4: easy to manage and maintain; compared with code, structured documents are easier to manage and maintain

Facebook's Thrift

Thrift originated at Facebook, which created it to handle the large volume of data transferred between its various systems

and the cross-platform requirements that arise from their different language environments. Thrift therefore supports multiple programming languages.

Thrift can be used as high-performance communication middleware. It supports data (object) serialization and several types of RPC services.

Thrift is suited to static data exchange: the data structure must be defined in advance.

When the data structure changes, the IDL file must be edited again and the code regenerated and recompiled.

Thrift is suitable as a general tool for building large-scale data exchange and storage, and for internal data transmission in large systems.

Compared with JSON and XML, it has clear advantages in performance and transfer size.

Thrift has five major components:

1: language system and IDL compiler: generates interface code in the target language from the IDL file supplied by the user;

2: TProtocol: the RPC protocol layer; different object serialization methods can be chosen, such as Binary and JSON

3: TTransport: the RPC transport layer; different transport implementations can be chosen, such as socket, NIO, MemoryBuffer

4: TProcessor: the link between the protocol layer and the service implementation provided by the user; responsible for invoking the service implementation's interface

5: TServer: aggregates TProtocol, TTransport, TProcessor, and other objects

Thrift supports three typical codecs:

1: general binary codec

2: compressed binary codec

3: compressed codec with optimized optional fields

JBoss Marshalling Introduction

JBoss Marshalling is a Java object serialization API package. It fixes many problems in the serialization package that ships with the JDK while remaining compatible with the java.io.Serializable interface. It also adds tunable parameters and additional features, which can be configured through factory classes. Compared with the traditional Java serialization mechanism, its advantages are:

1: pluggable class resolver, which makes it easier to customize the class loading strategy through an interface

2: pluggable object replacement, without requiring inheritance

3: pluggable predefined class cache table, which can reduce the length of the serialized byte array and improve serialization performance for commonly used types

4: Java serialization is possible without implementing the java.io.Serializable interface

5: improves object serialization performance through caching

Google Protobuf codec features:

Cross-language, with support for multiple languages; smaller encoded messages, which benefit storage and transmission; high codec performance; forward compatibility across different protocol versions; support for defining optional and required fields.
