Comprehensive analysis of memcached, part 3: memcached's deletion mechanism and development direction

Memcached is a cache, so data is not stored permanently on the server. This is the premise for introducing memcached into a system.
This time, I will introduce memcached's data deletion mechanism, as well as its latest development directions: the binary protocol and support for external engines.

memcached uses resources efficiently when deleting data

Data doesn't really disappear from memcached

As mentioned last time, memcached does not free memory once it has been allocated. After a record expires, clients can no longer see it (it becomes invisible, or transparent), and its storage space can be reused.

Lazy Expiration

Memcached does not internally monitor whether records have expired. Instead, it checks a record's timestamp when the record is retrieved, to determine whether it has expired. This technique is called lazy expiration. As a result, memcached spends no CPU time monitoring expiration.
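The idea can be shown with a minimal C sketch, assuming a flat table of records and a caller-supplied clock. The names record_t and record_get are hypothetical; real memcached items carry more fields and live in slab-allocated chunks. The point is that the expiry check only runs inside the lookup, never in a background scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical, simplified record: real memcached items carry more
 * fields and live in slab-allocated chunks. */
typedef struct {
    char   key[251];   /* text-protocol keys are at most 250 bytes */
    time_t exptime;    /* absolute expiry time; 0 means "never expires" */
    int    live;       /* 1 while the slot holds a visible record */
} record_t;

/* Lazy expiration: no background thread scans for expired records.
 * The timestamp is checked only when a client asks for the record. */
static record_t *record_get(record_t *table, size_t n,
                            const char *key, time_t now)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        record_t *r = &table[i];
        if (!r->live || strcmp(r->key, key) != 0)
            continue;
        if (r->exptime != 0 && r->exptime <= now) {
            /* Expired: hide the record from the client and mark its slot
             * reusable. The memory is not freed back to the system. */
            r->live = 0;
            return NULL;
        }
        return r;
    }
    return NULL;
}
```

A caller would normally pass time(NULL) as now; taking the clock as a parameter keeps the sketch deterministic.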

LRU: Principles of Efficiently Removing Data from the Cache

Memcached preferentially reuses the space of expired records, but even so, space can run out when new records are added.
In that case, memcached allocates space using a mechanism called Least Recently Used (LRU). As the name suggests, LRU deletes the "least recently used" records. When memcached runs out of memory (that is, when it cannot obtain new space from the slab class), it searches the records that have not been used recently and assigns their space to the new record. From a practical caching standpoint, this model is ideal.
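To make the order of reuse concrete, here is a minimal C sketch of LRU inside a single slab class, assuming a doubly linked list whose head is the most recently used record and whose tail is the least recently used. NSLOTS, slot_t, lru_alloc and the allow_evict flag are hypothetical; allow_evict stands in for starting memcached with the "-M" option described below. In this sketch, free slots are used first, then the space of expired records, and only then the LRU tail is evicted.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define NSLOTS 3   /* hypothetical: chunks available in one slab class */

typedef struct {
    bool   used;
    time_t exptime;      /* 0 = never expires */
    int    prev, next;   /* LRU links; -1 terminates the list */
} slot_t;

typedef struct {
    slot_t s[NSLOTS];
    int    head, tail;   /* head = most recently used, tail = least */
} slab_class_t;

static void lru_init(slab_class_t *c) {
    for (int i = 0; i < NSLOTS; i++)
        c->s[i] = (slot_t){ .used = false, .exptime = 0, .prev = -1, .next = -1 };
    c->head = c->tail = -1;
}

static void lru_unlink(slab_class_t *c, int i) {
    slot_t *x = &c->s[i];
    if (x->prev >= 0) c->s[x->prev].next = x->next; else c->head = x->next;
    if (x->next >= 0) c->s[x->next].prev = x->prev; else c->tail = x->prev;
    x->prev = x->next = -1;
}

static void lru_push_front(slab_class_t *c, int i) {
    c->s[i].prev = -1;
    c->s[i].next = c->head;
    if (c->head >= 0) c->s[c->head].prev = i;
    c->head = i;
    if (c->tail < 0) c->tail = i;
}

/* Called on every successful get: the record becomes most recently used. */
static void lru_touch(slab_class_t *c, int i) {
    assert(c->s[i].used);
    lru_unlink(c, i);
    lru_push_front(c, i);
}

/* Returns a slot for a new record, or -1 when memory is exhausted and
 * eviction is disabled (modelling -M). Expired space is still reused here. */
static int lru_alloc(slab_class_t *c, time_t now, time_t exptime, bool allow_evict) {
    int victim = -1;
    /* 1. Prefer a slot that has never been used. */
    for (int i = 0; i < NSLOTS && victim < 0; i++)
        if (!c->s[i].used) victim = i;
    /* 2. Otherwise reuse the space of an expired record, oldest first. */
    for (int i = c->tail; i >= 0 && victim < 0; i = c->s[i].prev)
        if (c->s[i].exptime != 0 && c->s[i].exptime <= now) victim = i;
    /* 3. Otherwise evict the least recently used record (the list tail). */
    if (victim < 0) {
        if (!allow_evict) return -1;
        victim = c->tail;
    }
    if (c->s[victim].used) lru_unlink(c, victim);
    c->s[victim] = (slot_t){ .used = true, .exptime = exptime, .prev = -1, .next = -1 };
    lru_push_front(c, victim);
    return victim;
}
```

A linked list keeps both "touch on access" and "evict from the tail" at constant cost, which is why it is the usual structure for LRU.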

In some cases, however, the LRU mechanism can cause trouble. LRU can be disabled by starting memcached with the "-M" option, as follows:

$ memcached -M -m 1024

Note that the lowercase "-m" option specifies the maximum memory size, in megabytes. If it is not specified, the default of 64MB is used.

When started with "-M", memcached returns an error once memory runs out. That said, memcached is a cache, not a storage system, so using LRU is recommended.

Latest development directions of memcached

There are two major items on the memcached roadmap: the design and implementation of the binary protocol, and the ability to load external engines.

About the binary protocol

The motivation for the binary protocol is that it removes the need to parse and process the text protocol. This makes an already fast memcached even faster and reduces the vulnerabilities of the text protocol. Most of it has already been implemented, and the feature is included in the development codebase, which is linked from the memcached download page.
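The parsing difference can be illustrated with a small C sketch. Both functions are hypothetical and deliberately minimal: the text version has to scan the command line for a delimiter to find where the key ends, while the binary version simply reads the key length from a fixed position in the header (bytes 2 and 3, big-endian, per the header layout in the protocol documentation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Text protocol: the server scans byte by byte for a space or "\r\n"
 * to find where the key ends. */
static size_t text_get_key_len(const char *line) {
    assert(strncmp(line, "get ", 4) == 0);
    const char *key = line + 4;
    size_t n = 0;
    while (key[n] != '\0' && key[n] != ' ' && key[n] != '\r')
        n++;
    return n;
}

/* Binary protocol: the key length sits at a fixed offset in the header,
 * so no scanning is needed. */
static size_t binary_get_key_len(const uint8_t *hdr) {
    return ((size_t)hdr[2] << 8) | hdr[3];
}
```

The text path also has to handle malformed input (missing delimiters, overlong keys), which is where parser vulnerabilities tend to come from.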

Format of the binary protocol

A protocol packet consists of a 24-byte header followed by the key and unstructured data (the value), with optional command-specific extras between the header and the key. The actual format is as follows (quoted from the protocol documentation):

 Byte/     0       |       1       |       2       |       3       |   
    /              |               |               |               |   
   |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
   +---------------+---------------+---------------+---------------+
  0/ HEADER                                                        /   
   /                                                               /   
   /                                                               /   
   /                                                               /   
   +---------------+---------------+---------------+---------------+
 24/ COMMAND-SPECIFIC EXTRAS (as needed)                           /   
  +/  (note length in the extras length header field)              /
   +---------------+---------------+---------------+---------------+
  m/ Key (as needed)                                               /   
  +/  (note length in key length header field)                     /   
   +---------------+---------------+---------------+---------------+
  n/ Value (as needed)                                             /   
  +/  (note length is total body length header field, minus        /   
  +/   sum of the extras and key length body fields)               /   
   +---------------+---------------+---------------+---------------+
  Total 24 bytes
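The diagram implies that the position of every body section follows from three header fields. This hypothetical C sketch (body_layout and body_layout_t are illustrative names) computes those offsets and derives the value length as the total body length minus the extras and key lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_LEN 24u  /* fixed-size header at the start of every packet */

typedef struct {
    size_t extras_off, key_off, value_off, value_len;
} body_layout_t;

/* Returns 0 on success, -1 if the header's lengths are inconsistent. */
static int body_layout(uint8_t extras_len, uint16_t key_len,
                       uint32_t total_body_len, body_layout_t *out)
{
    assert(out != NULL);
    /* The value length is not stored directly: it is what remains of the
     * total body after the extras and the key. */
    if ((uint32_t)extras_len + key_len > total_body_len)
        return -1;
    out->extras_off = HEADER_LEN;
    out->key_off    = HEADER_LEN + extras_len;
    out->value_off  = HEADER_LEN + extras_len + key_len;
    out->value_len  = total_body_len - extras_len - key_len;
    return 0;
}
```

For example, a Set request carries 8 bytes of extras (flags and expiration); with a 3-byte key and a total body length of 16, the key starts at offset 32 and the 5-byte value at offset 35.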

As the packet diagram shows, the packet format is quite simple. Note that the 24-byte header (HEADER) comes in two types: the Request Header and the Response Header. The header contains the magic byte, which indicates that the packet is valid, along with the command type (opcode), the key length, the total body length (from which the value length is derived), and other information. The formats are as follows:

Request Header

 Byte/     0       |       1       |       2       |       3       |
    /              |               |               |               |
   |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
   +---------------+---------------+---------------+---------------+
  0| Magic         | Opcode        | Key length                    |
   +---------------+---------------+---------------+---------------+
  4| Extras length | Data type     | Reserved                      |
   +---------------+---------------+---------------+---------------+
  8| Total body length                                             |
   +---------------+---------------+---------------+---------------+
 12| Opaque                                                        |
   +---------------+---------------+---------------+---------------+
 16| CAS                                                           |
   |                                                               |
   +---------------+---------------+---------------+---------------+
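The request header can be built with plain byte writes. This C sketch is an illustration, not memcached's code: the function names are hypothetical, while the request magic byte 0x80 and the Get opcode 0x00 follow protocol_binary.txt. All multi-byte fields are written in network (big-endian) byte order, and the 2-byte key length field is why keys longer than 65,535 bytes are rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REQ_MAGIC 0x80  /* request packets start with this byte */
#define OP_GET    0x00  /* opcode for Get */

/* Write v as an n-byte big-endian (network byte order) integer. */
static void put_be(uint8_t *p, uint64_t v, int n) {
    for (int i = n - 1; i >= 0; i--) { p[i] = (uint8_t)(v & 0xff); v >>= 8; }
}

/* Fills the 24-byte request header; returns -1 if a length does not fit
 * in its header field. */
static int encode_request_header(uint8_t *hdr, uint8_t opcode,
                                 size_t key_len, uint8_t extras_len,
                                 uint32_t value_len, uint32_t opaque,
                                 uint64_t cas)
{
    assert(hdr != NULL);
    if (key_len > UINT16_MAX) return -1;             /* 2-byte key length */
    uint64_t body = (uint64_t)extras_len + key_len + value_len;
    if (body > UINT32_MAX) return -1;                /* 4-byte body length */
    memset(hdr, 0, 24);
    hdr[0] = REQ_MAGIC;
    hdr[1] = opcode;
    put_be(hdr + 2, key_len, 2);     /* Key length */
    hdr[4] = extras_len;             /* Extras length */
    hdr[5] = 0;                      /* Data type: raw bytes */
    put_be(hdr + 6, 0, 2);           /* Reserved */
    put_be(hdr + 8, body, 4);        /* Total body length */
    put_be(hdr + 12, opaque, 4);     /* Opaque: copied back in the response */
    put_be(hdr + 16, cas, 8);        /* CAS */
    return 0;
}
```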

Response Header

 Byte/     0       |       1       |       2       |       3       |
    /              |               |               |               |
   |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
   +---------------+---------------+---------------+---------------+
  0| Magic         | Opcode        | Key Length                    |
   +---------------+---------------+---------------+---------------+
  4| Extras length | Data type     | Status                        |
   +---------------+---------------+---------------+---------------+
  8| Total body length                                             |
   +---------------+---------------+---------------+---------------+
 12| Opaque                                                        |
   +---------------+---------------+---------------+---------------+
 16| CAS                                                           |
   |                                                               |
   +---------------+---------------+---------------+---------------+
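Parsing the response header is the mirror image of building a request. In this hypothetical C sketch, the response magic byte 0x81 and the status values 0x0000 (no error) and 0x0001 (key not found) follow protocol_binary.txt; the struct and function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RES_MAGIC 0x81  /* response packets start with this byte */

typedef struct {
    uint8_t  opcode;
    uint16_t key_len;
    uint8_t  extras_len;
    uint8_t  data_type;
    uint16_t status;     /* 0x0000 = no error, 0x0001 = key not found */
    uint32_t body_len;
    uint32_t opaque;
    uint64_t cas;
} response_header_t;

/* Read an n-byte big-endian integer. */
static uint64_t get_be(const uint8_t *p, int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; i++) v = (v << 8) | p[i];
    return v;
}

/* Parses the 24-byte response header; returns -1 if the buffer is too
 * short or the magic byte is wrong. */
static int decode_response_header(const uint8_t *buf, size_t len,
                                  response_header_t *out)
{
    assert(out != NULL);
    if (len < 24 || buf[0] != RES_MAGIC) return -1;
    out->opcode     = buf[1];
    out->key_len    = (uint16_t)get_be(buf + 2, 2);
    out->extras_len = buf[4];
    out->data_type  = buf[5];
    out->status     = (uint16_t)get_be(buf + 6, 2);
    out->body_len   = (uint32_t)get_be(buf + 8, 4);
    out->opaque     = (uint32_t)get_be(buf + 12, 4);
    out->cas        = get_be(buf + 16, 8);
    return 0;
}
```

Because the server copies opaque back unchanged, a client can use it to match responses to the requests it sent.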

For details of each field, check out memcached's binary protocol code tree and refer to protocol_binary.txt in the docs folder.

My impression on seeing the HEADER format was that the upper limit on key length is very large! In the current memcached specification, the maximum key length is 250 bytes, but in the binary protocol the key length is expressed in 2 bytes. In theory, therefore, keys up to 65,535 bytes (2^16 - 1) long can be used. Keys longer than 250 bytes are not very common, but once the binary protocol is released, very large keys will become possible.

Binary protocol support will start with the upcoming 1.3 series.

External engine support

I experimented with making memcached's storage layer pluggable last year.

Brian Aker of MySQL saw this modification and posted the code to the memcached mailing list. The memcached developers were also very interested and added it to the roadmap. It is now being developed together with memcached developer Trond Norbye (specification design, implementation, and testing). Collaborating across time zones with developers overseas is a real challenge, but because we share the same vision, we have finally been able to release a prototype of the extensible architecture. The codebase can be accessed from the memcached download page.

The need for external engine support

There are many memcached derivatives out there, because people want to store data permanently, achieve data redundancy, and so on, even at the cost of some performance. Before I started working on memcached, I also considered reinventing it in mixi's R&D department.

The external engine loading mechanism encapsulates memcached's complex processing, such as networking and event handling. This removes the difficulty of combining memcached with a storage engine by forcing them together or by redesigning either one, and makes it easy to try out various engines.

Simple API design is the key to success

What we value most in this project is API design. With too many functions, engine developers will find the API cumbersome; if it is too complex, the barrier to implementing an engine becomes too high. The initial version therefore has only 13 interface functions. Space does not permit describing them in detail, so only the operations an engine must perform are listed here:

  • Engine information (version, etc.)
  • Engine initialization
  • Engine shutdown
  • Engine statistics
  • Checking, in terms of capacity, whether a given record can be stored
  • Allocating memory for an item (record) structure
  • Freeing item (record) memory
  • Deleting a record
  • Storing a record
  • Retrieving a record
  • Updating a record's timestamp
  • Arithmetic operations (incr/decr)
  • Flushing data
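To make the list concrete, here is a hypothetical C sketch of such an interface as a table of 13 function pointers, one per operation, plus a toy engine that fills in only two of them. None of these names or signatures are taken from engine.h; they are assumptions for illustration, and the 1 MB limit in the toy engine merely mirrors memcached's default maximum item size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct item item_t;   /* opaque record handle owned by the engine */

/* Hypothetical interface: names and signatures are illustrative only. */
typedef struct {
    const char *(*get_info)(void);                              /* engine information */
    int     (*initialize)(void *self, const char *config);       /* initialization */
    void    (*destroy)(void *self);                              /* shutdown */
    void    (*get_stats)(void *self, char *buf, size_t buflen);  /* statistics */
    int     (*can_store)(void *self, size_t nbytes);             /* capacity check */
    item_t *(*item_allocate)(void *self, const char *key, size_t nkey, size_t nbytes);
    void    (*item_release)(void *self, item_t *it);             /* free item memory */
    int     (*remove_item)(void *self, const char *key, size_t nkey);
    int     (*store)(void *self, item_t *it);
    item_t *(*get)(void *self, const char *key, size_t nkey);    /* retrieve */
    int     (*touch)(void *self, const char *key, size_t nkey, uint32_t exptime);
    int     (*arithmetic)(void *self, const char *key, size_t nkey,
                          int incr, uint64_t delta, uint64_t *result);
    void    (*flush)(void *self, uint32_t when);
} engine_ops_t;

/* A toy engine that implements only two of the slots. */
static const char *toy_info(void) { return "toy-engine 0.1"; }

static int toy_can_store(void *self, size_t nbytes) {
    (void)self;                       /* this toy engine keeps no state */
    return nbytes <= 1024 * 1024;     /* mirrors the default 1 MB item limit */
}

static const engine_ops_t toy_engine = {
    .get_info  = toy_info,
    .can_store = toy_can_store,
};

/* The core server talks to whichever engine is loaded only via the table. */
static int core_can_store(const engine_ops_t *e, void *self, size_t nbytes) {
    assert(e->can_store != NULL);     /* this sketch requires the slot */
    return e->can_store(self, nbytes);
}
```

A table of function pointers is the conventional way to get a pluggable interface in C: the core server is compiled against the table type only, so swapping engines does not touch networking or event-handling code.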

Readers interested in the detailed specification can check out the engine project's code and read engine.h.

Re-examining the current structure

The difficulty in making memcached support external storage is that the networking and event-handling code (the core server) is closely tied to the in-memory storage code. This is known as tight coupling. To support external engines flexibly, the in-memory code must be separated from the core server. Based on the API we designed, we therefore refactored memcached into the following structure:

[Figure: memcached-0003-001.png, memcached's structure after the refactoring]

After the refactoring, we compared performance against version 1.2.5, the version with binary protocol support, and others, and confirmed that the refactoring has no impact on performance.

When deciding how to support external engine loading, the easiest solution would have been for memcached itself to handle concurrency control. For an engine, however, concurrency control is the essence of performance, so we adopted a design that leaves multithreading support entirely to the engine.

Future improvements should make memcached useful in an even wider range of applications.

Summary

This time, I explained how memcached handles expiration and how it deletes data internally. I also introduced memcached's latest development directions: the binary protocol and external engine support. These features will not be available until version 1.3, so stay tuned!

This is my last post in this series. Thank you all for reading my article!

Next time, Nagano will cover practical knowledge about using memcached and application compatibility.
