OpenResty from Entry to Proficiency

18 | Communication magic weapon between workers: shared dict, the most important data structure

Hello, I am Wen Ming.

As we said earlier, the table is Lua's one and only data structure. A corresponding fact is that the shared memory dictionary, shared dict, is the most important data structure in your OpenResty programming. It supports not only data storage and retrieval, but also atomic counting and queue operations.

Based on shared dict, you can implement caching and communication across multiple workers, as well as features such as rate limiting, traffic statistics, and so on. You can use shared dict as a simple Redis, except that the data in a shared dict is not persisted, so you must plan for the possibility of losing the data stored in it.
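For example, the atomic incr operation alone is enough to build a cross-worker counter without any locking. Here is a minimal sketch; the zone name counters, the key hits, and the /count location are made up for illustration, and the optional init argument to incr needs a reasonably recent OpenResty:

# in the http block
lua_shared_dict counters 1m;

# in a server block
location /count {
    content_by_lua_block {
        local counters = ngx.shared.counters
        -- incr with an init value of 0 creates the key atomically on
        -- first use, so requests in different workers never race
        local count, err = counters:incr("hits", 1, 0)
        if not count then
            ngx.say("incr failed: ", err)
            return
        end
        ngx.say("total hits across all workers: ", count)
    }
}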

Several ways of data sharing

When writing OpenResty Lua code, you will inevitably run into situations where data has to be shared between the different stages of a request or between different workers, and sometimes also between Lua and C code.

So, before formally introducing the shared dict API, let's first go over the common data sharing methods in OpenResty, and learn to pick the most appropriate one for the situation at hand.

The first is Nginx variables. They can share data between Nginx C modules, and naturally, they can also share data between C modules and lua-nginx-module, as in the following code:

location /foo {
     set $my_var ''; # this line is required to create $my_var at config time
     content_by_lua_block {
         ngx.var.my_var = 123;
         ...
     }
 }

However, using Nginx variables to share data is slow, because it involves hash lookups and memory allocation. The method is also limited: it can only store strings and does not support complex Lua types.

The second is ngx.ctx, which can share data between the different stages of the same request. It is actually an ordinary Lua table, so it is very fast and can store all kinds of Lua objects. Its lifecycle is request-level: when a request ends, its ngx.ctx is destroyed along with it.

Here is a typical usage scenario, where we use ngx.ctx to cache the result of the comparatively expensive Nginx variable lookup and make it available across stages:

location /test {
     rewrite_by_lua_block {
         ngx.ctx.host = ngx.var.host
     }
     access_by_lua_block {
        if (ngx.ctx.host == 'openresty.org') then
            ngx.ctx.host = 'test.com'
        end
     }
     content_by_lua_block {
         ngx.say(ngx.ctx.host)
     }
 }

At this time, if you use curl to access:

curl -i 127.0.0.1:8080/test -H 'host:openresty.org'

it will print test.com, which shows that ngx.ctx does share the data across different stages. You can also modify the example yourself to store a more complex object such as a table, instead of a simple string, and see whether it behaves as you expect.
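For instance, here is a minimal sketch of storing a table in ngx.ctx; the /ctx_table location and the user field are made up for illustration:

location /ctx_table {
    rewrite_by_lua_block {
        -- ngx.ctx is an ordinary Lua table, so nested tables can be
        -- stored directly, without any serialization
        ngx.ctx.user = { name = "tom", visits = 3 }
    }
    content_by_lua_block {
        local user = ngx.ctx.user
        ngx.say(user.name, " has visited ", user.visits, " times")
    }
}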

However, one important caveat: because the lifecycle of ngx.ctx is request-level, it cannot be cached at the module level. For example, using it like this in the file foo.lua is wrong:

local ngx_ctx = ngx.ctx

local function bar()
    ngx_ctx.host =  'test.com'
end

We should instead fetch it at call time, inside the function (caching the ngx global itself at module level is fine, since that global does not change between requests):

local ngx = ngx

local function bar()
    ngx.ctx.host = 'test.com'
end

There are many more details around ngx.ctx, and we will return to them in the performance optimization part of this course.

Moving on, the third method is to use module-level variables to share data among all requests within the same worker. Unlike Nginx variables and ngx.ctx before it, this method is a little harder to grasp. But don't worry: the concept may be abstract, but code comes first, so let's start with an example of what a module-level variable is:

-- mydata.lua
 local _M = {}

 local data = {
     dog = 3,
     cat = 4,
     pig = 5,
 }

 function _M.get_age(name)
     return data[name]
 end

 return _M

The configuration in nginx.conf is as follows:

location /lua {
     content_by_lua_block {
         local mydata = require "mydata"
         ngx.say(mydata.get_age("dog"))
     }
 }

In this example, mydata is a module that is loaded only once per worker process; after that, all requests handled by that worker share the code and data of the mydata module.

Naturally, the data variable in the mydata module is such a module-level variable: it sits at the top level of the module, at its very beginning, and every function in the module can access it.

So you can put data that needs to be shared between requests in the top-level variables of a module. Note, however, that we generally use this method only for read-only data. If writes are involved, you have to be very careful, because there may be race conditions, which are very difficult bugs to track down.

We can understand this through the following simplified example:

-- mydata.lua
 local _M = {}

 local data = {
     dog = 3,
     cat = 4,
     pig = 5,
 }

 function _M.incr_age(name)
     data[name]  = data[name] + 1
    return data[name]
 end

 return _M

In the module we have added the function incr_age, which modifies the data in the data table.

Then, in the calling code, we add the most critical line, ngx.sleep(5); this sleep is a yielding operation:

location /lua {
     content_by_lua_block {
         local mydata = require "mydata"
         ngx.say(mydata.incr_age("dog"))
         ngx.sleep(5) -- yield API
         ngx.say(mydata.incr_age("dog"))
     }
 }

Without this sleep line (or some other non-blocking IO operation, such as an access to Redis), there would be no yield and hence no contention, and the final output numbers would be consecutive.

But once we add this line, even during the 5 seconds of sleep, other requests may call mydata.incr_age and modify the value of the variable, so the final output numbers are no longer consecutive. Keep in mind that in real code the logic will not be this simple, and locating such a bug will certainly be much harder.

So, unless you are very sure that there is no yield in between that hands control back to the Nginx event loop, I recommend keeping your module-level variables read-only.

The fourth and final method is the shared dict, which can share data among multiple workers.

It is implemented on top of a red-black tree and performs well, but it has its own limitation: you must declare the size of the shared memory in the Nginx configuration file in advance, and this size cannot be changed at runtime:

lua_shared_dict dogs 10m;

A shared dict can also only store simple value types such as strings, numbers, and booleans; it does not support complex Lua data types. So when you need to store a complex type such as a table, you have to serialize and deserialize it with JSON or some other encoding, which naturally costs quite a bit of performance.
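As a minimal sketch of that serialization cost, a table can be passed through cjson on its way in and out of a shared dict (assuming the dogs zone declared above; the key and table contents are made up for illustration):

local cjson = require "cjson.safe"

local dogs = ngx.shared.dogs

-- encode the table to a JSON string before storing it
local pet = { name = "Tom", age = 56 }
local ok, err = dogs:set("pet", cjson.encode(pet))
if not ok then
    ngx.log(ngx.ERR, "set failed: ", err)
end

-- every read pays the decode cost: this encode/decode pair is
-- exactly the performance loss described above
local raw = dogs:get("pet")
local pet2 = raw and cjson.decode(raw)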

In short, there is no silver bullet here and no perfect data sharing method; you need to combine several methods according to your needs and scenario.

Shared dict

We spent a lot of space on data sharing above, and some of you may wonder: none of it seems directly related to shared dict; has the article gone off topic?

Not at all. Think about it for yourself: why does shared dict exist in OpenResty at all?

Recall the three methods just described: the scope of their data sharing is the request level or the single-worker level. Therefore, in OpenResty's current implementation, only shared dict can share data across workers and enable communication between them, and that is precisely its reason for existing.

In my view, understanding why a technology exists, and working out its differences and advantages compared with similar technologies, is far more important than merely being able to call its APIs fluently. This kind of technical vision gives you a degree of foresight and insight, and is arguably an important difference between an engineer and an architect.

Back to the shared dict itself: it exposes more than 20 Lua APIs, all of which are atomic operations, so you don't have to worry about contention among multiple workers under high concurrency.

All of these APIs have detailed official documentation, so I won't go through them one by one. Let me stress once more that no technical course can replace a careful reading of the official documentation; this time-consuming, seemingly clumsy effort is something no one can skip.

Continuing with the shared dict APIs: they fall into three categories, namely dictionary read/write, queue operations, and management.

Dictionary read/write

Let's look at dictionary reads and writes first. In the earliest version there were only the dictionary read and write APIs, and they remain the most commonly used shared dict functions. Here is the simplest example:

$ resty --shdict='dogs 1m' -e 'local dict = ngx.shared.dogs
                               dict:set("Tom", 56)
                               print(dict:get("Tom"))'

In addition to set, OpenResty provides four other write methods: safe_set, add, safe_add, and replace. The safe prefix means that when the memory is full, old data is not evicted by LRU; instead, the write fails and a no memory error message is returned.
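Here is a minimal sketch of how the write methods differ, based on their documented semantics: add succeeds only when the key does not exist yet, while replace succeeds only when it already exists:

$ resty --shdict='dogs 1m' -e 'local dict = ngx.shared.dogs
                               dict:set("Tom", 56)
                               -- add fails because the key already exists
                               local ok, err = dict:add("Tom", 57)
                               print(tostring(ok) .. " " .. tostring(err))  -- false exists
                               -- replace succeeds because the key exists
                               ok, err = dict:replace("Tom", 58)
                               print(tostring(ok) .. " " .. tostring(err))  -- true nil
                               print(dict:get("Tom"))                       -- 58'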

In addition to get, OpenResty also provides get_stale for reading data. Compared with get, it has an extra return value and can read expired data:

value, flags, stale = ngx.shared.DICT:get_stale(key)

You can also call the delete method to remove a given key, which is equivalent to set(key, nil).
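A minimal sketch of get_stale and delete, using a short expiration; note that an expired item may already have been evicted, in which case even get_stale returns nil:

$ resty --shdict='dogs 1m' -e 'local dict = ngx.shared.dogs
                               dict:set("Tom", 56, 0.01)  -- expire after 0.01 seconds
                               ngx.sleep(0.02)
                               print(tostring(dict:get("Tom")))  -- nil: already expired
                               local v, flags, stale = dict:get_stale("Tom")
                               -- 56 true, if the item has not been evicted yet
                               print(tostring(v) .. " " .. tostring(stale))
                               dict:delete("Tom")  -- same as dict:set("Tom", nil)'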

Queue operations

Next, let's look at the queue operations, a newer addition to OpenResty that provides an interface similar to Redis lists. Each element in a queue is described by ngx_http_lua_shdict_list_node_t:

typedef struct {
    ngx_queue_t queue;
    uint32_t value_len;
    uint8_t value_type;
    u_char data[1];
} ngx_http_lua_shdict_list_node_t;

The PR that added these queue operation APIs is linked in the article. If you are interested, you can follow the documentation, test cases, and source code to analyze the concrete implementation.

However, the documentation contains no code examples for the following five queue APIs, so let me briefly introduce them here:

  • lpush/rpush, which add an element at the left/right end of the queue;
  • lpop/rpop, which pop an element from the left/right end of the queue;
  • llen, which returns the number of elements in the queue.

And don't forget the other powerful tool we talked about in the last lesson: test cases. When something is not in the documentation, we can usually find corresponding code in the test cases. The queue-related tests live in the file 145-shdict-list.t:

=== TEST 1: lpush & lpop
--- http_config
    lua_shared_dict dogs 1m;
--- config
    location = /test {
        content_by_lua_block {
            local dogs = ngx.shared.dogs

            local len, err = dogs:lpush("foo", "bar")
            if len then
                ngx.say("push success")
            else
                ngx.say("push err: ", err)
            end

            local val, err = dogs:llen("foo")
            ngx.say(val, " ", err)

            local val, err = dogs:lpop("foo")
            ngx.say(val, " ", err)

            local val, err = dogs:llen("foo")
            ngx.say(val, " ", err)

            local val, err = dogs:lpop("foo")
            ngx.say(val, " ", err)
        }
    }
--- request
GET /test
--- response_body
push success
1 nil
bar nil
0 nil
nil nil
--- no_error_log
[error]

Management

Finally, the management APIs were also added later, in response to fairly strong demand from the community. Shared memory usage is the most typical example: if a user has allocated 100M of space as a shared dict, is that 100M enough? How many keys are stored in it? Which keys exactly? These are very real problems.

For such problems, OpenResty's official stance was that users should solve them with flame graphs, that is, non-intrusively, keeping the code base efficient and clean, rather than through intrusive APIs that return results directly.

But from a user-friendliness perspective, these management APIs are still very necessary. After all, open source projects exist to solve product needs, not to showcase the technology itself. So, let's take a look at these later-added management APIs.

First, get_keys(max_count?). By default it returns only the first 1024 keys; if you set max_count to 0, it returns all keys.
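A minimal sketch (be careful with max_count = 0 on a large dict, since returning every key at once can be expensive):

$ resty --shdict='dogs 1m' -e 'local dict = ngx.shared.dogs
                               dict:set("Tom", 56)
                               dict:set("Jerry", 6)
                               local keys = dict:get_keys(0)  -- 0 means no limit
                               for _, k in ipairs(keys) do
                                   print(k)
                               end'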

Then there are capacity and free_space. Both of these APIs live in the lua-resty-core repository, so you must require the module before using them:

require "resty.core.shdict"

 local cats = ngx.shared.cats
 local capacity_bytes = cats:capacity()
 local free_page_bytes = cats:free_space()

They return, respectively, the size of the shared memory (the size configured in lua_shared_dict) and the number of bytes of free pages. Because a shared dict allocates memory by page, even when free_space returns 0 there may still be room inside already-allocated pages, so its return value does not reflect the actual usage of the shared memory.

Final thoughts

In real development we often use multi-level caching, and official OpenResty projects also provide caching encapsulations. Can you find out which projects they are? Do you know of other lua-resty libraries that wrap caching?

Feel free to leave a comment and share your findings with me, and you are also welcome to share this article with your colleagues and friends, so that we can communicate and improve together.
