OpenResty from Entry to Proficiency 12: Expert Tips for Identifying Lua's Unique Concepts and Pitfalls

12 | Expert Tips: Identify Lua's Unique Concepts and Pitfalls

Hello, I am Wen Ming.

In the previous lesson, we learned about the table-related library functions in LuaJIT. Besides these commonly used functions, today I will introduce some of Lua's unique or less commonly used concepts, as well as common Lua pitfalls in OpenResty.

Weak tables

The first is the weak table, a concept unique to Lua and related to garbage collection. Like other high-level languages, Lua is garbage collected automatically: you don't need to care about the implementation details, and you don't need to trigger GC explicitly. Memory that is no longer referenced is reclaimed by the garbage collector automatically.

But reclaiming only unreferenced objects is not always enough; sometimes we need a more flexible mechanism. For example, if we insert a Lua object Foo (a table or a function) into the table tb, tb now holds a reference to Foo. Even if nothing else references Foo, the reference from tb still exists, so the GC cannot reclaim the memory Foo occupies. At this point, we only have two options:

  • Release Foo manually;
  • Let it stay resident in memory.

For example, the following code:

$ resty -e 'local tb = {}
tb[1] = {"red"}
tb[2] = function() print("func") end
print(#tb) -- 2

collectgarbage()
print(#tb) -- 2: tb still holds strong references to both objects

table.remove(tb, 1)
print(#tb) -- 1
'

However, you certainly don't want memory to be occupied by unused objects forever, especially since LuaJIT has a 2 GB memory limit. On the other hand, the right moment for manual release is hard to judge, and it adds complexity to the code.

This is where the weak table shines. As its name suggests, a weak table is first of all a table, and all the elements in it are weak references. Concepts are always abstract, so let's first look at a slightly modified version of the code:

$ resty -e 'local tb = {}
tb[1] = {"red"}
tb[2] = function() print("func") end
setmetatable(tb, {__mode = "v"})
print(#tb)  -- 2

collectgarbage()
print(#tb) -- 0
'

As you can see, the objects that are no longer referenced anywhere else have all been collected. The key line is this one:

setmetatable(tb, {__mode = "v"})

Looks familiar? It is a metatable operation. That's right: when a table's metatable has a __mode field, the table is a weak table.

  • If __mode is "k", the keys of the table are weak references.
  • If __mode is "v", the values of the table are weak references.
  • You can also set it to "kv", meaning both the keys and the values of the table are weak references.

In all three kinds of weak table, as soon as a weak key or a weak value is collected, the whole key-value pair is removed from the table.
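To make the "k" mode concrete, here is a minimal sketch of a common use: attaching extra data to objects without keeping those objects alive. It runs in plain Lua or LuaJIT. The names cache, attach and count_entries are hypothetical, and in real code the moment an entry disappears depends on when a GC cycle runs; here we force one with collectgarbage().

```lua
-- A weak-keyed side table: an entry vanishes once its key object is unreachable.
local cache = setmetatable({}, { __mode = "k" })

-- Hypothetical helper: count entries with pairs(), since # is meaningless for hash keys.
local function count_entries(t)
  local n = 0
  for _ in pairs(t) do
    n = n + 1
  end
  return n
end

-- Hypothetical helper: attach metadata to an object and hand the object back.
local function attach(obj, info)
  cache[obj] = info
  return obj
end

local kept = attach({ name = "kept" }, "metadata for kept")  -- still referenced by `kept`
attach({ name = "dropped" }, "metadata for dropped")         -- return value discarded

collectgarbage()             -- force a full collection
print(count_entries(cache))  -- 1: only the entry for `kept` survives
print(cache[kept])           -- metadata for kept
```

Because only the key is weak, the metadata lives exactly as long as the object it describes, with no manual cleanup.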

In the code above, __mode is "v", tb is an array, and its values are a table and a function object, so they can be collected automatically. However, if you change __mode to "k", nothing is collected, as the following code shows:

$ resty -e 'local tb = {}
tb[1] = {"red"}
tb[2] = function() print("func") end
setmetatable(tb, {__mode = "k"})
print(#tb)  -- 2

collectgarbage()
print(#tb) -- 2
'

Note that so far we have only demonstrated weak tables whose values are weak references, that is, array-style weak tables. Naturally, you can also use objects as keys to build a hash-style weak table with weak keys, for example:

$ resty -e 'local tb = {}
tb[{color = "red"}] = "red"
local fc = function() print("func") end
tb[fc] = "func"
fc = nil

setmetatable(tb, {__mode = "k"})
for k,v in pairs(tb) do
     print(v)
end

collectgarbage()
print("----------")
for k,v in pairs(tb) do
     print(v)
end
'

After calling collectgarbage() manually to force a GC, all the elements in tb have been collected. Of course, in real code we don't have to call collectgarbage() ourselves: the collector runs automatically in the background, so we don't need to worry about it.

Since I mentioned collectgarbage(), let me add a few more words. This function accepts several different options; the default is "collect", which performs a full GC. Another useful one is "count", which returns the total memory in use by Lua, in kilobytes. This statistic is very useful: it lets you check for memory leaks, and it also warns you when you are approaching the 2 GB limit.
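A minimal sketch of using collectgarbage("count") to watch memory: it allocates a batch of small tables, drops them, and forces a collection. The absolute numbers differ between Lua versions and builds, so only the direction of change is meaningful; memory_kb is a hypothetical helper name.

```lua
-- Hypothetical helper: collectgarbage("count") reports Lua's memory use in kilobytes.
local function memory_kb()
  return collectgarbage("count")
end

collectgarbage()                 -- full collection so we start from a settled heap
local before = memory_kb()

local big = {}
for i = 1, 100000 do
  big[i] = { i }                 -- 100,000 small tables kept alive by `big`
end
local during = memory_kb()

big = nil                        -- drop the only reference
collectgarbage()                 -- reclaim the now-unreachable tables
local after = memory_kb()

print(string.format("before: %.1f KB, during: %.1f KB, after: %.1f KB",
  before, during, after))
```

Logging a value like this periodically (for example per worker) is a cheap way to spot steady growth that hints at a leak.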

Code that uses weak tables tends to be more complicated in real applications and harder to understand, so it also hides more bugs. What kind of bugs? Don't worry: later on, I will walk through a memory leak caused by weak tables in an open source project.

Closures and upvalues

Now let's look at closures and upvalues. Earlier I emphasized that in Lua, all values are first-class citizens, and that includes functions. This means functions can be stored in variables, passed as arguments, and returned from other functions. Take this sample code from the weak table section above:

tb[2] = function() print("func") end

Here, an anonymous function is stored as a value in the table.

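In Lua, the two function definitions in the following code are nearly equivalent; the latter assigns a function to a variable, a form we also use often. The exception is recursion: local function foo() ... end is shorthand for local foo; foo = function() ... end, so only the first form lets the body call foo itself.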

local function foo() print("foo") end
local foo = function() print("foo") end
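Where the two forms do differ is recursion. In the sketch below (fact and fact2 are hypothetical names), the local function form can call itself, while in the assignment form the name inside the body is resolved before the local exists, so it refers to a global that is nil.

```lua
-- `local function` declares the name first, so the body can call itself.
local function fact(n)
  if n <= 1 then return 1 end
  return n * fact(n - 1)
end

-- Here the local `fact2` is not yet in scope inside the body,
-- so `fact2` below is looked up as a global, which is nil.
local fact2 = function(n)
  if n <= 1 then return 1 end
  return n * fact2(n - 1)
end

print(fact(5))                   -- 120
local ok, err = pcall(fact2, 5)
print(ok, err)                   -- false, plus an error about calling a nil value
```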

In addition, Lua supports writing a function inside another function, that is, nested functions, such as the following sample code:

$ resty -e '
local function foo()
     local i = 1
     local function bar()
         i = i + 1
         print(i)
     end
     return bar
end

local fn = foo()
fn() -- prints 2
'

As you can see, the function bar can read the local variable i inside foo and modify its value, even though i is not defined inside bar. This feature is called lexical scoping.

These features of Lua are the foundation of closures. Simply put, a closure is a function that accesses variables in the lexical scope of another function.
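A minimal closure sketch (make_counter is a hypothetical name): each call to make_counter creates a new local count, and the returned function keeps accessing it after make_counter has returned, so every counter has its own independent state.

```lua
local function make_counter()
  local count = 0            -- lives in make_counter's lexical scope
  return function()          -- this closure keeps accessing `count`
    count = count + 1
    return count
  end
end

local c1 = make_counter()
local c2 = make_counter()

c1()
c1()
print(c1(), c2())  -- 3   1  (each closure has its own `count`)
```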

By this definition, all functions in Lua are actually closures, even if you don't nest them. This is because the Lua compiler wraps every Lua script in a main function. For example, take the following simple code snippet:

local foo, bar
local function fn()
     foo = 1
     bar = 2
end

After compilation, it conceptually looks like this:

function main(...)
     local foo, bar
     local function fn()
         foo = 1
         bar = 2
     end
end

The function fn captures two local variables of the main function, so it is also a closure.
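You can observe this wrapping directly: compiling a string of Lua code yields an ordinary function, and the ... inside the chunk receives whatever arguments that function is called with. A small sketch for plain Lua or LuaJIT (loadstring exists in Lua 5.1 and LuaJIT; load accepts a source string from Lua 5.2 on):

```lua
-- loadstring in Lua 5.1 / LuaJIT; load accepts a source string in Lua 5.2+.
local compile = loadstring or load

-- The chunk body becomes the body of an anonymous vararg "main" function.
local chunk = assert(compile("local a, b = ...; return a + b"))

print(type(chunk))   -- function
print(chunk(2, 3))   -- 5
```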

Of course, the concept of closures exists in many languages and is not unique to Lua, so you can compare it with other languages to deepen your understanding. Only when you understand closures can you understand the upvalues we are going to talk about next.

The upvalue is a concept unique to Lua. Literally, the name means "a value from above". In fact, an upvalue is a variable outside a closure's own lexical scope that the closure captures. Let's look at the code above again:

local foo, bar
local function fn()
     foo = 1
     bar = 2
end

You can see that the function fn captures two local variables, foo and bar, which are not in its own lexical scope; these two variables are in fact the upvalues of fn.
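The standard debug library can confirm this. The sketch below (upvalue_names is a hypothetical helper) walks fn's upvalues with debug.getupvalue and prints their names. It also shows that an upvalue is the captured variable itself rather than a copy: calling fn changes foo and bar outside it.

```lua
local foo, bar
local function fn()
  foo = 1
  bar = 2
end

-- Hypothetical helper: debug.getupvalue(f, i) returns the name and value of
-- f's i-th upvalue, or nothing once i runs past the last one.
local function upvalue_names(f)
  local names = {}
  local i = 1
  while true do
    local name = debug.getupvalue(f, i)
    if name == nil then break end
    names[#names + 1] = name
    i = i + 1
  end
  return names
end

print(table.concat(upvalue_names(fn), ", "))  -- the two captured locals, foo and bar
fn()
print(foo, bar)  -- 1   2  (fn wrote to the captured variables themselves)
```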

Common pitfalls

Having introduced several Lua concepts, let me talk about the Lua-related pitfalls we run into when developing with OpenResty.

Earlier we mentioned some differences between Lua and other languages, such as indexes starting at 1 and variables being global by default. In real OpenResty development we run into even more problems related to Lua and LuaJIT; below I will cover some of the more common ones.

A reminder here: even if you know all of this in advance, it probably won't really sink in until you have fallen into the pit yourself. The difference is that you will climb out faster and get to the crux of the problem.

Does the index start at 0 or 1?

The first pitfall: Lua indexes start at 1, as we have mentioned repeatedly. But I have to say, that is not the whole truth.

In LuaJIT, arrays created with ffi.new are indexed from 0:

local ffi_new = require("ffi").new
local buf = ffi_new("char[?]", 128)  -- valid indexes: buf[0] .. buf[127]

So when you access buf, remember that indexes start at 0, not 1. Pay special attention to this whenever you use FFI to interact with C.
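A minimal sketch of moving data between a 1-based Lua table and a 0-based FFI array. It needs LuaJIT's ffi module (as in OpenResty); under plain Lua it only prints a notice. The helper name roundtrip is hypothetical.

```lua
-- The ffi module exists only in LuaJIT; pcall keeps plain Lua from erroring.
local ok, ffi = pcall(require, "ffi")

-- Hypothetical helper: copy a 1-based Lua array of byte values into a
-- 0-based C array and back again.
local function roundtrip(bytes)
  local n = #bytes
  local buf = ffi.new("unsigned char[?]", n)
  for i = 1, n do
    buf[i - 1] = bytes[i]      -- Lua index i maps to C index i - 1
  end
  local out = {}
  for i = 0, n - 1 do
    out[i + 1] = buf[i]        -- C index i maps back to Lua index i + 1
  end
  return out
end

if ok then
  local out = roundtrip({ 72, 105 })
  print(out[1], out[2])        -- 72   105
else
  print("ffi module not available; run this under LuaJIT")
end
```

Keeping the "i - 1" and "i + 1" conversions at the boundary, in one place, makes off-by-one bugs easier to spot.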

Regular expression matching

The second pitfall concerns regular expression matching. OpenResty has two parallel sets of string matching methods: Lua's own string library, and the ngx.re.* API provided by OpenResty.

Lua's pattern matching uses its own unique syntax, which differs from PCRE. Here is a simple example:

resty -e 'print(string.match("foo 123 bar", "%d%d%d"))'  # 123

This code extracts the number from the string, and you will notice that the pattern looks quite different from the regular expressions we are familiar with. Lua's built-in matching library is not only costly to maintain in code but also slow: it cannot be JIT-compiled, and patterns that have already been compiled once are not cached.

So when you use Lua's built-in string library for find and match operations, and what you need is a regular expression, don't hesitate to use OpenResty's ngx.re instead. Only when searching for a fixed string should you consider calling the string library in plain mode.
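A small plain-Lua sketch of the difference between pattern mode and plain mode in the string library. It runs without OpenResty; in OpenResty code, the cases that really need a regular expression would go through ngx.re instead.

```lua
local s = "price: 1.5 USD"

-- Pattern mode (default): "." is a magic character matching any character.
print(string.find(s, "."))           -- 1   1

-- Plain mode (4th argument true): "." is matched literally.
print(string.find(s, ".", 1, true))  -- 9   9

-- Lua patterns escape with % instead of PCRE's backslash: %d is a digit, %. a literal dot.
print(string.match(s, "%d+%.%d+"))   -- 1.5
```

Passing true as the fourth argument also skips pattern parsing entirely, which is why plain mode is the right tool for fixed strings.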

Here is my suggestion: in OpenResty, always prefer the OpenResty API first, then the LuaJIT API, and use Lua's own libraries with caution.

Arrays and dicts cannot be distinguished when encoding JSON

The third pitfall is that arrays and dicts cannot be distinguished when encoding JSON. Since Lua has only one data structure, the table, a JSON encoder naturally cannot tell whether an empty table should be encoded as an array or as a dictionary:

resty -e 'local cjson = require "cjson"
local t = {}
print(cjson.encode(t))
'

For example, the code above outputs {}, which shows that OpenResty's cjson library encodes an empty table as a dictionary by default. We can change this global default with the encode_empty_table_as_object function:

resty -e 'local cjson = require "cjson"
cjson.encode_empty_table_as_object(false)
local t = {}
print(cjson.encode(t))
'

This time, the empty table is encoded as an array: [].

However, this global setting has a fairly wide impact, so can we specify the encoding rule for one particular table? Naturally we can, and there are two ways to do it.

The first method is to use the userdata cjson.empty_array in place of the table. It will then be treated as an empty array during JSON encoding:

$ resty -e 'local cjson = require "cjson"
local t = cjson.empty_array
print(cjson.encode(t))
'

However, sometimes we can't be sure the table will always be empty, and we want it encoded as an array whenever it is empty. For that we use cjson.empty_array_mt, which is the second method.

It marks the specified table so that it is encoded as an array when it is empty. As the name cjson.empty_array_mt suggests, it is applied as a metatable, as in the following code:

$ resty -e 'local cjson = require "cjson"
local t = {}
setmetatable(t, cjson.empty_array_mt)
print(cjson.encode(t))
t[1] = 123  -- same table, metatable kept; a non-empty table still encodes as an array
print(cjson.encode(t))
'

You can execute this code locally to see if the output is as you expected.
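The root cause behind all of these workarounds is that an encoder has to guess a table's JSON type from its contents. The hypothetical looks_like_array helper below is a deliberately simplified guess written in plain Lua (it is not cjson's actual algorithm); note that it has no principled answer for {}, which is exactly the ambiguity described above.

```lua
-- Hypothetical helper: does this table look like a JSON array?
-- Simplified on purpose; it is NOT how cjson decides.
local function looks_like_array(t)
  local n = 0
  for k in pairs(t) do
    -- every key must be a positive integer
    if type(k) ~= "number" or k < 1 or k % 1 ~= 0 then
      return false
    end
    n = n + 1
  end
  -- n positive integer keys with t[1]..t[n] all present means the keys are exactly 1..n
  for i = 1, n do
    if t[i] == nil then
      return false
    end
  end
  return true
end

print(looks_like_array({ 10, 20, 30 }))  -- true
print(looks_like_array({ a = 1 }))       -- false
print(looks_like_array({}))              -- true, although {} is just as valid as an empty object
```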

Limit on the number of variables

Let's look at the fourth pitfall: the limit on the number of variables. In Lua, both the number of local variables and the number of upvalues per function are limited. You can confirm this in Lua's source code:


/*
@@ LUAI_MAXVARS is the maximum number of local variables per function
@* (must be smaller than 250).
*/
#define LUAI_MAXVARS            200

/*
@@ LUAI_MAXUPVALUES is the maximum number of upvalues per function
@* (must be smaller than 250).
*/
#define LUAI_MAXUPVALUES        60

These two limits are hard-coded to 200 and 60, respectively. Although you can modify the source code to adjust them, as the comments say, both must stay below 250.
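You can watch the local-variable limit trigger by generating source code with too many locals and compiling it. A sketch assuming plain Lua or LuaJIT; the helper name compile_with_locals is hypothetical, and the exact error text differs between Lua versions.

```lua
-- loadstring in Lua 5.1 / LuaJIT; load accepts a source string in Lua 5.2+.
local compile = loadstring or load

-- Hypothetical helper: build a chunk declaring n locals, one statement each, and compile it.
local function compile_with_locals(n)
  local lines = {}
  for i = 1, n do
    lines[i] = ("local v%d = %d"):format(i, i)
  end
  return compile(table.concat(lines, "\n"))
end

local f200 = compile_with_locals(200)
local f201, err = compile_with_locals(201)

print(f200 ~= nil)   -- true: exactly at the limit
print(f201, err)     -- nil, plus a compile error mentioning "local variables"
```

Note that this is a compile-time error: the whole file fails to load, not just the function that crosses the limit.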

Under normal circumstances we won't hit these limits, but when writing OpenResty code you should still keep them in mind: don't use too many local variables and upvalues, and use do ... end blocks wherever possible to reduce the number of local variables and upvalues.

For example, let's look at the following pseudocode:

local re_find = ngx.re.find
function foo() ... end
function bar() ... end
function fn() ... end

If only the function foo uses re_find, we can rewrite it like this:

do
     local re_find = ngx.re.find
     function foo() ... end
end
function bar() ... end
function fn() ... end

This way, the local variable re_find no longer exists at the level of the main function. This is an optimization trick for a single large Lua file.
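A runnable version of the idea, with string.find standing in for ngx.re.find so that it works in plain Lua (that substitution and the function bodies are assumptions for illustration). debug.getinfo with the "u" option reports how many upvalues each function has.

```lua
local foo

do
  -- This alias exists only inside the do ... end block
  -- (string.find stands in for ngx.re.find here).
  local str_find = string.find

  -- foo captures str_find as its single upvalue.
  function foo(s)
    return str_find(s, "x", 1, true) ~= nil
  end
end

-- bar does not need the alias and captures nothing.
local function bar(s)
  return #s
end

print(foo("axb"), bar("abc"))        -- true   3
print(debug.getinfo(foo, "u").nups)  -- 1
print(debug.getinfo(bar, "u").nups)  -- 0
```

Outside the block, str_find is gone, so it no longer counts toward the main function's 200-local budget.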

Final words

In the spirit of always asking a few more whys: where does Lua's limit of 250 come from? That is today's question for you to think about. You are welcome to leave a comment with your opinion, and to share this article with your colleagues and friends so we can learn and improve together.


Source: blog.csdn.net/fegus/article/details/130720243