12 | Expert Tips: Identify Lua's Unique Concepts and Pitfalls
Hello, I am Wen Ming.
In the previous section, we learned about the table-related library functions in LuaJIT. Besides these commonly used functions, today I will introduce some unique or less commonly used Lua concepts, as well as common Lua pitfalls in OpenResty.
Weak tables
The first is the weak table, a concept unique to Lua and related to garbage collection. Like other high-level languages, Lua is garbage collected automatically: you don't need to care about the implementation details or trigger GC explicitly, and memory that is no longer referenced is reclaimed by the garbage collector.
But relying only on ordinary (strong) references is not always enough; sometimes we need a more flexible mechanism. For example, if we insert a Lua object Foo (a table or a function) into the table tb, tb now holds a reference to Foo. Even if Foo is not referenced anywhere else, the reference from tb still exists, so the GC cannot reclaim the memory Foo occupies. At this point, we have only two options:
- release Foo manually;
- or let it stay resident in memory.
For example, the following code:
$ resty -e 'local tb = {}
tb[1] = {"red"}
tb[2] = function() print("func") end
print(#tb) -- 2
collectgarbage()
print(#tb) -- 2
table.remove(tb, 1)
print(#tb) -- 1
'
However, you certainly don't want unused objects to occupy memory forever, especially given LuaJIT's 2 GB memory limit. And the right moment for manual release is hard to judge, while doing it adds complexity to the code.
This is where weak tables shine. As the name suggests, a weak table is first of all a table, but its elements are held by weak references. Concepts are always abstract, so let's first look at a slightly modified version of the code:
$ resty -e 'local tb = {}
tb[1] = {"red"}
tb[2] = function() print("func") end
setmetatable(tb, {__mode = "v"})
print(#tb) -- 2
collectgarbage()
print(#tb) -- 0
'
As you can see, all the objects that are no longer used have been collected. The most important line is this one:
setmetatable(tb, {__mode = "v"})
Does it look familiar? It's a metatable operation! That's right: when a table's metatable has a __mode field, the table is a weak table.
- If the value of __mode is "k", the keys of the table are weak references.
- If the value of __mode is "v", the values of the table are weak references.
- You can also set it to "kv", meaning both the keys and the values of the table are weak references.
In all three kinds of weak tables, once a weakly referenced key or value is collected, the entire key-value pair is removed from the table.
In the code above, __mode is "v", and tb is an array whose values are a table and a function object, so they can be collected automatically. However, if you change the value of __mode to "k", nothing is collected: the keys are the integers 1 and 2, which are never garbage collected, and the values are still strong references. Look at the following code:
$ resty -e 'local tb = {}
tb[1] = {"red"}
tb[2] = function() print("func") end
setmetatable(tb, {__mode = "k"})
print(#tb) -- 2
collectgarbage()
print(#tb) -- 2
'
Note that so far we have only demonstrated weak tables whose values are weak references, that is, array-style weak tables. Naturally, you can also use objects as keys to build a hash-table-style weak table, for example:
$ resty -e 'local tb = {}
tb[{color = "red"}] = "red"
local fc = function() print("func") end
tb[fc] = "func"
fc = nil
setmetatable(tb, {__mode = "k"})
for k,v in pairs(tb) do
print(v)
end
collectgarbage()
print("----------")
for k,v in pairs(tb) do
print(v)
end
'
After we call collectgarbage() manually to force a GC, all the elements in tb have been collected. Of course, in real code we don't need to call collectgarbage() ourselves; the GC runs automatically in the background, so we don't need to worry about it.
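A common practical use of a weak-key table is a side cache that attaches extra data to objects without keeping those objects alive. The sketch below is illustrative only: get_meta, count_entries, touch_temp, and the hits field are hypothetical names. One caveat worth knowing: LuaJIT, like Lua 5.1, has no ephemeron tables, so a cached value must not reference its own key, or the entry will never be collected.

```lua
-- Weak-key side cache: an entry disappears once its key object is unreachable.
-- get_meta, count_entries and touch_temp are hypothetical helpers for this sketch.
local cache = setmetatable({}, {__mode = "k"})

local function get_meta(obj)
  local meta = cache[obj]
  if meta == nil then
    -- The value must not reference obj; in LuaJIT/Lua 5.1 that would keep obj alive.
    meta = {hits = 0}
    cache[obj] = meta
  end
  meta.hits = meta.hits + 1
  return meta
end

local function count_entries(t)
  local n = 0
  for _ in pairs(t) do n = n + 1 end
  return n
end

local keep = {}            -- stays referenced by this local
get_meta(keep)
get_meta(keep)

local function touch_temp()
  get_meta({})             -- this table is referenced only during the call
end
touch_temp()

collectgarbage()           -- full cycle: the temporary key is now unreachable
print(count_entries(cache))  -- 1: only the entry for `keep` survives
print(cache[keep].hits)      -- 2
```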
Since I've mentioned collectgarbage(), let me add a few words about it. This function accepts several different options; the default, "collect", performs a full GC cycle. Another useful option is "count", which returns the amount of memory currently used by Lua, in kilobytes. This statistic is very useful: it helps you spot memory leaks, and it reminds you not to get close to the 2 GB limit.
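To see "count" in action, the following sketch measures memory before allocating a large table, while it is alive, and after it is released and a full cycle has run. The table size is an arbitrary choice for illustration, and the exact numbers depend on the Lua/LuaJIT build, so no concrete values are shown.

```lua
-- collectgarbage("count") returns the memory in use by Lua, in kilobytes.
local function mem_kb()
  return collectgarbage("count")
end

collectgarbage()                      -- start from a freshly collected heap
local before = mem_kb()

local big = {}
for i = 1, 100000 do big[i] = i end   -- allocate a large array part
local peak = mem_kb()

big = nil                             -- drop the only reference to the table
collectgarbage()                      -- full cycle reclaims it
local after = mem_kb()

print(string.format("before %.0f KB, peak %.0f KB, after %.0f KB",
  before, peak, after))
```

Comparing such readings over time (for example, periodically in a long-running worker) is a simple way to notice memory that keeps growing.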
Code that uses weak tables is more complicated to write in real applications and harder to understand, so bugs hide in it more easily. What kind of bugs? Don't worry: later on I will walk through a memory leak caused by weak tables in an open source project.
Closures and upvalues
Now let's look at closures and upvalues. As I emphasized earlier, all values in Lua are first-class citizens, and that includes functions. This means that functions can be stored in variables, passed as arguments, and returned from other functions. Take this line from the weak table example above:
tb[2] = function() print("func") end
This stores an anonymous function as a value in the table.
In Lua, the two function definitions below are nearly equivalent; the second one assigns an anonymous function to a variable, a style we use often. The one difference is that with local function, the name foo is already in scope inside the body, so the function can call itself recursively:
local function foo() print("foo") end
local foo = function() print("foo") end
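The difference between the two forms shows up with recursion. In this sketch, fact and fact2 are hypothetical example functions. The second one fails because, while its body is compiled, the local fact2 is not yet in scope, so the inner call resolves to a global named fact2, which is nil.

```lua
-- `local function fact` declares the local first, so the body can call itself.
local function fact(n)
  if n <= 1 then return 1 end
  return n * fact(n - 1)
end

-- Here the local `fact2` only comes into scope after this statement,
-- so the `fact2` inside the body refers to an unset global.
local fact2 = function(n)
  if n <= 1 then return 1 end
  return n * fact2(n - 1)
end

print(fact(5))                   -- 120
local ok, err = pcall(fact2, 5)
print(ok)                        -- false
print(err)                       -- mentions global 'fact2' and "a nil value"
```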
In addition, Lua supports writing a function inside another function, that is, nested functions, such as the following sample code:
$ resty -e '
local function foo()
local i = 1
local function bar()
i = i + 1
print(i)
end
return bar
end
local fn = foo()
fn() -- prints 2
'
As you can see, the function bar can read the local variable i inside foo and modify its value, even though the variable is not defined inside bar. This feature is called lexical scoping.
These features of Lua are the basis of closures. Simply put, a closure is a function that accesses variables in the lexical scope of another function.
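A classic way to see a closure keep state is a counter factory. In this sketch, make_counter is a hypothetical name: every call creates a fresh local count, and the returned function keeps that particular variable alive between calls.

```lua
-- Each call to make_counter creates a new local `count`;
-- the returned closure captures that variable and updates it on every call.
local function make_counter()
  local count = 0
  return function()
    count = count + 1
    return count
  end
end

local c1 = make_counter()
local c2 = make_counter()
print(c1())  -- 1
print(c1())  -- 2
print(c2())  -- 1: c2 captured its own, independent `count`
```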
By this definition, all functions in Lua are actually closures, even if you don't nest them. This is because the Lua compiler wraps the whole script in a main function. Take the following simple snippet:
local foo, bar
local function fn()
foo = 1
bar = 2
end
Conceptually, after compilation it looks like this:
function main(...)
local foo, bar
local function fn()
foo = 1
bar = 2
end
end
The function fn captures two local variables of the main function, so it is also a closure.
Of course, the concept of closures exists in many languages; it is not unique to Lua, so comparing it with other languages can deepen your understanding. Only by understanding closures can you understand upvalues, which we are going to talk about next.
The upvalue is a concept unique to Lua. Literally, it means "the value from above". In fact, an upvalue is a variable that a closure captures from outside its own lexical scope. Let's look at the same code again:
local foo, bar
local function fn()
foo = 1
bar = 2
end
You can see that the function fn captures two local variables, foo and bar, which are not in its own lexical scope; these two variables are the upvalues of fn.
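You can inspect a function's upvalues with the standard debug library. The sketch below reuses the code above and adds a hypothetical helper, upvalue_names, built on debug.getupvalue; it assumes the chunk is run from source, so debug information (including upvalue names) has not been stripped.

```lua
local foo, bar

local function fn()
  foo = 1
  bar = 2
end

-- Hypothetical helper: collect the names of a function's upvalues, in index order.
local function upvalue_names(f)
  local names = {}
  local i = 1
  while true do
    local name = debug.getupvalue(f, i)
    if name == nil then break end
    names[#names + 1] = name
    i = i + 1
  end
  return names
end

print(table.concat(upvalue_names(fn), ", "))  -- foo, bar
fn()                                           -- writes through the upvalues
print(foo, bar)                                -- 1	2
```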
Common pitfalls
Having introduced several Lua concepts, let me now talk about the Lua-related pitfalls we run into when developing with OpenResty.
Earlier we mentioned some differences between Lua and other languages, such as indexes starting from 1 and variables being global by default. In real OpenResty development we run into even more problems related to Lua and LuaJIT, and below I will cover some of the more common ones.
A reminder: even if you know about every one of these pitfalls, you probably won't really remember them until you fall into one yourself. The difference is that you will climb out more quickly and get to the root of the problem.
Do indexes start from 0 or 1?
The first pitfall: Lua indexes start from 1, as we have mentioned repeatedly. But I have to say, this is not the whole truth.
In LuaJIT, arrays created with ffi.new are indexed from 0:
local ffi_new = require("ffi").new
local buf = ffi_new("char[?]", 128)  -- valid indexes: buf[0] .. buf[127]
So if you want to access the elements of buf, remember that indexes start from 0, not 1. Pay special attention to this whenever you use FFI to interact with C.
Regular expression matching
The second pitfall concerns regular expression matching. OpenResty has two parallel sets of string-matching methods: Lua's own string library, and the ngx.re.* API provided by OpenResty.
Lua's pattern matching uses its own syntax, which differs from PCRE. Here is a simple example:
resty -e 'print(string.match("foo 123 bar", "%d%d%d"))'  # prints 123
This code extracts the numeric part of the string, and you will notice that the syntax is quite different from the regular expressions we are used to. Lua's built-in pattern matching not only costs more to maintain but also performs poorly: it cannot be JIT-compiled, and a pattern that has been compiled once is not cached.
Therefore, when you use Lua's built-in string library for find and match operations and you need regular expressions, don't hesitate to use OpenResty's ngx.re instead. Only when searching for a fixed string should we consider calling the string library in plain mode.
Here is my suggestion: in OpenResty, always prefer the OpenResty API first, then the LuaJIT API, and use Lua's standard library with caution.
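The difference between pattern mode and plain mode in the string library is easiest to see with a magic character such as ".". This plain-Lua sketch uses only the standard string library (the input string is an arbitrary example); in OpenResty, real regular expressions would go through ngx.re instead.

```lua
local s = "version 1.2.3"

-- Pattern mode: "." is a magic character that matches any single character.
print(string.find(s, "."))             -- 1	1

-- Plain mode (4th argument true): "." is matched literally.
print(string.find(s, ".", 1, true))    -- 10	10

-- Lua patterns escape with %, not PCRE's backslash: %d is a digit, %. a literal dot.
print(string.match(s, "%d+%.%d+%.%d+")) -- 1.2.3
```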
Arrays and dicts cannot be distinguished when encoding JSON
The third pitfall is that arrays and dicts cannot be distinguished when encoding JSON. Since Lua has only one data structure, the table, it is naturally impossible to tell whether an empty table should be encoded as an array or as a dictionary:
resty -e 'local cjson = require "cjson"
local t = {}
print(cjson.encode(t))
'
The code above outputs {}, which shows that OpenResty's cjson library encodes an empty table as a dictionary (object) by default. Of course, we can change this global default with the encode_empty_table_as_object function:
resty -e 'local cjson = require "cjson"
cjson.encode_empty_table_as_object(false)
local t = {}
print(cjson.encode(t))
'
This time, the empty table is encoded as an array: [].
However, this global setting has a wide impact. Can we specify the encoding rule for just one table? Naturally, yes, and there are two ways to do it.
The first method is to assign the userdata cjson.empty_array to the table variable. It is then treated as an empty array during JSON encoding:
$ resty -e 'local cjson = require "cjson"
local t = cjson.empty_array
print(cjson.encode(t))
'
However, sometimes we can't be sure whether a given table will always be empty, and we want it encoded as an array only when it is empty. For that we use cjson.empty_array_mt, which is the second method.
It marks the specified table so that it is encoded as an array whenever the table is empty. As the name cjson.empty_array_mt suggests, it is a metatable, applied as in the following code:
$ resty -e 'local cjson = require "cjson"
local t = {}
setmetatable(t, cjson.empty_array_mt)
print(cjson.encode(t))
t = {123}
print(cjson.encode(t))
'
You can execute this code locally to see if the output is as you expected.
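To make the idea behind a marker metatable concrete, here is a deliberately tiny encoder in plain Lua. It is not cjson's implementation: empty_array_mt, is_sequence, and encode_flat are hypothetical names, and the encoder only handles flat tables whose values are numbers (string keys are quoted, nothing is escaped). It shows why the encoder needs an outside hint: an empty table looks the same either way, so a metatable identity check decides.

```lua
-- Hypothetical marker metatable, mimicking the idea behind cjson.empty_array_mt.
local empty_array_mt = {}

-- True when t has exactly the keys 1..n and nothing else.
local function is_sequence(t)
  local n = 0
  for _ in pairs(t) do n = n + 1 end
  for i = 1, n do
    if t[i] == nil then return false end
  end
  return true
end

-- Encode a flat table with number values (no nesting, no string escaping).
local function encode_flat(t)
  if next(t) == nil then
    -- An empty table is ambiguous; only the marker metatable can decide.
    if getmetatable(t) == empty_array_mt then return "[]" end
    return "{}"
  end
  local parts = {}
  if is_sequence(t) then
    for i = 1, #t do parts[i] = tostring(t[i]) end
    return "[" .. table.concat(parts, ",") .. "]"
  end
  for k, v in pairs(t) do
    parts[#parts + 1] = string.format("%q:%s", tostring(k), tostring(v))
  end
  return "{" .. table.concat(parts, ",") .. "}"
end

print(encode_flat({}))                                -- {}
print(encode_flat(setmetatable({}, empty_array_mt)))  -- []
print(encode_flat({123}))                             -- [123]
```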
Limit on the number of variables
Let's look at the fourth pitfall: the limit on the number of variables. In Lua, both the number of local variables and the number of upvalues in a function are limited. You can confirm this in the Lua source code (Lua 5.1's luaconf.h):
/*
@@ LUAI_MAXVARS is the maximum number of local variables per function
@* (must be smaller than 250).
*/
#define LUAI_MAXVARS 200
/*
@@ LUAI_MAXUPVALUES is the maximum number of upvalues per function
@* (must be smaller than 250).
*/
#define LUAI_MAXUPVALUES 60
These two thresholds are hard-coded to 200 and 60, respectively. Although you can manually modify the source code to adjust these two values, the maximum can only be set to 250.
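You can hit the local-variable limit directly by generating source code and compiling it. In this sketch, compile_with_locals is a hypothetical helper; it assumes loadstring (Lua 5.1/LuaJIT) or load (Lua 5.2+) for compiling a string, and the exact error wording differs between versions, so only the common phrase is relied on.

```lua
-- Compile a chunk that declares n locals in the main function, one per statement.
local load_chunk = loadstring or load  -- loadstring on Lua 5.1/LuaJIT, load on 5.2+

local function compile_with_locals(n)
  local lines = {}
  for i = 1, n do
    lines[i] = "local v" .. i .. " = " .. i
  end
  return load_chunk(table.concat(lines, "\n"))
end

local f200, err200 = compile_with_locals(200)
local f201, err201 = compile_with_locals(201)
print(f200 ~= nil)  -- true: 200 locals is still within the limit
print(f201 ~= nil)  -- false: one more local fails to compile
print(err201)       -- a compile error mentioning "local variables"
```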
Under normal circumstances we won't exceed these thresholds, but when writing OpenResty code you should still keep them in mind: don't use too many local variables and upvalues, and wherever possible use do ... end blocks to reduce the number of variables and upvalues in a given scope.
For example, let's look at the following pseudocode:
local re_find = ngx.re.find
function foo() ... end
function bar() ... end
function fn() ... end
If only the function foo uses re_find, we can rewrite it like this:
do
local re_find = ngx.re.find
function foo() ... end
end
function bar() ... end
function fn() ... end
This way, the local variable re_find no longer exists at the level of the main function. This is an optimization trick for a single large Lua file.
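Here is a runnable version of the do ... end trick. The function bodies are hypothetical, and string.find stands in for ngx.re.find so the sketch runs in plain Lua or LuaJIT without OpenResty. foo still works because it captured re_find as an upvalue, while outside the block the name no longer refers to any local.

```lua
-- Assumption: string.find stands in for ngx.re.find so this runs outside OpenResty.
local foo, bar

do
  -- re_find is visible only inside this block; foo captures it as an upvalue.
  local re_find = string.find
  function foo(s)            -- assigns to the outer local `foo`
    return re_find(s, "123", 1, true) ~= nil
  end
end

-- bar does not need re_find, and cannot see it.
function bar(s)              -- assigns to the outer local `bar`
  return #s
end

print(foo("abc123"))  -- true
print(re_find)        -- nil: outside the block, this name is an unset global
```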
Final words
From the perspective of "asking a few more whys": where does Lua's threshold of 250 come from? That is today's question for you to think about. You are welcome to leave a comment with your opinion, and to share this article with your colleagues and friends so we can communicate and improve together.