Skynet source code analysis: Lua-layer message processing

The Lua-layer message processing mechanism lives in lualib/skynet.lua, which provides most of the Lua-layer APIs (these ultimately call into the C-layer APIs): what the Lua layer does when a snlua service starts, how new services are created, how service protocols are registered, how messages are sent, and how incoming messages are handled. This article focuses on the message processing mechanism in order to explain how skynet achieves high concurrency.

For simplicity, the coroutine_resume and coroutine_yield used in the code can be regarded as coroutine.resume and coroutine.yield.

local coroutine_resume = profile.resume
local coroutine_yield = profile.yield

1. Coroutine

coroutine.create creates a co. Its only parameter is the closure f that the co will execute; f is not executed at this point.

coroutine.resume runs a co. The first parameter is the co handle; on the first call, the remaining parameters are passed to the closure f. Once started, the co runs until it terminates or yields. On normal termination, resume returns true followed by the return values of f; if an error occurs, it returns false and the error message.

coroutine.yield suspends the co and gives up execution. The most recent resume then returns immediately with true followed by the arguments passed to yield. The next time the same co is resumed, execution continues from the yield point: the yield call returns, and its return values are the arguments of that resume other than the first.

The classic example in the Lua documentation shows that a coroutine (co for short) can be suspended and resumed repeatedly. Skynet uses coroutines extensively: when a service sends an RPC request, it suspends the current co and resumes it when the other side replies.
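The three calls can be tried out in plain Lua without skynet. The snippet below is a minimal sketch written for this article (it is not the example from the Lua manual); the variable names are illustrative only.

```lua
-- Plain Lua, no skynet required.
local co = coroutine.create(function(a, b)
    -- First resume: a and b are the extra arguments passed to resume.
    local c = coroutine.yield(a + b)   -- suspend and hand a + b to the resumer
    -- Second resume: c is the extra argument of that resume.
    local d = coroutine.yield(c * 2)
    return "done", d                   -- normal termination
end)

local ok1, v1 = coroutine.resume(co, 1, 2)      -- true, 3
local ok2, v2 = coroutine.resume(co, 10)        -- true, 20
local ok3, v3, v4 = coroutine.resume(co, "x")   -- true, "done", "x"
local ok4 = coroutine.resume(co)                -- false: the co is already dead

-- An error inside the closure makes resume return false plus the error message.
local bad = coroutine.create(function() error("boom") end)
local ok5, msg = coroutine.resume(bad)
```

Each resume runs the co only up to the next yield, which is the property skynet relies on to park a coroutine while it waits for a reply.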

 

2. How skynet creates a coroutine

Let us first look at how skynet creates a coroutine (co), which is done through co_create(f). The code is interesting: for performance, skynet keeps finished coroutines in a cache (line 9). When a coroutine finishes running its closure f, it does not terminate; it pauses instead (line 10).

When co_create is called and the cache is empty, a new co is created with coroutine.create. The closure f is not executed at this point. Later (usually when a message arrives and skynet.dispatch_message is called) the co is resumed with the required arguments and executes f (line 6). It then puts itself into the cache and pauses to wait for reuse, and the most recent resume returns true and "EXIT" (line 10).

When a cached co is reused, co_create resumes it with the new closure as the argument (line 15). The yield on line 10 returns and the closure is assigned to f, and the co pauses again on line 11. At some later point the co is resumed with the required arguments, executes f (line 11), and finally pauses on line 10 again to wait for the next use.

 1 -- lualib/skynet.lua
 2 local function co_create(f)
 3     local co = table.remove(coroutine_pool)
 4     if co == nil then
 5         co = coroutine.create(function(...)
 6             f(...)
 7             while true do
 8                 f = nil
 9                 coroutine_pool[#coroutine_pool+1] = co
10                 f = coroutine_yield "EXIT"
11                 f(coroutine_yield())
12             end
13         end)
14     else
15         coroutine_resume(co, f)
16     end
17     return co
18 end
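To see the reuse in action, the same pooling logic can be run outside skynet. This is a stand-alone sketch that assumes plain coroutine.resume and coroutine.yield in place of the profile wrappers; the log table and the two test closures are made up for the demonstration.

```lua
-- Stand-alone copy of the pooling idea, using plain coroutine.* functions.
local coroutine_pool = {}

local function co_create(f)
    local co = table.remove(coroutine_pool)
    if co == nil then
        co = coroutine.create(function(...)
            f(...)
            while true do
                f = nil
                coroutine_pool[#coroutine_pool + 1] = co
                f = coroutine.yield("EXIT")   -- parked in the pool until reused
                f(coroutine.yield())          -- wait for the arguments, then run f
            end
        end)
    else
        coroutine.resume(co, f)               -- hand the new closure to the parked co
    end
    return co
end

local log = {}

-- First use: the pool is empty, so a fresh coroutine is created.
local co1 = co_create(function(x) log[#log + 1] = "first:" .. x end)
local ok1, cmd1 = coroutine.resume(co1, "a")   -- runs the closure, then parks

-- Second use: the parked coroutine is taken out of the pool and reused.
local co2 = co_create(function(x) log[#log + 1] = "second:" .. x end)
local ok2, cmd2 = coroutine.resume(co2, "b")
```

Both calls hand back the same coroutine object, and each resume ends with the "EXIT" command that the dispatch code later passes to suspend.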


3. How to deal with Lua layer messages  

After understanding the principle of co_create, let’s take service A sending a message to service B as an example to illustrate how skynet processes Lua layer messages:

-- A.lua
local skynet = require "skynet"

skynet.start(function()
    print(skynet.call("B", "lua", "aaa"))
end)

-- B.lua
local skynet = require "skynet"
require "skynet.manager"

skynet.start(function()
    skynet.dispatch("lua", function(session, source, ...)
        skynet.ret(skynet.pack("OK"))
    end)
    skynet.register "B"
end)

skynet.start is called at the end of service startup. It calls skynet.timeout, which creates a co (line 12); this is the service's main coroutine, co1. At this point co1 has not been executed.

 1  -- lualib/skynet.lua
 2  function skynet.start(start_func)
 3      c.callback(skynet.dispatch_message)
 4      skynet.timeout(0, function()
 5          skynet.init_service(start_func)
 6      end)
 7  end
 8  
 9  function skynet.timeout(ti, func)
10      local session = c.intcommand("TIMEOUT",ti)
11      assert(session)
12      local co = co_create(func)
13      assert(session_id_coroutine[session] == nil)
14      session_id_coroutine[session] = co
15  end

When the timer fires (the timeout is 0, so it fires on the next tick), it sends a "RESPONSE" type message (PTYPE_RESPONSE = 1) to the service.

// skynet-src/skynet_timer.c
static inline void
dispatch_list(struct timer_node *current) {
    ...
    message.sz = (size_t)PTYPE_RESPONSE << MESSAGE_TYPE_SHIFT;
    ...
}

After the service receives the message, it calls the message dispatch API. Because the message type is RESPONSE, execution reaches line 7, which resumes the main coroutine co1 and runs its closure f (here, skynet.init_service(start_func)). If f contains no suspending operation, co1 pauses once f has run to completion and resume returns true and "EXIT", so line 7 becomes suspend(co, true, "EXIT").

1 -- lualib/skynet.lua
2 local function raw_dispatch_message(prototype, msg, sz, session, source)
3     -- skynet.PTYPE_RESPONSE = 1, read skynet.h
4     if prototype == 1 then
5         local co = session_id_coroutine[session]
6         ...
7         suspend(co, coroutine_resume(co, true, msg, sz))
8     ...
9 end
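The timer path (skynet.timeout registers a coroutine under a session, and the later RESPONSE message resumes it) can be imitated in plain Lua. This is only a sketch: fake_timeout_session is a hypothetical stand-in for c.intcommand("TIMEOUT", ti), dispatch_response is a simplified name for the RESPONSE branch, and the coroutine body returns "EXIT" directly instead of going through the pooled wrapper.

```lua
local session_id_coroutine = {}
local next_session = 0

-- Hypothetical stand-in for c.intcommand("TIMEOUT", ti): only allocates a session id.
local function fake_timeout_session()
    next_session = next_session + 1
    return next_session
end

-- Register func under a new session; func does not run yet.
local function timeout(func)
    local session = fake_timeout_session()
    local co = coroutine.create(function(...)
        func(...)
        return "EXIT"   -- assumption: imitates the "EXIT" yielded by co_create's wrapper
    end)
    assert(session_id_coroutine[session] == nil)
    session_id_coroutine[session] = co
    return session
end

-- Simplified RESPONSE branch: look up the coroutine by session and resume it.
local function dispatch_response(session, ...)
    local co = session_id_coroutine[session]
    session_id_coroutine[session] = nil
    return coroutine.resume(co, true, ...)
end

local started = false
local session = timeout(function() started = true end)
local before = started                        -- still false: nothing has run
local ok, cmd = dispatch_response(session)    -- the "timer message" arrives
```

The closure runs only when the response for its session is dispatched, which is why skynet.start can return before start_func has executed.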

suspend is then called. Because the command is "EXIT", it only does some cleanup work.

-- lualib/skynet.lua
function suspend(co, result, command, param, size)
    ...
    elseif command == "EXIT" then
        -- coroutine exit
        local address = session_coroutine_address[co]
        if address then
            release_watching(address)
            session_coroutine_id[co] = nil
            session_coroutine_address[co] = nil
            session_response[co] = nil
        end
    ...
end

When the closure f does contain a suspending operation, for example service A sends skynet.call("B", "lua", "aaa") to service B, services A and B handle it as follows:

For service A:

The message is first sent in the C layer (line 14, which pushes it onto the destination service's secondary message queue). co1 is then paused, and resume returns true, "CALL", and the session value.

 1 -- lualib/skynet.lua
 2 local function yield_call(service, session)
 3     watching_session[session] = service
 4     local succ, msg, sz = coroutine_yield("CALL", session)
 5     watching_session[session] = nil
 6     if not succ then
 7         error "call failed"
 8     end
 9     return msg,sz
10 end
11 
12 function skynet.call(addr, typename, ...)
13     local p = proto[typename]
14     local session = c.send(addr, p.id , nil , p.pack(...))
15     if session == nil then
16         error("call to invalid address " .. skynet.address(addr))
17     end
18     return p.unpack(yield_call(addr, session))
19 end

suspend(co, true, "CALL", session) is then called. Because the command is "CALL", co is stored in session_id_coroutine with session as the key. When service B replies to A's request, the co can be found by its session and resumed.

1 -- lualib/skynet.lua
2 function suspend(co, result, command, param, size)
3     ...
4     if command == "CALL" then
5         session_id_coroutine[param] = co
6     ...
7 end

When A receives B's reply, it calls the message dispatch API, finds the corresponding co (the main coroutine co1) by session, and resumes it from where it last paused. The yield in the line below returns, and print(...) in A.lua prints B's result. co1 has now run its whole closure, so it returns true and "EXIT" to suspend, which does some cleanup work for co1.

local succ, msg, sz = coroutine_yield("CALL", session)
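The round trip on service A's side can be simulated in plain Lua. This is a sketch under simplifying assumptions: no message is actually sent, the session number 7 is made up, suspend handles only the "CALL" case, and the coroutine body returns "EXIT" directly rather than yielding it from the pooled wrapper.

```lua
local session_id_coroutine = {}
local watching_session = {}

-- Same shape as skynet's yield_call, without the size argument.
local function yield_call(service, session)
    watching_session[session] = service
    local succ, msg = coroutine.yield("CALL", session)
    watching_session[session] = nil
    if not succ then
        error "call failed"
    end
    return msg
end

-- Reduced suspend: remembers which coroutine waits on which session.
local function suspend(co, result, command, param)
    if not result then
        error(tostring(command))
    end
    if command == "CALL" then
        session_id_coroutine[param] = co
    end
    return command
end

-- Reduced RESPONSE branch of the dispatcher.
local function dispatch_response(session, msg)
    local co = session_id_coroutine[session]
    session_id_coroutine[session] = nil
    return suspend(co, coroutine.resume(co, true, msg))
end

local result
local co1 = coroutine.create(function()
    result = yield_call("B", 7)   -- 7 is a hypothetical session id
    return "EXIT"                 -- assumption: stands in for the pooled wrapper
end)

local cmd1 = suspend(co1, coroutine.resume(co1))   -- co1 pauses inside yield_call
local waiting = session_id_coroutine[7]            -- co1 is parked under session 7
local cmd2 = dispatch_response(7, "OK")            -- B's reply resumes co1
```

Between the two dispatch steps the service is free to run other coroutines; co1 holds no thread while it waits.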

Now change A.lua slightly. While co1 is executing its closure f, it creates another coroutine (co2) with skynet.fork. Since co1 is never suspended, it runs its closure to completion; co2 has not been executed at this point.

1 -- A.lua
2 local skynet = require "skynet"
3 
4 skynet.start(function()
5     skynet.fork(function()
6         print(skynet.call("B", "lua", "aaa"))
7     end)
8 end)

1 -- lualib/skynet.lua
2 function skynet.fork(func,...)
3     local args = table.pack(...)
4     local co = co_create(function()
5         func(table.unpack(args,1,args.n))
6     end)
7     table.insert(fork_queue, co)
8     return co
9 end

The second thing the message dispatch API does is process the coroutines in fork_queue. So after handling the message sent back by the timer, it resumes co2, which pauses after sending its message to service B and is resumed again when B replies.

1 -- lualib/skynet.lua
2 function skynet.dispatch_message(...)
3     ...    
4     local fork_succ, fork_err = pcall(suspend,co,coroutine_resume(co))
5     ...
6 end
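The ordering (message handler first, forked coroutines second) can be checked with a small plain-Lua sketch. It assumes Lua 5.2 or later for table.pack and table.unpack, uses coroutine.create directly instead of co_create, and replaces the real dispatcher with a simplified loop; the handler argument and the order table exist only for the demonstration.

```lua
local fork_queue = {}
local order = {}

-- Same shape as skynet.fork, using coroutine.create instead of co_create.
local function fork(func, ...)
    local args = table.pack(...)
    local co = coroutine.create(function()
        func(table.unpack(args, 1, args.n))
    end)
    table.insert(fork_queue, co)
    return co
end

-- Simplified dispatcher: handle the message, then drain the fork queue.
local function dispatch_message(handler)
    handler()                                    -- first: the message itself
    while true do                                -- second: every queued fork
        local co = table.remove(fork_queue, 1)
        if co == nil then
            break
        end
        coroutine.resume(co)
    end
end

dispatch_message(function()
    fork(function(tag) order[#order + 1] = tag end, "co2")
    order[#order + 1] = "co1"                    -- co1 keeps running after fork
end)
```

The forked closure records "co2" only after the handler has recorded "co1", matching the behaviour described for the modified A.lua.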

For service B:

After receiving service A's message, B calls the message dispatch API, which creates a co (line 12). The closure f that this co executes is the registered message callback p.dispatch (line 4). The co is then started through resume (line 15).

 1 -- lualib/skynet.lua
 2 local function raw_dispatch_message(prototype, msg, sz, session, source)
 3     ...    
 4     local f = p.dispatch
 5     if f then
 6         local ref = watching_service[source]
 7         if ref then
 8             watching_service[source] = ref + 1
 9         else
10             watching_service[source] = 1
11         end
12         local co = co_create(f)
13         session_coroutine_id[co] = session
14         session_coroutine_address[co] = source
15         suspend(co, coroutine_resume(co, session,source, p.unpack(msg,sz)))
16     ...
17 end

Executing skynet.ret(skynet.pack("OK")) calls yield to suspend the co (line 4). The most recent resume returns, so line 15 above becomes suspend(co, true, "RETURN", msg, sz).

1 -- lualib/skynet.lua
2 function skynet.ret(msg, sz)
3     msg = msg or ""
4     return coroutine_yield("RETURN", msg, sz)
5 end

When command == "RETURN", suspend does two things: 1. it sends the reply message to the source address, service A (line 5); 2. it resumes co (line 7), so that co returns from skynet.ret and the rest of service B's message callback (p.dispatch) runs. Once the closure f of co has finished, co is put back into the cache and returns true and "EXIT" to suspend.

1 -- lualib/skynet.lua
2 function suspend(co, result, command, param, size) 
3     ...     
4     elseif command == "RETURN" then
5         ret = c.send(co_address, skynet.PTYPE_RESPONSE, co_session, param, size) ~= nil
6         ...
7         return suspend(co, coroutine_resume(co, ret))
8     ...
9 end
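Service B's side can be simulated the same way. In this sketch fake_send is a hypothetical stand-in for c.send that only records what would be sent, suspend takes the source and session as explicit parameters instead of looking them up in tables, and the address "A" and session 7 are made up.

```lua
local sent = {}

-- Hypothetical stand-in for c.send: records the reply instead of sending it.
local function fake_send(addr, session, msg)
    sent[#sent + 1] = { addr = addr, session = session, msg = msg }
    return true
end

-- Same shape as skynet.ret, without the size argument.
local function ret(msg)
    return coroutine.yield("RETURN", msg or "")
end

-- Reduced suspend: only the "RETURN" command is handled.
local function suspend(co, source, session, result, command, param)
    if not result then
        error(tostring(command))
    end
    if command == "RETURN" then
        local ok = fake_send(source, session, param)               -- 1. reply to the caller
        return suspend(co, source, session, coroutine.resume(co, ok)) -- 2. resume the handler
    end
    return command
end

-- Reduced request branch: one coroutine per incoming request.
local function dispatch_request(handler, source, session, ...)
    local co = coroutine.create(function(...)
        handler(...)
        return "EXIT"   -- assumption: stands in for the pooled wrapper
    end)
    return suspend(co, source, session, coroutine.resume(co, session, source, ...))
end

local ret_value
local final = dispatch_request(function(session, source, text)
    ret_value = ret("OK:" .. text)   -- pauses here until the reply has been sent
end, "A", 7, "aaa")
```

The handler continues after ret only once the reply has been handed to the send function, and the value ret returns tells it whether the send succeeded.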

This completes the Lua-layer message processing flow.

4. Exception handling

Some situations require exception handling: no protocol is registered for the message type, no message callback is provided, or an error occurs while a co is executing. When an exception occurs while a service is processing a message, two things must be done: 1. terminate the current co abnormally; 2. notify the sender of the message so that it is not left waiting forever.

When an error occurs while a co is executing, the first return value of resume is false. suspend is called, a PTYPE_ERROR message is sent to the other side (line 9), and an error is then raised to terminate the current co (line 14).

 1 -- lualib/skynet.lua
 2 function suspend(co, result, command, param, size)
 3     if not result then
 4         local session = session_coroutine_id[co]
 5         if session then -- coroutine may fork by others (session is nil)
 6             local addr = session_coroutine_address[co]
 7             if session ~= 0 then
 8                 -- only call response error
 9                 c.send(addr, skynet.PTYPE_ERROR, session, "")
10             end
11             session_coroutine_id[co] = nil
12             session_coroutine_address[co] = nil
13         end
14         error(debug.traceback(co,tostring(command)))
15     end
16     ...
17 end

In most abnormal situations, a PTYPE_ERROR message is sent to notify the other side. When a PTYPE_ERROR message is received, _error_dispatch is called. If error_session is 0, the source service is down: error_source is recorded in dead_service and every session waiting on that service is added to error_queue. Otherwise only error_session is added to error_queue.

 1 -- lualib/skynet.lua
 2 local function _error_dispatch(error_session, error_source)
 3     if error_session == 0 then
 4         -- service is down
 5         --  Don't remove from watching_service , because user may call dead service
 6         if watching_service[error_source] then
 7              dead_service[error_source] = true
 8         end
 9         for session, srv in pairs(watching_session) do
10             if srv == error_source then
11                 table.insert(error_queue, session)
12             end
13         end
14     else
15         -- capture an error for error_session
16         if watching_session[error_session] then
17             table.insert(error_queue, error_session)
18         end
19     end
20 end

At the end of suspend, dispatch_error_queue is called to process error_queue. It finds the waiting co by its session and resumes it with false, which makes the pending call fail and ensures that the co does not wait forever.

1 -- lualib/skynet.lua
2 local function dispatch_error_queue()
3     local session = table.remove(error_queue,1)
4     if session then
5         local co = session_id_coroutine[session]
6         session_id_coroutine[session] = nil
7         return suspend(co, coroutine_resume(co, false))
8     end
9 end
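The effect of resuming a waiting coroutine with false can be reproduced in plain Lua. This sketch keeps only the non-zero-session branch of the error dispatcher (named error_dispatch here), stores the coroutine by hand where suspend would do it for "CALL", and returns the raw result of coroutine.resume instead of passing it on to suspend; session 7 is made up.

```lua
local session_id_coroutine = {}
local watching_session = {}
local error_queue = {}

-- Same shape as skynet's yield_call, without the size argument.
local function yield_call(service, session)
    watching_session[session] = service
    local succ, msg = coroutine.yield("CALL", session)
    watching_session[session] = nil
    if not succ then
        error("call failed")
    end
    return msg
end

-- Reduced error dispatcher: only the "error for one session" branch.
local function error_dispatch(error_session)
    if watching_session[error_session] then
        table.insert(error_queue, error_session)
    end
end

-- Like dispatch_error_queue, but returning resume's raw result.
local function dispatch_error_queue()
    local session = table.remove(error_queue, 1)
    if session then
        local co = session_id_coroutine[session]
        session_id_coroutine[session] = nil
        return coroutine.resume(co, false)
    end
end

local co = coroutine.create(function()
    return yield_call("B", 7)            -- 7 is a hypothetical session id
end)
local _, cmd, session = coroutine.resume(co)
session_id_coroutine[session] = co       -- what suspend does for "CALL"

error_dispatch(7)                        -- a PTYPE_ERROR for session 7 arrives
local ok, err = dispatch_error_queue()   -- the waiting co is resumed with false
```

The false passed to resume becomes succ inside yield_call, so the caller sees a "call failed" error at the point where it called skynet.call rather than hanging.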

5. Summary

A synchronous RPC request follows the flow described above. While one co of a service is suspended, the other cos in that service can run. Any number of cos can interleave, and suspending one does not block the others, which makes full use of the available computing power and achieves high concurrency.

 

 


Origin blog.csdn.net/Linuxhus/article/details/111669559