Heartbeat registration and offline cleanup for massive numbers of IoT devices: putting a mutex into practice under high-concurrency multithreading

 

1. Application Background

In IoT applications we need to maintain connections to many devices, for example long-lived connections over TCP sockets, both to obtain the data the devices collect and to control their digital switches or analog outputs in the reverse direction. We store these long-lived TCP connections in a thread-safe ConcurrentDictionary, the active dictionary, with the IP address as the key and the device box domain model as the value. This dictionary of active device boxes has to be maintained: a device whose heartbeat has timed out is called an offline device, and it is removed from the active dictionary and written to the offline dictionary. When an offline device next sends a heartbeat, it is moved back to the active dictionary, a recovery alarm is generated, and a series of other actions is performed.
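The life cycle described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: DeviceBox, DeviceRegistry, OnHeartbeat and SweepTimedOut are hypothetical names, and the sketch assumes plain dictionaries guarded by a single lock so that the move between the two dictionaries is atomic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the device box domain model.
public class DeviceBox
{
    public uint IP;
    public DateTime UpdateTime;
}

// Hypothetical registry holding the active and the offline dictionary.
public class DeviceRegistry
{
    private readonly object _gate = new object();
    public readonly Dictionary<uint, DeviceBox> Active = new Dictionary<uint, DeviceBox>();
    public readonly Dictionary<uint, DeviceBox> Offline = new Dictionary<uint, DeviceBox>();

    // Called for every heartbeat. Returns true when an offline device recovered.
    public bool OnHeartbeat(uint ip, DateTime now)
    {
        lock (_gate)
        {
            bool recovered = false;
            if (!Active.TryGetValue(ip, out var box))
            {
                // Either a brand-new device or one coming back from offline.
                recovered = Offline.Remove(ip, out box);
                if (box == null)
                {
                    box = new DeviceBox { IP = ip };
                }
                Active[ip] = box;
            }
            box.UpdateTime = now;
            return recovered;
        }
    }

    // Moves every device whose last heartbeat is older than the timeout
    // to the offline dictionary and returns how many were moved.
    public int SweepTimedOut(DateTime now, TimeSpan timeout)
    {
        lock (_gate)
        {
            var deadline = now - timeout;
            var dead = Active.Where(kv => kv.Value.UpdateTime < deadline).ToList();
            foreach (var kv in dead)
            {
                Active.Remove(kv.Key);
                Offline[kv.Key] = kv.Value;
            }
            return dead.Count;
        }
    }
}
```

A return value of true from OnHeartbeat is the point where a recovery alarm would be raised.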

2. Overall framework

2.1. Heartbeat registration framework

2.1.1. Massive number of devices

To simulate a TCP scenario with a massive number of devices, we used a simulator to generate 12,000 simulated devices, in addition to 8 real devices (12,008 in total).

2.1.2. Handler process for reporting heartbeat

The heartbeat reporting process is shown in detail in the framework diagram above.

It just occurred to me that I could write a whole series on IoT data collection systems and organize a table of contents for it. I hope I can stick with it.

2.2. Offline cleaning framework

2.2.1. Cleaning offline devices from the active dictionary

The principle is simple: iterate over the active dictionary, filter the entries whose last heartbeat is older than the configured detection period into an IEnumerable, and remove the corresponding timed-out keys (here, the IP addresses) from the active dictionary. The _internal period can also be multiplied by N, and multiple periods can be set in the configuration file. The configuration entry looks like this:

    "ipboxNumStaticInternal": 12

The cleanup method is as follows:

    public static void DeleteDeadBoxFromActiveBox(int _internal)
    {
        // Entries last updated before this instant have timed out.
        var outTime = DateTime.Now.AddSeconds(-_internal);
        var iboxTimeOutList = iboxActiveDictionary.Where(q => outTime > q.Value.UpdateTime);
        foreach (var item in iboxTimeOutList)
        {
            // ConcurrentDictionary exposes TryRemove rather than Remove(key).
            iboxActiveDictionary.TryRemove(item.Key, out _);
        }
    }

2.2.2. Flowchart of offline cleaning

Here a system timer is started that periodically calls the method for cleaning offline devices. The calling interval is ipboxNumStaticInternal, in seconds. The code is as follows:

    public void systemTimerStart()
    {
        var interval = ReadTheInternalFromSetting();
        _systemTimer = new Timer(state =>
        {
            IBoxActiveDicManager.DeleteDeadBoxFromActiveBox(_internal);
            Console.WriteLine("{1}, active device count: {0}\n", IBoxActiveDicManager.iboxActiveDictionary.Count, DateTime.Now);
        }, null, interval, interval);
        Console.WriteLine("PemsCom collection system timer started");
        LoggerHelper.Info("PemsCom collection system timer started");
    }

    /// <summary>
    /// Reads the time interval from the configuration file.
    /// </summary>
    /// <returns>The interval in milliseconds.</returns>
    private int ReadTheInternalFromSetting()
    {
        _internal = int.Parse(Appsettings.app(new string[] { "ipboxNumStaticInternal" }));
        Console.WriteLine("PemsCom collection system timer configuration read");
        LoggerHelper.Info("PemsCom collection system timer configuration read");
        // The setting is in seconds; the timer expects milliseconds.
        return Convert.ToInt32(TimeSpan.FromSeconds(_internal).TotalMilliseconds);
    }

3. Notes on multithreading and high concurrency

3.1. Notes on multithreading

Many threads take turns running on the CPU, for example:

  • the threads triggered by the receive events of the 12,008 devices;
  • the thread that periodically clears offline devices;
  • the main thread, which monitors command-line input and executes the corresponding commands.

Here is a practical example, with the screenshot as evidence.

With 12,008 devices, the peak number of network packets received per second is 9,218; that is, in that particular second the CPU executed 9,218 receive-handling threads in total. On a dual-core, four-thread CPU, for example, 9218 / 4 = 2304.5, so each hardware thread performs about 2,305 rounds in one second, or one round roughly every 0.43 milliseconds.

3.2. High concurrency description

Section 3.1 has in fact already explained high concurrency. In a given second there are nearly 10,000 receive events to process, and their execution order at that moment is undefined: with 9,218 threads, we cannot know which runs first and which runs later. Without some form of control, such as the mutex introduced in this article, abnormal behavior will appear.

4. Abnormal phenomena caused by high multi-thread concurrency

Only the symptoms are described here; the causes are explained in section 5, Analysis of abnormal causes.

4.1. Null reference

The exception occurs in the heartbeat handling class, shown below.

    public class HeartHandler
    {
        static string _deviceIndex = Appsettings.app(new string[] { "DeviceIndex" });
        private static IBoxActive iboxActive;

        public static void Register(TcpHeartPacket heartPacket, int sessId)
        {
            UInt32 IP;
            UInt64 mac;
            if (_deviceIndex == "IP")
            {
                IP = (UInt32)BitConverter.ToUInt32(heartPacket.IP, 0);
                if (IBoxActiveDicManager.GetBoxActive(IP, out iboxActive) != true)
                {
                    IBoxActiveDicManager.iboxActiveDictionary.TryAdd(IP, iboxActive);
                    iboxActive.SessID = sessId;
                }
            }
            else
            {
                mac = (UInt64)BitConverter.ToUInt64(heartPacket.Mac, 0);
                if (IBoxActiveDicManager.GetBoxActive(mac, out iboxActive) != true)
                {
                    IBoxActiveDicManager.iboxActiveDictionary.TryAdd(mac, iboxActive);
                    iboxActive.SessID = sessId;
                }
            }
            // Reference type: behaves like a smart pointer and is convenient to use
            iboxActive.UpdateTime = DateTime.Now;
        }
    }

4.2. Assignment to a dictionary element does not take effect

    /// <summary>
    /// Checks whether the reporting device box exists in the active device box dictionary.
    /// Returns true if it exists; otherwise returns false and creates a new device box model.
    /// </summary>
    /// <param name="IP"></param>
    /// <param name="iboxActive"></param>
    /// <returns></returns>
    public static bool GetBoxActive(UInt32 IP, out IBoxActive iboxActive)
    {
        if (iboxActiveDictionary.TryGetValue(IP, out iboxActive))
        {
            return true;
        }
        iboxActive = new IBoxActive();
        iboxActive.IP = IP;
        if (iboxActive.IP != IP)
        {
            LoggerHelper.Error(string.Format("Assignment on instantiation failed. iboxActive.IP:{0};IP{1}", iboxActive.IP, IP));
        }
        return false;
    }

Does this look strange? The value is assigned in one statement, yet in the very next statement the comparison finds it not equal. This can nevertheless happen under multithreaded concurrency, as analyzed in detail below.

4.3. The device count statistics are incorrect

Because errors occurred so easily under the high concurrency of 12,008 devices, the number of devices was reduced to 1,000. The statistics still came out wrong, which is likewise an error caused by high multithreaded concurrency.

5. Analysis of abnormal causes

5.1. Causes of null references

In fact, all three problems in section 4 share the same cause, so it is explained in detail here in 5.1 and only briefly in 5.2 and 5.3. Pay attention, this is the key point: analyzing exceptions under multithreaded high concurrency. A running program slips work into every gap it finds, like a seasoned driver squeezing through traffic; in short, threads execute in no particular order relative to one another. For example, while the device heartbeat thread is updating a device's heartbeat time, the offline cleanup thread may clean that device up. The time then cannot be assigned, because the object is null (it has already been cleaned up by the offline thread), so a null reference exception is reported. It is that simple, yet it took me a long time of debugging and thinking to understand this exception.

5.2. Causes of the unsuccessful device IP assignment

Similarly, right after the device instance is created and its IP is assigned, the offline cleanup thread happens to clear the device. The comparison then goes through the original reference, and that reference now points to a box holding the IP of another device box, so the IP addresses are not equal.
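One detail of the code in 4.1 may also play a part: iboxActive is declared as a private static field and passed as the out argument, so every receive thread writes to the same variable. The sketch below replays one possible interleaving step by step on a single thread. It is only an illustration under that assumption (Box, SharedFieldDemo and the step order are hypothetical), but it shows how an IP mismatch and a null reference could arise through the shared field alone.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the device box model.
public class Box
{
    public uint IP;
}

public static class SharedFieldDemo
{
    // Shared by all callers, like the static iboxActive field in HeartHandler.
    private static Box shared;

    private static readonly ConcurrentDictionary<uint, Box> active =
        new ConcurrentDictionary<uint, Box>();

    // Replays "thread A" and "thread B" by hand and returns what A observes.
    public static (uint ipSeenByA, bool nullSeenByA) Replay()
    {
        // A: device 1 is unknown, so A creates a box in the shared field.
        shared = new Box { IP = 1 };

        // B: runs in between and registers device 2 through the same field.
        shared = new Box { IP = 2 };

        // A: resumes and reads back "its" box, but finds B's box instead.
        uint ipSeenByA = shared.IP;

        // B: looks up an unknown key; a failed TryGetValue sets the out
        // argument to null.
        active.TryGetValue(3, out shared);

        // A: would now dereference null when updating the heartbeat time.
        bool nullSeenByA = shared == null;

        return (ipSeenByA, nullSeenByA);
    }
}
```

If this sharing does contribute, keeping the box in a local variable inside Register would remove it. A mutex around Register also prevents it, because only one thread at a time can then use the field.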

5.3. Causes of the incorrect device count

The cause is the same as in 5.2: if registration does not succeed, the count is of course wrong.

6. Solutions

The idea is this: while an active device instance is being created (the first heartbeat registration) or its heartbeat time is being updated (any later heartbeat), the unordered offline removal thread must not be allowed to run. Here is the key point: we need to guarantee the atomicity of the heartbeat registration process. This is very similar to atomicity in a relational database transaction. Atomicity is a powerful weapon against the exceptions caused by unordered execution. We can put a mutual exclusion lock around the heartbeat registration method, so that the runtime and the operating system serialize the conflicting operations instead of interleaving them.

7. Code implementation

The code is simple.

    // Define a lock
    public static Mutex activeIpboxDicMutex = new Mutex();

    // Lock around device box registration; all the exceptions are eliminated
    IBoxActiveDicManager.activeIpboxDicMutex.WaitOne();
    HeartHandler.Register(tcpHeartPacket, sessionId);
    IBoxActiveDicManager.activeIpboxDicMutex.ReleaseMutex();
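If Register throws, the ReleaseMutex call above is skipped and the mutex stays held. A common guard against that is a try/finally block. The following is a minimal sketch of the pattern, not the project's code: GuardedRegistry and RunLocked are hypothetical names.

```csharp
using System;
using System.Threading;

public static class GuardedRegistry
{
    // One mutex shared by every code path that touches the active dictionary.
    public static readonly Mutex ActiveDicMutex = new Mutex();

    // Runs the given action while holding the mutex and releases the mutex
    // even when the action throws.
    public static void RunLocked(Action action)
    {
        ActiveDicMutex.WaitOne();
        try
        {
            action();
        }
        finally
        {
            ActiveDicMutex.ReleaseMutex();
        }
    }
}
```

A call would then look like GuardedRegistry.RunLocked(() => HeartHandler.Register(tcpHeartPacket, sessionId)). Within a single process, the C# lock statement provides the same mutual exclusion, releases automatically on exceptions, and usually costs less than a Mutex.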

This closely resembles how a transaction is used: the main business logic goes in the middle. The analogy should make it easier to understand and remember. It is like a sandwich cookie.

    unitOfWork.BeginTransaction();

    // Adds new device
    unitOfWork.DeviceRepository.Add(device);

    // Commit transaction
    unitOfWork.Commit();

Of course, the offline cleanup thread for device boxes can take the same lock as well.

    IBoxActiveDicManager.activeIpboxDicMutex.WaitOne();
    IBoxActiveDicManager.DeleteDeadBoxFromActiveBox(_internal);
    IBoxActiveDicManager.activeIpboxDicMutex.ReleaseMutex();

Considering that locking the offline cleanup thread costs some performance, I also tested with that lock removed, and none of the three exceptions from section 4 occurred. With that, all the problems are solved.
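Since the exceptions stay away even without locking the cleanup thread, the critical part appears to be the registration path itself. As an alternative to the mutex, and not the approach used in this article, registration can be written so that it needs no explicit lock. The sketch below assumes a ConcurrentDictionary keyed by IP and uses its GetOrAdd method together with a local variable; ActiveBox and LockFreeRegister are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the active device box model.
public class ActiveBox
{
    public uint IP;
    public int SessID;
    public DateTime UpdateTime;
}

public static class LockFreeRegister
{
    public static readonly ConcurrentDictionary<uint, ActiveBox> Active =
        new ConcurrentDictionary<uint, ActiveBox>();

    public static void Register(uint ip, int sessId)
    {
        // GetOrAdd returns the box stored for this key, creating one if needed.
        // The result is held in a local variable, so threads never share it.
        ActiveBox box = Active.GetOrAdd(ip, key => new ActiveBox { IP = key, SessID = sessId });
        box.UpdateTime = DateTime.Now;
    }
}
```

With this shape, a cleanup pass that removes a box just after GetOrAdd returns causes no exception; the device is simply registered again on its next heartbeat. Whether that is acceptable depends on the alarm logic built on top of it.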

8. Summary

  • A small number of simulated devices cannot expose this problem, which shows the importance of testing with a massive number of devices: the three problems above will certainly occur in production, and they are all serious, even fatal. Good test methods nip such problems in the bud;

  • Multithreading and high concurrency are prone to all kinds of exceptions, so we must approach them with respect when thinking through and solving problems.

Origin www.cnblogs.com/guoguo251/p/12709946.html