HaaS100 Development and Debugging Series: Using the AliOS Things Diagnostic and Debugging Component to Locate Bugs

1. Background

In embedded development, careless code produces bugs. Everyone knows how much damage a bug in C can do: it can take down the whole system!

The usual symptoms are system crashes, unexpected restarts, and so on. The root cause is often hard to find, so we fall back on the "printing" method and add printf calls over and over again.

And every change to the code means going through the painful "compile-flash-run-reproduce" cycle again. Before you know it, a day has passed and the bug is still not solved.

 

So we often think: if only the system could tell us directly where the bug is and what the error is, we could fix the code in minutes and save a lot of development time!

This is also why we often use an emulator (such as J-Link). After the system crashes, we can attach the emulator, check where the PC is, and view the call stack with bt to help locate the problem.

But this places requirements on the hardware, which must support an emulator connection, and developers sometimes have to spend time setting up the environment.

HaaS100 does support connecting a hardware emulator (see the previous post: "HaaS100 Development and Debugging Series: How to Use the J-Link Emulator to Debug Code").

Here, however, we introduce a more convenient way to locate abnormal system crashes: the diagnostic and debugging component of AliOS Things.

 

2. Introduction to diagnostic and debugging components

Diagnosis and debugging covers a lot of ground. We have introduced some debugging commands before; see the article "An Easy Introduction to HaaS100 Diagnosis and Debugging System".

In this article, we focus on how the diagnostic and debugging component of AliOS Things helps solve code bugs.

 

The diagnostic and debugging component can shorten the time needed to locate a bug.

If a bug causes a system exception, the user can quickly find its cause without connecting an emulator, adding print statements, or single-stepping in gdb.

At the very least, the component points out the likely trouble spots so that they can be fixed, saving development time.

 

For example:

  • The code accesses illegal memory (for example, it writes to an unwritable address such as address 0), which crashes the system. The AliOS Things diagnostic and debugging component records the pc value at the moment of the illegal access and tells the user which line the crash occurred on;
  • The code runs away (pc=0). The component records the function call stack, and backtracing the stack reveals the call sequence A->B->C, similar to the bt command in the emulator;
  • malloc fails when the user requests memory. The component records how much memory the user requested, how much memory was still available in the system at that time, which task made the request, and the history of memory requests since system startup. This information helps developers determine whether there is a memory leak;
  • ......

The diagnostic and debugging component of AliOS Things can do a lot more, which we will introduce in future articles.

Today we look at just one question: when a bug occurs and the system crashes, what does AliOS Things do?

The answer in one sentence: it outputs the key logs that help you locate the problem. This is also the most important part of the diagnostic and debugging component of AliOS Things.

 

3. What does the AliOS Things exception log look like?

Here is the log output by HaaS100:

!!!!!!!!!! Exception  !!!!!!!!!!
========== Regs info  ==========  Register information at the exception site
R0      0x00000000
R1      0x34027F20
R2      0x34027F30
R3      0x340251B4
R4      0xFFFFFFFF
R5      0x00000000
R6      0x2C0D2C72
R7      0x00000001
R8      0x2C0D2C86
R9      0x2C0D236B
R10     0x00000000
R11     0x00000000
R12     0x0000C000
LR      0x1C5D6CC3
PC      0x1C5D6CC2
xPSR    0x61000000
SP      0x34025118
EXC_RET 0xFFFFFFBC
EXC_NUM 0x00000006
PRIMASK 0x00000000
FLTMASK 0x00000000
BASEPRI 0x00000000
CFSR    0x01000000
HFSR    0x00000000
MMFAR   0xE000ED34
BFAR    0xE000ED38
AFSR    0x00000000
========== Stack info ==========  Stack information at the exception site
stack(0x34025118): 0x34027D20 0x340251B4 0x00000000 0x34022F98 
stack(0x34025128): 0x00000000 0x34682380 0x1C5D6BBD 0x00000005 
stack(0x34025138): 0x00000006 0x1C5D621F 0x00000003 0x2C0D2A0C 
stack(0x34025148): 0x340230C4 0x00000013 0x34682280 0x00000000 
stack(0x34025158): 0x000000F7 0x00000005 0x00000003 0x00000000 
stack(0x34025168): 0x00000000 0x00000000 0x00000001 0x34682380 
stack(0x34025178): 0x34022F98 0x0000000B 0x34022FA8 0x00000000 
stack(0x34025188): 0x000000F6 0x00000001 0x2C0D294D 0x1C5D63D3 
stack(0x34025198): 0x00000000 0x2C0D1BD0 0x00000000 0x0D000000 
stack(0x340251A8): 0x78300070 0x66666666 0x66666666 0x00003100 
stack(0x340251B8): 0x00000000 0x00000000 0x00000000 0x00000000 
stack(0x340251C8): 0x00000000 0x00000000 0x00000000 0x00000000 
stack(0x340251D8): 0x00000000 0x00000000 0x00000000 0x00000000 
stack(0x340251E8): 0x00000000 0x00000000 0x00000000 0x00000000 
stack(0x340251F8): 0x00000000 0x00000000 0x00000000 0x00000000 
stack(0x34025208): 0x00000000 0x00000000 0x00000000 0x00000000 
========== Call stack ==========  Stack backtrace information, which shows the function call sequence
backtrace : 0x1C5D6CC2 
backtrace : 0x1C5D621C 
backtrace : 0x1C5D63CE 
backtrace : ^task entry^
========== Heap Info  ==========  System memory information at this moment, showing how much memory has been allocated and how much remains
---------------------------------------------------------------------------
[HEAP]| TotalSz    | FreeSz     | UsedSz     | MinFreeSz  | MaxFreeBlkSz  |
      | 0x00680000 | 0x0065A300 | 0x00025D00 | 0x00659E20 | 0x0065A300    |
---------------------------------------------------------------------------
========== Task Info  ==========   Status of the current system tasks, showing whether any task stack is too small
--------------------------------------------------------------------------
TaskName             State    Prio       Stack      StackSize (MinFree)
--------------------------------------------------------------------------
dyn_mem_proc_task    PEND     0x00000006 0x2004B938 0x00000400(0x0000035C)
idle_task            RDY      0x0000003D 0x2004BE0C 0x00001000(0x00000F94)
DEFAULT-WORKQUEUE    PEND     0x00000014 0x2004F1E8 0x00000C00(0x00000B7C)
timer_task           PEND     0x00000005 0x2004D0D8 0x00002000(0x00001F48)
main                 SLP      0x00000021 0x2015A000 0x00005000(0x000044C4)
transq_msg           PEND     0x0000001F 0x3469A4C4 0x00001000(0x00000680)
apps_recover         SLP      0x00000021 0x2004A588 0x00001000(0x00000F64)
temp_main            SLP      0x00000021 0x346A15D8 0x00001000(0x00000F80)
main_task            SLP      0x00000020 0x34002668 0x00020000(0x0001F6C4)
cli                  RDY      0x0000003C 0x340232D0 0x00002000(0x0000180C)
ulog                 PEND     0x0000003C 0x34026890 0x00000C00(0x00000A58)
========== Queue Info ==========   AliOS Things kernel queue usage information
-------------------------------------------------------
QueAddr    TotalSize  PeakNum    CurrNum    TaskWaiting
-------------------------------------------------------

======== Buf Queue Info ========  AliOS Things kernel buf queue usage information
------------------------------------------------------------------
BufQueAddr TotalSize  PeakNum    CurrNum    MinFreeSz  TaskWaiting
------------------------------------------------------------------
0x2004FDE8 0x000001E0 0x00000000 0x00000000 0x000001E0 timer_task          
0x34025420 0x00001400 0x00000000 0x00000000 0x00001400 ulog

=========== Sem Info ===========  AliOS Things kernel semaphore usage information
--------------------------------------------
SemAddr    Count      PeakCount  TaskWaiting
--------------------------------------------
0x2004CF60 0x00000000 0x00000000 dyn_mem_proc_task   
0x2004F1B8 0x00000000 0x00000000 DEFAULT-WORKQUEUE   
0x340023A0 0x00000001 0x00000001                     
0x34002478 0x00000000 0x00000000                     
0x340025D0 0x00000001 0x00000001                     
0x34682C34 0x00000000 0x00000000                     
0x34682C58 0x00000000 0x00000000                     
0x340275B0 0x00000000 0x00000000                     
!!!!!!!!!! dump end   !!!!!!!!!!

3.1 Log analysis

The log above is what AliOS Things outputs after a system exception occurs on HaaS100. It can be divided into:

  • Exception-site registers: the general-purpose registers and some architecture-specific special registers;
  • Exception-site stack: the stack contents of the task that raised the exception;
  • Stack backtrace: the call stack that led to the exception, similar to the bt command in the emulator. This is the most important part of the exception log;
  • Memory information: the memory status of the system at this moment, useful for locating memory leaks;
  • Task information: the status of the current system tasks, useful for locating task stack overflows;
  • Kernel information: the status of the queues, buf queues, and semaphores in the kernel.
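
As a rough illustration of this structure, the following Python sketch splits a captured dump into its sections by looking for the "=====" banner lines. It is only a sketch and not part of AliOS Things: the function name split_sections and the shortened sample text are assumptions made for illustration, and the banner format is inferred solely from the log shown above.

```python
import re

# Matches banner lines such as "========== Regs info  ==========" and
# captures the title between the runs of '='. Any trailing annotation
# after the closing '=' run is ignored.
SECTION_RE = re.compile(r"^=+\s*(.+?)\s*=+")


def split_sections(log_text):
    """Return {section title: [non-empty lines]} for an exception dump."""
    sections = {}
    current = None
    for line in log_text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif line.startswith("!!!!!!!!!! dump end"):
            current = None
        elif current is not None and line.strip():
            sections[current].append(line.rstrip())
    return sections


# Shortened sample, copied from the dump above (most lines omitted).
sample = """\
!!!!!!!!!! Exception  !!!!!!!!!!
========== Regs info  ==========
PC      0x1C5D6CC2
CFSR    0x01000000
========== Call stack ==========
backtrace : 0x1C5D6CC2
backtrace : ^task entry^
========== Heap Info  ==========
!!!!!!!!!! dump end   !!!!!!!!!!
"""

sections = split_sections(sample)
print(list(sections))  # ['Regs info', 'Call stack', 'Heap Info']
```

Keeping each section as a plain list of lines makes it easy to hand a single part, for example the Call stack, to a more specific parser.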

 

The log contains a lot of kernel-related content. We will publish a separate article introducing the AliOS Things kernel in the future.
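
As one example of using the kernel-related content, the Task Info table can be checked for tasks whose stack is close to exhausted. The Python sketch below assumes that each row has the layout shown in the dump above, with StackSize followed by the minimum free size in parentheses, and only compares the two values as a ratio, so it does not matter whether the unit is bytes or words. The function name stack_usage and the 50% threshold are hypothetical choices for illustration.

```python
import re

# One Task Info row: name, state, priority, stack address, size(min free).
TASK_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<state>\S+)\s+0x[0-9A-Fa-f]+\s+0x[0-9A-Fa-f]+\s+"
    r"(?P<size>0x[0-9A-Fa-f]+)\((?P<min_free>0x[0-9A-Fa-f]+)\)"
)


def stack_usage(task_lines):
    """Return {task name: peak fraction of the stack that was used}."""
    usage = {}
    for line in task_lines:
        match = TASK_RE.match(line.strip())
        if not match:
            continue  # header, separator, or unrelated line
        size = int(match.group("size"), 16)
        min_free = int(match.group("min_free"), 16)
        if size > 0:
            usage[match.group("name")] = (size - min_free) / size
    return usage


# Two rows copied from the Task Info section of the dump above.
rows = [
    "transq_msg           PEND     0x0000001F 0x3469A4C4 0x00001000(0x00000680)",
    "cli                  RDY      0x0000003C 0x340232D0 0x00002000(0x0000180C)",
]

THRESHOLD = 0.5  # hypothetical warning level, tune for your project
usage = stack_usage(rows)
for name, used in usage.items():
    flag = "  <-- worth a look" if used > THRESHOLD else ""
    print(f"{name}: peak stack use {used:.0%}{flag}")
```

A task whose peak use approaches 100% is a candidate for a larger stack; a MinFree of zero would suggest the stack has already overflowed.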

 

3.2 How to enable the diagnostic and debugging component

Users only need to add the debug component in aos.mk, then recompile, flash, and power on.

$(NAME)_COMPONENTS += debug

 

3.3 How to generate a system exception

In theory, any system exception produces a log similar to the one above. If you want to trigger a system exception yourself, the following simple method can be used:

m 0xffffffff 1

That is, use the cli command provided by the system to rewrite the memory value at address 0xffffffff to 1. Address 0xffffffff is an unwritable area on HaaS100, so rewriting this value triggers a system exception and prints the log above.
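
The register section of the resulting log also hints at what kind of fault was raised. The Python sketch below decodes the EXC_NUM and CFSR values, assuming they correspond to the Arm Cortex-M exception number and Configurable Fault Status Register of the same names; that mapping is an assumption based on the register names in the dump, not something the log itself states. The bit positions follow the Arm architecture definition of CFSR, and the function names are made up for this example.

```python
# CFSR bit positions as defined by the Arm Cortex-M architecture.
CFSR_BITS = {
    0: "IACCVIOL", 1: "DACCVIOL", 3: "MUNSTKERR", 4: "MSTKERR",
    5: "MLSPERR", 7: "MMARVALID",
    8: "IBUSERR", 9: "PRECISERR", 10: "IMPRECISERR", 11: "UNSTKERR",
    12: "STKERR", 13: "LSPERR", 15: "BFARVALID",
    16: "UNDEFINSTR", 17: "INVSTATE", 18: "INVPC", 19: "NOCP",
    24: "UNALIGNED", 25: "DIVBYZERO",
}

# Cortex-M fault exception numbers.
EXCEPTION_NAMES = {3: "HardFault", 4: "MemManage", 5: "BusFault", 6: "UsageFault"}


def decode_cfsr(value):
    """Return the names of all CFSR flags set in value, lowest bit first."""
    return [name for bit, name in sorted(CFSR_BITS.items()) if value & (1 << bit)]


def describe_fault(exc_num, cfsr):
    """Summarize the exception number and CFSR flags in one line."""
    name = EXCEPTION_NAMES.get(exc_num, f"exception {exc_num}")
    flags = decode_cfsr(cfsr) or ["no CFSR flags set"]
    return f"{name}: {', '.join(flags)}"


# Values taken from the EXC_NUM and CFSR lines of the dump above.
print(describe_fault(0x00000006, 0x01000000))  # UsageFault: UNALIGNED
```

If the assumption holds, the sample dump reads as a UsageFault with the UNALIGNED flag set, which would be consistent with a multi-byte write to the odd address 0xffffffff.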

For how to use the cli commands, see the other article, "An Easy Introduction to HaaS100 Diagnosis and Debugging System".

 

3.4 The value of the call stack

The call stack output is the core of the AliOS Things diagnostic and debugging component. After triggering an exception with the command above, we use the arm-none-eabi-addr2line command that comes with the toolchain to resolve the addresses in the Call stack section of the log. The usage is:

 arm-none-eabi-addr2line -pfiCe xxx.elf addr

Taking the call stack addresses output in the log as an example:

./build/compiler/gcc-arm-none-eabi/Linux64/bin/arm-none-eabi-addr2line -pfiCe out/debug_demo@haas100/binary/[email protected] 0x1C5D6CC2 0x1C5D621C 0x1C5D63CE

This resolves the code locations corresponding to the call stack, such as:

pmem_cmd at /workspace/hass/AliOS-Things/core/cli/cli_default_command.c:224

proc_onecmd at /workspace/hass/AliOS-Things/core/cli/cli.c:173
 (inlined by) cli_handle_input at /workspace/hass/AliOS-Things/core/cli/cli.c:290

cli_main at /workspace/hass/AliOS-Things/core/cli/cli.c:781

We can clearly see the function call sequence that led to the exception, along with the path and line number of each function's code.

cli_main --> proc_onecmd --> pmem_cmd
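
Copying addresses out of the log by hand gets tedious, so the step from log to addr2line command can be scripted. The Python sketch below pulls the addresses from the "backtrace :" lines and builds the command with the same -pfiCe options used above. The function names and the ELF path app.elf are placeholders chosen for illustration; substitute the path of your own build output.

```python
import re
import shlex

# A backtrace line looks like "backtrace : 0x1C5D6CC2". The final
# "backtrace : ^task entry^" marker carries no address and is skipped.
BACKTRACE_RE = re.compile(r"^backtrace\s*:\s*(0x[0-9A-Fa-f]+)")


def backtrace_addresses(log_text):
    """Return the call stack addresses in the order they appear in the log."""
    addresses = []
    for line in log_text.splitlines():
        match = BACKTRACE_RE.match(line.strip())
        if match:
            addresses.append(match.group(1))
    return addresses


def addr2line_command(elf_path, addresses, tool="arm-none-eabi-addr2line"):
    """Build the argument list for addr2line (does not run it)."""
    return [tool, "-pfiCe", elf_path, *addresses]


# Call stack section copied from the dump above.
log = """\
========== Call stack ==========
backtrace : 0x1C5D6CC2
backtrace : 0x1C5D621C
backtrace : 0x1C5D63CE
backtrace : ^task entry^
"""

# "app.elf" is a placeholder path, not the real build artifact name.
cmd = addr2line_command("app.elf", backtrace_addresses(log))
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run once the toolchain is on the PATH; printing it first makes it easy to check the command before running it.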

 

4. The author's words

With this method, the location of the exception is clear at a glance. We quickly found the offending line of code, fixed it, and solved the bug in minutes. Now you can happily get back to work!

However, the AliOS Things diagnostic and debugging component can only help you spend less time solving bugs. Some bugs do not cause a system exception at all, but quietly leave the system unstable. In those cases, no diagnostic tool, however good, will help.

You still need to keep honing your coding skills. Not producing bugs in the first place is what we should strive for!

 

5. Developer technical support

If you need more technical support, you can join the DingTalk developer group.

For more technology and solution introductions, please visit the Aliyun AIoT homepage https://iot.aliyun.com/


Origin blog.csdn.net/HaaSTech/article/details/111603128