30 solutions to common microcontroller problems that I don't share with just anyone!

Hello everyone, my name is Wheat.

During everyday project development you will run into all kinds of problems. This article shares some ideas and solutions for the common ones.

1. Reproducing the problem

A problem can be correctly located, fixed, and verified only if it can be reproduced reliably. In general, the easier a problem is to reproduce, the easier it is to solve.

1.1 Simulate the conditions that trigger the problem

Some problems occur only under specific conditions and can be reproduced only by simulating those conditions. If the conditions depend on external input and are too complex to simulate, consider presetting the state in the program so that it enters the relevant state directly.
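A minimal sketch of the preset idea, assuming a hypothetical state machine whose fault state is normally reached only after a long external sequence. The names `app_state_t`, `APP_DEBUG_PRESET_STATE`, `app_init`, and `app_get_state` are illustrative and do not come from the original project.

```c
/* Hypothetical states of an application state machine. */
typedef enum {
    APP_STATE_IDLE = 0,
    APP_STATE_CHARGING,
    APP_STATE_FAULT
} app_state_t;

/* Define this at build time (for example -DAPP_DEBUG_PRESET_STATE=APP_STATE_FAULT)
 * to start directly in the state where the problem occurs. */
#ifndef APP_DEBUG_PRESET_STATE
#define APP_DEBUG_PRESET_STATE APP_STATE_IDLE
#endif

static app_state_t g_app_state;

/* Initialise the state machine, honouring the debug preset if one is given. */
void app_init(void)
{
    g_app_state = APP_DEBUG_PRESET_STATE;
}

/* Read the current state. */
app_state_t app_get_state(void)
{
    return g_app_state;
}
```

With the default build the machine starts in the idle state; a debug build that overrides the macro skips straight to the state under investigation.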

1.2 Increase the execution frequency of the related task

For example, if an exception appears only after a task has been running for a long time, increase how often the task executes so that the exception shows up sooner.

1.3 Increase the test sample size

If the program misbehaves only after running for a long time, the problem is hard to reproduce on a single device. Build a test environment with several sets of equipment and test them in parallel.

2. Locating the problem

Narrow the scope of the investigation step by step until you identify the task, function, and statement that introduced the problem.

2.1 Print logs

Based on the symptoms, add log output at the suspect code to trace the program's execution flow and the values of key variables, and check whether they match expectations.
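A minimal sketch of such a log facility, assuming that `printf` has been retargeted to a UART or debug channel on the target. The names `LOG`, `LOG_ENABLE`, `log_write`, and `scale_adc` are hypothetical and used only for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Set to 0 to compile all log output out of release builds. */
#ifndef LOG_ENABLE
#define LOG_ENABLE 1
#endif

/* Print "[file:line] message" so the execution flow can be followed. */
static int log_write(const char *file, int line, const char *fmt, ...)
{
    va_list args;
    int n = printf("[%s:%d] ", file, line);

    va_start(args, fmt);
    n += vprintf(fmt, args);
    va_end(args);
    n += printf("\n");
    return n;
}

#if LOG_ENABLE
#define LOG(...) ((void)log_write(__FILE__, __LINE__, __VA_ARGS__))
#else
#define LOG(...) ((void)sizeof(&log_write))  /* no output, keeps log_write referenced */
#endif

/* Example: trace a key variable around a suspicious calculation. */
int scale_adc(int raw)
{
    int mv = raw * 3300 / 4096;  /* 12-bit reading scaled to millivolts */

    LOG("scale_adc: raw=%d mv=%d", raw, mv);
    return mv;
}
```

Logging both the input and the result on one line makes it easy to compare the observed values with the expected ones when reading the trace.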

2.2 Online debugging

Online debugging serves a similar purpose to printing logs. It is especially well suited to bugs such as program crashes: when the program ends up in an exception handler (HardFault, watchdog interrupt, and so on), you can halt it directly and inspect the call stack and the core register values to locate the fault quickly.

2.3 Version rollback

If you use a version control tool, you can roll back and test successive versions to find the first version that shows the problem, and then review the code added or changed in that version.

2.4 Commenting out code by bisection

Commenting out by bisection means disabling parts of the code in the manner of a binary search to determine whether the disabled part causes the problem.

Specifically, comment out half of the code that appears unrelated to the problem and check whether the problem disappears. If it does not, comment out the other half instead. Keep halving the commented-out range in this way to narrow down where the problem lies.

2.5 Save a snapshot of the core registers

When a Cortex-M core takes an exception, it pushes the values of several core registers onto the stack, as shown in the following figure:

82fef47af66702e0960d8743439222d3.png

In the exception handler, we can copy the stacked register values into a RAM region that keeps its contents across a reset. After the reset, the program reads this information back from RAM for analysis: the PC and LR show which function was executing at the time, R0-R3 show whether the values being processed were abnormal, and the SP shows whether a stack overflow may have occurred.
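The portable part of this idea can be sketched as follows. It assumes that `g_fault_snapshot` is placed in a RAM section that the startup code does not initialise (the placement depends on the toolchain and linker script and is not shown), and that a small assembly shim in the fault handler passes in the stack pointer that was active when the exception was taken. All names here are hypothetical.

```c
#include <stdint.h>

#define SNAPSHOT_MAGIC 0xDEADBEEFu  /* marks the snapshot as valid */

/* Registers a Cortex-M core stacks automatically on exception entry,
 * in the order they appear in memory starting at the stacked SP. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

typedef struct {
    uint32_t magic;
    uint32_t sp;              /* address of the stacked frame */
    exception_frame_t frame;
} fault_snapshot_t;

/* Assumption: on the target this lives in a no-init RAM section. */
fault_snapshot_t g_fault_snapshot;

/* Called from the fault handler with a pointer to the stacked frame. */
void fault_snapshot_save(const uint32_t *stacked_sp)
{
    g_fault_snapshot.sp = (uint32_t)(uintptr_t)stacked_sp;
    g_fault_snapshot.frame.r0   = stacked_sp[0];
    g_fault_snapshot.frame.r1   = stacked_sp[1];
    g_fault_snapshot.frame.r2   = stacked_sp[2];
    g_fault_snapshot.frame.r3   = stacked_sp[3];
    g_fault_snapshot.frame.r12  = stacked_sp[4];
    g_fault_snapshot.frame.lr   = stacked_sp[5];
    g_fault_snapshot.frame.pc   = stacked_sp[6];
    g_fault_snapshot.frame.xpsr = stacked_sp[7];
    g_fault_snapshot.magic = SNAPSHOT_MAGIC;  /* written last: data is complete */
}

/* Called early after reset. Returns 1 and copies the snapshot out if a
 * valid one exists, then invalidates it so it is reported only once. */
int fault_snapshot_take(fault_snapshot_t *out)
{
    if (g_fault_snapshot.magic != SNAPSHOT_MAGIC) {
        return 0;
    }
    *out = g_fault_snapshot;
    g_fault_snapshot.magic = 0u;
    return 1;
}
```

After a reset, the saved PC can be looked up in the map file or disassembly to find the faulting function. The magic value guards against interpreting random power-on RAM contents as a snapshot.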

3. Problem analysis and handling

Combine the symptoms with the location of the faulty code to analyze the cause of the problem.

3.1 The program continues to run

3.1.1 Numerical exception

3.1.1.1 Software problems

1. Array out of bounds

When an array is written with a subscript beyond its length, the memory at the resulting address is modified, as in the following:

55f53c43546b4ae75119b2b06e7f3210.png

Such problems usually need to be analyzed with the map file. Use the map file to find the arrays located near the address of the corrupted variable, check whether any write to those arrays uses unsafe code like that shown in the figure above, and change it to safe code.
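A minimal sketch of a bounds-checked write, assuming a hypothetical fixed-size buffer `g_buf`. The helper names `buf_write` and `buf_read` are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 8u

static uint8_t g_buf[BUF_LEN];

/* Write one element. Returns 0 on success, or -1 if the index would
 * write past the end of the array and corrupt a neighbouring variable. */
int buf_write(size_t index, uint8_t value)
{
    if (index >= BUF_LEN) {
        return -1;
    }
    g_buf[index] = value;
    return 0;
}

/* Read one element; out-of-range reads return 0 instead of stray memory. */
uint8_t buf_read(size_t index)
{
    return (index < BUF_LEN) ? g_buf[index] : 0u;
}
```

Routing every write through one checked helper also gives a single place to add a log line or breakpoint when hunting for the offending caller.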

2. Stack overflow

0x20001ff8  g_val
0x20002000  bottom of stack (lowest stack address)
…………        stack space
0x20002200  top of stack (highest stack address)

As the layout above shows, such problems also need to be analyzed with the map file. Assuming the stack grows from high addresses toward low addresses, a stack overflow runs past 0x20002000 and overwrites g_val with values from the stack.

When a stack overflow occurs, analyze the maximum stack usage. Deep function call chains, function calls inside interrupt service routines, and large local variables declared inside functions can all cause a stack overflow.

Such problems can be addressed in the following ways:

  • In the design stage, allocate memory resources sensibly and give the stack an appropriate size;

  • Convert large local variables to static variables by adding the "static" keyword, or allocate them dynamically on the heap with malloc();

  • Restructure the function calls to reduce the call depth.
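One common way to measure maximum stack usage is to fill the stack region with a known pattern at startup and later count how much of the pattern is still intact. The sketch below shows the idea on an ordinary array; on a real target the region bounds would come from the linker script. The names `stack_paint`, `stack_unused_words`, and `STACK_FILL` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u  /* pattern unlikely to occur as real data */

/* Fill the whole stack region with the pattern. Done once at startup,
 * before the region is in use. */
void stack_paint(uint32_t *low, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        low[i] = STACK_FILL;
    }
}

/* Count untouched words starting from the lowest address. With a stack
 * that grows downward, this is the headroom that has never been used. */
size_t stack_unused_words(const uint32_t *low, size_t words)
{
    size_t n = 0;

    while (n < words && low[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

If the unused count reaches zero, or gets close to it during stress testing, the stack is too small for the worst-case call path.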

3. Wrong condition in a conditional statement

387c07aeb092e3e981d6337ab648cfe7.png

In the condition of a conditional statement it is easy to write the assignment operator "=" where the equality operator "==" was intended, which changes the value of the variable being tested. This mistake is not a compile error, and the condition then takes the assigned value, so with a nonzero constant it is always true.

It is recommended to write the constant on the left-hand side of the operator and the variable on the right, so that mistyping an assignment operator produces a compile error. Static code analysis tools can also find such problems.
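A short illustration of the constant-on-the-left style, using a hypothetical `MODE_RUN` constant and `is_running` function:

```c
#include <stdint.h>

#define MODE_RUN 2u

/* With the constant on the left, mistyping "MODE_RUN = mode" does not
 * compile, because a constant cannot be assigned to. The reverse form
 * "mode = MODE_RUN" would compile and always take the branch. */
int is_running(uint8_t mode)
{
    if (MODE_RUN == mode) {
        return 1;
    }
    return 0;
}
```

Enabling compiler warnings helps as well; GCC, for example, warns about an assignment used as a truth value when `-Wall` is enabled.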

4. Synchronization problem

For example, if an interrupt (or task switch) occurs while a dequeue operation on a queue is in progress, and the interrupt handler (or the task switched to) performs an enqueue operation, the queue structure may be corrupted. In such cases, disable interrupts around the operation (or synchronize with a mutex).
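A minimal sketch of a queue whose operations are wrapped in a critical section. `CRITICAL_ENTER` and `CRITICAL_EXIT` are hypothetical port hooks: on a real target they would disable and restore interrupts (or take and release a mutex under an RTOS). Here they default to empty stubs so the sketch also builds on a host.

```c
#include <stdint.h>

#ifndef CRITICAL_ENTER
#define CRITICAL_ENTER() ((void)0)  /* target: save state, disable interrupts */
#define CRITICAL_EXIT()  ((void)0)  /* target: restore interrupt state */
#endif

#define QUEUE_SIZE 4u

typedef struct {
    uint8_t  data[QUEUE_SIZE];
    uint32_t head;   /* index of the next item to read */
    uint32_t count;  /* number of stored items */
} queue_t;

/* Enqueue one byte. Returns 1 on success, 0 if the queue is full. */
int queue_push(queue_t *q, uint8_t value)
{
    int ok = 0;

    CRITICAL_ENTER();
    if (q->count < QUEUE_SIZE) {
        q->data[(q->head + q->count) % QUEUE_SIZE] = value;
        q->count++;
        ok = 1;
    }
    CRITICAL_EXIT();
    return ok;
}

/* Dequeue one byte. Returns 1 on success, 0 if the queue is empty. */
int queue_pop(queue_t *q, uint8_t *out)
{
    int ok = 0;

    CRITICAL_ENTER();
    if (q->count > 0u) {
        *out = q->data[q->head];
        q->head = (q->head + 1u) % QUEUE_SIZE;
        q->count--;
        ok = 1;
    }
    CRITICAL_EXIT();
    return ok;
}
```

Because `head` and `count` are updated together inside the protected region, an interrupt can never observe the queue with one field changed and the other not.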

5. Optimization problem

2a0349ec19617e697b6b491cb7e6ea8a.png

The intent of the program above is to stop executing foo() once the irq interrupt has occurred. After compiler optimization, however, flg may be loaded into a register once, and the loop then tests the register value every time instead of re-reading flg from RAM, so foo() keeps running even after the irq interrupt occurs. Add the "volatile" keyword to the declaration of flg to force its value to be read from RAM every time.
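A host-side sketch of the corrected pattern. The names `flg` and `foo` follow the text; `irq_handler`, `wait_for_irq`, and the call counter are hypothetical. Since there is no real interrupt on a host, `foo()` simulates the interrupt firing during its third call.

```c
#include <stdint.h>

/* volatile: the value can change outside the normal program flow,
 * so every access must really read or write memory. */
static volatile uint8_t flg = 0u;

static uint32_t foo_calls = 0u;

/* Interrupt service routine: signals the main loop to stop. */
void irq_handler(void)
{
    flg = 1u;
}

/* Work done while waiting. In this host-side sketch it also simulates
 * the interrupt arriving during the third call. */
static void foo(void)
{
    foo_calls++;
    if (foo_calls == 3u) {
        irq_handler();
    }
}

/* Run foo() until the interrupt has set flg; returns the number of calls. */
uint32_t wait_for_irq(void)
{
    while (flg == 0u) {  /* flg is re-read from memory on every pass */
        foo();
    }
    return foo_calls;
}
```

On a real target the handler runs asynchronously, and without `volatile` an optimizing compiler is allowed to hoist the read of `flg` out of the loop.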

3.1.1.2 Hardware Problems

1. Chip bug

The chip itself has a bug and, in certain cases, returns an incorrect value to the microcontroller. The program needs to check the values read back and filter out the abnormal ones.
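One simple form of such filtering is a plausibility check that falls back to the last good value. The sketch below assumes a reading in millivolts with a hypothetical valid range; the limits and the name `filter_reading_mv` are illustrative and would have to come from the actual chip's data sheet and errata.

```c
#include <stdint.h>

/* Hypothetical plausible range for the reading, in millivolts. */
#define READING_MIN_MV 0
#define READING_MAX_MV 5000

static int32_t g_last_good_mv = 0;

/* Return the new reading if it is plausible, otherwise the last good value. */
int32_t filter_reading_mv(int32_t raw_mv)
{
    if (raw_mv >= READING_MIN_MV && raw_mv <= READING_MAX_MV) {
        g_last_good_mv = raw_mv;
    }
    return g_last_good_mv;
}
```

Whether to hold the last value, retry the read, or report an error depends on the application; holding the last value is only one option.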

2. Communication timing error

fa073f5eb7cebb392c2bb72f805fbca6.png

Take the power management chip ISL78600 as an example. Suppose two chips are cascaded. When the voltage sampling data of both chips is read, the high-side chip sends its data to the low-side chip over the daisy chain at a fixed period, and the low-side chip has only one buffer.

If the microcontroller does not read the data from the low-side chip within the specified time, newly arriving data overwrites the current data and data is lost. Such problems require a careful reading of the chip's data sheet so that the timing requirements of the chip's communication are strictly met.

3.1.2 Abnormal action

3.1.2.1 Software problems

1. Design issues

The design contains errors or omissions; the design documents need to be re-evaluated.

2. The implementation does not match the design

If the code does not match the design document, add unit tests that cover all conditional branches and carry out cross code reviews.

3. The state variable is abnormal

For example, the variable that records the current state of a state machine is corrupted. This type of problem is analyzed in the same way as the numerical exceptions in the previous section.

3.1.2.2 Hardware Problems

1. Hardware failure

The target IC has failed and does not act after receiving a control command; check the hardware.

2. Communication abnormality

If communication with the target IC is faulty and control commands cannot be executed correctly, use an oscilloscope or a logic analyzer to observe the communication timing and determine whether the transmitted signal is wrong or is subject to external interference.

3.2 Program crashes

3.2.1 Stop running

3.2.1.1 Software problems

1. HardFault

The following conditions can cause a HardFault:

  • Accessing a peripheral's registers while the peripheral's clock gate is not enabled;

  • Jumping to an out-of-range function address, which usually happens when a function pointer has been corrupted. Troubleshoot it in the same way as a numerical exception;

  • Alignment issues when dereferencing pointers:

Taking little endian as an example, suppose we declare a structure with forced alignment as follows:

89fa9abf5345c46c24695969c69849ca.png

address        0x00000000  0x00000001  0x00000002  0x00000003
variable name  val0        val1_low    val1_high   val2
value          0x12        0x56        0x34        0x78

Here the address of a.val1 is 0x00000001. Dereferencing this address as a uint16_t can trigger a HardFault because of the alignment problem, for example on cores without unaligned access support such as Cortex-M0/M0+. If you must access the variable through a pointer, use memcpy().
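A minimal sketch of the memcpy() approach. The helper name `read_u16_unaligned` is hypothetical, and the result in the usage note assumes a little-endian machine, as in the table above.

```c
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value from an address that may not be 2-byte aligned.
 * memcpy() copies byte by byte as far as the language is concerned, so
 * the compiler never emits a halfword load from the misaligned address. */
uint16_t read_u16_unaligned(const void *src)
{
    uint16_t value;

    memcpy(&value, src, sizeof value);
    return value;
}
```

For the byte sequence 0x12, 0x56, 0x34, 0x78 from the table, reading at offset 1 yields 0x3456 on a little-endian machine. Compilers typically turn a small fixed-size memcpy() like this into a few byte loads, so the cost is low.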

2. The interrupt flag is not cleared in the interrupt service function

The interrupt service routine does not correctly clear the interrupt flag before it exits. As soon as execution leaves the routine, it immediately re-enters it, so the program appears to hang.
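A sketch of the usual pattern, using a hypothetical timer peripheral. The register layout, the bit name `TIMER_SR_UIF`, and the clear-by-writing-zero behaviour are assumptions for illustration; real peripherals differ (some flags are cleared by writing 1, others by reading a register), so the reference manual decides. A plain variable stands in for the memory-mapped registers so the sketch builds on a host.

```c
#include <stdint.h>

/* Hypothetical timer register block. */
typedef struct {
    volatile uint32_t SR;  /* status register, bit 0 = update interrupt flag */
} timer_regs_t;

#define TIMER_SR_UIF (1u << 0)

static timer_regs_t g_fake_timer;  /* host stand-in for the hardware registers */
#define TIMER (&g_fake_timer)

static volatile uint32_t g_tick = 0u;

/* Timer interrupt service routine. */
void timer_irq_handler(void)
{
    if ((TIMER->SR & TIMER_SR_UIF) != 0u) {
        TIMER->SR &= ~TIMER_SR_UIF;  /* clear the flag, or the ISR re-enters forever */
        g_tick++;
    }
}
```

Clearing the flag at the start of the handler, before the real work, leaves time for the write to reach the peripheral before the handler returns.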

3. NMI interrupt

During debugging I once encountered an SPI MISO pin that was multiplexed with the NMI function. When the peripheral connected over SPI was damaged, MISO was pulled high, so after reset the microcontroller entered the NMI interrupt before the pin had been configured for SPI, and hung there. In this case, the NMI function can be disabled inside the NMI interrupt service routine so that execution can leave the NMI interrupt.

3.2.1.2 Hardware Problems

1. The crystal oscillator does not start oscillating

2. Insufficient supply voltage

3. The reset pin is held low

3.2.2 Reset

3.2.2.1 Software problems

1. Watchdog reset

Besides resets caused by failing to feed the watchdog in time, pay attention to any special requirements of the watchdog configuration. Take the Freescale KEA microcontroller as an example: its watchdog requires an unlock sequence during configuration (two different values written in succession), and the sequence must be completed within 16 bus clocks; a timeout causes a watchdog reset. The only remedy for this kind of problem is to know the microcontroller's data sheet well and watch for such details.

3.2.2.2 Hardware Problems

1. The power supply voltage is unstable

2. Insufficient power load capacity

4. Regression testing

After the problem is fixed, run a regression test, both to confirm that the problem no longer occurs and to confirm that the change has not introduced other problems.

5. Experience summary

Summarize the cause of the problem and how it was solved. Think about how to prevent similar problems in the future and whether products on the same platform can learn from this case, so that one case teaches lessons for others and failures become experience.

Original text: https://www.cnblogs.com/jozochen/p/8541714.html

Origin blog.csdn.net/u010632165/article/details/123244240