Rockchip RK3566 debugging notes

Check the following settings before compiling the Rockchip SDK.
The environment variables must be set and the target platform selected before the one-key compile command is used. For example:
source build/envsetup.sh
lunch rk3566_rgo-userdebug
Then build and pack the images:
make installclean -j24; make -j24
rm rockdev/Image-tab10_rk66/*; ./mkimage.sh
rm RKTools/linux/Linux_Pack_Firmware/rockdev/Image/*
cp rockdev/Image-tab10_rk66/* RKTools/linux/Linux_Pack_Firmware/rockdev/Image/


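Network test tool
//Set a static IP address on eth0
ifconfig eth0 192.168.9.132 netmask 255.255.255.0
//Server side: print a report every 10 seconds
iperf -s -i 10
//Client side: send to the server's IP address for 50 seconds
iperf -c 192.168.1.133 -t 50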

/************display debugging related commands************/
//Print the current display frame rate to judge whether playback is abnormal
setprop debug.sf.fps 1
logcat -c; logcat | grep mFps
//Check through the SurfaceFlinger service whether the composition strategy is normal
dumpsys SurfaceFlinger
//If it is not normal, enable the HWC log to find the cause
adb shell "setprop sys.hwc.log 511"
adb shell "logcat -c; logcat" > hwc.log
/**********************************************************/

cat /sys/fs/pstore/console-ramoops-0
prints the console log recorded before the last system reset. After a failure during burn-in (stress) testing or an unexpected power loss, use this command to print the log of the system's last run.

/****************debugging disassembly command************/
Crash log
To construct this error, add the following code at a suitable place in the HWC code:
struct test_t {
    int a = 0;
    int b = 0;
    int c = 0;
    void add() { return; };
};
struct test_t *test_a;  //construct test_a
test_a = NULL;          //set it to NULL
test_a->c = 1;          //access a member through the NULL pointer

Use the addr2line command to map the crash address back to the source location (000000000004195c is the first address in the crash backtrace):
addr2line -e $OUT/symbols/system/lib64/hw/hwcomposer.rk30board.so 000000000004195c
The output is the source file and line number of the faulting code.

To analyze the problem further, the following command disassembles the library and writes out the assembly interleaved with the source code:
prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/bin/aarch64-linux-android-objdump -S -D $OUT/symbols/system/lib64/hw/hwcomposer.rk30board.so > hwcomposer.dump
Then search the output file hwcomposer.dump for 4195c, the address printed in the stack trace.

/****************************Spinlock***********************/
Interface                                                      spinlock API             raw_spinlock API
Define and initialize a spin lock                              DEFINE_SPINLOCK          DEFINE_RAW_SPINLOCK
Dynamically initialize a spin lock                             spin_lock_init           raw_spin_lock_init
Acquire the lock                                               spin_lock                raw_spin_lock
Acquire the lock and disable interrupts on this CPU            spin_lock_irq            raw_spin_lock_irq
Save this CPU's irq state, disable interrupts, acquire lock    spin_lock_irqsave        raw_spin_lock_irqsave
Acquire the lock and disable the bottom half                   spin_lock_bh             raw_spin_lock_bh
Release the lock                                               spin_unlock              raw_spin_unlock
Release the lock and enable interrupts on this CPU             spin_unlock_irq          raw_spin_unlock_irq
Release the lock and restore this CPU's saved irq state        spin_unlock_irqrestore   raw_spin_unlock_irqrestore
Release the lock and enable the bottom half on this CPU        spin_unlock_bh           raw_spin_unlock_bh
Try to acquire the lock without spinning; returns non-zero     spin_trylock             raw_spin_trylock
  if the lock was acquired and 0 if it was not
Check whether the lock is held; returns non-zero if it is      spin_is_locked           raw_spin_is_locked
  locked and 0 otherwise
************************************************************
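
The lock, trylock and unlock semantics listed above can be tried out in user space. The sketch below is not the kernel API: it is a rough analogue built on the C11 atomic_flag type, and the demo_spin_* names are made up for illustration. It follows the return convention of spin_trylock (non-zero when the lock was acquired) and has no counterpart for the _irq, _irqsave or _bh variants, which only make sense inside the kernel.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for a spin lock (not the kernel type). */
typedef struct {
    atomic_flag held;
} demo_spinlock_t;

/* Static initializer, playing the role of DEFINE_SPINLOCK. */
#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Busy-wait until the flag was previously clear, i.e. we now own the lock. */
static inline void demo_spin_lock(demo_spinlock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire))
        ; /* spin */
}

/* One attempt only: returns 1 if the lock was acquired, 0 if it was busy. */
static inline int demo_spin_trylock(demo_spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire);
}

/* Clear the flag so that another thread can take the lock. */
static inline void demo_spin_unlock(demo_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A caller would declare demo_spinlock_t lock = DEMO_SPINLOCK_INIT; and wrap the critical section in demo_spin_lock(&lock) and demo_spin_unlock(&lock). A second demo_spin_trylock on a lock that is already held returns 0 immediately instead of spinning.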

/**********************Print log information before suspend*****************************/
//Keep the console active during suspend so that the log is still printed
echo N > /sys/module/printk/parameters/console_suspend

/***********************************************************************/

Search command:
grep -nr rockchip_suspend | grep ipc


/**************************oops analysis************************/

During kernel development we often run into kernel crashes such as NULL pointer dereferences and out-of-bounds memory accesses. Usually the only thing we have for locating the crash and its cause is the call stack printed after the crash. The analysis methods and steps are summarized below.

When an oops occurs, a log like the following appears on the serial console or in the dmesg output. The example is a crash of a Linux kernel on an ARM platform.

<2>[515753.310000] kernel BUG at net/core/skbuff.c:1846!
<1>[515753.310000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
<1>[515753.320000] pgd = c0004000
<1>[515753.320000] [00000000] *pgd=00000000
<0>[515753.330000] Internal error: Oops: 817 [#1] PREEMPT SMP
<0>[515753.330000] last sysfs file: /sys/class/net/eth0.2/speed
<4>[515753.330000] module:  http_timeout     bf098000    4142
...
<4>[515753.330000] CPU: 0    Tainted: P             (2.6.36 #2)
<4>[515753.330000] PC is at __bug+0x20/0x28
<4>[515753.330000] LR is at __bug+0x1c/0x28
<4>[515753.330000] pc : [<c01472d0>]    lr : [<c01472cc>]    psr: 60000113
<4>[515753.330000] sp : c0593e20  ip : c0593d70  fp : cf1b5ba0
<4>[515753.330000] r10: 00000014  r9 : 4adec78d  r8 : 00000006
<4>[515753.330000] r7 : 00000000  r6 : 0000003a  r5 : 0000003a  r4 : 00000060
<4>[515753.330000] r3 : 00000000  r2 : 00000204  r1 : 00000001  r0 : 0000003c
<4>[515753.330000] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
<4>[515753.330000] Control: 10c53c7d  Table: 4fb5004a  DAC: 00000017
<0>[515753.330000] Process swapper (pid: 0, stack limit = 0xc0592270)
<0>[515753.330000] Stack: (0xc0593e20 to 0xc0594000)
<0>[515753.330000] 3e20: ce2ce900 c0543cf4 00000000 ceb4c400 000010cc c8f9b5d8 00000000 00000000
<0>[515753.330000] 3e40: 00000001 cd469200 c8f9b5d8 00000000 ce2ce8bc 00000006 00000026 00000010
...
<4>[515753.330000] [<c01472d0>] (PC is at __bug+0x20/0x28)
<4>[515753.330000] [<c01472d0>] (__bug+0x20/0x28) from [<c0543cf4>] (skb_checksum+0x3f8/0x400)
<4>[515753.330000] [<c0543cf4>] (skb_checksum+0x3f8/0x400) from [<bf11a8f8>] (et_isr+0x2b4/0x3dc [et])
<4>[515753.330000] [<bf11a8f8>] (et_isr+0x2b4/0x3dc [et]) from [<bf11aa44>] (et_txq_work+0x24/0x54 [et])
<4>[515753.330000] [<bf11aa44>] (et_txq_work+0x24/0x54 [et]) from [<bf11aa88>] (et_tx_tasklet+0x14/0x298 [et])
<4>[515753.330000] [<bf11aa88>] (et_tx_tasklet+0x14/0x298 [et]) from [<c0171510>] (tasklet_action+0x12c/0x174)
<4>[515753.330000] [<c0171510>] (tasklet_action+0x12c/0x174) from [<c05502b4>] (__do_softirq+0xfc/0x1a4)
<4>[515753.330000] [<c05502b4>] (__do_softirq+0xfc/0x1a4) from [<c0171c98>] (irq_exit+0x60/0x64)
<4>[515753.330000] [<c0171c98>] (irq_exit+0x60/0x64) from [<c01431fc>] (do_local_timer+0x60/0x74)
<4>[515753.330000] [<c01431fc>] (do_local_timer+0x60/0x74) from [<c054f900>] (__irq_svc+0x60/0x10c)
<4>[515753.330000] Exception stack(0xc0593f68 to 0xc0593fb0)

Here, we focus on the following points:

The oops message: kernel BUG at net/core/skbuff.c:1846! and Unable to handle kernel NULL pointer dereference at virtual address 00000000. This tells you briefly what triggered the oops. If the code called BUG()/BUG_ON() directly, the source file and line number of the trigger are given as well.

The PC and LR register values: PC is at __bug+0x20/0x28 and LR is at __bug+0x1c/0x28. PC is the instruction that raised the oops, and LR (the return address) leads to the caller of the function.

The CPU number and the CPU register values: sp, ip, fp, r0~r10.

The process running at the time of the oops: Process swapper (pid: 0, stack limit = 0xc0592270). If the crash happened in kernel code running on behalf of a system call, this identifies the corresponding user-mode process.

The most important part is the call stack, which lets you work out where the error occurred.

A note on the notation: in skb_checksum+0x3f8/0x400, 0x3f8 is the offset from the entry of skb_checksum and 0x400 is the total size of the function. After disassembly, the function entry address plus 0x3f8 locates the exact execution point.

To pinpoint the error location we need the disassembly tool objdump (for a cross-compiled kernel, the objdump from the cross toolchain). For example:

    objdump -D -S xxx.o > xxx.txt

Take the stack entry (et_isr+0x2b4/0x3dc [et]) from [<bf11aa44>] (et_txq_work+0x24/0x54 [et]). The [et] suffix tells us that the function lives in the et module, so we can go straight to et.o and disassemble it with objdump -D -S et.o > et.txt; et.txt then contains the disassembled instructions. Reading bare assembly is tedious, though. To analyze the problem we want the disassembled instructions matched one-to-one against the source code. That requires compiling with the -g option, which adds symbols and debugging information to the final object file so that the objdump output has the source code interleaved with it.

For the kernel, edit the Makefile in the root directory of the kernel tree and add the -g option to KBUILD_CFLAGS:

    KBUILD_CFLAGS   := -g -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
               -fno-strict-aliasing -fno-common \
               -Werror-implicit-function-declaration \
               -Wno-format-security \
               -fno-delete-null-pointer-checks -Wno-implicit-function-declaration \
               -Wno-unused-but-set-variable \
               -Wno-unused-local-typedefs

The following is an excerpt from the disassembled file. Here 0x1f0 is the entry address of the <et_isr> function. Each block of C source comes first, and the assembly that follows it is the corresponding disassembled code.

000001f0 <et_isr>:
et_isr(int irq, void *dev_id)
#else
static irqreturn_t BCMFASTPATH
et_isr(int irq, void *dev_id, struct pt_regs *ptregs)
#endif
{
1f0:   e92d40f8    push    {r3, r4, r5, r6, r7, lr}
1f4:   e1a04001    mov r4, r1
    struct chops *chops;
    void *ch;
    uint events = 0;

    et = (et_info_t *)dev_id;
    chops = et->etc->chops;
1f8:   e5913000    ldr r3, [r1]
    ch = et->etc->ch;

    /* guard against shared interrupts */
    if (!et->etc->up)
1fc:   e5d32028    ldrb    r2, [r3, #40]   ; 0x28
    struct chops *chops;
    void *ch;
    uint events = 0;

    et = (et_info_t *)dev_id;
    chops = et->etc->chops;
200:   e5936078    ldr r6, [r3, #120]  ; 0x78
    ch = et->etc->ch;
204:   e593507c    ldr r5, [r3, #124]  ; 0x7c

    /* guard against shared interrupts */
    if (!et->etc->up)
208:   e3520000    cmp r2, #0
20c:   1a000001    bne 218 <et_isr+0x28>
210:   e1a00002    mov r0, r2
214:   e8bd80f8    pop {r3, r4, r5, r6, r7, pc}
        goto done;

    /* get interrupt condition bits */
    events = (*chops->getintrevents)(ch, TRUE);
218:   e5963028    ldr r3, [r6, #40]   ; 0x28
21c:   e1a00005    mov r0, r5
220:   e3a01001    mov r1, #1
224:   e12fff33    blx r3
228:   e1a07000    mov r7, r0

    /* not for us */
    if (!(events & INTR_NEW))
22c:   e2100010    ands    r0, r0, #16
230:   08bd80f8    popeq   {r3, r4, r5, r6, r7, pc}

    ET_TRACE(("et%d: et_isr: events 0x%x\n", et->etc->unit, events));
    ET_LOG("et%d: et_isr: events 0x%x", et->etc->unit, events);

    /* disable interrupts */
    (*chops->intrsoff)(ch);
234:   e5963038    ldr r3, [r6, #56]   ; 0x38
238:   e1a00005    mov r0, r5
23c:   e12fff33    blx r3
        (*chops->intrson)(ch);
    }

With the objdump output in hand, the entry offset shown on the call stack leads to the exact call point. Take (et_isr+0x2b4/0x3dc [et]) from [<bf11aa44>] (et_txq_work+0x24/0x54 [et]): the call point is at offset 0x2b4 from the entry of et_isr, and we saw above that et_isr starts at 0x1f0, so the location in the listing is 0x1f0 + 0x2b4 = 0x4a4. In the excerpt below, the instruction there is 4a4: e585007c str r0, [r5, #124] ; 0x7c, which belongs to the C statement skb->csum = skb_checksum(skb, thoff, skb->len - thoff, 0);. It is the return address of the bl instruction just before it, and the next function on the call stack is indeed skb_checksum, which confirms that the located call point is correct.

        ASSERT((prot == IP_PROT_TCP) || (prot == IP_PROT_UDP));
        check = (uint16 *)(th + ((prot == IP_PROT_UDP) ?
47c:   e3580011    cmp r8, #17
480:   13a0a010    movne   sl, #16
484:   03a0a006    moveq   sl, #6
            offsetof(struct udphdr, check) : offsetof(struct tcphdr, check)));
        *check = 0;
488:   e18720ba    strh    r2, [r7, sl]
    thoff = (th - skb->data);
    if (eth_type == HTON16(ETHER_TYPE_IP)) {
        struct iphdr *ih = ip_hdr(skb);
        prot = ih->protocol;
        ASSERT((prot == IP_PROT_TCP) || (prot == IP_PROT_UDP));
        check = (uint16 *)(th + ((prot == IP_PROT_UDP) ?
48c:   e087200a    add r2, r7, sl
490:   e58d2014    str r2, [sp, #20]
            offsetof(struct udphdr, check) : offsetof(struct tcphdr, check)));
        *check = 0;
        ET_TRACE(("et%d: skb_checksum: \n", et->etc->unit));
        skb->csum = skb_checksum(skb, thoff, skb->len - thoff, 0);
494:   e5952070    ldr r2, [r5, #112]  ; 0x70
498:   e58dc008    str ip, [sp, #8]
49c:   e0612002    rsb r2, r1, r2
4a0:   ebfffffe    bl  0 <skb_checksum>
4a4:   e585007c    str r0, [r5, #124]  ; 0x7c
        *check = csum_tcpudp_magic(ih->saddr, ih->daddr,
4a8:   e5953070    ldr r3, [r5, #112]  ; 0x70

static inline __wsum
csum_tcpudp_nofold(__be32 saddr, __be32 daddr, unsigned short len,
           unsigned short proto, __wsum sum)
{     
    __asm__(
4ac:   e59dc008    ldr ip, [sp, #8]
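
The offset arithmetic used above is easy to script. The following is a minimal sketch in C, with hypothetical helper names parse_bt_entry and listing_address, that splits a backtrace entry of the form symbol+0xoffset/0xsize and adds the offset to the address at which the objdump listing places the function entry. It assumes symbol names shorter than 64 characters and does not handle the trailing [module] tag.

```c
#include <stdio.h>

/* Parsed form of a backtrace entry such as "skb_checksum+0x3f8/0x400". */
struct bt_entry {
    char symbol[64];        /* function name */
    unsigned long offset;   /* offset of the instruction from the function entry */
    unsigned long size;     /* total size of the function */
};

/* Returns 0 on success, -1 if the text is not in symbol+0xoff/0xsize form. */
int parse_bt_entry(const char *text, struct bt_entry *out)
{
    /* %lx accepts an optional 0x prefix, so the hex fields parse directly. */
    if (sscanf(text, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}

/* Address to look up in the objdump listing, given where the listing
 * places the entry of the function. */
unsigned long listing_address(unsigned long entry, const struct bt_entry *e)
{
    return entry + e->offset;
}
```

With the entry from this example, parsing et_isr+0x2b4/0x3dc and passing the listing entry address 0x1f0 gives 0x4a4, the location worked out in the text.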

A few points deserve attention:

The function names on the call stack are not always accurate. (I don't know why; possibly because the call chain is reconstructed through LR, and LR may be modified during execution.) One thing is certain, though: the call point is accurate. In other words, the name of the calling function may be off, but function + offset always finds the exact instruction that made the call.

Inline functions and optimized functions may not appear on the call stack. For optimization, the compiler expands their code in place, so no stack frame exists for them.
/****************************oops end *******************************/
1. Delays in the Linux kernel
Kernel delays are divided into busy delays and sleep delays. A busy delay is used where the wait is extremely short, and it can be used in both process context and interrupt context. A sleep delay is used where the wait is long, and it can only be used in process context, never in interrupt context.
If a process needs to sleep and be woken up on demand, it must use the wait queue mechanism of the Linux kernel. This is typical for peripherals that do not need to work all the time, such as sensors, and it is generally done to reduce power consumption. Semaphores can put a process to sleep because they, too, are built on a wait queue.

2. Uncontrollable pins on Rockchip
Symptoms: the device tree configuration looked normal, the driver reported no errors, the pin was not multiplexed elsewhere, and the corresponding device node existed in the system. Yet reading the node with cat always returned 0 (low), and the measured voltage on the hardware pin was also always 0. Writing a high level through the device node with echo produced no error in the log, but cat still showed 0 and the voltage stayed at 0.
Cause: when the pin is configured in the dts, it must be put into its default state, that is: pinctrl-names = "default"; otherwise the pin cannot be controlled. (Some pins do not need this, because their power-on default function is already GPIO and they are controllable without it. An uncontrollable pin is in fact still assigned to some other function, although we do not currently know which one.)

3. TPL (Tiny Program Loader) and SPL (Secondary Program Loader) are loaders that run at an earlier stage than U-Boot:
TPL: runs in SRAM and is responsible for DDR initialization;
SPL: runs in DDR and is responsible for the low-level initialization of the system and for loading the later-stage firmware (trust.img and uboot.img);
U-Boot proper: runs in DDR; this is what we usually call "U-Boot", and it is responsible for booting the kernel.
Note: the term "U-Boot proper" is used mainly to distinguish it from SPL. We usually just call U-Boot proper "U-Boot".

4. The page table describes which page of virtual memory is mapped to which page frame of physical memory. The page table is stored in physical memory, and the MMU looks it up to determine which PA (physical address) a VA (virtual address) should be mapped to. When the MMU is enabled, all addresses used by our programs are virtual addresses, and each of them makes the MMU perform a table lookup and an address translation. The converted virtual address is referred to as the MVA (modified virtual address).
A segmentation fault, which sometimes crashes a process, is generated as follows:
(1) The user program tries to access a VA, and the MMU check finds that the access is not permitted.
(2) The MMU raises an exception, the CPU switches from user mode to privileged mode and jumps into the kernel to run the exception handler.
(3) The kernel interprets this exception as a segmentation fault and terminates the process that caused it.
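
The lookup and the fault path described above can be sketched in software. The model below is a simplification for illustration only: it assumes a single-level table, 4 KB pages and a 16-entry address space, and the names (struct pte, PTE_VALID, PTE_USER, translate) are invented here. A real ARM MMU walks multi-level tables in hardware and raises an exception instead of returning an error code.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* assumed 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16u                  /* assumed tiny virtual address space */

#define PTE_VALID  0x1u                 /* entry maps a page frame */
#define PTE_USER   0x2u                 /* user mode may access the page */

/* One page table entry: the physical frame number plus permission flags. */
struct pte {
    uint32_t frame;
    uint32_t flags;
};

enum xlate_result { XLATE_OK = 0, XLATE_FAULT = -1 };

/* Translate va to *pa. XLATE_FAULT stands in for the MMU exception that
 * the kernel would turn into a segmentation fault. */
int translate(const struct pte *table, uint32_t va, int user_mode, uint32_t *pa)
{
    uint32_t vpn = va >> PAGE_SHIFT;    /* virtual page number */

    if (vpn >= NUM_PAGES)
        return XLATE_FAULT;             /* outside the modelled address space */

    const struct pte *entry = &table[vpn];
    if (!(entry->flags & PTE_VALID))
        return XLATE_FAULT;             /* no mapping for this page */
    if (user_mode && !(entry->flags & PTE_USER))
        return XLATE_FAULT;             /* mapped, but user access is denied */

    /* Frame number replaces the page number; the in-page offset is kept. */
    *pa = (entry->frame << PAGE_SHIFT) | (va & (PAGE_SIZE - 1u));
    return XLATE_OK;
}
```

If page 2 is mapped to frame 7 with PTE_VALID | PTE_USER, a user-mode access to VA 0x2abc translates to PA 0x7abc. A user-mode access to a page that lacks PTE_USER, or to an unmapped page, returns XLATE_FAULT, which corresponds to step (1) above.
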
5. delayed_work in the kernel is essentially implemented with a work queue plus a timer. Linux interrupt handling is split into two halves: the top half handles urgent hardware operations, and the bottom half handles non-urgent, time-consuming work. Tasklets and work queues are both good mechanisms for scheduling the bottom half. Tasklets are implemented on top of softirqs, and kernel timers also rely on softirqs.

/****************** Generate .so library command *********************/
 gcc -fPIC -shared dl.c -o libdl.so

dl.c is the source file and libdl.so is the name of the library to generate. On Android, .so files are generally built through Android.mk.

 Compile opendl.c, the program that loads the library, into an executable:
 gcc -rdynamic -o opendl opendl.c -ldl

In C, a program can be made extensible and general-purpose by giving it a plug-in structure. With an asynchronous event-driven model the logic of the main program stays unchanged, and each piece of business logic is loaded as a dynamic link library, the so-called plug-in. Linux provides library functions for loading and working with dynamic link libraries, which makes this very convenient.
These C APIs load dynamic link library files, which on Android are the .so files mentioned above:

dlopen   opens the specified dynamic link library in the specified mode and returns a handle for the calling process to pass to dlsym() and dlclose()
dlclose  unloads the opened library
dlerror  returns a description of the error that occurred
dlsym    takes the handle and a symbol name and returns the address of the matching function or variable
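
The four calls fit together roughly as follows. This is a minimal sketch, not the opendl.c from the notes (which is not shown): call_plugin and unary_fn are hypothetical names, and the plug-in is assumed to export a function that takes and returns a double.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed plug-in entry point signature: double f(double). */
typedef double (*unary_fn)(double);

/* Load lib_path, look up symbol, call it with arg and store the result.
 * Returns 0 on success and -1 if loading or symbol lookup fails. */
int call_plugin(const char *lib_path, const char *symbol,
                double arg, double *result)
{
    void *handle = dlopen(lib_path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();                              /* clear any stale error state */
    unary_fn fn;
    *(void **)(&fn) = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err || !fn) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *result = fn(arg);                      /* call into the loaded library */
    dlclose(handle);                        /* unload when finished */
    return 0;
}
```

For example, call_plugin("libm.so.6", "cos", 0.0, &r) should leave 1.0 in r on a glibc system, where libm.so.6 is the math library. With glibc older than 2.34, link with -ldl as in the gcc command above.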

/******************************************************/

Origin blog.csdn.net/qq_48709036/article/details/123042648