Cortex-M3 Series: Debugging Components (9)

1. Introduction to debugging components

        The CM3 contains many debug components, which provide various debugging functions: breakpoints, data watchpoints, flash patching (address remapping), several kinds of trace, and so on. Software developers may never need to know the details of these debug components, since they are usually used only by the debugger and its surrounding tools.

        This article gives a basic introduction to each debugging component. For more detailed information, such as the programming model, please refer to the "Cortex-M3 Technical Reference Manual" (Ref1). All debug and trace components, including the FPB, can be programmed via the CM3's private peripheral bus. In most cases, only the debug host programs these components. Applications are strongly discouraged from accessing debug components (other than the stimulus port registers in the ITM), as doing so can easily conflict with the debugger.

Trace system of the Cortex-M3

        The trace system of CM3 is based on the CoreSight architecture. Trace data is packed into variable-length packets. The trace components use the Advanced Trace Bus (ATB) to send these packets to the TPIU, which formats them into packets that conform to the trace port interface protocol. The formatted packets are sent off-chip, where they can be captured by a device such as a Trace Port Analyzer (TPA). The route of the entire data flow is shown in the figure.

                                                        Schematic diagram of the trace system of Cortex-M3

2. Trace component: Data Watchpoint and Trace (DWT)

The DWT provides the following debugging features:

1. It contains 4 comparators, each of which can be configured to perform one of the following actions when a match occurs:

a) Hardware watchpoint (generates a watchpoint debug event, which is used to enter debug mode, either halt mode or debug monitor mode)

b) ETM trigger, which causes the ETM to emit a packet into the instruction trace stream

c) Program Counter (PC) sampler event trigger

d) Data address sampler trigger

e) The first comparator can also be compared against the clock cycle counter (CYCCNT) instead of a data address

2. As a counter, DWT can count the following items:

a) Clock cycle (CYCCNT)

b) Folded instructions

c) Load/store unit (LSU) operations

d) Sleep cycles

e) Cycles per instruction (CPI)

f) Interrupt overhead

3. Sampling the value of the PC at a fixed interval

4. Interrupt event tracing

        When used as a hardware watchpoint or ETM trigger, a comparator can compare against either a data address or the program counter (PC). When used for other functions, the comparator can only compare data addresses. Each comparator has 3 registers: the COMP register, the MASK register, and the FUNCTION control register.

        Among them, the COMP register is a 32-bit register used to store the value to be compared. The MASK register can be used to mask some bits of the data address, and the masked bits are not involved in the comparison.

        The comparator's FUNCTION register determines the function of the comparator. To avoid potentially unpredictable behavior, MASK and COMP must be programmed first and FUNCTION last. To change the function of a comparator, first clear FUNCTION to disable the comparator, then reconfigure it, again writing FUNCTION last.
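
        The write order for a comparator can be sketched in C as follows. This is only a host-side model: the struct `dwt_comparator_t` and the function `dwt_comparator_configure` are hypothetical names, the struct is ordinary memory rather than the real memory-mapped registers, and the FUNCTION encoding passed in is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one DWT comparator's three registers.
   On real hardware these are memory-mapped; here they are a plain
   struct so that the write order can be exercised on a host. */
typedef struct {
    volatile uint32_t comp;      /* value to compare against      */
    volatile uint32_t mask;      /* address bits to ignore        */
    volatile uint32_t function;  /* 0 = comparator disabled       */
} dwt_comparator_t;

/* Reprogram a comparator in the order described in the text:
   clear FUNCTION, write MASK and COMP, then write FUNCTION last. */
void dwt_comparator_configure(dwt_comparator_t *c,
                              uint32_t comp,
                              uint32_t mask,
                              uint32_t function)
{
    assert(c != NULL);
    c->function = 0u;        /* disable before touching anything else */
    c->mask     = mask;
    c->comp     = comp;
    c->function = function;  /* enabling is always the final write    */
}
```

        Keeping the disable step inside the helper means a caller cannot accidentally change COMP or MASK while the comparator is still active.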

        The remaining counters in the DWT are typically used for profiling program code. They can be programmed to emit an event (in the form of a trace packet) when the counter overflows. Most typically, the CYCCNT register is used to measure the number of cycles a task takes to execute. It can also serve time-based purposes, for example measuring CPU usage in an operating system.
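
        The arithmetic behind a CYCCNT-based measurement is small enough to sketch. The helpers below are hypothetical and work on two counter samples that the caller has already read. They assume the 32-bit CYCCNT wraps at most once between the two samples.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles elapsed between two CYCCNT samples. Unsigned subtraction
   wraps modulo 2^32, so a single counter overflow between the two
   samples needs no special case. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* CPU load in percent over one measurement window, given the cycles
   spent busy and the total cycles in the window. */
uint32_t cpu_load_percent(uint32_t busy_cycles, uint32_t window_cycles)
{
    assert(busy_cycles <= window_cycles);
    if (window_cycles == 0u) {
        return 0u;                       /* empty window: report idle */
    }
    /* Widen to 64 bits so busy_cycles * 100 cannot overflow. */
    return (uint32_t)(((uint64_t)busy_cycles * 100u) / window_cycles);
}
```

        For example, a start sample of 0xFFFFFFF0 and an end sample of 0x00000010 give 0x20 elapsed cycles, even though the counter overflowed in between.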

3. Trace component: Instrumentation Trace Macrocell (ITM)

        ITM has the following functions:

Software can write console messages directly to the ITM stimulus port, thereby outputting them as trace data.

DWT can generate trace packets and output them through ITM.

The ITM can generate timestamp packets and insert them into the trace stream to help the debugger figure out when events occurred.

        Because the ITM uses the trace port to output data, there must be a TPIU on the chip, otherwise the output will not work. Confirm this before using the ITM. If the chip has no TPIU, you can use the NVIC debug registers instead or, as a last resort, output console messages through a UART.

        To use the ITM, the DEMCR.TRCENA bit must be set, otherwise the ITM is disabled and cannot be used. In addition, the ITM registers are protected by a lock. Before programming the ITM, the access key 0xC5AC_CE55 (a mnemonic for "CoreSight ACCESS") must be written to the lock access register. Otherwise, all writes to the ITM registers are ignored. Finally, the ITM has its own control register, which enables each of its functions independently.

        The ATID field is included in the control register as the ID value of the ITM in the ATB. This ID must be unique - each trace source must have a unique ID value so that the debug host can separate the data for each trace source from the received trace packets.
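
        The enable sequence described above (TRCENA, unlock key, control register with ATID) can be sketched as follows. This is a host-side model, not a driver: `itm_model_t` and `itm_enable` are hypothetical, the struct fields are ordinary variables in an arbitrary order rather than the real register map, and the bit positions are assumptions taken from the CMSIS `core_cm3.h` definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMCR_TRCENA      (1u << 24)    /* assumed bit position (CMSIS)   */
#define ITM_UNLOCK_KEY    0xC5ACCE55u   /* access key quoted in the text  */
#define ITM_TCR_ITMENA    (1u << 0)     /* assumed bit position (CMSIS)   */
#define ITM_TCR_ATID_POS  16u           /* assumed field position (CMSIS) */
#define ITM_TCR_ATID_MASK (0x7Fu << ITM_TCR_ATID_POS)

/* Hypothetical stand-in for the registers involved. */
typedef struct {
    uint32_t demcr;        /* Debug Exception and Monitor Control */
    uint32_t lock_access;  /* ITM lock access register            */
    uint32_t tcr;          /* ITM control register                */
    uint32_t ter;          /* ITM trace enable register           */
} itm_model_t;

/* Enable the ITM with the given trace ID and stimulus port mask. */
void itm_enable(itm_model_t *r, uint8_t atid, uint32_t port_mask)
{
    assert(r != NULL);
    r->demcr      |= DEMCR_TRCENA;     /* 1. enable the trace system   */
    r->lock_access = ITM_UNLOCK_KEY;   /* 2. unlock the ITM registers  */
    r->tcr = (r->tcr & ~ITM_TCR_ATID_MASK)
           | (((uint32_t)atid << ITM_TCR_ATID_POS) & ITM_TCR_ATID_MASK)
           | ITM_TCR_ITMENA;           /* 3. set ATID, enable the ITM  */
    r->ter = port_mask;                /* 4. enable selected ports     */
}
```

        On a real device this work is normally done by the debugger, so the sketch is mainly useful for understanding what the tool does on connect.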

3.1 ITM-based software tracing

        One of the main purposes of the ITM is to support the output of debug messages (for example, printf-style output). The ITM contains 32 stimulus ports, allowing different pieces of software to output data to different ports so that the debug host can separate their messages. Each port can be enabled or disabled independently through the trace enable register, and the ports can also be configured to allow or prohibit writes from user-level processes.

        Unlike UART-based text output, using the ITM output does not introduce significant delays to the application. Inside the ITM there is a FIFO which buffers written output messages. However, to be on the safe side, it's best to check how full the FIFO is before writing.
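
        A minimal sketch of the "check the FIFO before writing" advice is shown below. The function name `itm_try_putc` is hypothetical. It assumes, as the CMSIS `ITM_SendChar` helper does, that reading a stimulus port returns a non-zero value in bit 0 when the FIFO can accept another write. The port is passed in as a pointer so that the logic can be tried on a host with an ordinary variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Try to send one character through a stimulus port.
   Returns false without writing if the FIFO reports "not ready". */
bool itm_try_putc(volatile uint32_t *port, char ch)
{
    assert(port != NULL);
    if ((*port & 1u) == 0u) {
        return false;                         /* FIFO full: caller may retry */
    }
    *(volatile uint8_t *)port = (uint8_t)ch;  /* byte-wide write to the port */
    return true;
}
```

        Returning a status instead of spinning lets the caller decide whether to retry, drop the character, or yield to other work.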

        Output messages are sent to the TPIU, where they can be collected via the trace port interface or the serial wire interface. In the final code, there is no need to remove the code that generates debug messages. Instead, the TRCENA bit can be cleared so that the ITM is disabled and no debug messages are output. Message output can also be switched on in a "live" system. In addition, the set of enabled ports can be limited through the trace enable register.

3.2 Hardware Tracing Based on ITM and DWT

        The ITM can also output hardware trace data. This data is generated by the DWT, and the ITM acts as the merging unit for the trace packets, as shown in Figure 16.2. To use DWT trace, the DWTEN bit must be set in the ITM control register. The rest of the DWT trace settings are made in the DWT itself.

3.3 ITM Timestamp

The ITM also provides a timestamp function: when a new trace packet enters the ITM FIFO, the ITM inserts a differential timestamp packet into the trace stream. With these timestamps, the trace capture device can reconstruct the timing relationship between trace packets. Timestamp packets are also generated when the timestamp counter overflows.

4. Trace component: Embedded Trace Macrocell (ETM)

        The ETM provides instruction trace (that is, a history of instruction execution). It is optional and may not be present on all CM3 products. Once it is enabled and the trace operation has started, it generates instruction trace packets. The ETM also contains a FIFO buffer, which gives the trace port enough time to capture the trace stream.

        To reduce the amount of data generated, the ETM does not continuously output the address the processor is currently executing. Usually it outputs only information about the program's execution flow, and emits a full address only when needed (for example, when a branch occurs). Since the debug host also has a copy of the binary image, it can use that copy to reconstruct the sequence of executed instructions. The ETM also interacts with other debugging components. For example, the DWT comparators can be used to generate ETM trigger signals or to control the start and stop of trace.

        Unlike the ETM of traditional ARM processors, the ETM of CM3 has no address comparators of its own. Address comparison is performed by the DWT comparators instead. In fact, the ETM of CM3 is very different from that of traditional ARM processors.

        To use the ETM, the following setup steps must be performed (by the debugger and its surrounding tools):

1. Set the DEMCR.TRCENA bit to 1 (see Table 15.2 or D.37 for the definition of the DEMCR register).

2. Unlock the ETM to program its registers: write 0xC5AC_CE55 to the ETMLOCK_ACCESS register.

3. Program the ATBID register (ATID) to give ETM a unique identifier to separate its trace packets from those of other trace sources.

4. The NIDEN input signal of the ETM must be driven high. How this signal is implemented depends on the specific device, so refer to the device's data sheet.

5. Program the ETM control register set to generate trace data.

5. Trace component: Trace Port Interface Unit (TPIU)

        Trace data from the ITM, DWT and ETM all converge at the TPIU. The TPIU formats this trace data and outputs it off-chip, where it can be received by devices such as trace port analyzers. The TPIU of CM3 supports two output modes:

Clocked mode, using a parallel data output port of up to 4 bits

Serial Wire Viewer (SWV) mode, using a single-bit SWV output

        In clocked mode, the number of bits actually used by the data output port is programmable. It depends on two things: the package of the chip, and how many signal pins the application makes available for trace output. On a specific chip, the maximum size of the trace port can be determined by reading the TPIU registers. The speed of the trace data output is also programmable.

        In SWV mode, the SWV protocol is used. It reduces the number of output signals required, but the maximum bandwidth of the trace output is also reduced.

        To use the TPIU, DEMCR.TRCENA must be set first, and the "protocol selection" and "trace port size" registers must be programmed. This work is done by the trace capture software.

6. Flash Patch and Breakpoint unit (FPB)

        FPB has two functions:

Hardware breakpoint support: generate a breakpoint event so that the processor enters debug mode (halt mode or the debug monitor exception).

Remapping of instructions or literal data: an access in the code address space is redirected to the SRAM address space.

        The FPB has 8 comparators: 6 instruction comparators and 2 literal comparators.

        The FPB has a Flash Patch Control Register, which contains the enable bit for the whole FPB. In addition, each comparator has its own enable bit in its own control register. The former acts as the master switch, and both enable bits must be 1 for a comparator to be active.

        An address in the instruction space can be remapped to the SRAM address space by programming a comparator. When this function is used, the REMAP register must be programmed with the base address of the remapped content. The top 3 bits [31:29] of the REMAP register are hard-wired to 0b001, so the remapped address range is limited to 0x2000_0000-0x3FFF_FF80, which falls within the SRAM address space.

        When the address of the instruction or the address of the literal value matches the value in the comparator, the read access will be remapped according to the setting of REMAP.
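
        The effect of the hard-wired top bits of REMAP can be illustrated with a small model. The helpers below are hypothetical and model only bits [31:29]. They do not model the alignment of the low-order bits or the per-comparator offset, for which the Technical Reference Manual should be consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the address lies in the SRAM region 0x2000_0000-0x3FFF_FFFF. */
bool fpb_remap_in_sram(uint32_t base)
{
    return base >= 0x20000000u && base <= 0x3FFFFFFFu;
}

/* Value read back from REMAP after writing `written`, modelling only
   the hard-wired top bits [31:29] = 0b001 described in the text. */
uint32_t fpb_remap_readback(uint32_t written)
{
    uint32_t value = (written & 0x1FFFFFFFu) | 0x20000000u;
    assert(fpb_remap_in_sram(value));   /* holds for every input */
    return value;
}
```

        Whatever value the debugger writes, the effective base lands in the SRAM region, which is why a remap target outside SRAM cannot be selected.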

        With this remapping feature, "what if" tests can be created by replacing the original instruction or literal value with another one. Even code running in ROM or flash can take part in this kind of testing. Another use is essentially the same, except that the replacement is a branch instruction, so a stand-in quietly takes the place of the original: for a subroutine located in flash, a substitute subroutine is provided in SRAM. By remapping the flash address, when the instruction (BL) that calls the subroutine is executed, what actually executes is the replacement BL located in SRAM, which branches to the substitute. This mechanism allows ROM-based devices to be debugged (the modified subroutines are placed temporarily in SRAM). The figure below demonstrates the effect of remapping.

        In addition to address reloading, another function of the instruction address comparator is to generate hardware breakpoints (6 in total), and the processor enters the debug mode when the address matches.

7. AHB Access Port (AHB-AP)

        The AHB-AP sits between the memory system of CM3 and the debug interface module (SWJ-DP/SW-DP), acting as a bus bridge. For most basic data transfers between the debug host and the CM3 system, only 3 registers in the AHB-AP are needed: Control and Status Word (CSW), Transfer Address Register (TAR), and Data Read/Write (DRW).

        The connection method of AHB-AP is shown in the figure:

                                                  AHB-AP connection in Cortex-M3

        The CSW register controls the transfer direction (read/write), transfer size, and transfer type. The TAR register specifies the transfer address, while the DRW register holds the data to be transferred (the transfer starts when this register is accessed). The data in DRW corresponds exactly to what appears on the bus, so for halfword and byte transfers the data must be shifted into the correct byte lane by the debugger. For example, to perform a halfword transfer at address 0x1002, the data must be placed in bits [31:16] of DRW. The AHB-AP can generate unaligned transfers, but it does not automatically rotate the data according to the address offset. The debugging software must handle this itself, either by rotating the data manually or by splitting the unaligned access into several aligned accesses.
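
        The byte-lane shifting for DRW can be sketched as below. The function names are hypothetical, and the sketch assumes a little-endian system and halfword-aligned addresses only. Unaligned accesses would first have to be split into aligned ones, as noted above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit offset of the byte lane selected by the low two address bits. */
unsigned drw_lane_shift(uint32_t addr)
{
    return (unsigned)(addr & 3u) * 8u;
}

/* Place a halfword on the lane for `addr` before writing it to DRW. */
uint32_t drw_pack_halfword(uint32_t addr, uint16_t value)
{
    assert((addr & 1u) == 0u);          /* aligned halfwords only */
    return (uint32_t)value << drw_lane_shift(addr);
}

/* Extract the halfword for `addr` from a value read back from DRW. */
uint16_t drw_unpack_halfword(uint32_t addr, uint32_t drw)
{
    assert((addr & 1u) == 0u);          /* aligned halfwords only */
    return (uint16_t)(drw >> drw_lane_shift(addr));
}
```

        For address 0x1002 the shift is 16, so the halfword ends up in bits [31:16] of DRW, matching the example in the text.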

        Other registers in the AHB-AP provide additional functionality. For example, the AHB-AP provides 4 banked registers and an address auto-increment function to speed up access to a small range of consecutive addresses.

        The CSW register also contains a bit called MasterType. Usually it is set to 1 to tell the hardware that receives an AHB-AP transfer that the transfer was initiated by the debugger. However, a debugger can also clear this bit to masquerade as the processor core. The hardware receiving the transfer on the AHB then treats it as a transfer initiated by the core and operates normally. This is useful for testing, especially for peripherals with FIFOs, which may behave differently when accessed by a debugger.

8. ROM table

        The debugging system of CM3 also includes a ROM table, which allows tools to detect automatically which debugging components a given CM3 chip contains. As the first implementation of v7-M, CM3 has a predefined memory map and contains standard debug components. However, newer Cortex-M devices can contain different debug components, and chip manufacturers implementing CM3 can also modify the debug components. The ROM table is provided so that debugging tools can detect the specific components in the debugging system. It records the addresses of the NVIC and of each debugging function block.

        The ROM table is located at 0xE00F_F000. By analyzing its contents, the locations of the system and debug components in the memory system can be calculated. After locating the debug components, the debugger can examine their ID registers to determine which components are available in the system. In the ROM table of CM3, the first entry should contain the offset of the NVIC base address relative to the base address of the ROM table. The default value of the first entry is 0xFFF0F003. The bit field [1:0] has a special role: it indicates that the device corresponding to this entry exists and that further entries follow (that is, this entry is not the last one). From the first entry, we therefore know that the system contains an NVIC and that a second entry exists, and the address of the NVIC can be calculated as 0xE00F_F000 + 0xFFF0_F000 = 0xE000_E000.

        The default ROM table is shown in Figure 16.2. However, because the chip manufacturer can add, remove, or replace some optional components with other CoreSight debugging components, the ROM table of a given chip may differ from the default to reflect those changes. The table is the default ROM table of Cortex-M3.

        The lowest two bits of each value indicate whether the device exists (bit[1]) and whether further entries follow it (bit[0]). Under normal circumstances, the NVIC, DWT and FPB are always present, so the last two bits of their entries are always 1. The TPIU and ETM, however, can be omitted and possibly replaced by other debug components from the CoreSight family.

        The high-order part of each value gives the offset of the component's base address relative to the base address of the ROM table. For example, NVIC base address = 0xE00F_F000 + 0xFFF0_F000 = 0xE000_E000 (the carry out of bit 31 is ignored).
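
        The address calculation can be sketched in C as below. The function names are hypothetical, and the sketch assumes the offset occupies bits [31:12] of the entry, which is consistent with the NVIC example above. Because `uint32_t` arithmetic wraps modulo 2^32, the carry is dropped without any extra code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ROM_TABLE_BASE 0xE00FF000u   /* ROM table location given in the text */

/* Component base address = table base + offset held in the upper bits.
   uint32_t addition wraps, which discards the carry automatically. */
uint32_t rom_entry_address(uint32_t table_base, uint32_t entry)
{
    assert((table_base & 0xFFFu) == 0u);   /* table base is 4 KB aligned */
    return table_base + (entry & 0xFFFFF000u);
}

/* Both low bits set: per the text, the component is present and
   at least one more entry follows. */
bool rom_entry_present_and_followed(uint32_t entry)
{
    return (entry & 0x3u) == 0x3u;
}
```

        With the default first entry 0xFFF0F003, `rom_entry_address(ROM_TABLE_BASE, 0xFFF0F003u)` yields 0xE000E000, the NVIC base address computed by hand above.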

        When developing debugging tools, it is necessary to check the debugging components one by one through the ROM table, because some CM3-based chips will inevitably customize the debugging components and modify the ROM table. Only the addresses calculated from the ROM table can be relied on when deciding which components are present.


Origin blog.csdn.net/zichuanning520/article/details/131311697