CPU, don't worry! Leave the chores to the younger brothers


"CPU runtime is a precious resource, and we need to devote limited CPU time to more meaningful things."

In embedded development you have almost certainly done things like these: using GPIO to emulate a communication interface such as SPI (bit-banging); using an empty loop to implement a delay; polling a register's status bits, such as an "empty" flag. Perhaps it was out of necessity, for example because the chip has no hardware SPI or not enough channels, or because the CPU had nothing else to do at the time except idle. But we must be aware of one thing: this wastes CPU resources.
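To make the cost concrete, here is a minimal sketch of the first habit mentioned above, a bit-banged SPI transfer (mode 0, MSB first). The pin accessors `set_sck`, `set_mosi` and `get_miso` are hypothetical stand-ins for GPIO register accesses, and the mock loops MISO back to MOSI so the code runs on a PC. The point is the `gpio_ops` counter: the CPU itself performs four pin operations per bit, 32 per byte, and can do nothing else meanwhile, whereas a hardware SPI peripheral shifts the byte out on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Mock pin state; on a real MCU these would be GPIO register accesses. */
static int sck, mosi;
static unsigned long gpio_ops;   /* pin operations performed by the CPU */

static void set_sck(int level)  { sck = level;  gpio_ops++; }
static void set_mosi(int level) { mosi = level; gpio_ops++; }
static int  get_miso(void)      { gpio_ops++; return mosi; }  /* mock: MISO looped back to MOSI */

/* Bit-banged SPI, mode 0 (clock idles low, data sampled on the rising edge), MSB first. */
uint8_t spi_bitbang_transfer(uint8_t out)
{
    uint8_t in = 0;
    assert(sck == 0);                      /* mode 0: clock must start low */
    for (int bit = 7; bit >= 0; bit--) {
        set_mosi((out >> bit) & 1);        /* present the next bit while SCK is low */
        set_sck(1);                        /* rising edge: both sides sample */
        in = (uint8_t)((in << 1) | get_miso());
        set_sck(0);                        /* falling edge: get ready for the next bit */
    }
    return in;
}
```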

The CPU is the heart of an embedded system, but it does not need to be deeply involved in every detail. Remember: the CPU is the commander of all the hardware resources on the chip, not a coolie who does everything by himself. We must learn to make the fullest possible use of on-chip hardware resources, and even add specialized hardware circuits outside the chip, to implement the functions we need.

Let's take a look.

In this article, Zhennan will use several examples to explain how to reduce the CPU's burden, and how to use on-chip and off-chip hardware to implement the functions we want.

Petroleum logging instrument

1.1 Background knowledge

I spent more than five years of my career working on petroleum instruments. It is a very traditional industry, but also a highly multidisciplinary and technology-intensive one.

Someone might ask: "This chapter is supposed to be about CPU utilization, so why are you talking about petroleum instruments?" Don't worry, Zhennan has his reasons.

Please see Figure 1.1.


Figure 1.1 Schematic diagram of petroleum logging system

The figure above shows a simple topology of a petroleum logging system. During logging, the logging truck hoists the steel cable over the pulley; meanwhile, the downhole instrument transmits signals (electrical or ultrasonic) outward and receives the returned signals. After processing, the results are uploaded to the surface system over coaxial Ethernet, and the host computer plots them as curves. The final curves are handed over to log interpretation engineers, who use them to locate hydrocarbon reservoirs.

The hoisting speed is fixed. Naturally, we want to collect as much data as possible over a given depth interval, that is, to raise the sampling rate as far as possible, so that the final log curves show more detail.
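The trade-off can be written down explicitly. With the cable hoisted at a constant speed v and one complete acquisition cycle every T_cycle (so the sampling rate is f_s = 1/T_cycle), consecutive data points are separated in depth by:

```latex
\Delta z = v \, T_{\text{cycle}} = \frac{v}{f_s}
```

At a fixed hoisting speed, the only way to make Δz smaller, and the curve more detailed, is to shorten T_cycle, which is what the optimization in 1.2.2 does.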

OK, this is the most basic principle and background.

1.2 Realization of logging data collection and transmission

The circuit is fairly simple, as shown in Figure 1.2.


Figure 1.2 Principle block diagram of logging tool data acquisition and transmission

1.2.1 The most straightforward basic solution

The most straightforward solution, and the one everyone thinks of first, is to acquire, compute, and send one step at a time, as shown in Figure 1.3.


Figure 1.3 The most direct implementation scheme of logging data collection and transmission

In each cycle, the ADC acquires one waveform; the CPU then performs the calculations, mainly digital signal processing such as digital filtering, FFT and DPSD; finally, the results are packed according to the protocol format and sent through McBSP (a serial interface proprietary to TI DSPs) to the coaxial Ethernet communication module. Naturally, we want this cycle to be as short as possible, which means optimizing and compressing some of its steps.
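The blocking structure of this first scheme can be sketched as follows. The peripheral functions are hypothetical stubs that only advance a simulated clock; apart from the roughly 10 ms compute time mentioned in 1.2.2, the stage durations are illustrative placeholders, not measurements. Because every stage waits for the previous one, the cycle time is simply the sum of all stages.

```c
#include <assert.h>
#include <stdint.h>

#define WAVE_LEN 8   /* tiny waveform, for illustration only */

/* Simulated time in microseconds. Only the ~10 ms compute time comes from the
   text; the acquisition and send durations are placeholders. */
#define T_ACQUIRE_US 10000u
#define T_COMPUTE_US 10000u
#define T_SEND_US      500u
static uint32_t sim_time_us;

/* Hypothetical blocking ADC read: the CPU drives every transfer and waits. */
static void adc_read_waveform_blocking(int16_t *w, int n)
{
    assert(w && n > 0);
    for (int i = 0; i < n; i++)
        w[i] = (int16_t)i;            /* mock samples */
    sim_time_us += T_ACQUIRE_US;
}

/* Stand-in for digital filtering, FFT, DPSD, ... */
static int32_t dsp_process(const int16_t *w, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += w[i];
    sim_time_us += T_COMPUTE_US;
    return acc;
}

/* Hypothetical blocking McBSP send of the packed result. */
static void mcbsp_send_blocking(int32_t result)
{
    (void)result;
    sim_time_us += T_SEND_US;
}

/* One cycle of the straightforward scheme; returns its duration. */
uint32_t run_sequential_cycle(void)
{
    int16_t wave[WAVE_LEN];
    uint32_t start = sim_time_us;
    adc_read_waveform_blocking(wave, WAVE_LEN);    /* CPU waits for the ADC  */
    int32_t result = dsp_process(wave, WAVE_LEN);  /* CPU computes           */
    mcbsp_send_blocking(result);                   /* CPU waits for the link */
    return sim_time_us - start;                    /* = sum of all stages    */
}
```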

1.2.2 Optimized scheme with DMA

Look more closely at the scheme above and you will find that every operation requires the CPU, and much of the time is spent waiting for peripherals. How can we reduce the CPU's involvement, so that its precious time goes into computing the core algorithm instead of idle waiting? See Figure 1.4.


Figure 1.4 Data acquisition and transmission optimization scheme with DMA added

First, the CPU takes part in acquiring one waveform and then starts computing on the acquired data. Because the digital signal processing involves a large amount of floating-point data, the calculation takes a long time, about 10 ms each. Meanwhile, we start the next ADC conversion promptly and compute during the conversion time. We then start an SPI-DMA transfer to read the ADC's converted data directly; the CPU does not wait for the DMA transfer to finish but uses that time for computation. When the calculation is done, the CPU can immediately start the next one, because the new waveform is already ready. In this way, one cycle is compressed to 10 ms, and the sampling rate is twice what it was before.
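The overlap described above is essentially double buffering (ping-pong buffers). The sketch below assumes two waveform buffers: the DMA fills one while the CPU processes the other, and the roles swap every cycle. `spi_dma_start` and `spi_dma_wait` are hypothetical wrappers (on real hardware they would trigger the ADC conversion, start the SPI-DMA channel, and wait for its transfer-complete flag); here the mock fills the buffer immediately so the code runs on a PC, and `dsp_process` is only a sum standing in for filtering, FFT and DPSD.

```c
#include <assert.h>
#include <stdint.h>

#define WAVE_LEN 8   /* tiny waveform, for illustration only */

static uint16_t buf[2][WAVE_LEN];   /* ping-pong buffers */
static int dma_busy;
static uint16_t next_sample;        /* mock ADC data source */

/* Hypothetical: trigger the ADC and start SPI-DMA into dst. On real hardware this
   returns at once and the transfer runs in the background; the mock just fills dst. */
static void spi_dma_start(uint16_t *dst)
{
    assert(!dma_busy);              /* never restart a transfer in progress */
    for (int i = 0; i < WAVE_LEN; i++)
        dst[i] = next_sample++;
    dma_busy = 1;
}

/* Hypothetical: wait for the DMA transfer-complete flag. */
static void spi_dma_wait(void) { dma_busy = 0; }

/* Stand-in for the ~10 ms DSP stage. */
static uint32_t dsp_process(const uint16_t *w)
{
    uint32_t acc = 0;
    for (int i = 0; i < WAVE_LEN; i++)
        acc += w[i];
    return acc;
}

/* Process `cycles` waveforms, overlapping each computation with the next acquisition. */
void run_pingpong(int cycles, uint32_t *results)
{
    int cur = 0;
    spi_dma_start(buf[cur]);                /* first waveform */
    spi_dma_wait();
    for (int n = 0; n < cycles; n++) {
        spi_dma_start(buf[cur ^ 1]);        /* next waveform lands in the other buffer */
        results[n] = dsp_process(buf[cur]); /* CPU computes meanwhile */
        spi_dma_wait();                     /* normally already finished by now */
        cur ^= 1;                           /* swap roles */
    }
}
```

With this structure the cycle time approaches the larger of the compute time and the acquisition time instead of their sum, which is where the doubling comes from when the two are about equal.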

Through this example, Zhennan wants to tell everyone: "CPU time is precious, and making full use of the on-chip hardware resources frees up more CPU time for more meaningful work. Using DMA sensibly, together with a few tricks, is an effective way to do this."

In fact, in many cases the usable hardware resources are not limited to those on the chip. "By designing some simple off-chip circuits to help, we can sometimes achieve unexpected results." Read on.

Smart drive camera

2.1 Camera Timing Analysis

I know that many people are interested in camera modules (Figure 1.5) and would like to try driving one with a microcontroller, but not many have succeeded.


Figure 1.5 Popular OV7670 camera modules

There are several reasons for this:

  • The timing of the camera's CMOS sensor is fairly complex;

  • SCCB communication and configuration of the related registers are required;

  • The timing is very fast, and the sensor actively pushes data out at its own clock rate, so capturing the data is difficult.

Just how fast is its timing? See Figure 1.6.


Figure 1.6 Timing diagram of OV7670

In VGA mode the OV7670 reaches a maximum frame rate of 30 fps, i.e., it produces 30 frames of 640×480 images per second. According to the official documentation, in VGA mode each frame actually spans 510 line periods and each line spans 784 pixel periods (the extra lines and pixels are blanking and carry no valid data; we only care about the pixel data output while HREF is high). Since each pixel is output as two bytes, one per PCLK cycle, the PCLK period is 1/(30×510×784×2) ≈ 41.7 ns. Capturing the pixel data directly with the GPIO of an ordinary microcontroller is practically impossible, because neither the IO nor the CPU is fast enough.
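The arithmetic is easy to check. The helpers below (names are illustrative) reproduce the 41.7 ns figure from the frame, line and pixel counts quoted above, and show how few CPU cycles fit into one PCLK period at an illustrative 72 MHz core clock: about 3, far too few for a polling loop that must read the port, store the byte and check HREF.

```c
#include <assert.h>

/* PCLK period in ns: every pixel period of every line period of every frame
   costs bytes_per_pixel PCLK cycles (blanking included). */
double pclk_period_ns(double fps, unsigned lines_per_frame,
                      unsigned pixels_per_line, unsigned bytes_per_pixel)
{
    assert(fps > 0.0 && lines_per_frame && pixels_per_line && bytes_per_pixel);
    double pclk_hz = fps * lines_per_frame * pixels_per_line * bytes_per_pixel;
    return 1e9 / pclk_hz;
}

/* CPU clock cycles available per PCLK period. */
double cpu_cycles_per_pclk(double cpu_hz, double pclk_ns)
{
    return cpu_hz * pclk_ns * 1e-9;
}
```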

2.2 Using DCMI+DMA

To read such high-speed data from the camera, dedicated hardware is required. We can choose an ST STM32F4-series microcontroller, which has a built-in DCMI (Digital Camera Interface) that makes image acquisition easy when used together with DMA, as shown in Figure 1.7.


Figure 1.7 Use DCMI+DMA to drive the camera

The DCMI captures the camera data, which DMA can save directly to internal RAM or external SDRAM, or write directly to a TFT for real-time dynamic display. Throughout the process, the CPU only does some configuration work and takes no part in acquiring or transferring the image data. Using a higher-end chip therefore makes development much easier and gets twice the result with half the effort, simply because it has more powerful hardware peripherals to implement specific functions for us. Of course, more powerful hardware also means a higher learning cost; we need to study carefully how to use it correctly to get the desired result.
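When setting this up, the main number to get right is the DMA length. A minimal sizing sketch, assuming RGB565 output (2 bytes per pixel): in ST's STM32F4 HAL the capture is started with HAL_DCMI_Start_DMA(hdcmi, mode, buffer address, length), and the length is counted in 32-bit words because the DCMI delivers data through a 32-bit data register. A full VGA frame works out to 614,400 bytes, larger than the internal SRAM of STM32F4 devices, which is presumably why external SDRAM is listed as a destination.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one frame. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* 32-bit words the DMA has to move per frame (DCMI data is read 32 bits at a time). */
uint32_t frame_words(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    uint32_t bytes = frame_bytes(width, height, bytes_per_pixel);
    assert(bytes % 4u == 0u);   /* the frame must be a whole number of words */
    return bytes / 4u;
}

/* On the target, the result would be passed as the length argument, e.g.
 *   HAL_DCMI_Start_DMA(&hdcmi, DCMI_MODE_CONTINUOUS, (uint32_t)frame_buf,
 *                      frame_words(640, 480, 2));
 * where hdcmi and frame_buf are hypothetical names configured elsewhere. */
```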

Sometimes the hardware peripheral circuitry is even more complex than the CPU core. For example, in some multimedia codec SoCs the CPU core is only a 51 or an M0, while most of the chip area is taken up by codec circuits such as H.264. Therefore, as embedded development engineers, "we must first fully understand what hardware resources we have, and not rely solely on the CPU to implement every function."


2.3 Self-built external circuit

The name of this section is "Smart Drive Camera", yet none of the solutions described above really counts as "smart". The scheme above requires the microcontroller to have dedicated hardware such as DCMI. Can it be done without DCMI, for example with an ordinary 51 or low-end M0 microcontroller? The answer is yes, but it takes a few tricks in the external circuit, as shown in Figure 1.8.


Figure 1.8 Realize image acquisition through off-chip parallel FIFO+timing adjustment

The flowchart in Figure 1.9 shows how clever this is.


Figure 1.9 Acquisition of one frame of data through off-chip parallel FIFO

Once the program has run through the logic in the figure above, one frame of the image is stored in the FIFO. The microcontroller can then read the image data at its leisure from the read side (a parallel FIFO has a write side and a read side, with a write pointer and a read pointer respectively). In this way, the speed of the CPU and IO is no longer a bottleneck, and with this mechanism any microcontroller can easily capture images.
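Figure 1.9 itself is not reproduced in the text, so the following is only a sketch of the usual way FIFO-buffered camera modules are driven, assuming an AL422B-style FIFO with a write-pointer reset (WRST), write enable (WEN), read-pointer reset (RRST) and read clock (RCLK). All pin helpers are hypothetical, and the mock hardware lets the camera "write" a whole tiny frame at the VSYNC that ends it, so the code runs on a PC. The CPU's only time-critical duty is to catch two VSYNC edges and toggle WEN; the read loop can then run as slowly as the MCU likes.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_BYTES 32u  /* tiny illustrative frame; VGA RGB565 would be 614,400 bytes */

/* ---- Mock hardware so the sketch runs on a PC (all names hypothetical) ---- */
static uint8_t  fifo_mem[FRAME_BYTES];
static unsigned fifo_wptr, fifo_rptr;
static int      fifo_wen;        /* write enable, driven by an MCU pin */
static unsigned frame_no;        /* frames produced by the mock camera */
static unsigned bytes_written;   /* bytes the camera pushed into the FIFO */

/* Returns at the next VSYNC. If WEN was high during the frame that just ended,
   the camera (not the CPU) clocked that whole frame into the FIFO. */
static void wait_vsync(void)
{
    if (fifo_wen) {
        for (unsigned i = 0; i < FRAME_BYTES; i++) {
            fifo_mem[fifo_wptr++ % FRAME_BYTES] = (uint8_t)(frame_no * 16u + i);
            bytes_written++;
        }
    }
    frame_no++;
}

static void fifo_reset_write_ptr(void)    { fifo_wptr = 0; }   /* pulse WRST */
static void fifo_set_write_enable(int on) { fifo_wen = on; }   /* drive WEN */
static void fifo_reset_read_ptr(void)     { fifo_rptr = 0; }   /* pulse RRST */
static uint8_t fifo_read_byte(void)                            /* pulse RCLK, read D[7:0] */
{
    return fifo_mem[fifo_rptr++ % FRAME_BYTES];
}

/* Capture: the only time-critical work is reacting to two VSYNC edges. */
void capture_one_frame(void)
{
    wait_vsync();                 /* align to a frame boundary */
    fifo_reset_write_ptr();
    fifo_set_write_enable(1);     /* camera now writes straight into the FIFO */
    wait_vsync();                 /* a complete frame is now stored */
    fifo_set_write_enable(0);     /* freeze the FIFO contents */
}

/* Readout: runs at whatever pace the MCU can manage. */
void read_frame(uint8_t *dst, unsigned n)
{
    assert(dst && n <= FRAME_BYTES);
    fifo_reset_read_ptr();
    for (unsigned i = 0; i < n; i++)
        dst[i] = fifo_read_byte();
}
```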

What does the CPU do during this process? Apparently nothing more than waiting for the frame sync signal VSYNC and operating a few IOs. This method is even more CPU-efficient than DCMI+DMA (DMA can actually occupy half of the on-chip data bus bandwidth, lowering the CPU's efficiency), and it is more flexible and less dependent on the microcontroller's hardware.

A microcontroller cleverly drives a large 7-inch LCD

From the examples above, you should now understand what Zhennan means by "smart driving". Yes: let the hardware do the talking. We want to be engineers who are strong in both hardware and software.

OK, what if I told you: "I can use a 51 or M0 microcontroller to drive a large 7-inch LCD (800 × 480), shown in Figure 1.10, and play video on it smoothly." Would you believe it? You would probably say: "Hardly possible; the refresh rate won't be high enough." But since I am asking, Zhennan has of course done it, and here is how.


Figure 1.10 7-inch TFT LCD module

First, look at the schematics, shown in Figures 6.11 to 6.14.


Figure 6.11 Schematic of the smart-drive 7-inch LCD: MCU section


Figure 6.12 Schematic of the smart-drive 7-inch LCD: 74HC595 serial-to-parallel section


Figure 6.13 Schematic of the smart-drive 7-inch LCD: octal counter and timing adjustment section


Figure 6.14 Schematic of the smart-drive 7-inch LCD: spiFlash and 7-inch TFT interface section

The basic implementation logic is shown in Figure 1.15.


Figure 1.15 Logic block diagram of the basic smart-drive implementation for the 7-inch LCD

After studying the schematics and the logic block diagram above, many readers have probably already grasped Zhennan's idea. To make the logic even clearer, Zhennan also provides a matching flowchart, shown in Figure 1.16.


Figure 1.16 Basic flowchart of the smart-drive 7-inch LCD

Two 74HC595s convert the serial data into 16-bit parallel data and connect to the 16-bit data interface of the TFT LCD. The serial data inputs of the two 74HC595s are connected both to two GPIOs of the MCU and to the two serial data pins of the spiFlash. When the spiFlash is deselected (CS set high), its data pins are high-impedance, so the MCU can drive the 74HC595s; when the MCU sets those GPIOs to high impedance, the two 74HC595s each receive one bit of the spiFlash's dual-bit serial output. This multiplexed design "lets the MCU pre-initialize the TFT LCD and put it into pure pixel-data write mode; then, during the high-speed data write phase, the MCU steps aside and lets the TFT receive data directly from the spiFlash."
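One practical consequence, assuming the wiring described below, is that the image data must be stored in the spiFlash in a bit-interleaved layout, because each SCK clock delivers one bit to each 74HC595. The sketch shows that host-side preprocessing under explicit assumptions that the text does not confirm: the flash is read in a dual-output mode that puts bits 7, 5, 3, 1 of each byte on IO1 and bits 6, 4, 2, 0 on IO0 (the usual order for "Dual Output Fast Read" on SPI NOR flash); IO1 feeds the 74HC595 driving D15..D8 and IO0 the one driving D7..D0; and the first bit shifted in ends up as the MSB. `shift_in_dual` simulates the two shift registers so the encoding can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one 16-bit pixel into the 2 flash bytes that, read in dual-output mode,
   deliver the high byte on IO1 and the low byte on IO0, MSB first. */
void encode_pixel_dual(uint16_t px, uint8_t out[2])
{
    uint8_t hi = (uint8_t)(px >> 8), lo = (uint8_t)px;
    for (int k = 0; k < 2; k++) {            /* k = 0: bits 7..4, k = 1: bits 3..0 */
        uint8_t b = 0;
        for (int j = 0; j < 4; j++) {        /* 4 SCK clocks per flash byte */
            int bit = 7 - 4 * k - j;         /* pixel-byte bit sent on this clock */
            b = (uint8_t)((b << 2) | (((hi >> bit) & 1) << 1) | ((lo >> bit) & 1));
        }
        out[k] = b;
    }
}

/* Model the two 74HC595s: 8 clocks, one bit into each register per clock. */
uint16_t shift_in_dual(const uint8_t in[2])
{
    uint8_t sr_hi = 0, sr_lo = 0;
    for (int clk = 0; clk < 8; clk++) {
        uint8_t byte = in[clk / 4];
        int pos = 7 - 2 * (clk % 4);         /* IO1 carries bit pos, IO0 bit pos - 1 */
        sr_hi = (uint8_t)((sr_hi << 1) | ((byte >> pos) & 1));
        sr_lo = (uint8_t)((sr_lo << 1) | ((byte >> (pos - 1)) & 1));
    }
    return (uint16_t)((sr_hi << 8) | sr_lo);
}

/* Self-check: every 16-bit value survives encode -> shift-in unchanged. */
void check_roundtrip(void)
{
    for (uint32_t p = 0; p <= 0xFFFFu; p++) {
        uint8_t b[2];
        encode_pixel_dual((uint16_t)p, b);
        assert(shift_in_dual(b) == p);
    }
}
```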

The key to serial-to-parallel conversion with two 74HC595s is generating the LC (latch) signal: after every 8 SCK pulses, a rising edge on LC must be produced automatically. This is part of the timing generation and adjustment logic, which is built from a 74HC161 and a 74HC27, as shown in Figure 6.13. First, the 74HC161 is reset, so [Q2:Q0] = 000. The 74HC27 is a three-input NOR gate, so its output 1Y, that is, 595-LC, is 1. As clocks arrive, [Q2:Q0] counts up through 001, 010, ..., and 595-LC stays 0 until the count returns to 000; so after 8 clocks 595-LC becomes 1 again, producing a rising edge. Zhennan also added two stages of 74HC1G32 on 595-LC as a buffer, to add some delay and make the latched data output of the 74HC595s more stable.

Next is generating the LCD's WR signal. As can be seen from Figure 6.12, WR is the NOR of a GPIO and the highest bit Q2 of the octal counter (NOR, not OR). While Q2 is 0, WR is controlled by the GPIO, which the MCU uses to pre-initialize the TFT. While the GPIO is 0, WR is controlled by Q2: it produces a falling edge every 8 clocks (the NOR stage in front delays this falling edge so that the 16-bit parallel data is written into the LCD more reliably) and stays low for 4 clock cycles.
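Both signals can be checked with a small zero-delay model. This sketch evaluates the gate equations described above on each state of the mod-8 count, assuming only Q2..Q0 of the 74HC161 are used; it ignores all propagation delays, including the deliberate 74HC1G32 delay, so it only shows the logical sequence: LC rises once every 8 clocks, and with the GPIO at 0, WR falls when Q2 goes high and stays low for 4 clocks.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-delay model. Assumptions:
 *  - only Q2..Q0 of the 74HC161 are used, so the count wraps every 8 clocks;
 *  - LC = NOR(Q2, Q1, Q0), the 74HC27 output (74HC1G32 buffer delay ignored);
 *  - WR = NOR(GPIO, Q2). */
static int lc_of(uint8_t count)
{
    int q2 = (count >> 2) & 1, q1 = (count >> 1) & 1, q0 = count & 1;
    return !(q2 | q1 | q0);
}

static int wr_of(int gpio, uint8_t count)
{
    int q2 = (count >> 2) & 1;
    return !(gpio | q2);
}

/* Record LC and WR for n clocks after the counter is cleared; index 0 is the
   state right after reset, so both arrays need n + 1 entries. */
void trace_timing(int gpio, int n, int *lc, int *wr)
{
    assert(n >= 0 && lc && wr);
    uint8_t count = 0;                        /* 74HC161 cleared: [Q2:Q0] = 000 */
    for (int t = 0; t <= n; t++) {
        lc[t] = lc_of(count);
        wr[t] = wr_of(gpio, count);
        count = (uint8_t)((count + 1u) & 7u); /* one SCK pulse advances the count */
    }
}
```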

That covers the basic points. As for clock generation, the only requirement is that an exact number of clock pulses be produced, rather than a free-running clock. For example, one frame of image is 800 × 480 half-words, and at 8 clocks per half-word we need to output 3,072,000 clocks to display one frame on the LCD. That rules out MCO or PWM; instead we use SPI. With 8-bit SPI, that takes 384,000 writes; with 16-bit SPI, 192,000 writes. "Of course, to save even more CPU resources we can use DMA. As long as clocks keep being generated and frame after frame is displayed on the LCD, video plays smoothly."
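The clock counts follow directly from 8 SCK pulses per 16-bit pixel. The helpers below (illustrative names) reproduce the 3,072,000 / 384,000 / 192,000 figures and add two planning numbers the text does not give: how many DMA transfers are needed if one transfer is limited to 65,535 items (as with the 16-bit transfer counters on many STM32 DMA channels; check the part you actually use), and the upper bound on frame rate for a given SCK frequency, ignoring command and setup overhead.

```c
#include <assert.h>
#include <stdint.h>

#define SCK_PER_PIXEL 8u   /* two 595s, 2 bits per clock: 16 bits in 8 clocks */

/* SCK pulses needed to push one full frame from the spiFlash into the LCD. */
uint32_t sck_per_frame(uint32_t width, uint32_t height)
{
    return width * height * SCK_PER_PIXEL;
}

/* SPI data-register writes needed to generate those pulses. */
uint32_t spi_writes_per_frame(uint32_t width, uint32_t height, uint32_t spi_bits)
{
    assert(spi_bits == 8u || spi_bits == 16u);
    return sck_per_frame(width, height) / spi_bits;
}

/* DMA transfers needed if one transfer can move at most max_items items. */
uint32_t dma_chunks(uint32_t items, uint32_t max_items)
{
    assert(max_items > 0u);
    return (items + max_items - 1u) / max_items;   /* ceiling division */
}

/* Upper bound on frame rate for a given SCK frequency, ignoring all overhead. */
double max_fps(double sck_hz, uint32_t width, uint32_t height)
{
    return sck_hz / (double)sck_per_frame(width, height);
}
```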

I once told my colleagues about my experiment in "smartly driving a large screen". While admiring it, they also said: "It's a pity you don't do FPGA work!" In fact, I did work with FPGAs for a while, back in 2007 when I was an intern at Intel China Research Institute.

Well, this chapter has used three examples to illustrate the sentence it opened with: "CPU time is precious, and we have to invest limited CPU time in more meaningful things."

In actual development, making full use of hardware resources and flexibly expanding some hardware circuits by yourself can usually achieve unexpected results, and even make the impossible possible.

"Always remember: much of the time we write embedded software, but in the final analysis, what we are building is hardware."

After a year of preparation, Mr. Yu Zhennan has drawn on years of practical experience to create a new, systematic course, "Ten Days to the Top of Embedded C Language (Master C)", with 100 lectures and more than 2,000 minutes of class time in total. It explains many points of embedded C that are little known or commonly gotten wrong.

I hope this course helps those who have already learned C but are still wandering [halfway up the mountain] to reach the next level, and finally the summit of embedded C! Starting at 18:00 on 2023-07-02, it will be offered as a flash sale in the WeChat group!


Origin blog.csdn.net/karaxiaoyu/article/details/131507663