Explainer: ARM's licensing models

Copyright notice: This is an original article by the blogger. Please cite the source when reposting: https://blog.csdn.net/baidu_35679960/article/details/78446917

When chatting with classmates I keep hearing the term "IP core", but I was never sure what it means. Is it something like the A53 or the A72? Today I did some research.

An IP core is a functional module that carries intellectual property rights, has a defined function and a standard interface, and can be reused across multiple integrated circuits. It is the basic building block of a system-on-chip. Put simply, it is a pre-designed functional module. Depending on how complete the design is, IP cores come in three forms: soft core, firm core, and hard core.

  • Soft core: Think of it as program code. It describes the functional module in a hardware description language (for example, a flip-flop written in VHDL, which is plain text) and contains no physical implementation information. A soft core is highly portable, has a short design cycle, and costs the user little. Its drawbacks are that the performance of the physical implementation is uncertain and not fully characterized, and that the intellectual property is poorly protected.
  • Firm core: Besides the code that implements the module, it also covers design steps such as gate-level synthesis and timing simulation, and is usually delivered to users as a gate-level netlist. A firm core can be understood as the soft core's code plus the constraints that tie the designer's intent to the physical implementation.
  • Hard core: It is delivered as a physical description that has been verified on a specific process (40nm, 28nm, and so on), so its performance is guaranteed. It is provided to users (chip manufacturers) as mask layouts of the physical circuit together with a full set of process files.


The three ways ARM licenses its architecture [1]

In the traditional PC world, semiconductor vendors generally have two options. The first, like Intel, is to do everything in-house from start to finish, relying on no one for architecture, chip design, or manufacturing. This takes enormous, all-round strength: money, people, and technology. With semiconductor technology growing ever more complex, only a handful of companies can manage it. The benefits are equally obvious: the company controls its own lifeline and the profits are considerable. Almost every Intel product enjoys very high margins, and Intel can largely set its own prices.
The other option is the fabless model. NVIDIA works this way, and AMD ended up here too once it could no longer keep pace with Intel. Such companies only design their chips and leave manufacturing to foundries such as TSMC, UMC, GlobalFoundries, and Samsung Electronics. The advantage is obvious: the burden is light, since you only do the design and do not have to spend heavily on building fabs and developing new processes. The drawback is just as prominent: once the design is done, whether it can be manufactured, and how it turns out if it can, is not up to you but depends on the foundry partner. There are plenty of lessons here. TSMC's 40nm and 28nm processes were both immature at first and capacity was slow to ramp, which dragged down the whole industry. GlobalFoundries' 32nm process fell short of what AMD expected, so the frequency and voltage of the first-generation FX and APU processors were far worse than designed. Its 28nm process was talked up for a long time and still has not delivered, which forced AMD to abandon an entire generation of low-power APUs, redesign them, and move to TSMC.
ARM is completely different. It neither manufactures nor sells any chips. It only designs IP, including the instruction set architecture, microprocessor cores, graphics cores, and interconnect architecture, and then sells licenses to anyone who wants them. Customers can then build what they like with ARM's IP. ARM's business model is actually quite simple: it licenses its technology in three different modes [1]:

  • Architecture/instruction-set-level license [1]
The licensee may substantially rework the ARM architecture and may even extend or trim the ARM instruction set. Apple is a good example: on top of the ARMv7-A architecture it developed its own Swift microarchitecture. (Swift here is a core comparable to the A53. The instruction set itself is fairly simple, but building a core from an instruction set is a great deal of work.) An instruction set is essentially public, as the x86 instruction set is; what differs is the way it is implemented, that is, the microarchitecture.

  • Core-level license (what is usually called an IP core license) [1]
The licensee takes an existing core as a base and adds its own peripherals, such as USART, GPIO, SPI, and ADC, to build its own MCU. Licensees of this kind include Samsung, Texas Instruments (TI), Broadcom, Freescale, Fujitsu, and Calxeda.

  • Use-level license (I am not sure exactly what this is; is it something like POP IP?) [1]
A use-level license is the minimum needed to use a processor at all. It lets you embed a predefined IP block supplied by someone else into your design, but you may not modify that IP and you may not build your own packaged products on top of it. As the lowest license level, it only allows the licensee to buy a packaged ARM processor core. To add functions or features, the licensee has to attach a DSP core outside the package (or repackage the chip). Out of concern about weak intellectual property protection, ARM has used this license level with many Chinese-funded companies.

An analogy: suppose I write an article. If I tell A, "You may modify it and then use it," that is an architecture-level license. If I tell B, "You may cite my article inside your own," that is a core-level license. If I tell C, "You may only repost my article; you may not change it or embellish it," that is a use-level license.
A second analogy: suppose I write a template for a Party membership application. If I tell A, "You may modify the template (change the document structure), add your own content, and use it," that is an architecture-level license. If I tell B, "You must keep the template's structure unchanged, but you may add headings of level 3 and below and their content," that is a core-level license. If I tell C, "You may only repost it; you may not change it or embellish it," that is a use-level license.

So if Huawei holds both an architecture license and an IP core license, it can design its own core microarchitecture on top of the ARM instruction set as needed, and it can add on-chip peripherals such as communication interfaces, display controllers, and GPIO, and thereby produce its own "processor chip".
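
The three license tiers described above can be summarized as a small permission table. The following is a minimal Python sketch, not a statement of ARM's actual contract terms: the names `Right`, `LicenseTier`, `PERMISSIONS`, and `can` are hypothetical, and the permission sets only encode this article's description of each tier.

```python
from enum import Enum, auto


class Right(Enum):
    """Things a licensee might want to do (illustrative only)."""
    USE_CORE_AS_IS = auto()    # embed a predefined core without changing it
    ADD_PERIPHERALS = auto()   # build an MCU/SoC around a licensed core
    DESIGN_OWN_CORE = auto()   # implement the instruction set with a custom core


class LicenseTier(Enum):
    USE = "use-level"
    CORE = "core-level"
    ARCHITECTURE = "architecture-level"


# Assumption: each tier grants exactly what the text above describes.
PERMISSIONS = {
    LicenseTier.USE: {Right.USE_CORE_AS_IS},
    LicenseTier.CORE: {Right.USE_CORE_AS_IS, Right.ADD_PERIPHERALS},
    LicenseTier.ARCHITECTURE: {Right.DESIGN_OWN_CORE},
}


def can(tiers, right):
    """Return True if any of the held license tiers grants the given right."""
    return any(right in PERMISSIONS[tier] for tier in tiers)


# A company holding both an architecture license and a core license,
# as in the Huawei scenario above.
both = {LicenseTier.ARCHITECTURE, LicenseTier.CORE}
own_core_allowed = can(both, Right.DESIGN_OWN_CORE)
peripherals_allowed = can(both, Right.ADD_PERIPHERALS)
use_only_can_extend = can({LicenseTier.USE}, Right.ADD_PERIPHERALS)
```

With both licenses every right in the table is available, while a use-level licensee alone cannot add peripherals around the core.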



Questions:

1. What is architecture?

The word "architecture" can mean the instruction set architecture or the core's microarchitecture, and these are clearly two different things. An expert on Zhihu gives a detailed explanation of computer architecture [4]:

First, let's look at how the bible of the field (Computer Architecture: A Quantitative Approach, 5th edition) explains "architecture": "Several years ago, the term computer architecture often referred only to instruction set design. Other aspects of computer design were called implementation, often insinuating that implementation is uninteresting or less challenging." In other words, years ago "architecture" mainly meant the instruction set, that is, ISA design, and everything else was called "implementation"; ISA design was regarded as the hard part. The book then points out that this view is wrong (outdated): "We believe this view is incorrect. The architect's or designer's job is much more than instruction set design, and the technical hurdles in the other aspects of the project are likely more challenging than those encountered in instruction set design." That is, the rest of the design and implementation work is likely harder than designing the ISA.

Everyone should be familiar with the ISA; anyone who knows CISC and RISC knows what an ISA is. The focus here is on what the book calls "implementation": "The implementation of a computer has two components: organization and hardware." That is, an implementation consists of two parts: organization and hardware.

What is organization? "The term microarchitecture is also used instead of organization. For example, two processors with the same instruction set architectures but different organizations are the AMD Opteron and the Intel Core i7. Both processors implement the x86 instruction set, but they have very different pipeline and cache organizations." Organization is the microarchitecture we so often mention, and the book uses Intel and AMD to make the point: both implement the x86 instruction set but have very different pipeline and cache organizations. What is hardware? "Hardware refers to the specifics of a computer, including the detailed logic design and the packaging technology of the computer." It mainly covers detailed logic design, packaging technology, and so on.

So, clearly, computer architecture = ISA + microarchitecture + hardware. In the book's words, "the word architecture covers all three aspects of computer design": instruction set architecture, organization or microarchitecture, and hardware. The whole chip design flow in fact follows this idea: choose an ISA, then start the logic design (the most common ARM soft IP core sits at this level: RTL code written in Verilog/VHDL). If you are building an SoC, logic design and verification are interleaved. Once the RTL is frozen, physical design begins in earnest, and finally the chip is taped out. (I do not know much about the back end, so this is only a rough outline; corrections are welcome.)
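
To make the split between ISA and microarchitecture concrete, here is a small Python sketch. Everything in it is invented for illustration: the toy instruction set (`li`, `add`, `mul`), the register names, and the two latency tables `SMALL_CORE` and `BIG_CORE` do not correspond to any real ARM or x86 core. The point is only that the ISA fixes what a program computes, while the implementation decides how many cycles it takes.

```python
# Toy ISA (hypothetical): each instruction is (opcode, dest, src1, src2).
PROGRAM = [
    ("li", "r1", 6, None),       # r1 = 6
    ("li", "r2", 7, None),       # r2 = 7
    ("mul", "r3", "r1", "r2"),   # r3 = r1 * r2
    ("add", "r3", "r3", "r1"),   # r3 = r3 + r1
]


def execute(program, cycle_cost):
    """Run a program on the toy ISA.

    The register results depend only on the ISA semantics below.
    cycle_cost stands in for the microarchitecture: it maps each
    opcode to the number of cycles this implementation needs.
    """
    regs, cycles = {}, 0
    for op, dst, a, b in program:
        if op == "li":
            regs[dst] = a
        elif op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":
            regs[dst] = regs[a] * regs[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
        cycles += cycle_cost[op]
    return regs, cycles


# Two made-up implementations of the same ISA with different latencies.
SMALL_CORE = {"li": 1, "add": 1, "mul": 4}   # slow multiplier
BIG_CORE = {"li": 1, "add": 1, "mul": 1}     # single-cycle multiplier

regs_small, cycles_small = execute(PROGRAM, SMALL_CORE)
regs_big, cycles_big = execute(PROGRAM, BIG_CORE)
```

Both runs leave identical register contents, because the program and the ISA are the same, but the cycle counts differ. A real microarchitecture differs in far more than per-instruction latency (pipeline depth, issue width, caches), so this is only the simplest possible model of the idea.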

With that in mind, we can compare Loongson (Godson) and HiSilicon (which is really unnecessary: their focus is completely different and they do not serve the same market). Loongson chose a MIPS-compatible ISA (with many custom instructions added) and then did everything itself, from the microarchitecture through to tape-out. Its chips are generally fabricated at SMIC and STMicroelectronics (mostly SMIC recently). So, other factors aside, Loongson's CPU IP core is its own work (comparable to the IP that ARM licenses). HiSilicon's chips did not start from the core: HiSilicon chose ARM's IP cores and essentially started at the SoC level. Put simply, one has its own core and the other does not. Of course, starting from the SoC is still a major undertaking, since a chip is far more than its core. So neither is better or worse; they simply follow different technical routes. Please do not bash for the sake of bashing, as some of the "big V" accounts above do, or deliberately belittle one in order to praise the other. Let Zhihu stay a clean place without hostility. There is not much to add about HiSilicon's chips, which mainly target smartphones. Loongson's chips mainly target PCs, but breaking the Wintel alliance on the desktop is very hard, so there is a long way to go. That said, Loongson has been quite successful in the AQ and JG sectors. If you are interested, look at the latest Beidou chips, mainly Loongson's 1E and 1F. Aerospace-grade chips must pass a series of radiation-hardness tests before they can fly.

The above is the expert's answer. Bert's comment under it is very important: the hard part of CPU design today is the microarchitecture. ISAs are all published, but who has ever seen a microarchitecture disclosed in detail? The microarchitecture directly determines performance, area, and power, all of which involve trade-offs that vary widely, and no one discloses the microarchitecture of their chips. With a sufficiently detailed microarchitecture, the RTL can be written directly. ARM CPUs are said to be synthesized from RTL, while other high-performance CPUs are said to be full-custom designs. I take the microarchitecture implementation mentioned here to mean the implementation of an ARM Cortex-Ax core, or of Qualcomm's own Scorpion, Krait, and Kryo cores. Implementing one involves designing the pipeline: its fetch and decode width, whether it executes out of order, how aggressively, and so on [5]. Apple's own microarchitectures, Swift (based on ARMv7), Cyclone, Typhoon, Twister, and so on, are compared in the following figure [6][7]:

[Figure: comparison of Apple's custom microarchitectures; image missing from the source]

The most obvious differences among these custom microarchitectures lie in the pipeline parameters: issue width, reorder buffer size, and so on.

On Bert's comment, Diga Altman explained: every CPU is synthesized from RTL, Intel's included. The difference is how much of the flow after synthesis is generated by tools and how much is done by hand. At today's design scale it is no longer possible to work without RTL and EDA tools; the workload would be unaffordable, just as nobody could write Windows in pure assembly today.

On Bert's comment, Wayne Zhu added: IP cores are designed at three levels, behavioral, structural, and physical, and are correspondingly divided into three categories: the soft core (Soft IP Core), which describes functional behavior; the firm core (Firm IP Core), which completes the structural description; and the hard core (Hard IP Core), which is based on a physical description and has been proven on a process.

  • A soft core is the RTL code we are all familiar with;
  • A firm core is a netlist that has been synthesized under constraints and verified on an FPGA;
  • A hard core is a verified design layout.

ARM mainly ships soft cores, so the licensee gets the RTL directly. RTL code embodies an implementation of the microarchitecture, so handing over the RTL amounts to handing over a Cortex-Ax core. Licensing the RTL should therefore be the same thing as licensing a core such as a Cortex-Ax. Even with the RTL in hand, however, you are not allowed to modify ARM's RTL and turn it into your own microarchitecture [3]. With the RTL you are given, all you can change is the cache size and the core configuration. If you want your own microarchitecture, you have to buy an instruction set license and build it step by step.

For the ARM Cortex-Ax family of microarchitectures:

A 15+ stage out-of-order pipeline, 128-bit instruction fetch, and 3-wide decode; up to 5 micro-ops are dispatched per clock cycle into up to 7 issue queues, which feed 8 execution pipelines.


The A73 is very similar to the A17. Its in-order front end has been optimized and is much shorter: the fetch stage is only 4 stages deep (5 on the A72), and the whole pipeline is only 11 to 12 stages deep. Compared with the A17, the maximum dispatch rate rises from 4 micro-ops to 6. The NEON issue queue is still 2 micro-ops wide, but the integer side doubles to 4. There are still two floating-point pipelines and one prefetch monitor, but the AGU section can handle load and store operations at the same time. The integer pipelines include 2 complex ALUs, responsible for multiplication and division respectively.

The A73 keeps the four-core cluster concept: each cluster can contain 1 to 4 cores, and SCU units interconnect the clusters. The L2 cache can be up to 8MB, the same as the A17 and twice that of the A72, though I expect most chip makers will choose 1 to 2MB.
The A15/A57/A72 also carried the burden of targeting industrial and large-scale server systems. The A73 is simpler and targets only the consumer market, which lightens it considerably. For example, the AMBA5 CHI interface is removed and only AMBA4 ACE is supported. Level Cache also no longer supports ECC.


2. What is Huawei's K3V2?

The K3V2 is an AP (application processor), not just a CPU. An AP contains a CPU, a GPU, and a collection of controllers for USB, LCD, and so on; the K3V2 is a quad-core A9 AP. HiSilicon is currently without doubt the largest AP vendor in China, but it is still far behind the international vendors. In the domestic market, Qualcomm and MTK probably account for more than 80%. For now HiSilicon's APs are still produced for and sold in Huawei's own products. Besides APs there is also the baseband business, and basebands need CPUs too. HiSilicon has in fact been making CPUs for a long time, but in the past they were only supplied internally or used in basebands and data cards, so end consumers never heard of them. In recent years Huawei's main business (telecom servers) has grown slowly because of the economic crisis and the lack of upgrade demand, so Huawei wants to build its own brand in terminals and transform itself, which is gradually bringing HiSilicon to consumers' attention. HiSilicon has in fact been doing contract design for domestic and international vendors for years, which counts as accumulated experience. So much for the business background; now for the technology. Making a CPU today is not just a matter of defining an instruction set. Designing an instruction set is actually quite simple and is covered in university textbooks. Whether anyone uses it is another matter: even a good design will not necessarily be adopted, as ARM and MIPS show. Adoption depends on building an entire ecosystem, for example whether major industry players use your instruction set and whether the supporting tools and infrastructure exist. You cannot expect a rice farmer to serve you a full banquet; the whole industry has to cooperate. As for the instruction set, that is, the architecture IP, domestic companies can currently only buy it, and essentially from ARM, since its industry chain is relatively complete.
Fortunately, ARM also offers an architecture license (which here can be taken to mean a license for the instruction set architecture). It is expensive, but once you buy it you can change the implementation of the ARM architecture (the implementation here meaning something like a Cortex-Ax core). Qualcomm set the precedent, and it leaves room for some innovation. Buying the architecture gives you only a CPU. An AP also needs many peripheral blocks: the GPU, LCD controller, and USB all have to be integrated, and what you integrate depends on your product positioning and the functions you need. Once you have bought all of these, you can start modeling. Modeling means connecting the blocks together, verifying functional correctness, and then verifying the hardware on an FPGA. If the FPGA run shows no problems, the generated netlist is handed to TSMC or SMIC for production. When the chips come back from tape-out, you bring them up and port the operating system. Thanks to Google, Android already covers more than 90% of the software work and its software ecosystem is in place. If you want to use another operating system, you have to weigh the workload. The final step is to pair the AP with peripherals to build consumer end products. Each of the popular AP vendors has its own strength. Qualcomm is strong in communications: anyone making phones, especially 3G phones, cannot get around its patents and has to pay up, and it integrates the AP with the baseband. MTK is strong in integration, with a cheap ecosystem and fast time to market. The rest are small fry: well advertised, each with its own selling points, but with small volumes.
As for fully independent chip R&D and revitalizing the national industry, I personally think it means little right now; the trolls are just shouting slogans. It is entirely feasible to adapt and innovate on top of other people's work. For one thing, companies need to turn a profit, and this route has a relatively high chance of success. For another, reinventing the wheel contributes little to humanity. If the goal is to swindle state funding, that is another matter.


3. About ARM's money-making model:

An ARM licensee has to pay two things: an up-front license fee and royalties. There are many other charges, such as software tools and technical support, but these two are the main ones, accounting for roughly 33% and 50% of ARM's total revenue respectively.

The up-front license fee generally ranges from about $1 million to about $10 million (it can be lower or higher) and is paid as a lump sum. The amount depends on how sophisticated the licensed technology is; the old ARM11, for example, is much cheaper than the latest Cortex-A57.

Royalties are paid on every chip sold, usually 1-2% of the selling price. If the chip is sold to other companies or to consumers, the calculation is straightforward; if it is consumed internally, the royalty is based on a fair market price.
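
The two charges combine as a one-off fee plus a per-chip percentage. The sketch below works through one example in Python. The function name `license_cost` and all the input figures (a $5 million fee, 10 million chips at $20 each, a 1.5% royalty) are assumptions picked from within the ranges quoted above, not real contract numbers.

```python
def license_cost(upfront_fee, chips_sold, chip_price, royalty_rate):
    """Total paid to the licensor: up-front fee plus per-chip royalties.

    royalty_rate is a fraction of the selling price (0.015 means 1.5%).
    """
    if not 0 <= royalty_rate <= 1:
        raise ValueError("royalty_rate must be a fraction between 0 and 1")
    royalties = chips_sold * chip_price * royalty_rate
    return upfront_fee + royalties


# Hypothetical licensee: all four inputs are illustrative assumptions.
total = license_cost(
    upfront_fee=5_000_000,   # one-off lump sum, in dollars
    chips_sold=10_000_000,   # units shipped
    chip_price=20.0,         # selling price per chip, in dollars
    royalty_rate=0.015,      # 1.5% of the selling price
)
```

With these inputs the royalties come to about $3 million, so the total is about $8 million. The example also shows why royalties can outweigh the up-front fee: they scale with volume, while the fee is paid once.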


4. MIPS

MIPS charged more for an IP license than for an instruction set license and allowed licensees to add instructions, so the big players designed their own MIPS cores, added their own instructions, and released their own development tools. ARM's instruction set license, by contrast, costs far more than its IP license, which keeps fragmentation under control. ARM also had the foresight to design the best and cheapest USB debugging tool in the world at the time, which attracted a crowd of programmers and so built up a huge ARM open source software library [3]. For the relationship between MIPS and Imagination, see [9]. MIPS together with Imagination's PowerVR is in effect a CPU+GPU combination similar to ARM's.



References:

[1] ARM introduction 2: authorization mode (processor optimization package/physical IP package authorization (POP) introduction, ARM charging method, only CPU and GPU are charged, bus and interface are free)

[2] How does ARM beat MIPS in terms of CPU IP authorization?

[3] One article to understand ARM company

[4] Is Huawei HiSilicon completely independent research and development?

[5] What is Qualcomm processor krait architecture

[6] How about the A7 processor independently developed by Apple? data tells you

[7] From A4 to A10 Fusion processors, Apple tells you that it's nothing special (below)

[8] ARM new killer, Cortex-A73 architecture interpretation

[9] Imagination's GPU is too tempting, both MIPS and Power VR are needed in China
