Reposting a well-written article: an introduction to ARM

ARM is headquartered in Cambridge and listed in London and New York. Its price-earnings ratio is close to 50, an extraordinarily high figure for the chip industry; Intel's price-earnings ratio, by comparison, is only a dozen or so. There are two reasons. First, ARM is famous: it is the undisputed leader of the mobile industry, with a dominant market share. Second, its revenue is small and its net profit is modest. In 2015, license fees and royalties together came to only 1.2 billion US dollars, less than the revenue of China's Spreadtrum. Inside ARM, only the processor division is really profitable.


Speaking of divisions, ARM is small but complete. The company's four thousand people are spread across the processor division; the media division (graphics processors, video and display modules); the system IP and software division (buses, interrupt controllers, MMUs, memory controllers, security modules, debug modules, system design, and all kinds of basic software); the physical design division (physical libraries and processor back-end design); the software tools division (compilers, debugging tools, simulation models); the IoT division (Bluetooth, IoT operating system); the image signal division; the technical support division; the sales and marketing division; and a business development division (responsible for technology and cooperation with strategic customers such as BAT; it looks like the product of infighting with the marketing division, so it seems the British have their own centuries-old tradition of palace intrigue). Finally, there is a mysterious architecture division, staffed entirely by principal engineers and fellows, who define the instruction set and develop "black technology" that is not really all that black. R&D centers are located in Cambridge, Sofia, France, Finland, Austin, Israel, Hungary, India and Shanghai. This wide distribution is a result of ARM's acquisition history.


ARM's way of making money is simple and direct: license fees and royalties. Anyone who wants to use ARM's IP pays according to the number of tapeouts: a fee for a single tapeout, a fee for unlimited tapeouts within three years, or a permanent buyout. Once the chips are in mass production, ARM charges a royalty, a percentage based on the volume shipped. This is completely different from Qualcomm's royalty model. ARM's global royalties are only a few hundred million dollars a year, a small fraction of what Qualcomm collects.
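As a rough illustration of the two income streams described above, the sketch below adds a one-time license fee to a per-unit royalty. Every number in it (the fee, the royalty rate, the chip price, the volume) is a made-up placeholder, not ARM's actual pricing, and the function name is hypothetical.

```python
def ip_income(license_fee, royalty_rate, chip_price, units_shipped):
    """Total income from one licensee: upfront license fee plus royalties.

    royalty_rate is the fraction of the chip price paid per unit shipped.
    All inputs are hypothetical; real contract terms are not given in the text.
    """
    royalties = royalty_rate * chip_price * units_shipped
    return license_fee + royalties


# Hypothetical deal: $1M license, 1% royalty on a $10 chip, 50M units shipped.
income = ip_income(1_000_000, 0.01, 10.0, 50_000_000)
print(f"license + royalties = ${income:,.0f}")
```

The royalty term scales with both chip price and volume, so a very cheap chip has to ship in enormous numbers before it contributes much.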


The processor division is the most important one, and it has three product series: A, R, and M, which happen to spell out the ARM name, a nice touch. A stands for Application processor; it used to mean mobile processors and later expanded to server and network processors. R stands for Real-time; these cores are mainly used in areas with strict real-time requirements such as automobiles, control systems, solid-state drives, and modems. M stands for MCU, that is, microcontroller, and its field of application is wider still. Last year, global shipments of ARM MCUs exceeded 6 billion units, but the unit price is very low.

The A series is the foundation of ARM's reputation. Without the A series, ARM would be little more than an advanced 8051. ARM divides the A series into three tiers according to power consumption per Hertz. Low-end representatives, such as the A7 and the newly released A35/A32, run at a few hundred megahertz and consume tens of milliwatts, which suits watches and ultra-low-end mobile phones. Mid-range representatives such as the A53, used as the little core in a big.LITTLE configuration, run at 1-2 GHz and consume less than 300 mW. Of course, some vendors use it as the big core, with power consumption above 500 mW. Above that is the high end: the A57 was a relatively poor design, but the A72 and A73 achieved their goals, running above 2 GHz at a power consumption of about 500 milliwatts. In ARM's plan, performance per Hertz improves by 20% every year, and at the end of 2018 ARM will release a processor whose single-core performance can compete with Intel's x86, on a 7 nm or even 5 nm node. I wonder whether this can be achieved. I think it is a stretch: the highest frequency on TSMC's 7 nm is not much different from 16 nm, so pushing it to 3 GHz would already be impressive, while x86 is now approaching 4 GHz. Even if ARM improves performance per Hertz by 50%, it cannot catch up, not to mention that Intel is still slowly improving too. At present, it seems that increasing the issue width, placing the second-level cache inside the core, improving the first-level cache hit rate, and using a VIPT (virtually indexed, physically tagged) data cache can still raise performance, at the cost of a linear increase in power consumption. However, even the highest-end processors designed by ARM target mobile phones, so their power budget will not be very aggressive, and a single core will not reach one watt. That said, the little cores actually bring in more licensing revenue. You can see how many A53 processors are running in the market, not only in mobile phones but also in tablets, servers, and networking equipment.
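The frequency argument above can be put into numbers. The sketch below assumes, as a simplification of mine, that single-core performance is just performance per Hertz times clock frequency, ignoring memory and other system effects. The 3 GHz and 4 GHz figures and the 50% gain are the ones quoted above; the function names are made up for illustration.

```python
def relative_performance(per_hz_ratio, arm_ghz, x86_ghz):
    """ARM single-core performance relative to x86 (1.0 means parity).

    per_hz_ratio is ARM's performance per Hertz divided by x86's.
    Simplifying assumption: performance = performance per Hertz * frequency.
    """
    return per_hz_ratio * arm_ghz / x86_ghz


def required_start_ratio(arm_ghz, x86_ghz, per_hz_gain):
    """Per-Hertz ratio ARM must already have today so that, after improving
    by per_hz_gain (0.5 means +50%), it reaches parity with x86."""
    return (x86_ghz / arm_ghz) / (1.0 + per_hz_gain)


# Figures from the text: ARM at 3 GHz, x86 at 4 GHz, +50% per Hertz for ARM.
needed = required_start_ratio(3.0, 4.0, 0.5)
print(f"ARM needs at least {needed:.2f}x of x86's per-Hertz performance today")
```

Under this model, a 50% gain only reaches parity if ARM's cores already deliver about 0.89 of x86's performance per Hertz, and that is before counting any improvement on Intel's side.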


As the name suggests, the R series cores are all designed for real-time use. Some people misunderstand real-time performance, thinking it just means high performance and fast response, which is incomplete. Hard real-time means that every operation is guaranteed to complete within a fixed, predictable amount of time. By this definition, virtual addresses are unsuitable, because the virtual-to-physical translation may miss in the TLB repeatedly, resulting in unpredictable memory access times. Even physical-address memory accesses during interrupt handling are not deterministic, because multiple bus masters may contend for access; dedicated on-chip memory tightly coupled to the core must be used to guarantee that an interrupt is handled within a few dozen clock cycles. Other measures, such as adding checksums to all internal buses and caches and running two cores redundantly, are also features of the R series. Automotive and industrial control are typical applications. The other uses, such as storage controllers and 3G/4G modems, use the processor for protocol control and do not need a guaranteed response time, as long as the average latency meets the requirement. So it is no surprise that some people use the A series for this type of application. The trend for the R series is to add virtualization, a requirement put forward by automotive electronics companies. At first glance this seems strange, but ARM's explanation is that if virtualization had come first, there would have been no need for TrustZone. With physical addresses, virtualization is more efficient: virtualization on the A series may need around 20 page-table accesses to find the final physical address, while the R series needs only 3 to 6, and the page table can also be placed in on-chip memory, so there is no uncontrollable delay. Security on the R series can therefore be achieved through physical-address virtualization.
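The page-table count above can be checked with a little arithmetic. In a two-stage translation, every stage-1 table pointer is an intermediate address that must itself be walked through stage 2 before it can be read, and so must the final output address. The sketch below counts worst-case memory accesses assuming no TLB or walk-cache hits; the level counts passed in are examples, not the configuration of any particular core.

```python
def two_stage_walk_accesses(stage1_levels, stage2_levels):
    """Worst-case memory accesses for a nested (two-stage) page-table walk.

    Each of the stage1_levels descriptor fetches costs a full stage-2 walk
    plus the fetch itself; the final output address needs one more stage-2
    walk. Total: s1 * (s2 + 1) + s2 = (s1 + 1) * (s2 + 1) - 1.
    """
    return (stage1_levels + 1) * (stage2_levels + 1) - 1


for s1, s2 in [(4, 4), (4, 3), (3, 3)]:
    print(f"stage 1: {s1} levels, stage 2: {s2} levels -> "
          f"{two_stage_walk_accesses(s1, s2)} accesses")
```

With three or four levels per stage this gives 15 to 24 accesses, the same order as the figure of about 20 quoted above. Setting `stage2_levels` to 0 reduces it to a plain single-stage walk.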


The M series has an extremely wide range of applications. The smart hardware and Internet of Things devices that are popular today basically all use it. Last year it shipped 6 billion units, yet ARM is estimated to have collected less than 100 million US dollars in royalties from it. It is characterized by small die area and extremely low power consumption; a well-designed board can run on a battery for several years. Of course, there are also oddball MCUs such as the M7, whose computing power exceeds the A7 or even the A53 and which has its own caches. It is currently used where both high computing power and real-time response are required, such as drone flight control. The trend for MCUs is also security, a fundamental need for IoT. Since the die area is inherently small, virtualization is not a good fit, and a simplified version of the older TrustZone is more realistic.


In addition to licensing processor RTL, ARM can also license the instruction set. This topic comes up again and again. Current customers include Apple, Qualcomm, Broadcom, Cavium, HiSilicon, Spreadtrum and more. ARM actually regards instruction set licensees as a potential threat, so not only is the price high (tens of millions of dollars is not uncommon), but there are also restrictions on duration and industry, and the license has to be repurchased after three to five years. In addition, customers need to buy not only the instruction set but also the instruction set verification model; otherwise there is no guarantee that a self-written implementation is correct. Building a processor is actually not that difficult nowadays; the microarchitecture techniques are well known, and 90% of the time is spent on verification. The hard part is making it commercially viable, which requires long-term accumulation, continuous verification, and continuous fine-tuning. Designs also differ for different performance and power targets. There is a shortcut, which is to modify ARM's RTL, but the contract does not allow it.


From a market point of view, ARM basically has no competition in mobile phones and tablets. Intel once tried to get into tablets by paying subsidies, but after its change of strategy it has done little. It also cooperated with Rockchip, licensing a hard core for Rockchip to manufacture, and that has now faded away as well. The only effort still going is its investment in and cooperation with Spreadtrum: licensing hard cores, providing fabs, and trying to have Spreadtrum open up the x86 mobile phone chip market at low prices. This is actually a good idea. Intel's own chips are too expensive, and only Chinese companies can bring the cost down. However, with the change in Intel's mobile strategy, the future is really uncertain, which shows the power of an ecosystem. It is said that when ARM first provided mobile phone chip designs to Nokia and Texas Instruments, it had no ecosystem strategy in mind at all.


In the server market, the positions of ARM and Intel are completely reversed. The ARM ecosystem suffers from a lack of real chips on which to optimize software, and chip vendors suffer from a lack of mature software of the kind available for phones, so they have to do everything themselves. Recall how Intel resorted to translating instructions at the binary level in order to break into the mobile phone market; the problems ARM faces today are no easier. Unlike mobile phones, servers and network equipment cannot simply be rebooted at will. Even if the chip is fine, if the software is poorly ported, or software stability varies from one ARM-based chip to another, no one will dare to deploy it heavily in an Internet production environment. The ecosystem in this market is therefore harder to break into, and it can only be nibbled away bit by bit. Fortunately, the enterprise market differs from the consumer market: the decision-making power rests with the enterprise users. With a pack of vendors pressing forward one after another and focusing on low power consumption and low cost, ARM will always win some share of the market.


Among the other, smaller markets, network processors total about $2 billion worldwide, and there ARM is also the general trend. MIPS and PowerPC, which once dominated, will in the future appear only in some low-end network devices. Another application is the 2G/3G/4G modem. This market is actually very large, since mobile phones cannot do without it. The R series is widely used here, but since the job is protocol control, the A series can also be used, and some people do. Another is the storage controller, covering both mechanical and solid-state drives. MIPS and Tensilica hold a little of this market, and there is a rising star, Synopsys' ARC processor. Bundled with EDA software, it is really attractive in a market where ARM has no ecosystem advantage. Of the three major building blocks of the data center (computing, storage, and networking), storage can be combined with networking: integrate Ethernet and ONFI interfaces on a single chip, run Linux and open source software such as Ceph on it, and the entire network node made up of a NIC plus an SSD array can be replaced. Because the software stack is narrow, optimization is relatively easy; coupled with high energy efficiency and low cost, this may be a way to open up the server and data center market.


Graphics processors actually represent the future more than CPUs do. The main difference is that a CPU instruction corresponds to only one or a few data accesses, and between instructions there are a large number of relatively random jumps and data reads. A GPU instruction may require hundreds or thousands of data reads, in sequence. Sequential access is a great thing for memory; see the previous article for details. Desktop GPUs take advantage of high bandwidth and maximize sequential access, but they do not keep intermediate results in the cache; they write them out to memory. ARM's mobile GPU is a so-called tile-based GPU: after reading a certain amount of data, it operates on it in the cache. This saves a lot of bandwidth and is more power efficient.


ARM's competitors in GPUs are IMG and Qualcomm. But Qualcomm uses its GPU only in its own chips, and IMG basically sells only to Apple. If a chip company wants to make mobile phone chips, the only options for the GPU are ARM and IMG; the rest, Vivante and the like, are traps. Both HiSilicon and Freescale have been burned. If you use Atom or MIPS as the CPU, it is best not to choose ARM's Mali, otherwise no one will support you when compatibility issues come up. Certain GPU commands need CPU instructions for acceleration. With a CPU and GPU from rival camps, security, content protection, hardware coherency, and the heterogeneous computing built on top of it are all hard to do well. In short, ARM's GPU is technically inferior to Qualcomm's and IMG's, but thanks to its CPU advantage it holds 40% of the Android market. Moreover, in China only Spreadtrum still uses IMG. Ziguang (Tsinghua Unigroup) buying 3% of IMG's shares was clearly a calculated move.


Next is the bus. ARM's interconnects fall into several categories. The first is NIC, which has no fixed topology and interconnects through a simple crossbar; it suits simple scenarios. The second is CCI, which has a fixed topology and is also a crossbar; it supports coherency and suits a small number of processors. Next is CCN, a ring structure in which devices attach to the ring through fixed cross points; it has higher latency but runs at a higher frequency, and suits more than 16 processors. Then comes CMN, a mesh structure, also built from fixed cross points, forming an NxN network that supports interconnecting even more CPUs. Finally there is NoC, in which each node is a small router with fewer connections and a higher frequency. It has no fixed topology, and any number of devices can be connected.
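To get a feel for why a ring gives way to a mesh as the node count grows, the sketch below compares the average number of hops between two nodes. It is a toy model under my own assumptions: a bidirectional ring, a square k x k mesh with shortest-path routing, uniform traffic between all pairs of distinct nodes, and one hop per link. It does not model how devices attach to cross points in the actual CCN or CMN products.

```python
from itertools import product


def ring_avg_hops(n):
    """Average shortest-path hops between distinct nodes on a bidirectional ring."""
    # From any node, the node d steps away is reached in min(d, n - d) hops.
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)


def mesh_avg_hops(k):
    """Average Manhattan-distance hops between distinct nodes on a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)
                for (ax, ay) in nodes for (bx, by) in nodes)
    ordered_pairs = len(nodes) * (len(nodes) - 1)  # self-pairs add 0 to total
    return total / ordered_pairs


for k in (4, 8):
    n = k * k
    print(f"{n} nodes: ring {ring_avg_hops(n):.2f} hops, "
          f"{k}x{k} mesh {mesh_avg_hops(k):.2f} hops")
```

In this model, 16 nodes average about 4.3 hops on the ring against about 2.7 on a 4 x 4 mesh; at 64 nodes the gap widens to about 16.3 against about 5.3.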


Among all these buses, CCI is a key one. The latest CCI550 is designed so well that ARM has no plans for a follow-up product. It lets multiple CPUs and GPUs be interconnected and keeps them hardware coherent at a small cost. On top of CCI550, secure payment, video content protection (a prerequisite for watching licensed Hollywood blockbusters), heterogeneous computing, and augmented reality can all be implemented efficiently, which will greatly enrich mobile phone applications. Several first-tier mobile phone chip companies now plan to include this bus in the products they launch by the end of 2016. However, it will still take some time to work out the kinks before applications really run smoothly.


What are the disadvantages of CCI? The biggest is that it cannot support more CPUs and GPUs. The ring network CCN is a transitional solution, while the mesh network CMN is a more mature design. It takes access locality into account, supports new efficient atomic operations, supports the stashing commonly used by network processors, and supports intelligent routing. Because it runs at a higher frequency (2 GHz at 16 nm), the average latency is shorter even though the path an access travels is longer. At present, CMN does not support chip-to-chip interconnection and so cannot form multi-socket servers as x86 can; this still needs to be developed. The protocol for the interconnect between chips has only just been decided; it is called CCIX, and I don't know what its future holds.


CCI, CCN, and CMN all support coherency. In many cases, though, various devices need to be connected more flexibly, taking into account different data widths, interface protocols, power/clock/voltage domains, and wiring, all combined in one network, but without any need for hardware coherency. That is where NoC and NIC come in. A NoC is a network built on routing and packet forwarding, and it can be expanded freely. A NIC is simpler: it has no routing, only simple switching, and its scheduling and interleaving capabilities are very limited. It works well for a simple SoC structure, but a complex one needs a NoC. Recently, NoCs have also introduced coherency support, using a structure similar to a multi-layer snoop filter to challenge CCI/CMN. Let's wait and see how it turns out.


The rest of the system IP is developing gradually as mobile phone applications grow richer. Virtualization, security, and mobile computing are all future trends. ARM's memory controller is worth mentioning: after long-term research on system traffic, ARM has continuously optimized its scheduling algorithm, which can raise memory bandwidth utilization to more than 90%. In many complex scenarios, such as video recording and playback, the CPU, GPU, video engine, display, and ISP all work at the same time, and system memory access comes under great pressure. A good scheduling algorithm can then keep memory bandwidth utilization as close to the theoretical upper limit as possible. However, the shortcomings of the ARM memory controller are also obvious. It must work with someone else's DDR PHY, which brings potential compatibility problems, and other companies charge a lot for their PHYs while giving the controller away almost for free. Still, as mobile phone chips grow more complex, the memory bandwidth utilization problem will become more and more serious, and this bottleneck must be solved.
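For a sense of scale, the theoretical upper limit is simply the memory interface's transfer rate times its width. The sketch below works this out for a hypothetical 64-bit interface at 3200 MT/s (figures chosen by me for illustration, not taken from any particular phone chip) and applies the 90% utilization mentioned above.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


def effective_bandwidth_gb_s(peak_gb_s, utilization):
    """Bandwidth actually delivered at a given utilization (0.0 to 1.0)."""
    return peak_gb_s * utilization


# Hypothetical interface: 3200 MT/s, 64 bits wide, 90% utilization.
peak = peak_bandwidth_gb_s(3200, 64)
usable = effective_bandwidth_gb_s(peak, 0.9)
print(f"peak: {peak:.1f} GB/s, at 90% utilization: {usable:.2f} GB/s")
```

Every point of utilization the scheduler loses comes straight out of the bandwidth shared by all the units working at once.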


By integrating the CPU, GPU, and system IP, many applications can be developed. One is secure payment. Secure payment can be implemented with ARM's TrustZone, but this is a system-level solution: all the processors, buses, memory, peripherals, and the operating system need to be modified accordingly. It cannot be done just by dropping a particular IP block into the system. Another application is content protection. If Chinese users want to watch the latest licensed Hollywood blockbusters on their mobile phones, Hollywood will require DRM (digital rights management) on all playback devices. This, too, is achieved by applying TrustZone across the entire chip. I reckon we will see it within the next few years.


There are other applications, such as VR, AR, and CV, which are currently very popular. Besides GPU performance, VR on mobile phones needs a few software features, such as front buffer rendering and multiview. On a GPGPU these can be done by changing the driver, which is why GPGPU is more flexible than a fixed pipeline. The remaining hardware changes are on the display side, to get the buffer onto the display faster. Of course, meeting the 20 ms latency target also requires changes in the upper-level software, which I am not familiar with, so I won't say more.


The more practical applications on mobile phones are actually AR and CV. Whatever algorithm or API is used, it can be implemented with NEON, GPGPU, or even a DSP. What the hardware needs to do is get two-way hardware coherency among the GPU, CPU, and DSP right, avoiding copies and cache flushes so that data exchange between them is more efficient. I estimate that within the next two years there will be applications and tests involving computer vision or machine learning, followed by benchmark comparisons.


Another important division of ARM is the physical design division. It has two tasks. The first is to work with TSMC, UMC, SMIC, and other foundries to obtain the physical design kits for their latest processes and develop its own back-end libraries on top of them. The second, once the back-end libraries exist, is to work with the CPU and GPU design teams on front-end and back-end co-design to optimize frequency, area, and power consumption. The result can then be sold. In promotion, ARM's slogan is PPA (Performance, Power, Area), Synopsys's slogan is Service, and TSMC's slogan is Free. These represent three different promotion strategies. Coming back to ARM's back-end products: although the front-end/back-end co-design slogan is loud, the resulting numbers are sometimes no better than what chip vendors achieve with their own back-end work, so opinions differ.


ARM's other divisions, such as the one that sells emulators and Keil, are well known, but their revenue is very small. That division also does IP models and compilers, and it is somewhat weaker than the others. There is also a division doing IoT and Bluetooth, which is promoting an IoT operating system, mbed. However, the concept of the Internet of Things is too broad. In the traditional sense, an IoT chip is an MCU with sensors and connectivity (WiFi, Bluetooth, Beidou/GPS, LTE). The focus is on the application rather than the processor, so it is hard to build an ecosystem or make big news. As a result, this division does not have much presence. I don't think doing Bluetooth well helps system performance or the ecosystem, and I even suspect that this division was a startup founded by a former senior insider and later bought back.


Speaking of presence, it is necessary to talk about ARM's marketing organization. There are four types of marketing roles. The first is product marketing, which is closely tied to the product: when a CPU or GPU is released, they are on the scene, collecting requirements and helping R&D define the next generation of products. The second is regional marketing, covering North America, Europe, and China. They make the rounds of the chip companies, keep an eye on their latest developments, look for something to sell, and pass requirements and feedback to product marketing. The third is application marketing, focused on a particular kind of new application, such as virtual reality, automotive, mobile, or servers. They visit not only chip companies but also downstream companies in the industry to see what can be tapped at the application layer. The fourth is strategic customer marketing, which mainly watches emerging Internet companies and their trends, without being tied to a particular industry or application. For example, Ali is developing servers, a mobile operating system, payment, multimedia, and other fields at the same time, so ARM has to see where it can cooperate. Beyond these four there is also government cooperation, but this is usually handled by GMs and VPs, who sound out the funding and policy direction of the central government, local governments, and various departments to see whether ARM can get involved.

As a company that claims to make its money from its ecosystem, ARM naturally has high-quality marketing people. Pick any one of them, and giving a presentation on stage, delivering a speech in English, or analyzing industry trends is no problem. Because the platform is large and the view is broad, their career prospects are usually good; they often move up by changing jobs or simply start their own businesses.


However, I am still not optimistic about ARM's profit model. The current model has reached its limit. So what if the mobile computing, server, VR, and IoT markets all take off? So what if chips from China and India take off too? For an IP company, doubling its revenue is about the most it can hope for. That is nowhere near what chip companies make, let alone the downstream Internet companies.


What is the future of ARM? The first thing is the ecosystem. Today's popular technologies (big data, artificial intelligence, virtual reality, autonomous driving, and so on) all come down to one word: computing. And computing is inseparable from chips. Whether it is a CPU, GPU, DSP, or neural network chip, it all revolves around that word. ARM must keep a firm grip on the technology trend and build general-purpose computing into it, which is the foundation of everything. But completing this first step does not by itself make money. The big money comes not from IP licensing but from investment, from the kind of influence that does not show up in financial reports. With influence, through government cooperation, leading investments, and capital operations, ARM can in turn get its share. From this point of view, not producing chips, not building systems, and open-sourcing software are unmatched advantages. ARM China has made frequent moves in the past two years. The joint establishment of Anchuang with the high-flying stock Zhongke Chuangda (Thundersoft) was just a small test, while setting up an investment company with Hopu is the beginning of the real story.
