Let's Learn Some ARM Microarchitecture, Part 2

Foreword

Recently I came across a series of updated articles on the ARM architecture from some senior authors on a WeChat public account. Here, let's follow along and study that series of ARM architecture articles.

It is recommended to read the previous article before this one:

"Let's Learn Some ARM Microarchitecture, Part 1"

The content of this article comes entirely from the following source:

Source: Comprehensive Semiconductor Industry (ID: ICVIEWS)

https://mp.weixin.qq.com/s?__biz=Mzg3ODU3Nzk3MQ==&mid=2247499113&idx=2&sn=2d0e6afc521d5d1d20b61f0c5e925bca&chksm=cf132d35f864a4233f857173ffd790a0374d1b0ac8a1b248b741050a3f8377e1b4d7dfc06c89&scene=132#wechat_redirect

The article is sourced from the Internet, and the copyright belongs to the original author. If there is any infringement, please contact us for removal.

The original authors have published many articles about ARM; if you are interested, follow them to learn more.

Highlighted new features in Armv9.2

Arm Introduces Cortex-X4 and Immortalis GPUs to Boost Mobile Performance and Efficiency

Arm has announced its latest CPU and GPU designs, which will power future generations of smartphones, tablets, IoT devices and even some laptops. Arm licenses these designs to Qualcomm, MediaTek, Samsung, and countless other chipmakers to integrate into their own solutions. (As we know, Arm is primarily an IP vendor, though it has recently signaled plans to design and sell chips itself to increase profits.)

Arm's presentation revolves around the exploding demand for high-performance mobile solutions and the increasing variety of form factors and workloads. More and more developers are targeting Arm devices, expanding beyond mobile to other areas such as data centers and automotive workloads.

Arm designs the CPU cores used in smartphones, iterating every year; these cores are then integrated into SoCs such as that year's flagship Snapdragon and MediaTek Dimensity chips. In 2023 it released a new flagship lineup: the Cortex-X4 ultra-large core, the Cortex-A720 performance core, and the Cortex-A520 efficiency core. These cores form the basis of the company's new Armv9.2-compatible design and its Total Compute Solutions 2023, or TCS23. Beyond that, we also saw a new DynamIQ Shared Unit and an updated Immortalis-G720 GPU. All three new cores are microarchitectural successors to last year's, with a primary focus on improving IPC and efficiency.

While Arm doesn't produce chips itself, it has developed a reference Total Compute Solutions (TCS) platform that gives its customers a starting point for their own implementations. **TCS23 spans three classes of CPU cores: Cortex-X4, Cortex-A720, and Cortex-A520.** Each core design is tailored for a different slice of workloads, and together they form a complete system solution. Each is built on the Armv9.2 architecture, which offers several performance optimizations and security enhancements. The TCS23 platform also offers GPU designs, including the new flagship Immortalis-G720, plus Mali-G720 and Mali-G620 options.

Arm claims improved performance and efficiency with these latest designs. The details vary by solution and implementation, but at a high level the new CPU cores can deliver over 20% energy savings at performance comparable to the previous generation, or higher performance within the same power budget. Likewise, the Immortalis-G720 flagship GPU can deliver 15% more performance while reducing memory bandwidth usage by 40% through changes to its rendering pipeline.

**Arm is working more closely than ever with foundries like TSMC to optimize designs for their leading process nodes.** A better understanding of process technology complexities and early development feedback helps Arm's customers bring products to market faster. As part of this, Arm has implemented the industry's first Cortex-X4 core using TSMC's N3E process. (This tight coupling with the foundry is part of the design's moat.)

"Our latest collaboration with Arm is a great example of how we can leverage TSMC's state-of-the-art process technology and the powerful Armv9 architecture to enable new levels of performance and efficiency for our customers. We will continue to work closely with Open Innovation Platform (OIP) ecosystem partners like Arm to drive CPU innovation and accelerate the development of artificial intelligence, 5G and HPC technologies."

Prominent additions in Armv9.2 include Pointer Authentication (PAC) and Branch Target Identification (BTI), as well as improvements to the hardware Memory Tagging Extension (MTE), for significantly enhanced security.

Both PAC and BTI help reduce the amount of exploitable code, even if an attacker manages to escape a sandbox. PAC applies a cryptographic signature to return addresses during function calls, so that execution cannot be redirected to the wrong location by, for example, overwriting the link register. Likewise, BTI restricts the valid entry points of functions, preventing attackers from arbitrarily executing chosen sections of code as part of an attack.
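As a rough mental model (not Arm's actual hardware mechanism, and with `pac_sign`/`pac_auth` as invented names), PAC can be thought of as folding a keyed MAC of the return address and a context value into the unused top bits of the pointer, then verifying it before returning:

```python
import hmac, hashlib

KEY = b"per-boot-secret-key"  # real PAC keys live in CPU registers, out of reach of user code

def pac_sign(ptr: int, context: int) -> int:
    """Fold a keyed MAC of (pointer, context) into the unused top bits."""
    msg = ptr.to_bytes(8, "little") + context.to_bytes(8, "little")
    tag = int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")
    return ptr | (tag << 48)  # assume bits 48-63 are unused virtual-address bits

def pac_auth(signed: int, context: int) -> int:
    """Recompute the signature; a mismatch means the pointer was corrupted."""
    ptr = signed & ((1 << 48) - 1)
    if pac_sign(ptr, context) != signed:
        raise ValueError("PAC authentication failed: pointer was tampered with")
    return ptr

ret_addr = 0x0000_0040_1234
sp = 0x7FFF_0000                          # stack pointer used as signing context
signed = pac_sign(ret_addr, sp)           # like PACIASP at function entry
assert pac_auth(signed, sp) == ret_addr   # like AUTIASP before the return
tampered = signed ^ (1 << 50)             # attacker corrupts the signed pointer
try:
    pac_auth(tampered, sp)
except ValueError:
    print("tampering detected")
```

On real hardware this is done by dedicated instructions (such as `PACIASP`/`AUTIASP`) with a hardware cipher and per-exception-level keys; the sketch only illustrates the sign-on-call, check-on-return idea.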

MTE helps prevent things like sandbox breaches in the first place by generating a tag when memory is allocated and then checking it on every load/store operation. Memory safety issues have been one of the fastest-growing threat vectors. Arm cites an unnamed community application that claims MTE allows it to detect 90 percent of memory safety issues before release.
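A toy simulation of that tag-check idea (all names hypothetical; real MTE uses 4-bit tags on 16-byte granules and traps in hardware rather than in software):

```python
import random

TAG_BITS = 4          # MTE uses 4-bit tags: 16 "colors"
GRANULE = 16          # memory is tagged in 16-byte granules

memory_tags = {}      # granule base address -> tag ("memory coloring")

def mte_alloc(addr: int, size: int) -> int:
    """Tag every granule of the allocation and return a tagged pointer."""
    tag = random.randrange(1 << TAG_BITS)
    for g in range(addr, addr + size, GRANULE):
        memory_tags[g] = tag
    return addr | (tag << 56)   # the tag rides in the top byte of the pointer

def mte_check(tagged_ptr: int) -> None:
    """Every load/store compares the pointer's tag with the memory's tag."""
    addr = tagged_ptr & ((1 << 56) - 1)
    tag = tagged_ptr >> 56
    granule = addr - (addr % GRANULE)
    if memory_tags.get(granule) != tag:
        raise MemoryError("tag mismatch: use-after-free or out-of-bounds")

p = mte_alloc(0x1000, 64)
mte_check(p)             # in-bounds access: tags match, access proceeds
try:
    mte_check(p + 64)    # first granule past the allocation: different tag
except MemoryError:
    print("out-of-bounds access trapped")
```

A freed-and-reallocated region would similarly get a fresh tag, so stale (use-after-free) pointers fail the same check.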

Armv9.2 also adds support for the Scalable Vector Extension 2 (SVE2), a Single Instruction Multiple Data (SIMD) instruction set extension for AArch64 that is a superset of SVE and Neon. This is useful for highly parallel workloads such as image processing. Arm is focused on making SVE2's code generation as fast as Neon's, if not faster, to encourage adoption. Additionally, developers can request that both Neon and SVE2 versions of their code be generated, with the right one selected at runtime. This lets legacy hardware be supported through Neon while next-generation devices benefit from SVE2, all without significant overhead or rewriting the code base.
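What makes one SVE2 binary portable across hardware is its vector-length-agnostic, predicated loop style. The Python below is a conceptual model of that style (the `whilelt` helper mirrors the spirit of the SVE `WHILELT` instruction; all names are illustrative, not a real API):

```python
def whilelt(i: int, n: int, vl: int) -> list:
    """Build a predicate: lane j is active while i + j < n (like SVE WHILELT)."""
    return [i + j < n for j in range(vl)]

def sve_add(a, b, vl=8):
    """Vector-length-agnostic a + b: the same loop works for any hardware VL."""
    n, out = len(a), [0] * len(a)
    i = 0
    while i < n:
        pred = whilelt(i, n, vl)
        for j, active in enumerate(pred):   # models one predicated vector op
            if active:
                out[i + j] = a[i + j] + b[i + j]
        i += vl
    return out

data = list(range(10))
assert sve_add(data, data, vl=8) == [2 * x for x in data]       # tail of 2 handled by predicate
assert sve_add(data, data, vl=4) == sve_add(data, data, vl=16)  # result is VL-independent
```

Because the predicate absorbs the loop tail, no scalar cleanup code is needed, and the same machine code scales to wider vector units.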

In the given example using indirect time-of-flight capture of a 3D scene compared to Neon, Arm says SVE2 is 10% better at FP32 and 23% better at FP16. This is mainly due to the efficiency of SVE2's gather-scatter addressing instructions relative to Neon.
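Conceptually, the win is that scattered sample positions become a single gather operation instead of a sequence of scalar loads and lane inserts (which is what a Neon fallback must do). A minimal model of gather semantics, purely illustrative:

```python
def sve_gather(base, indices):
    """Models one gather load: fetch many non-contiguous elements at once."""
    return [base[i] for i in indices]

# Hypothetical depth samples from a time-of-flight sensor, read out of order:
depth_buffer = [9.0, 1.5, 3.2, 7.7, 0.4, 5.1]
pixel_order = [4, 1, 2, 5]   # non-contiguous sample positions
assert sve_gather(depth_buffer, pixel_order) == [0.4, 1.5, 3.2, 5.1]
```

On SVE2 hardware the whole `indices` vector drives one gather instruction; without gather support, each element costs a separate scalar load plus an insert into the vector register.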

64-bit only: "Mission accomplished"

**TCS23 finally drops support for AArch32 entirely,** whereas the previous-generation Cortex-A710 Armv9 core still supported it. The transition to 64-bit has been a decade-long effort requiring coordination among many players, from Google and its hardware partners to various app store operators and app developers.

One of the biggest changes in Arm's total compute solution this year is the complete transition to 64-bit: the cores support only AArch64 and no longer support AArch32. Several cores released in 2022 were already AArch64-only, but this year the entire lineup is. This means an Android device built on the latest cores cannot run 32-bit applications. Note that Google has required all apps updated since 2019 to be uploaded as 64-bit binaries.

The 64-bit transition is considered "mission accomplished," as Arm puts it. The Chinese app market had long held back the industry-wide transition, but the vast majority of apps in Chinese app stores are now 64-bit compatible as well.

The delay stemmed from the lack of a homogeneous app ecosystem: different app stores imposed different requirements on developers. However, since Arm has partnered with multiple app stores in China and repeatedly warned of the coming switch, those stores have been encouraging developers to make the move as well.
Now seems to be the moment for this transition to be fully realized; in any case, we are still a few months away from seeing these Arm cores in new SoCs.

The 64-bit-only approach enhances security through its larger address space, where techniques such as address space layout randomization reduce the chances of bad actors snooping on running workloads. 64-bit addresses also have spare upper bits that can be used for signed pointers, MTE tags, and other purposes.
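As an illustration of those spare bits (the exact layout varies with the configured virtual-address size and features like Top-Byte Ignore, so treat this split as an assumption): with 48-bit virtual addresses, the top 16 bits of a pointer are free to carry a PAC and an MTE tag:

```python
VA_BITS = 48   # a common AArch64 virtual-address width; bits 48-63 are spare

def pack(addr: int, mte_tag: int, pac: int) -> int:
    """Stash an MTE tag (bits 56-59 here) and a PAC (bits 48-55 here) above the address."""
    assert addr < (1 << VA_BITS) and mte_tag < 16 and pac < 256
    return addr | (pac << 48) | (mte_tag << 56)

def unpack(ptr: int):
    """Recover (address, pac, mte_tag) from a decorated pointer."""
    return (ptr & ((1 << 48) - 1), (ptr >> 48) & 0xFF, (ptr >> 56) & 0xF)

p = pack(0x0000_7F12_3456, mte_tag=0xA, pac=0x5C)
assert unpack(p) == (0x0000_7F12_3456, 0x5C, 0xA)
```

A 32-bit pointer has no such headroom, which is one reason these security features are 64-bit-only.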

Die area savings were not a significant consideration, Arm said; dropping 32-bit support yields only "single-digit percentage" reductions. However, beyond the security advantages above, it greatly reduces complexity, testing, and other requirements. These less obvious savings can be reinvested in overall system performance and efficiency gains, beyond where 32-bit architectures had stagnated.

Arm CPU cores and compute clusters

Arm has been leveraging three-tier CPU solutions for generations, replacing its big.LITTLE arrangements with DynamIQ clusters.

The X-series and A700-series cores feature out-of-order execution, which allows them to work ahead while the chip is stalled waiting for memory.

Tasks scheduled onto A500-series cores are generally not time-sensitive, so in-order execution is used, processing each operation in sequence.

The efficiency curves of these designs complement each other well to provide suitable coverage for various operating points.

The high-end X-series cores do the heaviest work, the mid-range A700-series cores focus on sustained performance, and the smaller A500-series cores put efficiency first for background tasks.

Cortex-X4 - Higher Performance and Higher Efficiency

A few years ago, Arm spun the X-series core off from the A-series, the idea being an oversized core. Chipset makers typically include only one or two of these at most, since they are power-hungry, albeit highly capable.

The Cortex-X4 is by far Arm's most powerful core, but that compute comes at the cost of power. Compared to last year's X3, Arm says the X4 can even run at the same performance point while using up to 40% less power. It is physically less than 10% larger and is the most efficient Cortex-X core ever built.

The new Cortex-X4 core increases the number of arithmetic logic units (ALUs) from 6 to 8, adds an additional branch unit (for a total of 3), adds an additional multiply-accumulator unit, and pipelines floating-point and square root operations.

The back end sees many improvements as well. Load-store address generation has increased from 3 to 4 instructions per cycle as the load-store pipelines have been split and expanded. The L1 translation lookaside buffer has also been doubled.

All of this combines to give the Cortex-X4 some impressive gains: on average, expect a 15% performance improvement over the Cortex-X3. In the power/performance curve Arm shared, the X4 leads the X3 in both performance and power draw; in other words, that 15% performance increase comes with considerable power consumption. It's also worth noting that this isn't a like-for-like comparison. **Last year's Cortex-X3 shipped with 1MB of L2 cache, which means that if manufacturers stick with the same L2 size this year, there won't necessarily be a 15% performance increase.**

One thing is certain, though: running the X4 at maximum speed is likely to make it a major power hog. We may see some OEMs continue what they did last year and cap many of this year's SoCs out of the box; OnePlus and Oppo both do this, and to capture these efficiency gains while running at the same performance point as the X3, such companies may benefit from continuing the practice. We may not see a 15% performance improvement across the board, but we may see further efficiency gains in next year's SoCs. (Toothpaste-squeezing, incremental upgrades?)

As for where these IPC improvements come from, the X4 has many front-end and back-end changes. On the front end, much work went into rewriting and improving branch prediction, because a mispredicted branch is costly in performance terms. Arm also promises that the 2MB L2 cache option will yield higher performance, not so much in benchmarks as in actual use.

This generation of Cortex-X4 achieved double-digit IPC growth for the fourth consecutive year. The Cortex-X4 offers a 15% increase in single-threaded performance and is the most efficient X-series core Arm has ever designed.

A key improvement is the doubling of L2 cache scalability, up to 2MB per core. The extra cache reduces trips to system memory and improves IPC (instructions per clock).

The Cortex-X4 is further supported by a redesigned instruction fetch and delivery system. It is now a 10-wide core, delivering best-in-class fetch bandwidth for high-IPC workloads.

Branch prediction itself has been further refined compared to the Cortex-X3, notably reducing stalls in real workloads; the gains are larger on hard-to-predict real-world workloads than on more synthetic ones.

The number of ALUs has also been increased from 6 to 8. The MCQ reorder buffer was also expanded from 320x2 to 384x2, allowing more out-of-order instructions to be tracked, and the core now treats a load-store flush like a branch misprediction to speed recovery.

Arm claims an average 13% speedup in IPC across a range of workloads, but real-world scenarios will see the biggest benefits. Synthetic benchmarks benefit less from frontend and cache changes, and therefore see less improvement.

Cortex-A720 - Balancing performance and power consumption

While Arm's X-series cores usually go a bit wild, the A-series cores are designed to balance power and performance. With the Cortex-A720, Arm promises a 20% increase in core efficiency, improving performance at the same power as last year's A715.

As for where the A720's improvements come from this year, most are on the front end. The pipeline is shortened by removing one cycle from the branch-misprediction path, and that single-cycle reduction is said to yield about a 1% gain in benchmarks. Benchmarks generally trigger the fewest branch mispredictions, so real-world performance may improve by a more significant (if hard-to-measure) amount.

For the out-of-order core, architectural improvements help performance without compromising the core's footprint or efficiency. For starters, floating-point divide and square-root operations are now pipelined, just as in the X4. There are also faster transfers from floating-point/Neon/SVE2 registers to integer registers, and other general speedups.

The Cortex-A720 is 20% more energy efficient than the previous-generation Cortex-A715 and has a shorter, more efficient pipeline. On the front end, it removes a cycle from the branch-misprediction pipeline, allowing faster recovery in unpredictable real-world workloads. Power consumption is largely in line with last year's model, but the A720 offers slightly higher performance at the same power level. With the A720, as with the X4, Arm seems focused on extracting better performance from last year's power envelope (or the same performance at lower power) rather than simply pushing peak numbers.

Alternatively, Arm offers area-optimized configurations of the Cortex-A720. At a die size matching the Cortex-A78 Armv8 core, the area-optimized Cortex-A720 can deliver 10% more performance and newer Armv9.2 features.

Cortex-A520 - Efficiency First

Of course, Arm's cores are not all about performance. The X-series pours everything into raw compute, the A7xx series balances compute and power, and the A5xx series focuses entirely on efficient processing. The Cortex-A520 is the lowest power-per-area Armv9.2 core, and it builds on the same merged-core architecture we saw with the A510.

The Cortex-A520 also aims to improve performance, but only where that also improves efficiency. LITTLE cores matter most in cost-constrained devices and are the main driver of battery life during idle or low-intensity use.

It retains the merged-core architecture of the Cortex-A510, which places two cores in one complex with a shared or private L2 cache pool (up to 512KB) and shared SIMD engines (SVE2/Neon). It incorporates the QARMA3 PAC algorithm, reducing PAC overhead to under 1% so the latest security features come without a performance penalty.

This merged-core architecture means some resources can be shared between two cores, which are combined into a "complex".

The L2 cache, L2 translation lookaside buffer, and vector data paths are shared within this complex.

To be clear, cores don't have to come in pairs; a single-core complex can be assembled where that suits performance.

In fact, one of the TCS23 core layouts Arm showed us involved a single X4 core, five A720 cores, and three A520 cores, meaning at least one of the A520 cores stands alone.
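These layout rules can be sketched as a small configuration model (an entirely hypothetical structure, not an Arm tool): complexes hold one or two A520 cores with a bounded shared L2, and a cluster sums cores across the three classes:

```python
from dataclasses import dataclass, field

@dataclass
class Complex:
    """A Cortex-A520 merged-core complex: 1 or 2 cores sharing L2 and SIMD."""
    cores: int                 # 1 (standalone) or 2 (merged)
    l2_kb: int = 512           # shared L2 pool, up to 512KB

    def __post_init__(self):
        assert self.cores in (1, 2) and self.l2_kb <= 512

@dataclass
class Cluster:
    """One DynamIQ cluster mixing the three core classes."""
    x4: int
    a720: int
    a520_complexes: list = field(default_factory=list)

    @property
    def total_cores(self):
        return self.x4 + self.a720 + sum(c.cores for c in self.a520_complexes)

# The 1+5+3 layout Arm showed: one A520 complex must hold a lone core.
tcs23 = Cluster(x4=1, a720=5, a520_complexes=[Complex(2), Complex(1)])
assert tcs23.total_cores == 9
```

The model makes the constraint visible: an odd A520 count forces at least one single-core complex.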

Interestingly, the Cortex-A520 reduces the ALU count from the previous-generation Cortex-A510's 3 to 2. On its own that would lower performance, but the power and area savings let the engineers claw performance back in other ways.

This ultimately results in an 8% increase in performance with a slight reduction in power consumption. The Cortex-A520 operates at 22% lower power consumption than the Cortex-A510 while maintaining the same performance.

The A520 is an efficiency-first design, and as with the other cores, Arm focused primarily on improving efficiency at the same power point as the previous generation. This includes improving branch prediction while removing or shrinking certain performance features, with the lost performance recovered through higher efficiency. It's also notable that Arm removed the A510's third ALU, saving power in the issue logic and result forwarding.

In real-world terms, the A520 doesn't appear to be as big a leap over its predecessor as the A720 and X4 are over theirs. At lower power points its curve largely overlaps with the A510's, and the efficiency gains appear only at the upper performance tiers. The performance and power differences are promising, but it's unclear whether there will be any noticeable practical advantage when comparing the A520 and A510; after all, it's hard to properly measure the difference between the two in the real world.

DSU-120 computing cluster

The cores are connected in a DSU-120 (DynamIQ Shared Unit) cluster, which Arm says scales better to higher core counts and offers greater efficiency than previous revisions. Each DSU-120 cluster can host up to 14 cores in whatever configuration a customer targets. Each core in the cluster has its own dedicated L2 cache, plus a shared cluster-wide L3 pool of up to 32MB.

While a single DSU-120 cluster is more than sufficient for most mobile designs, customers do have the option to link multiple clusters together for higher core counts, all linked via the high-bandwidth CoreLink coherent interconnect. There's no practical upper limit to the number of clusters that can be linked together, but chips can only get so big. Regardless, we don't expect this to be a popular option for ARM-based solutions.

The DSU-120's logic, L3 cache, and snoop filters are divided into up to 8 slices, linked by a dual bidirectional ring topology. This typically reduces latency by cutting hop counts, and allows higher bandwidth.
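Why a bidirectional ring cuts hops is easy to quantify with a toy model (assuming uniform traffic and a fully built-out 8-slice DSU; a real interconnect's latency depends on much more than hop count): a request can travel around the ring in either direction, so the worst case is n/2 hops rather than n-1:

```python
def hops(src: int, dst: int, n_slices: int) -> int:
    """Shortest path on a bidirectional ring: go whichever way is closer."""
    d = (dst - src) % n_slices
    return min(d, n_slices - d)

n = 8  # maximum DSU-120 slice count
worst = max(hops(0, d, n) for d in range(n))
avg = sum(hops(0, d, n) for d in range(n)) / n
assert worst == 4       # bidirectional: never more than n/2 hops
assert avg == 2.0       # vs. a 3.5-hop average on a unidirectional ring
```

The "dual" part of the dual ring then doubles available bandwidth on top of the shorter paths.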

The DSU-120 further improves system efficiency through various energy-saving modes. RAM retention puts the L3 cache and snoop filter into a low-power state that wakes up quickly while the logic remains active. Alternatively, slice logic power down shuts down the logic for each slice while the L3 cache and snoop filters remain active. These two modes can also be combined, but controlled independently.

Also, RAM powerdown can shut down half or all of each L3 cache pool, though doing so discards the contents of the powered-down regions. This saves even more power when the full cache capacity isn't needed. Finally, slice powerdown shuts entire slices (logic, L3 cache, and snoop filters) down, scaling to as few as a single active slice.

Collectively, these power modes can reduce the DSU's power consumption by two-thirds during idle or light workloads.

A DynamIQ Shared Unit or DSU is an L3 memory system that integrates one or more cores, control logic, and external interfaces to form a multi-core cluster. It's essentially Arm's fabric that allows all these cores to communicate with each other and share resources, so it's a pretty significant piece of the puzzle for any chipset maker looking to build a chip with Arm's core design.

Building on the DSU-110, Arm has made several improvements to the DSU-120 that will benefit the entire chip that contains it. For starters, each cluster now has up to 14 cores (up from 12) and supports up to 32MB of L3 cache. It also greatly improves efficiency in a few key areas, including in the case of cache misses, while also reducing power consumption.

In a way, Arm's DSU is the backbone of the TCS23, as it forms the basis of how these cores interact with each other and share data. Any improvements here will benefit the entire cluster, but it seems most of the changes are related to power consumption and efficiency.

Arm's fifth-generation graphics architecture

TCS23 is not limited to the CPU complex; it also includes the Immortalis-G720, Mali-G720 and Mali-G620 GPU options. Arm has dropped named GPU architecture generations (like Valhall), opting to simply call this the fifth generation.

Arm's design goal is to allow more immersive gaming and real-time 3D applications, and to allow these experiences to run longer without throttling or having to run to a power outlet. In particular, developers are creating scenes with higher geometric complexity, employing more high dynamic range rendering, and memory system power is becoming a major contributor to thermal constraints.

This last point is Arm's biggest focus: the fifth-generation GPU architecture reduces memory bandwidth usage by up to 40%, achieved primarily through larger buffers and the implementation of deferred vertex shading (DVS) in the rendering pipeline.

At a high level, the visible triangles in a scene are binned into regions called tiles for processing. Triangles that span multiple tiles complicate matters and force a traditional up-front vertex shading pipeline to cache large amounts of data. Triangles fully contained within a single tile, however, need only minimal up-front work to resolve position; the rest of the data can be discarded, with vertex shading starting only in a deferred stage of the pipeline. (Quite interesting; it reminds me of a video by Gong Da I watched before.)

DVS combines vertex and fragment shading, cuts wasteful caching, and writes back to memory only once, saving significant memory bandwidth.

Arm's fifth-generation GPU architecture uses larger tile sizes (64x64 versus 32x32), giving the tiler more opportunities to choose DVS. Rising scene complexity also makes triangles smaller, creating still more chances to defer vertex shading and save memory bandwidth.
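The tiling decision can be sketched as follows (a simplification: a real tiler tests actual triangle coverage rather than just the bounding box, and the deferral heuristics are Arm's own):

```python
def covered_tiles(tri, tile=64):
    """Tiles touched by a triangle's bounding box (screen-space vertices)."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return [(tx, ty)
            for tx in range(min(xs) // tile, max(xs) // tile + 1)
            for ty in range(min(ys) // tile, max(ys) // tile + 1)]

def can_defer(tri, tile=64):
    """DVS is chosen when the triangle lies entirely within one tile."""
    return len(covered_tiles(tri, tile)) == 1

med = [(10, 10), (50, 20), (30, 60)]
assert can_defer(med, tile=64)        # fits one 64x64 tile: shading deferred
assert not can_defer(med, tile=32)    # at the old 32x32 size it would span tiles
spanning = [(10, 10), (120, 20), (60, 90)]
assert not can_defer(spanning)        # crosses tiles: shaded up front
```

The model shows both claims at once: bigger tiles move triangles like `med` into the deferrable bucket, and smaller triangles (relative to the tile) are more likely to qualify.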

The new GPU architecture also brings other improvements across the engine. It can perform variable-rate shading at higher rates, perform faster work scheduling with additional work registers, and support more fixed-function throughput for graphics. The ray tracing unit (RTU) is now also on a power island, which means less leakage for most applications that don't use an RTU at all.

Three GPU models are configurable, but mainly determined by the number of cores. The Immortalis-G720 has 10 or more cores, the Mali-G720 includes 6 to 9 cores, and the Mali-G620 is suitable for designs with 5 or fewer cores. In addition to a 40 percent reduction in memory bandwidth, these designs deliver an average of 15 percent higher sustained and peak performance than the previous generation.

Android Dynamic Performance Framework

Arm brought on Scott Carbon-Ogden, senior program manager for Android gaming at Google, to discuss integration with Google's Android Dynamic Performance Framework (ADPF), which it says lets apps understand and respond in real time to changing performance, thermal and user conditions. This covers the ADPF Hints API, the ADPF Thermal API, and the Game Mode and Game State APIs, though the session focused on the first two.

The ADPF Hints API is designed to supplement the default Linux scheduler, avoiding latency, boosting performance, and avoiding wasted power after a workload completes. (One wonders whether, absent sanctions, Huawei's HarmonyOS would have been brought into this effort; without an Arm license, perhaps not.)

The API allows an application to better inform the operating system about the target and actual CPU duration of a workload so that it can schedule it more efficiently. For the user, the end result could be a significant reduction in frame loss and some level of power savings.
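A toy model of that feedback loop (purely illustrative: on Android the real mechanism is `PerformanceHintManager`'s `updateTargetWorkDuration`/`reportActualWorkDuration`, and the platform's policy is far more sophisticated than this step function):

```python
def next_cpu_capacity(capacity, actual_ms, target_ms, step=0.1):
    """Nudge allotted CPU capacity toward the reported need (toy governor)."""
    if actual_ms > target_ms:          # missed the deadline: boost
        return min(1.0, capacity + step)
    if actual_ms < 0.8 * target_ms:    # comfortable headroom: save power
        return max(0.2, capacity - step)
    return capacity                    # on target: hold steady

cap = 0.5
for actual in [20.0, 18.0, 9.0, 8.0, 8.5]:   # frame work durations reported by the app
    cap = next_cpu_capacity(cap, actual, target_ms=16.6)
print(f"settled capacity: {cap:.1f}")
```

The point of the hint is the `target_ms`/`actual_ms` pair: with both numbers, the scheduler can converge on just enough capacity instead of guessing from load alone.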

The ADPF Thermal API provides more ways to scale performance than simply reducing frame rate. Apps can respond with other options, such as adaptive resolution, adaptive decals, or adaptive level of detail (LOD), tweaking the user experience in a less jarring way.

Taking Candy Clash on the Pixel 6 as an example, the ADPF Thermal API increased the average FPS by 25%, which is good, though the improved frame-rate consistency matters even more. It also cut CPU power by 18%, headroom that can be reallocated to the GPU or banked for longer gaming sessions.

Arm Total Computing Solutions in 2023

The Arm TCS23 framework provides a solid foundation for its customers to develop next-generation SoCs with higher power, performance and efficiency, and the company is already looking to the road ahead.

Arm shared a slide of its future roadmap. It depicts TCS24 with the Cortex-X4's successor, code-named Blackhawk, supported by A7xx- and A5xx-class cores code-named Chaberton and Hayes, respectively.

We also saw Krake listed as the codename for its next-generation GPU. Arm said: "We have never been more committed to our CPU and GPU roadmap, and over the next few years we will invest heavily in key IP, such as the Krake GPU and Blackhawk CPU, to deliver the compute and graphics performance our partners are asking for." The industry looks forward to seeing how Arm's IP continues to evolve, especially as artificial intelligence and machine learning prove increasingly disruptive. (It also shows, from another angle, that the importance of the GPU is growing by the day.)

Efficiency is the new goal

The industry seems to have shifted for a while now, and the main impression these cores leave is that efficiency is the name of the game. While we were told how fast the X4 is and how it is the fastest core the company has ever built, Arm was quick to emphasize the improved efficiency of running it at last year's peak performance. (Perhaps this stretches the product life cycle so the toothpaste can be squeezed slowly; after all, there is no real rival.)

Overall, every performance gain comes down to how efficient each component is, and the DSU changes are all more or less about efficiency and power. Performance matters, but it does feel like the industry as a whole is trying to refine the current state of computing rather than dramatically improving performance year over year. ("Performance" here spans power consumption as well as raw compute.)

We expect these cores to form the basis of the MediaTek Dimensity 9400 and Qualcomm Snapdragon 8 Gen 3, but the exact form remains to be seen. As mentioned earlier, Arm talks about using a 1+5+3 core layout in its own internal testing, but that doesn't mean that partners like MediaTek and Qualcomm will do the same.

AI made up a surprisingly small percentage of Arm's demos this year. The company has yet to announce a follow-up to its NPU, opting instead to let customers differentiate with their own solutions . The proliferation of generative and LLM AI in particular has led to shifts that make specialized hardware solutions less attractive today. Currently, many of these workloads are handled by traditional CPUs and GPUs at the edge, at least until the industry re-aligns to new standards. At the same time, larger workloads are being pushed to data centers where economies of scale now allow for greater efficiencies.

Arm's partners are very excited about the potential of these next-generation designs, especially for gaming and enabling new use cases. Dave Burke, vice president of engineering for Android, said: "Android, together with the developer community, is committed to bringing the power of computing to as many people as possible. We're excited to see how Arm's new hardware advancements are adopted by vendors, and how the security and performance improvements can benefit the entire Android ecosystem."

"Arm's innovative 2023 IP, the Cortex-X4 and Cortex-A720 along with the Immortalis-G720, provides an excellent foundation for our next-generation Dimensity flagship 5G smartphone chip, which will deliver impressive performance and efficiency. Using Arm's industry-leading technology, MediaTek Dimensity will let users do more at once than ever before and unlock incredible new experiences, longer gaming sessions and outstanding battery life," added Dr. JC Hsu, Senior Vice President and General Manager of MediaTek's Wireless Communications Business Unit.
Products that can be developed from the new IP will probably hit the market sometime next year.

Origin blog.csdn.net/weixin_45264425/article/details/131040462