What do the core and thread counts of a CPU mean?

Let's take the i7-12800H's "14 cores, 20 threads" as an example.

Answer:

First, a quick explanation of Hyper-Threading: it gives a single physical core the ability to process two threads at the same time.

The 14 cores come from Intel's hybrid big/small-core design: 6 performance cores (P-cores) and 8 efficiency cores (E-cores). The E-cores do not support Hyper-Threading, and each is slower than a P-core.

In other words, each of the 6 P-cores can process 2 threads simultaneously, while each E-core can process only 1, for a total of 6 x 2 + 8 = 20 threads.
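The arithmetic can be written out in a couple of lines (a minimal sketch; the function name is mine, the core counts are the i7-12800H figures above):

```python
# Thread count for a hybrid (P-core/E-core) CPU: each P-core has
# Hyper-Threading and contributes 2 threads; each E-core contributes 1.
def logical_threads(p_cores: int, e_cores: int) -> int:
    return p_cores * 2 + e_cores

# i7-12800H: 6 P-cores + 8 E-cores
print(6 + 8)                  # 14 physical cores
print(logical_threads(6, 8))  # 20 threads
```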


What follows is an excellent article I found on Zhihu; a link to the original is at the end of this post.

The CPU is the soul of a computer and largely determines its overall performance. Today's mainstream CPUs are all multi-core, and some also use Hyper-Threading (HT). Multi-core is the easier concept, and I believe many players can explain it. But what Hyper-Threading is, what practical difference it makes, and what changes when you turn HT on or off on a CPU that supports it: far fewer people can explain that clearly. That is why I opened this post to introduce multi-core and Hyper-Threading. It draws on my day-to-day work, my exchanges with Intel, and my own experience as a DIY player, and strives to be as authoritative, accurate, and easy to follow as possible. I hope a few simple examples will quickly bring your understanding up to the level of a hardware expert.

A few caveats up front:
1) This is a forum post, not an academic paper; some points can only be touched on briefly.
2) I aim for accuracy, but in places I trade academic rigor for readability.
3) This post is about knowledge and understanding. Whether you should spend six or seven hundred on an i3 or over a thousand on an i5 depends entirely on your specific situation; there is no fixed answer.
4) If you are a big spender who only cares about "cool", never about value for money, and only buys the most expensive part, skip this post, because no theory can explain why you need a 4-core/8-thread i7 just to keep QQ running.

I hope that after reading this article, you will never again agonize over which CPU to pick for a build!

Experienced players will recognize the five most common Intel consumer-grade CPU lines ("consumer-grade" to distinguish them from the enterprise-grade Xeon processors):
- Celeron: dual-core, no Hyper-Threading - for beginners
- Pentium: dual-core, no Hyper-Threading - for low-end gamers
- i3: dual-core, with Hyper-Threading - for mid-range gamers
- i5: quad-core, no Hyper-Threading - for mid-to-high-end gamers
- i7: quad-core, with Hyper-Threading - for high-end gamers

Some low-end Xeons also work for ordinary players, for example:
- E3: quad-core, with Hyper-Threading - for high-end gamers

Of course, the monstrous i7 Extreme parts reach 6 cores/12 threads or even 8 cores/16 threads, but those are generally bought by enthusiasts and are rare among ordinary players.
Entry-level E3s are essentially i7s, such as the widely praised E3 1231v3. That chip is excellent value: in effect an i7 with the integrated graphics removed and overclocking disabled, but much cheaper. Hence the saying "i5 price, i7 performance."

CPU architecture

To talk about Hyper-Threading and multi-core, we have to talk about CPU architecture and logic. Skipping the many irrelevant technical details, let's focus on two modules inside the CPU:
1) the Processing Unit, or PU
2) the Architectural State, or AS
The PU performs computation, such as arithmetic: addition, subtraction, multiplication, and division. The AS handles logic and scheduling, such as controlling memory access.

Single-core CPU (let's start simple)
A traditional CPU generally has one PU and one AS.

For example: a small mom-and-pop restaurant (a single-core CPU). The husband is the boss and the cook in the kitchen; the wife is the waiter taking orders. Look, a customer walks in. He goes to the counter, reads the menu, and after about 5 minutes orders a rice bowl. The wife writes down the order and hands it to the back kitchen, and the husband starts cooking. In this example, the wife can be understood as the AS, and the husband/cook as the PU (the one doing the actual work).

Multi-core CPU
Multi-core here means multiple physical cores, such as the i3's two or the i5's four. Under this architecture, each physical core has one PU and one AS: the i3 has two PUs and two ASs in total, the i5 four of each.

The small restaurant in the example above can probably still cope with 5 or 6 guests. But imagine 16 guests arriving at once: the queue would stretch into the street. And if I tell you that 16 new customers arrive every 10 minutes... it's over, the business can't go on. The couple simply can't keep up.
What we need now is a big canteen (a multi-core CPU) with 4 waiters and 4 cooks. The 4 waiters take orders simultaneously, and the 4 cooks start frying simultaneously (waiter No. 1 passes orders to cook No. 1, waiter No. 2 to cook No. 2, and so on). Compared with the small restaurant's single waitress and single queue, there are now 4 queues, so throughput is suddenly 4 times higher. The 16 guests split evenly into 4 queues of 4 guests each. Isn't the situation much better?

This should be relatively easy to understand.

Hyper-Threading Technology (HT): the highlight
Here it comes. What is Hyper-Threading? Is it the "multithreading" we usually talk about?
No. What we usually call multithreading refers to programs; it is "soft", at the code level. Hyper-Threading refers to the hardware architecture; it is "hard": logical cores created by duplicating the AS.
Put simply, Hyper-Threading means one physical core has two ASs sharing one PU. Why do that? See the following example:

Metaphor: in the canteen above, with 4 waiters, 4 cooks, and 4 queues, is there still an efficiency problem?
There is!
Imagine every customer holding a menu: can you guarantee each one orders after a couple of glances? Inevitably, some customers dawdle, ask questions, and take 15 minutes over a single dish, while a cook needs only about 10 minutes on average to fry one. What about the other 5 minutes? The cook stands idle in the kitchen, drinking tea and reading the paper. All that time is wasted on the guest-and-waiter ordering step.

Is there a workaround? I think you can guess it:
add more waiters!

At this point we give each cook one more waiter, going from one waiter to two (two ASs). Waiters 1A and 1B run two queues and both feed orders to the same cook (the PU). Now, if waiter 1A's customer still hasn't ordered after 15 minutes, waiter 1B's customer has quite possibly ordered within 3 minutes and the dish is already with the cook, so the cook isn't standing in the kitchen foolishly waiting on 1A's customer. The cook's labor is squeezed out to the maximum (he is probably cursing); for the CPU, this means utilization is maximized and idle time is minimized. Sometimes you really can't blame the cook (PU) for slacking; it's the waiter (AS) who is too slow taking orders.

In the figure (not reproduced here), orange and blue cells mean the cook (PU/CPU) is working, and white cells mean the cook (PU) is idle. Panel A is a single core without Hyper-Threading, panel B a dual core without Hyper-Threading, and panel C a single core with Hyper-Threading enabled. You can see clearly that going from one core to two (without HT) does not raise the utilization percentage, while enabling HT raises overall utilization even on a single core.



The second figure (also not reproduced here) compares a single core with Hyper-Threading (left) against a dual core without it (right). Can you spot the difference?
 



Now let's look at some practical questions about multi-core and Hyper-Threading:
1) Are a dual-core/4-thread i3 and a 4-core/4-thread i5 the same thing?
First, the i3: it is a dual-core that presents 4 logical cores (4 threads) once HT is enabled. I'm not sure about the latest Win10, but Win7 displays logical cores just like physical ones, so an i3 looks like an i5 in the system monitor. Are they the same thing? If you think so, everything I wrote above was for nothing.
The i3 has 4 waiters and 2 cooks; the i5 has 4 waiters and 4 cooks. Do those sound the same to you?

2) Is a 4-core/4-thread i5 the same as an i7 with HT enabled (4 cores/8 threads)?
The i5 has 4 waiters and 4 cooks; the i7 with HT on has 8 waiters and 4 cooks. From the standpoint of CPU utilization, especially when running multi-process/multi-threaded programs, the i7 with HT enabled is of course better.

3) Is a 4-core/4-thread i5 the same as an i7 with HT turned off (4 cores/4 threads)?
The i5 has 4 waiters and 4 cooks; the i7 with HT off also has 4 waiters and 4 cooks. At first glance they look nearly identical, at least in the number of cooks (PUs) and waiters (ASs). However, the i7's single-core performance is somewhat stronger: the i7's cooks are master chefs, while the i5's are merely first-rate. So there is still a gap between i5 and i7, though in theory not an especially large one.

Summary: in theory, the gap between i3 and i5 is large. The gap between i5 and i7 comes mainly from cook quality (the PU) plus the 4 extra waiters, and is smaller than the i3-i5 gap.

4) What does enabling HT gain you on the same CPU, say an i7?
Better parallelism: the ability to handle multiple processes/threads improves, which is most noticeable in games that support multithreading.
Higher CPU utilization: as a rule of thumb, overall performance improves by roughly 20%-30%. By that logic, an i3 with Hyper-Threading gains 20%-30% overall; does that make it a match for an i5? If it did, nobody would buy i5s. Two master chefs (the i3), however hard you whip them, cannot keep up with four (the i5).

5) What are the disadvantages of enabling HT?
- Single-core performance drops: generally by 5% to 15%, mainly when running single-threaded programs; maintaining two ASs costs more than maintaining one.
Metaphor: only one guest comes in to order, and cook No. 1 is assigned, but two waiters are standing there, so the guest may dither for a moment: do I go to waiter 1A or waiter 1B? Half a minute gone just deciding. Wouldn't a single waiter be simpler?
That's why, in practice, we usually turn HT off when benchmarking our supercomputing systems: we are chasing extreme performance. The newest CPUs keep the loss to 5%-15%, but on old hyper-threaded CPUs, say the Pentium 4/Xeon of ten years ago, I have seen single-core losses above 50%; the overhead of enabling HT was enormous.
- Higher power consumption: on average about 30% more. Those 4 extra waiters you hired expect to be paid, don't they?
- Congestion when the core count is very large, for example on dual-socket servers.


Metaphor: imagine a huge cafeteria with 56 waiters (a dual-socket Xeon E5 system: two CPUs, 28 cores, 56 threads) and hundreds of diners arriving. Wouldn't it descend into chaos? Walking in, you wouldn't know which line to join (in general, the operating system decides which queue a task enters), and under the OS's arrangement each guest checks all 56 queues one by one, looking for the shortest...
Ask yourself: in real life, with 56 lines in a cafeteria, would you really inspect every one before deciding? By the time you had checked all 56, fifteen minutes would have passed and your friends would have finished eating. Wouldn't cutting it down to 28 lines make the choice easier? (Though 28 is still tiring enough.)

- Poor support from old operating systems
Older systems such as Win2008 and Win2000 support Hyper-Threading poorly.
Analogy: the cafeteria is nearly empty, nobody around. Two customers, A and B, come in to order, and both end up queuing at waiters 1A and 1B, who serve the same cook (usually the operating system's doing). Can you see what's wrong?
The right move is for A to go to cook No. 1 (physical core 1) and B to cook No. 2 (physical core 2). Instead, A and B are crammed onto cook No. 1 while cooks 2, 3, and 4 stand idle with nothing to do. Does that make sense?
The root problem is that the OS cannot tell physical cores from logical cores. Seeing two waiters and two queues, it assumes there are two cooks and sends customers A and B to queues 1A and 1B, knowing nothing about the actual state of the back kitchen: how many cooks are there really?
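You can see one half of this picture from ordinary code: in Python, for instance, the standard library reports the number of logical cores, i.e. the waiters, not the cooks (a small sketch; psutil is mentioned in the comment only as one well-known way to get the physical count, not something this article uses):

```python
import os

# os.cpu_count() reports LOGICAL cores (hardware threads), not physical
# ones: on a 4-core/8-thread i7 it returns 8, exactly the "8 waiters"
# the scheduler sees. The standard library alone cannot tell physical
# cores from logical ones; a physical count needs a platform-specific
# query (e.g. the third-party psutil's cpu_count(logical=False)).
logical = os.cpu_count()
print(f"logical cores visible to the OS: {logical}")
```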

Back to reality: what kind of CPU do I actually need?
Let's go case by case.

1) Web browsing, QQ chat, light office work (e.g. Office documents), PCs for the elderly
A Celeron actually does the job. It has 2 cores/2 threads; compared with a 2-core/4-thread i3 (setting aside clock speed and cache differences), these workloads simply cannot exploit the i3's advantages. Note that the i3 costs almost 3 times as much as the Celeron.
Then there is the Pentium, which is essentially a Celeron with a slightly higher clock and slightly more cache. Same 2 cores/2 threads, performance only a little higher than the Celeron, yet priced at almost one and a half Celerons. Personally I don't see the point; the extra money is better spent on a nicer keyboard, mouse, or monitor. At least you will feel that upgrade every day.

2) Light games and 2D graphics work (e.g. Photoshop)
An i3 is quite suitable. Casual games, web games, Photoshop and the like, even when multithreaded (as PS is), do not load the CPU particularly heavily; the bottleneck is more likely disk I/O speed. So an i3 with Hyper-Threading handles these cases without much trouble.

3) Heavyweight 3D games
Modern 3D games hand much of the work, such as 3D acceleration, to the GPU. While the GPU works, the CPU generally blocks (waits on an interrupt) until the GPU finishes, then continues. So there are two possible bottlenecks: one from the CPU and one from the GPU.

For 3D games, an i5 is generally fully capable. Do you want an i7? If money is no object, sure, and your benchmark score will certainly rise. But on a limited budget it is usually simpler and more effective to put the money into the graphics card: an i5 with a mid-to-high-end card like a GTX 970 is better balanced than an i7 with a GTX 950.

Is the E3, with its i7-level performance, worth buying? Of course. But if scalpers have driven the E3 1231v3's price too high, you are better off with an i5.

4) 3D graphics professionals

If your work involves heavy 3D modeling, rendering, and the like, the CPU matters and so does the GPU. For the CPU, the more (logical) cores the better, because the various rendering methods parallelize extremely well at the algorithmic level: every logical core's task queue can be filled to the brim, wringing out the CPU's maximum performance. A canteen this busy will never see just one or two customers. Here the gap between an E3/i7 and an i5 can be very large.

The GPU's burden is heavy too, and an ordinary gaming card such as a GTX 980 may not be up to it; a Quadro may be required. It is not that the 980 lacks raw power, but that certain professional graphics drivers/libraries are simply not enabled for gaming cards like the GTX 980. Without driver support the work cannot run on the GPU, so it falls back to CPU emulation: the CPU has to run its own logic and also pick up everything the GPU could not. If the CPU is not powerful enough, how do you survive that?
So for this class of application you must choose a powerful CPU: an i7, an E3, or even bigger Xeon parts with 6 cores/12 threads or 8 cores/16 threads.

Advanced: why do we turn off Hyper-Threading when benchmarking a system?

At this point you may ask: since HT improves system performance, especially the ability to handle multithreaded programs, why turn it off for benchmarks? Take an E3 1231v3 with 4 cores/8 threads: with HT off, only 4 cores/4 threads remain, that is, 4 waiters, 4 cooks, and 4 queues. Shouldn't performance be worse? Shouldn't CPU idle time go up?

This is actually a very practical and interesting question. Logically speaking, we should enable hyperthreading.

Example:
Say 64 guests arrive and every one of them wants a rice bowl. Two setups:
1) A canteen with 8 waiters, 8 queues, and 4 cooks: how many guests per queue? 8.
2) A canteen with 4 waiters, 4 queues, and 4 cooks: how many guests per queue? 16.
Which is faster? The first, surely: 8 waiters taking orders simultaneously, staggered in time, absorb the delay caused by any one customer's hesitation and chatter, and keep the 4 master cooks busy.

But don't forget what we discussed earlier: once Hyper-Threading is on, the 4 extra waiters bring extra overhead. Before joining a line, every guest hesitates and burns time thinking: "Which of the two queues should I join? Which has fewer people? Which waiter looks nicer?..." This extra overhead (processing delay, performance loss) lives at the hardware level; it was fixed when Intel designed the CPU, and nothing can be done on the hardware side. The only lever left is to turn HT off. But with HT off, each queue grows to 16 guests, and each waiter goes from serving 8 guests to serving 16 (the AS's backlog grows from 8 to 16). How do we break out of that?

Here comes the main event. We cannot change the hardware, but we can optimize the software: rewrite the program's parallel scheduling algorithm so the program is tuned, as far as possible, to the CPU's hardware architecture. The algorithmic details are too specialized to cover here, but the following example should make the idea clear:

Example:
Again, 64 guests arrive, everyone wanting a rice bowl, at a canteen with 4 waiters, 4 queues, and 4 cooks. Guests per queue? 16.

Now, instead of having those 16 people queue up individually, each queue elects one representative who orders on behalf of all 16: one slip listing 16 rice bowls, while the other 15 step back and rest. In total only 4 customers (the representatives) place orders, and the remaining 60 rest below. As for ordering speed, each queue can dawdle at most once (one representative). The cook in the back receives a single order for 16 rice bowls and can cook them back to back; no frying one bowl and then idling for 5 minutes.

See, problem solved?
1) It avoids the extra AS overhead of 8 waiters and 8 queues.
2) It also keeps the cook fully busy (minimizing the PU's idle time).
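In scheduling terms, the representative trick is just coarsening the task granularity: group the work so the per-order overhead is paid once per queue instead of once per guest. A tiny illustrative sketch (the function and numbers are mine, not from the original):

```python
# The representative trick, in scheduling terms: instead of 64 guests
# each placing an individual order (64 scheduling events), one
# representative per queue submits the whole queue's order (4 events).
def split_into_queues(items, n_queues):
    """Deal items round-robin into n_queues lists."""
    queues = [[] for _ in range(n_queues)]
    for i, item in enumerate(items):
        queues[i % n_queues].append(item)
    return queues

orders = ["rice bowl"] * 64
queues = split_into_queues(orders, 4)
print([len(q) for q in queues])                  # [16, 16, 16, 16]
print("orders placed, one by one:", len(orders))  # 64
print("orders placed, batched   :", len(queues))  # 4
```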

In supercomputing, everyone chases extreme performance. The TOP500 supercomputer ranking is compiled every year, and a sliver of performance difference can drop you many places, so everyone squeezes out the last bit the system can give.

This example also tells DIY players: hardware matters, and so does software. However powerful the hardware, the software (drivers included) must be optimized too, or the hardware will never deliver 100% of its power. It also explains, indirectly, why some hardware is a "benchmark king": extremely high scores in tests like 3DMark, yet a mess in actual games.

When buying hardware, buy what many people use; avoid overly niche gear.

Performance comes from hardware and software working together: the hardware must be strong, and the software optimization must also be in place. A software-side weakness can be offset by modifying the source code. But the hardware-architecture overhead that 8 threads bring (4 extra hardware ASs) is baked in at the integrated-circuit level, and there is nothing we can do about it.

It is like the Sword School and the Qi School of Mount Hua (from The Smiling, Proud Wanderer).
The Sword School's way: simply increase the program's thread count and switch on the CPU's Hyper-Threading.
The Qi School's way: modify the program itself; change the algorithm, count operation cycles by hand, adjust the parallel strategy, hide the latency.
The Sword School progresses fast, the Qi School slowly. After the same year of training, the Sword School disciple reaches level 6 while the Qi School disciple only reaches level 3. But given enough time, the Sword School's ceiling is level 9 with no way to break through, while the Qi School can ultimately reach level 10.

Optimization involves a trade-off: performance vs. versatility.

As a simple example, take an addition:
1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 (eight 1s. Of course, in reality an operation this fine-grained is not worth parallelizing at all; it is just an illustration.)

The first solution (low performance + highest versatility):
No optimization; a first-grader could write it. The program is one trivially clear line. Throw it at the CPU: 7 additions at 1 second each, so 7 seconds. Multi-core is of no use whatsoever; this is pure single-core performance. Let me stress: a single-sequence (serial) program like adding eight 1s will never become multi-core by itself if you do not parallelize it at the code level. It will use exactly one core! No miracles, no magic. How do you parallelize? Change the program: pthreads, fork, MPI, OpenMP... there are many approaches; search online if you are interested.

Total time: 7 seconds
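A sketch of that last point (my own example, not from the original): the serial sum uses one core no matter what, and only an explicit rewrite, here with a small process pool, spreads the work across workers. Processes rather than threads, because CPython's GIL keeps threads from helping with CPU-bound work:

```python
from multiprocessing import Pool

# A serial expression such as 1+1+...+1 runs on ONE core no matter how
# many the machine has; to use more, the program itself must be split.
def partial_sum(chunk):
    return sum(chunk)

if __name__ == "__main__":
    ones = [1] * 8
    serial = sum(ones)                       # the "7 seconds" version: one core
    chunks = [ones[i::4] for i in range(4)]  # 4 pieces, one per worker
    with Pool(4) as pool:                    # 4 worker processes
        parts = pool.map(partial_sum, chunks)
    parallel = sum(parts)                    # combine the 4 partial sums
    print(serial, parallel)                  # 8 8
```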

The second solution (good performance + high versatility), with parallel optimization:
1) First, count how many cores the machine has (assume they are all physical). Done: 4 cores; the count took 0.5 seconds (the time values are just examples).
2) Knowing the core count (= 4), split the computation into 4 parts and spawn 4 (software) threads, matching the hardware cores 1:1. This scheduling overhead is another 0.5 seconds.
3) Then compute:
(1+1) (1+1) (1+1) (1+1) round one, every core does one addition: 1 second
(2+2) (2+2) round two, cores 1 and 2 each do one addition: 1 second
(4+4) round three, only core 1 works: 1 second
The computation takes 1 + 1 + 1 = 3 seconds. This is also mathematically grounded (it is a divide-and-conquer problem; I will skip the details, search online if you are curious). You have all learned logarithms: log2(8) = 3.

Total time: 0.5+0.5+3=4 seconds
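The three rounds can be mimicked directly (a sketch of the pairwise reduction; the function is mine and counts rounds rather than seconds):

```python
import math

# Pairwise (tree) reduction of eight 1s: round one does four additions
# in parallel, round two does two, round three does one, so log2(8) = 3
# rounds instead of 7 sequential additions.
def tree_reduce(values):
    rounds = 0
    while len(values) > 1:
        # pair neighbours; a leftover odd element is carried over as-is
        values = [sum(values[i:i + 2]) for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds

total, rounds = tree_reduce([1] * 8)
print(total, rounds)       # 8 3
print(int(math.log2(8)))   # 3
```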

The third solution is more extreme (extreme performance + low versatility).
If I already know my system has 4 cores, can't I skip 1) counting the cores and 2) all the scheduling overhead? Yes: remove those time-consuming steps and jump straight to step 3).
----> Total time: just 3 seconds.
However, this approach only suits 4-core machines. Hand the program a dual-core or 8-core machine and overall speed drops sharply, worse than the second solution, which has some versatility and adaptability. The third solution is rigid and mindless: the core count is hard-coded into the program. This programming habit is not recommended, because the resulting program has very poor general applicability.
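The contrast between solutions 2 and 3 in miniature (an illustrative sketch; the names are mine):

```python
import os

# Solution 2 vs solution 3: the adaptive version pays a small cost to
# ask how many workers the machine offers; the hard-coded version bakes
# in "4" and only behaves well on a 4-core box.
HARDCODED_WORKERS = 4            # solution 3: fixed when the code is written

def adaptive_workers():
    return os.cpu_count() or 1   # solution 2: query the machine at run time

def split_work(n_tasks, workers):
    """Spread n_tasks as evenly as possible over the workers."""
    base, extra = divmod(n_tasks, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]

print(split_work(8, HARDCODED_WORKERS))  # [2, 2, 2, 2] (fine on 4 cores)
print(split_work(8, 2))                  # on a 2-core box: [4, 4]
```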

Can you see where this is going?

In fact, our system benchmarking pursues the third solution, because we know our system's architecture precisely: there is no need to consider dual-core or 4-core CPUs at all (we generally run 8-core parts), and the L2/L3 cache sizes are fixed. So our code optimization can be taken to the extreme, tailored entirely to one specific kind of hardware. The price of such optimization is low versatility: the benchmark numbers we achieve can only be reproduced on our own system.

Another interesting example:
NVIDIA's Fermi generation, e.g. the GTX 460, has roughly 300 CUDA cores (which you can think of as stream processors). Fermi's successor is Kepler, e.g. the GTX 660, whose performance is nearly double the 460's. But doubled how? The GTX 660's stream-processor count rises to about 1000, while each individual stream processor delivers only about half the performance of a Fermi one. So the 660 wins on quantity.

And that created a problem. One of our users ran the same GPU program (GROMACS) on a Fermi card and a Kepler card (Tesla compute cards roughly equivalent to the 460 and 660). Kepler came out much slower than Fermi; the run time roughly doubled. Logically that shouldn't happen, so the user asked us for help. We checked: no hardware fault, both cards working, NVIDIA driver normal. Then I dug into the program's source code, and after much effort found the reason! The program's thread count was hard-coded: it spawned at most 256 threads. (Why 256 rather than 257? CUDA schedules threads in warps of 32, so total thread counts are conventionally kept multiples of 32; see the book CUDA by Example if you are interested.) On the ~300-core Fermi, 256 threads kept the card's stream processors (almost) fully occupied. But on Kepler's ~1000 stream processors, 256 threads use only about a quarter of the card; 700-odd stream processors sat idle from start to finish. And remember: one Kepler core is about 50% of one Fermi core. Put together, that fully explains why the Kepler card ran slowly.
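A back-of-the-envelope check of that explanation (the core counts are the rough figures from the story, purely illustrative):

```python
# A kernel hard-coded to 256 threads nearly fills a ~300-core Fermi
# but leaves most of a ~1000-core Kepler idle.
WARP = 32  # CUDA schedules threads in groups (warps) of 32

def occupancy(threads, cores):
    assert threads % WARP == 0, "thread counts are kept multiples of 32"
    return min(threads / cores, 1.0)

print(f"Fermi  (~300 cores):  {occupancy(256, 300):.0%}")   # ~85% busy
print(f"Kepler (~1000 cores): {occupancy(256, 1000):.0%}")  # ~26% busy
```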

We did not know GROMACS's internal logic, and did not need to, because we are not experts in computational molecular dynamics. So we reported our findings to the GROMACS developers and asked them to optimize the program. They reworked the parallel algorithm and raised the thread count, finally fixing the problem in the next release, so that the Kepler generation of graphics cards is properly supported.

It can be seen how important targeted program optimization is!

(The following is my understanding as a player, not necessarily professional or precise.)

In the actual consumer market, when a product, and especially its hardware driver, takes the third approach, you are very likely to see the "benchmark king" phenomenon. A graphics-card example may be the easiest to grasp. Take ATI years ago: its drivers were very likely optimized to the extreme for benchmark suites such as 3DMark. Why? Largely because ATI, playing the general-purpose second approach, could not out-compete NVIDIA's second approach, and every vendor knows consumers read reviews and benchmark scores before buying a card. So ATI played the classic Tian Ji horse-racing move: pit its third-approach driver against NVIDIA's second-approach one. The result is that benchmark scores stay sky-high while versatility may drop sharply; in actual use, if a given game has not been further matched and optimized against the card's driver, performance falls off badly. Hence the chain that has to fit together: game program <--> hardware driver <--> hardware architecture, each side optimized for the others' scheduling. (In reality there is often a fourth link, the game engine.)

Developers of games like Crysis (including developers of 3D engines like CryEngine) generally think differently from the vendors who write hardware drivers: a game developer must consider platform versatility and will rarely adopt a particularly extreme approach like the third solution. In practice, when a game runs slowly or stutters, it is usually the hardware vendors (ATI, NVIDIA) whose products get blamed for lacking power; would you ever blame the game developers for the code they wrote? So what if Crysis has always been called a "hardware killer"? As a consumer, all you can do is keep paying to upgrade your graphics card...

If Crysis follows the second-solution philosophy, that all makes sense. Of course 2 cores/4 threads, 4 cores/4 threads, and 4 cores/8 threads will perform differently, because you never have an absolute performance baseline (how fast and smooth should Crysis be? has it truly squeezed every last drop out of a GTX 980?). You only ever have a relative impression: playing Crysis on an i3 with fewer cores is definitely less pleasant than on an i7 with more.

And that, too, makes sense.

------------------------------------------------------------------------

Source of this excerpt:

Detailed explanation of CPU working mode, multi-core, hyper-threading technology - Zhihu (zhihu.com)

Origin blog.csdn.net/weixin_70280523/article/details/132155753