The AI- and NPU-Based Codec Revolution: Collaborative Innovation of the VPU and NPU

Editor's note: In this rapidly changing digital media era, Codec technology plays a vital role in video and audio processing. The rise of AI has brought unprecedented opportunities and challenges to Codec, while the development and collaborative innovation of the VPU and NPU allow Codec to adapt better to complex scenarios and needs and to reach a higher level of image and sound processing capability. LiveVideoStackCon 2022 Beijing invited Mr. Kong Dehui, Director of Multimedia Technology at ZTE Microelectronics, to discuss the impact of AI and the NPU on Codec from multiple perspectives, including algorithm optimization, performance improvement, and energy efficiency, to build an in-depth understanding of the key factors and potential opportunities of the AI- and NPU-based Codec revolution, and to further promote innovation and development in the digital media field.

Text: Kong Dehui

Compiled by: LiveVideoStack

Hello everyone, I am Kong Dehui, Director of Multimedia Technology at ZTE Microelectronics. The topic I am sharing with you today is the AI- and NPU-based Codec revolution: the collaborative innovation of the VPU and NPU.

At present it is generally believed that AI has a fairly strong impact on Codec. Thinking about it at a more fundamental level, this change not only alters the way we work; it also replaces or reduces part of the ineffective labor. Once that ineffective labor is reduced, people can pay more attention to the things machines are not good at but people are. As The Three-Body Problem puts it, "what the Trisolarans understand least is human thinking": human creativity is something machines may not possess now, or perhaps ever.

Today's sharing covers the following four parts:

The first part: a general introduction;

The second part: trends in AI and Codec;

The third part: the forms of the NPU and VPU;

The fourth part: a discussion on the fusion of the NPU and VPU.

-01-

General introduction

Across the development of multimedia technology, from early wired and wireless communication capacity through 2G, 3G and 4G to today's 5G, the changes are obvious. Looking back, the 3G era was short while 4G lasted a long time, because 4G could more fully accommodate people's life experiences and lifestyles. The most important point is that audio and video data and information became much easier to access, including today's short video. It is precisely the larger, wider data pipe that lets us deliver the content we want to present to the user side.

Codec technology existed all along, before 3G and 4G appeared and before 5G, yet compression capability did not advance by leaps and bounds until after 4G. It is precisely because the pipe has become wider and can carry more data that we have moved from "can't" to "can".

Now that AI technology is available, I hope it can play a bigger role and accelerate that shift from "can't" to "can". Work that used to take 10,000 people a year can now, with 100,000 people, be finished in half a year; the impact of this technology has accelerated the development of the Chinese market. In the 5G era, some people wonder, "Why haven't I felt the earth-shaking changes 5G was supposed to bring?" What is actually needed is to find, or change, user demand: from the earlier "can't", or "can with limits", to today's "can". People used to think WiFi was essential; now they no longer care whether a hotel has WiFi, because the phone already meets the corresponding needs through indoor small cells.

The next step is to make sure the content users see meets their needs: how do we satisfy their demands on latency and picture quality? Beyond short video for entertainment, is video also closely tied to work? Multimedia technology appeared years ago without corresponding standalone products, because it has been absorbed into the user experience of the cloud or the terminal and no longer needs to exist as a separate technology. If we want to turn "can" into "good", we need to gradually make audio and video technology intelligent.

The direction of intelligence is not only life and entertainment but also transportation, government services, travel, health, and other fields. The data in these domains often needs to be transmitted with video as the medium rather than plain text, for example the digital services of enterprises. "Every book we read is really a cropping of someone's thoughts," so the significance of video recording is to capture the process of face-to-face communication between people and the owners of information. Video business is therefore not just entertainment; it spans many other fields, and there will be many opportunities to expand into them in the future.

There are two cases here. The first concerns terminal computing capability: edge devices such as phones and watches now have strong compute of their own, which is very helpful for completing the business. The second concerns ADAS, whose compute demand has grown exponentially and therefore needs a larger computing platform to support the business, something rarely considered before, because attention used to go mainly to computing on the CPU, GPU, or DSP. Delivering encoding, decoding, and enhancement properly onto these IPs requires a deeper understanding of their computational logic. These IPs are in fact very powerful; if you focus only on CPU compute, you will find yourself limited.

In recent discussions, people have been paying more attention to applications, so how do applications sink down to the actual computing platform? Among the platforms that must be fully considered, the first is the CPU, but in terms of raw compute the CPU is not the strongest. For video, image, or audio processing, the DSP and GPU have more potential that needs to be tapped, and the NPU far exceeds the CPU in every aspect of peak compute.

This slide is about the development of ChatGPT. As algorithms keep advancing, computing power and computing platforms improve as well. There is no need to worry that the computing platform will be wasted, or that a self-deployed platform cannot be upgraded independently: as algorithm evolution brings higher performance, attention turns to the parameters, which can be associated with the synapses of a neural network, and that in effect is another footnote to why computing power and computing platforms keep improving.

We should not frame the application platform first; we should start from computing trends and algorithm performance and consider how they push the computing platform to change. This way of thinking brings many choices, and it gives the computing platform sound reasons to keep evolving, because improvements in algorithms and performance come with changes in computing power.

Regarding the evolution of the computing platform, the forms of digitalization were mentioned earlier. What do these digital forms mean for individuals? They include how we engage and communicate with family, business partners, colleagues, and virtual humans. This kind of connection brings more communication options: no longer limited to voice, we can interact through video and other means.

In recent years we have watched the technology develop from widespread 4K to 8K gradually coming into view, which poses a great challenge to audio and video codecs. 2K is relatively easy, 4K is still challenging, so what about 8K? It is a real issue; it may not be urgent yet, but it is clearly visible.

Faced with this problem, how do we solve it? In virtual reality (VR) and augmented reality (AR) devices the latency requirements are very tight. If data is loaded from the SoC into DDR, handed to the CPU, and then passed back and forth through multiple cache layers, can latency stay low? Or must we use direct, point-to-point logic that avoids any intermediate hops?

The design therefore has to account for the impact of these new connection methods and the number of connections: low-latency display, larger data volumes, and higher throughput, all of which change the processing platform. Voice used to be barely considered, but it has become a keyword that cannot be ignored. How should work be allocated between the cloud and the device side? How do we reduce payload while guaranteeing latency? These are the factors that need to be weighed and optimized.

-02-

Trends in AI and Codec

Facing these new trends and challenges, artificial intelligence (AI) and codecs (Codec) are changing as well, in two respects:

First, with the introduction of large models, more and more data is continuously fed to them, but that data carries two inherent problems. The first is validity: in high-level tasks the data may contain dirty samples, which affects the accuracy of the model itself; even scaling the model up by orders of magnitude will not reach 100% accuracy, so data quality must be ensured in order to improve accuracy. The second is that some low-level tasks have natural dataset-construction problems. In super-resolution (SR), for example, it is hard to obtain pixel-aligned, truly real ground truth, so the dataset inevitably has defects that we keep trying to compensate for. That compensation means the question is no longer "how do we solve the problem simply by showing the model more examples"; we have to consider the model, the dataset, the computation method, and the training method together. In particular, when using large models we must consider distributed training to improve training efficiency, which is a problem that needs to be solved now.
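To make the SR dataset problem concrete, here is a minimal sketch (my own illustration, not from the talk) of how training pairs are typically synthesized when pixel-aligned real ground truth is unavailable: the low-resolution input is manufactured from the high-resolution image with an assumed degradation model, and the gap between that assumption and real-world degradation is exactly the dataset defect described above. The degradation model here (box downscale plus Gaussian noise) is purely illustrative.

```python
import numpy as np

def synthesize_pair(hr, scale=2, noise_sigma=2.0, rng=np.random.default_rng(0)):
    """Make an (LR, HR) training pair by degrading the HR image.

    Box downscale + Gaussian noise is an assumed degradation model for
    illustration; real sensors and compression degrade images differently,
    which is exactly the domain gap discussed above.
    """
    h, w = hr.shape
    h, w = h - h % scale, w - w % scale          # crop so the size divides evenly
    hr = hr[:h, :w].astype(np.float32)
    lr = hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    lr = np.clip(lr + rng.normal(0.0, noise_sigma, lr.shape), 0, 255)
    return lr, hr

hr_frame = np.random.default_rng(1).integers(0, 256, (128, 128)).astype(np.float32)
lr_frame, gt = synthesize_pair(hr_frame)
print(lr_frame.shape, gt.shape)   # (64, 64) (128, 128)
```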

The second aspect is how to improve the effectiveness of data computation, where there are three problems. The first is fatal for the NPU and AI: one important reason for the disillusionment with AI is that although compute capacity has grown, when it is actually delivered, the user finds that a part rated at, say, 10 TOPS can in practice only sustain 5 TOPS or even 2 TOPS. This problem was very common in first-generation NPUs, so how do we fully exploit the compute dimensions? The second is the data type. Many AI computations are completed through step-by-step data transfer and approximation; since the process is an approximation anyway, can we introduce compute dimensions suited to AI acceleration instead of only doing FP32 or FP64? Such compute dimensions improve overall performance, especially with data reuse: for example, a 64-bit multiplier can simply be folded into two 32-bit multipliers, and this kind of technique brings a considerable expansion of compute capacity.
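As an illustration of trading precision for throughput, in the same spirit as folding a wide multiplier into narrower ones, here is a hedged sketch of symmetric INT8 quantization, a common low-precision scheme (not something specific to the NPU discussed in the talk). It shows that a matrix-vector result computed with 8-bit operands and 32-bit accumulation stays close to the FP32 reference:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization (a common, simple scheme)."""
    scale = max(float(np.max(np.abs(x))) / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)   # FP32 weights
a = rng.normal(size=(64,)).astype(np.float32)      # FP32 activations

qw, sw = quantize_int8(w)
qa, sa = quantize_int8(a)

y_fp32 = w @ a                                                     # full-precision reference
y_int8 = (qw.astype(np.int32) @ qa.astype(np.int32)) * (sw * sa)   # 8-bit operands, 32-bit accumulate

rel_err = np.linalg.norm(y_fp32 - y_int8) / np.linalg.norm(y_fp32)
print(f"relative error of the INT8 approximation: {rel_err:.4f}")
```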

The third problem concerns the computing efficiency of data centers. The slide shows the change in computing efficiency over the next ten years presented at ISSCC 2023, and the curve keeps rising. What does that rise mean? First, as the cost per unit of compute falls, the so-called compute anxiety of computing centers will arrive earlier than expected. Second, as the compute and efficiency of computing centers rise, the benefits become more significant. The most expensive part of training a large network model has never been buying GPU cards but the long-term power consumption; if computation can be made more efficient, saving even 1% or 10% of energy brings a qualitative benefit for large-model training and data centers. This benefit also matters for operating and maintaining a model after deployment, because it accrues over the long term.

The development of a model is staged: during training the focus is accuracy, while during operation the focus is operating cost. So at the operation stage, the computing requirements can be adjusted to reduce operating cost.

This part considers business expansion along multiple dimensions. First, as channels expand and parameters increase, more data dimensions can be provided to users, and those users need not be people; they can also be machines or cascaded sensor devices that close the task loop. The ultimate goal is still to serve people, but in places like ports or mines the area is huge, and relying on manpower to cover the deployed video streams is a huge challenge. The key therefore becomes intelligent control and interaction, and a video stream may contain only about 1% effective information; not all of it is useful for the final judgment. This is also one of the core reasons VCM can outperform VVC by more than 50%.

Second, when running control algorithms, the data does not require human subjective experience. The demand for subjective experience is often a human preconception, and it can be reasonably optimized away when designing the system solution.

Third, feature-transfer schemes need to be considered. Humans may require precision in features, but for machines a certain degree of data change or loss is acceptable, for example in back-end recovery or machine judgment. There are therefore significant differences between VCM and human vision. As the level of intelligence rises, more video data, or similar data representations, should be designed for machine judgment, while people focus more on the results.

To improve user experience, both machine-oriented technology and human perception are changing. Here are some intuitive examples on the human-perception side:

The first is the 8K large screen. During last year's World Cup, a survey found that viewers who had experienced 8K found it hard to go back to 4K, because the immersion of 8K, together with the matching sound design, gives users an irreversible experience. We should therefore try to broaden user demand and actively pay attention to these changes, rather than adjusting only when forced to.

The second example is the metaverse, a concept everyone is talking about. What needs to be considered is what the interactive experience actually is, and how to convey that experience to the people interacting. I think this is an important challenge and focus for Codec and AI generation technology in the future.

The third example is "enjoying work". As technology developers, especially audio and video developers, we should provide products that make work easier, not only remote-working tools but also the ways we communicate with clients and colleagues. After the past few years of remote work, do you feel that the way you work flows smoothly? I remember during a serious phase of the epidemic last year, right at a business peak, I found that continuous remote communication with colleagues actually lowered efficiency, and I had to adjust and optimize for that myself. Many multinational companies have now begun signing permanent "work from home" agreements, and this way of working is closely related to how we design data paths, user interfaces, and even dedicated hardware devices.

-03-

Forms of NPU and VPU

User needs and how they change must be considered, and the hope is that these changes can be sunk further down into more efficient hardware solutions.

The first generation of NPUs had excellent "spatial parallelism" and "stacked computing" capabilities. Over time, however, we need to think about how to apply that compute effectively to the business we actually want to deploy. So computation is abstracted further, including the optimization of 1D, 2D, and 3D computation. This gives the next-generation NPU architecture an opportunity to better meet business needs and to provide proper support and abstraction for the existing AI-algorithm compute layer: staying tightly integrated with the business while actively exploring how to support and optimize that compute layer.

A question has to be considered here: is the computational abstraction just described reasonable? Each computation type may be optimized with different efficiency in different situations, so how do we make full use of current resources to get the best result? Suppose, for example, there are two task types that can both be mapped onto three-dimensional computation; a "low-dimensional" task can be mapped onto "high-dimensional" hardware, but the mapping may waste compute resources. Yet in order to push all computing tasks onto dedicated hardware, some compute resources and cost must be sacrificed to a certain extent.
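A small, self-contained example of that mapping cost (my own illustration, not from the slides): a 1D convolution can be fed to a 2D convolution engine by giving it a height of one, so the result is identical, but a compute array sized for 2D work is then only partly occupied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(img, kernel):
    """Minimal 'valid' 2D convolution (with kernel flip) using sliding windows."""
    kh, kw = kernel.shape
    windows = sliding_window_view(img, (kh, kw))          # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum('ijkl,kl->ij', windows, kernel[::-1, ::-1])

x = np.arange(16, dtype=np.float32)          # a 1D signal
k = np.array([1.0, -2.0, 1.0], np.float32)   # a 1D kernel

y_1d = np.convolve(x, k, mode='valid')                    # native 1D convolution
y_2d = conv2d_valid(x[None, :], k[None, :])[0]            # same op fed to a 2D engine

print(np.allclose(y_1d, y_2d))   # True: results match, but only one row of the 2D array is used
```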

In moving from single-core to multi-core tasks, we face a problem: how do we push a task with heavy compute requirements onto two suitable compute types? Those compute types may not match perfectly, which wastes compute resources. In that case the computing core can be split, or multiple instances created, with different tasks deployed to different instances so that the overall compute is fully used.
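A rough sketch of the instantiation idea (illustrative only; Python worker processes stand in for NPU core instances, and the per-tile work is a dummy): the frame is split into tiles and each instance handles one tile, so the aggregate compute stays busy.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def process_tile(tile):
    """Stand-in for the per-tile workload dispatched to one compute instance."""
    return float(np.sqrt(tile.astype(np.float32)).mean())

def split_tiles(frame, rows, cols):
    """Split a frame into rows*cols tiles, one per compute instance."""
    return [np.array_split(r, cols, axis=1) for r in np.array_split(frame, rows, axis=0)]

if __name__ == "__main__":
    frame = np.random.default_rng(0).integers(0, 255, (2160, 3840), dtype=np.uint8)
    tiles = [t for row in split_tiles(frame, 2, 2) for t in row]   # 4 tiles -> 4 instances

    with ProcessPoolExecutor(max_workers=4) as pool:               # 4 worker "cores"
        results = list(pool.map(process_tile, tiles))
    print(results)
```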

There are also synchronization issues. Across the whole AI acceleration flow, besides utilization there is another bottleneck, and it is not the compute logic itself: convolution, for instance, now has good acceleration and approximation methods. The bottlenecks appear instead in the "pre-processing" and "post-processing" stages, because that logic has to be moved onto the GPU, DSP, or even CPU, which can become the weak link.

We therefore need to consider how to partition today's computing tasks, abstracting "pre-processing" and "post-processing" separately: first lay out the pre-processing logic, then divide the post-processing tasks into a few categories. The current core compute logic is mainly biased toward MAC arrays, while pre- and post-processing involve more data rearrangement and logical operations. From this point of view, the division can be made by function, moving from the earlier "abstraction of compute logic" to today's "abstraction of functional logic".
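A minimal sketch of that functional split (my own illustration, with hypothetical stages): pre-processing is mostly data rearrangement and normalization, the core stage is the MAC-heavy part an NPU's arrays are built for, and post-processing is mostly selection logic. Separating them this way is what lets each piece be mapped to the unit that suits it.

```python
import numpy as np

def preprocess(frame_hwc, mean=127.5, scale=1 / 127.5):
    """Data rearrangement + normalization: typical pre-processing, mostly memory ops."""
    chw = np.transpose(frame_hwc, (2, 0, 1)).astype(np.float32)   # HWC -> CHW
    return (chw - mean) * scale

def core_compute(x_chw, weights):
    """MAC-heavy stage that multiply-accumulate arrays are built for."""
    return np.tensordot(weights, x_chw, axes=([1, 2, 3], [0, 1, 2]))  # reduce over C, H, W

def postprocess(logits, threshold=0.0):
    """Logical/selection work: typical post-processing with little arithmetic density."""
    return np.nonzero(logits > threshold)[0]

frame = np.random.default_rng(0).integers(0, 256, (224, 224, 3), dtype=np.uint8)
w = np.random.default_rng(1).normal(size=(10, 3, 224, 224)).astype(np.float32)
detections = postprocess(core_compute(preprocess(frame), w))
print(detections)
```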

Another important aspect concerns some characteristics of the current VPU architecture. It can be divided into a prediction unit, a filtering unit, syntax decoding, and pixel decoding. The VSE performs inverse parsing of the syntax elements, and the subsequent stages process pixels, forming the VPE/VSE structure, with some post-processing integrated as well. For example, if a VPU is designed to output only at the original resolution, this may not match actual user needs. The most direct example is the TV at home: in recent years the commonly used resolution in China may have reached roughly 4K on average, but the situation of overseas users is very different, and many still use low-resolution display devices. If the VPU can support different display terminals or display types at this stage, the data gains a great advantage. Without such a unit, the output data has to be stored in DDR and then passed through an additional processing unit, whether a DSP, GPU, or NPU, after which it may need to be written to DDR again before being sent to the output interface, and the overall latency is much higher than with the integrated solution.
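A back-of-the-envelope estimate of what that extra DDR round trip costs (all numbers below are my own assumptions, not figures from the talk): scaling outside the decode pipeline adds roughly one extra full-frame write and one extra read per frame.

```python
# Rough DDR cost of scaling outside the VPU pipeline.
# Resolution, format, frame rate and the bandwidth budget are illustrative assumptions.

width, height, bytes_per_pixel = 3840, 2160, 1.5     # 4K frame in 8-bit 4:2:0 (NV12)
fps = 60
frame_bytes = width * height * bytes_per_pixel       # ~12.4 MB per frame

# External scaling: the decoded frame is written to DDR and read back by the
# scaler before display, i.e. about one extra write plus one extra read per frame.
extra_traffic_per_frame = 2 * frame_bytes
extra_bandwidth_gbps = extra_traffic_per_frame * fps / 1e9

ddr_budget_gbps = 20.0                               # assumed usable DDR bandwidth
print(f"extra DDR traffic: {extra_bandwidth_gbps:.2f} GB/s "
      f"(~{100 * extra_bandwidth_gbps / ddr_budget_gbps:.0f}% of a {ddr_budget_gbps:.0f} GB/s budget)")
```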

Another factor to consider is power consumption. Frequent read and write operations keep pushing power consumption up, because memory access itself is not power-friendly. In edge-side deployment in particular, the problem is often not insufficient compute or insufficient ability to map the algorithm, but that although the system deploys and runs, it can only run for five minutes: after five minutes the device has already overheated, and you cannot add a $5 heat sink to a $10 device, which is not how product design works.

For the edge side, therefore, it is very important to consider every aspect of the product's application requirements early in the design. For example, stream processing can be adopted at the initial design stage to reduce the need for data interaction, and the VME can be used for memory control and rewriting to optimize memory reads and writes.

From a logical point of view, the hardware architecture needs to be combined with the existing software codec architecture, and there are many correspondences between them. From this perspective, this kind of solution was already mature for 4K support around 2017. The slide comes from an efficient-video-processing report a colleague wrote in 2019. Many new cases have since appeared on the VPU, and computationally they involve two main aspects.

The first aspect is the pursuit of more refined management. For example, processing previously applied only to the Y channel is now applied to the UV channels. The influence of the UV channels used to be considered less important and was handled one level down, which reduced the size of the overall compute logic and made the chip more edge- and user-friendly. It later turned out that this part is essential if quality is to improve. So the first aspect is a more comprehensive, fuller use of the entire compute logic.

The second important aspect is finer estimation of the parameters. We are also trying to use AI methods to optimize the estimation of these parameters, and given enough data of the right type, AI methods work quite well here. Such fine parameter estimation can improve both the quality and the efficiency of video coding.
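To make "parameter estimation" concrete, here is a minimal full-search block-matching motion estimator (a textbook baseline, not the speaker's design). In the AI-assisted version discussed here, a learned model could predict the motion vector or a better search center instead of brute-forcing the whole window.

```python
import numpy as np

def best_motion_vector(ref, cur, block_xy, block=16, search=8):
    """Full-search block matching: find the MV that minimizes SAD for one block.

    A learned model could predict the vector (or a search center) directly;
    this brute-force version only shows what parameter is being estimated.
    """
    bx, by = block_xy
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > ref.shape[1] or y + block > ref.shape[0]:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(target - cand).sum())
            if sad < best[2]:
                best = (dx, dy, sad)
    return best

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))   # content shifts down 2 and right 3 pixels
# The matching block therefore lies up-left in the reference frame:
print(best_motion_vector(ref, cur, (24, 24)))   # expect (dx, dy, SAD) = (-3, -2, 0)
```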

But there is a problem. Of the two trends just mentioned, the first is fine estimation of motion parameters and the second is improving the quality of content previously considered marginal. Beyond these, how to support parallel computing is also an important issue. For parallel computing, the key logic of the original architecture, such as the VSE, VPE, and VME, can still be considered for syntax-element parsing and pixel-level reconstruction. But as input data increases sharply, especially for the core experiences of large outdoor screens and future home terminals, these become extremely important data sources whose data paths are much larger than decoding 4K or even 2K, perhaps 2x or even 8x or more. At the software and hardware level, simply strengthening the design horizontally or expanding its scale is no longer enough.

The next dimension is the need to support parallel decoding, which in turn places requirements on the encoding process. When decoding the third or fourth row, if the syntax elements of that row depend strongly on the previous row, the decoding process may be constrained: even if decoding has already progressed further along the current row, a blocked previous stage will still cause problems.
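The row dependency can be pictured with a tiny scheduling model (an illustration in the spirit of wavefront-style parallel decoding; the two-block lag and the sizes are assumptions): each row can advance only while the row above stays far enough ahead, so a stall in one row ripples into all rows below it.

```python
# A tiny model of row-parallel decoding with a dependency on the row above
# (in the spirit of wavefront-style parallelism; the 2-block lag and sizes are assumptions).
ROWS, COLS, LAG = 3, 6, 2
progress = [0] * ROWS            # next block index each row will decode
step = 0
while any(p < COLS for p in progress):
    step += 1
    # A row may decode its next block only if the row above is LAG blocks ahead (or finished).
    ready = [r for r in range(ROWS)
             if progress[r] < COLS
             and (r == 0 or progress[r - 1] >= min(progress[r] + LAG, COLS))]
    for r in ready:              # each ready row decodes one block in this step
        progress[r] += 1
    print(f"step {step:2d}: rows decoding {ready}, progress {progress}")
```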

As discussed earlier, the NPU at first could only meet basic functions, and then came to support the corresponding tasks better; in the future we hope the NPU will have enough capability to adapt to all kinds of tasks. The same applies to the VPU: it used to correspond only to the decoding process, but now decoding is starting to constrain the encoding process in turn, which is the change we see so far. Given these changes, how should things be integrated or decomposed?

In a previous talk I placed the entire NPU inside the pipeline and regarded it as one part of the overall flow. After thinking it over and discussing it with others, I found that this logic can give the misleading impression that the NPU is only one stage of pipe processing. A more reasonable logic is that the NPU should support the entire flow end to end, including the ability, mentioned above, to use the NPU to enhance Codec parameter estimation. In addition, I think that if some hardware logic is to be implemented in the next generation of VCM, the current structure suggests it could be placed under the NPU framework and designed accordingly.

-04-

Discussion on the Fusion of NPU and VPU

To give users a better visual and audio experience, the NPU should be linked with logic such as the ISP and DPC. What are the benefits of this association? Take a mobile phone as an example. In a traditional pipeline, when the phone camera captures data directly, it can usually handle brightness above about 1 lux; combined with NPU capability, handling above 0.1 lux becomes relatively easy. Much of today's night-scene photography is achieved through exactly this logic, which also explains why high-end flagship phones take better pictures while entry-level phones do not. Part of the logic is that the gap is deliberately widened, but part of it is that resources genuinely differ: constrained by the available resources, it is hard to provide a consistent solution.

In addition, with the IaaS/AaaS thinking mentioned earlier, the PC can now be treated as a basic service unit in the office dimension, which can make future office work more convenient. Turning the personal PC into a mobile resource matters greatly for cloud office, family communication, education, telemedicine, and similar fields. The benefit is that various needs can be met by accessing the specific communication environment from whatever environment you happen to be in, which adds convenience; and since more data is stored in the cloud and the data sources are in the cloud, access from edge devices gains much greater flexibility.

This change may affect how encoding is done. I used to think 4:2:0 or 4:2:2 was enough when encoding office scenes. In practice, however, when handling this type of stream, encoding it according to traditional thinking makes the video quality very poor, which runs against that traditional thinking. The situation is easy to simulate: take a currently generated screen scene, produce some data, encode it with the current coding method, then decode it, and you will find the result becomes very poor.

For Codec this is a question worth thinking about. If you encode using only 4x4 blocks, the bitrate rises very quickly; but if you combine the codec with the NPU and use the NPU for restoration and enhancement, the complexity is actually quite controllable. In addition, for the bandwidth-limited cases mentioned above, attention is needed because today's network access environments vary enormously. Why did we have to learn from abroad in the 3G era? Because deployment progressed faster there and they saw more scenarios first. In the 4G and 5G era, other countries have started learning from us: the number of 5G access scenarios in China leads the world by a wide margin. With that complexity we face many questions, for example: should we transmit high-resolution, low-quality data, or low-resolution, high-quality data? Edge computing and the NPU can also be used for super-resolution, or combined with low-resolution, low-quality data and an all-in-one enhancement logic. This is a direction worth thinking about.
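Here is a toy comparison of the two options (entirely illustrative: coarse quantization stands in for a lossy codec, nearest-neighbour upscaling stands in for an NPU super-resolution model, and the content is random noise, so the printed numbers only show the mechanics, not real quality):

```python
import numpy as np

def downscale2x(img):
    """Halve resolution by averaging 2x2 blocks (payload reduction before transmission)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def fake_codec(img, step=16):
    """Stand-in for a lossy encode/decode: coarse quantization of pixel values."""
    return np.round(img / step) * step

def upscale2x(img):
    """Nearest-neighbour upscale; on a real device an NPU SR model would sit here."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (128, 128)).astype(np.float32)

# Option A: send full resolution through the lossy stage.
full_res = fake_codec(frame)
# Option B: send half resolution, restore at the receiver.
restored = upscale2x(fake_codec(downscale2x(frame)))

payload_ratio = downscale2x(frame).size / frame.size
print(f"option B payload is about {payload_ratio:.2f} of option A")
print("PSNR A:", 10 * np.log10(255 ** 2 / np.mean((frame - full_res) ** 2)))
print("PSNR B:", 10 * np.log10(255 ** 2 / np.mean((frame - restored) ** 2)))
```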

In the second aspect we have also made some attempts, mainly around NPU enhancement on the device side, and the benefits are obvious: for user experience and bandwidth control it is better than pouring all the effort into the codec alone. People used to think of the system as a wooden barrel whose capacity depends on the weakest link, the shortest stave. But think of it the other way around: what does the barrel effect imply? It implies there are not only short staves but long ones, that is, several relatively strong parts in the system. Why not use those long staves to stop the leak?

These are some existing VPU alternatives, including the VCM mentioned earlier as well as AI-based solutions. These AI solutions can run on the NPU and lead to some new thinking. Looking at the existing AI codec schemes, they can be divided into different types.

The first type is the end-to-end solution, which no longer uses traditional techniques such as quantization, residual estimation, and MV estimation; instead the entire process is done end to end.

The second type replaces a specific module, such as the MV estimation mentioned above. Logically, if that module can be replaced, the output bitstream is still coded according to standards such as H.264 or H.265, and even an AV1 scheme can be used; the cloud can then decode it directly with a normal hardware or software decoder. These are two different solutions, and which to choose depends on the scenario. If the scenario is relatively closed and only an end-to-end solution is needed, the traditional encoder can be discarded entirely in favor of a proprietary decoder. But if more user scenarios must be considered, especially when network environments at home and abroad are inconsistent, the second approach may be more appropriate.

-05-

Summary

For the audio and video field, what needs attention is how to combine your own solutions with the available compute, not focusing only on the cloud but also on the terminal side, because certain limits in compute, power consumption, and computing platform must be resolved on the terminal side for the solution to be delivered effectively. This is a very important consideration.

You also need to consider how to handle more connections. Other talks at this conference discussed how to solve access for 10,000 users, which is a very meaningful discussion. In addition, better performance will in turn bring more opportunities and more user demand.

An interesting point of view: I have always believed that so-called cloud office actually derives more from the migration of entertainment demand. Why must office work be done on-site when individuals can just as well plug into the same video stream? That is one point of view.

Finally, I would like to share some future trends as I see them. They cover how to integrate compute with existing standards: existing standards mainly define different profiles, and we need to think about how to match those profiles with compute. They also include the several strategies mentioned earlier.

First, generating more data directly with AI networks. This kind of compute acceleration is essentially a complete subversion of the streaming codec architecture, or hybrid coding strategy, mentioned earlier.

The second involves strategies related to AI Codec itself.

The third is how to account for the related costs, including performance cost and effective utilization.

The fourth trend is the evolution of the hardware architectures seen so far. For 8K support, the single-channel solution is no longer reasonable, because many new challenges arise when cost and power consumption are pushed down further.

The last point is about software, especially the NPU tool chain. We need to think about how to map different operators onto the existing NPU compute units, which in turn forms a circular problem: how to integrate the corresponding functions into the system. This is a direction well worth trying.

That is all for today's sharing. Thank you!


