The rise of heterogeneous accelerated computing should not just focus on computing chips

Original title: Why SYCL: Elephants in the SYCL Room

By James Reinders and Michael Wong

Excerpted from: https://www.hpcwire.com/2022/02/03/why-sycl-elephants-in-the-sycl-room/


Commentary — In the second of a series of guest posts on heterogeneous computing, James Reinders, who returned to Intel last year after a short “retirement,” follows up on his piece about how SYCL will contribute to a heterogeneous future for C++. He is joined by Michael Wong, of Codeplay Software Ltd., who is also the current SYCL committee chair. Together, they offer their responses to what might be called the ‘Elephants in the SYCL Room.’ 

Commentary: In the second in a series of guest posts on heterogeneous computing, James Reinders, who returned to Intel last year after a brief "retirement," follows up on his article about how SYCL will contribute to C++'s heterogeneous future. He is joined by Michael Wong of Codeplay Software Ltd., who is also the current SYCL committee chair. Together they respond to what might be called the "elephants in the SYCL room."


The case for C++ programming, with SYCL bringing in full heterogeneous support, has been well articulated by persons close to the SYCL specification, including in a recent article, “Considering a Heterogeneous Future for C++,” and in numerous other resources enumerated on sycl.tech. SYCL is a Khronos standard that introduces support for fully heterogeneous data parallelism to C++. While SYCL is not a cure-all, it is a solution to one aspect of a larger problem: How do we adequately enable full heterogeneous programming given the emerging explosion in hardware diversity?

The case for C++ programming with SYCL providing full heterogeneous support has been well laid out by those close to the SYCL specification, including in a recent article, "Considering a Heterogeneous Future for C++," and in many other resources listed on sycl.tech. SYCL is a Khronos standard that introduces support for fully heterogeneous data parallelism in C++. While SYCL is not a panacea, it addresses one part of a larger problem: given the explosion of hardware diversity, how do we adequately enable fully heterogeneous programming?

In this article, we offer our perspective on key questions about SYCL, based on having worked in this domain for decades. These important questions are asked by software developers looking to understand whether SYCL matters to them. Let’s face it: at some point, every major project has Elephants in the Room.[1] Successful projects address their elephants openly.

In this article, we present our perspective on key questions about SYCL, based on decades of working in the field. These important questions are asked by software developers who want to know whether SYCL matters to them. Let's face it: at some point, every major project has "elephants in the room." [1] Successful projects address their elephants openly.

Elephant 1: Aren’t GPUs enough? Do other accelerators really matter?

Elephant 1: Aren’t GPUs enough? Do other accelerators really matter?

Valid questions exist about which accelerators will stay, and which will be a passing fad. For decades, different accelerators have come and gone while CPUs persist. Today, GPUs are present in the vast majority of computer systems. Writing our applications to leverage GPUs makes a lot of sense given their near ubiquity.

There are some legitimate questions about which accelerators are here to stay and which will prove to be a passing fad. Different accelerators have come and gone over the decades, but CPUs have remained. Today, GPUs are found in the vast majority of computer systems. Given that GPUs are almost everywhere, it makes sense to write applications to take advantage of them.

As a result, one of the first elephant questions is whether we really need to generalize, i.e., do we need to be multiarchitecture and multivendor?

So one of the first questions is whether we really need to generalize, i.e., whether we need to be multi-architecture and multi-vendor.

Experts expect that the future will require “dedicated or semi-dedicated hardware accelerators” as a must-have feature for computing in this decade. They include researchers led by Prof. Masaaki Kondo in “White Paper on Next-Generation Advanced Computing Infrastructure” and Hennessy & Patterson in their paper “A New Golden Age for Computer Architecture”.

Experts predict that "dedicated or semi-dedicated hardware accelerators" will be an essential feature of computing in this decade. They include researchers led by Professor Masaaki Kondo in the "White Paper on Next-Generation Advanced Computing Infrastructure" and Hennessy and Patterson in their paper "A New Golden Age for Computer Architecture."

As long as we are talking about dedicated accelerators, why stop at GPUs? Optimizing for different types of accelerators is a great objective, but we don’t want to write different code for different types of accelerators. We believe that the industry will benefit from a standardized language that everyone can contribute to and collaborate on, that is not locked into a particular vendor, and that can evolve organically based on the requirements of its members and the public.

Since we're talking about dedicated accelerators, why stop at GPUs? Optimizing for different types of accelerators is a great goal, but we don't want to write different code for different types of accelerators. We believe the industry will benefit from a standardized language that everyone can contribute to and collaborate on, that is not locked into a specific vendor, and that can grow organically based on the requirements of its members and the public.

SYCL takes an interesting approach that allows us to use common code when we want and specialize when we want. In this way, SYCL embraces accelerators in general, leaving it to us, the developers, to decide when to write common cross-architecture code, and when we feel it is sufficiently advantageous to specialize code.

SYCL takes an interesting approach, allowing us to use generic code when needed and specialize when needed. In this way, SYCL embraces accelerators in general, letting us developers decide when to write code that is common across architectures, and when we believe there is sufficient benefit to specialized code.
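The "common code when we want, specialize when we want" idea can be sketched in plain C++ templates. This is only an illustration under stated assumptions: the tag types `generic_device` and `wide_vector_device` and the names `scale_kernel` and `scaled` are hypothetical and are not part of SYCL. The primary template plays the role of the cross-architecture code, and the specialization is an optional hand-tuned variant that must produce the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tags standing in for "kinds of device"; not SYCL types.
struct generic_device {};
struct wide_vector_device {};

// Common, cross-architecture version: used for any device tag
// that has no specialization of its own.
template <typename Device>
struct scale_kernel {
    static void run(std::vector<float>& data, float factor) {
        for (float& x : data) x *= factor;
    }
};

// Optional specialization: same result, but walks the data in blocks
// of four, as one might for hardware with wide vector lanes.
template <>
struct scale_kernel<wide_vector_device> {
    static void run(std::vector<float>& data, float factor) {
        const std::size_t n = data.size();
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            data[i]     *= factor;
            data[i + 1] *= factor;
            data[i + 2] *= factor;
            data[i + 3] *= factor;
        }
        for (; i < n; ++i) data[i] *= factor;  // leftover elements
    }
};

// Callers pick a device tag; the compiler selects common or
// specialized code without any change at the call site.
template <typename Device>
std::vector<float> scaled(std::vector<float> data, float factor) {
    scale_kernel<Device>::run(data, factor);
    return data;
}
```

Calling `scaled<generic_device>(v, 2.0f)` and `scaled<wide_vector_device>(v, 2.0f)` should give identical vectors; only the code path differs. The design point is that specialization is opt-in, so code that never specializes stays portable.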

Its underlying programming model, SPMD, has been shown to be usable across many architectures. SPMD is how most programmers using Nvidia CUDA/OpenCL/SYCL think: writing code from the perspective of operating on one work-item and expecting it to run concurrently on most hardware such that multiple work-items fill vector hardware lanes.

Its underlying programming model, SPMD, has been proven to work on a variety of architectures. SPMD is how most programmers working with Nvidia CUDA/OpenCL/SYCL think: write code from the perspective of operating on one work item, and expect it to run simultaneously on most hardware, so that multiple work items fill the vector hardware lanes.
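To make the SPMD viewpoint concrete, here is a minimal sketch in standard C++ rather than SYCL or CUDA. The kernel `saxpy_item` is written for a single work-item and sees only its own index. The launcher `for_each_work_item` is a hypothetical stand-in that simply loops; on real hardware a runtime would be free to run the work-items concurrently and pack them into vector lanes, which is safe here because no work-item touches another's element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The "kernel": written from the point of view of ONE work-item.
// It receives its own index and touches only that element.
inline void saxpy_item(std::size_t i, float a,
                       const std::vector<float>& x,
                       std::vector<float>& y) {
    y[i] = a * x[i] + y[i];
}

// Hypothetical stand-in for a device launch: applies the kernel once
// per index. This version is sequential; the kernel does not care.
template <typename Kernel>
void for_each_work_item(std::size_t count, Kernel kernel) {
    for (std::size_t i = 0; i < count; ++i) kernel(i);
}

// Computes a*x + y element-wise by launching one work-item per element.
inline std::vector<float> saxpy(float a, const std::vector<float>& x,
                                std::vector<float> y) {
    assert(x.size() == y.size());  // one work-item per element pair
    for_each_work_item(x.size(),
                       [&](std::size_t i) { saxpy_item(i, a, x, y); });
    return y;
}
```

For example, `saxpy(2.0f, {1, 2, 3}, {10, 20, 30})` yields `{12, 24, 36}`. Note that the kernel contains no loop of its own; the iteration space belongs to the launcher, which is what lets different hardware map it differently.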

SYCL offers a large degree of portability across vendors (e.g., many different sources of GPUs) as well as architecture (e.g., GPUs, FPGAs, ASICs).

SYCL provides a high degree of portability across vendors (e.g., many different GPU sources) and architectures (e.g., GPUs, FPGAs, ASICs).

Elephant 2: Why not just use Nvidia CUDA?

Elephant 2: Why not just use Nvidia CUDA?

A vibrant GPU ecosystem is emerging thanks to competition from multiple GPU vendors. This is part of a trend toward more and more competition for accelerators in general. The installed base of CUDA applications that make use of Nvidia GPUs is poised to adapt over time to an open, multivendor, multiarchitecture software approach created to serve all vendors, not just Nvidia.

Due to competition from multiple GPU vendors, a vibrant GPU ecosystem is emerging. This is part of a trend of increasingly competitive accelerators. The installed base of CUDA applications using Nvidia GPUs will be able to adapt over time to an open, multi-vendor, multi-architecture software approach designed to serve all vendors, not just Nvidia.

While CUDA has earned a strong following given its value proposition and the strength of Nvidia GPUs in the ecosystem, there are increasing concerns regarding the lock-in that use of CUDA creates. Such concerns stem from the proprietary focus highlighted by these factors:

While CUDA has gained a large following due to its value proposition and the strength of Nvidia GPUs in the ecosystem, there are growing concerns about the lock-in that using CUDA creates. These concerns stem from its proprietary focus, highlighted by the following factors:

  1. The definition of CUDA, along with its implementation and evolution, is managed by Nvidia and evolves specifically to serve Nvidia GPU product designs. Details of new features in CUDA are generally shielded from public view until Nvidia has both hardware and software to support them. As discussed more fully below, this control stifles innovations from other vendors.

  2. The licensing for CUDA tools and libraries, from Nvidia, specifically states they must be used to “develop applications only for use in systems with Nvidia GPUs.” Even “open source” from Nvidia includes licensing language restricting key parts in the same manner.

  1. The definition, implementation, and development of CUDA are managed by Nvidia and developed specifically to serve Nvidia GPU product designs. Details of new features in CUDA are generally not made public until Nvidia has the hardware and software to support them. As discussed more fully below, this control inhibits innovation by other vendors.

  2. The license for Nvidia's CUDA tools and libraries specifically states that they must be used "to develop applications for use only in systems with Nvidia GPUs." Even Nvidia's "open source" includes licensing language that restricts key parts in the same way.

Nvidia CUDA can claim credit for bringing accelerated computing to the masses using Nvidia GPUs. With the explosion of competition in the accelerator market, it could appear that CUDA has become a walled garden in an increasingly open and transparent world. The desire for an open, multivendor, multiarchitecture alternative to CUDA is not going away.

Nvidia CUDA is renowned for bringing accelerated computing to the masses using Nvidia GPUs. As competition explodes in the accelerator market, CUDA appears to have become a walled garden in an increasingly open and transparent world. The desire for an open, multi-vendor, multi-architecture alternative to CUDA will not go away.

Elephant 3: Why not just use AMD HIP?

Elephant 3: Why not just use AMD HIP?

AMD Heterogeneous-Computing Interface for Portability (HIP) is a C++ dialect. AMD tools include a “HIPify tool” to help transform CUDA code into HIP. AMD states that “HIP code can run on AMD hardware (through the HCC compiler) or Nvidia hardware (through the NVCC compiler) with no performance loss compared with the original CUDA code.”

AMD Heterogeneous-Computing Interface for Portability (HIP) is a C++ dialect. AMD tools include the "HIPify tool" to help convert CUDA code to HIP. According to AMD, "HIP code can run on AMD hardware (via the HCC compiler) or Nvidia hardware (via the NVCC compiler) without any performance penalty compared to the original CUDA code."

HIP is a “follow CUDA” strategy, i.e., AMD develops an update to HIP as quickly as possible after Nvidia has released an update to its CUDA platform. The arguments in favor of HIP rest on the virtue of reusing a large CUDA codebase for AMD GPUs. Unfortunately, given the opaqueness of CUDA, no one can follow CUDA very closely, promptly, or accurately. This offers no opportunity for AMD to expose unique AMD hardware innovation without forcing CUDA developers to change their code with #ifdefs for AMD GPUs.

HIP is a "follow CUDA" strategy, where AMD develops HIP updates as quickly as possible after Nvidia releases its CUDA platform updates. The argument in favor of HIP rests on the advantage of reusing large CUDA code bases on AMD GPUs. Unfortunately, given CUDA's opaque nature, no one can track it very closely, promptly, or accurately. AMD has no way to expose unique AMD hardware innovations without forcing CUDA developers to change their code with #ifdefs for AMD GPUs.

While AMD has created value with HIP for those that seek AMD GPUs as an alternative to Nvidia GPUs, it is not hard to want more. Imagine having a solution that can keep pace with the feature innovation and performance of CUDA!

We believe that innovation will flourish the most in an open field rather than in the shadows of a walled garden.

[Editor’s note: There is a SYCL implementation called hipSYCL that sits on top of HIP and targets AMD GPUs running ROCm and Nvidia GPUs.]

While AMD has created value through HIP for those looking for AMD GPUs as an alternative to Nvidia GPUs, it's not hard to want more. Imagine having a solution that keeps pace with CUDA's feature innovation and performance! We believe that innovation will thrive in open fields rather than in the shadows of walled gardens.

[Editor's note: There is a SYCL implementation called hipSYCL that sits on top of HIP and targets AMD GPUs running ROCm and Nvidia GPUs.]

Elephant 4: Why not just use OpenCL?

Elephant 4: Why not just use OpenCL?

OpenCL provides an open multivendor alternative, but at a lower layer of the software stack than SYCL or CUDA offers. SYCL grew out of a desire to bring the benefits of OpenCL’s open, multivendor, multiarchitecture approach to C++ by providing a standard C++ interface for heterogeneous parallel architectures. SYCL implementations often build on OpenCL, but as of SYCL 2020 they also have the flexibility to use other backends under the hood. SYCL delivers on the promise of OpenCL, in a higher productivity form through its C++ abstractions.

OpenCL provides an open, multi-vendor alternative, but at a lower level of the software stack than SYCL or CUDA. SYCL was born to bring the benefits of OpenCL's open, multi-vendor, multi-architecture approach to C++ by providing a standard C++ interface to heterogeneous parallel architectures. SYCL implementations are often built on OpenCL, but starting with SYCL 2020 there is also the flexibility to use other backends behind the scenes. SYCL delivers on the promise of OpenCL in a more productive form through its C++ abstractions.

Elephant 5: Can’t we just use C++?

Elephant 5: Can't we just use C++?

Let’s start with the assumption that we want to program heterogeneous machines, we value portability, and we do not want to pay a penalty in performance for portability.

Let's first assume that we want to program heterogeneous machines, we value portability, and we don't want to pay a performance penalty for portability.

We might answer “yes” – C++ is enough when you have SYCL support too. After all, C++ was built to be extended by template libraries like SYCL. SYCL adds no new keywords, but it does benefit from SYCL-aware C++ compilers to help with cross-compilation, fat binaries, and remote memories. Those are simply things C++ compilers have not historically made easy.

We'd probably answer "yes" - C++ is sufficient when you also have SYCL support. After all, C++ is built to be extensible through template libraries such as SYCL. SYCL adds no new keywords, but it does benefit from a SYCL-aware C++ compiler to help with cross-compilation, fat binaries, and remote memory. These are things that C++ compilers have historically not made easy.

SYCL also offers a solution today, built on top of ISO C++, to address programming for full heterogeneous computing. This includes device enumeration (info), defining work (kernels), submitting and coordinating work across devices (queue), and managing remote memories.

Today, SYCL also provides a solution, built on top of ISO C++, to the problem of programming for fully heterogeneous computing. This includes device enumeration (info), defining work (kernels), submitting and coordinating work across devices (queues), and managing remote memory.
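The four responsibilities listed above can be modeled with a toy in standard C++. Everything here is an assumption made for illustration: `toy_device`, `enumerate_devices`, and `toy_queue` are hypothetical names and are not the SYCL classes or their signatures. The sketch only mirrors the shape of the idea: discover devices, describe work as a kernel, submit it through a queue, and move data explicitly because device memory is treated as disjoint from host memory. Unlike a real asynchronous queue, this toy runs submitted tasks in order when `wait()` is called.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the SYCL classes.
struct toy_device {
    std::string name;
    std::vector<int> memory;  // models memory the host does not share
};

// "Device enumeration": report which devices exist.
inline std::vector<toy_device> enumerate_devices() {
    return { toy_device{"toy-cpu", {}}, toy_device{"toy-accelerator", {}} };
}

// A queue bound to one device. Work is recorded on submission and
// executed in order when wait() is called.
class toy_queue {
public:
    explicit toy_queue(toy_device& dev) : dev_(dev) {}

    // Explicit host -> device copy (the host data is snapshotted now).
    void copy_to_device(const std::vector<int>& host) {
        pending_.push_back([this, host] { dev_.memory = host; });
    }

    // Submit a kernel that runs once per element of device memory.
    void parallel_for(std::function<void(int&)> kernel) {
        pending_.push_back([this, kernel] {
            for (int& v : dev_.memory) kernel(v);
        });
    }

    // Explicit device -> host copy; `host` must outlive wait().
    void copy_to_host(std::vector<int>& host) {
        pending_.push_back([this, &host] { host = dev_.memory; });
    }

    // Run everything submitted so far, in submission order.
    void wait() {
        for (auto& task : pending_) task();
        pending_.clear();
    }

private:
    toy_device& dev_;
    std::vector<std::function<void()>> pending_;
};
```

A typical use would pick a device from `enumerate_devices()`, construct a `toy_queue` on it, then call `copy_to_device`, `parallel_for`, `copy_to_host`, and finally `wait()`. The host vector passed to `copy_to_host` stays unchanged until `wait()` returns, which is the property that makes explicit synchronization necessary when memories are disjoint.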

That brings us to “No” – the C++ standard does not define support for heterogeneous systems with disjoint (non-coherent) memories. Some think it will add that one day, and there is effort to go in that direction, but even those involved believe the current direction will take at least 10 years, and it is limited by the need for C++ to maintain backwards compatibility with millions of lines of existing code. In fact, one of us (MW) has written papers urging C++ in that direction. The response from WG21 (ISO C++), understandably because of the backward compatibility concerns, has been to start with parallel algorithms and executors, and add forward progress guarantees instead of making radical surgical changes to the memory and addressing model. Therefore, if you are programming heterogeneous machines, it is not likely to be enough to claim “C++ is enough.” There are some trying to move in that direction, and that is the beauty of a competitive industry: we can see what will work out in the best interest of the market and consumers. However, today what will work immediately is “C++ plus SYCL” or “C++ plus CUDA” or “C++ plus OpenCL.”

This brings us to the answer "no" - the C++ standard does not define support for heterogeneous systems with disjoint (non-coherent) memory. Some people think this will be added one day, and are working in that direction, but even those involved think the current direction is at least 10 years away, and that it is constrained by the need for C++ to remain backwards compatible with millions of lines of existing code. In fact, one of us (MW) has written papers urging C++ to move in this direction. For reasons of backward compatibility, WG21 (ISO C++) responded by starting with parallel algorithms and executors and adding forward progress guarantees, rather than making radical surgical changes to the memory and addressing models. So if you're programming heterogeneous machines, "C++ is enough" is unlikely to hold. There are people trying to move in this direction, and that's the beauty of a competitive industry: we can see what works out in the best interest of the market and consumers. However, what works immediately today is "C++ plus SYCL" or "C++ plus CUDA" or "C++ plus OpenCL".

The purpose of adding SYCL support to our C++ compilers and runtimes is to give C++ the full heterogeneous support that it does not offer today without SYCL. It is also a way to show how C++ can support heterogeneity in the future, as ISO standards tend to standardize best practices of pre-existing knowledge. We will show one such example below.

The purpose of adding SYCL support to our C++ compiler and runtime is to give C++ the complete heterogeneous support that it currently cannot provide without SYCL. It's also a way to show how C++ can support heterogeneity in the future, as ISO standards tend to standardize best practices based on existing knowledge. Below we will show an example of this.

Elephant 6: Can SYCL queues make it into ISO C++?

Elephant 6: Can the SYCL queue go into ISO C++?

Queues are how SYCL assigns work to heterogeneous devices, including handing off data within complex memory systems (not necessarily unified and coherent).

Queues are SYCL's way of assigning work to heterogeneous devices, including handing off data within complex memory systems that are not necessarily unified and coherent.

It is easy to speculate on whether a queue class belongs in C++ long-term, but such speculation is premature.

In the long run, it is easy to speculate whether a queue class belongs in C++, but such speculation is premature.

Proposals for C++23 have included various constructs to direct execution to specific devices, including “std::execution” in p2300. We know C++23 will continue to rely on a unified global memory address space and will not support disjoint remote memories (complex memory systems).

Proposals for C++23 have included various constructs to direct execution to specific devices, including "std::execution" in P2300. We know that C++23 will continue to rely on a unified global memory address space and will not support disjoint remote memories (complex memory systems).

It is easy to get caught up on syntax. Eventually, if C++ expands to include full heterogeneous support, the concepts embodied in SYCL queue will be needed. Until then, SYCL fills this void. Some important capabilities, such as parallel directives, and message passing, have remained independent standards (OpenMP and MPI). While it is possible C++ will not grow to include full heterogeneous support, we believe C++ will eventually add such support incrementally.

It's easy to get caught up on syntax. Eventually, if C++ is extended to include complete heterogeneous support, the concepts embodied in the SYCL queue will be required. Until then, SYCL fills the gap. Some important capabilities, such as parallel directives and message passing, have remained separate standards (OpenMP and MPI). Although C++ may not evolve to include complete heterogeneous support, we believe that C++ will eventually add such support incrementally.

C++ aims to standardize established best practice instead of inventing new and unproven features. SYCL is therefore an important stepping stone, as one of the many feeders of ‘established best practice’ into the intentionally slower moving C++ standardization process.

The goal of C++ is to standardize established best practices rather than invent new and unproven features, so SYCL is an important stepping stone, as one of many feeders of "established best practices" into the intentionally slow-moving C++ standardization process.

As C++23 settles, and C++26 is considered, the future of C++ for heterogeneous computing will begin to take shape, including syntax, but a full solution will likely not emerge for another 5-10 years.

As C++23 stabilizes and C++26 is considered, the future of heterogeneous computing in C++ will begin to take shape, including syntax, but a complete solution may not appear for another 5-10 years.

SYCL offers a solution today, within standard C++, to address programming for full heterogeneous computing. This includes device enumeration (info), defining work (kernels), submitting work to devices (queue), and managing remote memories.

SYCL now provides a solution in standard C++ to the programming problem of completely heterogeneous computing. This includes device enumeration (information), defining work (kernel), submitting work to the device (queue), and managing remote memory.

Elephant 7: Who is behind SYCL? Is it really Open in the true sense of the word?

Elephant 7: Who is behind SYCL? Is it really open in the true sense of the word?

We believe that open, international standards and Open Source Software (OSS) projects are good for everyone. When individuals from Intel and Codeplay get involved, we have found that they work hard to help develop and promote such standards and OSS – from WiFi, USB, PCIe to OpenMP, MPI, Fortran, C, C++, OpenCL, and SYCL.

We believe that open international standards and open source software (OSS) projects are good for everyone. When individuals from Intel and Codeplay get involved, we have found that they work hard to help develop and promote such standards and OSS, from WiFi, USB, and PCIe to OpenMP, MPI, Fortran, C, C++, OpenCL, and SYCL.

Apple was the original force behind OpenCL, which began as a set of C interfaces at a fairly low level. SYCL originally grew out of efforts within OpenCL to consider higher level interfaces, specifically using C++. After multiple years of very open debates, SYCL was born. Codeplay has been instrumental in SYCL from the very beginning. Intel’s interest in SYCL grew after entering both the FPGA market and announcing the Intel Xe architecture to include GPUs for compute. Intel is proud to be an active member in the SYCL committee, and an active contributor to implementations to support SYCL. SYCL is a community effort, and the homes of both authors of this article (Intel and Codeplay) are enthusiastic participants along with many others.

Apple was the original force behind OpenCL, which began as a set of fairly low-level C interfaces. SYCL originated from efforts within OpenCL to consider higher-level interfaces, specifically using C++. After years of very open debate, SYCL was born. Codeplay has played an important role in SYCL from the beginning. Intel's interest in SYCL grew after it entered the FPGA market and announced that the Intel Xe architecture would include GPUs for compute. Intel is proud to be an active member of the SYCL committee and an active contributor to implementations that support SYCL. SYCL is a community effort, and the employers of both authors of this article (Intel and Codeplay), along with many others, are enthusiastic participants.

Elephant 8: I see a herd of elephants – why should I believe in SYCL?

Elephant 8: I see a herd of elephants - why should I believe in SYCL?

If you have not yet needed to program an application for multiple heterogeneous machines, you may not yet have felt the pain that explains why we are so excited about the prospects for SYCL. Questioning the need is quite logical.

If you haven't had to write applications for multiple heterogeneous machines, then you probably haven't really understood why we're so excited about the prospect of SYCL. It is very logical to question this need.

There are many use cases for heterogeneous programming. In our CPPCON 2021 tutorial, we taught programmers from large companies, small companies, and national labs, how to offload high throughput workloads to specialized accelerators.

There are many use cases for heterogeneous programming. In our CPPCON 2021 tutorial, we taught programmers from large companies, small companies, and national labs how to offload high-throughput workloads to specialized accelerators.

Based on many experiences like that, we have every reason to be confident that interest in SYCL will continue to grow at a rapid pace because of the need for C++ programming for heterogeneous platforms.

Based on many similar experiences, we have good reason to believe that interest in SYCL will continue to grow rapidly due to the demand for C++ programming on heterogeneous platforms.

If you believe in the power of diversity of hardware and want to harness the impending explosion in architectural diversity, then SYCL is worth a look. Not only is it an open, multivendor, multiarchitecture play, but it is the key one for C++ programmers (as detailed in “Considering a Heterogeneous Future for C++”).

If you believe in the power of hardware diversity and want to take advantage of the coming explosion of architectural diversity, SYCL is worth a look. Not only is it an open, multi-vendor, multi-architecture play, but it is also the key one for C++ programmers (see "Considering a Heterogeneous Future for C++" for details).

Open, Industry Standards are Critical to Enable High-Volume Markets

Open industry standards are critical to enabling high-volume markets

New technology often starts as proprietary developments, which may be sufficient to enable niche applications and markets. But, as these niche applications grow into technology ecosystems, so does the need for competition and industry standardization to enable widespread adoption. Accelerated computing, for many years only a niche capability, has certainly emerged with the status of “here to stay.” Multiple factors contributed to this, and they are not all going away (power wall, IPC wall, memory wall).

New technologies often begin as proprietary developments, which may be sufficient to enable niche applications and markets. However, as these niche applications grow into technology ecosystems, the need for competition and industry standardization grows as well, in order to achieve widespread adoption. Accelerated computing was a niche capability for years, but it has certainly emerged as "here to stay." Many factors contributed to this, and they are not all going away (power wall, IPC wall, memory wall).

SYCL and related efforts, like oneAPI, were introduced to bring open, industry standards to the historically proprietary universe of accelerated computing.

SYCL and related efforts such as oneAPI were launched to bring open industry standards into the historically proprietary world of accelerated computing.

The biggest question is: how many influencers are eager to promote a move to standards, vs. how many are locked up by proprietary interests?

The big question is: How many influencers are eager to push standards forward, and how many are hamstrung by proprietary interests?

As the Cambrian explosion of novel computer architectures unfolds, the case for open, multivendor, multiarchitecture standards only grows stronger.

As the explosion of new computer architectures unfolds, the need for open, multi-vendor, multi-architecture standards will only become stronger.

SYCL is an open standard that invites feedback and contributions from everyone to the standard and the open source projects to implement it. The shared goal by everyone involved is to unambiguously ensure paths to high performance for all accelerators in this exciting new golden age for computer architecture.

SYCL is an open standard that invites everyone to provide feedback and contributions to the standard and to the open source projects that implement it. The common goal of all participants is to clearly ensure that all accelerators achieve high performance in this exciting new golden age of computer architecture.

About the Authors

James Reinders believes the full benefits of the evolution to full heterogeneous computing will be best realized with an open, multivendor, multiarchitecture approach. Reinders rejoined Intel a year ago, specifically because he believes Intel can meaningfully help realize this open future. Reinders is an author (or co-author and/or editor) of ten technical books related to parallel programming; his latest book is about SYCL (it can be freely downloaded here). 

Michael Wong is the Distinguished Engineer at Codeplay Software. He is a current Director and VP of ISOCPP Foundation, and a senior member of the C++ Standards Committee with more than 25 years of experience. He is a member of the C++ Directions Group. He chairs the WG21 SG19 Machine Learning and SG14 Games Development/Low Latency/Financials C++ groups and is the co-author of a number of C++/OpenMP/Transactional memory features including generalized attributes, user-defined literals, inheriting constructors, weakly ordered memory models, and explicit conversion operators. He has published numerous research papers and is the author of a book on C++11. He has been an invited speaker and keynote at numerous conferences. He is currently the editor of SG1 Concurrency TS and SG5 Transactional Memory TS. He is also the Chair of the SYCL standard and all Programming Languages for Standards Council of Canada. Previously, he was CEO of OpenMP involved with taking OpenMP toward Accelerator support and the Technical Strategy Architect responsible for moving IBM’s compilers to Clang/LLVM after leading IBM’s XL C++ compiler team.

[1] Elephants in the Room can be defined as important questions that are obvious, but no one mentions them because they make at least some persons uncomfortable.

You’ve all seen this, why don’t we chat a little bit!

  1. From the perspective of (domestic) chip companies, we do not want to consider that users may need to write applications facing multiple heterogeneous machines. But this is what the market needs, and this kind of revolutionary idea will only come from a third party.

  2. I know that Codeplay was fully acquired by Intel this year. But is there room for such a company to survive in China? Companies like Pengfeng Technology and First-Class Technology that engage in basic software research and development have been rare in China in recent years. If even they cannot survive, what hope is there for China's computing industry? I also hope that investors will not distort such small and beautiful software companies, but help them, so that everyone can succeed together.


Origin blog.csdn.net/weixin_45571628/article/details/132429658