Chris Lattner, the father of LLVM: My philosophy on building AI infrastructure software


Source | Latent Space
Compilation | OneFlow

Translation | Yang Ting, Wan Zilin

If AI is so important, why is software so bad?

Before founding Modular, this was a question that had long troubled Chris Lattner, the father of LLVM. He teamed up with Tim Davis, a former TensorFlow product colleague, to tackle the problem of large, monolithic, fragmented platforms in AI development, and in 2022 they launched Modular with US$30 million in seed funding. Following the launch of the Modular AI engine and the Mojo programming language in May this year, they recently raised US$100 million in Series A financing.

While Mojo has drawn attention as a compiled, multithreaded superset of Python with impressive performance, it is in some sense a side project; the vision behind Modular's AI inference engine is just as ambitious.

Chris's achievements in compilers are outstanding. He developed LLVM during his PhD, for which he received the 2012 ACM Software System Award. He then joined Apple, where he created Clang and Swift (the iPhone programming language that replaced Objective-C). After that, he led the TensorFlow infrastructure team at Google and developed the XLA compiler and the MLIR compiler framework. His task there was not to build the compiler stack best suited to AI in general, but to build the best compiler for the TPU, so that all TensorFlow users could have a great experience on Google Cloud.

In his view, Meta's PyTorch team is not improving AI performance across the board either, but mainly serving Meta's recommendation and advertising systems. Chris and Tim realized that the big tech companies were not prioritizing AI engines and developer experience, so they decided that Modular was the best way to deliver the AI development platform of the future.

Although Chris is recognized as one of the industry's top compiler engineers, he did not simply apply a compiler-centric approach to Python; he chose a different path entirely.

Modular's initial goal is to build a "unified AI engine" to accelerate AI development and inference. Unlike the GPU-everything view of AI, where only the "GPU-rich" can benefit, Modular treats AI as a large-scale, heterogeneous, parallel computing problem. The team wants AI to run in a broader, more inclusive way: not relying on GPUs alone, but drawing on many kinds of compute to meet the needs of different applications, environments, and workloads.


As a standalone AI engine, Modular can run models from other frameworks on the CPU, speeding them up by 10% to 650% over their original performance (GPU support is coming soon).

Mojo, part of Modular, is a new AI programming language and a superset of Python; it was originally created as an internal DSL to make the team more productive.


Mojo adopts Python's syntax and supports existing PyPI packages, so developers do not need to change their code when moving from Python to Mojo. In terms of performance, Mojo runs up to roughly 68,000 times faster than Python on some benchmarks, and most Python code bases can run 10 to 100 times faster after switching to Mojo with only minor adjustments.

The Modular team is undertaking a difficult technical challenge. In the latest episode of Latent Space’s podcast, Chris Lattner explains why he developed Modular and Mojo, and shares what he sees as the challenges and future of AI infrastructure software development.

(This article is compiled and published by OneFlow with authorization. Please contact us for authorization for reprinting. Original text: https://www.latent.space/p/modular#details)

1

Unified AI engine platform

Alessio: Modular's goal is to build a unified AI engine, which implies that the current AI engine landscape is fragmented. Starting from the source, what problems in AI research and development need to be solved?

Chris: The progress of AI is obvious to everyone. Looking back at 2015 to 2017, for those following the field, the progress was truly amazing. The technology of that era was driven mainly by TensorFlow and PyTorch; PyTorch came out slightly later, but their designs were similar in some respects.

The challenge is that the people who built these systems were focused on AI, research, differential equations, automatic differentiation, and so on, and did not set out to solve the hardware/software boundary problem. What they decided was that there needed to be a way for people to stack up layers, something like Keras or PyTorch's nn.Module.

Under the hood, components called operators do the work, so you get different kinds of operations: convolutions, matrix multiplications, reductions, element-wise operations. How do you implement those operations? They can be built with CUDA or with Intel's math libraries such as MKL.

This looks fine at first, but even when a new Intel CPU model launches, you get only a very limited set of operators for it, while TensorFlow and PyTorch now have thousands of operators. Each operator needs a corresponding kernel, usually written by hand, so every new piece of hardware means reimplementing thousands of kernels, which makes it very difficult to break into the hardware field.
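
To make that scaling problem concrete, here is a minimal, hypothetical sketch of how a framework might dispatch operators to hand-written, hardware-specific kernels. It is not taken from any real framework; the registry, operator names, and hardware labels are made up for illustration. The point is that every (operator, hardware) pair needs its own entry, so thousands of operators multiplied by each new chip means an enormous amount of kernel work.

```python
import numpy as np

# Hypothetical kernel registry: every (operator, hardware) pair needs its own
# hand-written implementation.
KERNEL_REGISTRY = {
    ("matmul", "cpu_x86"): lambda a, b: np.dot(a, b),        # e.g. backed by a vendor library
    ("relu",   "cpu_x86"): lambda x: np.maximum(x, 0.0),
    # ("matmul", "new_accelerator"): ...  # must be rewritten for every new chip
}

def dispatch(op: str, hardware: str, *args):
    """Look up the hand-written kernel for this op on this hardware."""
    kernel = KERNEL_REGISTRY.get((op, hardware))
    if kernel is None:
        raise NotImplementedError(f"No kernel for {op!r} on {hardware!r}")
    return kernel(*args)

a = np.random.rand(4, 8)
b = np.random.rand(8, 2)
out = dispatch("relu", "cpu_x86", dispatch("matmul", "cpu_x86", a, b))
print(out.shape)  # (4, 2)
```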

On the other hand, only a few people know how these kernels work, and the skills required to master them are so different from those required to innovate model architectures that research is difficult.

When I was involved in the Google TPU effort, the initial goal was largely to inspire and drive research. Before I joined, someone had already proposed a novel idea: instead of hand-writing thousands of kernels and rewriting all the operators the way Intel or NVIDIA had, use a compiler.

In contrast, compilers are more scalable and more general, and can combine kernels in different ways. There are also important optimizations, including kernel fusion, which can dramatically reduce memory traffic. With traditional handwritten kernels you only get the specific combinations someone already found interesting, not the new things researchers want to study next, and a lot of research is precisely about exploring new territory.
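
As a rough illustration of why fusion reduces memory traffic (a plain NumPy/Python sketch, not Modular's compiler output): the unfused version runs two separate kernels with a full-size temporary written to and read back from memory in between, while the fused version makes a single pass with no intermediate buffer.

```python
import numpy as np

x = np.random.rand(100_000).astype(np.float32)

# Unfused: two separate "kernels". The first writes a full-size temporary to
# memory; the second reads it back, roughly doubling the memory traffic.
def scale_then_relu_unfused(x):
    tmp = x * 2.0                # kernel 1: materializes a temporary array
    return np.maximum(tmp, 0.0)  # kernel 2: reads the temporary again

# Fused: a single pass over the data with no intermediate buffer. A fusing
# compiler emits native code shaped like this loop automatically, and for
# arbitrary operator combinations, not just ones someone hand-wrote.
def scale_then_relu_fused(x):
    out = np.empty_like(x)
    for i in range(x.size):
        out[i] = max(x[i] * 2.0, 0.0)
    return out

assert np.allclose(scale_then_relu_unfused(x), scale_then_relu_fused(x))
```

In Python the explicit loop is of course slow; the point is the shape of the generated code, not this particular implementation.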

So we bet on compilers and developed the XLA system, which is part of the Google stack that makes exascale supercomputers possible, and a lot of amazing work has been done on top of that. But the biggest problem is that it was originally developed to support hardware like the Google TPU.

The catch is that you now need to hire compiler engineers, and compiler engineers who also understand machine learning and all these other areas are even rarer. From a technical perspective, if you work at Google and have access to the hardware you can make it scale; otherwise it's a real challenge.

That's one reason I particularly love the NVIDIA platform: despite all the complaints about CUDA, if you look back at the moment AI took off, deep learning and AlexNet, many people attribute it to two factors, data (ImageNet) and compute (powerful GPUs), which together made AlexNet possible.

However, they often overlook the third factor: programmability. It was CUDA that let researchers write convolution kernels that had never existed before, at a time when there were no frameworks like TensorFlow and none of today's rich tooling. It is really the trinity of data, compute, and programmability that opened a new research paradigm and set off the whole wave of deep learning systems.

So we have to draw on that history and think, in a modular way, about how to move into the next era. How do we preserve the amazing advantages people have in algorithmic innovation, new ideas, sparsity, and the fringe research directions that might turn out to matter? How do we take full advantage of compilers, which are broadly applicable, highly extensible, and able to handle new problems? And finally, how do we take full advantage of programmability and bring all of these things together?

Alessio: I remember you mentioned that people who are unfamiliar with certain areas and can't contribute get left behind, and that CUDA did a great job on this front. What are your thoughts on AI development in a post-Modular world?

Chris: My belief is that humans are amazing, but none of us can hold everything in our heads. People have different strengths, and if we can get people who understand different parts of these architectures to work together, they can create results that exceed what any individual could and reach new breakthroughs.

I think the key questions are: how do you create a virtuous cycle, how do you drive innovation, and how do you get more people from different fields who understand the problem to actually collaborate? That's why our work on Mojo and the AI engine is so important: we are genuinely committed to taking the complexity out of this problem.

Many systems have been built that are simply the sum of useful tools someone needed to solve a problem at the time; they are not well designed from top to bottom. I believe Modular provides a simpler, more orthogonal, more consistent, and more principled stack that lets us remove complexity from the whole stack. If, instead, you build on all this fragmented historical infrastructure, you are just hoping for the best.

Many AI research systems in particular have a preset happy path: they only work if you follow the demo steps exactly. Change anything even slightly and things break, performance collapses, or they stop working altogether. That is a product of the fragmentation underneath.

Swyx: So compilers and languages are the medium through which humans can collaborate and cross those boundaries.

Chris: I've been working on compilers for a long time, but we always start with the problem. If compilers or compiler technology help solve the problem, then we use them; we don't put the cart before the horse.

You might ask: what is a compiler actually for? What is its fundamental purpose? The point is that you shouldn't need to know too much about the hardware. You could write very low-level assembly for each problem, but the real role of a compiler, a programming language, or an AI framework is to let you express things at a higher level of abstraction, and that abstraction serves several purposes.

One purpose is to make everything easier. Another is that if you can eliminate a lot of extraneous complexity, you make room for new complexity: it's really about removing non-essential complexity so you can focus on the essential complexity of the problem. A further purpose is to open up new possibilities through abstraction, especially with modern compilers like the ones we're building, which pay essentially unlimited attention to detail.

Humans don't have that kind of attention to detail. If you hand-write a pile of assembly and then hit a similar problem, you make a few tweaks to what you have rather than redoing the analysis from first principles, and in that regime a compiler can actually do a better job than a human.

Higher levels of abstraction also give you other capabilities. What's exciting about deep learning systems, and systems like the one Modular is building, is that they have lifted computation to the level of graphs. Once you separate the computation from messy for loops and semicolon-ridden code and express it in a more declarative form, it becomes something you can transform.

Many people don't realize this yet, partly because it's very hard to do within existing systems. A large part of what the abstraction buys you is the ability to apply transformations like pmap and vmap, that is, to transform the computation itself. One of the lessons from my time at Google is that you can build this up from the bottom, from single-node GPU machines to clusters to asynchronous programming, with a lot of foundational work along the way.
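
For readers who haven't seen these transforms, here is a small example using JAX, one system that exposes vmap and pmap (this is an illustration, not Modular's API): once the computation is written declaratively as a plain function, vmap mechanically rewrites it to operate over a batch, and jit compiles the transformed version, without the author rewriting any loops.

```python
import jax
import jax.numpy as jnp

def predict(w, x):
    # A simple per-example computation: a linear layer followed by a ReLU.
    return jnp.maximum(w @ x, 0.0)

w = jnp.ones((3, 4))
xs = jnp.ones((8, 4))  # a batch of 8 examples

# vmap turns the single-example function into a batched one; jit then
# compiles the transformed computation.
batched_predict = jax.jit(jax.vmap(predict, in_axes=(None, 0)))
print(batched_predict(w, xs).shape)  # (8, 3)
```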

By the time I left Google, we were doing research and training on supercomputers running over 100 billion operations per second, from Jupyter notebooks. That is a huge technological leap, made possible by a lot of cleanly layered and well-designed systems, a lot of innovative HPC-style hardware, and a series of breakthroughs. I'd like to see this technology adopted, promoted, and made available far more widely, while also resolving the complexity that accumulated along the way.

Alessio: You mentioned the relationship between the framework layer and the hardware layer. When your AI engine was first announced, people asked how many petaflops it could achieve on an A100, but your website currently only publishes impressive performance numbers for CPUs.

Chris: That's because we work from first principles. You have to build everything up from the bottom, and if you do it right, you shouldn't skip important steps. Many people assume today's AI systems are all about GPUs and argue about GPU performance, but in my view AI is really a large-scale, heterogeneous, parallel computing problem.

An AI workload usually starts with data loading, and the GPU doesn't load data. You have to do data loading, preprocessing, network transfers, and so on, then a lot of matrix multiplications and other computation, then post-processing, and then send the results over the network or write them to disk. And it turns out you need CPUs to drive the GPUs.

At the same time, when you build software for accelerators, you find you are only solving part of the problem, because they only cover the matrix multiplications, or whatever part they consider important, so you end up with a system that can only do the fraction of the work the chip can do.

You never have time to get to the real, larger problem, so with tools like TensorFlow or PyTorch you find that the CPU-side work largely runs through Python, things like tf.data. Those pieces are hard to program and hard to scale, in many cases very slow, difficult to distribute, and very confusing.
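
As an example of the kind of CPU-side work being described, here is a typical tf.data input pipeline (the file pattern and the parse function are placeholders, not from this interview): all of the listing, decoding, shuffling, and batching runs on the host CPU before anything ever reaches an accelerator.

```python
import tensorflow as tf

def parse_example(path):
    # Placeholder preprocessing: read a file and decode it into a tensor.
    image = tf.io.decode_jpeg(tf.io.read_file(path))
    return tf.image.resize(image, [224, 224]) / 255.0

# Loading, decoding, shuffling, and batching all run on the host CPU;
# only the resulting batches are handed to the accelerator.
dataset = (
    tf.data.Dataset.list_files("data/*.jpg")  # hypothetical file pattern
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(buffer_size=1024)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

for batch in dataset.take(1):
    print(batch.shape)  # e.g. (32, 224, 224, 3) for RGB JPEGs
```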

Meanwhile, today's CPUs already have tensor cores of their own; they just go by funny names, like the AMX instructions. CPUs and GPUs used to be completely different, but over time GPUs have become more programmable and more CPU-like while CPUs have become more parallel, and we are in the middle of that diversification of the technology.

When we started Modular, looking at the problem from a technical perspective, it made sense to build a general architecture, because a general architecture can be specialized.

As I learned working on XLA and some other stacks, it is very hard to start with a specialized architecture and then generalize it. The reality is also that different people spend money on different parts of AI and invest in different areas: training scales with the size of your research team, while inference scales with the size of your product and user base. A lot of inference is therefore still run on CPUs today, so we started with the CPU and worked on improving its architecture.

CPUs are also easier to get hold of, they don't go out of stock, and they are easier to work with for a variety of other reasons. We demonstrated that it is possible to build a general architecture that scales across different accelerator families. We also showed that we can handle different processor types such as Intel, AMD, and ARM-based Graviton, and even support the unusual configurations found inside Intel CPUs.

We can beat vendor-provided software by using a more general, more flexible programming approach that handles different vector lengths and other complications. We are now doing targeted work on GPUs, and you benefit from a well-thought-out, cleanly layered stack with the right DNA, so over time we will expand to different kinds of accelerators.

2
The Challenge of Building an AI Engine

Alessio: CPUs have been getting a lot of attention, which is why you see projects like llama.cpp. Most people approach this by quantizing the model so it can run on a CPU, but your idea is more radical: you redesigned the entire engine. How does your approach differ, and what were the challenges of actually building the engine?

Chris: One characteristic of Modular is that we take the harder road to get a better result. Many people on our team have worked on all kinds of systems; some of the best minds in the industry built tools and technologies such as ONNX Runtime. The challenge with those systems is that many of them were designed five to eight years ago, and as a system evolves it becomes increasingly difficult to make fundamental changes. So we decided to start from scratch, which is definitely harder, but the team and I love building new things.

In fact, TensorFlow, PyTorch, and all these systems still use essentially the same thread pool as Caffe. It is widely regarded as a huge liability that causes serious performance problems and makes inference latency extremely unpredictable, and it comes down to a series of very specific design choices that make the thread pool blocking and synchronous.

Once a mistake like that is baked into the underlying architecture, it can't be undone. Our thread pool therefore assumes that no task may block, which requires very lightweight threads, and that decision ripples through everything built on top of it. From there we moved on to how to express kernels: we still wanted to be able to write kernels by hand, starting with prototypes in C++ and then moving to Mojo.

So we built a very sophisticated automatic fusion compiler that uses all the latest techniques and goes beyond existing ones, because we know users hate the limitations of static shapes and the lack of programmability. They don't want to be limited to plain dense tensors either; many LLMs, for example, use ragged tensors and similar structures.
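
To illustrate what "ragged" means here (a plain NumPy sketch, not Modular's internal representation): variable-length sequences cannot be stacked into one statically shaped tensor without padding, which is exactly the kind of case a static-shape-only compiler struggles with.

```python
import numpy as np

# A batch of token sequences with different lengths -- common in LLM serving.
sequences = [
    np.array([101, 2009, 2003, 102]),
    np.array([101, 7592, 102]),
    np.array([101, 2023, 2003, 1037, 2204, 102]),
]

# A static-shape view forces padding every sequence to the longest length,
# wasting memory and compute on padding tokens.
max_len = max(len(s) for s in sequences)
padded = np.zeros((len(sequences), max_len), dtype=np.int64)
for i, s in enumerate(sequences):
    padded[i, : len(s)] = s

print(padded.shape)                  # (3, 6): 18 slots for 13 real tokens
print([len(s) for s in sequences])   # the ragged lengths an engine should handle natively
```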

So what you hope to do, from first principles, is gather up all the pain you've experienced in other systems, the things you never had the chance to fix because of schedules and constraints, and this time build the right thing so it can scale. That's the approach we've taken.

So a lot of it is pretty routine, but it's very in-depth design engineering that requires a real understanding of the second- and third-order impacts of every decision. Fortunately, much of this is no longer in the research stage.

Swyx: You're very focused on first principles thinking, but I think your principles are different than most people's. These insights are the result of your extensive work on AI over the years. What are they specifically?

Chris: I'm not sure I have a specific set of principles. A lot of our work is about unlocking the potential of the hardware and doing it in a way that is very easy to use. So our starting point is often not to enable a new feature but to remove complexity that people struggle with, and the goal follows from that.

So it isn't research; it's design and engineering. That said, we also want to get the most out of any given piece of hardware. If you talk to an LLM company that has spent $200 million on A100 GPUs with a particular memory size, what they want is to extract the most value from those chips, not a lowest-common-denominator solution.

On the other hand, many people want portability, generality, and abstraction. So the challenge is: how do you design a system that gives you abstraction by default without giving up full performance? Many machine learning compiler systems have effectively given up on performance; they just try to cover some slice of the feature space. Having a system designed for both is very important to what we're doing.

On other fronts, it's important to empathize with users. Many people who are obsessed with technology ignore the huge difference between the people who use technology and the people who build it. In developer tools, it is important to understand that the developers using the tools do not want to know too much about the technology.

The very interesting thing about working with compilers is that no one wants to care about the compiler. You might be developing a Mojo application or a C application, or some other kind of application, but you just want the compiler to not get in your way, or to tell you when something goes wrong. You only care about it if the compiler becomes too slow, breaks down, or has other problems.

The same goes for AI technology. How much of a user's time building and deploying a model is spent fighting the tools because they hit some corner case that knocks them off the happy path and then get a pile of crazy Python stack traces from deep inside some tool? Empathy for users matters, especially while AI infrastructure is still immature, and empathy has never been a core value among tool builders.

3
Mojo's birth

Alessio: By far the easiest part to understand is Mojo, a superset of Python. I'm sure this wasn't done on a whim; what limitations were you running into?

Chris: When we started Modular, we had no intention of building a programming language; we created one because we had to in order to solve the problem. We started with the very basic pieces, like the thread pool.

Then: how do we integrate with existing TensorFlow and PyTorch systems? That turned out to be technically very complex. From there we got into a deeper layer, pushing hardware acceleration. We decided to build a very specialized, low-level compiler technology, with automatic fusion and other capabilities, designed for the cloud, because after all there is more than one computer in the world.

Humans are good at algorithms, but given the complexity of the hardware, human attention span doesn't get you all the way, and that imposes certain requirements. So we built this pure compiler technology and verified that it could generate very high-performance kernels.

At the next stage we were writing this very low-level MLIR by hand. We were happy with the results, but the team hated writing it manually, so we needed syntax. One option was to create a domain-specific language (DSL), something like Halide, among other possibilities.

Building a full programming language is the harder path, but it gives a better result. There were similar tools such as Halide or OpenAI's Triton, and while the demos were great, the problem was that their debuggers were terrible, the tools weren't easy to use, and usually the people best at using them were the people who created them.

Therefore, we decided to build a complete programming language. I've built Swift, so I know what to do, but I also understand that it's a daunting task.

Obviously, the machine learning community is centered on Python. As we dug into the decision, we found that there are many Python-like languages that never achieved wide adoption, that they have serious problems, and that they can split the community, among other consequences. So we set out to build a language that would take time to get right and would eventually be a superset of Python.

In fact, Python's syntax is not the most important thing; the key is the community, the accumulated experience and skills of its programmers. So building a language that merely looks like Python but isn't really Python was never the goal. Our goal is to achieve a higher-quality result by building Mojo properly, even if that takes longer.

Today, the most important Mojo developers are at Modular itself. When you build a language, actually using it yourself matters enormously. This was a mistake we made with Swift: we built Swift to solve the problem that people disliked Objective-C syntax, but before we launched, it had no internal users.

Mojo is a lighter-weight language than Swift, and we really do use it: it powers all the kernels in the engine. We thought it would be valuable and appealing to others as well, so we released it as a standalone project.

Mojo will mature over time, and we hope to build a large community around it. To get there, we will open-source Mojo. That is a very significant step, and once we decide to do it, we want to do it as well as we can.

Alessio: We went through a chaotic transition from Python 2 to Python 3, a period nobody wants to relive. Some of Mojo's features are great, so why not simply add them to Python itself? How do you plan to keep the two languages in sync over the long term, and how do they relate to each other?

Chris: Guido knew about Mojo before it was released, talked a lot with our team, and occasionally asks us questions on Discord.

Guido van Rossum (the father of Python) likes to poke fun at us, and of course, that's a good thing. Mojo is a member of the Python family, along with PyPy, Cython, and others. We want to be part of the Python family and want Python to continue to grow and add new features.

So will Mojo. It's a bit like going back to when the C language was born: then, around 1983, C++ appeared as "C with classes." Python already has classes, of course; what Mojo adds is all the things you would normally do in C. Initially, C and C++ were two different communities, but they exchanged a great deal and influenced each other, so each picked up some characteristics of the other.

I hope the same convergence happens between Python and Mojo. Python 3 and Mojo are like C and C++: Python 3 is defined by its runtime and a specific object model, while Mojo is defined by the richer expressiveness enabled by things like our fancy MLIR compiler. My hope is that Mojo becomes a superset of Python, more powerful in every dimension, while the two evolve in parallel. The goal is not just to add general-purpose features like a "walrus operator 2.0," but to add systems programming features.

4

Why Start Modular

Swyx: Modular was inspired by SiFive (a company that creates free and open-source processor architectures). You had conversations with some of the large cloud providers, but they weren't interested in something like Modular. Did you conclude in the end that building it inside a cloud platform wasn't the best choice? How did you come up with the idea of starting this company?

Chris: Around 2016 I started getting involved in AI across several settings. At Tesla I worked on autonomous driving; at Google I was responsible for a hardware project and tried to improve TensorFlow's architecture. At the time I was frustrated with Google and its lack of emphasis on PyTorch, so I left and joined a hardware startup. But 2020 (right before the pandemic) wasn't the right time, because everything was so uncertain. PyTorch was still figuring things out, and they had a lot of ambitious projects.

At the time I thought Meta would solve those problems, so I joined a hardware startup to gain knowledge and experience in business strategy, commercialization, and company building. One of our software people at the time, Tim Davis (now co-founder of Modular), was also working out his own path. Tim and I had both joined Google Brain in 2017 and worked closely together: I worked on the data center TPU and he worked on the mobile side (Android and so on); I was on the engineering side and Tim on the product side. We complemented each other very well. Later we began thinking about the next step in our careers, and in mid-2021 we saw that the AI infrastructure problem still existed and had not been effectively solved.

So, we started to analyze the problem in the field of AI. Simply put, if AI is so important to the world (even before ChatGPT), why is all the software so bad? Why is model deployment so difficult? Although we have done a lot of work to simplify the training process of the model, it is still very challenging to put it into production.

We distilled these questions and concluded that the world's AI software falls into three categories. The first is hardware-specific software, such as CUDA, the XLA stack, or Apple's neural engine (CoreML). That isn't the hardware teams' fault: there is no off-the-shelf solution, so they have to build a vertical software stack for their hardware. But the result is that, unintentionally, it splinters the software world and fragments the ecosystem.

The second category is framework software: TensorFlow, PyTorch, TVM, and so on, which have been around for roughly eight years. The frameworks grew out of research, were created in different eras of machine learning, and evolved over time as new hardware and use cases appeared, but they were never designed as a whole with the current landscape in mind. And because AI is so important to the companies behind them, those companies pour significant resources into them; there are plenty of framework developers, but no single clear vision.

Hindsight is easy; predicting what AI will look like in five years is much harder. But we will keep running into well-known problems. Take PyTorch: deployment is quite difficult, it doesn't work well on much non-NVIDIA hardware, and it has trouble scaling to large language models. These are well-known problems, but fixing them at the root is very hard. The engineers on PyTorch are working very hard on them, but because of the environment they are in, the fixes are genuinely difficult.

Hardware drives software fragmentation, while frameworks stay tied to their original architectures. Many people then want to simplify AI, hence ideas like MLOps, and a lot of people try to simplify it through extremely reduced APIs; AutoML is one example.

The third category tries to put a layer of Python wrappers on top of this big, complicated system to make AI easier to operate. But the core issues, programmability, performance, hardware capability, novel algorithms, safety, can't be solved by adding a Python wrapper on top. So we went back to first principles and asked what the root cause of the mess is. We believe it is mainly the lack of a unified platform underneath, so we decided to build the software up from the most fundamental level to attack that hard problem.

Alessio: You've had a long career and you're an amazing engineer. Suddenly you founded your own company, became CEO, and took on the roles of product owner and fundraiser, all while being responsible for building teams, coaching employees, and so on. What have you learned in those areas?

Chris: At Modular I have a very close relationship with my co-founder Tim; we complement each other, and having a like-minded partner is very important, which is an experience you don't get working at Google or Apple. Beyond that, I have built many teams, products, and technologies inside other people's frameworks. Now we have our own, which frees us from the problems of those frameworks.

Before doing anything, we think from first principles. We understand the problems the AI field faces and the pain AI practitioners experience; that is the starting point for founding Modular. Tim felt that pain at Google as well. Working at a startup and working at Google feel very different.

When we started the company, we divided the work and collaborated. As the engineering leader, I am mainly responsible for engineering and building the engineering team, while Tim handles product and business. He has interviewed hundreds of different companies to understand their pain points and the challenges they face so that we can help them better. Those in-depth interviews are what really crystallized our vision for the company and brought our colleagues together around it.

The challenge with Modular is that we are trying to solve a very hard, fairly abstract technical problem. Now that the project is up and running and the pieces are working together we can publish results, but solving this problem requires expensive experts from large technology companies, and that largely shaped the initial state of the company and how we thought about our starting point.

Thinking from first principles, we realized that for those reasons we had to raise a lot of money, motivate employees, pay well, and provide a comfortable working environment, and those needs shape how we do things. It has been a very interesting process.

Looking back, is TensorFlow or PyTorch a product? My answer is no. But I also have to reflect on my own history. I worked on things like Swift and Xcode, which are products in a sense: there are product managers and engineering teams, and they are delivered to customers, but they are not the company's core product. Xcode doesn't make money; Apple doesn't profit from it directly, and its relationship with customers is indirect and somewhat detached. That can easily create a disconnect between teams like TensorFlow and PyTorch, which sit in research or support organizations, and the needs of actual customers.

The Modular team works directly with customers, and we understand their pain and their needs. Building and deploying these things is notoriously hard in the AI world. Sure, you can put a layer of Python on top and it looks simple, but that isn't the pain point faced by the leading companies and the top people building these systems.

The problem is that there is a lot of cruft around all of this, and part of our vision is to unify it so you can do more with less. That vision grew out of conversations with teams whose products keep changing as requirements evolve and who are running multiple models at once. The teams we work with are usually in fairly complicated situations: across their various use cases, these companies have accumulated a tangle of confusing systems that cause a lot of trouble. So we often find ourselves asking: do you really need a large team of engineers just to deploy this model? Why should that be necessary?

Swyx: One thing I'm confused about: you mentioned that Modular doesn't run a cloud service, yet people use the Modular inference engine in the cloud, and Modular does have cloud engineers.

Chris: We do have cloud engineers. Our product runs on the user's own cloud: we provide a Docker container that lets Modular's inference engine run in the cloud, on an on-premises server, or on a laptop.

Modular's design philosophy puts a lot of weight on modularity, and users can run on whatever platforms suit their needs. Some users don't want to manage it themselves, and the team is focused on meeting users where they are, so we do intend to gradually build a managed product for convenience. But in our view, requiring users to move everything onto a Modular-run cloud, while valuable for those who don't want to manage it themselves, would slow adoption of the technology.

Swyx: You have a lot of experience building teams and recruiting engineers. You have a unique advantage in this regard and people are happy to work with you. In terms of project management, do you have any experience or suggestions that you would like to share?

Chris: My job is to help the team win, and I will do whatever it takes to achieve that. Some people take winning for granted, but first you have to figure out what winning means: the team needs a clear vision and clear goals, and members need to pull in the same direction, especially when you have a group of really good people who all want to shine at what they do. If everyone shares the same goal we can move fast; if people are working against each other, progress suffers.

I like to get hands-on and build things from scratch myself. It shows team members directly how to do things, and it shapes our team culture.

Efficiency matters a great deal to a team; if you sit around waiting 24 hours, or three weeks, for CI to run, everything slows down. That means testing strategy and these core software engineering concerns are very important. Once you build a culture of self-reliance, independent of third parties, you can attract a lot of great talent, and then you need to unlock that talent through how you recruit and follow up. I genuinely believe a truly good person can tackle any problem and make progress on it.

If you take great, passionate colleagues and let them work on something they really want to do, they have superpowers. So we need to make sure people are working on the right problems, build up their ability to get things done and solve problems, and help them grow. In a fast-growing team especially, you can't just focus on the specific task in front of you; you also have to keep checking whether you are actually contributing.

Generally speaking, there is a lot of focus on the product, on the customer and the problems that the customer is facing, but without a team, you can't solve the problem and build the product.

All of this ultimately feeds a virtuous cycle and is critical for leadership and team building. It's one of the reasons I like Tim and enjoy working with him: he effectively covers what I'm not good at, and we learn a lot from each other.

In short, there are still many things we need to do to build a team. I love building things, but I won’t slow down for it.

5

About the development of the field of AI

Alessio: What is something that has been implemented in AI that you thought would take longer?

Chris: ChatGPT is an innovation in user interface, and its explosive growth made people aware of the power of AI. Looking back, I would have guessed it would take several more years for AI to enter the public consciousness.

Swyx: What problems are you most interested in solving in the field of AI?

Chris: Different people define AI differently. Some believe everything should be implemented with end-to-end neural networks rather than conventional software. I think the balance between learned algorithms and intelligently designed algorithms is an open question. Personally, I don't see it as an either/or choice: if you want to build a cat detector, a convolutional neural network is a very good approach, but if you want to write an operating system, for loops are probably the right tool.

But how will these technologies change over time? How can we get application developers to think about these questions more coherently, rather than being boxed into category A or category B? These are open questions.

AI as a way of building software will eventually become one of the tools people reach for when thinking about how to build an application, and by application I don't just mean an iPhone app: I mean cloud services, data pipelines, all the complicated pieces that ultimately come together into a product users can use. I think the AI industry is still in its infancy and hasn't quite grasped this yet.

Alessio: What's interesting is that ChatGPT has been around for less than a year, and we've already gone from talking about AI safety to talking about AI destroying the world, all because of a chatbot. That is thought-provoking.

Chris: I have to admit that in 2017 many people were already focused on AI safety, and at the time I was quite puzzled by it; I didn't see what the problem was. That's debatable, but it turns out those people were visionaries.

That's one of the reasons I admire people like Geoffrey Hinton. They entered the field before AI reached the public eye, have witnessed and absorbed a much longer span of its history, can look at the problem from a more comprehensive perspective, and have achieved more because of it.


Try OneFlow: https://github.com/Oneflow-Inc/oneflow/


Origin: blog.csdn.net/OneFlow_Official/article/details/133594077