Antirez, the father of Redis, codes with large models himself: could they one day replace 99% of programmers?

Antirez, the creator of Redis, has written his first blog post of 2024. In it he shares his feelings about large language models from the perspective of an ordinary programmer, although his achievements are anything but ordinary. He comments sharply that Google search has become a sea of garbage, and gives a level-headed assessment of today's generative AI: stupid, yet knowledgeable about ancient and modern times.

Through long-term use, he has come to believe that at its current stage generative AI will only make already strong programmers stronger. Most programming tasks today are repetitive work that does not demand a high level of reasoning from large models, and large models are very well suited to "use once, throw away" programs. We translated antirez's blog post and made some cuts without changing the author's original intent.

Since the birth of ChatGPT, I have made extensive use of generative AI, including, later on, various large models run locally. On the one hand, my personal goal is to improve my coding ability with the help of large models; on the other, I hope to free precious energy from tedious, low-value work. I believe many of you are like me, spending countless hours searching through uninspiring technical documents, being forced to learn overly complex APIs, and writing programs that turn into garbage after a short time. Work shouldn't be like this, and development shouldn't be like this. Nowadays Google search has become a sea of garbage, and we have to struggle to find anything useful in it.

Also, I'm not new to programming myself. Even without any external resources, I can write code, and I would even say I write it reasonably well. It's just that, as time went on, I began to rely more and more on large models to help write high-level code: Python most of all, and much less so in C.

What strikes me most about large language models is learning to recognize exactly when they can be used and when using them blindly only slows me down. I have also found that large models are a lot like Wikipedia and the video courses on YouTube: they are very effective for users who are willing, capable, and self-disciplined, but offer only marginal, diminishing returns to those who are already struggling. So I worry that, at least at this stage, generative AI will only make already strong programmers even stronger.

Let’s start the discussion step by step.

Large language models: omniscient and omnipotent, or parrots?

One of the most worrying phenomena in the new wave of machine learning is that AI experts still have limited understanding of large models. Although we invented neural networks, what we actually invented was just an algorithm for automatically optimizing a neural network's parameters. Hardware has made it possible to train larger and larger models, which exploit statistical knowledge extracted from the data they process (the prior material) and, through a great deal of iterative trial and error, gradually converge on answers that work. It must be admitted that large models do perform better than any previous architecture. But overall, neural networks themselves remain extremely opaque.

Since scientists cannot explain some of the emergent capabilities of large models, one would expect them to be more cautious. But at the other extreme, many people seriously underestimate large language models, believing they are just a more sophisticated kind of Markov chain that can at best reproduce limited variations of what appeared in the training set. A large amount of factual evidence shows that this "merely parroting" theory is simply untenable.

There are also plenty of enthusiasts who feel that large language models have acquired some supernatural power that does not actually exist. It's not that mysterious. At best, large models can interpolate within the space of data representations they were exposed to during training, and that is nothing new. Even as pure interpolation, their capability is quite limited (yet enough to exceed human expectations and even bring surprises). If they could go one step further and interpolate continuously across the space spanned by all the code they have ever seen, then even without creating anything truly novel, large models would be enough to replace 99% of programmers.

Fortunately, reality is not so extreme, and we developers still have room to survive. Large language models can indeed write programs they have never seen in that exact form, and they show an early ability to steer development by blending ideas that appear with different frequencies in the training set. However, this ability currently has serious limits, and subtle reasoning tasks still cause large language models to fail catastrophically. Nevertheless, it must be acknowledged that large language models represent the greatest achievement of AI since its birth, and that should be the premise of every discussion.

Stupid, yet knowledgeable about the past and present

This much is true: at best, large language models can perform only the most basic reasoning, they are often inaccurate, and they are riddled with hallucinated facts. But they also possess vast knowledge.

In programming, and in other fields where high-quality data is available, large models are like stupid scholars who know everything from ancient times to the present. Pair programming with such a partner is unwise (and, in my opinion, pair programming with anyone is unwise): they tend to throw out ridiculous ideas, and we have to fight constantly to impose our own design ideas.

But conversely, if we treat this learned fool as a tool at our disposal, putting questions to it and using it as a source of inspiration, the effect is completely different. Today's large models cannot yet carry us across gaps in human knowledge, but when we want to solve a problem we are unfamiliar with, they can often help us move quickly from knowing nothing to knowing enough to keep learning on our own.

In programming, developers of two or three decades ago might not have thought much of this ability. Back then you only needed to master a couple of programming languages, the classic algorithms, and a dozen or so basic libraries. The rest was pure self-expression, talent, and the application of professional knowledge and design skill. Whoever had that was a professional programmer in the full sense, with the potential to solve any problem.

As time went by, however, frameworks, programming languages, and libraries multiplied explosively, making development harder and burdening programmers' daily work with plenty of unnecessary and unjustified trouble. Against this backdrop, an all-knowing idiot teammate like the large model has become the most valuable guide to progress.

For example: for a whole year my own machine learning experiments were done with Keras. Later, for various reasons, I switched to PyTorch. By then I already knew what embeddings and residual networks are, but I really didn't want to study the PyTorch documentation word for word (that is how I learned Keras; if ChatGPT had existed then, it would have spared me a lot of painful memories). Now that I have large language models, I can write Python code using Torch very easily. The only prerequisite is having a clear idea of the model I want to put together and being able to ask the right questions.
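
To give a concrete flavor of the kind of Torch code I mean, here is a minimal, hypothetical sketch (not taken from my actual experiments): an embedding layer followed by a residual block, the sort of building block a large model can spell out for you once you can describe the model you want:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: two linear layers plus a skip connection."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # the skip connection

# Hypothetical usage: token embeddings fed through the residual block.
embed = nn.Embedding(num_embeddings=1000, embedding_dim=64)
block = ResidualBlock(64)
tokens = torch.randint(0, 1000, (8, 16))   # batch of 8 sequences, 16 tokens each
out = block(embed(tokens))                  # shape: (8, 16, 64)
```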

Speaking through concrete cases

Please note that I'm not talking about trivial requests here, such as "how does class X implement feature Y?". For questions like that, asking a model is not much different from looking things up in the documentation. In contrast, complex models are capable of much more, including things we could not have imagined just a few short years ago.

Now I can tell GPT-4: "Look, this is the neural network model I implemented in PyTorch, and these are the batches I set up. I want to adjust the tensor sizes so that the batching function is compatible with the network's input, and I want to express it in this particular way. Can you tell me what code needs to be rewritten?" Once the prompt is done, GPT-4 writes the code, and all I have to do is check in the Python CLI whether the resulting tensors have the right dimensions and whether the data layout is correct.
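
To make this workflow concrete, here is a hypothetical example of the kind of reshaping involved (the shapes and variable names are invented, not those of my model), together with the quick sanity check I run in the Python CLI afterwards:

```python
import torch

# Hypothetical case: the batching code yields tensors shaped (batch, height, width),
# but the network expects flattened vectors shaped (batch, height * width).
batch = torch.randn(32, 28, 28)

flat = batch.reshape(batch.size(0), -1)   # (32, 784): the layout the model expects

# The kind of check to do in the Python CLI after GPT-4 rewrites the code:
print(flat.shape)                         # torch.Size([32, 784])
assert flat.shape == (32, 28 * 28)
```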

Let's look at another example. Some time ago, I needed to develop a BLE client for some ESP32-based devices. After some research, I found that most multi-platform Bluetooth programming bindings were unusable, and that the simplest solution was to write the code in Objective-C against macOS's native API. That meant dealing with two problems at once: learning Objective-C's cumbersome BLE API and its various pointless patterns (I am a minimalist, and Objective-C's BLE API is the very model of a counterexample to "good design"), and relearning how to program in Objective-C at all. The last time I had used it was ten years ago, and I had long forgotten details like event loops and memory management.

The end result is the following code, which, while neither elegant nor concise, at least works. With the help of large models I completed the development in a very short time, something that would have been unimaginable before:

https://github.com/antirez/freakwan/blob/main/osx-bte-cli/SerialBTE.m

The code was mostly generated by ChatGPT; my job was to paste in the requirements for things I wanted to do but wasn't sure how to implement, so the model could explain the problem to me and show how it should be solved.

Granted, the project didn't involve all that much coding, but the big model sped up development considerably. Could I have finished it without ChatGPT? Of course. But the important point isn't how much extra time I would have had to invest; it's that I probably would simply have given up, because something that troublesome was no longer worth the effort.

In my opinion, that is the real deciding factor. Without the big model, after weighing effort against benefit, I would not have written such a program at all. The big model even helped me make a change that mattered more than the program itself: in that project I modified linenoise (the line-editing library I use) so that it works under multiplexing.

Disposable programs

There are many more cases like the ones above, and I won't repeat them here; similar stories follow essentially the same pattern with the same results. In my daily work I often face another type of problem: quickly obtaining some verifiable result. In such cases large models can also be used to make exploration more efficient.

For scenarios like this, I tend to let the big model write all the code. For example, when I need to write some throwaway program, like this one:

https://github.com/antirez/simple-language-model/blob/main/plot.py

I wanted to visualize the loss curve while training a small neural network, so I showed GPT-4 the format of the CSV file produced by my PyTorch program and asked that, when multiple CSV files are specified on the command line, the script compare the validation loss curves of the different experiments. The link above is the result GPT-4 generated, and it took only 30 seconds.
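
As an illustration only (this is not the exact file linked above, and the CSV column names are my assumption), a throwaway script of this kind boils down to a few lines of matplotlib:

```python
import sys
import csv
import matplotlib.pyplot as plt

# Plot one validation-loss curve per CSV file given on the command line,
# so different experiments can be compared on the same axes.
# Assumed row format: epoch,train_loss,valid_loss
for path in sys.argv[1:]:
    epochs, losses = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            epochs.append(int(row["epoch"]))
            losses.append(float(row["valid_loss"]))
    plt.plot(epochs, losses, label=path)

plt.xlabel("epoch")
plt.ylabel("validation loss")
plt.legend()
plt.show()
```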

Likewise, I needed a program that reads an AirBnB CSV report, groups the apartments by month and year, and then combines the cleaning fee with the number of nights booked to calculate the average rental price for each month of the year. The program was useful to me, but writing it would have been utterly boring: there was nothing new or interesting about it. So I selected part of the CSV file, pasted it into GPT-4, and described the problem I wanted solved. The program it produced worked on the first try. To get it right, though, one has to properly understand how the data should be grouped, since the raw data are otherwise quite scattered and disordered.
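
A minimal sketch of that grouping logic might look like the following (the column names and file name are assumptions for illustration, not the real AirBnB export, and this is not GPT-4's actual output):

```python
import csv
from collections import defaultdict

# Group bookings by (year, month) of the start date and compute the average
# nightly price, spreading the cleaning fee over the nights of each stay.
totals = defaultdict(lambda: [0.0, 0])   # (year, month) -> [sum of nightly prices, bookings]

with open("airbnb_report.csv", newline="") as f:
    for row in csv.DictReader(f):
        year, month, _day = row["start_date"].split("-")
        nights = int(row["nights"])
        nightly = (float(row["amount"]) + float(row["cleaning_fee"])) / nights
        totals[(year, month)][0] += nightly
        totals[(year, month)][1] += 1

for (year, month), (total, count) in sorted(totals.items()):
    print(f"{year}-{month}: average nightly price {total / count:.2f}")
```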

With a little reasoning, I am convinced the large model did not simply copy a solution from material it had seen. Yes, GPT-4 must have observed similar programs during training, but the specific grouping requirements in those programs differ from my prompt, especially the requirement to group a CSV file in this particular format. So, in my view, the large model can interpolate to some extent within the space described by the different programs in its training set.

It would not be wise for me to waste my time writing such a simple program myself. The fact that large models can take on such tasks and let me focus on the work that really matters undoubtedly improves my coding productivity, if indirectly.

Typical tasks large models cannot solve: systems programming

Although my attempts at programming with large models have been fairly successful, when writing programs in C I find that large models serve mainly as a portable documentation assistant. I am an expert in systems programming myself, and this is exactly the kind of use case where large models help little, because they lack sophisticated reasoning capabilities. I believe many of you have had the same feeling.

Let’s take a look at this experimental prompt:

"Generate an elegant, short, and efficient C implementation of a bloom filter. Focus on hash function processing and write it in high-quality C. Also consider that the implementation should be sized to store 100,000 elements , the probability of false positives shall not exceed 5%. The added element is a null-terminated string."

The answer GPT-4 gave was not good. Bloom filters are quite common, and the data structure involved is nothing special. But clearly, writing a decent bloom filter requires stronger abstraction: for example, finding an efficient way to hash the same string N times while keeping the hash values well decorrelated. If instead you explicitly ask GPT-4 to modify the hash function so that it produces N decorrelated outputs, the solution it gives is much more reliable. If it could discover this idea on its own, it would write the bloom filter differently, using a single hash function to set K bits at a time.
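
To show the idea being discussed, here is a minimal sketch of my own (in Python rather than the C the prompt asked for, and not GPT-4's output): derive the K bit positions from a single underlying hash via double hashing, so the positions stay decorrelated without inventing K independent hash functions:

```python
import hashlib
from math import ceil, log

class BloomFilter:
    """Minimal Bloom filter sketch: K positions derived from one hash."""
    def __init__(self, capacity: int, error_rate: float):
        # Standard sizing formulas for m bits and k probes.
        self.m = ceil(-capacity * log(error_rate) / (log(2) ** 2))
        self.k = max(1, round((self.m / capacity) * log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: split one digest into h1 and h2, then use
        # h1 + i*h2 to obtain k decorrelated bit positions.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter(100_000, 0.05)
bf.add("hello")
print("hello" in bf, "world" in bf)   # True False (with high probability)
```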

The fact is that GPT-4 can write an adequate, fairly general hash function on its own, but when it comes to the larger task of the bloom filter it fails to show the same reasoning ability, and instead produces two different but highly similar hash functions.

All in all, the reasoning capabilities of today's large language models are still weak. On top of that, material on this particular problem may be relatively scarce, or even swamped by low-quality material, which leads to unsatisfying results. And this is by no means an isolated case. I have tried many times to use large models for algorithmic or systems programming, and the results were just as poor. Even with lowered expectations for reasoning, they cannot reproduce the level of code generation they achieve in the Python programming environment.

But at the same time, GPT-4 can decompile the function it produced (in a separate session) and accurately explain what it is for. So large models do still have a role in systems programming scenarios, but it is a very limited one.

Another interesting and promising point is the significant difference in performance between the smaller and larger models in the above situation.

Although Mixtral is an excellent model for many purposes, given how weak the reasoning of large models already is, the rule that can be drawn so far is: the bigger the model, the better the result. In addition, I had to run the local model deepseek-coder at 4-bit quantization because my machine does not have enough memory to run it at higher precision. Even so, with its 34 billion parameters, its reasoning on the same problem was noticeably stronger.

In my test, I gave the model clues about the problem, and it got the answer right: it identified the real source of the problem and finally produced a working alternative. No document, book, or Google search gives a direct answer to this kind of problem.

Whether you look at it as crude interpolation or in some other way, the model has mastered some form of reasoning. Only with that reasoning could it find the root cause of the problem and discover a potential solution. So I see no point in arguing further: large language models genuinely do help programmers.

But at the same time, experience over the past few months has shown that in the field of systems programming, especially for experienced programmers, large models rarely provide any ready-to-use solutions.

The ggufflib project I am currently working on involves writing a library that reads and writes GGUF-format files, the format llama.cpp uses to load quantized models. Initially I tried using ChatGPT to understand how the quantization encodings work, but in the end I decided to reverse-engineer the llama.cpp code instead, which was faster.

An ideal large language model would be able to reconstruct documentation about the data format from the data-encoding "struct" declarations and decoding functions it is shown, and so help systems programmers understand the design. The llama.cpp functions are small enough to fit in GPT-4's context window, yet the conclusions it produced were meaningless.
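
For context, the fixed part of a GGUF file header is small enough to sketch in a few lines. The following is only my reading of the format (the field layout is an assumption to verify against llama.cpp itself), not code from the ggufflib project:

```python
import struct

# Read only the fixed GGUF header fields: magic, version, tensor count,
# metadata key/value count. The layout here is an assumption based on my
# understanding of the GGUF spec used by llama.cpp.
with open("model.gguf", "rb") as f:
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    version, = struct.unpack("<I", f.read(4))
    tensor_count, = struct.unpack("<Q", f.read(8))
    metadata_kv_count, = struct.unpack("<Q", f.read(8))

print(f"GGUF v{version}: {tensor_count} tensors, {metadata_kv_count} metadata keys")
```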

In cases like this, we can only do what traditional programmers have always done: take out paper and pen, read the code line by line, and trace where the bits extracted by the decoder end up.

A correct view of large language models

With deep regret, I have to admit it: most programming tasks today repeat the same work in slightly different forms and do not require a high level of reasoning at all. Large language models perform well here, although they are still hard-limited by the size of their context.

And this should make us programmers think: is such a program really worth our time and energy to write? Yes, this work pays us quite generously, but if large language models gradually take over this part of the job, then within five years, ten at most, many fellow programmers will lose their jobs.

Furthermore, do large language models have some degree of reasoning ability, or are they still parrots that have merely learned to imitate more vividly? Perhaps at times they only appear to reason because, as semioticians would put it, the "signifier" gives the impression of a meaning that does not actually exist.

I believe that anyone who deals with large models regularly understands their limitations, and also feels the reasoning power embodied in them: their ability to integrate what they have seen before goes far beyond randomly stringing words together. Although most of their learning happens during pre-training, when predicting the next token a large model still builds some form of abstract model of the goal. That model is still fragile, patchy, and imperfect, but if we observe it in practice we have to acknowledge that it objectively exists. As the saying goes, seeing is believing. Even if it challenges mathematical certainty and goes against the views of the greatest technical experts, I still have confidence in the level of cognition large models display.

Finally, I hope everyone will actively embrace large models and try to use them to solve all kinds of programming problems. Asking large models the right questions will become a fundamental development skill, and the more you practice it, the better you will get at putting AI to work. Even leaving AI aside, the ability to describe a problem clearly and precisely helps us communicate better with other people. After all, large language models are not the only interlocutors who sometimes can't follow our train of thought. I believe everyone has noticed that many programmers, however excellent in their specific fields, communicate poorly, and that has become a bottleneck limiting their careers.

Today's Google search is already broken, so even just for condensing and distilling text, large models have huge practical value. Personally, I will keep using and studying them. I have never liked learning the details of obscure communication protocols, nor dealing with convoluted, show-off library interfaces. To me that is just a waste of time and energy. Thanks to large language models, I have been rescued from these quagmires.

Original link:

http://antirez.com/news/140

Source: blog.csdn.net/richerg85/article/details/135436934