Large-model pre-training and quantitative finance


Author: Huang Wenhao (former technical director of the Beijing Zhiyuan AI Research Institute)

Compiled by: AINLP official account

Link: https://zhuanlan.zhihu.com/p/646909899

Recently I discussed an interesting observation with friends: large-model pre-training (mainly the most expensive kind, pre-training from scratch) and quantitative finance have a great deal in common. It reminded me of the point raised earlier about the quant fund Magic Square: people who do quantitative trading naturally have the foundation needed to pre-train large models. As it happens, I have a background in both large-model pre-training and quantitative finance, and after thinking it over carefully, the comparison really does hold up.

Large-scale systems engineering

The core data for most quant algorithms is public price and volume data, and the most important part of large-model pre-training data is likewise public data. Each company has some proprietary data sources, but their share is not large. The overall algorithmic logic of quant strategies is also quite similar from firm to firm; by analogy, pre-trained model architectures are basically similar across teams, with no earth-shaking differences. What actually determines the quality of the model, therefore, is large-scale systems engineering capability.

First, as large-scale systems engineering, both quant trading and large models require large computing clusters. Interconnecting tens of thousands of GPUs is the ultimate challenge for infrastructure. Before ChatGPT, the only domestic platform that had achieved interconnection at the scale of tens of thousands of cards was Magic Square's Yinghuo platform; Magic Square's infra talent is basically the best in China, including various NOI gold medalists. Quant trading not only needs large computing clusters, it also pursues performance and efficiency to the extreme. The trading opportunities captured by everyone's algorithms are actually very similar, so the speed at which trading instructions go out becomes critical, and firms go as far as programming the network card to squeeze out maximum efficiency. Large models are not quite that extreme, but every point of improvement at the infra level brings substantial training-efficiency gains, delivers experimental feedback faster, and enables continuous improvement.

Second, details are critical in large-scale systems engineering. The algorithm alone is not enough in a quantitative trading system; the whole system also covers trade execution, risk control, and other components, and a problem in any one link can bring down the entire trading system. Large-model pre-training likewise involves a great many details, from data all the way to evaluation. Beyond the general consensus that whoever cleans data best trains the best model, small details such as the data mixture ratio, data ordering, and training strategy all play an important role in the final quality of the model.
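To make the "data mixture ratio" point concrete, here is a minimal illustrative sketch, not the author's actual recipe; the source names and weights below are pure assumptions. It only shows how a mixture might be expressed as sampling weights over public data sources:

```python
import random

# Illustrative mixture: each public source gets a sampling weight, and
# documents are drawn from sources in proportion to those weights.
# These names and numbers are assumptions, not a real pre-training recipe.
MIXTURE = {
    "web_crawl": 0.60,   # e.g. filtered web text
    "code":      0.15,
    "books":     0.10,
    "wiki":      0.05,
    "papers":    0.10,
}

def sample_source(mixture, rng=random):
    """Pick the data source for the next training document according to the mixture weights."""
    names = list(mixture)
    weights = [mixture[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Rough check: over many draws the empirical frequencies approach the target ratio.
counts = {name: 0 for name in MIXTURE}
for _ in range(100_000):
    counts[sample_source(MIXTURE)] += 1
print({k: round(v / 100_000, 3) for k, v in counts.items()})
```

Even in this toy form, shifting a few percentage points between sources changes what the model sees billions of times over a full run, which is why such "small" details end up mattering.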

Key technology becomes proprietary

This is the point that resonates most. Quantitative finance is a very closed field: each firm's trading system is its biggest secret weapon, and very little of it is ever made public. Large models are now moving in the same direction; the core models of several giants have started going closed source, because they are the core secret. OpenAI's latest-generation GPT-4, Google's Bard, and the upcoming Gemini will not be released any time soon. Some will point out that Meta's LLaMA is open source. You can go back to my earlier article on open-source models: from a pre-training perspective, the gap between Meta and OpenAI is huge, almost a full generation. If Meta finds in the future that large models can make money, it may not keep open-sourcing them. OpenAI may also open-source a previous-generation model at some point, which is a bit like a quant firm publishing its old trading system once a better one is already in production.

There are several interesting parallels here as well. 1. Open-source work in the community does not help the development of pre-trained models much. After LLaMA 2 came out, the open-source community did a lot of work on top of it, but on closer inspection it is mostly SFT, LoRA, and other fine-tuning on top of the pre-trained model; very few people continue pre-training. Unfortunately, most of academia is doing similar things. While everyone picks these low-hanging fruits, few people really work on things that help pre-training, such as the core issue of the data mixture ratio. At the recent ICML, a crowd of papers did SFT and only one or two studied data mixing. This is a bit like how the papers published in quantitative finance are all about algorithms that do not make money.

2. Academic work cannot be applied to real systems. If you read many quant papers, especially those from academia, their settings differ greatly from an actual trading system. The differences are often small details, yet there is basically no paper whose algorithm, once the setting is corrected, makes money in a real system; the value of reading papers is only to inspire your own strategies. Likewise, many academic results on optimizers and data ratios work well on small models but fall apart when actually used to train large ones.

3. Researchers themselves are becoming less open than before. Because of intense competition among highly similar players, many experimental conclusions cannot be shared; after all, some of them were bought with millions in real money. Talking to colleagues now, most companies have rules against sharing such information. As a result, valuable information becomes more and more closed, while worthless information spreads rapidly.

Local conditions in each country create opportunities for localization

Many global quant funds struggle to adapt when they come to China, and national policy also prevents many of them from doing business there at scale. This has given many domestic quant funds the opportunity to rise: even if their trading systems lag top foreign institutions somewhat, as long as they stay ahead domestically they can earn good returns overall. The same is true for large models. On the one hand, the models from OpenAI, Google, and Meta are relatively mediocre in Chinese, far weaker than in English; on the other hand, they are not optimized for China's conditions and do not meet policy requirements. This gives domestic companies the opportunity to do large-model pre-training: whoever is first in China still has a big market, even with a generation gap to the world's leading models. Of course, this situation exists not only in China but in many countries, so doing localized pre-training of base models for various governments is not a small market either.

Another similarity that follows from this is that both are heavily affected by policy. The ups and downs of domestic quant funds are basically tied to policy, and the development of large models is likewise closely tied to national measures. At the same time, both need effective regulation in order to develop healthily.

Other similarities

Beyond the points above that struck me most, there are many other similarities between large-model pre-training and quantitative finance, which I will not expand on one by one:

  • A small elite makes a lot of money. Building a large model does not take many people, but everyone involved has to be extremely capable.

  • The core problem is the same. Predicting the next token and predicting the next price are essentially the same problem (see the sketch after this list).

  • Both require large amounts of data.

  • Both pursue interpretability.

  • ......
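To make that second bullet concrete, here is a toy sketch of my own (not from the article) showing that both tasks reduce to one-step-ahead prediction over a history; they differ only in whether the target is discrete (a token, scored with cross-entropy) or continuous (a price, scored with squared error). The placeholder "models" are deliberately trivial; only the problem framing matters.

```python
import numpy as np

def next_step_loss(history, predict, loss):
    """Accumulate the one-step-ahead prediction loss over a sequence."""
    total = 0.0
    for t in range(1, len(history)):
        pred = predict(history[:t])      # condition on everything seen so far
        total += loss(pred, history[t])  # compare against the realized next element
    return total / (len(history) - 1)

# Language modeling: discrete next-token prediction with cross-entropy loss.
tokens = [2, 7, 7, 3, 1]                      # toy token ids, vocabulary size 10
uniform = lambda ctx: np.full(10, 0.1)        # placeholder "model": uniform distribution
xent = lambda probs, tok: -np.log(probs[tok])
print("LM loss:", next_step_loss(tokens, uniform, xent))

# Price forecasting: continuous next-value prediction with squared-error loss.
prices = [100.0, 101.2, 100.8, 102.5, 103.1]
persistence = lambda ctx: ctx[-1]             # placeholder "model": repeat the last price
mse = lambda pred, actual: (pred - actual) ** 2
print("Price loss:", next_step_loss(prices, persistence, mse))
```

In both cases the training signal is the same shape: a history goes in, a prediction of the next element comes out, and a per-step loss is averaged over the sequence.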

Finally, I hope large models turn out the same way as quantitative finance: the market is large enough that a few top institutions cannot absorb it all, leaving room for many large-model companies. There are now hundreds of quant funds in China, large and small, and large-model companies could likewise flourish.
