Liu Song, GOTC 2023 Producer: The Future of Data Technology Through the Eyes of a 20-Year IT Witness

 If I had to describe myself in one phrase: I have lived through it, but more than that I am an observer. I have in fact been a witness to and observer of the entire software industry over the past 20 years. An observer must be able to summarize objectively and then offer some views on the future, though those views may not turn out to be correct. Nobody can guarantee a view on GPT, for example.

——Liu Song, Vice President of PingCAP

Every era needs observers, especially today as technological change accelerates. Amid the shock brought by GPT, technologies in every field are looking for new directions, and database technology is no exception. Fast-moving database and service vendors have already integrated AI tools for automatic SQL generation and performance optimization. But in the face of this surging wave of AI technology, where is database technology headed, and how can it make good use of AI? OSCHINA interviewed Liu Song, producer at GOTC 2023 and vice president of PingCAP, and asked him to talk about the recent boom in AI technology and the future of data technology under the new wave.

 Liu Song

Vice President of PingCAP

He previously served as general manager of the Technology Strategy Department at Oracle Greater China and as vice president of Alibaba Cloud, where he was in charge of cloud ecosystem building, think tank cooperation, and talent training programs. Liu Song has long been active in China's software industry and has observed firsthand the convergence of the Internet and the information industry. He has many years of practical experience in the development trends of the software and Internet industries, business model building for the cloud computing and open source industries, the development trends of database technology, and the digital transformation of enterprises.

On May 28, Liu Song will serve as producer of the GOTC 2023 "Data and Database Technology" sub-forum and deliver the keynote speech "From HTAP to Serverless, TiDB's Technology Evolution Road". Stay tuned!

For conference registration, please visit:  https://www.bagevent.com/event/8387611

OSCHINA: You have been following GPT closely of late. What has struck you most about it so far?

Liu Song:

I think one word sums it up: emergence.

I am personally a devotee of complex systems science. You may have heard of the Santa Fe Institute, which is dedicated to research in that field. In the United States of the 1980s and 1990s, emergence had a specific meaning for its time: when a complex system passes a certain critical point, many things appear that could not have been predicted by thinking about the original system linearly.

The first emergence is the large model itself. Once a model passes the 100-billion scale, it suddenly becomes intelligent, which is a kind of emergence.

The second emergence is happening now. When large-model technology such as GPT is combined with the software businesses we are familiar with and with all kinds of industry scenarios, there will be an even larger round of emergence. We are now seeing many non-linear explosions I had not anticipated, with things that did not exist appearing within a very short time. This is another explosive state, following the logic of the so-called Cambrian explosion.

ChatGPT was opened up in December last year, and after it reached the Chinese-speaking world in February this year, the topic suddenly exploded. On the one hand, the technology has passed a critical point, and people were struck to find that things they thought they understood are understood even better by AI. Another point is that people, especially people outside

OSCHINA: You also said recently that after ten years on the cloud stage, the new dancers are Serverless + HTAP + AI. How should we understand that?

Liu Song:

I became a cloud practitioner exactly ten years ago; 2013 was my last year at Oracle. Ever since, people at Oracle China have called me Mr. Cloud. At that time Oracle had just begun its transition to the cloud, covering both databases and SaaS. The following year I joined Alibaba Cloud, where I was among the first group of people working on its commercialization. I was mainly responsible for the cloud ecosystem, and I also did some work in vertical industries such as financial cloud. So my understanding of the cloud spans almost exactly ten years.

Over the past two years, I think Cloud 1.0 has more or less come to an end. The resource-based cloud is an infrastructure that lets applications in every industry migrate to the cloud quickly and elastically. Looking back on my ten years on the cloud stage, the cloud's biggest direction is to take on the ultimate proposition of digitally transforming the whole of society. Two key technologies are implied in that, one being data technology and the other artificial intelligence, and both emphasize being cloud-based.

In digitization, all users hope for integrated data services of the kind represented by HTAP technology. On the AI side, this wave of GPT has shown everyone that training AI to this level takes enormous computing power.

So these three technologies, cloud, data, and AI, will become more deeply integrated in the future.

On January 10 we released a small product, Chat2Query on TiDB Cloud, an AIGC-based intelligent data exploration feature. You can see natural language turned into SQL within seconds, and the result then returned quickly through HTAP technology, which queries a hybrid of row storage and column storage. When resources are insufficient, it scales out automatically through Serverless.

Serverless + HTAP + AI: today these three technologies also share a common promise, which is to respond to human needs within seconds. Your words are turned into SQL within seconds, complex queries run within seconds, and when resources are insufficient, cloud resources are called in within seconds without the user noticing.
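The flow described above (a question in natural language, translated into SQL, then executed against the database) can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Chat2Query is implemented: the function names are hypothetical, the translation step is a canned lookup standing in for a language model, Python's built-in sqlite3 stands in for an HTAP database, and serverless scaling is not modeled at all.

```python
import sqlite3

# Hypothetical stand-in for the AI step. A real system would call a language
# model here; this lookup table only illustrates the shape of the interface.
CANNED_TRANSLATIONS = {
    "how many orders did each city place?":
        "SELECT city, COUNT(*) AS orders FROM orders GROUP BY city ORDER BY city",
}


def natural_language_to_sql(question):
    """Translate a question into SQL (stubbed; an assumption, not a real API)."""
    key = question.strip().lower()
    if key not in CANNED_TRANSLATIONS:
        raise ValueError(f"no translation available for: {question!r}")
    return CANNED_TRANSLATIONS[key]


def run_query(conn, sql):
    """Execute the generated SQL. sqlite3 stands in for the HTAP database."""
    return conn.execute(sql).fetchall()


def ask(conn, question):
    """Full loop: question -> SQL -> rows."""
    sql = natural_language_to_sql(question)
    return sql, run_query(conn, sql)


def demo():
    # Build a tiny in-memory table so the example is self-contained.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT)")
    conn.executemany(
        "INSERT INTO orders (city) VALUES (?)",
        [("Beijing",), ("Shanghai",), ("Beijing",)],
    )
    return ask(conn, "How many orders did each city place?")


if __name__ == "__main__":
    generated_sql, rows = demo()
    print(generated_sql)
    print(rows)
```

Keeping translation and execution as separate functions mirrors the separation in the description: the AI layer only produces SQL, and the database layer only runs it.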

Over the past ten years, many Internet companies and large enterprises have talked about going digital on the cloud, but in essence they do two things. The first is to buy cloud resources, with users on a rental model; payment is not tied to any specific query. In that time the cloud computing market has taken the form of leasing resources over the Internet, but this form has now entered a stage of homogenization. To extend upward, the foundation of Cloud 1.0 has to be consolidated first.

Consolidating the foundation of computing power, whether CPU or GPU, is very important. Large-model training relies on it too, and OpenAI's training is largely based on Microsoft's cloud. Today Amazon's cloud and Alibaba Cloud are quickly following suit, because this is clearly an opportunity for cloud vendors.

OSCHINA: What are the key factors shaping the future? What service scenarios are possible for database technology?

Liu Song:

If, over the past ten years or so, Amazon has led the world in infrastructure innovation, both software and hardware, then over the next five to ten years the biggest stage for Cloud 2.0 will depend on three key shaping factors: the first is the cloud's own cloud-native technology, the second is data technology, and the third is whether AI becomes a similar basic service. Of course, another question is whether new large models and database technology can be integrated on the B side (the enterprise side) to create more new scenarios, which is also a topic of great concern to everyone.

The fusion of cloud, AI, and databases may happen on the B side. Today GPT products are mainly used by ordinary people to answer general-knowledge questions in a public setting, and most of the data involved is unstructured. But imagine a company CEO who cares about questions that GPT products cannot answer directly. For example: if I want to improve my company's talent efficiency index by 10% next month, which departments should I start with?

Answering this requires, on the one hand, a large number of professionally applied models and algorithms on top of the enterprise's internal database, and on the other hand, comparison with external peers and consideration of the economic environment. So imagine that we list a CEO's 100 most frequently asked questions. In the next five to ten years, can they be asked in natural language and answered through the combination of AI and databases, including large models, and the combination of internal and external data? That is something we can look forward to.

OSCHINA: What is the difference between the development of data technology and AI technology?

Liu Song: 

The database industry is a family of four generations. Today you can still buy Oracle database services on a cloud such as AWS, as well as open source MySQL and Redis, or a distributed NewSQL database like our TiDB. There is a market for every kind of cloud database. The structure of the database industry is comparatively clear: the four generations live under the same roof, and each can find its own value and experience. Value means whether you can realize the value of data better and faster, and experience means whether the database experience in the cloud is better.

In the database field, many different technologies each have their place, and no new technology can completely replace all of the earlier ones.

AI technology is just the opposite. Its nature is patricidal: as soon as a new technology comes out, the previous ones, whether grandfather or father, become meaningless. You can see this in how GPT overturned NLP and completely changed the way of thinking. What comes out of the next generation may well kill off the current GPT models entirely. In this respect, AI applications built on large models are indeed risky, but the opportunities are also greater.

I think the key to survival for AI-related large models and applications lies in professional barriers to entry, and the future comes down to two directions of development, downward and upward. One is the sophistication of the underlying technology, such as that of the large model itself, which everyone can see. Then, as AI technology extends into application scenarios, there are two places where a barrier must be built. One is data in professional fields such as medicine and automotive: if a large model can access that data, it will be stronger in that field. The other is the creativity of upward, user-facing applications. It is conceivable that in the future human resource management, all customer service systems, digital marketing, a new generation of BI, and a new generation of search in the broad sense may all be redone with GPT-style intelligent technology.

So where are the barriers in this redo? One is the model itself, another is the ability to extract and learn from professional data, and another is how friendly the application is to build and use, that is, the experience. In other words, the experience of AI in vertical industries adds value. On this basis, AI may face greater possibilities and greater challenges than data technology.

OSCHINA: What does PingCAP think about the future now, and what actions has it taken?

Liu Song:

We now have a new belief that the fusion of AI and data will generate huge and all-round value for enterprise users. 

There are three levels. The first is the one users see most easily. As with Chat2Query, natural language replaces SQL as the main way to query. When users want insights or services, for example a courier, a delivery rider, or a consumer asking about your product or where their takeaway order is, that is in fact a form of data consumption. If all such queries are handled in natural language, the number of database users and the frequency of use could grow 100-fold, 1,000-fold, or even more. That in turn raises the bar for the integration of data technology and AI technology.

The second level concerns processing and query optimization within database technology. In recent years there have been two main schools in the database field, AI for DB and DB for AI. Put simply, one aspect is an "autopilot" for the database: maintenance can be optimized with machine learning so that less labor is needed, especially in the cloud. Another is query optimization, including performance tuning. This is a long-standing problem in the database field that GPT and related AI technologies can now help solve. The workload of data operations and maintenance staff and of data architects will then be greatly reduced, and every project will iterate faster than before.

The last level is the demand placed on database technology itself. When AI becomes the common tool everyone uses to query and gain insight, there will be engineering tuning in between, including how algorithms are called. What, then, should data technology be, and how should it be organized?

We believe that traditional databases may in the future take the form of an online data service. This is broadly defined; it is not simply a database turned into a service. It is also the direction PingCAP has been evolving in over the past few years.

I think our biggest change has been from a distributed database serving mostly Internet scenarios to a cloud-based data service provider. This is data service in the broad sense, covering both transactions and queries, and we now also have an open architecture.

To summarize: this wave of AI has become a new generation of GUI, and it will multiply the number of people who use data by thousands of times. First, for all database practitioners, AI is a huge boost to performance tuning and to the progress of all kinds of engineering projects. In addition, today's form of data service may combine better with AI. This goes back to the small example I mentioned earlier. With Chat2Query, a question the user asks in natural language is turned into a query within seconds, and the query is then executed through HTAP technology. That is in effect a data service, which quickly returns results to the user and supports a decision, so the whole loop closes within seconds. This is why we think that AI and data technology will in future form a new, combined innovation in the cloud and become a new form of data service.


 The "Data and Database Technology" sub-forum will take place on May 28. Many leading figures in the field of data and database technology will be there in person to share their project experience. Anyone interested is welcome to register for the conference through the link below!

For conference registration, please visit:  https://www.bagevent.com/event/8387611

The Global Open-source Technology Conference (GOTC) is a grand open-source technology event for developers worldwide, jointly initiated by the Open Atom Open Source Foundation, Shanghai Pudong Software Park, Linux Foundation Asia Pacific, and Open Source China. On May 27 and 28, GOTC 2023 will be held in Shanghai as a two-day open source industry event. The conference will take the form of industry exhibitions, keynote speeches, special forums, and open source markets. Participants will discuss popular technical topics such as the Metaverse, 3D and games, eBPF, Web3.0, and blockchain, as well as hot topics such as open source communities, AIGC, automotive software, AI programming, open source education and training, and cloud native, exploring the future of open source and supporting its development.

Registration for GOTC 2023 is now open, and open source enthusiasts from every technical field around the world are warmly invited to join the event!

For more information, please visit the official website:  https://gotc.oschina.net/


Origin my.oschina.net/oscpyaqxylk/blog/8807422