
Nowadays, amid the localization trend, the wave of entrepreneurship in domestic databases keeps rising. As of the end of 2023, there were nearly 300 database products on the Chinese market from about 100 vendors. Well-known investment institutions such as Sequoia, Hillhouse, and Tencent have all made moves, each backing at least three database companies, which shows how much capital favors the sector.

Some databases have relied on their own strength to raise 100 million yuan in financing, win bids for multiple projects, grow steadily, and successfully go public; others are still being questioned by the market. Among the 16 listed companies related to domestic databases, very few are profitable, which makes people wonder how long this model of "losing money just to make a name" can last.

So, can the domestic market really accommodate so many database vendors? What problems does database development face today? What kind of database player will ultimately stand out? And for an ordinary small or medium-sized project, how should we choose a suitable database?

In this issue of [Open Source Talk], we invited Li Linghui, founder of the cloud-native database ClapDB; Qiao Jialin, co-founder & CTO of Tianmou Technology; and Ma Gong, an infrastructure engineer, to discuss the problems in today's database market.

Guests:

Li Linghui

Founder of the cloud-native database ClapDB; former CTO of Multiplication Cloud, CTO of Meiqia, and chief architect of Didi Chuxing.

He is currently building a new paradigm of cloud-based infrastructure that provides analytical data services for the new era.

ClapDB is a database designed and implemented from the ground up on a cloud-native architecture, taking full advantage of modern cloud-native technology. Written in C++, it aims to deliver higher performance, letting you obtain analysis results quickly and easily on data of any scale.

Qiao Jialin

Co-founder & CTO of Tianmou Technology, member of the Apache IoTDB PMC and a founding member of the project, PhD from Tsinghua University, and member and academic secretary of the Open Source Technology Committee of the China Communications Society.

He took part in building IoTDB, the first Apache top-level project in IoT time-series data management, as well as TsFile, a second top-level project.

He is an Apache Member (a member of the Apache Software Foundation), a pioneer of open source in China, a Shuimu Scholar at Tsinghua University, and a silver-medal lecturer of the OpenAtom Foundation. As one of 10 leaders in foundational software, he was named a 2023 Outstanding Software Engineer. Related work won the first prize of the Beijing Science and Technology Progress Award.

Apache IoTDB is a low-cost, highly available, IoT-native time-series database. It uses a lightweight device-edge-cloud collaborative architecture and supports integrated collection, storage, management, and analysis of IoT time-series data.

Host:

Ma Gong

Infrastructure engineer based in the Nordics, operator of the official account "Swedish Horseman", and a regular guest of "Open Source Talk".

01 There are so many databases, but it's not all the fault of following the trend

Ma Gong: The domestic database market is booming. There are more than 300 database products and more than 100 vendors, a lot of money has been invested, and customers are very supportive. Yet so far, few can be considered successful or have international influence. The gap between our huge investment and our extremely low output is stark. Today we want to discuss why this gap exists and how we can narrow it.

Let me first ask our two database founders. There are already 400 databases in China, while there are only a few dozen in the world. China has a serious surplus, so why are you still building databases?

Li Linghui: There may be thousands of companies in China officially working on databases; maybe 50 to 100 of them are somewhat well known. In my view, although they look different, they fall into three or four types:

The first is heavily modified MySQL, the second is heavily modified PostgreSQL, the third is heavily modified Greenplum (itself derived from PostgreSQL), and the fourth is a packaging of the Java-based ES (Elasticsearch) or Hadoop ecosystem... not even modified, just packaged.

From a problem-solving perspective, there is nothing wrong with reusing open source projects as long as it does not violate the open source license. For users, however, there is no need for so many choices that look the same. That only raises the cost of choosing, and none of them offers functions the others lack, even though each claims to be different.

What I want to say is that every vendor claims its product is different. The answer you hear most often is: "I have made some innovations." I believe no database vendor will say it has no innovation at all. Everyone says they have made a little innovation, and this "little" may be modesty, or it may be literally true.

But from the user's perspective, I think almost no users, or very few, actually benefit from this small improvement, because the product may fall apart in another scenario. Those of us in engineering all know that if you want to prove your superiority under certain conditions, almost anything will do. Conversely, no software or project has no advantage under any circumstances; that is impossible.

I have seen domestic competing products that, to score well in bid evaluations, directly record statistics of the data, such as the maximum and minimum values, in the on-disk files. Nothing needs to be computed: a query for the max value just reads it back. Would you call that an innovation? You can't say it isn't; at least I haven't seen anyone else do it. But does it make sense? It does if you happen to need the max, but who happens to need the maximum and minimum values of a data file without any filtering?
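To make the trade-off concrete, here is a minimal Python sketch, assuming a hypothetical two-column schema (`region`, `amount`) and hypothetical names (`DataFile`, `max_amount_unfiltered`, `max_amount_where_region`); it is an illustration, not the competitor's actual implementation. The idea is similar in spirit to the per-column min/max statistics that columnar formats such as Parquet store alongside the data. Statistics written at file-creation time answer an unfiltered MAX without reading any rows, but as soon as the query filters on another column, every row still has to be scanned.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical schema for illustration only: (region, amount)
Row = Tuple[str, int]


@dataclass
class DataFile:
    """A hypothetical on-disk file: raw rows plus min/max stats for `amount`."""
    rows: List[Row]  # assumed non-empty; min()/max() fail on an empty file
    min_amount: int = field(init=False)
    max_amount: int = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once at write time and stored with the file.
        amounts = [amount for _, amount in self.rows]
        self.min_amount = min(amounts)
        self.max_amount = max(amounts)


def max_amount_unfiltered(files: List[DataFile]) -> int:
    # SELECT MAX(amount): answered from the stored statistics alone,
    # without reading a single row.
    return max(f.max_amount for f in files)


def max_amount_where_region(
    files: List[DataFile], region: str
) -> Tuple[Optional[int], int]:
    # SELECT MAX(amount) WHERE region = ?: the stored stats describe all rows,
    # not the filtered subset, so every row must be scanned.
    # Returns (result, number of rows scanned).
    best: Optional[int] = None
    scanned = 0
    for f in files:
        for r, amount in f.rows:
            scanned += 1
            if r == region and (best is None or amount > best):
                best = amount
    return best, scanned


files = [
    DataFile([("east", 5), ("west", 42)]),
    DataFile([("east", 17), ("west", 3)]),
]
print(max_amount_unfiltered(files))            # 42, zero rows read
print(max_amount_where_region(files, "east"))  # (17, 4), all rows read
```

The shortcut only pays off for the exact query shape it was built for, which is the speaker's point about benchmark-driven "innovation".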

Our biggest difference is that we look at what users need from the user's perspective. The users we serve have very little money to spend on the cloud. They are not large enterprises, they have little operations capability and no DBA, and they really cannot learn a complicated manual of thousands of pages just to deploy and use a database. That is too hard, and Snowflake is not cheap. Yet they want data analysis services and have many complex analysis needs. So we meet the needs of these users and make the product comfortable, cheap, and enjoyable to use!

Ma Gong: In short, you are a cheaper Snowflake that doesn't need a professional DBA and serves developers directly, right? That is indeed different. Many domestic databases I know of expect you to train your own DBAs, and their pitch is that their performance is better and their benchmark scores are higher than their competitors'. Your thinking is different. What about you, Jialin? Why did your laboratory need to build a database?

Qiao Jialin: Let me answer both questions. The first is: why are there so many databases in China?

First, what does a database do? It manages data. Everyone agrees on this: manage the data well, query it well, and query it fast. Next, how many types of data are there? Documents, relational data, time series, key-value pairs, graphs, and vectors. If we think of a database as a manager, the kinds of objects it has to manage are quite numerous. On top of that come the application scenarios: finance is one typical scenario, the Internet of Things is another, and each scenario has subdivided industries that may use data differently. That is why everyone has different design concepts and goals when building databases, and it is a big reason why there are so many databases today.

Time series is one of these data types. IoTDB is a database for IoT scenarios, which means what we do is time-series data management for IoT. If your use case falls at the intersection of these two, our product is the better choice.

So why do we want to build such a database?

Our group is the data storage group, which specializes in researching efficient data management methods for enterprises. Our laboratory also has an industrial background, so the data we deal with comes from industry and the Internet of Things, and our application scenarios were fixed from the start. At first, we used the open source database Cassandra directly and did business adaptation on top of it. Later we found that its core design did not match what users wanted: Cassandra is more like a flexible key-value store, while users wanted a database oriented toward time-series operations. So we started modifying it. However, our changes eventually became incompatible with the original open source project and diverged from Cassandra's development goals, so we became independent.

02 Open source and closed source are both difficult to do

Ma Gong: Here is something interesting: your backgrounds are almost opposite. Jialin comes from academia; notice that he never talked about money, not even about costs! Linghui comes from industry and has been on the Party A (client) side, and he talks about money from the start: how many cents does a query cost?

Your two strategies mirror a split among domestic databases: some are commercial and some are based on open source. What do you see as the long-term pros and cons of each?

Qiao Jialin: Whether you are under KPI pressure greatly affects a database's technology choices and design. A database that must go live in one year is designed differently from one that has three years. If you are always under project pressure, all your design decisions may revolve around project priorities.

But when we first started in school, there was no such pressure. We thought more about questions like: what kind of database do Internet scenarios need? What should its architecture look like? Which open source technologies are the better ones today? We could weigh more options and evaluate, design, and implement more technical solutions. Later, after the project joined the Apache Software Foundation and we became a commercial company, the question became how to use the open source software to support its developers so they can keep contributing.

We are now building our enterprise edition on top of an open source database product, and we don't need to open source that enterprise edition. Unlike the GPL, the Apache License does not require derivative works to be released as open source, which protects the interests of developers who build on the software. That is precisely why so much enterprise software today is built on Apache-licensed projects. So open source software is one option, and an enterprise edition built on it is another, which may give users stronger technical guarantees.

Ma Gong: Linghui doesn't seem to agree much with the open source model. Would you explain?

Li Linghui: What I really object to is using VC or investor money to build a commercial open source company. As for Tsinghua University funding open source, I think that is only natural: it is taxpayers' money being spent, and open source gives back to society by opening up research results. I think this is the right thing to do, and academia should set the example.

I think more than half of all open source projects should come from academia. Many cutting-edge foundational projects can only be achieved with national-scale research investment because they require a long experimental stage, while we business people have a very short time window. Running a company is not like students happily doing research without pay; each of us has to make a living. No shareholder will support a company spending ten or twenty years on something like this. The first question in front of you is how to make money.

As for open source: if a product is genuinely innovative, promoting it to the market this way is the right method, because others may not understand it yet. But the database market is very mature, and its products have been around for decades. The big selling point of open source is that it is free, but when 300 peers around you are also free, how do you stand out? That is the question everyone has to think about. From a competitive standpoint, what we are essentially pursuing is irreplaceability. Every bit of revenue depends on being irreplaceable, whether for a person or a company. How to build your own irreplaceability is a question every founder must consider.

03 A good database requires a little toughness

Ma Gong: Linghui raised an interesting point. Party B (the vendor) takes on many projects, and each one gets customized, so its versioning basically falls apart: there is no mainline version to develop or manage, and every project is one of a kind. Jialin, your product is open source, but there is actually no way to stop others from customizing it.

But from Party A's point of view, Party A hates this too. Between a product with proper version management and a customized one-off project, the latter is far too risky. No Party A says: I want a version that only three engineers in the world know how to run, with a configuration only two people understand, right? So why has the domestic database market turned into a customization market? Neither Party A nor Party B wanted it, yet it ended up this way. Why did this abnormal state arise?

Li Linghui: I worked for many large Party A companies in China for a long time. When you don't have a strong enough standardized product and the user's needs aren't met, you end up letting the user tell you what to do, and the user's imagination is unbounded. He doesn't think about the big picture, only about his own needs. What I fear most is hearing Party A say: "I have a very simple request. You can just do this..." Usually when I hear that sentence, I want to run away.

He thinks you don't understand and wants to teach you, and sometimes you really don't understand their needs. For example, one user once told us: "Your system saves automatically, and that makes me uneasy. Give me a button I can click to save." I said the button would do nothing, because everything is already saved. He said he still needed it.

Should this need be met? Honestly, if you meet it, many more customers will be puzzled: doesn't it save automatically? Why is there a button? This is really a power game: when Party A and Party B contest who is more authoritative and who better represents the standard answer in the industry, whoever wins gets to be tougher.

Look at the same Party A: when they deal with IBM or Microsoft, they are not so arrogant. So when you are a weak Party B, you don't get enough respect.

Sometimes we really are not professional enough. A client once asked me: "I have been in this industry for 20 years. How many years have you been in it?" I said two. He said: "Then why are you teaching me what to do?" You can't say he was wrong; every trade has its specialists. So when starting a business, especially when building products, you should not step outside your circle of competence in understanding the problem. When you do something you don't understand, you will naturally end up following users' demands.

Ma Gong: The problem you describe is not unique to databases; other industries have it too. Blindly meeting customer demands will kill your product. A very common product-management mistake I see is letting users act as your product managers.

Of course, as Linghui explained, many Party B vendors understand the problem no better than Party A does, so naturally Party A won't listen to them. Party A thinks: "I know more than you, so you should listen to me. I'm paying you, and not making you call me dad is already merciful." The only thing that can counter this strong position is knowing more than he does. You sell not just a product but a set of concepts and a plan, and you ask Party A to follow it. Ideally Party A thinks: this plan is good and I am willing to explore it with you, and the two sides have an equal relationship. But most product managers and companies lack this ability. If anyone has it, I think one source must be academia.

Jialin, for example, can say: "I come from Tsinghua University. Our research group has studied this for more than ten years and read papers from all over the world. We rejected the method you mentioned 10 years ago." Can you bring a new, more advanced approach into the industry, instead of letting these old foxes think they know better than you because they have worked for 20 years?

Qiao Jialin: What my mentor said most often is: control the complexity of the database, and don't use it for things a database shouldn't do. Simple code is the long-term source of a database's vitality. If we add many features, we may gain one or two users in the short term, but in the long run the code becomes unmaintainable.

So why can we do this? I think it comes from our accumulated open source work. We only commercialized after about five years of open source polishing, so by the time we went to market, the product already met the needs of many open source users, including enterprise users. Because the product is standard enough, users don't make weird requests of us. However, since we build a database for the Industrial Internet of Things and industrial scenarios are very complex, we do need to learn more in order to discuss business-scenario needs with industrial users as equals.


[Open Source Talk]

The OSCHINA video account column [Open Source Talk] covers one technical topic per issue: three to five experts sit together, each sharing their own views and chatting about open source. It brings you the latest industry frontiers, the hottest technical topics, the most interesting open source projects, and the sharpest exchanges of ideas. If you have new ideas or good projects to share with fellow developers, please contact us. The forum is always open!

Why are there so many knockoffs in the domestic database industry?

