I quit a big tech company and raised tens of millions of dollars to build a company made up almost entirely of programmers

He joined Facebook and built its first-generation high-performance graph indexing system, code-named Dragon, but was then won over by an Ant Financial annual meeting and joined that company. He thought he might not be a natural entrepreneur, yet he turned around and founded a company staffed almost entirely by engineers, raising nearly tens of millions of dollars from Redpoint China Ventures and Matrix Partners China. Without a single salesperson, the company serves many large firms such as Meituan, Tencent, and JD Digital. Without heavy promotion, the team's open source graph database project, Nebula Graph, has earned 5.8k stars on GitHub (as of press time)... These entries, which all sound a bit like humble-bragging, point to one person: Sherman Ye, CEO of Hangzhou Ouruoshuwang Technology Co., Ltd. (Vesoft Inc), who has many more stories and ideas worth exploring.

He was doing well at a big company, yet he left to strike out on his own. What draws this tech geek to entrepreneurship? Why choose the graph database field, which has many competitors and is not yet very popular? Why go open source from the very beginning? The project introduction says this is the only graph database solution in the world that can hold hundreds of billions of vertices and trillions of edges while providing millisecond-level query latency. What technical capabilities give this group of engineers such confidence? With these questions in mind, Huo Taiwen, founder and CEO of Geekbang Technology and founder of InfoQ China, had an in-depth conversation with Sherman Ye (hereinafter Sherman), CEO of Hangzhou Ouruoshuwang Technology Co., Ltd. The main content follows.

1 Leaving Facebook and Ant Financial to start a business

"When I was interviewing for Ant Financial, the interviewer asked me if I would start a business if I had a chance in the future. My answer was that I would definitely start a business if I had a chance. I didn't expect it to come true."

Sherman has worked abroad since earning his degree overseas in 1997. He first came into contact with graph databases only after joining Facebook in early 2011. Unexpectedly, that encounter made graph databases the main track of his later entrepreneurship.

"I first joined Facebook's search engine team. Many of Facebook's relational queries were built on the search engine. Later, I found that a search engine could not handle complicated relational queries very well. At the end of 2011, another colleague and I started a secondary graph index project to try to solve this problem, and it became the early prototype of Facebook's graph database project."

Although he was doing well at Facebook, Sherman still hoped to return to China someday. In 2014, after four years at Facebook, he decided to go back.

Won over by an Ant Financial annual meeting

In 2014, a friend of Sherman's who worked at Ant Financial reached out, hoping he would consider joining Ant Financial after returning to China. In May of that year, Sherman received an interview invitation from Ant Financial and hurried back to China. "I remember very clearly that the day was May 10th, which happened to be Ali Day. After the interview, they invited me to join their afternoon event at the Huanglong Stadium in Hangzhou. I was stunned as soon as I walked in. The stadium held about 10,000 people, and the atmosphere was completely different from what I had imagined of a Chinese company. At that moment, I felt that joining a company like this would surely mean a very good future."

Sherman's connection with Ant Financial began with that grand annual meeting. In January of the following year, Ant Financial began to consider applying relationship networks to financial risk control. Sherman became the leader of the team and continued to work on graph databases. "We tried many third-party and open source products at the time, but none of them seemed good enough, so in the end we started building our own."

In this way, over a little more than three years, Sherman led the team in developing GeaBase, a graph database with high performance, high availability, strong scalability, and excellent portability. After leaving Ant Financial, Sherman chose to keep going in the graph database field. Judging by his nearly ten years of hands-on experience alone, the choice seems sound. However, starting a company obviously takes more than one person armed with enthusiasm. Most startups need to raise funding early on to cover R&D costs, so it matters a great deal to pick a field that investors favor. Compared with cloud native, artificial intelligence, and middle platforms, all hot topics in investment circles, the graph database field is somewhat deserted.

2 Why choose the "lukewarm" graph database field?

"Maybe it's because I have no specialty other than graph databases, so I had to start a business in this field," Sherman quipped. "In fact, the graph data field has developed rapidly over the past few years. I would not say that graph databases will replace mainstream relational databases in the next 3-5 years, but they will certainly become a very important complement."

According to Sherman, in the next 3-5 years, just as almost every enterprise uses a relational database today, most enterprises will also run a graph database to store relationships. This judgment is based on the business needs, technology, and data volumes of the industry as a whole. While talking with many companies, Sherman found that more and more business teams are becoming interested in the associations between data and entities, because they have found that very large business value can be mined from them. These associations are exactly the problem that graph databases set out to solve. With this idea in mind, Sherman gathered a group of engineers and started development. A typical tech geek, Sherman has wanted to start a business since college, yet he still feels he is not really cut out for it.

"To this day, our company does not have a single salesperson; most of us are R&D staff. The open source distributed graph database we build is a low-level technical product that demands relatively strong skills from engineers. It has attracted so many outstanding people mainly because they care about technology and are willing to take on technical challenges. They hope to build a product that brings value to the community and society while also realizing their own value."

In this relatively pure R&D atmosphere, the technical team took 7 months to produce Nebula Graph, an open source graph database project under the Apache 2.0 license. It gained 5.8k stars on GitHub in a short period of time, and Meituan, Tencent, Xiaohongshu, JD Digital, and many other companies found Sherman's team through the open source community, hoping to cooperate.

Speaking of the logic behind open source, Sherman did not hesitate to say: "The first day we did this project, we decided to open source it."

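Open source means the code must stand up to scrutiny from every developer in the community, and it also means long-term community maintenance afterwards. Sherman has his own thinking on this: "We decided to go open source for three main reasons. First, we hoped the graph database field could quickly become widely known. After all, not many people understood graph databases at the time, and open source makes it easier for users to get started. Second, anyone who has studied traditional databases knows that universities offer courses on them, but there are none for graph databases, and some of the basic theory of graphs belongs to mathematics. Open source lets users learn easily, and people can talk to and help each other in the open source community. Third, we want to go international. No product, at either the technical or the architectural level, should target only the domestic market. Technical people probably all have a bit of idealism and hope that the projects they build can bring strength and value to more people." For this reason, everything in the Nebula Graph project, from the first line of code and the first comment to the first document, has been in English.

If the team allowed itself one small indulgence, it shows in the name Nebula. "Nebula means a cloud of stars, and we use it to refer to the intricate relationships and nodes in a graph."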

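3 Hundreds of billions of vertices, trillions of edges, millisecond-level query latency

The GitHub home page of Nebula Graph carries a very bold introduction: the only graph database solution in the world that can hold hundreds of billions of vertices and trillions of edges while providing millisecond-level query latency.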

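"Actually, we are telling the truth, not doing marketing," Sherman said. The Nebula project was designed from the start to handle very large data volumes, so the initial architecture was planned in depth for distributed scale-out, scale-in, and elasticity. In practice, many users' data volumes really do reach the hundreds of billions or even trillions. Millisecond-level queries are possible mainly because the database was designed from the beginning for OLTP scenarios, that is, online real-time queries, and this shaped its data partitioning, query execution plans, and more. The latency of each query is in fact unrelated to the total data volume, because a query only touches the data involved in that single task, which may be just a small subgraph. This is why Nebula can maintain millisecond-level latency at such a large data scale.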
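The idea that latency tracks the size of the touched subgraph rather than the total data volume can be illustrated with a small sketch. The Python below is a toy model under stated assumptions, not Nebula Graph's implementation: the `PartitionedGraph` class, the modulo-based `partition_of` function, and the partition count are all hypothetical names chosen for illustration. It counts adjacency lookups during a k-hop traversal to show that the work depends only on the neighborhood explored.

```python
from __future__ import annotations

from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_of(vertex_id: int) -> int:
    """Map a vertex to a partition (assumption: simple modulo hashing)."""
    return vertex_id % NUM_PARTITIONS


class PartitionedGraph:
    """Toy graph whose adjacency lists are spread across partitions."""

    def __init__(self) -> None:
        # One adjacency map per partition: {source vertex: [destination, ...]}
        self.partitions = [defaultdict(list) for _ in range(NUM_PARTITIONS)]

    def add_edge(self, src: int, dst: int) -> None:
        # Each edge is stored in the partition that owns its source vertex.
        self.partitions[partition_of(src)][src].append(dst)

    def neighbors(self, vertex_id: int) -> list[int]:
        # Only the owning partition is consulted; .get() avoids creating keys.
        return self.partitions[partition_of(vertex_id)].get(vertex_id, [])

    def k_hop(self, start: int, hops: int) -> tuple[set[int], int]:
        """Return (vertices within `hops` steps of start, adjacency lookups)."""
        visited = {start}
        frontier = [start]
        lookups = 0
        for _ in range(hops):
            next_frontier = []
            for vertex in frontier:
                lookups += 1
                for neighbor in self.neighbors(vertex):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
        return visited, lookups


if __name__ == "__main__":
    graph = PartitionedGraph()
    for src, dst in [(1, 2), (1, 3), (2, 4)]:
        graph.add_edge(src, dst)
    # A large unrelated chain stands in for "the rest of the data".
    for i in range(100, 100_000):
        graph.add_edge(i, i + 1)
    print(graph.k_hop(1, 2))
```

In this toy model, adding more unrelated vertices elsewhere in the graph leaves the lookup count for a 2-hop query from vertex 1 unchanged, which is the intuition behind the claim that per-query latency does not depend on total data volume.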

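As with traditional databases, the graph database field is divided into OLTP and OLAP. OLTP refers to serving online queries, which is characterized by strict latency requirements and high concurrency. In financial risk control, for example, transaction volume can reach hundreds or even thousands per second, and each transaction moves quickly. From the user's point of view, both transfers and payments should complete in a very short time, which greatly compresses the time available for risk control. A call to a graph database may take only a few tens of milliseconds to complete.

In this fiercely contested graph database market, some products are built for computation, and others are built for low-latency, high-concurrency online queries. Nebula is clearly the latter.

Nebula Graph project address: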

https://github.com/vesoft-inc/nebula-graph/blob/master/README-CN.md

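Nebula 2.0 release: from the community

Tempered by the open source community, Nebula Graph 2.0 GA has been released. The defining feature of version 2.0 is that it comes from the community: it solves many practical problems raised by community developers.

According to Sherman, the biggest change in version 2.0 is support for Cypher, Neo4j's query language, with compatibility reaching nearly 70%; later versions will gradually raise it. For existing Neo4j users, queries slow down once data volume grows, and because Neo4j is a single-machine edition it cannot hold rapidly growing data. Many of these users therefore want to migrate from Neo4j to a system that better fits their needs, and some have even rewritten their queries to do so. The new version of Nebula Graph gives former Neo4j users a smoother transition to Nebula Graph.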

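To make this possible, the team restructured Nebula Graph's architecture. Users may not notice it, but it is crucial to Nebula's future development. The architecture is now closer to that of a traditional SQL database, with a full set of components including an analyzer, an optimizer, and an executor. This makes it easier to support different query languages, extend the system, and apply different scheduling strategies, so the whole system is more flexible and far more extensible.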
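Why an analyzer, optimizer, and executor pipeline makes it easier to support several query languages can be sketched in a few lines. The Python below is only an illustration under stated assumptions and does not reflect Nebula Graph's actual code: the two "dialects" are invented toy syntaxes (they are neither nGQL nor Cypher), and `Plan`, `parse_dialect_a`, `parse_dialect_b`, `optimize`, `execute`, and `run` are hypothetical names. The point is that both front ends produce the same plan object, so the optimizer and executor are written once.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Plan:
    """Language-independent plan: expand `hops` steps from `start`."""
    start: int
    hops: int
    limit: int | None = None  # None means "no limit"


def _parse(text: str, first_kw: str, second_kw: str) -> tuple[int, int, int | None]:
    """Parse '<first_kw> <int> <second_kw> <int> [LIMIT <int>]'."""
    tokens = text.split()
    ok = len(tokens) in (4, 6) and tokens[0] == first_kw and tokens[2] == second_kw
    if ok and len(tokens) == 6:
        ok = tokens[4] == "LIMIT"
    if not ok:
        raise ValueError(f"cannot parse: {text!r}")
    limit = int(tokens[5]) if len(tokens) == 6 else None
    return int(tokens[1]), int(tokens[3]), limit


def parse_dialect_a(text: str) -> Plan:
    """Toy dialect A (invented): 'HOPS <n> FROM <vertex> [LIMIT <k>]'."""
    hops, start, limit = _parse(text, "HOPS", "FROM")
    return Plan(start=start, hops=hops, limit=limit)


def parse_dialect_b(text: str) -> Plan:
    """Toy dialect B (invented): 'FROM <vertex> EXPAND <n> [LIMIT <k>]'."""
    start, hops, limit = _parse(text, "FROM", "EXPAND")
    return Plan(start=start, hops=hops, limit=limit)


def optimize(plan: Plan) -> Plan:
    """One example rule: LIMIT 0 returns nothing, so skip the traversal."""
    if plan.limit == 0:
        return replace(plan, hops=0)
    return plan


def execute(plan: Plan, adjacency: dict[int, list[int]]) -> list[int]:
    """Breadth-first expansion; returns reached vertices, excluding the start."""
    visited = {plan.start}
    frontier = [plan.start]
    for _ in range(plan.hops):
        next_frontier = []
        for vertex in frontier:
            for neighbor in adjacency.get(vertex, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    result = sorted(visited - {plan.start})
    return result if plan.limit is None else result[: plan.limit]


def run(text: str, parser, adjacency: dict[int, list[int]]) -> list[int]:
    """Full pipeline: parse, then optimize, then execute."""
    return execute(optimize(parser(text)), adjacency)


if __name__ == "__main__":
    toy_graph = {1: [2, 3], 2: [4]}
    print(run("HOPS 2 FROM 1", parse_dialect_a, toy_graph))
    print(run("FROM 1 EXPAND 2", parse_dialect_b, toy_graph))
```

Under this design, supporting another query language means adding one more parser that emits a `Plan`; the optimizer rules and the executor stay untouched.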

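In addition, the team responded to the requests voiced most loudly by community developers. The new version adds full-text search, and adds a String type on top of the existing 64-bit integer support. These features lay a solid foundation for the project's future development.

At the same time, the project continues to develop its cloud service. "Running a graph database consumes a great deal of resources because of all the computation involved, and the on-demand nature of cloud computing is a natural fit for this scenario." Nebula Graph Cloud Service, the project's graph database cloud platform, is currently in public beta and supports one-click deployment of Nebula Graph. No fees are charged for the cloud service during the public beta, and developers are welcome to try it for free.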

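4 The wave of basic software is here. What about the future of graph databases?

From a macro perspective, graph databases are basic software, a field the country currently values highly. Generally speaking, basic software can be divided into operating systems, databases, development tools, and so on. Sherman said that the Chinese market is very large: its installed base of smart devices accounts for a quarter to a third of the global market. All of these devices need operating systems and databases, and developing applications on top of them requires toolchains and development tools, so the market space is enormous. The market share of domestic software is still low, however, which also means that China's basic software companies have great potential.

As for what kind of company can stand out, Sherman sums it up in one sentence: build products the market needs. That is easy to say and not simple to do. First, the software needs a certain technical barrier; a product that anyone could develop is hard to grow in the market. Second, the product must fit market demand; otherwise, no matter how impressive the technology is, users will not accept it. Finally, a company can succeed only by listening to the community, taking in real user feedback, continuously improving its technology, and gradually building its own technical moat.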

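As a branch of databases, graph databases still hold less than 2% of the market. As for why, Sherman said that demand for graph databases has only become strong in the last 2-3 years, most of all in finance. The financial field contains a huge number of relationships between accounts. These relationships form a very large network that makes risk control harder, and judgments based on user profiles alone are far from enough; graph databases can solve this problem better. From another angle, taking 2% of a database market that has been developing for forty or fifty years in just two or three years is already very fast.

With basic software rising across the board, graph databases have a huge market space. According to analyst forecasts, by 2025 graph databases will account for 9% to 10% of the entire database market. That is a very large number, implying growth of roughly 50% to 60% per year. Although the current share is not high, the field can be seen as being on the eve of an explosion.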
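As a rough sanity check on these figures, the implied compound annual growth in market share can be computed directly. The sketch below rests on two assumptions that the article does not state: that the starting share is exactly 2% (the article says "less than 2%") and that the starting year is 2021, giving four years until 2025. The function name `implied_annual_growth` is hypothetical.

```python
def implied_annual_growth(start_share: float, end_share: float, years: int) -> float:
    """Compound annual growth rate needed to go from start_share to end_share."""
    if start_share <= 0 or end_share <= 0 or years <= 0:
        raise ValueError("shares and years must be positive")
    return (end_share / start_share) ** (1 / years) - 1


if __name__ == "__main__":
    START_SHARE = 0.02  # assumption: the article only says "less than 2%"
    YEARS = 2025 - 2021  # assumption: 2021 as the base year
    for target in (0.09, 0.10):
        rate = implied_annual_growth(START_SHARE, target, YEARS)
        print(f"target share {target:.0%}: about {rate:.1%} per year")
```

Under these assumptions the share itself would need to grow by roughly 46% to 50% per year, which is close to the lower end of the quoted range. A starting share below 2%, or overall growth in the database market, would push the required growth in absolute market size higher.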

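Graph databases still face many challenges, such as data completeness and consistency, support for distributed transactions, and the convergence of AP and TP. Nebula itself leans more toward TP, and achieving a truly organic integration of AP and TP in the future still poses very large technical challenges.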


Origin blog.51cto.com/15057858/2675969