MotherDuck: From SQLite to Docker in the Data World

First encounter

The other day, I stumbled across motherduck.com, a strikingly modest website.

But behind it is an all-star team.

Judging by their tagline, "Data Infrastructure and Analytics", and by the team's background, they seem to be taking on Snowflake and Databricks. But yet another cloud-native data warehouse claiming "performance/price advantages" would sound a bit boring. And why such an unusual name as MotherDuck? Only after reading about their $47.5 million Series A round and then browsing their new official website did it dawn on me.

The name

The name MotherDuck comes from DuckDB, an analytical database with a SQLite-like architecture. MotherDuck is commercializing the open-source DuckDB, which has become the standard playbook for infrastructure startups.

Manifesto

At first, MotherDuck's use of "serverless" confused me. The term is overused, but most people still associate it with cloud computing. MotherDuck markets itself as serverless, yet tells everyone not to wait for the cloud ("Why wait for the cloud?"), even though it will probably offer cloud services eventually (who wouldn't?). In common usage, serverless means the cloud provider hides the servers: they still exist, but users no longer need to care about them. MotherDuck's serverless is a different story: there is no server at all, because the underlying DuckDB is an embeddable library, not a standalone server. So the more accurate term here would be "no server".
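To make the "no server" point concrete, here is a minimal Python sketch. It uses the standard library's sqlite3 module as a stand-in for DuckDB, assuming that the shared embedded, in-process design is what matters here; the function name write_then_read and the sample table are hypothetical. "Connecting" only opens a file, and a second connection sees the same data because schema and rows live in that one file.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def write_then_read(db_path: str) -> list[tuple[str, int]]:
    """Hypothetical demo (expects a new path): sqlite3 stands in for an embedded engine like DuckDB."""
    # "Connecting" only opens (or creates) a file: no host, port, or server process.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE events (repo TEXT, stars INTEGER)")
        conn.executemany(
            "INSERT INTO events VALUES (?, ?)",
            [("duckdb/duckdb", 3), ("sqlite/sqlite", 1), ("duckdb/duckdb", 2)],
        )
        conn.commit()

    # A fresh connection sees the same schema and rows, because both live in
    # that single file; the query engine runs inside this Python process.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            "SELECT repo, SUM(stars) FROM events GROUP BY repo ORDER BY repo"
        ).fetchall()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "analytics.db")
        print(write_then_read(path))
```

DuckDB's own Python package follows the same in-process model; the sketch sticks to sqlite3 only so that it runs without extra installs.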

Data democratization

Snowflake introduced the idea of separating compute and storage, an architectural innovation that gave it a huge competitive advantage. From a product perspective, however, the two are still tightly coupled, because the data is locked into the Snowflake platform. MotherDuck is different. Suppose you have a single data file in Parquet, CSV, SQLite, or some other format, stored on your local disk, on S3, on GitHub, or anywhere else. Mount that file from your computing environment with MotherDuck, and you have a powerful tool for analyzing it. Because MotherDuck has zero dependencies (thanks to the SQLite-inspired DuckDB), fetching the MotherDuck binary takes only a few seconds, and if it ships in a precompiled distribution, even that step disappears.
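The "bring the engine to the file" workflow can be approximated in a few lines of standard-library Python. This is a sketch under stated assumptions, not MotherDuck's API: it reads a local CSV file where it sits, loads it into a throwaway in-memory SQLite table named `data`, and runs SQL over it. DuckDB itself can query CSV and Parquet files directly without a separate load step; query_csv_file and the `data` table name are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile
from contextlib import closing


def query_csv_file(path: str, sql: str) -> list[tuple]:
    """Run `sql` against the CSV file at `path`, exposed as a table named `data`.

    Hypothetical helper: SQLite stands in for an embedded engine like DuckDB.
    """
    # Read the file where it already lives; nothing is uploaded anywhere.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        body = list(reader)

    # Quote each header as an identifier. NUMERIC affinity stores numeric-looking
    # text such as "3" as a number and leaves other text (e.g. repo names) as text.
    cols = ", ".join('"' + h.replace('"', '""') + '" NUMERIC' for h in header)
    placeholders = ", ".join("?" for _ in header)

    # The engine is created on demand inside this process and discarded afterwards.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute(f"CREATE TABLE data ({cols})")
        conn.executemany(f"INSERT INTO data VALUES ({placeholders})", body)
        return conn.execute(sql).fetchall()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "events.csv")
        with open(path, "w", newline="") as f:
            f.write("repo,stars\nduckdb/duckdb,3\nsqlite/sqlite,1\nduckdb/duckdb,2\n")
        print(query_csv_file(path, "SELECT repo, SUM(stars) FROM data GROUP BY repo ORDER BY repo"))
```

The data file never moves; only a short-lived engine is spun up next to it, which is the inversion of the upload-then-query model the paragraph contrasts with Snowflake.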

Snowflake separates compute from storage, while MotherDuck connects compute to storage. With MotherDuck, as long as you can access a data file, you can analyze it, and you can build data solutions on top of that.

Take OSS Insight, a website that captures GitHub events in real time and derives insights from them. Its technology stack is already simplified by adopting TiDB, but in the future MotherDuck could do similar things with an even simpler stack.

With MotherDuck, everything your application depends on at runtime can be stored in a single file.

Of course, this is not a new idea. The legendary HyperCard introduced this practice, storing a self-contained application and its data in a single file.

There is also the venerable FileMaker Pro, which stores an entire application and its data in a single `.fmp12` file.

MotherDuck, however, can take this idea to the next level: it can efficiently handle terabytes of data, define open protocols between datasets, the MotherDuck engine, and the MotherDuck platform, integrate with GitHub to ease collaboration, and more.

As soon as it launches, MotherDuck will compete with Snowflake on every front. Data teams will still need Fivetran to move data, dbt to transform it, and perhaps even Meltano to assemble a data platform, but for analytics workloads they will have to decide where to run them: in the Snowflake cloud, or in any computing environment with MotherDuck?

Looking further ahead, the territory is still largely uncultivated. MotherDuck could unleash a whole new class of data solutions. The sky is the limit.

MotherDuck's idea is novel and its timing is perfect. Data analytics is waiting for the next paradigm shift. Practitioners would be happy with a better Snowflake, but visionaries will ask for more. MotherDuck wisely chose a different battleground instead of competing on performance and cost.


Final thoughts

In the past, to gain data-processing capability, everyone had to hand their data over to Snowflake (or a similar vendor). With MotherDuck, everyone gets comparable processing power while keeping control of their data: they can store it wherever they want, share it only with whomever they choose and only when they need to, and use it anytime.

Just as Docker became a runtime and a standard that made applications ubiquitous, MotherDuck could become a runtime and a standard that makes data usage ubiquitous.

And this time, MotherDuck had better be more prepared than Docker, Inc. was, so that it captures the lion's share of the value when the vision comes to fruition.

Good luck to the MotherDuck team. I wonder whether "Ducker" would have been a better name :)


Source: my.oschina.net/u/6148470/blog/5597354