【Modern Data Architecture】Modern Data Stack for Startups


"Use the right tool for the job!"


This advice sounds simple, but it is hard to put into practice.
Early-stage startups struggle to choose among the many tools in the ecosystem because they cannot predict how their data will evolve.

The need for a modern data stack


Over the past 10 years, the software industry has grown in the following ways:

  • Computing power: Public cloud providers such as AWS and Google Cloud offer huge computing power at standard market rates.

  • Data sources: The rise of the Internet of Things ecosystem and smart devices has led to an exponential increase in the amount of data generated every day. By one widely cited estimate, every person on the planet generated about 1.7 MB of data per second in 2020.

  • Data literacy for business stakeholders: Analysts used to dig through Excel spreadsheets by hand to find insights. Today, BI tools make data much easier to explore, which builds data literacy among business stakeholders.

  • Open source adoption in data projects: Over the past 10 years, the open source community has grown tremendously, and many data tools (for example Apache Airflow, dbt, and Metabase) are thriving in it.


The transition from traditional ETL to modern ELT


Now that most enterprises rely on data-driven solutions, we are seeing a steady shift from legacy ETL architectures to ELT architectures.


Modern ELT processing is preferred over traditional ETL for the following reasons:

  • Cloud storage and analytics services have become cheap and efficient.

  • Traditional ETL pipelines are less flexible and cannot easily adapt to exponential data growth.

  • Modern ELT loads data faster than traditional ETL because no strict transformation phase has to run before the data reaches the warehouse.

  • Because no user-defined transformations are required up front, ELT tools can plug source data into the target system with minimal manual effort from the user.

  • Analysts can use tools such as dbt to transform data inside the warehouse as needed, without having to decide in advance which insights and data types they will want.
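
The load-first, transform-later idea in the points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the cloud warehouse, and the table, view, and event names (raw_events, revenue_per_user, RAW_EVENTS) are hypothetical.

```python
import sqlite3

# Hypothetical raw events, kept as untyped text exactly as the source sent them.
RAW_EVENTS = [
    ("1", "signup", None),
    ("1", "purchase", "19.5"),
    ("2", "purchase", "5.25"),
    ("1", "purchase", "0.5"),
]


def load_raw(conn, rows):
    """Load step: copy source rows into the warehouse without transforming them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events (user_id TEXT, event TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform step: runs later, inside the warehouse, as plain SQL."""
    conn.execute(
        """
        CREATE VIEW IF NOT EXISTS revenue_per_user AS
        SELECT CAST(user_id AS INTEGER) AS user_id,
               SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_events
        WHERE event = 'purchase'
        GROUP BY CAST(user_id AS INTEGER)
        """
    )


def run_elt():
    """Load first, transform afterwards, then query the modeled view."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load_raw(conn, RAW_EVENTS)
    transform(conn)
    return conn.execute(
        "SELECT user_id, revenue FROM revenue_per_user ORDER BY user_id"
    ).fetchall()
```

Because the raw rows are stored untouched, a new question later only needs a new view over raw_events rather than a change to the loading code.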


Adoption Strategies for Startups


As mentioned earlier, startups can rarely predict how their data will evolve, yet they have to cope with that evolution when it comes.
Early-stage startups should therefore look for the following when selecting tools for their data stack:

  • High adoption and awareness among other startups and customers.

  • A good fit with the ELT model of the data stack.

  • A database paradigm (e.g. structured, geospatial, entity-relationship, search engine) that matches how the data generated by the startup's domain and market needs to be stored and queried.

  • An equivalent open source alternative for every paid SaaS tool.

Extract and Load


Collect data from all event sources, such as web, applications, and backend services, and send it to the data warehouse.

  • Paid SaaS tools: Stitch, Fivetran

  • Free and open source alternatives: Singer, Meltano, Airbyte
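
To give a feel for what an extract step emits, here is a hand-rolled sketch of a Singer-style message stream. It assumes the three message types described in the Singer specification (SCHEMA, RECORD, STATE) and deliberately does not use any Singer library; the function name, the permissive empty property schemas, and the "last_id" bookmark are assumptions made for illustration.

```python
import json


def singer_messages(stream, rows, key_properties):
    """Yield Singer-style JSON lines for one stream (expects at least one row)."""
    # SCHEMA message: describes the records that follow.
    # Empty property schemas ({}) accept any value; a real tap would be stricter.
    yield json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"type": "object", "properties": {key: {} for key in rows[0]}},
        "key_properties": key_properties,
    })
    # RECORD messages: one per extracted row.
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": stream, "record": row})
    # STATE message: a free-form bookmark so a later run could resume.
    # "last_id" is a hypothetical bookmark key.
    yield json.dumps({"type": "STATE", "value": {stream: {"last_id": rows[-1]["id"]}}})
```

A loader would read these lines from standard input and write the records to the warehouse. Printing each yielded line, for example with `for line in singer_messages("users", rows, ["id"]): print(line)`, is enough to pipe them into such a loader.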


Data Warehouse


A structured, non-volatile, single source of truth for all organizational data where we can store and query it all.

  • Paid: AWS Redshift, Google BigQuery, Snowflake

  • Free and Open Source Alternative: Apache Druid


Transformation and Modeling


Build documented models on top of the raw data so that it is easier to use.

  • Paid: Dataform, dbt (the hosted dbt Cloud; dbt Core is open source)

  • Free and Open Source Alternatives: Talend Open Studio, Apache NiFi
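
As a rough illustration of "documented models on top of raw data", the sketch below keeps each model's description next to its SQL and materializes the models as views. It is not how any of the tools above work internally; SQLite stands in for the warehouse, and the registry layout and all table and model names (raw_orders, stg_orders, big_orders, model_docs) are hypothetical.

```python
import sqlite3

# Hypothetical model registry: model name -> (description, SELECT statement).
# Later models may build on earlier ones.
MODELS = {
    "stg_orders": (
        "Orders with typed columns; one row per raw order.",
        "SELECT CAST(id AS INTEGER) AS order_id, CAST(total AS REAL) AS total "
        "FROM raw_orders",
    ),
    "big_orders": (
        "Orders with a total of at least 100.",
        "SELECT order_id, total FROM stg_orders WHERE total >= 100",
    ),
}


def build_models(conn, models):
    """Materialize each model as a view and store its description alongside."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_docs (model TEXT PRIMARY KEY, description TEXT)"
    )
    for name, (description, select_sql) in models.items():
        # Model names come from our own registry, so interpolating them is safe here.
        conn.execute(f"CREATE VIEW {name} AS {select_sql}")
        conn.execute(
            "INSERT OR REPLACE INTO model_docs VALUES (?, ?)", (name, description)
        )


def demo():
    """Load two raw orders, build the models, and query the final one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id TEXT, total TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [("1", "250"), ("2", "40")])
    build_models(conn, MODELS)
    return conn.execute("SELECT order_id, total FROM big_orders").fetchall()
```

Keeping the description in the same place as the SQL means the documentation cannot silently drift away from the model it describes.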


Orchestration


Software that schedules, runs, and orchestrates the jobs that move and process data.

  • Paid: Prefect.io

  • Free and Open Source Alternatives: Apache Airflow, Dagster
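
At its core, an orchestrator runs jobs in dependency order. The sketch below shows only that idea, using the standard library's graphlib module (Python 3.9+) rather than the API of any tool listed above; the task names and the run_pipeline helper are hypothetical, and real orchestrators add scheduling, retries, and monitoring on top.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task after the tasks it depends on; return the order used."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


log = []  # records which hypothetical job ran, in order

# Hypothetical jobs; each one only records that it ran.
TASKS = {
    "extract_load": lambda: log.append("extract_load"),
    "transform": lambda: log.append("transform"),
    "refresh_dashboard": lambda: log.append("refresh_dashboard"),
}

# Each key lists the tasks that must finish before it can start.
DEPENDENCIES = {
    "transform": {"extract_load"},
    "refresh_dashboard": {"transform"},
}
```

Calling `run_pipeline(TASKS, DEPENDENCIES)` runs the extract-and-load job first, then the transformation, then the dashboard refresh. A cycle in the dependencies makes graphlib raise a CycleError instead of running anything.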


Visualization and Analysis


Tools for exploring and interpreting data from different data sources.

  • Paid: Tableau, Microsoft Power BI, Grafana

  • Free and open source alternatives: Metabase, D3.js, dygraphs

This article: https://architect.pub/modern-data-stack-startups
Origin blog.csdn.net/jiagoushipro/article/details/131278712