"Use the right tool for the job!"
This statement sounds simple, but it is hard to put into practice.
Early-stage startups find it difficult to choose among the many tools available in the ecosystem because it is hard to predict how their data will evolve.
The need for a modern data stack
Over the past 10 years, the software industry has grown in the following ways:
- Computing power: Public cloud providers such as AWS and Google Cloud offer huge computing power at standard market cost.
- Data sources: The rise of the Internet of Things ecosystem and smart devices has led to an exponential increase in the amount of data generated every day. By one widely cited estimate, about 1.7 MB of data was created every second for every person on the planet in 2020.
- Data literacy for business stakeholders: In the past, analysts dug through Excel spreadsheets by hand to extract insights from the data. Today, many BI tools help harness the power of data and surface those insights, which builds data literacy among business stakeholders.
- Open source adoption in data projects: Over the past 10 years, the open source community has grown tremendously. Many data tools (such as Apache Airflow, dbt, and Metabase) are thriving and growing in that community.
The transition from traditional ETL to modern ELT
Now that most enterprises rely on data-driven solutions, we are seeing a steady shift from legacy ETL (extract, transform, load) architectures to ELT (extract, load, transform) architectures.
Modern ELT processing is preferred over traditional ETL for the following reasons:
- Cloud storage and analytics services are now affordable and efficient.
- Traditional ETL pipelines are less flexible and cannot easily adapt to exponential data growth.
- Compared to traditional ETL, modern ELT loads data faster because there is no strict transformation phase before the data reaches the warehouse.
- Because no user-defined transformations are required up front, ELT tools are very good at simply plugging source data into the target system with minimal manual effort from the user.
- Analysts can use tools such as dbt to transform data inside the warehouse as needed, without having to decide in advance which insights and data types will be required.
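The load-first, transform-later idea can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database stands in for the warehouse; the table, view, and function names (`raw_orders`, `revenue_by_customer`, `load_raw`, `transform_revenue`) and the sample rows are hypothetical and not taken from any of the tools mentioned above.

```python
import sqlite3

# Hypothetical raw export: every value arrives as text, exactly as the source produced it.
RAW_ORDERS = [
    ("1", "alice", "20.50"),
    ("2", "bob", "5.00"),
    ("3", "alice", "9.50"),
]


def load_raw(conn, rows):
    """Load step: copy source rows into the warehouse without transforming them."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform_revenue(conn):
    """Transform step: runs later, inside the warehouse, as plain SQL."""
    conn.execute(
        """
        CREATE VIEW revenue_by_customer AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )
    query = "SELECT customer, revenue FROM revenue_by_customer ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load_raw(conn, RAW_ORDERS)          # E + L happen first, with no typing decisions
result = transform_revenue(conn)    # T happens afterwards, when the question is known
```

Because the raw table is kept untouched, a different question later only needs another view over `raw_orders`, not a change to the loading code.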
Adoption Strategies for Startups
As mentioned at the start of this post, startups cannot easily predict how their data will evolve, so their stack has to be able to adapt.
Therefore, early-stage startups should consider the following when selecting tools for their data stack:
- High adoption and awareness among other startups and customers.
- A good fit with the ELT model of the data stack.
- A database paradigm (e.g. structured, geospatial, entity-relationship, search engine) appropriate to the requirements of storing and querying the data generated by their domain and market.
- The availability of an equivalent open source alternative to paid SaaS tools.
Extract and Load
Collect data from all event sources, such as web, application, and backend services, and send it to the data warehouse.
- Paid SaaS tools: Stitch, Fivetran
- Free and open source alternatives: Singer, Meltano, Airbyte
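As a rough picture of what the extract side produces, the sketch below turns a handful of events into JSON lines loosely modeled on the SCHEMA, RECORD, and STATE message types described in the Singer specification. It is an assumption-laden illustration, not a complete or validated tap: the function name `to_singer_messages`, the `events` stream, the sample records, and the bookmark format are all hypothetical.

```python
import json


def to_singer_messages(stream, records, key_properties, bookmark):
    """Turn source records into Singer-style SCHEMA, RECORD and STATE messages."""
    # Declare every observed field with an empty (unconstrained) JSON schema.
    fields = sorted({name for record in records for name in record})
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"properties": {name: {} for name in fields}},
        "key_properties": key_properties,
    }
    # One RECORD message per source row, passed through unchanged.
    for record in records:
        yield {"type": "RECORD", "stream": stream, "record": record}
    # STATE carries a bookmark so that a later run can resume where this one stopped.
    yield {"type": "STATE", "value": {stream: bookmark}}


# Hypothetical events collected from two different sources.
events = [
    {"event_id": 1, "source": "web", "name": "page_view"},
    {"event_id": 2, "source": "backend", "name": "order_created"},
]

# One JSON document per line, ready to be piped to a loader process.
lines = [
    json.dumps(message)
    for message in to_singer_messages("events", events, ["event_id"], bookmark=2)
]
```

The point of the line-oriented format is that the extractor and the loader stay independent: any program that can read these lines can write them to a warehouse.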
Data Warehouse
A structured, non-volatile, single source of truth for all organizational data, where it can be stored and queried.
- Paid: AWS Redshift, Google BigQuery, Snowflake
- Free and open source alternative: Apache Druid
Transformation and Modeling
Create documented models from raw data so that the data is easier to use.
- Paid: Dataform, dbt Cloud (the hosted offering; dbt Core itself is open source)
- Free and open source alternatives: Talend Open Studio, Apache NiFi
Orchestration
Software for executing and orchestrating the jobs that process data flows.
- Paid: Prefect.io
- Free and open source alternatives: Apache Airflow, Dagster
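At its core, an orchestrator runs jobs in dependency order. The sketch below shows only that idea, using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). It does not use the Airflow, Dagster, or Prefect APIs, and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task once, only after all of its upstream tasks have finished."""
    # dependencies maps a task name to the set of tasks that must run before it.
    order = list(TopologicalSorter(dependencies).static_order())
    log = [tasks[name]() for name in order]
    return order, log


# Hypothetical jobs; real ones would call the loader, the models, and the BI tool.
tasks = {
    "extract_load": lambda: "raw events loaded",
    "transform": lambda: "models rebuilt",
    "refresh_dashboard": lambda: "dashboard refreshed",
}
dependencies = {
    "transform": {"extract_load"},
    "refresh_dashboard": {"transform"},
}

order, log = run_pipeline(tasks, dependencies)
```

Real orchestrators add what this sketch leaves out, such as scheduling, retries, and monitoring, which is why a dedicated tool is usually preferable to a hand-written runner.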
Visualization and Analysis
Tools for understanding and interpreting data from different data sources.
- Paid: Tableau, Microsoft Power BI, Grafana
- Free and open source alternatives: Metabase, D3.js, dygraphs
This article: https://architect.pub/modern-data-stack-startups