The evolution of real-time computing data architecture

Traditional data architecture
The defining feature of the traditional monolithic data architecture is centralized data storage. Most such systems are divided into a compute layer and a storage layer.

The storage layer is mainly responsible for storing the data produced by the company's various systems, such as web business systems, order systems, CRM systems, ERP systems, and monitoring systems. Examples of this data include order volume, the number of active users on the website, and each user's transaction amounts.
All of these operations rely on a database.
A monolithic architecture is very efficient in its early stages, but over time more and more services go online and iterate quickly.
As services accumulate, the system gradually becomes bloated. The database becomes the single source of truth: every application has to access it to retrieve its data, so if the database changes or fails, the whole business system is affected.

Microservice architecture
A microservice architecture splits the system into independent service modules. Each module has its own database, and the services do not interfere with one another. This solves the problem of scaling the business system, but it also brings new problems.
Business data is scattered across different systems, which makes centralized data management difficult. For internal uses such as an enterprise data warehouse or data mining, the data in each business system's database has to be extracted, transformed, and loaded (ETL) into the data warehouse, where different data marts are built to serve applications.
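The ETL step described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the two in-memory SQLite databases stand in for per-service databases, and every table, column, and function name (`orders`, `customers`, `fact_orders`, `run_etl`, `revenue_by_region`) is hypothetical.

```python
import sqlite3

# Hypothetical per-service databases: each microservice owns its own store.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (user_id INTEGER, amount_cents INTEGER)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?)",
                      [(1, 1250), (2, 800), (1, 300)])

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "north"), (2, "south")])

# Hypothetical central warehouse with one fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (user_id INTEGER, region TEXT, amount REAL)")


def run_etl():
    """Full-refresh ETL: extract from both services, transform, load."""
    # Extract: read raw rows from each service's own database.
    orders = orders_db.execute(
        "SELECT user_id, amount_cents FROM orders").fetchall()
    regions = dict(crm_db.execute("SELECT id, region FROM customers").fetchall())

    # Transform: attach the customer's region and convert cents to units.
    rows = [(uid, regions.get(uid, "unknown"), cents / 100)
            for uid, cents in orders]

    # Load: replace the fact table contents so reruns do not duplicate rows.
    warehouse.execute("DELETE FROM fact_orders")
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()


def revenue_by_region():
    """A tiny 'data mart': revenue aggregated per region."""
    return dict(warehouse.execute(
        "SELECT region, SUM(amount) FROM fact_orders GROUP BY region"))
```

Calling `run_etl()` and then `revenue_by_region()` returns the per-region totals. The full-refresh load is chosen here only for simplicity; real pipelines often load incrementally.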

Big data architecture
At first, data warehouses were built on relational databases. As enterprise data volumes exploded, relational databases could no longer support storing and analyzing large datasets, so building an enterprise-class big data platform on Hadoop became the consensus.
Later, the high latency of offline (batch) processing increasingly failed to meet business needs. Applications with strict timeliness requirements, such as real-time reporting, need results with very low latency. For this reason the industry proposed the Lambda architecture to handle different types of data.

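The Lambda architecture contains a Batch Layer for batch computation and a Speed Layer for real-time computation, combining batch and stream computing in a single platform.
The Lambda architecture is an effective solution for building big data applications, but it is not a perfect one.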
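The split between the two layers can be illustrated with a toy page-view counter. This is a simplified sketch under stated assumptions: the class name `LambdaCounter` and its methods are hypothetical, everything lives in memory, and the merged read in `query` stands in for what is usually called the serving layer.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda-style counter: batch view + speed view, merged on read."""

    def __init__(self):
        self.master = []             # append-only master dataset
        self.batch_view = Counter()  # accurate, but only as fresh as the last batch run
        self.speed_view = Counter()  # covers events since the last batch run

    def ingest(self, page):
        # Every event is stored for the batch layer and also
        # applied incrementally by the speed layer.
        self.master.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        # Batch layer: recompute the view from the full master dataset.
        self.batch_view = Counter(self.master)
        # The batch view now covers those events, so reset the speed view.
        self.speed_view.clear()

    def query(self, page):
        # Merge both views to answer with low latency.
        return self.batch_view[page] + self.speed_view[page]
```

Note that the counting logic exists twice, once in `ingest` and once in `run_batch`. Keeping two code paths consistent is one commonly cited cost of this design.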

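Stateful streaming architecture
In essence, data is produced as a series of real events. The technologies used by the architectures described above, such as Hadoop and Spark, all violate this nature to some degree, because they process business data only after a certain delay.
A stateful stream computing architecture, by contrast, works on real-time streaming data and maintains the state of every computation. State here means all the intermediate results produced during computation: each time new data enters the streaming system, it is computed on top of that intermediate state, and the system ultimately produces correct statistical results.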

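The advantage of this architecture is that raw data does not have to be pulled out of external storage again for a full recomputation. Users also no longer need to coordinate various batch tools to fetch statistics from the data warehouse and then persist them. All of these operations can be done as stream operations.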
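The idea of computing each new event on top of intermediate state can be sketched as a per-key running average. This is an illustrative assumption, not the API of any particular stream processor: `RunningStats` and `StatefulAverager` are hypothetical names, and the state is held in a plain in-memory dictionary, whereas a real system would also persist it so that it survives failures.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Intermediate state kept per key: enough to derive the average."""
    count: int = 0
    total: float = 0.0


class StatefulAverager:
    """Processes one event at a time, updating state instead of rescanning history."""

    def __init__(self):
        self.state = {}  # key -> RunningStats

    def process(self, key, value):
        # Look up (or create) the intermediate result for this key.
        stats = self.state.setdefault(key, RunningStats())
        # Fold the new event into the existing state.
        stats.count += 1
        stats.total += value
        # Emit the up-to-date result without touching any past events.
        return stats.total / stats.count
```

Each call to `process` does constant work per event, which is what removes the need for a full recomputation from raw data.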

Origin www.cnblogs.com/nicekk/p/11546384.html