Data warehouse architecture based on Alibaba Cloud


Product comparison

Alibaba Cloud product | Similar products | Brief introduction
RDS | MySQL, PostgreSQL | Relational database service provided by Alibaba Cloud, available in several editions such as MySQL, PostgreSQL, and SQL Server.
DTS | Canal, DataX, Sqoop, Flume | Data transmission service with a rich feature set: data migration, data subscription, and real-time data synchronization for RDBMS, NoSQL, big data, and other products.
DataHub | Kafka | Data bus; its main function is similar to Kafka's, but it offers more interfaces and features.
MaxCompute | Hadoop | General-purpose offline computing platform (formerly ODPS) supporting SQL, MapReduce, UDF, Graph, Spark on MaxCompute, and other computing models. Fuxi is the scheduling system and Pangu is the storage system.
RealtimeCompute | Spark, Flink | Real-time computing framework (previously StreamCompute), built on Blink.
DataWorks | - | Visual one-stop big data workshop covering data integration, development, management, services, quality, security, and other functions; designed to make MaxCompute and RealtimeCompute easier to use.
AnalyticDB | Greenplum, LibrA | Analytical database based on an MPP architecture, available in MySQL and PostgreSQL editions.
DataV | Tableau, PowerBI | Data visualization tool, mainly for big-screen displays.
QuickBI | Tableau, PowerBI | More flexible than DataV; mainly for data analysis, and used mostly by operations staff and analysts.

Offline data warehouse

  • Architecture design
    (Figure: offline data warehouse architecture)
  • Explanation
    • Raw data comes mainly from two sources
      • User behavior log data generated by the servers
      • Data generated by the business databases
      • Other data can of course be imported as well, such as web crawler data, purchased market data, and so on
    • Data import
      • Log data is imported into DataHub with Flume (TailDirSource + MemoryChannel + DataHubSink)
      • Business data is synchronized directly into MaxCompute using the platform's own synchronization capabilities
    • Data warehouse construction, which needs to be divided into multiple layers
      • ODS (raw data layer) - Raw data with only the simplest format checks and compression applied
      • DWD (detail data layer) - Detail-level data; various ETL steps (cleaning, extraction, splitting, dimension reduction) produce entity tables, dimension tables, and fact tables
      • DWS (summary data layer) - Lightly aggregates the detail data and produces preliminary summaries of various statistical indicators, so the application layer can use them directly
      • ADS (application data layer) - The final results, including all the indicators ultimately required; they also need to be imported into a relational database so the Web front end can query them easily
    • Analysis database
      • For this part you can choose AnalyticDB, RDS, or a self-built relational database; its main purpose is to make queries from downstream systems convenient
      • If the data volume and the amount of analysis are small, RDS or a self-built relational database is enough
      • If business needs cause the analysis requirements to change frequently, AnalyticDB is recommended
    • Data presentation
      • Depending on your needs, choose Alibaba Cloud QuickBI or build a custom Web interface to display the data
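
The ODS, DWD, DWS, and ADS layers described above can be illustrated with a small, self-contained sketch. This is not MaxCompute code: the log format, the field names (user_id, event, ts), and the indicators (active users, clicks per day) are all assumptions made for illustration, and plain Python structures stand in for the warehouse tables.

```python
import json
from collections import defaultdict

# ODS: hypothetical raw log lines, stored as-is (format is an assumption).
ODS_LOGS = [
    '{"user_id": "u1", "event": "click", "ts": "2020-03-26 10:00:00"}',
    '{"user_id": "u1", "event": "view", "ts": "2020-03-26 10:05:00"}',
    '{"user_id": "u2", "event": "click", "ts": "2020-03-26 11:00:00"}',
    'not-json',  # malformed line, dropped during cleaning
    '{"user_id": "", "event": "click", "ts": "2020-03-26 12:00:00"}',  # empty user
    '{"user_id": "u2", "event": "click", "ts": "2020-03-27 09:00:00"}',
]


def build_dwd(ods_lines):
    """DWD: parse and clean raw lines into detail records."""
    rows = []
    for line in ods_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(rec, dict):
            continue
        if not rec.get("user_id") or not rec.get("event") or not rec.get("ts"):
            continue  # skip records with missing fields
        rows.append({
            "user_id": rec["user_id"],
            "event": rec["event"],
            "dt": rec["ts"][:10],  # derive the date partition from the timestamp
        })
    return rows


def build_dws(dwd_rows):
    """DWS: light aggregation, event counts per (dt, user_id, event)."""
    counts = defaultdict(int)
    for r in dwd_rows:
        counts[(r["dt"], r["user_id"], r["event"])] += 1
    return dict(counts)


def build_ads(dws_counts):
    """ADS: final per-day indicators, ready to export to a relational database."""
    users = defaultdict(set)
    clicks = defaultdict(int)
    for (dt, user_id, event), n in dws_counts.items():
        users[dt].add(user_id)
        if event == "click":
            clicks[dt] += n
    return {dt: {"active_users": len(users[dt]), "clicks": clicks[dt]} for dt in users}


if __name__ == "__main__":
    ads = build_ads(build_dws(build_dwd(ODS_LOGS)))
    for dt in sorted(ads):
        print(dt, ads[dt])
```

Each function reads only from the layer directly below it, which is the point of the layering: the ADS indicators can be recomputed from DWS without touching the raw logs again.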

Real-time data warehouse

  • Architecture design
    (Figure: real-time data warehouse architecture)
  • Explanation
    • Raw data comes mainly from two sources
      • User behavior log data generated by the servers
      • Data generated by the business databases
    • Data import
      • Log data is imported into DataHub with Flume (TailDirSource + MemoryChannel + DataHubSink)
      • Real-time business data needs to be imported into DataHub with DTS
    • Data warehouse construction, for which the Kappa architecture can be used (it collapses the two pipelines of the traditional Lambda architecture into one, which lowers maintenance cost)
      • Raw data first enters DataHub; RealtimeCompute then cleans and joins it to produce real-time detail data
      • Real-time detail data goes back into DataHub; RealtimeCompute then applies light and heavy aggregation to produce real-time summary data
      • Real-time summary data goes into DataHub (or directly into the analysis database) and is then imported into AnalyticDB
    • Analysis database (the recommendations are the same as for the offline part, but AnalyticDB is recommended even more strongly here)
      • This part receives the summary data from the upstream DataHub
        • Statistical results can then be generated directly in the database and displayed as application-layer data
        • Or the data is handed to downstream self-service analysis applications (for cases where the analysis needs keep changing)
    • Data presentation
      • This part is the same as for the offline data warehouse, but the real-time part usually drives large-screen displays of various statistical indicators, for which Alibaba Cloud DataV can be used directly
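
The single-pipeline flow described above (raw data, then detail data, then summary data, then the analysis database) can be sketched as follows. This is only an in-memory illustration, not DataHub or RealtimeCompute code: each deque stands in for a DataHub topic, each function stands in for a streaming job, a dict stands in for an AnalyticDB table, and the user-to-city dimension data is hypothetical.

```python
from collections import Counter, deque

# Hypothetical dimension data used by the join step.
USER_CITY = {"u1": "Hangzhou", "u2": "Beijing"}


def clean_and_join(src, dst):
    """Job 1: drop events from unknown users and enrich the rest with a city."""
    while src:
        ev = src.popleft()
        if ev.get("user_id") not in USER_CITY:
            continue  # cleaning: discard events that cannot be joined
        dst.append({**ev, "city": USER_CITY[ev["user_id"]]})


def aggregate(src, dst, state):
    """Job 2: keep running counts per (city, event) and emit each updated row."""
    while src:
        ev = src.popleft()
        key = (ev["city"], ev["event"])
        state[key] += 1
        dst.append({"city": key[0], "event": key[1], "count": state[key]})


def load_to_analysis_db(src, table):
    """Sink: upsert the latest summary row per key into the stand-in table."""
    while src:
        row = src.popleft()
        table[(row["city"], row["event"])] = row["count"]


def run_pipeline(events):
    """Push events through raw -> detail -> summary -> analysis table."""
    raw_topic, detail_topic, summary_topic = deque(events), deque(), deque()
    state, table = Counter(), {}
    clean_and_join(raw_topic, detail_topic)
    aggregate(detail_topic, summary_topic, state)
    load_to_analysis_db(summary_topic, table)
    return table


if __name__ == "__main__":
    sample = [
        {"user_id": "u1", "event": "click"},
        {"user_id": "u2", "event": "click"},
        {"user_id": "u1", "event": "click"},
        {"user_id": "u9", "event": "click"},  # unknown user, dropped by job 1
    ]
    print(run_pipeline(sample))
```

Because every stage reads from one queue and writes to the next, there is only one code path to maintain, which is the maintenance advantage the Kappa architecture is chosen for here.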
Origin blog.csdn.net/alionsss/article/details/105130469