Big Data Weekly Meeting - Summary of This Week's Learning (No. 015)

Meeting time: 2023-05-28, 15:30, offline meeting

Table of contents

01 [fhzny Project]

02 [Spark]

03 [Research: Data Warehouse Construction]

3.1 [Data Warehouse Construction: Flowchart, Architecture Diagram, Usage Scenarios]

Scenario Selection

Component Design

Build Process

04 [Patent]

05 [Tutor Comments]


01 [fhzny Project]

  1. GitLab
  2. MyBatis-Plus
  3. Spring Boot; itheima "Ruiji Takeout" project tutorial videos
  4. Algorithm module and image module code
  5. Docker

02 [Spark]

Spark SQL

03 [Research: Data Warehouse Construction]

A five-minute share on "data warehouse construction": flowchart, architecture diagram, and usage scenarios.

Data warehouse construction (scenarios: real-time and offline; components; process) (second week) [metadata management, master data]

3.1 [Data Warehouse Construction: Flowchart, Architecture Diagram, Usage Scenarios]

A data warehouse is a storage system for integrating, managing, and analyzing an organization's internal and external data. Building one involves many aspects, including scenario selection (real-time vs. offline), component design, and the construction process. The following is an overview of a common construction approach.

Scenario Selection

  1. Real-time scenario: suited to cases that need quick access to the latest data for real-time analysis and decision-making. This usually involves stream processing and streaming computation, and requires low latency and high throughput.
  2. Offline scenario: suited to batch analysis of historical data and decision support. This usually uses batch jobs and offline computation, and can handle large-scale data sets.

Component Design

  1. Data extraction (Extraction): extract data from various sources (databases, log files, APIs) and perform the cleaning and transformation needed to meet the data warehouse's requirements.
  2. Data storage (Storage): select an appropriate storage technology and architecture, such as a relational database, a columnar database, or a distributed file system, to store the warehouse's data.
  3. Data transformation and integration (Transformation and Integration): transform and integrate the extracted data for analysis, including data cleansing, format conversion, and field mapping.
  4. Data loading (Loading): load the transformed and integrated data into the warehouse, ensuring data integrity and consistency. Either bulk loading or incremental loading can be used.
  5. Data modeling (Modeling): design and create the warehouse's logical models, including dimension tables and fact tables. This provides users with a friendly way to access and analyze the data.
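The extract-transform-load flow described above can be sketched in a few lines. This is a minimal illustration only: the source rows, field names, and the use of an in-memory SQLite database to stand in for the warehouse are all assumptions for the example, not part of the meeting notes.

```python
# Minimal ETL sketch: extract raw rows, clean/convert them, load into a
# SQLite table standing in for the data warehouse. All names are hypothetical.
import sqlite3

def extract():
    # Extraction: pretend these rows came from a source database or API.
    return [
        {"user": " Alice ", "amount": "120.5", "day": "2023-05-01"},
        {"user": "Bob",     "amount": "80",    "day": "2023-05-02"},
    ]

def transform(rows):
    # Cleaning and type conversion: trim strings, cast amounts to float.
    return [(r["user"].strip(), float(r["amount"]), r["day"]) for r in rows]

def load(conn, rows):
    # Loading: bulk insert into the warehouse table.
    conn.execute("CREATE TABLE IF NOT EXISTS sales (user TEXT, amount REAL, day TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 200.5
```

In practice the load step would also distinguish bulk loads from incremental loads, as noted in item 4 above.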

Build Process

  1. Requirements analysis: clarify business requirements and data-analysis goals, and determine the types and sources of data to be collected and analyzed.
  2. Data source identification and access: determine which data sources need to be accessed, and formulate the corresponding access strategies and technical solutions.
  3. Data extraction and cleaning: implement the extraction and cleaning process to ensure data accuracy and consistency.
  4. Data transformation and integration: transform and integrate the cleaned data into a unified data model.
  5. Data storage and loading: choose the appropriate storage technology and loading method, and load the integrated data into the warehouse.
  6. Data modeling and optimization: design and create the warehouse's logical model, and model and optimize the data to meet users' query and analysis needs.
  7. Data access and analysis: provide users with a friendly way to access and analyze the data.
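Steps 6 and 7 above come together in a star schema: a fact table joined to dimension tables for user-facing queries. The sketch below is illustrative only; the table and column names are invented for the example and SQLite stands in for the warehouse engine.

```python
# A small star-schema sketch: one fact table plus one dimension table,
# queried with a join and an aggregation, as a BI tool would.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, qty INTEGER, price REAL);
INSERT INTO dim_product VALUES (1, 'book'), (2, 'food');
INSERT INTO fact_sales  VALUES (1, 2, 10.0), (1, 1, 10.0), (2, 5, 2.0);
""")
rows = conn.execute("""
    SELECT d.category, SUM(f.qty * f.price) AS revenue
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category ORDER BY d.category
""").fetchall()
print(rows)  # [('book', 30.0), ('food', 10.0)]
```

Analysts then query the dimension columns (here `category`) while the warehouse aggregates over the fact rows.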

04 [Patent]

Build a tool that guarantees full consistency between MySQL and ES data.

On Linux, set up a three-node ES cluster.

Create a data table in MySQL that contains a JSON-type field; after inserting the rows from MySQL into ES, the JSON data from MySQL appears in ES as structured documents.
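One detail when moving such rows: MySQL drivers typically return a JSON column as a string, so it should be parsed before indexing so ES stores a nested object rather than raw text. The sketch below shows only that transformation step in plain Python; the row shape and field names are hypothetical, and the actual MySQL fetch and ES bulk-index calls are omitted.

```python
# Convert a fetched MySQL row into an ES-ready document, parsing any
# JSON-typed columns (returned by the driver as strings) into dicts.
import json

def row_to_es_doc(row, json_fields=("profile",)):
    doc = dict(row)
    for field in json_fields:
        if isinstance(doc.get(field), str):
            # Parse the JSON string so ES indexes a nested object.
            doc[field] = json.loads(doc[field])
    return doc

# Hypothetical row as fetched from MySQL (the JSON column arrives as a string).
row = {"id": 1, "name": "alice", "profile": '{"age": 20, "tags": ["vip"]}'}
doc = row_to_es_doc(row)
print(doc["profile"]["age"])  # 20
```

The resulting `doc` can then be passed to an ES client's index/bulk API, where the nested `profile` object becomes queryable fields.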

05 [Tutor Comments]

Model innovation: transforming and innovating on existing models, and finding new application scenarios.

  1. Algorithm innovation
  2. Scenario innovation

MySQL -> ES

Canal records data changes; alternatively, add triggers that write to another table whenever the source table changes.
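The trigger approach can be sketched concretely: a trigger on the source table writes each change into a change-log table that a sync job can consume. The example below uses SQLite for a runnable illustration; MySQL trigger syntax differs slightly, and the table names are invented for the sketch.

```python
# Trigger-based change capture sketch: INSERT/UPDATE on the source table
# automatically append a row to a change-log table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE orders_changelog (order_id INTEGER, op TEXT);
CREATE TRIGGER orders_ins AFTER INSERT ON orders
  BEGIN INSERT INTO orders_changelog VALUES (NEW.id, 'INSERT'); END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders
  BEGIN INSERT INTO orders_changelog VALUES (NEW.id, 'UPDATE'); END;
""")
conn.execute("INSERT INTO orders VALUES (1, 'new')")
conn.execute("UPDATE orders SET status = 'paid' WHERE id = 1")
log = conn.execute("SELECT * FROM orders_changelog").fetchall()
print(log)  # [(1, 'INSERT'), (1, 'UPDATE')]
```

Unlike Canal, this requires no binlog configuration change, but it does add write overhead to every statement on the source table.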

Canal requires changing configuration files, and many companies do not allow configuration files to be changed.

We need an algorithmic way to quickly determine whether the number of data items on both sides is the same.

ES -> ES

Flink CDC requires changing the binlog configuration, which enterprises do not allow.

Data backup from ES to ES: fast localization via binary search, plus count checks.
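The "binary search + count check" idea mentioned above can be sketched as follows: compare record counts over an id range on both sides; where the counts differ, split the range and recurse, narrowing down to the missing ids. This is only an illustrative sketch under assumed conditions (contiguous numeric ids, count queries available on both stores); the count functions below are toy stand-ins, not real MySQL/ES queries.

```python
def find_missing_ranges(count_src, count_dst, lo, hi, out):
    # count_src / count_dst return how many records fall in the id range [lo, hi].
    if count_src(lo, hi) == count_dst(lo, hi):
        return  # counts match: no mismatch in this range
    if lo == hi:
        out.append(lo)  # a single id is missing or extra
        return
    mid = (lo + hi) // 2
    # Counts differ: split the range and recurse into both halves.
    find_missing_ranges(count_src, count_dst, lo, mid, out)
    find_missing_ranges(count_src, count_dst, mid + 1, hi, out)

# Toy stand-ins for count queries over an id range on source and target.
src = set(range(1, 101))          # source has ids 1..100
dst = src - {17, 58}              # target lost two rows
cs = lambda lo, hi: sum(lo <= i <= hi for i in src)
cd = lambda lo, hi: sum(lo <= i <= hi for i in dst)

missing = []
find_missing_ranges(cs, cd, 1, 100, missing)
print(missing)  # [17, 58]
```

Each mismatch costs O(log N) count queries to localize, instead of comparing every record one by one.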

Cluster data backup

Data backup, migration, and data changes for college-entrance-exam big data.

System, Analysis Model, xxx.


Origin blog.csdn.net/weixin_44949135/article/details/130914859