Data Governance Professional Certification CDMP Study Notes (Mind Map and Knowledge Points) - Chapter 11: Data Warehouse and Business Intelligence

Hello everyone, I am Dugufeng, a former port coal worker, now the head of big data at a state-owned enterprise and the operator of the official account "Big Data Flow". Over the last two years, driven by company needs and the broader development of big data, I began studying data governance.

Data governance requires systematic study to truly master, and a professional certification exam helps demonstrate one's learning ability and command of the subject. If you have questions about data governance or the CDMP certification, please refer to my earlier articles for a detailed introduction.

5000 words explain how to get started with data governance (with international data governance certification exam-CDMP study group)

What exactly is CDMP - a super-comprehensive introduction to the international certification of data governance

Total text: 5935 words 12 pictures

Estimated reading time: 15 minutes

This document is compiled from learning materials related to data governance and organized as study notes (mind map and knowledge points) for the data governance professional certification CDMP.

The article is long; it is recommended that you bookmark it before reading.

For subsequent documents, please follow the official account "Big Data Flow", which will continue to be updated~

This document covers data warehousing and business intelligence and is divided into 5 parts.

Due to page display limits, some levels cannot be fully expanded. The structure is shown in the figure below.

[Figure: document structure mind map]

1. Overview of Data Warehousing and Business Intelligence

Data Warehouse (DW): Data warehousing started in the 1980s, matured in the 1990s, and has since developed together with Business Intelligence (BI) as a primary driver of business decision-making. It enables organizations to integrate data from different sources into a common data model; the integrated data provides insight into business operations and opens up new possibilities for enterprise decision support and the creation of organizational value.

A data warehouse provides a way to reduce data redundancy, improve information consistency, and enable businesses to use data to make better decisions.

The data warehouse is recognized as the core of enterprise data management.

Business Drivers : Operational support functions, compliance needs, and business intelligence activities.

[Figure: business drivers]

The goals of data warehouse construction : 1) Support business intelligence activities. 2) Empower business analysis and efficient decision-making. 3) Find innovative methods based on data insights.

Data warehouse construction should follow these principles: 1) Focus on business goals. 2) Begin with the end in mind. 3) Think and design globally; act and build locally. 4) Summarize and optimize continuously, rather than trying to get everything right from the start. 5) Promote transparency and self-service. 6) Build metadata alongside the data warehouse; the key to DW success is the ability to interpret the data accurately. 7) Collaborate: coordinate with other data activities, especially data governance, data quality, and metadata management. 8) Don't be one-size-fits-all: provide the right tools and products for each type of data consumer.

For ease of understanding, the mind map for this part is organized as follows:

[Mind map: Overview of Data Warehousing and Business Intelligence]

2. Basic concepts

1. Business intelligence

Business Intelligence: In its first sense, business intelligence refers to a type of data analysis activity aimed at understanding organizational needs and identifying opportunities; the results of the analysis are used to improve the success of organizational decisions. In its second sense, business intelligence refers to the collection of technologies that support such data analysis activities.

Business + Technology.

2. Data Warehouse

Data Warehouse: An integrated decision support database and the associated software programs used to collect, cleanse, transform, and store data from a variety of operational and external sources.

A data warehouse is broadly defined to include any data storage or retrieval operation that provides data for the realization of business intelligence objectives.

An enterprise data warehouse (EDW) is a centralized data warehouse.

A data mart is a copy of a subset of data in a data warehouse.

3. Data warehouse construction

Data warehouse construction covers the operational processes of extracting, cleansing, transforming, controlling, and loading data into the warehouse. The focus of the construction process is to provide an integrated, historical business context over operational data by enforcing business rules and maintaining appropriate business data relationships. It also includes processes for interacting with the metadata repository. Construction in the traditional sense has focused on structured data.

4. Data warehouse construction method

The method of building a data warehouse. Two thought leaders, Bill Inmon and Ralph Kimball, approach data warehouse modeling with normalized (relational) modeling and dimensional modeling, respectively.

Bill Inmon, in "Building the Data Warehouse", defines a data warehouse as a subject-oriented, integrated, time-variant, and non-volatile collection of data used to support management decision-making.

Ralph Kimball, in "The Data Warehouse Toolkit", advocates a bottom-up approach (building data marts first, then assembling them into the data warehouse) and defines the data warehouse as "a copy of transaction data specifically structured for query and analysis".

The core ideas they share are similar: 1) The data stored in the data warehouse comes from other systems. 2) Data is integrated in a way that increases its value. 3) The warehouse makes data easy to access and analyze. 4) Everything is done so that authorized stakeholders can access reliable, integrated data. 5) Construction purposes cover workflow support, operations management, and predictive analytics.

5. Enterprise Information Factory (Inmon)

The Corporate Information Factory (CIF) is one of the data warehouse construction models. Inmon characterizes the data warehouse by how it differs from business (operational) systems:

1. Subject-oriented. 2. Integrated. 3. Time-variant (changes over time). 4. Stable (non-volatile). 5. Contains both aggregated and detailed data. 6. Historical.

The Corporate Information Factory (CIF) consists of: 1) Applications. 2) Staging area. 3) Integration and transformation. 4) Operational Data Store (ODS). 5) Data marts. 6) Operational Data Mart (OpDM): a data mart focused on operational decision support; it obtains data directly from an operational data store rather than from the data warehouse, and shares the ODS's characteristics of containing current or recent, frequently changing data. 7) Data warehouse: data flows one way, into the data marts. 8) Operational reports: reports produced from the data stores. 9) Reference data, master data, and external data.

[Figure: Corporate Information Factory components]

6. Multidimensional data warehouse (Kimball)

Multidimensional data warehouse (Kimball): the star schema consists of fact tables (containing quantitative data about business processes, such as sales data) and dimension tables (storing descriptive attributes related to the fact data, which answer data consumers' questions about the facts, such as how much of product X was sold this quarter). A fact table is associated with many dimension tables, so the overall structure looks like a star.

[Figure: star schema example]
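To make the star schema concrete, here is a minimal, runnable sketch using SQLite; the table and column names (dim_product, fact_sales, and so on) are illustrative assumptions, not taken from the source text:

```python
# Minimal star-schema sketch (hypothetical tables and columns) using SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: descriptive attributes used to filter and group facts.
cur.execute("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_id   TEXT,                  -- business key from the source system
    product_name TEXT,
    category     TEXT
)""")

# Fact table: quantitative measures, linked to dimensions by surrogate keys.
cur.execute("""
CREATE TABLE fact_sales (
    date_key    INTEGER,
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER,
    quantity    INTEGER,
    amount      REAL
)""")

cur.execute("INSERT INTO dim_product VALUES (1, 'X', 'Product X', 'Widgets')")
cur.execute("INSERT INTO fact_sales VALUES (20240331, 1, 10, 5, 49.5)")

# "How much of product X was sold this quarter?" -- join fact to dimension.
cur.execute("""
SELECT d.product_name, SUM(f.quantity) AS units, SUM(f.amount) AS revenue
FROM fact_sales f JOIN dim_product d ON f.product_key = d.product_key
WHERE d.product_id = 'X' AND f.date_key BETWEEN 20240101 AND 20240331
GROUP BY d.product_name
""")
print(cur.fetchall())   # [('Product X', 5, 49.5)]
```

The query at the end answers the "how much of product X was sold this quarter" question by joining the fact table to its dimension.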

The data warehouse bus matrix shows the intersections between the business processes that generate fact data and the data subject areas that represent dimensions. It is technology-independent, represents the long-term data content requirements of the data warehouse/BI system, and helps the organization scope manageable development work.
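As a rough illustration, the small script below prints a hypothetical bus matrix (the processes and dimensions are invented for the example); each "X" marks a business process that uses a conformed dimension:

```python
# A tiny, hypothetical bus matrix: rows are business processes (fact sources),
# columns are conformed dimensions; "X" marks which dimensions each process uses.
processes = {
    "Sales":       {"Date", "Product", "Store", "Customer"},
    "Inventory":   {"Date", "Product", "Store"},
    "Procurement": {"Date", "Product", "Supplier"},
}
dimensions = sorted(set().union(*processes.values()))

header = "Process".ljust(12) + " | " + " | ".join(d.ljust(8) for d in dimensions)
print(header)
print("-" * len(header))
for proc, dims in processes.items():
    row = " | ".join(("X" if d in dims else " ").ljust(8) for d in dimensions)
    print(proc.ljust(12) + " | " + row)
```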

Kimball's dimensional data warehouse is more extensible than Inmon's; in Kimball's view the data warehouse includes all the components of the data staging and data presentation areas.

Kimball's data warehouse is divided into four parts: business source systems, the data staging area, the data presentation area, and data access tools.

1. Business source system. Operational, transactional applications in the enterprise.

2. Data staging area. Includes the processes needed to integrate and transform data in preparation for presentation.

3. Data presentation area. Similar to the data marts in the Corporate Information Factory.

4. Data access tools. Focus on the data needs of end users.

7. Data Warehouse Architecture Components

A data warehouse environment consists of a series of architectural components organized to meet the needs of the enterprise.

1. Source system

Includes the business systems and external data that flow into the data warehouse and business intelligence environment.

[Figure: data warehouse/BI architecture components]

2. Data integration

Data integration includes extraction, transformation and loading.
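As a rough illustration of extraction, transformation, and loading, here is a minimal Python sketch; the source file orders_export.csv, its columns, and the fact_orders target table are hypothetical assumptions, not from the source text:

```python
# Minimal extract-transform-load sketch (file, columns, and target table are assumed).
import csv
import sqlite3

def extract(path):
    """Extract: read raw rows from an operational export file."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Transform: cleanse values and enforce simple business rules."""
    for row in rows:
        amount = row.get("amount", "").strip()
        if not amount:              # drop rows that fail a basic quality rule
            continue
        yield {
            "order_id": row["order_id"].strip(),
            "amount": round(float(amount), 2),
            "region": row.get("region", "UNKNOWN").upper(),
        }

def load(rows, conn):
    """Load: write conformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (order_id TEXT, amount REAL, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO fact_orders VALUES (:order_id, :amount, :region)", rows
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")
    load(transform(extract("orders_export.csv")), conn)
```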

3. Data storage area

The data storage areas include: 1) Staging area: an intermediate data store between the original data sources and the centralized data repository. 2) Conformed dimensions for reference and master data. 3) Central data warehouse. Design elements of the data structure include: ① the relationship between business keys and surrogate keys, designed with performance in mind; ② indexes and foreign keys created to support the dimension tables; ③ Change Data Capture (CDC) techniques for detecting, maintaining, and storing history. 4) Operational Data Store (ODS): contains data for a window of time rather than the full history, and therefore can be refreshed more frequently than the data warehouse. 5) Data marts: target a specific subject area, a single department, or a single business process. 6) Data cubes.

8. How to load data

1. Historical data

Handling historical data: 1. The Inmon-style data warehouse suggests storing all data in a single data warehouse layer; cleansed, standardized, and governed atomic-level data is stored in this layer. 2. The Kimball-style data warehouse proposes that the warehouse be composed of departmental data marts containing cleansed, standardized, and governed data; the data marts store history at the atomic level, while conformed dimension and fact tables provide enterprise-level information. 3. Data Vault also cleanses and standardizes data as part of the staging process; history is stored in a normalized atomic structure, and for each dimension a surrogate key, a primary key, and an alternate key are defined.

2. Batch change data capture

Batch change data capture. Data warehouses are often loaded during a nightly batch window. Because different source systems may require different change-capture techniques, the loading process can include several kinds of change detection.

The differences between various change data capture techniques are compared in the figure below.

[Figure: comparison of change data capture techniques]
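One common change-capture technique is incremental extraction keyed on a "last updated" timestamp. The sketch below illustrates the idea under assumed table and column names; real implementations may instead rely on log scraping, triggers, or full-table comparison:

```python
# Sketch of timestamp-based change data capture (all names are hypothetical).
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (customer_id TEXT, name TEXT, updated_at TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("C1", "Alice", "2024-03-29T10:00:00"),
     ("C2", "Bob",   "2024-03-31T08:30:00")],
)

def extract_changes(conn, last_watermark):
    """Return rows changed since the previous load, plus the new watermark."""
    rows = conn.execute(
        "SELECT customer_id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# Each nightly batch picks up only the rows changed since the stored watermark.
rows, watermark = extract_changes(source, "2024-03-30T00:00:00")
print(rows, watermark)   # [('C2', 'Bob', '2024-03-31T08:30:00')] 2024-03-31T08:30:00
```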

Near-real-time and real-time data loading: 1) Trickle loading (source-side accumulation): unlike the nightly batch window, trickle loads run batch loads on a more frequent schedule or when a threshold is reached. 2) Messaging (bus accumulation): small data packets are published to a message bus, and target systems subscribe to the bus. 3) Streaming (target-side accumulation): the target system collects data in a buffer or queue and processes it in order.
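A minimal sketch of the third pattern (target-side accumulation), with an invented event format: arriving records are buffered in a queue and flushed to the warehouse once a threshold is reached or the window closes.

```python
# Target-side accumulation sketch: buffer incoming events, flush in arrival order.
from collections import deque

class StreamingLoader:
    def __init__(self, flush_threshold=3):
        self.buffer = deque()
        self.flush_threshold = flush_threshold
        self.warehouse = []           # stand-in for the target table

    def on_event(self, event):
        """Called by the message bus / stream for each arriving record."""
        self.buffer.append(event)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffered records to the target in arrival order."""
        while self.buffer:
            self.warehouse.append(self.buffer.popleft())

loader = StreamingLoader()
for evt in [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]:
    loader.on_event(evt)
loader.flush()                        # drain any remainder at the end of the window
print(loader.warehouse)
```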

For ease of understanding, the mind map for this part is organized as follows:

[Mind map: Basic Concepts]

3. Activities

[Activity 1] Understand the needs.

Start from the business goals and business strategy, identify the business area involved, and frame the scope. Then identify and interview the relevant business people to understand what they want to do and why, and record their concerns and how they categorize the information they use.

Develop Vision + Align Business Strategy + Valuable Needs

[Activity 2] Define and maintain data warehouse/business intelligence architecture.

1. Define the data warehouse/business intelligence technical architecture. It should support transaction-level and operational-level reporting requirements from atomic data. A good prototype can quickly prove or disprove whether a key requirement can be met, avoiding over-investment in a particular technology or architecture.

2. Define data warehouse/BI management processes. Manage production by coordinating and integrating maintenance processes with regular releases to the business teams. Establish an effective release process and ensure management understands that this is a proactive, data-product-centered process, not reactive problem-solving on installed products.

[Activity 3] Develop data warehouse and data mart.

A data warehouse/business intelligence construction project has three concurrent build tracks:

1) Data. Data necessary to support business analysis. Identify best sources, design rules, handle unexpected data.

2) Technology. Back-end systems and processes that support data storage and migration.

3) Business intelligence tools.

Key work includes: 1. Map sources to targets, documenting the lineage of each data element back to its source system. The hardest part of any mapping effort is determining valid links or equivalences between data elements across multiple systems. 2. Remediate and transform data. To reduce the complexity of the target system, the source systems should be responsible for data remediation and for ensuring the data is correct.
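A toy source-to-target mapping might be recorded as structured metadata like the following; the systems, fields, and rules are invented for illustration only:

```python
# Hypothetical source-to-target mapping entries (lineage metadata).
mapping = [
    {"target": "fact_orders.amount",    "source": "erp.orders.total_amt",
     "rule": "round to 2 decimals"},
    {"target": "fact_orders.region",    "source": "crm.accounts.region_cd",
     "rule": "uppercase; map legacy codes to standard codes"},
    {"target": "dim_customer.cust_key", "source": "crm.accounts.account_id",
     "rule": "assign surrogate key on first load"},
]

def lineage_of(target_field):
    """Answer 'where does this column come from?' for root-cause analysis."""
    return [m for m in mapping if m["target"] == target_field]

print(lineage_of("fact_orders.region"))
```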

[Activity 4] Load the data warehouse.

The most work-intensive part is data preparation and preprocessing.

When deciding on a data loading approach, the key factors are the latency requirements of the data warehouse and data marts, source availability, batch windows or upload intervals, target databases, and time-frame consistency. Data quality processing, the time needed to perform transformations, late-arriving dimensions, and data rejection must also be addressed. Another consideration is the change data capture process: detecting data changes in the source systems, integrating those changes, and aligning them over time.
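To illustrate two of these loading concerns, here is a small sketch (with invented keys and rows) of routing rows that fail a quality check to a reject area and assigning a placeholder key to facts whose dimension member has not yet arrived:

```python
# Sketch of reject handling and a late-arriving dimension placeholder key.
UNKNOWN_KEY = -1
dim_product = {"P1": 101}          # business key -> surrogate key
rejected, fact_rows = [], []

incoming_facts = [
    {"product_bk": "P1", "qty": 3},
    {"product_bk": "P9", "qty": 5},   # dimension member not loaded yet
    {"product_bk": None, "qty": 2},   # fails a basic quality check
]

for fact in incoming_facts:
    if not fact["product_bk"]:
        rejected.append(fact)                      # route to a reject/quarantine area
        continue
    key = dim_product.get(fact["product_bk"], UNKNOWN_KEY)
    fact_rows.append({"product_key": key, "qty": fact["qty"]})

print(fact_rows)    # the P9 fact carries the placeholder key until its dimension arrives
print(rejected)
```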

[Activity 5] Implement business intelligence product portfolio.

1. Group users as needed. 2. Match tools to user requirements.

[Activity 6] Maintain data products.

1. Release management. Release management is critical to the incremental development process.

2. Manage the data product development life cycle.

3. Monitor and tune the loading process. Data warehouses also need to be archived .


For ease of understanding, the mind map for this part is organized as follows:

[Mind map: Activities]

4. Tools, Methods, Implementation Guidelines

Tools

1. Metadata repository.

A. Data dictionary and terminology. The data dictionary is a necessary component supporting the use of the data warehouse; it describes data in business terms, and its content comes from the logical data model.

B. Lineage of data and data models.

The purposes of recording data lineage (see the sketch after this list):

1) Investigate the root cause of the data issue.

2) Conduct an impact analysis of system changes or data issues.

3) Determine the reliability of the data according to the source of the data.
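As promised above, here is a minimal sketch of purpose 2 (impact analysis): if lineage is recorded as an upstream-to-downstream graph (the example graph is invented), a simple traversal lists every downstream object affected by a source change.

```python
# Using recorded lineage for impact analysis: walk downstream from a changed source.
lineage = {                      # upstream -> downstream dependencies (hypothetical)
    "erp.orders": ["staging.orders"],
    "staging.orders": ["dw.fact_orders"],
    "dw.fact_orders": ["mart.sales_summary", "report.monthly_revenue"],
    "mart.sales_summary": ["report.regional_dashboard"],
}

def impacted(node, graph):
    """Return all downstream objects reachable from `node`."""
    seen, stack = set(), list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen

print(impacted("erp.orders", lineage))
```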

2. Data integration tools.

Used to load the data warehouse.

When choosing a tool, also consider:

1. Process audit, control, restart and scheduling.

2. The ability to selectively extract data elements at execution time and pass them on to downstream systems for auditing.

3. The ability to control which operations can or cannot be performed, and to restart failed or aborted processes.

3. Types of BI tools.

1) Operational reporting.

Operational reporting: business users generate reports directly from transactional systems, applications, or the data warehouse. Data retrieval and reporting tools, sometimes called ad-hoc query tools, allow users to author the reports they need or to create reports for others to use. Requirements for business operations reports often differ from those for business query reports. Production reporting crosses the data warehouse/BI boundary and often queries transactional systems directly, producing items such as invoices or bank statements. Traditional business intelligence tools cover common data visualization forms such as tables, pie charts, line charts, area charts, bar charts, histograms, and candlestick charts.

2) Business performance management (BPM). Designed to optimize the execution of business strategy; performance measurement and positive feedback loops are its key elements.

3) Descriptive self-service analytics. Provided to front-line staff to guide operational decision-making.

Online Analytical Processing (OLAP): an approach that provides fast performance for multidimensional analytical queries.

Common operations include slicing, dicing, drill down/drill up, roll up, and pivot.

The three classic OLAP implementation approaches are: relational OLAP (ROLAP), multidimensional OLAP (MOLAP), and hybrid OLAP (HOLAP).
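The pandas sketch below illustrates what these OLAP operations mean on a tiny, made-up fact table; a real OLAP engine would perform them against a cube or star schema rather than a DataFrame:

```python
# OLAP-style operations demonstrated on a small, invented fact table.
import pandas as pd

facts = pd.DataFrame({
    "year":    [2023, 2023, 2024, 2024],
    "quarter": ["Q4", "Q4", "Q1", "Q1"],
    "region":  ["East", "West", "East", "West"],
    "product": ["X", "X", "X", "Y"],
    "amount":  [100, 80, 120, 60],
})

# Slice: fix one dimension member (region = East).
east = facts[facts["region"] == "East"]

# Dice: restrict several dimensions at once.
diced = facts[(facts["year"] == 2024) & (facts["product"] == "X")]

# Roll up: aggregate from quarter level up to year level.
by_year = facts.groupby("year")["amount"].sum()

# Drill down: go back to a finer grain (year -> quarter).
by_quarter = facts.groupby(["year", "quarter"])["amount"].sum()

# Pivot: rotate a dimension into columns for cross-tab viewing.
pivot = facts.pivot_table(index="year", columns="region",
                          values="amount", aggfunc="sum")
print(pivot)
```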

Methods

1. Prototypes to drive requirements. Use prototyping as a requirements-discovery method to quickly prioritize needs.

2. Self-service business intelligence. Self-service is the fundamental delivery method for BI.

3. Queryable audit data. All processes should store audit information and enable fine-grained tracking and reporting, as in the sketch below.
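A minimal sketch of queryable audit data, using invented table names: each loaded row carries a batch id and load timestamp, and every batch is logged to an audit table so any fact can be traced to its load.

```python
# Make load audit data queryable: stamp rows with a batch id, log each batch.
import sqlite3
import datetime
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_orders (order_id TEXT, amount REAL, batch_id TEXT, loaded_at TEXT)")
conn.execute("CREATE TABLE etl_audit (batch_id TEXT, source TEXT, row_count INTEGER, loaded_at TEXT)")

def load_batch(rows, source):
    batch_id = str(uuid.uuid4())
    loaded_at = datetime.datetime.now(datetime.timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO fact_orders VALUES (?, ?, ?, ?)",
        [(r["order_id"], r["amount"], batch_id, loaded_at) for r in rows],
    )
    conn.execute("INSERT INTO etl_audit VALUES (?, ?, ?, ?)",
                 (batch_id, source, len(rows), loaded_at))
    conn.commit()

load_batch([{"order_id": "O1", "amount": 10.0}], source="erp_export")

# Fine-grained tracking: which batch did a given fact row come from, and when?
print(conn.execute("""
    SELECT f.order_id, a.source, a.loaded_at
    FROM fact_orders f JOIN etl_audit a ON f.batch_id = a.batch_id
""").fetchall())
```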

Implementation Guide

1. Readiness assessment, risk assessment

A data warehouse needs to implement the following:

(1) Clarify data sensitivity and security constraints.

(2) Select tools.

(3) Ensure resources are secured.

(4) Create an extraction process to evaluate and receive source data.

2. Version roadmap

Data warehouses are built incrementally.

Whatever implementation method you choose, whether it's waterfall, iterative, or agile, you should take into account the desired end state.

A roadmap is a valuable planning tool.

3. Configuration management

Configuration management aligns with the roadmap and provides the necessary back-end adjustments and scripts.

4. Organizational and cultural change

Maintaining a consistent business focus is key to project success . Knowing a company's value chain is a great way to understand the business environment.

Align the project with real business needs and assess the necessary business support. The keys to success are:

1) Business Initiatives . Is there appropriate management support?

2) Business objectives and scope . Are there exact business needs, business goals, and scope of work?

3) Business resources . Are there experts? How engaged?

4) Business readiness . Is the business partnership ready for this to be a long-term incremental delivery project? What is the average knowledge level or skills gap within the target organization?

5) Consistent vision . How well does the IT strategy support the business vision?

For ease of understanding, the mind map for this part is organized as follows:

[Mind map: Tools, Methods, Implementation Guidelines]

5. Data Warehouse and Business Intelligence Governance

1. Business acceptance

1) Conceptual data model. Does it capture the organization's core information? What are the key business concepts, and how are they related to each other?

2) Data quality feedback loop. How is problem data identified and fixed? How is the origin of problems understood? Who is responsible for resolving them? What is the process for remediating problems arising from the warehouse's data integration processes?

3) End-to-end metadata. How does the architecture support integrated end-to-end metadata flows? Is the meaning of data in context understood? How do data consumers get answers to basic questions such as "what does this report mean?" or "what does this metric mean?"

4) End-to-end verifiable data lineage. Can the items that business users access be traced back to the source systems in an automated, maintainable way? Is all data documented?

2. Customer and user satisfaction

3. Service Level Agreement

4. Report strategy

The reporting strategy should address: 1) Secure access, ensuring that only authorized users can access sensitive data. 2) The access mechanisms through which users interact with, report on, examine, or view their data. 3) The types of user communities and the appropriate tools for each. 4) The nature of reports: summaries, detail, exceptions, and their frequency, timing, distribution, and storage formats. 5) Exploiting the potential of visualization through graphical output. 6) Trade-offs between timeliness and performance.

5. Metrics

1. Usage metrics, including the number of registered users, connected users, or concurrent users. 2. Subject-area coverage, measuring the extent to which each department accesses the warehouse. 3. Response time and performance metrics. The follow-up to metrics collection is validation and service-level adjustment.

For ease of understanding, the mind map for this part is organized as follows:

[Mind map: Data Warehouse and Business Intelligence Governance]

To be continued~

    I also organized a CDMP self-study exchange group here, only for students who want to learn data governance and students who intend to take the CDMP certification exam .

    (Because more than 200 people cannot enter directly, if you need to enter, please add my WeChat invitation to enter and  note CDMP )

I'm Dugufeng, if you like my article, I hope you can forward it, like it, watch it and support me, see you in the next article!

Recommendation of Popular Articles on Big Data Flow

    From a port coal worker to a state-owned enterprise big data leader: How did the once Internet-addicted teenager do it?

    Big Data Data Governance | WeChat Exchange Group~

    5000 words explain how to get started with data governance (with international data governance certification exam-CDMP study group)

    What exactly is CDMP - a super-comprehensive introduction to the international certification of data governance

    Open Source Data Quality Solutions - Apache Griffin Getting Started

    One-stop Metadata Governance Platform - Datahub Getting Started

    Pre-research on data quality management tools - Griffin VS Deequ VS Great expectations VS Qualitis

    Thousand-character long text - Datahub offline installation manual

    Metadata Management Platform Datahub2022 Annual Review

Big Data Flow: a self-media account covering big data, real-time computing, data governance, and data visualization practice. It regularly publishes hands-on articles on data governance and metadata management implementation, and shares related technologies and materials for putting data governance into practice.

Provide learning exchange groups such as big data introduction, data governance, Superset, Atlas, Datahub, etc.

Big data flows, and the learning of big data technology will never stop.

Long press, identify the QR code, follow me!


Origin blog.csdn.net/xiangwang2206/article/details/129457792