An Introduction to ETL Testing (Data Warehouse Testing)

Overview

Before learning about ETL testing, let's first look at business intelligence (BI) and data warehousing.

What is BI?

BI (Business Intelligence) is a complete solution that integrates an enterprise's existing data (raw data or operational business data), provides fast and accurate reports, and offers a basis for decision-making, helping the company make informed business decisions.

The raw data records the company's day-to-day activities, such as customer interactions, financial information, employee records, and so on.

This data can be used for reporting, analysis, data mining, data quality work, interactive analysis, predictive analytics, and more.

What is a data warehouse?

A data warehouse is a database designed for query and analysis rather than for transaction processing.

It is built by integrating data from various heterogeneous sources.

A data warehouse lets a business or organization integrate and analyze its data while keeping that analytical work separate from transaction processing.

Through integration, the data can be turned into higher-quality information at different levels to meet the needs of the enterprise's users.

What is ETL?

ETL stands for Extract-Transform-Load. It is the complete process of extracting data from source systems, transforming it, and loading it into the data warehouse.

For example, we extract data from an online transaction processing (OLTP) database, transform it to match the data warehouse schema, and then load it into the data warehouse database.

Many data warehouses also integrate data from systems other than OLTP databases, such as text files, logs, and spreadsheets.

Now let's look at how ETL works.

Take a company whose departments (sales, marketing, logistics, and so on) each keep their own records. Each department handles customer information independently and stores it differently: for example, the sales team identifies customers by name, while the logistics team identifies them by customer ID.

Now suppose we want to look up a customer's history and find out which products he or she bought in response to different marketing campaigns. Doing this across separate departmental stores would be very tedious.

The solution is to use ETL to process the data from these different sources and store it in a data warehouse with a unified structure.

ETL converts datasets of different structures and types into a uniform structure, which BI tools can then use for analysis and for generating meaningful reports.
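To make the department example concrete, here is a minimal Python sketch of that conforming step. Everything in it is hypothetical: the two departmental extracts, their field names, and the `customer_master` lookup that maps customer names to IDs are assumptions made for illustration. It only shows how records keyed differently can be brought into one structure keyed by customer ID, so that a single lookup returns a customer's history.

```python
# Hypothetical extracts: sales knows customers by name, logistics by ID.
sales = [
    {"customer_name": "Alice", "product": "Laptop", "campaign": "Spring"},
    {"customer_name": "Bob", "product": "Phone", "campaign": "Summer"},
]
logistics = [
    {"customer_id": 1, "product": "Laptop", "shipped": "2019-03-02"},
    {"customer_id": 2, "product": "Phone", "shipped": "2019-06-15"},
]
# Hypothetical customer master used to resolve names to IDs.
customer_master = {"Alice": 1, "Bob": 2}

def conform(sales_rows, logistics_rows, master):
    """Return one list of records with a uniform structure keyed by customer_id."""
    unified = []
    for row in sales_rows:
        unified.append({
            # An unknown name raises KeyError; a real job would reject the row instead.
            "customer_id": master[row["customer_name"]],
            "source": "sales",
            "product": row["product"],
            "detail": row["campaign"],
        })
    for row in logistics_rows:
        unified.append({
            "customer_id": row["customer_id"],
            "source": "logistics",
            "product": row["product"],
            "detail": row["shipped"],
        })
    return unified

def history(unified, customer_id):
    """All conformed records for one customer, regardless of source department."""
    return [r for r in unified if r["customer_id"] == customer_id]

unified = conform(sales, logistics, customer_master)
for record in history(unified, 1):
    print(record)
```

Keying on the ID rather than the name is the usual choice because names are neither unique nor stable, which is why the sketch resolves names through a master lookup instead of the other way around.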

Here is a flow diagram of the complete ETL process:

[Figure: ETLProcess.png, the complete ETL process flow]

  1. Extract
    Extract the relevant, valid data from the source systems.

  2. Transform

  • Transform the extracted data into the data warehouse schema/format.

  • Build keys: a key is one or more data attributes that uniquely identify an entity. Key types include primary keys, foreign keys, alternate keys, composite keys, and surrogate keys. The data warehouse owns and manages these keys and does not allow any other entity to assign them.

  • Data cleansing: once the data has been extracted, it moves on to cleansing, where errors in the extracted data are identified and fixed. Conflicts and incompatibilities between different datasets are resolved so that the data is consistent and fit for the target data warehouse. Typically, the transformation system also creates metadata that helps diagnose problems in the source data and improve data quality.

  3. Load

  • Load the transformed data into the data warehouse.

  • Build aggregates: compute aggregates of the data and store them in tables to improve end-user query performance.
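The three steps above can be sketched end to end with Python's built-in `sqlite3` module. This is a toy illustration under stated assumptions, not a production ETL design: the source rows, the table names (`dim_customer`, `fact_sales`, `agg_sales_by_customer`), and the cleansing rules (reject rows with a missing amount, normalize spacing and capitalization of names) are all hypothetical. It shows surrogate keys being assigned during transformation and an aggregate table being built during load.

```python
import sqlite3

def extract():
    """Stand-in for reading a source system: (customer_code, name, amount)."""
    return [
        ("C001", " alice ", "120.50"),
        ("C002", "BOB", "80"),
        ("C001", "Alice", "30"),
        ("C003", "carol", None),  # missing amount: rejected by cleansing
    ]

def transform(rows):
    """Cleanse rows, assign surrogate keys, and shape them for the warehouse."""
    surrogate = {}   # natural key (customer_code) -> surrogate key
    customers = []   # dimension rows: (customer_sk, customer_code, name)
    facts = []       # fact rows: (customer_sk, amount)
    rejected = []    # rows that failed cleansing, kept for reporting
    for code, name, amount in rows:
        if amount is None:
            rejected.append((code, name, amount))
            continue
        if code not in surrogate:
            surrogate[code] = len(surrogate) + 1
            customers.append((surrogate[code], code, name.strip().title()))
        facts.append((surrogate[code], float(amount)))
    return customers, facts, rejected

def load(conn, customers, facts):
    """Load the dimension and fact tables, then build an aggregate table."""
    conn.execute("CREATE TABLE dim_customer ("
                 "customer_sk INTEGER PRIMARY KEY, customer_code TEXT UNIQUE, name TEXT)")
    conn.execute("CREATE TABLE fact_sales ("
                 "customer_sk INTEGER REFERENCES dim_customer(customer_sk), amount REAL)")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", facts)
    # Aggregation: pre-computed totals so end-user queries avoid re-scanning facts.
    conn.execute(
        "CREATE TABLE agg_sales_by_customer AS "
        "SELECT d.name AS name, SUM(f.amount) AS total_amount, COUNT(*) AS n_orders "
        "FROM fact_sales AS f JOIN dim_customer AS d ON f.customer_sk = d.customer_sk "
        "GROUP BY d.name")
    conn.commit()

conn = sqlite3.connect(":memory:")
customers, facts, rejected = transform(extract())
load(conn, customers, facts)
print(conn.execute("SELECT * FROM agg_sales_by_customer ORDER BY name").fetchall())
# [('Alice', 150.5, 2), ('Bob', 80.0, 1)]
print(rejected)  # [('C003', 'carol', None)]
```

Keeping rejected rows instead of silently dropping them gives the tester something to reconcile: source rows should equal loaded rows plus rejected rows.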

What is ETL Testing?

ETL testing ensures that the data loaded from the source to the destination after business transformation is accurate.

It also involves verifying the data at the various intermediate stages between the source and the destination.

ETL is an acronym for Extract-Transform-Load.

ETL Testing Process

Like other testing processes, ETL testing goes through several stages. The process is as follows:

[Figure: ETLTestingProcess.png, the ETL testing process]

The ETL testing process can be divided into the following five stages:

  1. Analyze the requirements, business, and data sources

  2. Data acquisition

  3. Dimensional modeling and business logic implementation

  4. Building and populating the data

  5. Building reports

ETL Testing Types

  1. Production Validation Testing
    This type of testing is performed on data as it is moved into production systems. To support normal business operations, the data in the production systems must be in the correct order. This type of ETL testing should emphasize test automation and data-level management capabilities.

  2. Source to Target Testing (Validation Testing)
    This type of testing verifies whether the transformed data values are the values expected in the target.

  3. Application Upgrades
    This type of ETL test can be generated automatically, which saves a lot of test development time. It mainly checks whether the data extracted from an old application or repository is identical to the data in the new application or repository.

  4. Metadata Testing
    Metadata testing includes data type checks, data length checks, and index/constraint checks.

  5. Data Completeness Testing
    Data completeness testing verifies that all expected data has been loaded from the source into the destination. Typical tests compare and validate counts, aggregates, and actual data between source and target for columns with simple transformations or no transformation.

  6. Data Accuracy Testing
    This type of testing verifies that the data is loaded correctly and transformed as expected.

  7. Data Transformation Testing
    Testing data transformation is complex: in many cases it cannot be done by writing a single source SQL query and comparing its output with the target. Multiple SQL queries may need to be run for each row to verify the transformation rules.

  8. Data Quality Testing
    Data quality testing comprises syntax tests and reference tests. It is done to avoid errors in business processes caused by, for example, dates or unique numbers such as order numbers.

  • Syntax tests: report dirty data based on invalid characters, character patterns, incorrect upper or lower case, and so on.

  • Reference tests: check the data against the data model, for example customer IDs. Data quality testing includes number checks, date checks, precision checks, data checks, null checks, and so on.

  9. Incremental ETL Testing
    This type of testing verifies the integrity of the old and new data when new data is added. Incremental testing checks that inserts and updates are processed as expected during the incremental ETL process.

  10. GUI/Navigation Testing
    This type of testing checks the UI and navigation of the generated front-end reports.
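Source-to-target, completeness, and accuracy tests often come down to comparing counts and rows between the two sides. The sketch below assumes that source and target tables sit in the same SQLite database and that the listed columns map one to one with no transformation; `compare_source_target`, `src_orders`, and `tgt_orders` are hypothetical names. It runs SQL `EXCEPT` in both directions.

```python
import sqlite3

def compare_source_target(conn, source, target, columns):
    """Compare row counts and row contents of two tables over the mapped columns.

    Table and column names are interpolated into SQL, so they must come from a
    trusted mapping sheet, never from user input.
    """
    cols = ", ".join(columns)
    src_count = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt_count = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    # Rows present in the source but absent from the target (lost or altered).
    missing = conn.execute(
        f"SELECT {cols} FROM {source} EXCEPT SELECT {cols} FROM {target}").fetchall()
    # Rows present in the target but not in the source (unexpected or altered).
    unexpected = conn.execute(
        f"SELECT {cols} FROM {target} EXCEPT SELECT {cols} FROM {source}").fetchall()
    return {
        "source_count": src_count,
        "target_count": tgt_count,
        "missing_in_target": sorted(missing),
        "unexpected_in_target": sorted(unexpected),
    }

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE tgt_orders (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, 25.0);
""")
report = compare_source_target(conn, "src_orders", "tgt_orders", ["order_id", "amount"])
print(report)
# {'source_count': 3, 'target_count': 2, 'missing_in_target': [(2, 20.0), (3, 30.0)], 'unexpected_in_target': [(2, 25.0)]}
```

Because `EXCEPT` is a set operation, duplicate rows collapse; the separate row counts are what catch a target holding extra or missing copies of an otherwise identical row. When source and target live in different databases, the rows must first be extracted to a common place, and some databases (Oracle, for instance) spell the operator `MINUS`.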

How to Create ETL Test Cases

ETL testing is a concept that can be applied to different tools and databases in the information management field.

The goal of ETL testing is to ensure that the data loaded from source to destination after business transformation is accurate.

ETL testing also involves verifying the data at the various intermediate stages between source and destination.

When performing ETL testing, testers always use two documents:

  1. ETL mapping sheet: an ETL mapping sheet contains all the information about the source and destination tables, including each column's constraints and reference tables. ETL testers need to be comfortable with SQL, because testing at the various stages of ETL may require writing large queries with multiple joins to validate the data. The mapping sheet provides a lot of useful information when writing data-validation queries.

  2. Source and destination database schemas: these should be kept at hand to verify any detail in the mapping sheet.
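A mapping sheet also lends itself to automated metadata checks. The sketch below assumes SQLite and a mapping expressed as a hypothetical list of (source column, target column) pairs; it reads each table's declared column types with `PRAGMA table_info` and flags missing columns or mismatched type/length. SQLite does not enforce declared lengths such as `VARCHAR(30)`, so this only compares what the schemas declare, which is what reviewing the mapping sheet against the schemas is about.

```python
import re
import sqlite3

def declared_columns(conn, table):
    """Return {column_name: declared_type} using SQLite's PRAGMA table_info."""
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in conn.execute(f"PRAGMA table_info({table})")}

def type_and_length(declared):
    """Split a declared type such as 'VARCHAR(50)' into ('VARCHAR', 50)."""
    m = re.fullmatch(r"\s*(\w+)\s*(?:\(\s*(\d+)\s*\))?\s*", declared)
    if m is None:
        return (declared, None)  # e.g. 'DECIMAL(10,2)': compared as a raw string
    return (m.group(1), int(m.group(2)) if m.group(2) else None)

def check_mapping(conn, source, target, mapping):
    """mapping: hypothetical list of (source_column, target_column) pairs."""
    src, tgt = declared_columns(conn, source), declared_columns(conn, target)
    problems = []
    for s_col, t_col in mapping:
        if s_col not in src:
            problems.append(f"{source}.{s_col} is missing")
        elif t_col not in tgt:
            problems.append(f"{target}.{t_col} is missing")
        elif type_and_length(src[s_col]) != type_and_length(tgt[t_col]):
            problems.append(f"{s_col} {src[s_col]} -> {t_col} {tgt[t_col]}: type/length mismatch")
    return problems

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (cust_id INTEGER, cust_name VARCHAR(50))")
conn.execute("CREATE TABLE dw_customer (customer_id INTEGER, customer_name VARCHAR(30))")
problems = check_mapping(conn, "src_customer", "dw_customer",
                         [("cust_id", "customer_id"), ("cust_name", "customer_name")])
print(problems)
```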

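ETL Test Scenarios and Test Cases

No. | Test scenario | Test cases
1 | Mapping doc validation | Verify that the mapping document provides the corresponding ETL information and that a change log is maintained for every mapping document.
2 | Validation | (1) Validate the source and target table structures against the corresponding mapping document. (2) Verify that the source and target data types match. (3) Verify that the source and target data type lengths match. (4) Verify that data field types and formats are as specified. (5) Verify that the source data type length is not less than the target data type length. (6) Verify the column names in the table against the mapping document.
3 | Constraint validation | Verify that the constraints on the target table are defined as designed.
4 | Data consistency issues | (1) Prevent attributes that share the same semantic definition from having different data types or lengths. (2) Prevent misuse of integrity constraints.
5 | Completeness issues | (1) Ensure that all expected data has been loaded completely into the target table. (2) Compare the record counts of source and target (count completeness). (3) Check for any rejected records. (4) Check that data in the target table columns has not been truncated. (5) Perform boundary value analysis. (6) Compare the uniqueness of key fields between the source data and the target data warehouse.
6 | Correctness issues | (1) The data must contain no misspelled or inaccurate records. (2) No null, non-unique, or out-of-range records may exist.
7 | Transformation | Verify that the transformation logic is correct.
8 | Data quality | (1) Number check: verify that values are numeric. (2) Date check: verify that values are in date format and that all date values use a uniform format. (3) Precision check: decimal precision meets the expected precision. (4) Data check: verify the correctness and completeness of the data. (5) Null check.
9 | Duplicate check | (1) Verify that every uniqueness requirement from the business is correctly implemented in the target table (primary keys, unique keys, or any other column that must be unique). (2) Verify that data merged from multiple source columns is correct. (3) Verify that source columns are merged into the target only as the customer requires.
10 | Date validation | Dates are widely used in ETL development, mainly to: (1) know the date each row was created; (2) identify active records; (3) identify active records according to business requirements; (4) support time-based inserts and updates.
11 | Data integrity validation | To verify the completeness of the datasets in the source and target tables, use a set operation such as intersection to determine the completeness of the target data.
12 | Data cleansing | Columns that are not needed should be removed before loading into the data warehouse.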
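Several of the data quality and correctness cases in the table (number check, date check, precision check, null check, duplicate keys) can be expressed as simple per-row rules. This is a minimal sketch over in-memory rows; the column names, the `%Y-%m-%d` date format, and the two-decimal precision limit are assumptions standing in for whatever the mapping sheet specifies. In practice the same rules are often written as SQL against the target tables.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

def is_numeric(value):
    """Number check: True if the value parses as a finite decimal."""
    try:
        return Decimal(str(value)).is_finite()
    except InvalidOperation:
        return False

def is_date(value, fmt="%Y-%m-%d"):
    """Date check: True if the value matches the agreed (assumed) date format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False

def within_precision(value, max_places):
    """Precision check: at most max_places digits after the decimal point."""
    return Decimal(str(value)).as_tuple().exponent >= -max_places

def quality_issues(rows, key, not_null=(), numeric=(), dates=(), max_places=None):
    """Return (row_index, column, problem) tuples for every failed rule."""
    issues, seen_keys = [], set()
    for i, row in enumerate(rows):
        for col in not_null:
            if row.get(col) is None:
                issues.append((i, col, "null"))
        for col in numeric:
            value = row.get(col)
            if value is None:
                continue  # nulls are reported by the null check only
            if not is_numeric(value):
                issues.append((i, col, "not numeric"))
            elif max_places is not None and not within_precision(value, max_places):
                issues.append((i, col, "precision"))
        for col in dates:
            value = row.get(col)
            if value is not None and not is_date(value):
                issues.append((i, col, "bad date"))
        if row.get(key) in seen_keys:
            issues.append((i, key, "duplicate key"))
        seen_keys.add(row.get(key))
    return issues

rows = [
    {"order_id": 1, "amount": "19.99", "order_date": "2019-06-01"},
    {"order_id": 2, "amount": "abc", "order_date": "2019-06-02"},
    {"order_id": 2, "amount": "5.123", "order_date": "06/03/2019"},
    {"order_id": 3, "amount": None, "order_date": "2019-06-04"},
]
for issue in quality_issues(rows, key="order_id", not_null=["amount"],
                            numeric=["amount"], dates=["order_date"], max_places=2):
    print(issue)
# (1, 'amount', 'not numeric')
# (2, 'amount', 'precision')
# (2, 'order_date', 'bad date')
# (2, 'order_id', 'duplicate key')
# (3, 'amount', 'null')
```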

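ETL Bug Types

No. | Bug type | Description
1 | User interface bugs | (1) Mainly related to the application's GUI. (2) Fonts, styles, colors, alignment, spelling mistakes, navigation, and so on.
2 | Boundary value bugs | The boundary range of the data values.
3 | Equivalence class partitioning bugs | Valid and invalid classes.
4 | Input/output bugs | (1) Valid values are not accepted. (2) Invalid values are accepted.
5 | Calculation bugs | (1) Mathematical errors. (2) Incorrect final output.
6 | Load condition bugs | (1) Multiple users are not allowed. (2) The load the user expects is not allowed.
7 | Crash bugs | (1) The system crashes or hangs. (2) The system cannot run on the user's platform.
8 | Version control bugs | (1) No matching identifier. (2) No version information available. (3) These bugs typically appear during regression testing.
9 | Hardware issues | Typically occur when the device is incompatible with the application.
10 | Documentation bugs | Incorrect help documentation.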

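Differences Between ETL Testing and Database Testing

No. | ETL testing | Database testing
1 | Verifies that data has been moved as expected. | Mainly verifies that the data follows the rules or standards defined in the data model design.
2 | Verifies that data meets the expected transformation logic after business transformation, and that counts match between source and target. | Mainly verifies that table keys (primary and foreign) and constraints work correctly.
3 | Verifies that primary/foreign key relationships between tables are preserved during ETL. | Verifies that there are no redundant tables and that the database is optimized.
4 | Verifies the loaded data for duplicates (copies) against expectations. | Verifies whether any required data is missing.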
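The ETL-testing checks that key relationships are preserved and that loaded data contains no unexpected duplicates are commonly probed with two SQL patterns: a `LEFT JOIN ... IS NULL` query for orphaned foreign keys and a `GROUP BY ... HAVING COUNT(*) > 1` query for duplicate keys. The sketch below uses SQLite; the tables and helper names are hypothetical, and the tables are deliberately declared without constraints to mimic a load path that did not enforce them.

```python
import sqlite3

def orphan_rows(conn, child, fk, parent, pk):
    """Child rows whose non-null foreign key has no matching parent key.

    Names are interpolated into SQL, so they must come from a trusted mapping sheet.
    """
    sql = (f"SELECT c.* FROM {child} AS c LEFT JOIN {parent} AS p ON c.{fk} = p.{pk} "
           f"WHERE p.{pk} IS NULL AND c.{fk} IS NOT NULL")
    return conn.execute(sql).fetchall()

def duplicate_keys(conn, table, key):
    """Key values that occur more than once, with their occurrence counts."""
    sql = f"SELECT {key}, COUNT(*) FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_sk INTEGER, name TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER, customer_sk INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO fact_sales VALUES (10, 1, 5.0), (11, 3, 7.5), (11, 2, 1.0);
""")
print(orphan_rows(conn, "fact_sales", "customer_sk", "dim_customer", "customer_sk"))
# [(11, 3, 7.5)]
print(duplicate_keys(conn, "fact_sales", "sale_id"))
# [(11, 2)]
```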

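Key Responsibilities of an ETL Tester

An ETL tester's key responsibilities fall into three broad categories:

  • Analyzing source data (databases, text files, and other types of data)

  • Implementing the business transformation logic

  • Loading the transformed data into the target tables

Other responsibilities include:

  • Knowledge of ETL testing tools

  • Testing the ETL components of the data warehouse

  • Executing data-driven tests in the back end

  • Creating, designing, and executing test cases, test plans, and so on

  • Identifying problems and proposing solutions

  • Reviewing business requirements and designing test strategies

  • Writing SQL queries or database code to implement the various test scenarios

And so on.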


Source: www.cnblogs.com/xiaowenshu/p/10972207.html