What is big data testing? What are the steps to implement big data testing?

Over the past couple of years, "big data" has become a constant refrain across the Internet industry. Big data has allowed many companies to save manpower and material resources, achieve precise marketing, and earn substantial profits. With the continuous advancement of data engineering and data analysis technology, big data testing has become inevitable.

Big data is a term for large volumes of structured or unstructured data from which useful information may be extracted. There is no fixed threshold for what counts as big data, but it is usually on the order of petabytes or exabytes, which makes it difficult to integrate. Fresh, fast-moving data of this kind helps a company understand its customers and products better and thereby drive business growth. Although many technologies are available, it is still hard for engineers to figure out where to start.

Big data testing

Testing a big data application is more about verifying its data processing than about testing individual features of the software product. That said, functional and performance testing are equally critical. In big data testing, QA engineers use clusters and supporting components to verify that terabytes of data are processed successfully, and because that processing must be fast, it demands a high level of testing skill from the engineers involved.

Big data processing has three main characteristics: 1) large batches, 2) real-time, and 3) interactive. In addition, data quality is an important dimension of big data testing in its own right.

Therefore, data quality must be verified before application testing begins, and it should be treated as part of database testing. This involves checking characteristics of the data such as consistency, accuracy, duplication, continuity, validity, and completeness.

Big data application testing can be roughly divided into three steps:


Step 1: Data stage verification

The first step of big data testing, also known as the pre-Hadoop stage, includes the following verifications:

1) Data from all relevant sources should be validated to ensure that the correct data is loaded into the system.

2) Compare the source data with the data pushed to the Hadoop system to ensure they match.

3) Verify that the correct data is extracted and loaded into the correct location in HDFS.

At this stage, tools such as Talend or Datameer can be used for data stage verification; a lightweight hand-rolled comparison is sketched below.

Step 2: "MapReduce" verification

The second step of big data testing is MapReduce verification. At this stage, the tester verifies the business logic on a single node first, and then re-verifies it after running across multiple nodes, to ensure the correctness of the following operations (a local sketch of the single-node check follows the list):

1) The Map and Reduce processes work normally.

2) Data aggregation or segregation rules are applied to the data correctly.

3) Key-value pairs are generated as expected.

4) The data is validated after the Map and Reduce processes complete.

Step 3: Output stage verification

The final, third stage of big data testing is the output verification process. The output data files are generated and then moved to an EDW (Enterprise Data Warehouse) or to any other downstream system as required. Activities in this third phase include the following (a reconciliation sketch follows the list):

1) Check that the Transformation rules are applied correctly.

2) Check data integrity and verify that the data is loaded into the target system successfully.

3) Check that there is no data corruption by comparing the target data with the data in HDFS.

To sum up:

1) Big data testing differs from traditional testing not only in test types and strategies, but also in specific technologies such as tooling.

2) Due to the complexity of big data, the challenges faced in testing it differ from those of traditional testing.

3) Big data performance testing will remain one of the hardest challenges for software test engineers to overcome.


Original article: blog.csdn.net/newdreamIT/article/details/100139950