What is big data testing? What types are there? How should it be done?

As big data applications, and scenarios that apply big data technology, grow rapidly around the world, the knowledge needed to test big data applications and the demand for big data test engineers are growing along with them.

Big data testing has gradually become a general skill that today's software testers need to understand and master.

1. What is big data?

Traditional relational databases (e.g. MySQL, Oracle, SQL Server) are good at working with structured data that can be stored in rows and columns. However, if we have unstructured data that does not follow that structure, a relational database is not the right choice.

For big data, the large amount of data we have may be stored in any type of format, such as images, audio, etc., and the structure and format of each record may differ. Compared with what traditional databases handle, big data is characterized by high volume, high velocity and variety, which traditional databases struggle to cope with.

    • Volume: Big data applications collect large amounts of data, which may be generated from many different sources, such as smart sensors, industrial instrument readings, financial transactions, etc.;
    • Velocity: The data in big data applications is created at high speed, so it must also be processed quickly. Devices such as Internet of Things sensors and smart meters generate data at an unprecedented rate;
    • Variety: Data can come in a variety of formats: numbers, text, audio, video, satellite imagery, weather data, etc.

2. Types of big data testing

Testing a big data application is more about validating its data processing than about testing individual features of the software product. Performance testing and functional testing are the keys to big data testing.

In big data testing, QA engineers deal with three types of data processing: batch, real-time, and interactive.

At the same time, data quality is an important factor in big data testing. It involves checking dimensions such as accuracy, duplication, consistency, validity, and data integrity.
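
As a rough illustration of these quality checks, the Python sketch below runs a few of them with pandas; the file name, column names, and rules are made-up assumptions rather than part of any specific project.

```python
# A minimal sketch of common data quality checks on a hypothetical
# "orders" extract. Column names and rules are illustrative assumptions.
import pandas as pd

df = pd.read_csv("orders.csv")  # hypothetical input file

issues = {}

# Duplication: the same order_id should not appear twice.
issues["duplicate_order_ids"] = int(df["order_id"].duplicated().sum())

# Integrity/completeness: key fields must not be null.
issues["null_order_ids"] = int(df["order_id"].isna().sum())
issues["null_amounts"] = int(df["amount"].isna().sum())

# Validity: amounts should be non-negative numbers.
issues["negative_amounts"] = int((df["amount"] < 0).sum())

# Consistency: status values must come from an agreed vocabulary.
valid_status = {"created", "paid", "shipped", "closed"}
issues["invalid_status"] = int((~df["status"].isin(valid_status)).sum())

for check, count in issues.items():
    print(f"{check}: {count}")
```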

3. Job Requirements for Big Data Test Engineer

  • Proficient in database-related knowledge, familiar with large-scale distributed databases, proficient in SQL query and optimization;
  • Have certain experience in data analysis, data warehouse, and big data testing, and have high sensitivity to data;
  • Familiar with the Linux system and common commands, with the ability to maintain and deploy the environment;
  • Shell and Python scripting ability, with practical experience;
  • Cheerful and optimistic personality, strong sense of responsibility, proactive work, good communication skills and teamwork skills;
  • Have strong logical thinking and problem-solving ability, and be able to actively carry out technical research and study;
  • Experience in using common big data technologies or components such as Hadoop, Spark/Flink, Hive is preferred;
  • Those with testing experience in BI, data dashboards / large-screen visualization, data warehouses, recommendation systems, AI algorithms, etc. are preferred.

4. How should big data testing be done?

In general, big data test engineers need to be familiar with data warehouse specifications and data testing processes.

(1) Familiar with data warehouse specifications

1. Data Quality Specifications

Data quality specifications are key to ensuring the data quality of the data warehouse. In the project, we formulated a series of data quality specifications covering data cleaning, data verification, data standardization, data deduplication, and data processing. We use ETL tools and custom scripts to clean and process data, applying these specifications to ensure data accuracy and consistency.
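
A minimal sketch of what such a cleaning and deduplication script might look like in Python with pandas is shown below; the file names, columns, and rules are illustrative assumptions, not the actual project scripts.

```python
# A minimal cleaning/deduplication step, assuming a raw customer extract
# with customer_id, name, email, and updated_at columns.
import pandas as pd

raw = pd.read_csv("raw_customers.csv")

cleaned = (
    raw
    # Standardization: trim whitespace and normalize case for text keys.
    .assign(
        email=lambda d: d["email"].str.strip().str.lower(),
        name=lambda d: d["name"].str.strip(),
    )
    # Verification: drop rows missing mandatory fields.
    .dropna(subset=["customer_id", "email"])
    # Deduplication: keep only the latest record per customer.
    .sort_values("updated_at")
    .drop_duplicates(subset=["customer_id"], keep="last")
)

cleaned.to_csv("clean_customers.csv", index=False)
print(f"{len(raw) - len(cleaned)} rows removed during cleaning")
```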

2. Data Model Specification

Data model specifications are the foundation of data warehouse construction. In the project, we used dimensional modeling and the star schema to design the data model and standardized its design, including field naming, data types, primary keys, foreign keys, indexes, and partitions. We followed data modeling best practices such as avoiding ambiguous abbreviations, following naming conventions, and ensuring the uniqueness of primary keys.
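
As a toy illustration of a star schema, the sketch below creates one fact table and two dimension tables; the table and column names are made-up assumptions, and sqlite3 is used only so the example is self-contained.

```python
# A toy star schema: fact_sales references dim_date and dim_product.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key     INTEGER PRIMARY KEY,   -- e.g. 20240131
    full_date    TEXT NOT NULL,
    year         INTEGER NOT NULL,
    month        INTEGER NOT NULL
);

CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);

CREATE TABLE fact_sales (
    sales_key    INTEGER PRIMARY KEY,
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity     INTEGER NOT NULL,
    amount       REAL NOT NULL
);

CREATE INDEX idx_fact_sales_date ON fact_sales(date_key);
""")
print("star schema created")
```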

3. Data Security Specifications

Data security specifications are key to ensuring the security of data in the data warehouse. In the project, we adopted multi-level security measures to protect data, including data encryption, user rights management, data backup and recovery, and data auditing. We use security technologies and tools such as SSL encryption, data masking (desensitization), and access control lists (ACLs).
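
The sketch below illustrates the data masking idea: phone numbers are partially hidden and emails are replaced with stable hashes so records can still be joined. The formats and rules are illustrative assumptions, not the project's actual masking policy.

```python
# A minimal data masking (desensitization) sketch.
import hashlib

def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 2 digits, mask the rest."""
    if len(phone) < 6:
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 5) + phone[-2:]

def pseudonymize_email(email: str) -> str:
    """Replace an email with a stable hash so joins still work."""
    return hashlib.sha256(email.lower().encode("utf-8")).hexdigest()[:16]

print(mask_phone("13812345678"))            # -> 138******78
print(pseudonymize_email("user@test.com"))  # -> short stable hash
```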

4. Data visualization specification

Data visualization specifications are key to presenting data warehouse data to end users. In the project, we used a variety of data visualization tools and techniques to present data, such as reports, dashboards, and charts. At the same time, we follow data visualization specifications covering how data is displayed, color schemes, font sizes, data labels, trend analysis, etc. By designing concise, understandable, and easy-to-use visualization interfaces, we help end users better understand and make use of the data.
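
As a small example of the kind of chart such a specification governs, the sketch below draws a labeled bar chart; the sales figures are made-up, and matplotlib is assumed to be available rather than being a tool mandated by the project.

```python
# A minimal chart with clear axis labels, readable fonts, and data labels.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 90, 160]   # illustrative numbers

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.bar(months, sales, color="#4C72B0")
ax.set_xlabel("Month", fontsize=11)
ax.set_ylabel("Sales (10k CNY)", fontsize=11)
ax.set_title("Monthly sales", fontsize=12)
ax.bar_label(bars)            # data labels, as the spec suggests
fig.tight_layout()
fig.savefig("monthly_sales.png")
```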

5. Data backup and recovery specifications

Data backup and recovery specifications are key to ensuring the reliability and availability of data warehouse data. In the project, we formulated a series of backup and recovery specifications, including the backup strategy, backup frequency, backup storage location, recovery testing, etc. We use a variety of backup techniques, such as full backup, incremental backup, cold backup, and hot backup, and we verify the reliability of backups and the accuracy of recovery through regular recovery tests.
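
The sketch below shows the spirit of such a verification step for a single exported file: copy it, then compare checksums. The paths are illustrative assumptions; a real warehouse backup would rely on the database's own backup tooling.

```python
# A minimal "backup plus verification" sketch using checksums.
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

source = Path("warehouse_export.csv")            # hypothetical export file
backup = Path("backups/warehouse_export.csv.bak")
backup.parent.mkdir(parents=True, exist_ok=True)

shutil.copy2(source, backup)                     # "full backup" of one file
assert sha256_of(source) == sha256_of(backup), "backup verification failed"
print("backup verified")
```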

6. Data standardization specification

Data standardization specifications are key to ensuring the consistency and maintainability of data warehouse data. In the project, we formulated a series of standardization specifications, including the data dictionary, metadata management, data vocabulary, data encoding, data format, etc. We use a data dictionary and metadata management tools to manage data, uniformly define data specifications and vocabularies, and ensure data consistency and maintainability.
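
One simple way to enforce such a data dictionary is to check every incoming column against a declared type and vocabulary, as in the sketch below; the dictionary contents and file name are illustrative assumptions.

```python
# A minimal data-dictionary conformance check with pandas.
import pandas as pd

DATA_DICTIONARY = {
    "customer_id": {"dtype": "int64"},
    "gender":      {"dtype": "object", "vocabulary": {"M", "F", "U"}},
    "signup_date": {"dtype": "object"},   # ISO-8601 string, e.g. 2024-01-31
}

df = pd.read_csv("customers.csv")

for column, rule in DATA_DICTIONARY.items():
    assert column in df.columns, f"missing column: {column}"
    assert str(df[column].dtype) == rule["dtype"], f"bad type for {column}"
    if "vocabulary" in rule:
        bad = set(df[column].dropna()) - rule["vocabulary"]
        assert not bad, f"values outside vocabulary for {column}: {bad}"

undeclared = set(df.columns) - set(DATA_DICTIONARY)
assert not undeclared, f"columns not in the data dictionary: {undeclared}"
print("all columns conform to the data dictionary")
```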

(2) Data testing process

7. Data preparation phase

Data preparation is a very important step in the data testing process. In the project, we usually collect data from multiple data sources, and clean, transform, process and integrate the data to meet business needs. In the data preparation stage, we need to formulate data collection plans, data cleaning specifications, data conversion specifications, data integration specifications, etc., and use ETL tools and custom scripts to achieve data preparation.

8. Data verification phase

After data preparation is complete, we need to validate the data. In projects, we usually use data validation tools and custom scripts to implement data validation, covering data integrity, data accuracy, data consistency, and data duplication checks. We formulate data verification plans and test cases, and verify the data item by item to ensure that it meets business requirements and the data quality specifications.
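
A common form of this validation is a source-versus-target comparison after loading, as in the sketch below: row counts and a per-column sum are compared between the source extract and the warehouse table. The database files, table names, and columns are illustrative assumptions, and sqlite3 is used only to keep the sketch self-contained.

```python
# A minimal source-vs-target verification sketch.
import sqlite3

source = sqlite3.connect("source.db")       # hypothetical source extract
target = sqlite3.connect("warehouse.db")    # hypothetical warehouse

def row_count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def amount_sum(conn, table):
    return conn.execute(f"SELECT ROUND(SUM(amount), 2) FROM {table}").fetchone()[0]

src_rows, tgt_rows = row_count(source, "orders"), row_count(target, "dw_orders")
src_sum, tgt_sum = amount_sum(source, "orders"), amount_sum(target, "dw_orders")

assert src_rows == tgt_rows, f"row count mismatch: {src_rows} vs {tgt_rows}"
assert src_sum == tgt_sum, f"amount mismatch: {src_sum} vs {tgt_sum}"
print("integrity and accuracy checks passed")
```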

9. Data analysis phase

After the data verification is complete, we need to analyze the data. In projects, we usually use data analysis tools and custom scripts to implement data analysis, including data exploration, data mining, data visualization, etc. We will develop data analysis plans and test cases, and analyze the data one by one to find trends, anomalies and regularities in the data.
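
As a small example of the exploration step, the sketch below flags daily metric values that deviate from the mean by more than two standard deviations; the numbers are made-up, and real analyses would typically use more robust methods.

```python
# A minimal anomaly-detection sketch for a daily metric.
import statistics

daily_orders = [1020, 980, 1005, 995, 1010, 40, 1000, 990]  # day 6 looks wrong

mean = statistics.mean(daily_orders)
stdev = statistics.stdev(daily_orders)

anomalies = [
    (day, value)
    for day, value in enumerate(daily_orders, start=1)
    if abs(value - mean) > 2 * stdev
]
print("anomalous days:", anomalies)
```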

10. Data reporting phase

After the data analysis is complete, we need to report on the data. In projects, we usually use reporting tools and custom scripts to implement data reporting, including report design, report generation, and report distribution. We will develop report plans and test cases, and verify the reports one by one to ensure the accuracy and readability of the reports.
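
The sketch below illustrates a very simple form of report generation: check results are rendered into a small HTML table that could then be distributed. The check names, results, and file name are illustrative assumptions.

```python
# A minimal report-generation sketch.
from datetime import date

check_results = {
    "row count match": "PASS",
    "null key fields": "PASS",
    "duplicate keys": "FAIL (3 duplicates)",
}

rows = "\n".join(
    f"<tr><td>{name}</td><td>{result}</td></tr>"
    for name, result in check_results.items()
)
html = f"""<html><body>
<h1>Daily data test report - {date.today().isoformat()}</h1>
<table border="1">
<tr><th>Check</th><th>Result</th></tr>
{rows}
</table>
</body></html>"""

with open("data_test_report.html", "w", encoding="utf-8") as f:
    f.write(html)
print("report written to data_test_report.html")
```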

11. Data maintenance phase

In the data testing process, data maintenance is also a very important step. In the project, we need to maintain the data regularly, including data backup, data recovery, data update, data cleaning, etc. We will formulate data maintenance plans and test cases, and verify the maintenance process one by one to ensure the reliability and availability of data.

5. The complete process of big data testing

The process follows: requirements research and analysis -> test strategy planning -> test case writing -> test execution -> online verification testing -> test summary.

1. Requirements research and analysis

If testers are not clear about the background and current status of the requirements, they cannot really test well or ensure product quality.

The more thorough the requirements analysis, the smoother the follow-up work. This step requires product, development, and testing to complement and assist each other.

2. Test strategy plan formulation

Through the development team's technical architecture review meeting, the test team learns the development architecture logic, table structure design, and development schedule, and uses this to formulate the test strategy and methods, test focus, test tool selection, test schedule, risk estimates, etc.

3. Write and review test cases

Test cases need to cover all test scenarios: normal, abnormal, functional logic, interface, performance, etc.

The purpose of test cases has always been to execute testing better and to ensure high coverage and a high pass rate during execution; it is definitely not to write cases for the sake of writing cases.

The choice of test tool follows two important criteria:

1) Clearly show the testing ideas and logic.

2) Facilitate quick review and test execution.

At present, the test case management tools we use are matrix+easytest+freemind+excel, and we choose different tools for different demand scenarios.

4. Test case execution

Executing a test consists of two parts:

The first part: iterative version testing, on average twice a week; this kind of execution is mainly manual testing, supplemented by tools.

The second part: regularly scheduled execution, which mainly relies on tools. It is used for web and interface functional testing and performance testing, with tools such as selenium+git+idea, easytest, jmeter, and beyondcompare. Different execution cycles are set up, and regular regression testing of the entire product line is performed to further ensure the correctness and availability of product logic and interface functions.
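
As a rough illustration of the scheduled interface regression idea, the sketch below hits a few read-only endpoints and asserts on status codes and expected fields. The base URL, endpoints, and expected keys are illustrative assumptions, and the third-party requests library is assumed to be available; it is not a description of the project's actual easytest or jmeter suites.

```python
# A minimal scheduled interface regression check.
import requests

BASE_URL = "https://example.com/api"   # hypothetical service under test

REGRESSION_CASES = [
    {"path": "/health", "expect_status": 200, "expect_keys": ["status"]},
    {"path": "/orders?limit=1", "expect_status": 200, "expect_keys": ["items"]},
]

failures = []
for case in REGRESSION_CASES:
    resp = requests.get(BASE_URL + case["path"], timeout=10)
    if resp.status_code != case["expect_status"]:
        failures.append(f"{case['path']}: status {resp.status_code}")
        continue
    body = resp.json()
    missing = [k for k in case["expect_keys"] if k not in body]
    if missing:
        failures.append(f"{case['path']}: missing keys {missing}")

print("regression failures:", failures or "none")
```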

5. Online acceptance test

After going online, conduct an online regression test of the updated content as soon as possible and quickly feed the results back to development and product so decisions can be made; after online verification is complete, send the online test report to all project members based on the actual results.

6. Test summary

Including: document collation, technical summary, and project overview.

1) Documentation

After the project ends, summarize and organize the environment data, business data, and other actual scenarios involved in the project.

2) Technical summary

This mainly covers the tools and technologies used in the project, the difficulties encountered, and any new breakthroughs and improvements.

3) Project overview

Including requirements coverage rate, requirement omission and change rate, development self-test pass rate, development bug recurrence rate, test case coverage rate, defect omission rate, and statistics on project bug types and counts.

Summary: The data testing process is an indispensable part of data warehouse construction. It ensures the quality, accuracy, consistency, and reliability of the data while improving its usability and comprehensibility. During testing, the data testing process itself also needs to be checked to ensure that it is implemented and executed effectively.
