[Database] Invalid Data: Handling of Invalid Data in Software Testing

Table of contents

1. Common scenarios of invalid data

(1) Testing phase 

(2) Test method

2. The concept of invalid data 

3. The impact of invalid data

4. Identification of Invalid Data

5. Processing method of invalid data

(1) Reject invalid data

① The concept of rejecting invalid data 

② Method of rejecting invalid data

(2) Filter invalid data

① The concept of filtering invalid data 

② Method of filtering invalid data

(3) Convert invalid data

① The concept of converting invalid data 

② Method of converting invalid data 

(4) Data verification 

① The concept of data verification 

② Data verification method 

(5) Data cleaning 

① The concept of data cleaning

② Data cleaning method

③ Tools for data cleaning 

④ Challenges of data cleaning

⑤ Precautions for data cleaning

6. Methods to reduce invalid data

7. Examples of actual cases

(1) Medical industry

(2) E-commerce industry

(3) Audio and video industry



1. Common scenarios of invalid data

(1) Testing phase 

Invalid data may appear in any of the following test phases:

  • R&D self-test
  • smoke test
  • function test
  • interface test
  • concurrency test
  • stress test
  • penetration test

(2) Test method

Invalid data may appear when the following test methods are used:

  1. Null value test : Test input that is empty or undefined, such as an empty string, an empty array, or null.

  2. Boundary test : Test data at boundary values, such as minimum and maximum values, positive and negative numbers, and integers and decimals.

  3. Data format test : Test whether the format of the data meets the requirements, such as date, time, currency, and phone number formats.

  4. Data type test : Test whether the data type meets the requirements, such as integers, decimals, Boolean values, strings, and objects.

  5. Data range test : Test whether the data is within the valid range, such as whether an age is between 0 and 120 or an amount is between 0 and 1,000,000.

  6. Illegal input test : Test whether the input as a whole meets the requirements, such as whether it is empty, contains illegal characters, or is otherwise invalid.

  7. Invalid character test : Test whether the data contains invalid characters, such as special or illegal characters.

  8. Invalid length test : Test whether the data length meets the requirements, such as the length of a string or an array.

  9. Invalid data test : Testers may enter invalid data on purpose, but if they fail to identify it as invalid, it can remain in the system and be treated as real data. For example, a personal record gives an age of 25 but a birthday of last year; the system does not enforce consistency between the two fields, and a statistics requirement uses both age and birthday in its analysis. If such data is generated by tests in the production environment, it distorts the actual analysis.

  10. Random testing : Randomly generated test data may contain invalid data because it has not been validated.

  11. Environmental testing : If the tester does not consider the impact of the test environment, the test data may be invalid.

  12. Repeated testing : If the tester does not properly handle the data left over from previous runs, the test data may be invalid.
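
As a rough illustration of the null value and boundary tests above, the following sketch builds test inputs for an inclusive numeric range. The function name `boundary_values` and the 0 to 120 age rule are assumptions chosen to match the range example in item 5, not part of any particular test framework.

```python
def boundary_values(minimum, maximum, step=1):
    """Return test inputs around the edges of an inclusive range."""
    return {
        # Values on and just inside the edges should be accepted.
        "valid": [minimum, minimum + step, maximum - step, maximum],
        # Values just outside the edges should be rejected.
        "invalid": [minimum - step, maximum + step],
    }


# Assumed rule: an age must be between 0 and 120 inclusive.
cases = boundary_values(0, 120)

# Null-value inputs (item 1) that can be fed to the same field.
null_inputs = [None, "", []]
```

Each value in `cases["invalid"]` and `null_inputs` would be sent to the system under test with the expectation that it is rejected, while each value in `cases["valid"]` should be accepted.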



2. The concept of invalid data 

  1. In testing, invalid data is data that does not meet the expected input requirements or is illegal.
  2. Invalid data may have an invalid format, type, range, or association.
  3. Invalid data must be fully considered and tested to ensure that the system handles it correctly and does not behave abnormally.
  4. Testers should also cover as many invalid data situations as possible in their test cases to improve test coverage and test quality.


3. The impact of invalid data

Reasons why invalid data affects the system:

  1. Insufficient system fault tolerance : If the system receives invalid data and has no adequate fault tolerance mechanism to process it, the system may crash or generate errors, which disrupts normal operation.

  2. Data processing errors : Invalid data may cause abnormal data processing by the system. For example, invalid data may cause system calculation errors or logic errors, thereby affecting the correctness and reliability of the system.

  3. Security issues : Invalid data may be used by malicious attackers to carry out attacks, such as SQL injection attacks, cross-site scripting attacks, etc., resulting in threats to system security.

  4. Database exceptions : Invalid data, such as values that are too long or of the wrong data type, may cause database exceptions that affect the data storage and query functions of the system.

  5. User experience issues : Invalid data may cause system exceptions or error prompts, thereby affecting user experience and reducing user satisfaction.

To sum up, invalid data can crash the system, corrupt its results, open security holes, and frustrate users. Testers need to consider these factors during testing to ensure that the system processes all types of data correctly, which improves its stability and reliability.
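
The security and database items above can be made concrete with a small sketch using Python's standard `sqlite3` module. The `users` table, the 50-character limit, and the helper `insert_user` are assumptions made up for this example. The placeholder `?` passes the value as data, so an injection-style string is stored as plain text, and the `CHECK` constraint makes the database reject an over-long value.

```python
import sqlite3


def insert_user(conn, name):
    # The "?" placeholder binds the value as data, never as SQL text.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
# Assumed schema: names are required and limited to 50 characters.
conn.execute(
    "CREATE TABLE users (name TEXT NOT NULL CHECK (length(name) <= 50))"
)

malicious = "x'); DROP TABLE users; --"
insert_user(conn, malicious)  # stored verbatim; the table is untouched

try:
    insert_user(conn, "a" * 51)  # violates the CHECK constraint
    too_long_rejected = False
except sqlite3.IntegrityError:
    too_long_rejected = True

rows = conn.execute("SELECT name FROM users").fetchall()
```

A test for invalid data would assert both outcomes: the suspicious string does not change the schema, and the over-long value raises an error instead of being silently stored.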



4. Identification of Invalid Data

Here's how to identify invalid data during testing:

  1. Verify that the data is in the correct format and type : During testing, we should check that the data is in the expected format and type. For example, if we are testing an email application, we can check whether the email address contains the "@" symbol.

  2. Check for error messages : During testing, we should check that the system handles invalid data correctly and generates appropriate error messages. For example, if we're testing a login application, we can try using invalid usernames and passwords to test the system's reaction and check that the system generates error messages correctly.

  3. Use Boundary Value Testing : Boundary value testing is a testing method that uses minimum and maximum values to test the behavior of the system. For example, if we were testing a calculator application, we could use boundary value testing to determine whether the system was handling minimum and maximum values correctly.

  4. Testing with Random Data : Random data testing is a testing method that uses randomly generated data to test the behavior of the system. For example, if we are testing a search engine application, we can use random search terms to test the system's search functionality.

  5. Use black box testing : Enter a variety of data, such as numbers, letters, and special characters, without relying on knowledge of the system's internals, and record any invalid data found for subsequent analysis and repair.

  6. Use manual testing : During testing, we should also test by hand to determine whether the system handles invalid data correctly. For example, if we are testing a shopping cart application, we can try adding invalid items and check how the system reacts.

  7. Use requirements analysis : Determine the range and type of valid data from the requirements documents and functional specifications, then treat any data outside that range or type as invalid. For example, if a system requires that the input age be between 18 and 60, an age below 18 or above 60 can be considered invalid.

  8. Use scenario analysis : Determine the characteristics of invalid data from the characteristics of the input data and the business logic. For example, if a system requires that a phone number be 11 digits, then input that contains non-numeric characters or is not 11 digits long can be considered invalid.

  9. Use automated testing : Use automated testing tools to verify and filter data so that invalid data is excluded. For example, regular expressions or data verification tools can check the input data and automatically filter out anything that does not meet the requirements.

In general, the identification of invalid data needs to be analyzed and judged in combination with specific business requirements and testing methods. Only by testing with multiple methods can the invalid data be accurately identified.
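
The examples in the list above (an email address that must contain "@", an age between 18 and 60, and an 11-digit phone number) can be combined into one automated check. This is a minimal sketch; the function name `find_problems` and the record layout are assumptions for illustration, and a real email check would need to be stricter than looking for "@".

```python
import re


def find_problems(record):
    """Return the reasons a record is invalid (an empty list if it is valid)."""
    problems = []

    email = record.get("email")
    if not isinstance(email, str) or "@" not in email:
        problems.append("email: missing '@'")

    age = record.get("age")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 18 <= age <= 60:
        problems.append("age: must be an integer from 18 to 60")

    phone = record.get("phone")
    if not isinstance(phone, str) or not re.fullmatch(r"[0-9]{11}", phone):
        problems.append("phone: must be exactly 11 digits")

    return problems


good = {"email": "a@example.com", "age": 30, "phone": "13800138000"}
bad = {"email": "a.example.com", "age": 17, "phone": "138-0013"}
```

Returning a list of reasons rather than a single true or false value makes it easier to record why a record was judged invalid, which the black box item above recommends for later analysis.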



5. Processing method of invalid data

(1) Reject invalid data

① The concept of rejecting invalid data 

Rejecting invalid data means that testers refuse to use or accept invalid data during testing. Test cases can verify whether the input data meets the requirements; if it does not, the data is rejected and a corresponding prompt message is given.

② Method of rejecting invalid data

  1. Confirm test requirements and specifications : Before testing begins, testers should have a clear understanding of the test requirements and specifications so that they can identify invalid data and reject it.

  2. Set data validation rules : Testers can set data validation rules to ensure that only data that conforms to the rules can be used for testing. For example, you can set rules that input data must conform to a certain format or length.

  3. Use mock data : Testers can use mock data instead of invalid data for testing. Mock data is generated according to the test requirements and specifications, which helps ensure the accuracy and reliability of the test.

  4. Mandatory data input : Testers can use mandatory data input to ensure that only data that conforms to the specification can be used for testing. For example, during testing, testers can be prohibited from manually inputting data, and can only select data from specific data sources for testing.
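
A validation rule that rejects data and returns a prompt message, as described above, might look like the following sketch. The exception class `InvalidDataError`, the function `accept_username`, and the length limits are all assumptions for illustration.

```python
class InvalidDataError(ValueError):
    """Raised when input fails a validation rule."""


def accept_username(value, min_len=3, max_len=20):
    """Return the value unchanged if it passes every rule, otherwise raise."""
    if not isinstance(value, str):
        raise InvalidDataError("username must be a string")
    if not min_len <= len(value) <= max_len:
        raise InvalidDataError(
            f"username must be {min_len} to {max_len} characters long"
        )
    if not value.isalnum():
        raise InvalidDataError("username may contain only letters and digits")
    return value


try:
    accept_username("a!")
    message = ""
except InvalidDataError as exc:
    message = str(exc)  # the prompt shown to whoever supplied the data
```

Raising an exception stops the invalid value from travelling any further, which is the difference between rejecting data and merely filtering it out of a batch.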


(2) Filter invalid data

① The concept of filtering invalid data 

Filtering invalid data means screening out data that does not serve the test purpose or cannot produce valid test results, which improves test efficiency and accuracy. A filter judges whether each item is valid according to preset rules and removes the items that are not.

② Method of filtering invalid data

  1. Data cleaning : Preprocess the test data to remove invalid data such as duplicate data, empty data, abnormal data, etc., to ensure the accuracy and integrity of the test data.

  2. Data sampling : Sample the test data randomly or selectively so that the test data set is neither too large nor too small, either of which can reduce the accuracy of the test results.

  3. Data analysis : Statistically analyze the test data, identify the data that has the greatest impact on the test results, and test it first to improve test efficiency.

  4. Data filtering : According to test requirements and test purposes, test data is screened to remove unnecessary data and reduce test time and cost.
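
A filter driven by a preset rule can be sketched as below. The rule itself (a non-empty `id` and a non-negative numeric `amount`) and the names `is_valid` and `split_rows` are assumptions for this example.

```python
def is_valid(row):
    """Preset rule: a non-empty id and a non-negative numeric amount."""
    amount = row.get("amount")
    return (
        bool(row.get("id"))
        and isinstance(amount, (int, float))
        and not isinstance(amount, bool)
        and amount >= 0
    )


def split_rows(rows):
    """Separate rows into those that pass the rule and those that do not."""
    kept, dropped = [], []
    for row in rows:
        (kept if is_valid(row) else dropped).append(row)
    return kept, dropped


rows = [
    {"id": "a", "amount": 5},
    {"id": "", "amount": 5},     # empty id
    {"id": "b", "amount": -1},   # negative amount
    {"id": "c", "amount": "7"},  # wrong type
]
kept, dropped = split_rows(rows)
```

Keeping the dropped rows instead of discarding them lets testers review what was filtered out and confirm that the rule is not too strict.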


(3) Convert invalid data

① The concept of converting invalid data 

Converting invalid data means turning invalid data into valid data that can be used for testing. For example, a date in an incorrect format can be converted to the correct format.

② Method of converting invalid data 

  1. Data conversion : Convert invalid data to valid data so that it conforms to the expected input requirements or business logic. For example, replace illegal characters with legal ones, truncate overlong strings, or convert illegal date formats to legal ones.

  2. Data cleaning : Clean the data before testing to remove invalid data. For example, remove spaces, remove duplicates, remove invalid characters, etc.

  3. Data generation : Generate data that meets expected input requirements or business logic for testing. For example, generate random numbers, generate numbers within a specified range, generate data that conforms to business logic, etc.

  4. Data validation : Validate the converted data to ensure that it meets the expected input requirements or business logic. For example, verify whether the date format is correct, verify whether the string length meets the requirements, verify whether the data conforms to business logic, and so on.
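
The conversions named above (fixing date formats, replacing illegal characters, truncating overlong strings) could be sketched as follows. The accepted input formats in `DATE_FORMATS`, the set of allowed characters, and the helper names are assumptions; a real project would take them from its own requirements.

```python
import re
from datetime import datetime

# Assumed input formats; the target format is ISO 8601 (YYYY-MM-DD).
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def to_iso_date(text):
    """Convert a date string to YYYY-MM-DD, or return None if it cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_text(text, max_len=10):
    """Replace characters other than word characters, spaces, and hyphens, then truncate."""
    return re.sub(r"[^\w \-]", "_", text)[:max_len]


converted = to_iso_date("27.05.2023")
unparseable = to_iso_date("2023-13-01")  # month 13 does not exist
```

Because `strptime` both parses and checks the date, a value such as month 13 comes back as `None` rather than being converted into something plausible but wrong, which covers the validation step in item 4.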


(4) Data verification 

① The concept of data verification 

Data verification means checking the input data during testing. Data can be verified by type, range, format, and other aspects. The purpose is to ensure that the input data is correct and legal so that it does not cause errors or abnormal behavior in the system.

② Data verification method 

  • Data type verification : Verify that the type of the input data matches the type required by the system.

  • Data range verification : Verify that the input data falls within the range required by the system.

  • Data format verification : Verify that the format of the input data conforms to the format required by the system.

  • Data integrity verification : Verify that the input data is complete and internally consistent.

  • Data legality verification : Verify that the input data is permitted by the rules of the system.
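
The sketch below applies several of the checks above to one record in a fixed order: integrity first (are all fields present), then type, range, and format. The order fields, the 1 to 999 quantity range, and the function name `verify_order` are assumptions for illustration.

```python
import re

REQUIRED_FIELDS = ("order_id", "quantity", "order_date")


def verify_order(order):
    """Return a list of verification errors (an empty list if the order passes)."""
    # Integrity: every required field must be present.
    errors = [f"{name}: missing" for name in REQUIRED_FIELDS if name not in order]
    if errors:
        return errors

    quantity = order["quantity"]
    # Type, then range (range is only meaningful once the type is right).
    if not isinstance(quantity, int) or isinstance(quantity, bool):
        errors.append("quantity: expected int")
    elif not 1 <= quantity <= 999:
        errors.append("quantity: out of range 1..999")

    # Format: the date must look like YYYY-MM-DD.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", str(order["order_date"])):
        errors.append("order_date: expected YYYY-MM-DD")

    return errors
```

For example, `verify_order({"order_id": "A1", "quantity": 0, "order_date": "27/05/2023"})` reports both a range error and a format error, while an order missing a field is reported as incomplete before any other check runs.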


(5) Data cleaning 

① The concept of data cleaning

Data cleaning refers to the preprocessing of data during data processing to eliminate noise, errors, and inconsistencies in the data, making the data cleaner and more useful. 

② Data cleaning method

  • Filling of missing values : Missing values are filled in by methods such as interpolation.
  • Outlier processing : Outliers are handled by methods such as deletion or replacement.
  • Duplicate value processing : Duplicate values are handled by methods such as deletion.
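
The three cleaning methods above can be sketched in plain Python. The helper names are assumptions, and the choices made here (linear interpolation, replacing outliers with the median of the in-range values) are just one option among the methods the list mentions.

```python
from statistics import median


def fill_missing(values):
    """Linearly interpolate None entries that have known neighbours on both sides."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            fraction = (i - left) / (right - left)
            filled[i] = filled[left] + fraction * (filled[right] - filled[left])
    return filled  # leading or trailing None values are left as they are


def replace_outliers(values, low, high):
    """Replace values outside [low, high] with the median of the in-range values."""
    middle = median(v for v in values if low <= v <= high)
    return [v if low <= v <= high else middle for v in values]


def drop_duplicates(items):
    """Remove repeated items while keeping first-seen order."""
    return list(dict.fromkeys(items))
```

For instance, `fill_missing([1.0, None, 3.0])` fills the gap with `2.0`, and `drop_duplicates(["a", "b", "a"])` keeps only the first "a". Note that `replace_outliers` raises `statistics.StatisticsError` if no value is in range, so the bounds need to be chosen with the data in mind.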

③ Tools for data cleaning 

  • OpenRefine : An open source data cleaning tool that can filter, transform, and merge data, among other operations.
  • Trifacta : A commercial data cleaning tool that can automatically identify data types and provide an interactive data cleaning interface.

④ Challenges of data cleaning

  • Data Quality Issues : The number one issue with data cleaning is data quality. Data quality issues may include missing data, duplicate data, invalid data, formatting errors, etc. These problems may affect the accuracy and reliability of data analysis, so it takes a lot of time and effort to clean and repair the data.

  • Data volume problem : Modern software systems generate very large amounts of data, often millions of records, so data cleaning has to operate at that scale. This requires efficient algorithms and techniques to keep data cleaning both fast and accurate.

  • Data provenance issues : Another challenge in data cleaning is the data provenance issue. Data may come from different systems and platforms, and may be stored in different formats and structures. Therefore, data cleaning needs to consider different data sources and formats, and take corresponding measures to process the data.

  • Data protection issues : Data cleaning involves a large amount of sensitive data, such as personally identifiable information, financial information, etc. Therefore, data cleaning needs to take corresponding measures to protect the security and privacy of data, such as data encryption, access control, etc.

  • Data visualization problem : After data cleaning, the data needs to be visualized for analysis and decision-making. Therefore, data cleaning needs to consider how to visualize data and provide easy-to-use interfaces and tools to help users perform data analysis.

⑤ Precautions for data cleaning

  • Keep the original data : When cleaning the data, care should be taken to keep the original data for comparison and verification when needed.
  • Quality of data sources : The first step in data cleaning is to ensure the quality of data sources. Data sources should be reliable, accurate, complete, and free of duplicate data.

  • The goal of data cleaning : Before starting data cleaning, the goal of data cleaning needs to be clarified. This will help determine what data needs to be cleaned, and how to clean it.

  • Data cleaning process : Data cleaning should be a systematic process, including data collection, data preprocessing, data transformation and data validation. This process should be documented so that it can be reused in future data cleaning.

  • Tools for Data Cleansing : Using the proper tools can make data cleaning more efficient and accurate. For example, tools such as Excel, Python, and R can be used for data cleaning.

  • Rules for data cleaning : Before data cleaning, you need to define the rules for data cleaning. These rules should include data type, data format, data range, etc.

  • Validation of data cleaning : After data cleaning, data validation is required to ensure the accuracy and completeness of the data. You can use data visualization, data statistics, etc. to verify the data.

  • Documentation of data cleaning : The results of data cleaning should be documented so that they can be reused in future data cleaning. These records should include data cleaning goals, processes, rules, tools, and validation results.



6. Methods to reduce invalid data

  1. Determine test objectives and requirements : Clarify the test objectives and requirements first. This avoids wasting time and energy on unnecessary tests and helps determine the type and scope of the test data, thereby reducing the amount of invalid data.

  2. Select appropriate test cases : Select test cases according to the requirements and goals, and avoid duplicate or unnecessary test cases, so as to reduce the amount of invalid data.

  3. Use real data : Testing with real data can help testers better simulate real-world situations, thereby reducing the amount of invalid data.

  4. Use a random data generator : A random data generator can produce a large amount of test data that simulates various situations, so the system is tested more comprehensively. It also avoids the mistakes and duplication that come with entering or copy-pasting data by hand, which reduces the amount of invalid data.

  5. Use data filters : During testing, you can use data filters to filter out invalid data. This improves testing efficiency by focusing attention on the really meaningful data.

  6. Deduplication : A large amount of duplicate data may appear during testing. Duplicate data wastes time and resources and can make test results inaccurate, so testers should delete it. This reduces the amount of invalid data, the test time, and the storage space that test data needs.

  7. Regular cleaning of invalid data : During testing, invalid data should be cleaned regularly to avoid confusion of test data and excessive use of storage space.
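
A random data generator reduces invalid data only if what it produces obeys the rules of the system under test. The sketch below generates records within assumed rules (age 18 to 60, an 11-digit phone number, reused from the examples in section 4), skips duplicates as it goes, and uses a fixed seed so that a run can be repeated. The function name `generate_users` is hypothetical.

```python
import random


def generate_users(count, seed=0):
    """Generate `count` rule-conforming users with unique phone numbers."""
    rng = random.Random(seed)  # a fixed seed makes the data set reproducible
    users = []
    seen_phones = set()
    while len(users) < count:
        phone = "".join(rng.choice("0123456789") for _ in range(11))
        if phone in seen_phones:
            continue  # deduplicate while generating
        seen_phones.add(phone)
        users.append({"age": rng.randint(18, 60), "phone": phone})
    return users


users = generate_users(50)
```

Because the seed is fixed, calling `generate_users(50)` again returns the same records, which makes a failing test easier to reproduce than with data that changes on every run.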



7. Examples of actual cases

(1) Medical industry

In the healthcare industry, the handling and successful management of invalid data is critical to ensuring patient safety and treatment outcomes. Here's a real-world example of how to deal with invalid data and successfully manage it to improve the quality of healthcare.

Case: A hospital uses an electronic medical record system, but finds that there are some invalid medical data in the system, such as duplicate records, wrong data entry, etc. These invalid data may lead doctors to make wrong diagnosis and treatment decisions, thereby affecting the treatment effect and safety of patients.

Resolution: To resolve this issue, the testing team took the following actions:

  1. Data cleaning: The test team cleaned up the data in the system, deleted duplicate records and wrong data input, and ensured that the data in the system was accurate.

  2. Data verification: The test team verified the data in the system to ensure that each record complies with the standards and norms of the medical industry and is consistent with the actual situation of the patient.

  3. Data management: The test team established a data management system to ensure that all medical data are properly recorded and managed, including data storage, backup and recovery, etc.

Results: Through the above measures, the hospital dealt with the invalid data and brought its medical data under proper management, thereby improving the quality and efficiency of medical care. Physicians can diagnose and treat patients more accurately, and patients can receive treatment more safely and effectively.


(2) E-commerce industry

Case: On an e-commerce platform, a user entered an invalid address when placing an order, so the order could not be submitted. The platform's testing team discovered the problem during testing and came up with a solution.

Solution: The testing team proposed verifying the address with regular expressions when the user enters it. If the address is invalid, the user is prompted to re-enter it. The test team also suggested adding an address verification module to the back-end system to automatically verify whether the address entered by the user is valid.

Result: The e-commerce platform solved the problem of invalid addresses. When the user places an order, the system automatically verifies the address and, if it is invalid, prompts the user to re-enter it. This prevents orders from failing because of an invalid address and improves the user experience of the platform. The address verification module of the back-end system also verifies the address entered by the user, ensuring the accuracy and integrity of the data.

  1. During testing of an e-commerce site, we discovered that some users were using invalid zip codes during the order submission process. Not only are these zip codes not recognized by the system, but the order cannot be submitted successfully. To solve this problem, we decided to validate the zip code before the user submits the order. We wrote a zip code validator that checks if the zip code entered by the user is valid. If the zip code is invalid, the user will be prompted to re-enter it. This solution not only improves the success rate of order submission, but also improves the user experience.

  2. The "shopping cart" function of the e-commerce platform did not manage a user's multiple orders correctly, which confused users and increased returns. During the test, we found that the system could not correctly combine multiple orders into one for management, accounting, and calculation, so users purchased duplicate products or added wrong ones. We proposed a solution in which the system groups the orders in the shopping cart, counts and prices them, and prevents users from adding duplicate orders.

  3. The shopping site ignored the processing of invalid data entry, resulting in failed order transactions. In the test, we simulated the input of invalid data (such as a wrong address or a wrong phone number) and found that the system neither returned an error nor guided the user in a timely manner, which would reduce users' trust in the website and their spending on it. We therefore proposed an improvement that handles invalid data under different input conditions and returns a corresponding error message prompting the user to correct it.
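
The zip code validator described in the first item could be sketched as follows. The source does not say which country's format was used, so the pattern here assumes US ZIP codes (five digits, optionally followed by a hyphen and four digits); the names `ZIP_PATTERN` and `validate_zip` and the wording of the prompt are also assumptions.

```python
import re

# Assumed format: US ZIP (12345) or ZIP+4 (12345-6789).
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def validate_zip(raw):
    """Return (is_valid, message); the message is empty when the code is valid."""
    zip_code = raw.strip()  # tolerate accidental surrounding spaces
    if ZIP_PATTERN.fullmatch(zip_code):
        return True, ""
    return False, "Invalid zip code. Please re-enter it."


ok, _ = validate_zip(" 94105 ")
rejected, prompt = validate_zip("9410A")
```

Running the check before the order is submitted, as the case describes, means the user sees the prompt immediately instead of finding out later that the order failed. A format check like this cannot tell whether a well-formed code actually exists; that would need a lookup against postal data.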


(3) Audio and video industry

Case: In the audio and video industry, a real case is about the testing of video streams. During the test, the testers found some invalid data, which caused the quality of the video stream to degrade. After analysis, testers found that these invalid data were caused by network delays and bandwidth limitations. In order to solve this problem, the testers adopted some technical measures, such as data compression and network optimization, to ensure the quality of the video stream.

Solution: In order to effectively deal with invalid data, testers need to adopt some technical means and tools. For example, they can use data analysis tools to analyze the data and determine which data is invalid. They can also use network optimization tools to optimize the network to ensure the stability and reliability of data transmission. Additionally, they can use data compression techniques to reduce the bandwidth consumption of data transfers.

Results: By effectively handling invalid data, testers can ensure the accuracy and reliability of test results. In the audiovisual industry, this means that testers can ensure the quality and stability of the video stream. This will help improve user satisfaction and trust, thereby promoting business development and growth.

Origin blog.csdn.net/qq_39720249/article/details/130892703