Application of Rule Engine in Data Analysis

Preface: By separating business rules from the technical decisions made by developers, a rule engine lets business rules be managed and modified dynamically without affecting the rest of the software system. The following examples illustrate how a rule engine can be applied to data analysis, using scenarios such as SQL queries and custom rules.
In modern enterprise projects, business decision logic and business rules are often hard-coded and scattered throughout the system. Because the rules of the external market can change at any time, developers must always be ready to modify and update the system, which reduces efficiency. The rule engine emerged in response: it separates business rules from the developers' technical decisions, so that the rules can be managed and modified dynamically without affecting the rest of the software system. Rule engines are used in a wide range of areas, and they are also well suited to data analysis and data cleaning.
Suppose we have a table with the structure shown below:
Field Name   Field Type    Description
Name         Varchar(50)   Name
Sex          Int           Gender (1: Male, 0: Female)
Department   Varchar(50)   Department
Salary       Int           Salary
We may need to check that the value of the Salary field does not exceed 5000, and to clean and analyze the data in the table according to this rule.

In data analysis, data is usually stored in database tables like the one above, and the amount of data is relatively large. It cannot all be imported into memory at once for the rule engine to use, so the rule engine reads the data and loads it into memory in batches.
Data analysis through the rule engine follows the structure shown below:

1: Data to be analyzed
2: Data read
3: Data written to memory
4: Rule base
5: Rule engine
6: Analysis results
Working principle
First, a batch of data is read from the database to be analyzed and put into memory, and the data in memory is then filtered and analyzed according to the rules. Once the data in memory has been analyzed, the memory is cleared and the next batch is read for a new round of analysis. This repeats until all the data has been processed.
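The batch loop described above can be sketched in ordinary code. The following is a minimal Python illustration, not VisualRules itself: it assumes a SQLite in-memory database, a hypothetical table name Employee with the fields listed earlier, made-up sample rows, and a deliberately tiny batch size.

```python
import sqlite3

SALARY_LIMIT = 5000  # threshold taken from the salary rule in the text


def build_demo_db():
    """Create a small in-memory table with made-up rows (assumption: SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "Name VARCHAR(50), Sex INT, Department VARCHAR(50), Salary INT)"
    )
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?, ?, ?)",
        [
            ("Alice", 0, "Sales", 4800),
            ("Bob", 1, "Sales", 6200),
            ("Carol", 0, "IT", 5000),
            ("Dana", 0, "IT", 7100),
            ("Eve", 0, "HR", 3000),
        ],
    )
    return conn


def analyze_in_batches(conn, batch_size=2):
    """Read rows batch by batch, apply the rule, and collect the violations."""
    flagged = []
    cursor = conn.execute("SELECT Name, Sex, Department, Salary FROM Employee")
    while True:
        batch = cursor.fetchmany(batch_size)  # load one batch into memory
        if not batch:
            break  # all data has been processed
        for name, _sex, _department, salary in batch:
            if salary > SALARY_LIMIT:
                flagged.append((name, salary))
        batch.clear()  # release this batch before reading the next one
    return flagged


if __name__ == "__main__":
    print(analyze_in_batches(build_demo_db()))
```

Only one batch is held in memory at a time, so the memory needed depends on the batch size rather than on the size of the table.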

Rule Base
The business logic for checking salaries is expressed in the natural language provided by VisualRules to form a rule base, as shown in the following figure:


In the above example, the main task is to check the salary of each employee in turn. If it is greater than 5000, a warning message is issued and the record is extracted and stored in another designated location.
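To show the idea of keeping the rule apart from the code that runs it, here is a minimal Python sketch. It does not reproduce the VisualRules natural-language syntax; the names Rule, run_rules and make_salary_rule are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Rule:
    """A rule is a named condition plus the action to take when it holds."""
    name: str
    condition: Callable[[Record], bool]
    action: Callable[[Record], None]


def run_rules(rules: List[Rule], records: List[Record]) -> None:
    """A tiny 'engine': apply every rule in the rule base to every record."""
    for record in records:
        for rule in rules:
            if rule.condition(record):
                rule.action(record)


def make_salary_rule(limit: int, warnings: List[str], extracted: List[Record]) -> Rule:
    """Build the salary rule: warn and set the record aside if salary > limit."""
    def action(record: Record) -> None:
        warnings.append(
            f"Warning: {record['Name']} has salary {record['Salary']} (limit {limit})"
        )
        extracted.append(record)  # stands in for "another designated location"

    return Rule(
        name=f"salary-over-{limit}",
        condition=lambda record: record["Salary"] > limit,
        action=action,
    )


if __name__ == "__main__":
    warnings: List[str] = []
    extracted: List[Record] = []
    rule_base = [make_salary_rule(5000, warnings, extracted)]
    run_rules(rule_base, [{"Name": "Alice", "Salary": 4800},
                          {"Name": "Bob", "Salary": 6200}])
    print(warnings)
```

Changing the threshold or adding another rule means editing the rule base only; run_rules stays the same.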


Next, we use a practical example to illustrate how to represent business rules with VisualRules.
The electronic file system of the DMV contains the data table shown below, PF_Table, which records the basic information of file images. We perform data analysis on this table. Leaving aside the integrity and validity of the data, we only look at how many rows violate the business rules described below.
fNo (index)   paNo (page number)   Path (storage path)          caNo (file)   baNo (business)
0217233       1                    \2008032403\0217233\1.jpg    406 101       2008032403
0217233       2                    \2008032403\0217233\2.jpg    406 102       2008032403
0217233       3                    \2008032403\0217233\3.jpg    406 105       2008032403
0217233       4                    \2008032403\0217233\4.jpg    406 108       2008032401

Business Rules
1: The path consists of three parts: business type, index file, and page number.
Business type: must be consistent with baNo
Index file: must be consistent with fNo
Page number: must be consistent with paNo



We can summarize data analysis with VisualRules in the following steps:
1: The rule engine reads data from the database and loads it into memory
2: The data in memory is taken out for analysis, verification, and processing

Step 1: Reading data from the database and storing it in memory
The VisualRules rule engine provides a distinctive feature: the rule engine can access the database directly, without any external program code. The process is the same as in traditional coding: write an SQL query, execute it, and store the result in memory. When a database is being analyzed, the amount of data is usually large, so one note on writing the SQL statements is in order: this work may require a DBA or a professional database operator, and it is not the rule engine's concern. The rule engine is only responsible for executing the query and the subsequent actions. Here I use a simple query to illustrate the function provided by VisualRules:
    select top(10) * from PF_Table
This means that only the first 10 rows of PF_Table are read for processing.
Add the test.dbs database connection object to the rule engine's object library. Through this connection you can access the database directly and write query, insert, delete, update, and other statements. After the SQL statement is written, we can execute it in the rules.

Add a rule to the rule package, and then paste the copied SQL execution method into the rule. When the rule runs, the query is executed and the data it returns is put into memory. Here we have defined a memory table rule object, which makes the data in memory easy to inspect.
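For comparison with the traditional coding method mentioned above, here is a minimal Python sketch of running a query and keeping the result as an in-memory table. It is not the VisualRules mechanism. It assumes SQLite, where LIMIT plays the role that TOP plays in the query above, and a reduced PF_Table with made-up rows and only four columns; build_demo_db and load_memory_table are hypothetical names.

```python
import sqlite3


def build_demo_db():
    """Create a reduced PF_Table with 12 made-up rows (assumption: SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PF_Table (fNo TEXT, paNo INTEGER, Path TEXT, baNo TEXT)"
    )
    rows = [
        ("0217233", page, f"\\2008032403\\0217233\\{page}.jpg", "2008032403")
        for page in range(1, 13)
    ]
    conn.executemany("INSERT INTO PF_Table VALUES (?, ?, ?, ?)", rows)
    return conn


def load_memory_table(conn, limit=10):
    """Run the query and return the rows as a list of dicts (the memory table)."""
    conn.row_factory = sqlite3.Row  # lets each row be read by column name
    cursor = conn.execute(
        "SELECT * FROM PF_Table ORDER BY paNo LIMIT ?", (limit,)
    )
    return [dict(row) for row in cursor]


if __name__ == "__main__":
    for row in load_memory_table(build_demo_db()):
        print(row)
```

Each entry in the returned list is keyed by column name, which is roughly what the memory table object makes visible inside the rule engine.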


Step 2: Data analysis and processing
After the data is loaded into memory, we take it out and use the configured rules to analyze and filter it.
Because the path consists of three parts, each with its own rule, we first split the path into its three parts on a specific character. We then check whether the first part is consistent with the business type (baNo), the second with the index file (fNo), and the third with the page number (paNo). If any of them is inconsistent, the row is in error.
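The path check can be written out as a short Python sketch. It assumes the separator is the backslash and that the page number is the file name without its extension; the sample rows are the four rows of PF_Table shown earlier, reduced to the columns the rule needs, and check_path is a hypothetical name.

```python
def check_path(row):
    """Return a list of problems found in one row (empty list means valid)."""
    # Split on the backslash and drop the empty piece before the leading one.
    parts = [part for part in row["Path"].split("\\") if part]
    if len(parts) != 3:
        return ["path does not consist of three parts"]

    business, index_file, page_file = parts
    page = page_file.rsplit(".", 1)[0]  # "1.jpg" -> "1"

    problems = []
    if business != row["baNo"]:
        problems.append("business type differs from baNo")
    if index_file != row["fNo"]:
        problems.append("index file differs from fNo")
    if page != str(row["paNo"]):
        problems.append("page number differs from paNo")
    return problems


SAMPLE_ROWS = [
    {"fNo": "0217233", "paNo": 1, "Path": "\\2008032403\\0217233\\1.jpg", "baNo": "2008032403"},
    {"fNo": "0217233", "paNo": 2, "Path": "\\2008032403\\0217233\\2.jpg", "baNo": "2008032403"},
    {"fNo": "0217233", "paNo": 3, "Path": "\\2008032403\\0217233\\3.jpg", "baNo": "2008032403"},
    {"fNo": "0217233", "paNo": 4, "Path": "\\2008032403\\0217233\\4.jpg", "baNo": "2008032401"},
]


def find_violations(rows):
    """Map the page number of each invalid row to its list of problems."""
    return {row["paNo"]: check_path(row) for row in rows if check_path(row)}


if __name__ == "__main__":
    print(find_violations(SAMPLE_ROWS))
```

With these sample rows only the fourth row is reported, because the business type in its path does not match its baNo.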



As the rule configuration above shows, a rule is really an abstraction of business knowledge, and its representation is independent of any specific database. The rules a user writes during data quality analysis can therefore be reused in other similar business environments. If data in other tables needs to be monitored in the future, the user does not need to rewrite the rules and can simply reuse the VisualRules rules already defined for data analysis.

Finally, we can see that the data analysis system based on the VisualRules rule engine has the following advantages:

1: Rules can be moved out of individual applications into a centralized rule base, where specialized rule management tools such as TemaServer can be used to manage them
2: The rules themselves are dynamic, so changes in business rules can take effect throughout the system without rebuilding system components
3: The extensibility of VisualRules allows users to define new operation functions and cleaning methods
4: Data analysis based on VisualRules rules can achieve good interactivity
