SPL, a Chinese open-source structured data library for Java, is remarkably convenient to use

Modern Java application architectures increasingly separate data storage from data processing to gain maintainability, scalability, and portability. The currently popular microservice architecture is a typical example. This kind of architecture usually requires business logic to be implemented in Java programs rather than in the database, where traditional architectures placed many operations.

Most business logic in an application involves structured data processing. Databases (SQL) support such tasks well, so business logic is relatively easy to implement there. Java, however, has always lacked this basic support, which makes implementing business logic in Java cumbersome and inefficient. As a result, the architecture brings advantages, but development efficiency drops significantly.

If Java also had a complete class library for structured data processing and computation, this problem would be solved: we could enjoy the architectural advantages without losing development efficiency.


What capabilities are needed?

What characteristics should an ideal structured data processing library for Java have? SQL is a good model to derive them from:

1 Set computing capability

Structured data usually comes in batches (as sets). Computing on it conveniently therefore requires sufficient set computing capability.

Suppose there is no set operation library and only the array, the basic data type that plays the role of a set. Then even a simple sum over the members of a set takes four or five lines of loop code, and operations such as filtering, grouping, and aggregation take hundreds of lines.
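Here is a rough Java sketch of the array-only approach, to make the contrast concrete. The parallel arrays `clients` and `amounts`, the threshold, and the class name are hypothetical. HashMap is used for brevity; without it, grouping would need even more code.

```java
import java.util.HashMap;
import java.util.Map;

public class LoopAggregation {
    // Summing a set held in a plain array takes a hand-written loop
    static double sum(double[] amounts) {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return total;
    }

    // Filter (amount > minAmount), group by client, and sum amounts, all by hand.
    // Even with HashMap doing the bookkeeping, every step is explicit loop code.
    static Map<String, Double> sumByClient(String[] clients, double[] amounts, double minAmount) {
        Map<String, Double> totals = new HashMap<>();
        for (int i = 0; i < clients.length; i++) {
            if (amounts[i] > minAmount) {
                totals.merge(clients[i], amounts[i], Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] clients = {"ABC", "XYZ", "ABC", "QRS"};
        double[] amounts = {500, 1500, 2500, 800};
        System.out.println(sum(amounts)); // 5300.0
        System.out.println(sumByClient(clients, amounts, 1000));
    }
}
```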

SQL provides rich set operations: aggregate functions such as SUM and COUNT, WHERE for filtering, and GROUP BY for grouping. It also supports basic set operations such as intersection, union, and difference. Code written this way is much shorter.

2 Lambda syntax

Is set computing capability enough? If we developed a library of set operations for Java, would we match what SQL offers?

It's not that simple!

Take filtering as an example. Filtering keeps the set members that satisfy a condition. In SQL this condition is written as an expression: WHERE x>0 keeps the members for which x>0 evaluates to true. The expression x>0 is not evaluated before the statement executes. Instead, it is evaluated for each set member during traversal. In essence, the expression is a function whose argument is the current set member, so WHERE effectively takes a function defined by an expression as its parameter.

This style of writing is called Lambda syntax, a hallmark of functional languages.
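In Java terms, a WHERE condition corresponds to a predicate applied to each member during traversal. Below is a minimal sketch. The helper name `where` is hypothetical and simply wraps `Stream.filter`.

```java
import java.util.List;
import java.util.function.Predicate;

public class FilterAsFunction {
    // A hand-rolled "WHERE": the condition is a function of the current member
    static List<Integer> where(List<Integer> members, Predicate<Integer> condition) {
        return members.stream().filter(condition).toList();
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(-2, 5, 0, 7);
        // x -> x > 0 is evaluated once per member during traversal, like SQL's WHERE x>0
        System.out.println(where(xs, x -> x > 0)); // [5, 7]
    }
}
```

Note that Java needs the explicit `x ->` marker to tell the compiler that `x > 0` is a function rather than a value to compute immediately.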

Without Lambda syntax, we would frequently have to define temporary functions. That makes code cumbersome and prone to name collisions.

SQL uses Lambda syntax extensively. It appears not only in filtering and grouping but even where it is not strictly required, such as computed columns, and this greatly simplifies the code.

3 Directly referencing fields in Lambda syntax

Structured data is not simply a single value, but a record with fields.

When an SQL expression parameter refers to a record field, the field name can usually be used directly, without specifying which record the field belongs to.

Recent versions of Java also support Lambda syntax, but they can only pass the current record to the Lambda function as a parameter, so the formula must be written in terms of that record. For example, suppose we compute the amount from unit price and quantity, and the parameter representing the current member is named x. Java then needs the verbose form x.UnitPrice * x.Quantity, whereas SQL writes UnitPrice * Quantity more intuitively.
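A short Java sketch of the point above. The `OrderLine` record and its fields are hypothetical. The lambda receives the whole record, so each field reference must be qualified with the parameter name.

```java
import java.util.List;

public class FieldReference {
    // Hypothetical order line; field names are illustrative
    record OrderLine(String product, double unitPrice, int quantity) {}

    static List<Double> amounts(List<OrderLine> lines) {
        // The lambda gets the record x, so every field is written as x.field()
        return lines.stream().map(x -> x.unitPrice() * x.quantity()).toList();
    }

    public static void main(String[] args) {
        List<OrderLine> lines = List.of(
                new OrderLine("pen", 2.5, 4),
                new OrderLine("book", 12.0, 2));
        System.out.println(amounts(lines)); // [10.0, 24.0]
    }
}
```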

4 Dynamic data structures

SQL also supports dynamic data structures well.

In structured data computing, the result is often itself structured data. Its structure depends on the operation, so it cannot be prepared before coding. Dynamic data structures are therefore necessary.

Every SELECT statement in SQL produces a new data structure, and fields can be added or removed freely in the code without defining the structure (class) in advance. Languages such as Java cannot do this. Every structure (class) must be defined at compile time, and in principle no new structure can be created at run time.
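To illustrate, here is a rough Java counterpart of selecting OrderID, Amount, and the order year. Both the source record `Order` and the result record `OrderWithYear` are hypothetical types that must be declared before compilation. A different column list would need yet another declared type.

```java
import java.time.LocalDate;
import java.util.List;

public class StaticStructures {
    // Source structure: declared in advance
    record Order(int orderId, double amount, LocalDate orderDate) {}

    // Result structure for "OrderID, Amount, year(OrderDate)": also declared in advance
    record OrderWithYear(int orderId, double amount, int year) {}

    static List<OrderWithYear> withYear(List<Order> orders) {
        return orders.stream()
                .map(o -> new OrderWithYear(o.orderId(), o.amount(), o.orderDate().getYear()))
                .toList();
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order(1, 99.5, LocalDate.of(2021, 3, 14)));
        System.out.println(withYear(orders));
    }
}
```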

5 Interpreted languages

From the analysis above, we can conclude that Java itself is not well suited to structured data processing. Its Lambda mechanism does not support feature 3, and as a compiled language it cannot implement feature 4.

In fact, Lambda syntax is not a natural fit for compiled languages. The compiler cannot tell whether an expression in a parameter position should be evaluated on the spot and its value passed, or compiled into a function and passed as a whole. Extra syntax symbols are therefore needed to distinguish the two. Interpreted languages do not have this problem: the function itself decides whether a parameter expression is evaluated first or while traversing the set members.

SQL is indeed an interpreted language.

Introducing SPL

Stream, officially introduced in Java 8, is a class library for structured data processing, but it does not meet the requirements above. It has no dedicated structured data types and lacks many important structured data functions. It is also not an interpreted language, does not support dynamic data types, and has a complex Lambda interface.
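For comparison, here is a rough Java Stream counterpart of grouping orders by client and summing amounts. The `Order` record and its fields are hypothetical. The result is a Map rather than a table, so any further structured step (for example, filtering on the totals) means switching to `entrySet()` streams.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class StreamGrouping {
    record Order(String client, double amount) {}

    // Group by client and sum amounts; the result is a Map keyed by client
    static Map<String, Double> sumByClient(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                Order::client, TreeMap::new, Collectors.summingDouble(Order::amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("ABC", 100), new Order("XYZ", 50), new Order("ABC", 25));
        System.out.println(sumByClient(orders)); // {ABC=125.0, XYZ=50.0}
    }
}
```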

Kotlin is part of the Java ecosystem. It improves slightly on Stream and also provides structured data types. However, its structured data functions are insufficient, it is not an interpreted language, it does not support dynamic data types, and its Lambda interface is complex. It is still not an ideal structured data computing library.

Scala provides a wealth of structured data functions, but as a compiled language it is still not an ideal structured data computing library.

So, what else can be used in the Java ecosystem?

esProc SPL.

SPL is a programming language interpreted and executed in Java. It has a rich structured data computing library, simple Lambda syntax, and convenient dynamic data structures, which makes it an ideal structured data processing library for Java.


Rich set operation functions

SPL provides a dedicated structured data type, the sequence table. Like an SQL data table, a sequence table is a set of records and provides the general functions of a structured data type. The following examples illustrate this.

Parse the source data and generate a sequence table:

Orders=T("d:/Orders.csv")

Generate a new sequence table from the original one by selecting columns by name:

Orders.new(OrderID, Amount, OrderDate)

Calculated column:

Orders.new(OrderID, Amount, year(OrderDate))

Field rename:

Orders.new(OrderID:ID, SellerId, year(OrderDate):y)

Reference fields by position:

Orders.groups(year(_5),_2; sum(_4))

Table aliases with a left join:

join@1(Orders:o,SellerId ; Employees:e,EId).groups(e.Dept; sum(o.Amount))

All structured computing functions work on sequence tables, and their results are also sequence tables rather than types such as Map. For example, a grouped and aggregated result can be processed further as structured data:

Orders.groups(year(OrderDate):y; sum(Amount):m).new(y:OrderYear, m*0.2:discount)

Built on the sequence table, SPL provides a wealth of structured data functions for filtering, sorting, grouping, deduplication, renaming, calculated columns, joins, subqueries, set operations, ordered computations, and more. These functions are powerful enough to complete calculations on their own, without hand-written code to assist them:

Compound-condition query:

Orders.select(Amount>1000 && Amount<=3000 && like(Client,"*bro*"))

Sort:

Orders.sort(-Client,Amount)

Grouping and aggregation:

Orders.groups(year(OrderDate),Client; sum(Amount))

Inner join:

join(Orders:o,SellerId ; Employees:e,EId).groups(e.Dept; sum(o.Amount))

Concise Lambda syntax

SPL supports concise Lambda syntax. There is no need to define a function name or function body; an expression can be used directly as a function parameter, as in filtering:

Orders.select(Amount>1000)

When the business logic changes, there is no need to refactor a function. Just modify the expression:

Orders.select(Amount>1000 && Amount<2000)

SPL is an interpreted language, so parameter expressions need no explicit parameter type declarations, which keeps the Lambda interface simple. For example, a sum of squares, squaring each value as part of the summation, can be written intuitively:

Orders.sum(Amount*Amount)

Similar to SQL, SPL syntax also supports the direct use of field names in single-table calculations:

Orders.sort(-Client, Amount)

Dynamic data structures

As an interpreted language, SPL naturally supports dynamic data structures and can generate new sequence tables whose structure follows from the calculation result. This is especially useful for calculated columns, grouping and aggregation, and joins. For example, a grouped and aggregated result can be computed on directly:

Orders.groups(Client;sum(Amount):amt).select(amt>1000 && like(Client,"*S*"))

Or the result of a join can be computed on directly:

join(Orders:o,SellerId ; Employees:e,EId).groups(e.Dept; sum(o.Amount))

More complex calculations are usually split into several steps, and almost every intermediate result has a different data structure. Because SPL supports dynamic data structures, these intermediate structures need not be defined first. For example, given a table of customer payment records for one year, find the customers who ranked in the top 10 by payment amount in every month:

Sales2021.group(month(sellDate)).(~.groups(Client;sum(Amount):sumValue)).(~.sort(-sumValue)).(~.select(#<=10)).(~.(Client)).isect()
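Here is a rough Java counterpart of this multi-step calculation, showing how many intermediate structures have to be handled explicitly. The `Sale` record, its fields, and the sample data are hypothetical, and the sketch takes the "top N" size as a parameter instead of fixing it at 10. The comments map each step to the SPL expression above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TopClientsEveryMonth {
    // Hypothetical sales record; month is 1-12
    record Sale(String client, int month, double amount) {}

    static Set<String> topInEveryMonth(List<Sale> sales, int topN) {
        // group(month(sellDate)): one list of sales per month
        Map<Integer, List<Sale>> byMonth = sales.stream()
                .collect(Collectors.groupingBy(Sale::month));
        Set<String> result = null;
        for (List<Sale> monthSales : byMonth.values()) {
            // groups(Client; sum(Amount)): per-client totals within the month
            Map<String, Double> totals = monthSales.stream().collect(
                    Collectors.groupingBy(Sale::client, Collectors.summingDouble(Sale::amount)));
            // sort(-sumValue), select(#<=10), (Client): top N client names
            Set<String> top = totals.entrySet().stream()
                    .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                    .limit(topN)
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toSet());
            // isect(): keep only clients in the top N of every month seen so far
            if (result == null) {
                result = new HashSet<>(top);
            } else {
                result.retainAll(top);
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("A", 1, 100), new Sale("B", 1, 50), new Sale("C", 1, 10),
                new Sale("A", 2, 80), new Sale("B", 2, 90), new Sale("C", 2, 5));
        System.out.println(topInEveryMonth(sales, 2)); // clients A and B
        System.out.println(topInEveryMonth(sales, 1)); // []
    }
}
```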

Execute SQL directly

SPL also implements an SQL interpreter that can execute SQL directly, from basic WHERE and GROUP BY to JOIN and even WITH:

$select * from d:/Orders.csv where (OrderDate<date('2020-01-01') and Amount<=100) or (OrderDate>=date('2020-12-31') and Amount>100)

$select year(OrderDate),Client,sum(Amount),count(1) from d:/Orders.csv
group by year(OrderDate),Client
having sum(Amount)<=100

$select o.OrderId,o.Client,e.Name,e.Dept from d:/Orders.csv o
join d:/Employees.csv e on o.SellerId=e.Eid

$with t as (select Client,sum(amount) s from d:/Orders.csv group by Client)
select t.Client, t.s, ct.Name, ct.address from t
left join ClientTable ct on t.Client=ct.Client

More language advantages

As a dedicated structured data processing language, SPL not only covers all of SQL's computing capabilities but also offers further advantages as a language:

Discreteness and more thorough set orientation

Set orientation is a basic feature of SQL: data participates in operations in the form of sets. However, SQL's discreteness is very poor. Set members must take part in operations as a whole and cannot be separated from the set. High-level languages such as Java have good discreteness, and array members can be operated on independently.

However, more complete set orientation must be backed by discreteness. Set members should be able to leave their set and form new sets with other data for further computation.

SPL combines SQL's set orientation with Java's discreteness, achieving more thorough set orientation.

For example, SPL easily expresses a "set of sets", which suits calculations after grouping. To find the students who rank in the top 10 in every subject:

    A
1   =T("score.csv").group(subject)
2   =A1.(~.rank(score).pselect@a(~<=10))
3   =A1.(~(A2(#)).(name)).isect()

The fields of an SPL sequence table can store records or record sets, so associations can be expressed intuitively as object references, even when many relationships are involved. For example, to find the male employees whose department manager is female, based on the employee table (in the code, 性别 is gender, 部门 is department, 经理 is manager, 男 is male, and 女 is female):

Employees.select(性别:"男",部门.经理.性别:"女")
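A rough Java analogue of navigating associations through object references. The `Employee` and `Department` types, the `"M"`/`"F"` gender codes, and the sample data are hypothetical. `Department` is a small mutable class so that a manager, who is also an employee, can be linked after both objects exist.

```java
import java.util.List;

public class ObjectReferences {
    // A department holds a reference to its manager, who is also an employee
    static final class Department {
        final String name;
        Employee manager; // assigned after the manager object is created
        Department(String name) { this.name = name; }
    }

    record Employee(String name, String gender, Department dept) {}

    // Male employees whose department manager is female: follow references, no join
    static List<String> maleUnderFemaleManager(List<Employee> employees) {
        return employees.stream()
                .filter(e -> "M".equals(e.gender()))
                .filter(e -> e.dept().manager != null && "F".equals(e.dept().manager.gender()))
                .map(Employee::name)
                .toList();
    }

    public static void main(String[] args) {
        Department sales = new Department("Sales");
        Department rnd = new Department("R&D");
        Employee alice = new Employee("Alice", "F", sales);
        Employee bob = new Employee("Bob", "M", rnd);
        sales.manager = alice;
        rnd.manager = bob;
        List<Employee> all = List.of(alice, bob,
                new Employee("Carl", "M", sales), new Employee("Dan", "M", rnd));
        System.out.println(maleUnderFemaleManager(all)); // [Carl]
    }
}
```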

Ordered computation is a typical combination of discreteness and set orientation. Member order is meaningful only within a set, which requires set orientation. At the same time, ordered computation must distinguish each member from its neighbors, which emphasizes discreteness. Because SPL is both set-oriented and discrete, it naturally supports ordered computation.

Specifically, SPL can reference members by absolute position: the third order is Orders(3), and records 1, 3, and 5 are Orders([1,3,5]).

SPL can also reference members by relative position. For example, the growth rate of each record relative to the previous one is Orders.derive(amount/amount[-1]-1)

SPL can also use # for the sequence number of the current record. For example, to split employees into two groups by sequence number, one with odd numbers and one with even numbers: Employees.group(#%2==1)
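For comparison, here is a minimal Java sketch of absolute-position access and grouping by position parity. The helper names `at`, `atAll`, and `groupByOddPosition` are hypothetical. The main point is that SPL positions are 1-based while Java lists are 0-based, so every access in Java shifts the index by one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

public class PositionalAccess {
    // Like Orders(3): SPL positions are 1-based, Java lists are 0-based
    static <T> T at(List<T> list, int pos) {
        return list.get(pos - 1);
    }

    // Like Orders([1,3,5]): several members by 1-based position
    static <T> List<T> atAll(List<T> list, int... positions) {
        return IntStream.of(positions).mapToObj(p -> list.get(p - 1)).toList();
    }

    // Like Employees.group(#%2==1): true = odd positions, false = even positions
    static <T> Map<Boolean, List<T>> groupByOddPosition(List<T> list) {
        Map<Boolean, List<T>> groups = new HashMap<>();
        for (int i = 0; i < list.size(); i++) {
            boolean odd = (i + 1) % 2 == 1;
            groups.computeIfAbsent(odd, k -> new ArrayList<>()).add(list.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> orders = List.of("o1", "o2", "o3", "o4", "o5");
        System.out.println(at(orders, 3));                        // o3
        System.out.println(atAll(orders, 1, 3, 5));               // [o1, o3, o5]
        System.out.println(groupByOddPosition(orders).get(true)); // [o1, o3, o5]
    }
}
```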

More convenient function syntax

A large number of powerful structured data functions is a good thing, but it makes functions with similar purposes hard to tell apart and quietly raises the learning curve.

SPL provides a distinctive function option syntax: functions with similar purposes share one name and are distinguished only by options. For example, the basic job of select is filtering. To return only the first record that satisfies the condition, just add the option @1:

Orders.select@1(Amount>1000)

When the data volume is large, parallel computing improves performance; just switch to the option @m:

Orders.select@m(Amount>1000)

For sorted data, binary search filters quickly; use @b:

Orders.select@b(Amount>1000)

Function options can also be combined, for example:

Orders.select@1b(Amount>1000)
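For contrast, here is a rough Java sketch of the same three variants. Java spreads them across different entry points: `findFirst`, `parallelStream`, and a hand-written binary search. The class and method names are hypothetical. The binary-search variant assumes the data is sorted ascending by the filtered value and illustrates the general technique only, not how SPL implements @b.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FilterVariants {
    // Like select@1: only the first matching member
    static Optional<Double> firstAbove(List<Double> amounts, double limit) {
        return amounts.stream().filter(a -> a > limit).findFirst();
    }

    // Like select@m: parallel filtering (toList keeps encounter order)
    static List<Double> allAboveParallel(List<Double> amounts, double limit) {
        return amounts.parallelStream().filter(a -> a > limit).toList();
    }

    // Like select@b on data sorted ascending: binary search for the first index
    // whose value exceeds the limit, then return everything from there on
    static double[] allAboveSorted(double[] sortedAmounts, double limit) {
        int lo = 0, hi = sortedAmounts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedAmounts[mid] > limit) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return Arrays.copyOfRange(sortedAmounts, lo, sortedAmounts.length);
    }

    public static void main(String[] args) {
        List<Double> amounts = List.of(500.0, 1500.0, 2500.0);
        System.out.println(firstAbove(amounts, 1000));       // Optional[1500.0]
        System.out.println(allAboveParallel(amounts, 1000)); // [1500.0, 2500.0]
        System.out.println(Arrays.toString(
                allAboveSorted(new double[]{100, 900, 1000, 1200, 3000}, 1000))); // [1200.0, 3000.0]
    }
}
```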

The parameters of structured computing functions are often complex. SQL, for example, uses a variety of keywords to split a statement's parameters into groups. That requires many keywords and makes the statement structure inconsistent.

SPL supports hierarchical parameters. Semicolons, commas, and colons divide the parameters into three levels, from high to low, which simplifies complex parameters in a general way:

join(Orders:o,SellerId ; Employees:e,EId)

Extended Lambda Syntax

Ordinary Lambda syntax must specify not only the expression (the function body) but also fully define the expression's own parameters; otherwise the mathematical form is not rigorous. This makes Lambda syntax very cumbersome. For example, to use the loop function select to filter set A and keep only the members whose values are even, the general form is:

A.select(f(x):{x%2==0})

Here the expression is x%2==0, and its parameter is the x in f(x), which represents a member of set A, that is, the loop variable.

SPL uses the fixed symbol ~ to represent the loop variable. When the parameter is the loop variable, it need not be defined again, so in SPL the Lambda above shortens to: A.select(~%2==0)

Ordinary Lambda syntax must define every parameter used in the expression. Besides the loop variable, another commonly used parameter is the loop count, and defining it in the Lambda as well makes the code even more complicated.

SPL uses the fixed symbol # to represent the loop count. For example, to filter set A with select and keep only the members at even positions, SPL can write: A.select(#%2==0)
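For comparison, here is a minimal Java sketch of the two filters just described. The method names are hypothetical. In Java the lambda must still name its parameter. Java also has no built-in loop counter, so the sketch iterates over indices and converts the 0-based index `i` to a 1-based position `i + 1` to match SPL's `#`.

```java
import java.util.List;
import java.util.stream.IntStream;

public class LoopVariables {
    // Like A.select(~%2==0): keep members whose value is even
    static List<Integer> evenValues(List<Integer> a) {
        return a.stream().filter(x -> x % 2 == 0).toList();
    }

    // Like A.select(#%2==0): keep members at even 1-based positions
    static List<Integer> evenPositions(List<Integer> a) {
        return IntStream.range(0, a.size())
                .filter(i -> (i + 1) % 2 == 0)
                .mapToObj(a::get)
                .toList();
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(10, 11, 12, 13, 14);
        System.out.println(evenValues(a));    // [10, 12, 14]
        System.out.println(evenPositions(a)); // [11, 13]
    }
}
```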

Relative positions often appear in difficult calculations and are themselves hard to compute. When relative positions are needed, ordinary Lambda parameters become very cumbersome.

SPL uses the fixed form [number] to represent relative positions:

    A                                                          B
1   =T("Orders.txt")                                           /Order list
2   =A1.groups(year(Date):y,month(Date):m; sum(Amount):amt)    /Group by year and month
3   =A2.derive(amt/amt[-1]:lrr, amt[-1:1].avg():ma)            /Ratio to the previous period and moving average
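A rough Java counterpart of the two relative-position expressions in cell A3, working on an array of period amounts. The class and method names are hypothetical. Edge handling is a choice made for this sketch and is not claimed to match SPL: the first period has no predecessor and gets NaN, and the three-period window is clipped at both ends.

```java
import java.util.Arrays;

public class RelativePositions {
    // Like amt/amt[-1]: ratio of each period to the previous one
    static double[] ratioToPrevious(double[] amt) {
        double[] r = new double[amt.length];
        for (int i = 0; i < amt.length; i++) {
            r[i] = i == 0 ? Double.NaN : amt[i] / amt[i - 1];
        }
        return r;
    }

    // Like amt[-1:1].avg(): average of previous, current, and next values,
    // using only the members that exist at the edges
    static double[] movingAverage(double[] amt) {
        double[] ma = new double[amt.length];
        for (int i = 0; i < amt.length; i++) {
            int from = Math.max(0, i - 1);
            int to = Math.min(amt.length - 1, i + 1);
            double sum = 0;
            for (int j = from; j <= to; j++) {
                sum += amt[j];
            }
            ma[i] = sum / (to - from + 1);
        }
        return ma;
    }

    public static void main(String[] args) {
        double[] amt = {100, 200, 300, 600};
        System.out.println(Arrays.toString(ratioToPrevious(amt))); // [NaN, 2.0, 1.5, 2.0]
        System.out.println(Arrays.toString(movingAverage(amt)));
    }
}
```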

Seamless integration, low coupling, hot switching

As a scripting language interpreted in Java, SPL provides a JDBC driver for seamless integration into Java applications.

Simple statements can be executed directly like SQL:

// Load the esProc JDBC driver and open a local (embedded) connection
Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn = DriverManager.getConnection("jdbc:esproc:local://");
// The statement text is an SPL expression rather than SQL
PreparedStatement st = conn.prepareStatement("=T(\"D:/Orders.txt\").select(Amount>1000 && Amount<=3000 && like(Client,\"*S*\"))");
ResultSet result = st.executeQuery();
...

Complex calculations can be saved as script files and called like stored procedures:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn = DriverManager.getConnection("jdbc:esproc:local://");
// Call the script splscript1 with two parameters, as with a stored procedure
CallableStatement st = conn.prepareCall("{call splscript1(?, ?)}");
st.setObject(1, 3000);
st.setObject(2, 5000);
ResultSet result = st.executeQuery();
...

Moving the script out of the Java program reduces code coupling. Because the script is interpreted, it also supports hot switching: when the business logic changes, a modified script takes effect immediately, whereas with Java the whole application often has to be restarted. This mechanism is particularly well suited to writing business logic in a microservice architecture.


Origin blog.csdn.net/sinat_40770656/article/details/124109509