A database tool that is easier to use than SQLite


Many small and micro applications also need some data processing and computing power, but integrating a full database would be too heavy. In this case, SQLite is a good choice: it has a simple architecture, is easy to integrate, stores data persistently, and provides computing power through SQL.

However, SQLite still has shortcomings for some more complex scenarios.

SQLite’s shortcomings in complex scenarios

Data source support

SQLite behaves like a database: it handles its own library files well, but applications sometimes need to process other forms of data, such as text files, Excel, other databases, and RESTful or other data on the Web. SQLite only supports reading csv files and does not support other data sources unless you hard-code the access. Moreover, although SQLite supports csv files, the process is cumbersome. You need to create the database from the command line first, then use the create command to define the table structure, then use the import command to load the data, and only then query the data with SQL.

In addition to conventional structured data, modern applications often encounter complex formats such as Json and XML. SQLite can compute on Json strings, but it cannot directly read multi-layer data sources such as Json files or RESTful services. You have to hard-code the reading, or use third-party libraries to assemble insert statements that load the data into tables, and the code is cumbersome. SQLite cannot compute on XML strings at all, let alone read XML files or WebServices.

Applications sometimes need to write data to files in a common format for output, transfer, or exchange, and sometimes actively write data to other data sources. However, SQLite can only persist data to its own library files and cannot directly write to external data sources, including basic csv files.

Complex calculations

SQLite uses SQL statements for calculations, so it inherits both the strengths and the weaknesses of SQL. SQL is close to natural language, has a low learning threshold, and makes simple calculations easy, but it is not good at complex calculations, where the code often becomes cumbersome and hard to understand.

Even some less complex calculations are not easy to implement in SQL. For example, calculate the three orders with the largest sales for each customer:

select * from (select *, row_number() over (partition by Client order by Amount desc) as row_number from Orders) where row_number<=3

In this example, to calculate the first N records in a group, it is necessary to use a window function to generate a pseudo-column of serial numbers in the group, and then filter the pseudo-column, so the code becomes complicated.
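For comparison, here is a minimal Java sketch of what the query above computes: group first, then keep the largest N rows inside each group. The class, record, and method names (TopOrders, Order, topPerClient) are hypothetical and chosen only for illustration; the field names follow the Client and Amount columns in the query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative names only: TopOrders, Order and topPerClient are not from the article.
final class TopOrders {
    record Order(String client, double amount) {}

    // Group orders by client, then keep the n largest amounts within each group.
    static Map<String, List<Order>> topPerClient(List<Order> orders, int n) {
        Map<String, List<Order>> groups = new LinkedHashMap<>();
        for (Order o : orders) {
            groups.computeIfAbsent(o.client(), k -> new ArrayList<>()).add(o);
        }
        Map<String, List<Order>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Order>> e : groups.entrySet()) {
            List<Order> sorted = new ArrayList<>(e.getValue());
            // Largest amount first, the same ordering the window function uses.
            sorted.sort(Comparator.comparingDouble(Order::amount).reversed());
            result.put(e.getKey(), List.copyOf(sorted.subList(0, Math.min(n, sorted.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("A", 10), new Order("A", 40), new Order("B", 5),
            new Order("A", 30), new Order("A", 20));
        System.out.println(topPerClient(orders, 3));
    }
}
```

The grouping step and the per-group step are separate here, which is the part that SQL has to emulate with a pseudo-column and an outer filter.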

For more complex calculations, the SQL code becomes more lengthy and difficult to understand. For example, the maximum number of consecutive rising days for a certain stock:

select max(continuousdays)
from (
    select count(*) continuousdays
    from (
        select sum(risingflag) over (order by day) norisingdays
        from (
            select day, case when price>lag(price) over (order by day) then 0 else 1 end risingflag
            from tbl
        )
    )
    group by norisingdays
)

SQL cannot express the concept of a continuous rise directly. It has to take a roundabout route: count the days in each rising run by accumulating the number of days without a rise. This technique is tricky, hard to write, and hard to understand. Moreover, SQL is hard to debug, which makes maintenance difficult.
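To make the trick easier to follow, the three nested steps of the query can be traced in plain Java. This is only a sketch of the logic, with hypothetical names (RisingRuns, longestRunViaGroups), and it takes the prices as an array already sorted by day. Note that, like count(*) in the query, it counts every row in a run, including the day on which the run starts.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative names only; this traces the SQL logic and is not how SQLite executes it.
final class RisingRuns {
    static int longestRunViaGroups(double[] prices) {
        Map<Integer, Integer> rowsPerGroup = new HashMap<>();
        int noRisingDays = 0; // running sum of the flags, used as a group id
        for (int i = 0; i < prices.length; i++) {
            // Innermost query: flag is 0 when the price rose, 1 otherwise (first day included).
            int risingFlag = (i > 0 && prices[i] > prices[i - 1]) ? 0 : 1;
            // Middle query: the running sum stays constant while prices keep rising.
            noRisingDays += risingFlag;
            // Outer query: count the rows that share the same running sum.
            rowsPerGroup.merge(noRisingDays, 1, Integer::sum);
        }
        int best = 0;
        for (int rows : rowsPerGroup.values()) {
            best = Math.max(best, rows);
        }
        return best;
    }

    public static void main(String[] args) {
        double[] prices = {10, 11, 12, 11, 12, 13, 14, 13};
        System.out.println(longestRunViaGroups(prices));
    }
}
```

Every non-rising day bumps the running sum, so all rows between two bumps land in the same group; that is the whole idea behind grouping by norisingdays.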

Let's look at another example: Find the top n customers who account for half of the sales, and sort them by sales from large to small.

with A as
    (select client,amount,row_number() over (order by amount) ranknumber
    from sales)
select client,amount
from (select client,amount,sum(amount) over (order by ranknumber) acc
     from A)
where acc>(select sum(amount)/2 from sales)
order by amount desc

SQL has trouble handling the customer who sits right on the boundary. It has to work around this: accumulate sales from smallest to largest, and then pick the customers whose cumulative value is not in the bottom half. This approach is tricky, verbose, and hard to debug.

In addition, SQLite's date and string functions are not rich enough: for example, it lacks functions for adding or subtracting quarters and for working-day calculations. These shortcomings make SQLite unsuitable for scenarios with more complex computing requirements.

Flow control

SQL itself lacks flow control. Databases normally use stored procedures to implement complete business logic, but SQLite does not support stored procedures, so it cannot implement complete business logic directly. Instead, it has to rely on the main application: convert the SQL data objects into the application's own data objects (such as Java's ResultSet or List<EntityBean>), process them with the main program's for/if statements, and finally convert them back into SQL data objects. The code is very cumbersome. Complex business logic requires multiple conversions between SQL objects and the main application's objects, which is even more troublesome and will not be shown here.

esProc SPL solves SQLite difficulties

If you want to provide data processing and computing power for Java micro-applications, there is a better choice: esProc SPL.

esProc SPL is an open source data processing engine with a simple architecture, easy integration, persistent data storage, and sufficient computing power. These features are similar to SQLite.

SPL has a simple architecture. There is no need to configure services, nodes, or clusters. You only need to introduce the SPL Jar package and it can be deployed in a Java environment.

SPL provides a JDBC interface, which can be easily integrated into Java applications, and simple queries are similar to SQL.

GitHub:https://github.com/SPLWare/esProc

Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn = DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = conn.createStatement();
ResultSet result = statement.executeQuery("=T(\"D:/Orders.csv\").select(Amount>1000 && like(Client,\"*s*\"))");

SPL supports data persistence and can save data in its own data format (the set file, with the .btx extension). For example, adding records in batches:

A
1 =create(OrderID,Client,SellerID,Amount,OrderDate)
2 =A1.record([201,"HDR",9,2100.0,date("2021-01-01"),202,"IBM",9,1900,date("2021-01-02"),203,"APPLE",4,1900,date("2021-01-03")])
3 =file("d:/Orders.btx").export@ab(A2)

In A3, export@ab uses two options: @a means append, and @b means the set file format.

Besides appending directly, you can first process the sequence table in memory (the sequence table is SPL's structured data object, comparable to an SQL result set) and then overwrite the set file with it. To do so, change export@ab to export@b. This method does not perform as well as SQLite, but small and micro applications generally hold little data, so the overwriting speed is usually acceptable.

Group table is another proprietary data format of SPL, which supports high-performance batch addition, deletion and modification, and is suitable for high-performance computing of large data volumes (this is not the focus of this article).

In addition to its own format, SPL can also save data to csv files, just change A3 to:

file("d:/Orders.csv").export@tc(A2)

SPL has enough computing power to support various SQL-style calculations, including post-group calculations (window functions):

A B
1 =Orders.new(Client,Amount) // Select some fields
2 =Orders.select(Amount>1000 && like(Client,"*s*")) // Fuzzy query
3 =Orders.sort(Client,-Amount) // Sort
4 =Orders.id(Client) // Remove duplicates
5 =Orders.groups(year(OrderDate):y,Client;sum(Amount):amt).select(amt>3000) // Group and summarize
6 =[Orders.select(Amount>3000),A1.select(year(OrderDate)==2009)].union() // Union
7 =Orders.groups(year(OrderDate):y,Client;sum(Amount):amt).select(like(Client,"*s*")) // Subquery
8 =A5.derive(amt/amt[-1]-1:rate) // Cross-row calculation

SPL provides basic SQL syntax, such as group summary:

$select year(OrderDate) y,month(OrderDate) m, sum(Amount) s,count(1) c from {Orders} Where Amount>=? and Amount<? ;arg1,arg2

Beyond these basic capabilities, SPL also overcomes the shortcomings of SQLite described above: it fully supports a variety of data sources, has stronger computing power, makes flow control convenient, and can handle more complex application scenarios.

Data source support

SPL reads csv files in just one step. Embed the following SPL code in Java:

T("d:/Orders.csv").select(Amount>2000 && Amount<=3000)

Function T can not only read set files, but also read csv files and generate sequence tables. When SPL imports data, the data type is automatically parsed and does not need to be specified manually. The whole process requires no extra coding and is much more convenient than SQLite.

If the csv format is not standardized, you can also use the import function to specify the delimiter, field type, skip row number, and handle escape characters, quotation marks, brackets, etc., which is much richer than the functions provided by SQLite.

SPL has a variety of built-in data source interfaces, including tsv, xls, Json, XML, RESTful, WebService, and other databases, and even supports special data sources such as Elasticsearch and MongoDB.

These data sources can be used directly, which is very convenient. For other data sources not listed, SPL also provides interface specifications. As long as they are output as SPL structured data objects according to the specifications, subsequent calculations can be performed.

SPL can directly parse multi-layer data sources. Read and calculate a Json file or a RESTful service:

json(file("d:/xml/emp_orders.json").read()).select(Amount>2000 && Amount<=3000)
json(httpfile("http://127.0.0.1:6868/api/orders").read()).select(Amount>2000 && Amount<=3000)

XML file:

A
1 =file("d:/xml/emp_orders.xml").read()
2 =xml(A1,"xml/row")
3 =A2.select(Amount>1000 && Amount<=2000 && like@c(Client,"*business*"))

WebService:

A
1 =ws_client("http://127.0.0.1:6868/ws/RQWebService.asmx?wsdl")
2 =ws_call(A1,"RQWebService":"RQWebServiceSoap":"getEmp_orders")
3 =A2.select(Amount>1000 && Amount<=2000 && like@c(Client,"*business*"))

SPL sequence table supports multi-layer structured data, which is easier to express Json/XML than the two-dimensional structure of SQL library table, and the calculation code is also simpler. This part is not the focus of this article and will be skipped.

Cross-source computing

SPL is open and can compute directly on a variety of data sources, and these sources can take part in cross-source calculations together with SPL set files. For example, perform an inner join between a set file and a csv file, then group and summarize:

join(T("d:/Orders.btx"):o,SellerId; T("d:/Emp.csv"):e,EId).groups(e.Dept;sum(o.Amount))

Cross-source calculations can also be performed easily between external data sources. For example, a left join between a RESTful service and a csv file:

join@1(json(httpfile("http://127.0.0.1:6868/api/orders").read()):o,SellerId; T("d:/Emp.csv"):e,EId)

It’s easier to read in a multi-step format:

A
1 =Orders=json(httpfile("http://127.0.0.1:6868/api/orders").read())
2 =Employees=T("d:/Emp.csv")
3 =join@1(Orders:o,SellerId;Employees:e,EId)

Cross-source calculations can be implemented using only the SPL language without resorting to Java or the command line. The code is short and easy to understand, and the development efficiency is much higher than SQL.

Persistence from any data source

In addition to persisting its own data formats, SPL can also write to other data sources, again by way of sequence tables. For example:

file("d:/Orders.csv").export@t(A2)          // csv file
file("d:/Orders.xlsx").xlsexport@t(A2)      // xls file
file("d:/Orders.json").write(json(A2))      // json file

In particular, SPL supports persistence in any database, taking Oracle as an example:

A B
1 =connect("orcl") / Connect to the external Oracle database
2 =T=A1.query("select * from salesR where SellerID=?",10) / Batch query, sequence table T
3 =NT=T.derive() / Copy into a new sequence table NT
4 =NT.field("SELLERID",9) / Batch modify the new sequence table
5 =A1.update(NT:T,sales;ORDERID) / Persist the changes

Database persistence uses the sequence table as the medium, and the advantage is clear: the update function automatically compares the sequence tables before and after modification (additions, modifications, deletions), which makes it easy to persist batches of data.
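The idea of comparing a before and an after version to work out what must be written can be sketched in Java as follows. This is an illustration of the concept only, not SPL's actual implementation of update; the names RowDiff, Changes, and diff are hypothetical, and each row is simplified to a primary key mapped to a single value.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Illustrative sketch of a before/after comparison; all names are assumptions.
final class RowDiff {
    record Changes<K>(Set<K> inserted, Set<K> updated, Set<K> deleted) {}

    // Compare two versions of a keyed row set and classify each key.
    static <K, V> Changes<K> diff(Map<K, V> before, Map<K, V> after) {
        Set<K> inserted = new LinkedHashSet<>();
        Set<K> updated = new LinkedHashSet<>();
        Set<K> deleted = new LinkedHashSet<>();
        for (Map.Entry<K, V> e : after.entrySet()) {
            if (!before.containsKey(e.getKey())) {
                inserted.add(e.getKey());                 // key exists only in the new version
            } else if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                updated.add(e.getKey());                  // same key, different content
            }
        }
        for (K key : before.keySet()) {
            if (!after.containsKey(key)) {
                deleted.add(key);                         // key disappeared from the new version
            }
        }
        return new Changes<>(inserted, updated, deleted);
    }

    public static void main(String[] args) {
        Map<Integer, String> before = Map.of(1, "a", 2, "b", 3, "c");
        Map<Integer, String> after = Map.of(2, "b", 3, "x", 4, "d");
        System.out.println(diff(before, after));
    }
}
```

Each of the three sets would then map to one kind of statement (insert, update, delete) against the target table.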

Computing power

SPL supports ordered calculations, set calculations, step-by-step calculations, and associative calculations, which can simplify complex structured data calculations.

A simple example is to calculate the three orders with the largest sales for each customer:

Orders.group(Client).(~.top(3;Amount))

The SPL code is intuitive: group by Client first, then compute the top N for each group (denoted by the symbol ~). On the surface, the code is simple because SPL provides a top function that SQL lacks. More fundamentally, it is because SPL has real row numbers, in other words, SPL supports ordered sets. SPL's set orientation is also more thorough, so it supports true grouping, that is, grouping without aggregation, which makes it straightforward to compute on the data inside each group.

More complex calculations are not difficult to implement in SPL either. Maximum number of consecutive rising days:

A
1 =tbl.sort(day)
2 =t=0,A1.max(t=if(price>price[-1],t+1,0))

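SPL expresses the concept of a continuous rise easily: sort by date first, then traverse the records and add 1 to the counter whenever the price rises. This uses both the loop function max and ordered sets. In the code, [-1] denotes the previous record, a relative-position notation; price[-1] is the stock price on the previous trading day, which is more intuitive than shifting the whole column (the lag function).

Let's look at another example: find the top n customers who account for half of the sales: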

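A B
2 =sales.sort(amount:-1) / Sort by sales in descending order; can be done in SQL
3 =A2.cumulate(amount) / Compute the cumulative sequence
4 =A3.m(-1)/2 / The last cumulative value is the total
5 =A3.pselect(~>=A4) / Position where the cumulative value passes half
6 =A2(to(A5)) / Get the values by position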

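SPL's set orientation is more thorough: a set can be conveniently held in a variable and referenced through that variable in the next step, so SPL is particularly well suited to multi-step calculations. Breaking a big problem into several small steps makes complex computing goals easy to reach, and the code is both short and easy to understand. In addition, multi-step calculation naturally supports debugging, which indirectly improves development efficiency.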
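The same steps (sort in descending order, accumulate, find the position where the running total reaches half, take the rows up to that position) can be written out in Java to show what each step contributes. This is a sketch with hypothetical names (HalfOfSales, Sale, topHalf); it follows the steps of the SPL grid above and is not generated from it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative names only; the steps mirror the SPL grid above.
final class HalfOfSales {
    record Sale(String client, double amount) {}

    static List<Sale> topHalf(List<Sale> sales) {
        // Step 1: sort by amount, largest first.
        List<Sale> sorted = new ArrayList<>(sales);
        sorted.sort(Comparator.comparingDouble(Sale::amount).reversed());

        // Step 2: the final cumulative value is the total; half of it is the threshold.
        double total = 0;
        for (Sale s : sorted) {
            total += s.amount();
        }
        double half = total / 2;

        // Steps 3 and 4: accumulate until the running total reaches the threshold,
        // keeping every row up to and including that position.
        List<Sale> result = new ArrayList<>();
        double running = 0;
        for (Sale s : sorted) {
            result.add(s);
            running += s.amount();
            if (running >= half) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
            new Sale("C", 20), new Sale("A", 40), new Sale("D", 10), new Sale("B", 30));
        System.out.println(topHalf(sales));
    }
}
```

The customer who crosses the threshold is included simply because the loop stops after adding that row, which is the boundary case the SQL version has to work around.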

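The examples above use ordered calculation, set calculation, and step-by-step calculation; SPL handles calculations from simple to complex well. In addition, SPL supports discrete records, so associated tables can be referenced intuitively with the dot operator, which simplifies complex join calculations.

SPL also provides richer date and string functions, far exceeding traditional databases in both number and capability.

It is worth mentioning that, to further improve development efficiency, SPL has also created a unique function syntax.

Flow control

SPL itself provides flow control statements which, together with the built-in sequence table object, make it easy to implement complete business logic.

Branch structure: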

A B
2
3 if T.AMOUNT>10000 =T.BONUS=T.AMOUNT*0.05
4 else if T.AMOUNT>=5000 && T.AMOUNT<10000 =T.BONUS=T.AMOUNT*0.03
5 else if T.AMOUNT>=2000 && T.AMOUNT<5000 =T.BONUS=T.AMOUNT*0.02
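For readers coming from the host-language side, the same tiers look like this in Java. It is a sketch with hypothetical names (BonusRules, bonusFor), and it copies the thresholds exactly as they appear in the grid above, so an amount that matches no branch leaves the bonus unchanged.

```java
// Illustrative names only; thresholds are copied from the grid as written.
final class BonusRules {
    // Returns the new bonus for an order amount, or currentBonus when no tier applies.
    static double bonusFor(double amount, double currentBonus) {
        if (amount > 10000) {
            return amount * 0.05;
        } else if (amount >= 5000 && amount < 10000) {
            return amount * 0.03;
        } else if (amount >= 2000 && amount < 5000) {
            return amount * 0.02;
        }
        return currentBonus; // no branch matched
    }

    public static void main(String[] args) {
        System.out.println(bonusFor(20000, 0));
        System.out.println(bonusFor(6000, 0));
        System.out.println(bonusFor(1000, 7));
    }
}
```

In a Java application this logic runs on objects converted from the SQL result set, whereas in SPL the branch works on the sequence table fields directly.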

Loop structure:

A B
1 =db=connect("db")
2 =T=db.query@x("select * from sales where SellerID=? order by OrderDate",9)
3 for T =A3.BONUS=A3.BONUS+A3.AMOUNT*0.01
4 =A3.CLIENT=CONCAT(LEFT(A3.CLIENT,4), "co.,ltd.")
5

Beyond the code above, SPL has more flow control features for structured data that further improve development efficiency. For example, a loop can fetch a batch of records per iteration instead of a single record, or iterate once each time the value of a given field changes.

The above business logic can be saved as a script file, placed outside the application, and called in the form of a stored procedure:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn = DriverManager.getConnection("jdbc:esproc:local://");
CallableStatement statement = conn.prepareCall("{call queryOrders()}");
statement.execute();

SPL code is interpreted. After modification it runs directly, without compiling or restarting the application, which effectively reduces maintenance costs. Keeping SPL scripts outside the application not only reduces system coupling but also allows hot swapping. SQLite does not support stored procedures, so business logic cannot be placed outside the main application; coupling is high and the application structure suffers.

SPL is clearly better than SQLite under Java, but it is more troublesome for non-Java applications, which can only reach it through a standalone ODBC or HTTP service. The architecture is then not as light, and integration is less tight. Note that Android belongs to the Java ecosystem, so SPL runs normally there, but iOS currently has no reasonably mature JVM environment, so SPL cannot support it.


Origin blog.csdn.net/zyqytsoft/article/details/131226049