Chapter 8 Function Writing

PostgreSQL, like most databases, can combine several SQL statements and process them as a unit, with different arguments supplied on each run. Different databases give this mechanism different names: some call it stored procedures, others user-defined functions. PostgreSQL collectively calls them functions.

8.1 PostgreSQL function analysis

PostgreSQL functions can be divided into four categories: basic functions, aggregate functions, window functions and trigger functions.

8.1.1 Introduction to basic knowledge of functions

The basic structure of the function

CREATE OR REPLACE FUNCTION func_name(arg1 arg1_datatype DEFAULT arg1_default)
RETURNS some_type | SETOF some_type | TABLE (..) AS
$$
BODY of function
$$
LANGUAGE language_of_function

Parameters can have default values. The caller can omit any parameter that has a default, in which case the default value is used. In the function definition, optional parameters must come after mandatory ones.

Input parameters come in two forms: named and anonymous. We recommend named parameters, because they can be referenced by name in the function body, which is convenient and intuitive. An anonymous parameter can only be accessed by position: $1, $2, $3, and so on.

big_elephant(ear_size numeric, skin_color text DEFAULT 'blue',
name text DEFAULT 'Dumbo')

If named parameters are used, the function can also be called with the parameter names spelled out. The advantage is that the arguments need not be given in the order in which they were defined:

big_elephant(name => 'Wooly', ear_size => 1.2)

In PostgreSQL 9.5 and higher, a named argument is written as name => 'Wooly'; in PostgreSQL 9.4 and earlier, the syntax is name := 'Wooly'.
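
As a minimal sketch of defaults and named notation working together, the following gives big_elephant a body and calls it three ways. The body and the return type are assumptions made for illustration; only the parameter list comes from the text above, and the name => value form assumes PostgreSQL 9.5 or later.

```sql
-- Hypothetical body: only the parameter list comes from the text above.
CREATE OR REPLACE FUNCTION big_elephant(
    ear_size   numeric,
    skin_color text DEFAULT 'blue',
    name       text DEFAULT 'Dumbo')
RETURNS text AS
$$
SELECT name || ' is ' || skin_color || ' with ear size ' || ear_size::text;
$$
LANGUAGE sql IMMUTABLE;

-- Positional call: both optional parameters fall back to their defaults.
SELECT big_elephant(1.2);

-- Positional call that overrides only skin_color.
SELECT big_elephant(1.2, 'grey');

-- Named notation: arguments in any order, skin_color keeps its default.
SELECT big_elephant(name => 'Wooly', ear_size => 1.2);
```

Because ear_size has no default, every call has to supply it, whichever notation is used.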

LANGUAGE (programming language used)
  Indicates the programming language the function is written in. The language must be installed in the database where the function is created. Run SELECT lanname FROM pg_language; to list the installed languages.
  
VOLATILITY (result stability)
  This tag can tell the query planner whether the result obtained after the function is executed can be cached for the next use. It has the following optional values.

  • IMMUTABLE (result is constant)
      Whenever the function is called with the same input, it always returns the same output. In other words, the function's internal logic is completely independent of the outside world. The most typical example is a mathematical calculation function. Note that only IMMUTABLE functions can be used when defining a functional index.
  • STABLE (results are relatively stable)
      If the function is called multiple times within the same query, it always returns the same output for the same input. In other words, its output is constant within the context of the current SQL statement.
  • VOLATILE (unstable result)
      Each call may return a different result, even with the same input. Functions that change data, and functions that depend on the environment from one call to the next, such as clock_timestamp(), which reads the system clock on every call, are VOLATILE. This is also the default.
      Note that the volatility tag is only a hint to the planner, which does not necessarily act on it. If a function is marked VOLATILE, the planner re-evaluates it every time it is encountered; if it is marked otherwise, the planner may still choose not to cache its result, because it may judge that recalculating is faster.
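
The link between IMMUTABLE and functional indexes can be sketched as follows. The table, function, and index names are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table and function names, for illustration only.
CREATE TABLE demo_users (email text);

-- lower() and btrim() are themselves marked IMMUTABLE,
-- so IMMUTABLE is an honest label for this wrapper.
CREATE OR REPLACE FUNCTION demo_normalize_email(addr text)
RETURNS text AS
$$
SELECT lower(btrim(addr));
$$
LANGUAGE sql IMMUTABLE;

-- Expression index built on the function; index expressions may only
-- use IMMUTABLE functions.
CREATE INDEX demo_users_email_idx
    ON demo_users (demo_normalize_email(email));

-- A query that repeats the same expression is a candidate for the index.
SELECT * FROM demo_users
WHERE demo_normalize_email(email) = 'alex@example.com';
```

Marking a function IMMUTABLE when its result actually depends on table data or settings would let the index silently go stale, so the label should describe what the body really does.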

STRICT (strict mode)
  In strict mode, if any input is NULL the function is not executed at all and NULL is returned directly. Unless STRICT is specified explicitly, a function is non-strict. Use STRICT with caution when writing functions, because it may prevent the planner from using indexes.
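
A small sketch of the difference STRICT makes: two hypothetical functions with identical bodies, one strict and one not.

```sql
-- Hypothetical functions with the same body; only STRICT differs.
CREATE OR REPLACE FUNCTION demo_label_strict(t text)
RETURNS text AS
$$
SELECT COALESCE(t, 'n/a');
$$
LANGUAGE sql IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION demo_label_lax(t text)
RETURNS text AS
$$
SELECT COALESCE(t, 'n/a');
$$
LANGUAGE sql IMMUTABLE;

-- The strict version is never executed for a NULL argument and yields NULL;
-- the non-strict version runs its body and yields 'n/a'.
SELECT demo_label_strict(NULL) AS strict_result,
       demo_label_lax(NULL)    AS lax_result;
```

If a function is meant to handle NULL itself, as the COALESCE here does, it has to be left non-strict.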
  
COST (Execution Cost Estimate)
  A relative measure of how computationally expensive the function is. The default is 100 for functions written in SQL or PL/pgSQL and 1 for functions written in C. The value affects the order in which the planner evaluates functions in a WHERE clause, and how likely it is to cache the function's result set. The larger the value, the longer the planner assumes the function takes to run.
  
ROWS (estimated number of rows in the returned result set)
  This tag is only useful when the function returns a result set. It is an estimate of the number of rows returned, which the planner uses to find the best execution strategy. (This is only an estimate. Statistical estimates can be very inaccurate under abnormal conditions, for example on a frequently modified table whose dead rows have not yet been cleaned up.)
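
A hedged sketch of setting COST and ROWS on a set-returning function. The function name is hypothetical, and the values 10 and 50 are arbitrary choices for the illustration, not recommendations.

```sql
-- Hypothetical set-returning function; PL/pgSQL is used so that the
-- planner treats it as a black box and relies on the declared estimates.
CREATE OR REPLACE FUNCTION demo_first_n(n integer)
RETURNS SETOF integer AS
$$
BEGIN
    RETURN QUERY SELECT generate_series(1, n);
END;
$$
LANGUAGE plpgsql IMMUTABLE
COST 10     -- lower than the default of 100 for PL/pgSQL
ROWS 50;    -- planner assumes about 50 rows instead of the default 1000

-- The row estimate for the function scan should reflect the ROWS setting.
EXPLAIN SELECT * FROM demo_first_n(50);
```

Both values are hints only: the function still returns however many rows its body produces.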
  
SECURITY DEFINER (security control character)
  If this attribute is set, the function runs with the privileges of the user who created it; if not, it runs with the privileges of the user who calls it. If a user has no privileges on a table but needs to operate on it, the table's owner can provide a function marked SECURITY DEFINER that performs the operation. This makes SECURITY DEFINER very useful when you need to control access to a table.
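
The pattern described above can be sketched like this. The table and function names are hypothetical, the table is assumed to live in the public schema, and pinning search_path is a common precaution for SECURITY DEFINER functions rather than something the text above requires.

```sql
-- Hypothetical table, owned by the role that runs this script.
-- Other roles get no privileges on it.
CREATE TABLE public.demo_audit (
    note       text,
    created_by text DEFAULT session_user  -- the caller, not the function owner
);

-- Runs with the owner's privileges, so callers can insert rows
-- without holding any privilege on the table itself.
CREATE OR REPLACE FUNCTION demo_add_audit(param_note text)
RETURNS void AS
$$
INSERT INTO public.demo_audit(note) VALUES (param_note);
$$
LANGUAGE sql VOLATILE SECURITY DEFINER
SET search_path = public, pg_temp;

-- EXECUTE on a new function is granted to PUBLIC by default,
-- so any role can call it.
SELECT demo_add_audit('checked in');
```

The function exposes exactly one operation on the table, which is what makes this narrower than granting INSERT directly.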
  
PARALLEL (degree of parallelism)
  This tag was introduced in PostgreSQL 9.6. It tells the planner whether the function may be run in parallel mode. By default a function is PARALLEL UNSAFE, which means that any statement calling it will not be distributed across multiple worker processes for concurrent execution. The supported options are as follows.

  • SAFE
      The function may be executed in parallel. If the function is IMMUTABLE, or if it does not update data or modify transaction state or other variable values, it is generally fine to mark it SAFE.
  • UNSAFE
      If the function modifies non-temporary data, accesses sequences, or changes transaction state, it should be marked UNSAFE. Running UNSAFE functions in parallel mode could corrupt table data or other system state, so parallel execution is not allowed.
  • RESTRICTED
      Use this option for functions that use temporary tables, prepared statements, or client connection state. A RESTRICTED function does not prevent parallel execution, but it can only run in the leader of the parallel group: the function itself is not executed in parallel, but the SQL statement that calls it can still be.

  For PostgreSQL versions before 9.6, remove the PARALLEL clause when running the examples.

8.1.2 Triggers and trigger functions

Any full-featured database supports triggers. With triggers, data change events can be captured automatically and handled accordingly. PostgreSQL supports triggers not only on tables but also on views.

A statement-level trigger fires only once per SQL statement; a row-level trigger fires once for every row modified while the statement executes.

Trigger timing can be controlled more finely. Three timings are supported: BEFORE, AFTER, and INSTEAD OF. A BEFORE trigger fires before the statement is executed or the row is modified; you can use it to cancel the modification or to back up the data about to be changed. An AFTER trigger fires after the statement has executed or the row has been modified; you can use it to obtain the newly modified values, and it is generally used to log changes or replicate data. An INSTEAD OF trigger replaces the operation of the original statement. Row-level BEFORE and AFTER triggers can only be used on tables, while INSTEAD OF triggers can only be used on views.

You can add a WHEN condition to the trigger definition so that the trigger fires only for rows that satisfy the condition. You can also add an UPDATE OF clause followed by a column list so that the trigger fires only when one of the listed columns is updated.

A trigger function is always declared without parameters, because the data being changed can be accessed and modified inside the function. Its return type is always trigger. One trigger function can be shared by multiple triggers, but each trigger has exactly one trigger function. If business needs require the logic to be spread over several trigger functions, you must create several triggers to call them; these triggers can fire on the same event or on different ones. When several triggers fire on the same event, they fire one after another in alphabetical order by trigger name, and each trigger sees the changes made by the previous one. A trigger is not an independent transaction, so if one trigger rolls back, the changes made by the triggers that fired before it are rolled back as well.
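
The WHEN condition and the UPDATE OF column list can be combined in one definition. The sketch below uses a hypothetical table and trigger function written in PL/pgSQL; none of these names come from the original text.

```sql
-- Hypothetical table for the illustration.
CREATE TABLE demo_accounts (
    id         integer PRIMARY KEY,
    balance    numeric NOT NULL,
    updated_at timestamptz
);

-- Trigger function: no parameters, returns trigger.
CREATE OR REPLACE FUNCTION demo_touch_account() RETURNS trigger AS
$$
BEGIN
    NEW.updated_at := CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;

-- Row-level BEFORE trigger that is considered only when balance appears
-- in the UPDATE, and fires only when its value actually changes.
CREATE TRIGGER demo_accounts_touch
BEFORE UPDATE OF balance ON demo_accounts
FOR EACH ROW
WHEN (OLD.balance IS DISTINCT FROM NEW.balance)
EXECUTE PROCEDURE demo_touch_account();
```

IS DISTINCT FROM is used instead of <> so that the comparison also behaves sensibly if either side is NULL.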

8.1.3 Aggregation operation

Aggregate functions are generally built from one or more sub-functions. There must be at least one state transition function, which is executed repeatedly to fold the input rows into a single result. You can also supply an initial state value and a final processing function, but both are optional.

No matter what programming language you use to write these sub-functions, the syntax for finally integrating them into an aggregate function is the same.

CREATE AGGREGATE my_agg (input data type) (
SFUNC=state function name,
STYPE=state type,
FINALFUNC=final function name,
INITCOND=initial state value,
SORTOP=sort_operator
);

SFUNC is the state transition function and the core of the aggregate's logic. (The name is not very intuitive: the "state" here is the intermediate result obtained after each record is processed during aggregation.) Each call receives the state produced by the previous call together with the record currently being processed, so once all records have been processed one by one, the state reflects the entire target record set. In some cases that state is already the result the aggregate should return; in others it needs one last processing step, and FINALFUNC is the function responsible for that step. FINALFUNC is optional. Because it post-processes the output of SFUNC, its input must be the output of SFUNC. INITCOND is also optional; if set, its value is used as the initial state of SFUNC.

SORTOP is also optional; its value is an operator such as > or <. It specifies the sort operator for MAX-like and MIN-like aggregates. Once SORTOP is specified, the planner can use an index to evaluate the aggregate: because an index is ordered, the MAX or MIN value can be found quickly at the head or tail of the index, with no need to compare all the records one by one, which can greatly improve overall speed. There is one prerequisite for using SORTOP, however: on the aggregate's target table, the following two statements must return exactly the same result.

SELECT agg(col) FROM sometable;
SELECT col FROM sometable ORDER BY col USING sortop LIMIT 1;
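
As a concrete sketch of that prerequisite, the following defines a hypothetical MAX-like aggregate on top of the built-in int4larger function and runs both forms of the query. The aggregate, table, and index names are assumptions for the illustration.

```sql
-- Hypothetical MAX-like aggregate built on the built-in int4larger().
CREATE AGGREGATE demo_max(integer) (
    SFUNC  = int4larger,
    STYPE  = integer,
    SORTOP = >
);

CREATE TABLE demo_scores (score integer);
CREATE INDEX demo_scores_score_idx ON demo_scores (score);
INSERT INTO demo_scores VALUES (3), (7), (5);

-- Both statements return 7, which is what SORTOP = > promises.
SELECT demo_max(score) FROM demo_scores;
SELECT score FROM demo_scores ORDER BY score USING > LIMIT 1;
```

No INITCOND is given, so the state starts as NULL and, because int4larger is strict, the first non-NULL input becomes the initial state.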

In PostgreSQL 9.4, support was added for moving-window aggregate functions.
In PostgreSQL 9.6, aggregate functions also gained support for parallel execution. Whether an aggregate may run in parallel is controlled by its parallel attribute, which can be set to safe, unsafe, or restricted; the default is unsafe. Alongside the parallel attribute, several attributes related to parallel aggregation were added: combinefunc, serialfunc, and deserialfunc.

8.1.4 Trusted and untrusted languages

The function languages supported by PostgreSQL can be divided into two categories according to trust level: trusted languages and untrusted languages.

Trusted language
  A trusted language has no direct access to the underlying file system of the database server, so operating-system-level commands cannot be executed directly from it. Users of any privilege level can create functions in a trusted language. SQL, PL/pgSQL, PL/Perl, and PL/V8 are all trusted languages.

Untrusted language
  An untrusted language can interact directly with the operating system, so functions and web service interfaces provided by the operating system can be called directly. In PostgreSQL, only superusers may write functions in untrusted languages, but a superuser can grant ordinary users permission to execute such functions. Generally speaking, the names of untrusted languages end with U, such as PL/PerlU and PL/PythonU. This is not absolute: PL/R, for example, is an exception.

8.2 Use SQL language to write functions

In PostgreSQL, turning an existing SQL statement into a function is quick and simple: just add a function header and footer around the existing SQL. But simplicity also means limited functionality. SQL is not a procedural language, so procedural features such as conditional branching, loops, and variable definitions are not available. A more serious limitation is that SQL statements assembled dynamically from function parameters cannot be executed.

Of course, SQL functions also have advantages. The query planner can look inside an SQL function and analyze and optimize each statement in it, a process called inlining. Functions written in other languages can only be treated as black boxes. Only SQL functions can be inlined, which lets them make full use of indexes and reduces repeated calculation.

Create a SQL function whose return value is the unique ID of the newly inserted record

CREATE OR REPLACE FUNCTION write_to_log(param_user_name varchar,
param_description text)
RETURNS integer AS
$$
INSERT INTO logs(user_name, description) VALUES($1, $2)
RETURNING log_id;
$$
LANGUAGE 'sql' VOLATILE;

The function call syntax is shown below.

SELECT write_to_log('alex', 'Logged in at 11:59 AM.') As new_id;

Create a SQL function that performs an update and returns nothing (void) instead of a scalar.

CREATE OR REPLACE FUNCTION
update_logs(log_id int, param_user_name varchar, param_description text)
RETURNS void AS
$$
UPDATE logs SET user_name = $2, description = $3
, log_ts = CURRENT_TIMESTAMP WHERE log_id = $1;
$$
LANGUAGE 'sql' VOLATILE;

Use the following statement to call this function.

SELECT update_logs(12, 'alex', 'Fell back asleep.');

There are three ways to return a result set: the RETURNS TABLE syntax defined in the ANSI SQL standard, OUT parameters, and composite data types.

Return the result set in the function

CREATE OR REPLACE FUNCTION select_logs_rt(param_user_name varchar)
RETURNS TABLE (log_id int, user_name varchar(50),
description text, log_ts timestamptz) AS
$$
SELECT log_id, user_name, description, log_ts FROM logs WHERE user_name = $1;
$$
LANGUAGE 'sql' STABLE PARALLEL SAFE;

The way to use the OUT parameter is as follows.

CREATE OR REPLACE FUNCTION select_logs_out(param_user_name varchar, OUT log_id int
, OUT user_name varchar, OUT description text, OUT log_ts timestamptz)
RETURNS SETOF record AS
$$
SELECT * FROM logs WHERE user_name = $1;
$$
LANGUAGE 'sql' STABLE PARALLEL SAFE;

The way to use composite data types is as follows.

CREATE OR REPLACE FUNCTION select_logs_so(param_user_name varchar)
RETURNS SETOF logs AS
$$
SELECT * FROM logs WHERE user_name = $1;
$$
LANGUAGE 'sql' STABLE PARALLEL SAFE;

The calling methods of the functions implemented in the above three ways are the same.

SELECT * FROM select_logs_xxx('alex');

8.2.2 Use SQL language to write aggregate functions

The geometric mean is the nth root of the product of n positive numbers. It is widely used in finance, economics, and statistics. When the sample values span a very wide range, the geometric mean can be used in place of the more common arithmetic mean. It can be calculated with a more efficient formula, EXP(SUM(LN(x))/n), which uses logarithms to turn repeated multiplication into repeated addition, so the computer evaluates it more efficiently. The following example uses this formula to calculate the geometric mean.
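
Before wrapping the formula in an aggregate, it can be checked directly with built-in functions. This is only a sanity check on made-up sample values: the geometric mean of 2 and 8 is the square root of 16, so the result should come out at approximately 4.

```sql
-- EXP(SUM(LN(x)) / n) on two made-up sample values.
SELECT exp(sum(ln(x)) / count(*)) AS geom_mean
FROM (VALUES (2.0), (8.0)) AS t(x);
```

Note that ln() raises an error for zero or negative input, which is why an aggregate built on this formula has to decide how to treat such values.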

Building the geometric mean aggregate requires two sub-functions: a state transition function that accumulates the logarithms, and a final processing function that exponentiates the averaged sum of logarithms. In addition, the initial value of the state must be 0.

Create the state transition function for the geometric mean aggregate

CREATE OR REPLACE FUNCTION geom_mean_state(prev numeric[2], next numeric)
RETURNS numeric[2] AS
$$
SELECT
CASE
WHEN $2 IS NULL OR $2 = 0 THEN $1
ELSE ARRAY[COALESCE($1[1],0) + ln($2), $1[2] + 1]
END;
$$
LANGUAGE sql IMMUTABLE PARALLEL SAFE;

The state transition function defined here takes two inputs: the first is the result of the previous call to the state transition function, a two-element numeric array; the second is the sample value to be processed in this round. If the second argument is NULL or 0, nothing needs to be calculated in this round and the first argument is returned unchanged. Otherwise, the natural logarithm of the current sample value is added to the first element of the array, and 1 is added to the second element. The final state therefore holds the sum of the natural logarithms of all sample values and the number of values processed.

Create the final processing function of the geometric mean aggregation function

CREATE OR REPLACE FUNCTION geom_mean_final(numeric[2])
RETURNS numeric AS
$$
SELECT CASE WHEN $1[2] > 0 THEN exp($1[1]/$1[2]) ELSE 0 END;
$$
LANGUAGE sql IMMUTABLE PARALLEL SAFE;

Create geometric mean aggregation function based on defined sub-functions

CREATE AGGREGATE geom_mean(numeric) (
SFUNC=geom_mean_state,
STYPE=numeric[],
FINALFUNC=geom_mean_final,
PARALLEL = safe,
INITCOND='{0,0}'
);
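
A quick way to try the new aggregate without any tables is to feed it a literal list. The sample values are made up; the NULL and 0 rows are there to show that the state transition function skips them, so the result should again be approximately 4.

```sql
-- NULL and 0 are ignored by geom_mean_state, leaving only 2 and 8.
SELECT geom_mean(x) AS gm
FROM (VALUES (2.0), (8.0), (NULL), (0)) AS t(x);
```

With no usable input at all, the count in the state stays at 0 and geom_mean_final returns 0 rather than NULL.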

Count the 5 counties with the best ethnic diversity based on the geometric mean

SELECT left(tract_id,5) As county, geom_mean(val) As div_county
FROM census.vw_facts
WHERE category = 'Population' AND short_name != 'white_alone'
GROUP BY county
ORDER BY div_county DESC LIMIT 5;
county | div_county
-------+---------------------
25025  | 85.1549046212833364
25013  | 79.5972921427888918
25017  | 74.7697097102419689
25021  | 73.8824162064128504
25027  | 73.5955049035237656

Try the aggregate function defined above as a window function, and list the 5 census tracts with the best ethnic diversity.

WITH X AS (SELECT
tract_id,
left(tract_id,5) As county,
geom_mean(val) OVER (PARTITION BY tract_id) As div_tract,
ROW_NUMBER() OVER (PARTITION BY tract_id) As rn,
geom_mean(val) OVER(PARTITION BY left(tract_id,5)) As div_county
FROM census.vw_facts WHERE category = 'Population' AND short_name != 'white_alone'
)
SELECT tract_id, county, div_tract, div_county
FROM X
WHERE rn = 1
ORDER BY div_tract DESC, div_county DESC LIMIT 5;
tract_id    | county | div_tract            | div_county
------------+--------+----------------------+------------------
25025160101 | 25025  | 302.6815688785928786 | 85.1549046212833364
25027731900 | 25027  | 265.6136902148147729 | 73.5955049035237656
25021416200 | 25021  | 261.9351057509603296 | 73.8824162064128504
25025130406 | 25025  | 260.3241378371627137 | 85.1549046212833364
25017342500 | 25017  | 257.4671462282508267 | 74.7697097102419689

8.3 Use PL/pgSQL language to write functions

If SQL can no longer meet your needs for writing functions, the usual solution is to switch to PL/pgSQL. PL/pgSQL improves on SQL by supporting local variables, declared in a DECLARE section, and control-flow statements.
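
A minimal sketch of both features, a local variable and control flow, in one hypothetical function that sums the even numbers from 0 up to a bound:

```sql
-- Hypothetical function, for illustration only.
CREATE OR REPLACE FUNCTION demo_sum_even(upper_bound integer)
RETURNS integer AS
$$
DECLARE
    total integer := 0;          -- local variable
BEGIN
    IF upper_bound IS NULL OR upper_bound < 0 THEN
        RETURN 0;                -- conditional branch
    END IF;

    FOR i IN 0..upper_bound LOOP -- loop
        IF i % 2 = 0 THEN
            total := total + i;
        END IF;
    END LOOP;

    RETURN total;
END;
$$
LANGUAGE plpgsql IMMUTABLE;

SELECT demo_sum_even(10);  -- 0 + 2 + 4 + 6 + 8 + 10 = 30
```

None of this could be written as a plain SQL function body without resorting to set-based tricks such as generate_series.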

8.3.1 Writing basic PL/pgSQL functions

Use PL/pgSQL to write a function that returns a table type

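CREATE OR REPLACE FUNCTION select_logs_rt(param_user_name varchar)
RETURNS TABLE (log_id int, user_name varchar(50),
description text, log_ts timestamptz) AS
$$
BEGIN
RETURN QUERY
-- Qualify the columns with a table alias: the RETURNS TABLE columns are also
-- PL/pgSQL variables, so bare names such as log_id would be ambiguous.
SELECT l.log_id, l.user_name, l.description, l.log_ts FROM logs AS l
WHERE l.user_name = param_user_name;
END;
$$
LANGUAGE 'plpgsql' STABLE;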

8.3.2 Use PL/pgSQL to write trigger functions

Two steps are required: first write the trigger function, then explicitly attach it to the appropriate trigger. Keeping the function that handles the event separate from the trigger itself is a powerful feature of PostgreSQL: the same trigger function can be attached to multiple triggers, so its logic can be reused.

Because trigger functions are completely independent of one another, each one can be written in a different programming language, and triggers written in different languages can work together. PostgreSQL allows a single trigger event (INSERT, UPDATE, or DELETE) to fire multiple triggers, each written in a different language.

Timestamp newly inserted records or modified records through triggers

CREATE OR REPLACE FUNCTION trig_time_stamper() RETURNS trigger AS -- ➊
$$
BEGIN
NEW.upd_ts := CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$
LANGUAGE plpgsql VOLATILE;

CREATE TRIGGER trig_1
BEFORE INSERT OR UPDATE OF session_state, session_id -- ➋
ON web_sessions
FOR EACH ROW EXECUTE PROCEDURE trig_time_stamper();

➊ Define the trigger function. It works with any table that has an upd_ts column: it sets upd_ts to the current timestamp and then returns the modified record.

➋ Column-level triggers have been supported since version 9.0 and let the trigger condition be narrowed down to individual columns. Before version 9.0, the trigger in the example above would fire on every UPDATE or INSERT, so column-level control required comparing OLD.some_column with NEW.some_column to find out which columns had changed before deciding whether to act. (Note: INSTEAD OF triggers do not support this feature.)
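
The manual OLD/NEW comparison described in ➋ might look like the sketch below. The function name trig_time_stamper_cols and the trigger name trig_2 are hypothetical; the table and column names are taken from the example above.

```sql
-- Hypothetical variant that does the column check inside the function.
CREATE OR REPLACE FUNCTION trig_time_stamper_cols() RETURNS trigger AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- OLD is not available for INSERT, so handle it separately.
        NEW.upd_ts := CURRENT_TIMESTAMP;
    ELSIF NEW.session_state IS DISTINCT FROM OLD.session_state
       OR NEW.session_id    IS DISTINCT FROM OLD.session_id THEN
        NEW.upd_ts := CURRENT_TIMESTAMP;
    END IF;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql VOLATILE;

-- Fires on every INSERT or UPDATE; the function decides whether to act.
CREATE TRIGGER trig_2
BEFORE INSERT OR UPDATE
ON web_sessions
FOR EACH ROW EXECUTE PROCEDURE trig_time_stamper_cols();
```

The two approaches are not quite equivalent: UPDATE OF fires whenever a listed column appears in the UPDATE statement, even if its value stays the same, whereas the comparison here acts only when a value really changes.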

8.4 Use PL/Python to write functions

Python is a very flexible language with a very rich set of extension libraries. As far as I know, PostgreSQL is the only database that allows users to write functions in Python.

You can install both PL/Python2U and PL/Python3U in the same database, but you cannot use both in the same user session. This means that functions written in PL/Python2U and PL/Python3U cannot be called in the same statement. You will also see a language called PL/PythonU, which is an alias the system creates for PL/Python2U to maintain backward compatibility.

Before using the PL/Python language, you must first build the Python runtime environment on the server. After setting up the Python runtime environment, you need to install the Python language extension package for PostgreSQL. Note, for example, if your plpython2u extension package is compiled based on Python 2.7, then the Python 2.7 runtime environment needs to be installed on the server.

CREATE EXTENSION plpython2u;
CREATE EXTENSION plpython3u;

Writing basic Python functions
PostgreSQL converts automatically between PostgreSQL data types and Python data types. Functions written in PL/Python can return arrays and composite data types, and PL/Python can also be used to write trigger functions and aggregate functions.
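
A minimal sketch of the automatic type conversion, assuming the plpython3u extension is installed. The function name is hypothetical. A PostgreSQL array arrives in Python as a list, and a returned list is converted back into an array.

```sql
-- Hypothetical function; assumes CREATE EXTENSION plpython3u has been run.
CREATE OR REPLACE FUNCTION demo_word_lengths(words text[])
RETURNS integer[] AS
$$
# words is a Python list; SQL NULL elements arrive as None.
return [None if w is None else len(w) for w in words]
$$
LANGUAGE plpython3u IMMUTABLE STRICT;

SELECT demo_word_lengths(ARRAY['postgres', 'python']);
```

STRICT only covers the case where the whole array argument is NULL, which is why NULL elements inside the array are handled in the body.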

Use functions written in PL/Python language to search the contents of the PostgreSQL official manual

CREATE OR REPLACE FUNCTION postgresql_help_search(param_search text)
RETURNS text AS
$$
import urllib, re  # ❶
response = urllib.urlopen(
    'http://www.postgresql.org/search/?u=%2Fdocs%2Fcurrent%2F&q=' + param_search
)  # ❷
raw_html = response.read()  # ❸
result = raw_html[raw_html.find("<!-- docbot goes here -->") :
                  raw_html.find("<!-- pgContentWrap -->") - 1]  # ❹
result = re.sub('<[^<]+?>', '', result).strip()  # ❺
return result  # ❻
$$
LANGUAGE plpython2u SECURITY DEFINER STABLE;

❶ Import the function library to be used next.
❷ Perform a search after concatenating the search term.
❸ Read the returned search results and save them in a variable named raw_html.
❹ Cut out the content between <!-- docbot goes here --> and <!-- pgContentWrap --> from raw_html and store it in a new variable named result.
❺ Delete the HTML tags and spaces at the beginning and end of result.
❻ Return the content of the result variable.

Use Python functions in query statements

SELECT search_term, left(postgresql_help_search(search_term),125) As result
FROM (VALUES ('regexp_match'),('pg_trgm'),('tsvector')) As x(search_term);

As mentioned earlier, PL/Python is an untrusted language, and there is no trusted version of it. This means that only superusers can write functions in PL/Python, and functions written in it can manipulate the file system directly. Note that, from the operating system's point of view, PL/Python functions run as the postgres operating system account created when PostgreSQL was installed, so make sure the postgres account has access to the directory used before running this example.

List all files in a directory

CREATE OR REPLACE FUNCTION list_incoming_files()
RETURNS SETOF text AS
$$
import os
return os.listdir('/incoming')
$$
LANGUAGE 'plpython2u' VOLATILE SECURITY DEFINER;
SELECT filename
FROM list_incoming_files() As filename
WHERE filename ILIKE '%.csv';

8.5 Use PL/V8, PL/CoffeeScript and PL/LiveScript to write functions

Omitted…

Origin blog.csdn.net/qq_42226855/article/details/110439805