[Function] HQL Summary

SQL makes up roughly 80% of a junior analyst's work. In my view, mid-level analysts should move on to dimensional modeling and to getting the most out of OLAP tools, or take the initiative to write up their analyses, and should keep inefficient use of SQL to a minimum. This is only my own opinion; others may see it differently.
Either way, SQL is a great tool for analysts. HQL, that is, Hive SQL, has more functions than MySQL, and they are more complicated.
In addition, when you look up a function, you are usually in one of three situations:
First, you remember the function name but have forgotten how to use its parameters; the index is the function name.
Second, you want to achieve a certain purpose but are not sure which function does it; the index is the function's role.
Third, you want to achieve a more complex purpose and are not sure which functions can be combined to do it; this typically involves several functions, and the index is the use case.
So in the subsequent articles, the first three columns of the function table reflect these three indexes.
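
For the first situation (name known, parameters forgotten), Hive itself can also serve as a quick reference. A minimal sketch using Hive's built-in help statements, with `coalesce` chosen only as an example name:

```sql
-- List every function registered in the current Hive session
SHOW FUNCTIONS;

-- Short description of one function, looked up by name
DESCRIBE FUNCTION coalesce;

-- Longer description of the same function
DESCRIBE FUNCTION EXTENDED coalesce;
```

These statements only help when the name is already known; looking up by role or by use case still needs an index like the one described above.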

Data warehouse table synchronization

The difference between a data warehouse and a transactional database such as MySQL is that a data warehouse is a subject-oriented (Subject Oriented), integrated (Integrated), relatively stable (Non-Volatile) collection of data that reflects historical change (Time Variant) and is used to support management decisions. Leaving the first three properties aside, how is the last one, reflecting historical change, actually embodied? Through the way online transaction tables are synchronized.
Offline tables are synchronized in three ways: delta tables, full tables, and zipper tables.
Two more points. Data warehouse synchronization is generally T+1 and runs at two or three in the morning. A data warehouse also has the concept of a partition, which gives quick access to one block of data. The date dt is usually a partition, and other fields such as a business type may be used as partitions too; a partition can be understood as an index for convenient, fast access. The statement for viewing a table's partitions is
`show partitions database_name.table_name`
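
As a small illustration, assuming a hypothetical table `dw.orders_delta` partitioned by `dt` (the table name and the date are made up for this sketch):

```sql
-- List all partitions of the hypothetical table dw.orders_delta
SHOW PARTITIONS dw.orders_delta;

-- Restricting the partition column lets Hive read only that day's data block
SELECT COUNT(*) AS row_cnt
FROM dw.orders_delta
WHERE dt = '2019-07-18';
```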
## delta table ##
This one is easy to understand. On the first day, the delta table synchronizes the full online transaction table into yesterday's partition; every day after that, it synchronizes only the records created or updated yesterday into yesterday's partition.
If the records of a table are never updated, as with logs, incremental synchronization is normally used. If records can be updated, then: to take the current state, sort the records of each primary key ID by update time and take the last one; to take a historical snapshot for one day, restrict dt <= that day and again take the last record per ID; to take the state for every day, left join an auxiliary date table on 1 = 1 where delta_table.dt <= date_table.dt, then, for each date of the auxiliary table and each ID, sort by update time and take the last updated record as the state for that date.
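
The three delta-table reads can be sketched as follows. This is only an illustration under assumptions: the table `dw.orders_delta(id, status, update_time)` partitioned by `dt`, the calendar table `dw.dim_date(dt)`, and all dates are hypothetical, and `dt` is assumed to be a `yyyy-MM-dd` string so that string comparison matches date order.

```sql
-- 1) Current state: the most recently updated record per primary key
SELECT id, status, update_time
FROM (
  SELECT id, status, update_time,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY update_time DESC) AS rn
  FROM dw.orders_delta
) t
WHERE rn = 1;

-- 2) Snapshot as of one day: same ranking, limited to partitions up to that day
SELECT id, status, update_time
FROM (
  SELECT id, status, update_time,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY update_time DESC) AS rn
  FROM dw.orders_delta
  WHERE dt <= '2019-07-01'
) t
WHERE rn = 1;

-- 3) State on every day: pair each calendar day with all delta records
--    whose partition is not later than that day, then rank per (day, id)
SELECT snap_dt, id, status, update_time
FROM (
  SELECT d.dt AS snap_dt, o.id, o.status, o.update_time,
         ROW_NUMBER() OVER (PARTITION BY d.dt, o.id
                            ORDER BY o.update_time DESC) AS rn
  FROM dw.dim_date d
  LEFT JOIN dw.orders_delta o ON 1 = 1
  WHERE o.dt <= d.dt
    AND d.dt BETWEEN '2019-07-01' AND '2019-07-18'  -- keep the blow-up bounded
) t
WHERE rn = 1;
```

The third query is effectively a Cartesian product before filtering, so it can be expensive, and a Hive configuration that forbids Cartesian products may reject it. Limiting the calendar range, as above, keeps the intermediate result manageable.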
## full table ##
This one is also easy to understand: every day, the entire online transaction table is synchronized into yesterday's partition.
To take the current state, read yesterday's partition directly; to take a historical snapshot for one day, restrict dt = that day; to take the state for every day, no single-day restriction on dt is needed, because each dt partition already holds that day's state.
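
A minimal sketch of these reads, assuming a hypothetical full table `dw.orders_full(id, status)` partitioned by `dt`, with `'2019-07-18'` standing in for yesterday:

```sql
-- Current state: yesterday's partition is the latest full copy
SELECT id, status
FROM dw.orders_full
WHERE dt = '2019-07-18';

-- Historical snapshot for one day: pick that day's partition
SELECT id, status
FROM dw.orders_full
WHERE dt = '2019-07-01';

-- State on every day in a range: each partition is one day's state
SELECT dt, id, status
FROM dw.orders_full
WHERE dt BETWEEN '2019-07-01' AND '2019-07-18';
```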
## zipper table ##
This one is harder to understand. It is used when records in the online transaction table get updated. The offline table usually has three partition fields: dp, start_date (or dt), and end_date. dp takes two values, ACTIVE and EXPIRED, meaning that the record is the currently effective state or an expired record (a historical state). start_date is the day from which the record takes effect, and end_date is the day from which the record is no longer valid.
On the first day, the zipper table synchronizes the full online transaction table (states from before the first sync day cannot be preserved). Every day after that, new records (usually those created yesterday) are synchronized directly (dp = 'ACTIVE' and start_date = yesterday and end_date = '9999-12-31'). For old records that were updated (usually created earlier and updated yesterday), the old record is set to (dp = 'EXPIRED' and end_date = yesterday) and a new record carrying the updated values is added (dp = 'ACTIVE' and start_date = yesterday and end_date = '9999-12-31').
So to take the current state from a zipper table, you only need to restrict dp = 'ACTIVE' (or start_date <= yesterday and end_date > yesterday). To take the state on a historical day, restrict start_date <= that day and end_date > that day (this may be hard to grasp; stop and think it through). To take the state for every day, restrict
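
The first two zipper-table reads can be sketched as below. The table `dw.orders_zip(id, status)` partitioned by `dp`, `start_date`, `end_date`, the sample rows in the comments, and all dates are hypothetical; dates are assumed to be `yyyy-MM-dd` strings.

```sql
-- Hypothetical history of one record (id = 1), created 2019-06-20 and
-- updated 2019-07-05, following the synchronization rules described above:
--   id  status  dp       start_date  end_date
--   1   new     EXPIRED  2019-06-20  2019-07-05
--   1   paid    ACTIVE   2019-07-05  9999-12-31

-- Current state: only the active version of each record
SELECT id, status
FROM dw.orders_zip
WHERE dp = 'ACTIVE';

-- State on 2019-07-01: the version whose validity interval covers that day.
-- For id = 1 this keeps the 'new' row (2019-06-20 <= day, 2019-07-05 > day)
-- and drops the 'paid' row (its start_date is later than the day).
SELECT id, status
FROM dw.orders_zip
WHERE start_date <= '2019-07-01'
  AND end_date   >  '2019-07-01';
```

The strict `>` on end_date matters: on the update day itself (2019-07-05 here), the expired row is excluded and the new row is included, so each record matches exactly one version per day.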
# commonly used functions

| Functions & Case | Function name | Function role | Parameter understanding | Example of use |
| --- | --- | --- | --- | --- |
|  |  |  |  |  |

 

# use cases

## COALESCE ##
`COALESCE(T v1, T v2, ...)`
Returns the first non-NULL argument; if all arguments are NULL, returns NULL.
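
A short usage sketch; the table `dw.users` and its columns `nickname` and `real_name` are hypothetical:

```sql
-- Literal arguments: the first non-NULL value is 'b'
SELECT COALESCE(NULL, 'b', 'c');

-- Typical use: fall back through several columns, then to a default
SELECT id,
       COALESCE(nickname, real_name, 'unknown') AS display_name
FROM dw.users;
```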
# higher-order functions

# complete function list

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Origin www.cnblogs.com/everda/p/11237370.html