You have heard a lot about "OLAP", but do you really understand it?

Anyone who works with data has heard of "OLAP" to some degree, but most people still have only a vague idea of it. They know only that it is a collective term for data analysis techniques. So what exactly is OLAP, and what does it have to do with BI? Let's find out together.

1. What is OLAP?

The concept of OLAP was first proposed in 1993 by E. F. Codd, the father of the relational database, who also set out 12 rules for OLAP. The proposal attracted wide attention, and OLAP became established as a class of products clearly distinct from online transaction processing (OLTP).

Today's data processing can be roughly divided into two categories: OLTP (On-Line Transaction Processing) and OLAP (On-Line Analytical Processing). OLTP is the main application of traditional relational databases and handles basic, day-to-day transactions such as bank transfers. OLAP is the main application of data warehouse systems: it supports complex analytical operations, focuses on decision support, and presents query results in an intuitive, easy-to-understand form. OLAP is a software technology that lets users access information quickly, consistently, and interactively from multiple angles, and thereby gain a deeper understanding of the data. Its goal is to support decision-making, or to meet specific query and reporting requirements, in a multidimensional environment. Its technical core is the concept of the "dimension".
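
To make the OLTP/OLAP contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `transactions` table and its rows are hypothetical, and SQLite is used only for convenience; it is not a data warehouse. The OLTP-style part writes one small row per transaction, while the OLAP-style part runs a single read-only query that scans and aggregates all rows.

```python
import sqlite3

# In-memory database with one hypothetical "transactions" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, account TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each touching a single row.
for account, amount in [("A", 100.0), ("B", 250.0), ("A", -40.0), ("C", 75.0)]:
    with conn:  # each insert is committed as its own short transaction
        conn.execute(
            "INSERT INTO transactions (account, amount) VALUES (?, ?)",
            (account, amount),
        )

# OLAP-style work: one read-only query that scans and aggregates many rows.
summary = conn.execute(
    "SELECT account, SUM(amount) AS balance, COUNT(*) AS n "
    "FROM transactions GROUP BY account ORDER BY account"
).fetchall()
print(summary)  # [('A', 60.0, 2), ('B', 250.0, 1), ('C', 75.0, 1)]
```

In a real system the two workloads usually run on different stores, which is exactly the OLTP database versus data warehouse split described above.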

A "dimension" is an angle from which people observe the objective world; it is a high-level classification. Dimensions generally contain hierarchies, and these hierarchies can be quite complicated; a time dimension, for example, may have the levels day, month, quarter, and year. By defining several important attributes of an entity as dimensions, users can compare data across those dimensions. OLAP can therefore also be described as a collection of multidimensional data analysis tools.

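As a sketch of what a dimension hierarchy looks like, the snippet below models a time dimension with the levels day, month, quarter, and year. The level names and the `time_member` helper are illustrative assumptions, not part of any OLAP product; the point is that one date belongs to exactly one member at each level.

```python
from datetime import date

# Hypothetical time-dimension hierarchy, ordered from finest to coarsest level.
TIME_LEVELS = ["day", "month", "quarter", "year"]


def time_member(d: date, level: str) -> str:
    """Return the member of the time dimension that date d belongs to at `level`."""
    if level == "day":
        return d.isoformat()
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    if level == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if level == "year":
        return str(d.year)
    raise ValueError(f"unknown level: {level}")


d = date(2021, 3, 24)
print([time_member(d, lvl) for lvl in TIME_LEVELS])
# ['2021-03-24', '2021-03', '2021-Q1', '2021']
```

Moving from left to right in that list is what the drilling operations below call rolling up; moving from right to left is drilling down.
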
2. OLAP operations

The basic multidimensional analysis operations of OLAP include drilling (Roll Up and Drill Down), slicing (Slice), dicing (Dice), and rotation (Pivot), as well as Drill Across, Drill Through, and others.

Drilling changes the level of a dimension and thus the granularity of the analysis. It includes drilling up (Roll Up) and drilling down (Drill Down). Roll Up summarizes low-level detail data into high-level summary data along a dimension, or reduces the number of dimensions; Drill Down does the opposite, going from summary data to detail data, or adding new dimensions.
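
A minimal sketch of Roll Up and Drill Down on the time dimension, assuming hypothetical month-level sales rows of the form (month, region, sales). Rolling up re-aggregates the rows at the quarter level; drilling down expands one quarter-level cell back into the month-level rows behind it. The helper names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical month-level fact rows: (month, region, sales).
facts = [
    ("2021-01", "East", 120),
    ("2021-02", "East", 80),
    ("2021-04", "East", 50),
    ("2021-01", "West", 200),
    ("2021-05", "West", 90),
]


def quarter_of(month: str) -> str:
    """Map a 'YYYY-MM' month to its 'YYYY-Qn' quarter (the parent level)."""
    year, m = month.split("-")
    return f"{year}-Q{(int(m) - 1) // 3 + 1}"


def roll_up_to_quarter(rows):
    """Roll Up: move from month to quarter level and re-aggregate sales."""
    totals = defaultdict(int)
    for month, region, sales in rows:
        totals[(quarter_of(month), region)] += sales
    return dict(totals)


def drill_down(rows, quarter, region):
    """Drill Down: list the month-level rows behind one quarter-level cell."""
    return [(m, s) for m, r, s in rows if r == region and quarter_of(m) == quarter]


print(roll_up_to_quarter(facts))
print(drill_down(facts, "2021-Q1", "East"))  # [('2021-01', 120), ('2021-02', 80)]
```

The drilled-down rows always add up to the rolled-up cell they came from, which is a useful sanity check for any drill implementation.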

Slicing and dicing look at how a measure is distributed over the remaining dimensions after values have been selected on some of the dimensions. If two dimensions remain, the result is a slice; if three remain, it is a dice.
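
The following sketch follows the definition above: fix values on some dimensions, then look at the measure over the dimensions that remain. The four-dimensional cube (year, region, product, channel) and the `select` helper are hypothetical. Fixing two of the four dimensions leaves two, which the article calls a slice; fixing one leaves three, a dice.

```python
# Hypothetical 4-D cube: (year, region, product, channel) -> sales.
DIMS = ("year", "region", "product", "channel")
cube = {
    ("2020", "East", "Phone", "Online"): 10,
    ("2020", "East", "Laptop", "Store"): 7,
    ("2020", "West", "Phone", "Store"): 4,
    ("2021", "East", "Phone", "Online"): 12,
    ("2021", "West", "Laptop", "Online"): 9,
}


def select(cube, dims, **fixed):
    """Keep cells matching the fixed values; key them by the remaining dimensions."""
    remaining = [i for i, name in enumerate(dims) if name not in fixed]
    positions = {dims.index(name): value for name, value in fixed.items()}
    return {
        tuple(key[i] for i in remaining): value
        for key, value in cube.items()
        if all(key[pos] == want for pos, want in positions.items())
    }


# Fixing year and channel leaves (region, product): a slice in the article's terms.
slice_2020_online = select(cube, DIMS, year="2020", channel="Online")
# Fixing only the year leaves (region, product, channel): a dice.
dice_2020 = select(cube, DIMS, year="2020")

print(slice_2020_online)  # {('East', 'Phone'): 10}
print(dice_2020)
```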

Rotation (Pivot) changes the orientation of the dimensions, that is, it rearranges where the dimensions are placed in the table (for example, swapping rows and columns).
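
Pivoting does not change any values, only where they are placed. A minimal sketch, assuming a two-dimensional table stored as a nested dict `{row: {column: value}}` (a representation chosen here purely for illustration):

```python
def pivot(table):
    """Swap the row and column dimensions of a 2-D table {row: {col: value}}."""
    swapped = {}
    for row, cols in table.items():
        for col, value in cols.items():
            # The old column becomes the new row, and vice versa.
            swapped.setdefault(col, {})[row] = value
    return swapped


# Hypothetical sales table: regions as rows, products as columns.
by_region = {"East": {"Phone": 22, "Laptop": 7}, "West": {"Phone": 4, "Laptop": 9}}
by_product = pivot(by_region)
print(by_product)
# {'Phone': {'East': 22, 'West': 4}, 'Laptop': {'East': 7, 'West': 9}}
```

Applying `pivot` twice returns the original table, which is a quick way to check that nothing was lost.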

OLAP queries and analyzes online data for specific problems in a multidimensional manner. A dimension is a specific angle from which people observe data. For example, when an enterprise examines product sales, it usually looks at them from the angles of time, region, and product. Time, region, and product are the dimensions here, and the multidimensional array formed by the combinations of these dimensions together with the measures is the basis of OLAP analysis.
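
The "multidimensional array" described above can be sketched in plain Python as a mapping from every combination of dimension members to a measure. The members and sales records below are hypothetical, and real OLAP engines use far more compact storage, but the addressing idea is the same: a (time, region, product) coordinate identifies one cell.

```python
from itertools import product

# Hypothetical members of the three dimensions named in the text.
times = ["2021-Q1", "2021-Q2"]
regions = ["East", "West"]
products = ["Phone", "Laptop"]

# Hypothetical detail records: (time, region, product, sales).
records = [
    ("2021-Q1", "East", "Phone", 120),
    ("2021-Q1", "West", "Laptop", 80),
    ("2021-Q2", "East", "Phone", 60),
    ("2021-Q1", "East", "Phone", 30),
]

# Every combination of dimension members is one cell; the measure is total sales.
cube = {cell: 0 for cell in product(times, regions, products)}
for t, r, p, sales in records:
    cube[(t, r, p)] += sales

print(len(cube))                           # 8 cells = 2 * 2 * 2
print(cube[("2021-Q1", "East", "Phone")])  # 150
```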

Multidimensional analysis means applying operations such as Slice, Dice, Drill Down, Roll Up, and Pivot to data organized in multidimensional form, so that users can observe the data in the database from multiple angles and perspectives and thus understand the information it contains in depth.

3. Classification of OLAP

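By how the underlying data is stored, OLAP is divided into ROLAP, MOLAP, and HOLAP. HOLAP (Hybrid OLAP) combines the other two, typically keeping detail data in relational tables and pre-computed aggregates in multidimensional storage.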

MOLAP (Multidimensional OLAP) stores data in a multidimensional array model. Its defining feature is that the data must be pre-computed, and the pre-computed result (the cube) is stored in the multidimensional array. Because the cube contains the aggregated results for all combinations of dimensions, queries are very fast. Query flexibility, however, is relatively low: the dimensional model must be designed in advance, queries and analysis are limited to the specified dimensions, and adding a dimension requires recomputing the cube.
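
A toy illustration of MOLAP-style pre-computation, under the assumption that "the cube" means a group-by over every subset of dimensions, with `*` standing for "all members". With three dimensions that is 2^3 = 8 group-bys computed up front; after that, each query is a lookup. The data and names are hypothetical, and real MOLAP engines use much more efficient storage and aggregation strategies.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("time", "region", "product")
ALL = "*"  # stands for "aggregated over every member of this dimension"

# Hypothetical detail rows: (time, region, product, sales).
rows = [
    ("2021-Q1", "East", "Phone", 120),
    ("2021-Q1", "West", "Laptop", 80),
    ("2021-Q2", "East", "Phone", 60),
]


def precompute_cube(rows, n_dims):
    """Aggregate over every subset of dimensions (2**n_dims group-bys) up front."""
    cube = defaultdict(int)
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            for row in rows:
                key = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                cube[key] += row[n_dims]  # the measure follows the dimension values
    return dict(cube)


cube = precompute_cube(rows, len(DIMS))

# At query time, answers are lookups rather than scans of the detail rows.
print(cube[(ALL, "East", ALL)])     # 180
print(cube[("2021-Q1", ALL, ALL)])  # 200
print(cube[(ALL, ALL, ALL)])        # 260
```

The sketch also shows the trade-off described above: adding a fourth dimension means rerunning `precompute_cube`, and the number of group-bys doubles with each added dimension.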

ROLAP (Relational OLAP) stores data in a relational model, usually as fact tables and dimension tables designed according to a given schema (such as a star schema). It requires no pre-computation: with standard SQL, data can be queried on any dimensions on demand. It scales well and suits models with a large number of dimensions. Because results are computed at query time, however, response times are generally longer than with pre-computed MOLAP.
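
For contrast, a ROLAP-style sketch: similar sales data is stored relationally as a hypothetical fact table (`fact_sales`) joined to two dimension tables, and every aggregation is computed on demand with a standard SQL `GROUP BY`. SQLite, via Python's sqlite3 module, stands in for a real relational warehouse here, and all table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension tables.
    CREATE TABLE dim_region  (region_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT);
    -- Hypothetical fact table: one row per sale, with keys into the dimensions.
    CREATE TABLE fact_sales (
        quarter TEXT, region_id INTEGER, product_id INTEGER, amount REAL
    );
    INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
    INSERT INTO dim_product VALUES (1, 'Phone'), (2, 'Laptop');
    INSERT INTO fact_sales VALUES
        ('2021-Q1', 1, 1, 120), ('2021-Q1', 2, 2, 80), ('2021-Q2', 1, 1, 60);
""")


def sales_by(*columns):
    """Aggregate sales over any chosen dimension columns at query time (no cube)."""
    allowed = {"quarter": "f.quarter", "region": "r.region", "product": "p.product"}
    cols = ", ".join(allowed[c] for c in columns)  # whitelist guards the f-string
    sql = f"""
        SELECT {cols}, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_region r  ON f.region_id = r.region_id
        JOIN dim_product p ON f.product_id = p.product_id
        GROUP BY {cols} ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


print(sales_by("region"))  # [('East', 180.0), ('West', 80.0)]
print(sales_by("quarter", "product"))
```

Because the column list is spliced into the SQL text, the helper accepts only names from a fixed whitelist. Grouping by a different set of dimensions needs no stored aggregates to be rebuilt, only a different query.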

Whether it is MOLAP's multidimensional model or ROLAP's relational model, the model must be designed in advance before it can be used.

4. The relationship between OLAP and BI

BI covers data collection, data preparation, data analysis, and data sharing. Data analysis in turn includes techniques such as reporting, OLAP, data mining, and data visualization, so OLAP is only one part of BI: one of its data analysis techniques. Because OLAP requires modeling in advance, BI mainly uses it to "describe what happened". Like reports and dashboards, it is a "static" analysis technique, used to build information portals or to monitor data.

Pivot analysis is the most commonly used OLAP analysis tool. It can quickly classify, summarize, and compare large amounts of data, and it lets users switch the analysis dimensions on the fly to view statistics according to their business needs. Pivot analysis combines the strengths of methods such as sorting, filtering, grouping, and classification, while offering more flexible aggregation and several ways to display the data.

However, OLAP-based pivot analysis requires a complex data processing pipeline (cubes, dimension tables, fact tables, fixed dimension hierarchies, aggregate measures, and so on), and querying the data requires writing SQL statements.

We therefore need to turn "static" pivot analysis into a "dynamic" analysis tool that lets users explore the data freely, so that it not only "describes what happened" but also "analyzes why it happened".

For example, Smartbi's pivot analysis tool adopts an "Excel-like PivotTable" design: multidimensional analysis no longer requires building a model, and users can combine dimensions, compute summaries, slice, and drill to gain insight into the data. In addition, any field can be used directly as an output field or a filter condition, which makes querying and exploring data easy.

In the upcoming V10 release, Smartbi will also rebuild its data sets, integrating a new OLAP engine into them and bringing OLAP analysis capabilities to its analysis tools, such as reports, dashboards, and pivot analysis, to create a more intelligent big data analysis platform.

In summary, OLAP lets users quickly gain insight into data from different angles, so that even with large volumes of data and many dimensions and measures, they can focus on what matters most in the analysis. With all that said, do you now have a deeper understanding of "OLAP"?

Origin blog.csdn.net/Moogical/article/details/115201312