Apache Calcite Official Documentation (Chinese Translation) - Advanced - 4. Lattices

Advanced, Part 2

1. Lattices

  A lattice is a framework for creating and populating materialized views, and for recognizing that a materialized view can be used to answer a particular query.
  A lattice represents a star (or snowflake) schema, not a general schema. In particular, all relationships must be many-to-one, heading outward from the fact table at the center of the star.
  The name comes from mathematics: a lattice is a partially ordered set in which any two elements have a unique greatest lower bound and a unique least upper bound.
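  To make the definition concrete, here is a minimal Python sketch of the example that matters for cubes: the subsets of a set of dimensions, ordered by inclusion, where the greatest lower bound of two subsets is their intersection and the least upper bound is their union. The dimension names and the helpers (`subsets`, `meet`, `join`, `is_lattice`) are illustrative only and are not part of Calcite.

```python
from itertools import combinations


def subsets(dims):
    """All subsets of a collection of dimension names, as frozensets."""
    dims = list(dims)
    return [frozenset(c) for r in range(len(dims) + 1)
            for c in combinations(dims, r)]


def meet(a, b):
    """Greatest lower bound under set inclusion: the intersection."""
    return a & b


def join(a, b):
    """Least upper bound under set inclusion: the union."""
    return a | b


def is_lattice(elements):
    """Brute-force check that every pair has a unique GLB and LUB in `elements`."""
    elems = set(elements)
    for a in elems:
        for b in elems:
            lower = [x for x in elems if x <= a and x <= b]
            upper = [x for x in elems if x >= a and x >= b]
            # The GLB is the lower bound that contains every other lower bound;
            # the LUB is the upper bound contained in every other upper bound.
            glb = [x for x in lower if all(y <= x for y in lower)]
            lub = [x for x in upper if all(x <= y for y in upper)]
            if len(glb) != 1 or len(lub) != 1:
                return False
    return True


cube = subsets(["product", "time", "store"])
print(len(cube))         # 8
print(is_lattice(cube))  # True
```

  Dropping the top and bottom elements would generally break the property: in {∅, {a}, {b}}, the pair {a}, {b} has no upper bound at all.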
  [HRU96] observed that the set of possible materializations of a data cube forms a lattice, and presented an algorithm for choosing a good set of materializations. Calcite's recommendation algorithm is derived from this.
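  The following is a sketch of the greedy selection idea described in [HRU96], not Calcite's actual implementation. It assumes the simple cost model from that line of work: the top view (all dimensions) is always available, a query at a given dimensionality is answered from the smallest chosen view that contains all of its dimensions, its cost is that view's row count, and each step picks the view that saves the most total cost across the queries it could answer. The row counts in `sizes` are made-up numbers, and `hru_greedy` is a hypothetical name.

```python
def hru_greedy(sizes, k):
    """Greedy view selection in the style of [HRU96].

    sizes: dict mapping frozenset of dimensions -> estimated row count.
    The view containing every dimension (the top) is assumed to exist and
    is always materialized. Returns up to k extra views, in pick order.
    """
    top = max(sizes, key=len)
    chosen = {top}

    def cost(w, selected):
        # Answer a query at dimensionality w from the smallest selected
        # view that contains all of w's dimensions (the top always does).
        return min(sizes[v] for v in selected if w <= v)

    picks = []
    for _ in range(k):
        best, best_benefit = None, 0
        for v in sizes:
            if v in chosen:
                continue
            # Benefit of v: total saving over every query v could answer.
            benefit = sum(max(0, cost(w, chosen) - sizes[v])
                          for w in sizes if w <= v)
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:  # nothing left that saves anything
            break
        chosen.add(best)
        picks.append(best)
    return picks


def view(*dims):
    return frozenset(dims)


# Hypothetical row-count estimates for each dimensionality.
sizes = {
    view("product", "time", "store"): 1000,
    view("product", "time"): 800,
    view("product", "store"): 500,
    view("time", "store"): 300,
    view("product"): 100,
    view("time"): 50,
    view("store"): 20,
    view(): 1,
}
print([sorted(v) for v in hru_greedy(sizes, 2)])  # [['store', 'time'], ['product']]
```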
  A lattice definition uses a SQL statement to represent the star. SQL is a convenient shorthand for joining several tables together and assigning aliases to column names; it is more convenient than inventing a new language to express relationships, join conditions and cardinalities.
  Unlike in regular SQL, order matters here. If you put A before B in the FROM clause and join A to B, you are declaring a many-to-one foreign-key relationship from A to B. (For example, in the lattice example, the Sales fact table appears before the Time dimension table and before the Product dimension table, and the Product dimension table appears before the ProductClass outer dimension table, further down one arm of the snowflake.)
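  The ordering rule can be sketched as a small function that walks the FROM clause in order and turns each join into a (many, one) edge, rejecting a join to a table that has not appeared yet. The input format, a list of (alias, alias it joins to) pairs, and the aliases s, t, p and pc for Sales, Time, Product and ProductClass are assumptions made for this illustration; Calcite parses the real SQL itself.

```python
def many_to_one_edges(from_clause):
    """Derive many-to-one edges from FROM-clause order.

    from_clause: list of (alias, joined_to) pairs in FROM order, where
    joined_to is the alias of an earlier table named in the join
    condition, or None for the first table (the fact table).
    Returns a list of (many_side, one_side) pairs.
    """
    seen = []
    edges = []
    for alias, joined_to in from_clause:
        if joined_to is None:
            if seen:
                raise ValueError(f"{alias}: only the first table may be unjoined")
        elif joined_to not in seen:
            raise ValueError(f"{alias} joins to {joined_to}, which does not appear earlier")
        else:
            # The earlier table is the "many" side; the new table is the "one" side.
            edges.append((joined_to, alias))
        seen.append(alias)
    return edges


star = [("s", None), ("t", "s"), ("p", "s"), ("pc", "p")]
print(many_to_one_edges(star))  # [('s', 't'), ('s', 'p'), ('p', 'pc')]
```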
  A lattice implies constraints. In a relationship from A to B, there is a foreign key on A (every value of A's foreign key has a corresponding value in B's key) and a unique key on B (no key value occurs more than once). These constraints are very important, because they allow the planner to remove joins to tables whose columns are not used, knowing that the query results will not change.
  Calcite does not check these constraints. If they are violated, Calcite will return wrong results.
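  Because Calcite leaves these constraints unchecked, it can be worth verifying them on the data yourself. The sketch below (plain Python over lists of dicts; `check_constraints` and `count_with_join` are hypothetical helpers, and the rows are toy data) checks both constraints and shows why they matter: when they hold, joining the fact table to the dimension table leaves the row count unchanged, so the join can be dropped; when a key is duplicated or missing, the count changes.

```python
def check_constraints(fact_rows, fk, dim_rows, pk):
    """List violations of the constraints a lattice implies for one join.

    Foreign key: every fact row's fk value must appear as some dim row's pk.
    Unique key: no pk value may occur twice in dim_rows.
    """
    problems = []
    keys = [row[pk] for row in dim_rows]
    if len(keys) != len(set(keys)):
        problems.append(f"{pk} is not unique in the dimension table")
    key_set = set(keys)
    for row in fact_rows:
        if row[fk] not in key_set:
            problems.append(f"{fk}={row[fk]!r} has no matching {pk}")
    return problems


def count_with_join(fact_rows, fk, dim_rows, pk):
    """COUNT(*) over fact_rows INNER JOIN dim_rows ON fk = pk (nested loop)."""
    return sum(1 for f in fact_rows for d in dim_rows if f[fk] == d[pk])


sales = [{"product_id": 1}, {"product_id": 1}, {"product_id": 2}]
products_ok = [{"product_id": 1}, {"product_id": 2}]
products_dup = [{"product_id": 1}, {"product_id": 1}, {"product_id": 2}]
products_missing = [{"product_id": 1}]

# Only the constraint-satisfying dimension keeps COUNT(*) equal to len(sales).
for dim in (products_ok, products_dup, products_missing):
    print(check_constraints(sales, "product_id", dim, "product_id"),
          count_with_join(sales, "product_id", dim, "product_id"), len(sales))
```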
  A lattice is a large, virtual join view. It is not materialized (because of denormalization it could be several times larger than the star schema), and you probably would not want to query it directly (it has far too many columns). So what is it good for?
1) As mentioned above, the lattice declares some very useful primary-key and foreign-key constraints.
2) It helps the query planner map user queries onto filter-join-aggregate materialized views, the most useful kind of materialized view for data-warehouse queries.
3) It gives Calcite a framework within which to gather statistics about data volumes and user queries.
4) It allows Calcite to design and populate materialized views automatically.
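  Point 2 can be illustrated with a roll-up. Assuming a tile has already been materialized with SUM(unit_sales) grouped by (the_year, quarter, product_family), any query that groups and filters only on a subset of those columns can be answered by re-aggregating the tile instead of scanning the fact table. The sketch below is a hypothetical helper with toy rows, not Calcite's planner, and it only handles SUM, which can be re-aggregated by summing partial sums.

```python
from collections import defaultdict


def roll_up(tile_rows, tile_dims, query_dims, measure, where=None):
    """Answer SELECT query_dims, SUM(...) WHERE ... GROUP BY query_dims
    from a tile materialized at dimensionality tile_dims.

    Valid only if every grouped or filtered column is a tile dimension and
    the tile stores a SUM (partial sums add up to the full sum).
    """
    needed = set(query_dims) | set(where or {})
    if not needed <= set(tile_dims):
        raise ValueError("tile cannot answer this query")
    totals = defaultdict(int)
    for row in tile_rows:
        # Equality filters on tile dimensions are applied before re-aggregating.
        if where and any(row[c] != v for c, v in where.items()):
            continue
        totals[tuple(row[d] for d in query_dims)] += row[measure]
    return dict(totals)


tile_dims = ["the_year", "quarter", "product_family"]
tile = [
    {"the_year": 1997, "quarter": "Q1", "product_family": "Food", "sum_unit_sales": 10},
    {"the_year": 1997, "quarter": "Q2", "product_family": "Food", "sum_unit_sales": 20},
    {"the_year": 1997, "quarter": "Q1", "product_family": "Drink", "sum_unit_sales": 5},
]
print(roll_up(tile, tile_dims, ["the_year"], "sum_unit_sales"))  # {(1997,): 35}
```

  An average, by contrast, cannot be rolled up from per-group averages; a tile would need to keep SUM and COUNT separately for that.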
  Most star-schema models force you to decide whether a column is a dimension or a measure. In a lattice, every column is a dimension column (that is, it can become one of the columns in the GROUP BY clause when querying the star schema at a particular dimensionality). Any column can also be used in a measure: you define a measure by giving a column and an aggregate function.
  If "unit_sales" tends to be used far more often as a measure than as a dimension, that is fine. Calcite's algorithm should notice that it is rarely aggregated on, and not be inclined to create tiles that aggregate on it. (Here "should" means "could, and one day will": the current algorithm does not take query history into account when designing tiles.)
  But someone might want to know whether orders with fewer than 5 items were more or less profitable than orders with more than 100. Suddenly "unit_sales" has become a dimension. If declaring a column a dimension column costs virtually nothing, it makes sense to make every column a dimension column.
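  A minimal sketch of the "every column is a dimension" idea: a measure is just an aggregate function name plus an optional column (no column meaning COUNT(*)), and the same `unit_sales` column can be summed as a measure or bucketed and grouped on as a dimension. The `Measure` class, the `aggregate` helper, the bucket boundaries and the order rows are illustrative assumptions, not Calcite code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Aggregate functions a measure may name (a small illustrative subset).
AGGS = {"sum": sum, "count": len, "min": min, "max": max}


@dataclass(frozen=True)
class Measure:
    agg: str                      # aggregate function name, e.g. "sum"
    column: Optional[str] = None  # None means COUNT(*)


def aggregate(rows, group_by, measures):
    """GROUP BY any columns (every column is a dimension), then compute measures."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_by)].append(row)
    result = {}
    for key, members in groups.items():
        values = []
        for m in measures:
            if m.column is None:
                values.append(len(members))
            else:
                values.append(AGGS[m.agg]([r[m.column] for r in members]))
        result[key] = tuple(values)
    return result


orders = [
    {"order_id": 1, "unit_sales": 3, "profit": 4},
    {"order_id": 2, "unit_sales": 150, "profit": 90},
    {"order_id": 3, "unit_sales": 2, "profit": 1},
    {"order_id": 4, "unit_sales": 120, "profit": 50},
]

# unit_sales as a measure: total units over all orders.
print(aggregate(orders, [], [Measure("sum", "unit_sales")]))  # {(): (275,)}


# unit_sales as a dimension: bucket it and group on the bucket.
def size_bucket(n):
    return "<5" if n < 5 else ">100" if n > 100 else "5-100"


by_size = [{**r, "size": size_bucket(r["unit_sales"])} for r in orders]
print(aggregate(by_size, ["size"], [Measure("sum", "profit"), Measure("count")]))
```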
  The model allows a particular table to be used more than once, with a different table alias each time. For example, you could model OrderDate and ShipDate as two uses of the Time dimension table.
  Most SQL systems require the column names in a view to be unique. This is hard to achieve in a lattice, because a join often includes both the primary-key and the foreign-key columns. So Calcite lets you refer to a column in two ways. If the column name is unique, you can use the name alone: ["unit_sales"]. Whether or not it is unique in the lattice, it is unique within its own table, so you can always qualify it with its table alias. For example:

  • ["sales", "unit_sales"]
  • ["ship_date", "time_id"]
  • ["order_date", "time_id"]
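  The two ways of referring to a column can be sketched as a small resolver. Here the lattice's columns are a list of (table alias, column name) pairs, with the Time table used twice under the aliases order_date and ship_date; a one-part reference must match exactly one column, while a two-part reference names the alias explicitly. The `resolve` function and the column list are hypothetical, not Calcite's resolution code.

```python
# Hypothetical lattice columns as (table_alias, column_name) pairs.
# The Time table appears twice, under the aliases order_date and ship_date.
COLUMNS = [
    ("sales", "unit_sales"),
    ("order_date", "time_id"),
    ("order_date", "the_year"),
    ("ship_date", "time_id"),
    ("ship_date", "the_year"),
]


def resolve(ref, columns):
    """Resolve ["column"] or ["alias", "column"] to an (alias, column) pair."""
    if len(ref) == 2:
        # Qualified reference: unique within its table by construction.
        if tuple(ref) not in columns:
            raise KeyError(f"no column {ref!r}")
        return tuple(ref)
    if len(ref) == 1:
        matches = [c for c in columns if c[1] == ref[0]]
        if not matches:
            raise KeyError(f"no column {ref[0]!r}")
        if len(matches) > 1:
            aliases = [alias for alias, _ in matches]
            raise ValueError(f"{ref[0]!r} is ambiguous; qualify it with one of {aliases}")
        return matches[0]
    raise ValueError("a column reference has one or two parts")


print(resolve(["unit_sales"], COLUMNS))            # ('sales', 'unit_sales')
print(resolve(["ship_date", "time_id"], COLUMNS))  # ('ship_date', 'time_id')
```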
  A "tile" is a materialized table in a lattice with a particular dimensionality (similar to a cuboid in Kylin: a cube contains every combination of its dimensions, and each combination is a cuboid). The "tiles" attribute of the lattice JSON element defines an initial set of tiles to materialize.
  If you run the algorithm, you can omit the tiles attribute, and Calcite will choose an initial set. If you do include the tiles attribute, the algorithm starts with that list and then looks for other tiles that complement it (that is, tiles that fill in the gaps left by the initial tiles).
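  As an illustration, a lattice element can be built as a Python dict and serialized to JSON. The attribute names (`name`, `sql`, `auto`, `algorithm`, `tiles`, `dimensions`, `measures`, `agg`, `args`, `defaultMeasures`) are assumed to follow Calcite's JSON model reference and should be checked against the version you use; the FoodMart table and column names are just sample values. With `"algorithm": true`, the `tiles` list could be omitted and Calcite would choose the initial set itself.

```python
import json

# Assumed attribute names from Calcite's JSON model reference; verify them
# against your Calcite version. Table and column names are FoodMart samples.
lattice = {
    "name": "star",
    "sql": [
        'select 1 from "foodmart"."sales_fact_1997" as "s"',
        'join "foodmart"."product" as "p" using ("product_id")',
        'join "foodmart"."time_by_day" as "t" using ("time_id")',
        'join "foodmart"."product_class" as "pc" '
        'on "p"."product_class_id" = "pc"."product_class_id"',
    ],
    "auto": False,
    "algorithm": True,  # let Calcite look for complementary tiles
    "defaultMeasures": [{"agg": "count"}],
    "tiles": [
        {
            # One unqualified and one alias-qualified column reference.
            "dimensions": ["the_year", ["t", "quarter"]],
            "measures": [{"agg": "sum", "args": "unit_sales"}, {"agg": "count"}],
        }
    ],
}

text = json.dumps(lattice, indent=2)
print(text)
```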

Source: blog.51cto.com/1196740/2406997