Mondrian in Action, chapter 1, translated (mostly machine-translated with Baidu Translate, lightly edited)

 

This chapter covers

The complexity of database-based reporting

The advantages of OLAP over reporting tools

Reasons for choosing Mondrian

Business analysis is the process of analyzing historical data to gain insight into business performance. Traditionally, tools for business analysis have been expensive and difficult to maintain. Mondrian, in contrast, is an open source business analysis tool that lets organizations of any size give business users access to interactive data analysis, so they can create analyses and reports without help from IT or a database administrator. Once the data is set up, users can interact with it directly. This book introduces the concepts and technical know-how behind Mondrian, including how to organize data for easy access, how to provide data securely, and how to integrate that data into other applications.

This first chapter introduces some of the problems commonly encountered with report-based analysis. We show you the complexity involved in creating database reports and why they are poorly suited to analysis. Then we demonstrate how Mondrian overcomes these challenges and explain some of the features that make Mondrian an ideal choice. Finally, we give an overview of the rest of the book, in which we describe all aspects of Mondrian in detail and teach you how to use Mondrian effectively for analysis.

1.1 The need for business analysis

In his book Moneyball, Michael Lewis tells the story of how the Oakland A's, a professional baseball team, built a highly competitive team on one of the lowest budgets. Before this, players were evaluated by scouts who watched them and judged by intuition who would develop into a professional player. As the cost of recruiting players soared, so did the cost of signing the wrong person.

Oakland's general manager, Billy Beane, decided the team needed a more analytical approach. He brought in analysts to study the statistics of college players and identify players who were excellent candidates but had been overlooked by scouts for various reasons. Statistics such as on-base percentage and walks per at bat, which had not previously been considered important, became key considerations. This gave Oakland an advantage in recruiting: it could recognize players that other teams did not value and sign them at lower prices.

Like the Oakland A's, today's businesses need to optimize spending to maximize return on investment. Controlling aspects of the business such as inventory costs, waste, returns, and excess machinery or labor is no longer optional; it is required to survive in a highly competitive, market-driven economy. Companies need good tools and processes to do this. The A's wrote much of their own software, but that approach is often expensive, slow, and risky. With Mondrian, any organization can use world-class analysis tools that can be up and running quickly with minimal cost and risk.

Historically, business analysis and management were done with spreadsheets, operational databases, and reports. These methods work for viewing data in a predefined format, but they are far less effective for exploring data and discovering new information, because reports are often difficult and time-consuming to create and manipulate. Online analytical processing (OLAP) is a technology that gives business data enough structure that business users can easily explore it and discover important relationships, without having to understand database query languages or how the company's operational databases are organized.

Here are some of the things companies can discover with OLAP tools, and how those discoveries help their business:

• A company discovers that a particular product is in high demand in summer but in low demand the rest of the year. It can now adjust inventory seasonally to avoid excess storage costs.

• A company discovers how demand for its services changes as it advertises in various publications. It can now coordinate advertising and staffing when it runs new ad campaigns, meeting demand without overspending.

• A website discovers that the sex and age of its visitors vary by date and time of day. This lets the site customize its content by date and time to reflect the different demographics.

• A website works out when demand peaks and how high the peak is. The company can now make informed decisions about how to scale for the peak without adding too much static capacity, while still meeting typical demand.

Understanding a company's data requires tools that let users organize and explore the data to find interesting facts. Mondrian is the engine behind such tools.

Mondrian is an open source OLAP engine that gives users access to data in an intuitive way. As an engine, Mondrian can run in a web container (such as JBoss or Tomcat) or be embedded as part of an application. Mondrian needs only some optional configuration, a schema that defines the logical structure of the data, and a database populated with data. Mondrian works with most databases that support Java Database Connectivity (JDBC).

Figure 1.1 shows how Mondrian supports analysis in a typical deployment. Mondrian sits between the data and the analysis tools and dashboards, and presents a logical description of the data to them. Users explore the data graphically in terms of its attributes rather than through complex queries. Mondrian dynamically translates these logical queries into the query format of the underlying database and returns the data in the form the tools expect.

Figure 1.1 Mondrian is an analysis engine for business data applications.
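To make the translation step more concrete, here is a minimal Python sketch of how an engine might turn a logical request (a list of dimensions plus a measure) into SQL. The `build_sql` helper and the `orders` table with its `country`, `city`, and `qty` columns are hypothetical names chosen for illustration. This is not Mondrian's actual SQL generator, and SQLite is used only so the example runs on its own.

```python
import sqlite3


def build_sql(table, dimensions, measure):
    """Turn a logical request into a GROUP BY query (illustrative only)."""
    dims = ", ".join(dimensions)
    return (f"SELECT {dims}, SUM({measure}) AS total "
            f"FROM {table} GROUP BY {dims} ORDER BY {dims}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (country TEXT, city TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("US", "Seattle", 5), ("US", "Seattle", 3),
     ("US", "Austin", 2), ("GB", "London", 4)],
)

# The user only names attributes; the SQL is derived from them.
sql = build_sql("orders", ["country", "city"], "qty")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Changing the list of dimensions changes the generated query, which is the point: the user picks attributes and never writes the SQL.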

1.2 Replace static reports with online analytical processing (OLAP)

Businesses run on data, and that data is usually presented to users in the form of reports. Traditional reports are static and often long, with the important data buried among large amounts of less important data. Users often cannot drill into what they see to understand the underlying details or the related data.

Modern online reports have overcome many of these challenges by letting users narrow the data with filters and follow links to other reports. But these reports still lack the flexibility needed for real analysis, as evidenced by the large number of users who export reports to Excel so they can manipulate the data further. This section describes an example company struggling under the weight of its reports and the problems that typically arise with such systems. In the next section, we describe how modern analysis tools overcome these problems and give users the information they need to make decisions.

Adventure Works is a company that sells bicycle parts and equipment. Its business analyst is responsible for providing reports that help business users manage the business and maximize profits. He spends a great deal of time researching data gathered from the company's various business transactions.

Have you seen Adventure Works elsewhere? If you have used other analysis systems, particularly Microsoft Analysis Services (MSAS), you may have encountered the Adventure Works database before. MSAS has been a leader in business analysis, and Microsoft has led the way in technologies and standards, especially Multidimensional Expressions (MDX). Mondrian strives to comply with these standards, so we felt it made sense to use the same sample data that Microsoft provides. Note that we built our data warehouse on top of the original Adventure Works database, so it may differ from Microsoft's.

The analyst works closely with the database administrator to understand the structure of the database so that he can gather data for reports. When a new report is requested, he either writes the database query himself or works with a more experienced database expert. He then builds a report from the data. If business users like the report, it is put into production, but usually they want small changes and the analyst has to rewrite it. Getting a report right can take days, and users often need different reports.

Users also ask the analyst for variations of reports so that they can see data at different levels and compare different types of data. This means the analyst has to duplicate reports that contain essentially the same data at different levels of detail. Users also want to restrict the data and to click through to more detailed information.

Figure 1.2 shows part of a report that lets senior managers view total orders by city. It tells managers which countries and cities have the largest orders. Listing 1.1 shows the query that generates the report. Note that writing it requires knowing which tables hold the data, how to join six tables, and SQL syntax.

Figure 1.2 Orders by city

Listing 1.1 Query for orders by city

SELECT
    `salesorderdetail`.`OrderQty`,
    `salesorderdetail`.`UnitPrice`,
    `stateprovince`.`Name`,
    `stateprovince`.`CountryRegionCode`,
    `address`.`City`
FROM
    `salesorderheader`
    INNER JOIN `salesorderdetail`
        ON `salesorderheader`.`SalesOrderID` = `salesorderdetail`.`SalesOrderID`
    INNER JOIN `customer`
        ON `salesorderheader`.`CustomerID` = `customer`.`CustomerID`
    INNER JOIN `customeraddress`
        ON `customer`.`CustomerID` = `customeraddress`.`CustomerID`
    INNER JOIN `address`
        ON `customeraddress`.`AddressID` = `address`.`AddressID`
    INNER JOIN `stateprovince`
        ON `address`.`StateProvinceID` = `stateprovince`.`StateProvinceID`
-- Selecting non-aggregated columns with this GROUP BY relies on MySQL
-- running with ONLY_FULL_GROUP_BY disabled.
GROUP BY
    `address`.`City`
ORDER BY
    `stateprovince`.`CountryRegionCode` ASC,
    `stateprovince`.`Name` ASC,
    `address`.`City` ASC
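The join-and-aggregate pattern behind Listing 1.1 can be tried on a toy database. The sketch below is an assumption-laden simplification: it uses three tables instead of six, links `salesorderheader` directly to `address` through a hypothetical `AddressID` column, and invents a handful of rows. It also sums quantities and values explicitly, which is one way to get per-city totals; it is not the report's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (AddressID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE salesorderheader (SalesOrderID INTEGER PRIMARY KEY, AddressID INTEGER);
CREATE TABLE salesorderdetail (SalesOrderID INTEGER, OrderQty INTEGER, UnitPrice REAL);
INSERT INTO address VALUES (1, 'Seattle'), (2, 'London');
INSERT INTO salesorderheader VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO salesorderdetail VALUES (10, 2, 5.0), (11, 1, 25.0), (12, 3, 10.0);
""")

# Even this reduced report needs two joins and knowledge of every key column.
query = """
SELECT a.City,
       SUM(d.OrderQty) AS TotalQty,
       SUM(d.OrderQty * d.UnitPrice) AS TotalValue
FROM salesorderheader h
INNER JOIN salesorderdetail d ON h.SalesOrderID = d.SalesOrderID
INNER JOIN address a ON h.AddressID = a.AddressID
GROUP BY a.City
ORDER BY a.City
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Any change in grouping, such as totals by state instead of by city, means editing the query by hand, which is the maintenance burden the chapter describes.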

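Figure 1.3 shows part of a more detailed management report at the country and state level, listing the large customers in each state or province. Listing 1.2 shows the modified database query for this report. Again, the analyst must understand the detailed structure of the database to get the data. Any change to the report requires a new query and a new report.

Figure 1.3 Orders by customer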

Listing 1.2 Query for orders by customer
SELECT
    `address`.`City`,
    `contact`.`FirstName`,
    `contact`.`LastName`,
    `salesorderdetail`.`OrderQty`,
    `salesorderdetail`.`UnitPrice`,
    `customeraddress`.`CustomerID`,
    `customer`.`TerritoryID`,
    `stateprovince`.`Name`,
    `stateprovince`.`CountryRegionCode`
FROM
    `address`
    INNER JOIN `customeraddress`
        ON `address`.`AddressID` = `customeraddress`.`AddressID`
    -- ... (the remaining joins are truncated in the source listing)
GROUP BY
    `address`.`City`
ORDER BY
    `stateprovince`.`CountryRegionCode` ASC,
    `stateprovince`.`Name` ASC,
    `address`.`City` ASC

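Recently, requests for new reports and changes have become overwhelming. The Adventure Works analyst cannot keep up and is working long hours. Frustrated business users have started getting data dumps from IT and analyzing them in Excel, but the data is never current and is hard to view from multiple perspectives. In addition to report requests, the analyst now gets calls asking for help manipulating data in Excel.

After he finished one particularly complex report, whose multi-table join query ran overnight, the analyst came in to find an angry database administrator waiting for him. Apparently the report had slowed down the operational database and delayed shipments to customers.

Senior managers are pleased with their reports and want to share them with regional and store managers, but they want those managers to see only the data that applies to them. They ask for a customized report for each manager. Figure 1.4 shows the report for the United States regional manager. It looks as though creating all of these reports will take a long time.

Figure 1.4 Orders by city for the United States

With the large number of reports and the growing number of users, the system has become sluggish and reports take a long time to render. This frustrates business users, who spend more time waiting for reports than analyzing data.

If the analyst is to stay sane and his business users are to stay happy, there has to be a better way to do analysis. Fortunately, he keeps up with the latest analysis technology and realizes that Mondrian, an open source OLAP tool, can get him out of this crisis. It will let business users do their own analysis quickly and securely, which helps not only his career but also the bottom line.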

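1.3 Enter OLAP

Adventure Works wants a solution that lets users perform their own analysis without waiting for reports to be created or having to consult a database administrator. They also need a low-cost solution with minimal upfront risk. Finally, whatever they choose must be fast, so that users can do analysis in minutes rather than days.

Many OLAP tools are available, but they decided to use Mondrian for the following reasons: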

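• Mondrian supports user-driven analysis. Users can do their own analysis without extensive help from administrators or report writers.

• Mondrian is a low-cost, low-risk choice. Mondrian is open source and free to download. It is also bundled with many analysis tools and suites, which makes it easy to install and start using.

• Mondrian is fast. It has a variety of optimization techniques that let users perform analysis at the speed of thought with interactive tools.

• Mondrian has built-in security features, which make it ideal for organizations with sensitive data.

• Mondrian is based on open standards. It runs on a wide variety of application servers and works with most major databases. This means Mondrian won't lock you into a proprietary solution.

The rest of this section describes some of Mondrian's benefits in more detail, along with how it solves problems for organizations like Adventure Works.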

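1.3.1 Mondrian enables user-driven analysis

Mondrian solves many of the problems associated with report-based analysis by eliminating the need for database administrators and query writers to be involved in extracting data. In later chapters, we show you how to organize data and make it easily available to analysts. Once that is done, users can access the data with graphical tools. They no longer need to understand the intricacies of the data and can spend their time on analysis and discovery that improve the business.

In Mondrian, data is organized by attributes such as location and time, so you can ask questions about, for example, North America in 2011. In OLAP terminology, these data attributes are called dimensions. Several user interfaces provide drag-and-drop views of the data by these dimensions. You don't need to know any query language.

Why does this book rely so heavily on examples that use Pentaho? Because Pentaho is the primary sponsor of Mondrian and embeds it in its business analytics server. Mondrian is used by many other systems, but Pentaho is the most common.

Figure 1.5 shows the Pentaho Analyzer view, which lets business intelligence (BI) users drag objects onto a canvas. From there, they can perform analysis without understanding the structure of the database or using a query language.

Within a dimension, data can be viewed by level, such as sales by city, country, or region. This lets you view data at the level you care about, so that a country manager can view country-level data and a regional manager can view region-level data.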
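The idea of levels within a dimension can be sketched in a few lines of Python. The fact rows, the `LEVELS` tuple, and the `roll_up` helper below are hypothetical and only illustrate how one set of facts can be totaled at whichever level of a location dimension the user picks; an OLAP engine does this for you.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, state, city, order_total)
FACTS = [
    ("US", "WA", "Seattle", 120.0),
    ("US", "WA", "Tacoma", 30.0),
    ("US", "TX", "Austin", 50.0),
    ("GB", "England", "London", 80.0),
]

# Levels of the location dimension, from coarsest to finest.
LEVELS = ("country", "state", "city")


def roll_up(facts, level):
    """Total the order amounts at the requested level of the dimension."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for row in facts:
        totals[row[:depth]] += row[-1]
    return dict(totals)


by_country = roll_up(FACTS, "country")
by_state = roll_up(FACTS, "state")
print(by_country)
print(by_state)
```

The same rows answer both the country manager's and the state manager's question; only the level changes.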

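Figure 1.6 shows orders at the state level, created by dragging the Country, State, Quantity Ordered, Price Each, Total, and Year fields onto the canvas. The data updates automatically as each field is placed on the canvas.

Figure 1.7 shows the same analysis at a finer level of detail. In this case, the user dragged the additional City and Customer fields onto the report. This version gives you more detailed information in seconds or minutes, without creating additional queries or physical reports.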

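You can easily restrict data with filters that display data only when it matches certain rules, such as values or string text. Mondrian supports filters on all dimensions and values, as well as special filters such as top 10 and string pattern matching. This lets you tailor the analysis to your needs rather than having to read long reports full of extra data.

Figure 1.5

Figure 1.6 State-level orders

Figure 1.7 Customer-level orders

Figure 1.8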

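Figure 1.8 shows a user filtering a report to include only the United Kingdom and the United States and only the year 2004. Analysts can focus on just the relevant information, without needing a query writer to create a separate report for each user.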
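The three kinds of filter mentioned above (by member, top N, and string pattern) can be illustrated with plain Python. The order rows and helper names below are hypothetical; they only show what each kind of filter does to a result set and say nothing about how Mondrian implements filtering.

```python
from fnmatch import fnmatchcase

# Hypothetical order rows: (country, year, customer, total)
ORDERS = [
    ("United States", 2004, "Bike World", 500.0),
    ("United Kingdom", 2004, "Cycle Hub", 300.0),
    ("France", 2004, "Velo Plus", 400.0),
    ("United States", 2003, "Bike Depot", 250.0),
    ("United States", 2004, "Trail Gear", 150.0),
]


def filter_members(rows, countries, year):
    """Keep only rows for the chosen countries and year."""
    return [r for r in rows if r[0] in countries and r[1] == year]


def top_n(rows, n):
    """Keep the n rows with the largest totals."""
    return sorted(rows, key=lambda r: r[3], reverse=True)[:n]


def customers_matching(rows, pattern):
    """Keep rows whose customer name matches a shell-style pattern."""
    return [r for r in rows if fnmatchcase(r[2], pattern)]


selected = filter_members(ORDERS, {"United Kingdom", "United States"}, 2004)
largest = top_n(selected, 2)
bike_shops = customers_matching(ORDERS, "Bike*")
print(selected)
print(largest)
print(bike_shops)
```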

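1.3.2 Mondrian is a low-cost, low-risk solution

Mondrian is an open source project that anyone can download and build. There are no license fees or other costs for using the tool, which makes Mondrian a low-risk choice for analysis. Because Mondrian is an engine, you also need a server to host it. Fortunately, Mondrian runs on a variety of servers, including in standalone mode and inside popular business analytics servers. The most popular of these servers is Pentaho, an open source business analytics suite with a community edition that you can use for free. Mondrian is embedded in the server and acts as the engine behind the drag-and-drop tools that let users do analysis easily.

Figure 1.1 showed Mondrian's role as an analysis engine. Figure 1.9 shows how Mondrian handles an analysis request from a business user.

Figure 1.9 Executing an analysis query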

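1      The business user decides to query some data using one of many different front ends (usually a thin-client interface such as Pentaho Analyzer).

2      The interface creates a Multidimensional Expressions (MDX) query, using either a web service call or a direct API call. MDX is a standardized, general-purpose query language for analysis that is supported by most analysis engines. Its advantage is that it simplifies calls to the database while remaining very powerful. It is also a common dialect that can be used regardless of which database stores the data. Some user interfaces let users enter or modify queries directly in MDX, so users familiar with MDX syntax can run more complex queries and use the many functions MDX provides.

3      Mondrian uses a logical schema that organizes the data into cubes of dimensions (attributes of the data, such as date and location) and measures (the actual data facts, such as cost and inventory levels). The schema also provides features for performance optimization and security. Mondrian uses this schema to retrieve data either from its in-memory cache or by generating optimized database calls. Mondrian automatically creates the correct SQL for a large number of databases.

4      Mondrian generates a SQL query based on the metadata description and issues the database request.

5      The database returns the result set to Mondrian.

6      Mondrian returns the data to the user interface through a standard API that the visualization tool understands.

7      Finally, the data is formatted for the user in tabular or graphical form that is easy to understand and manipulate.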
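The steps above can be sketched in Python as follows. Everything here is a simplified stand-in: the cube name `Orders`, the `[Location].[Country]` level, and the `TinyEngine` class are hypothetical, the database is faked with a function, and the cache is keyed on the whole query string, which is much cruder than what a real OLAP engine does. The MDX string itself uses the standard SELECT ... ON COLUMNS ... ON ROWS ... FROM form.

```python
def build_mdx(cube, measures, row_level):
    """Step 2: build an MDX query string from the fields a user selected."""
    columns = ", ".join(f"[Measures].[{name}]" for name in measures)
    return (f"SELECT {{{columns}}} ON COLUMNS, "
            f"{{{row_level}.Members}} ON ROWS FROM [{cube}]")


class TinyEngine:
    """Steps 3 to 6: answer from an in-memory cache or fall back to the database."""

    def __init__(self, run_query):
        self._run_query = run_query  # stand-in for SQL generation plus the database call
        self._cache = {}
        self.database_calls = 0

    def execute(self, mdx):
        if mdx not in self._cache:
            self.database_calls += 1
            self._cache[mdx] = self._run_query(mdx)
        return self._cache[mdx]


def fake_database(mdx):
    # Hypothetical result set: (country, quantity ordered)
    return [("United Kingdom", 40), ("United States", 100)]


mdx = build_mdx("Orders", ["Quantity Ordered"], "[Location].[Country]")
engine = TinyEngine(fake_database)
first = engine.execute(mdx)
second = engine.execute(mdx)  # served from the cache, no second database call
print(mdx)
print(engine.database_calls)
```

Running the same query twice reaches the fake database only once, which is the behavior step 3 describes when data is already in memory.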

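The whole process usually takes only a few seconds, so business users can explore many different alternatives in a single analysis session. In addition, if you use Mondrian as part of the Pentaho BI suite, you can use Mondrian as a data source for Pentaho reports and enterprise dashboards, as well as through thin-client front ends such as Analyzer. This makes Mondrian a very flexible engine for a variety of user-friendly interfaces, while still providing a standard data interface for developers.

MONDRIAN MDX Although Mondrian strives to be compliant with Microsoft's version of MDX, there are some minor differences. For the latest list of differences, see the Mondrian website: http://mondrian.pentaho.com/documentation/mdx.php.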

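1.3.3 Mondrian is fast

Mondrian is designed to be fast. OLAP databases are structured for performance when running calculations over large datasets, so changes to an analysis appear in seconds. In addition, Mondrian uses several optimization techniques, such as in-memory storage of calculations, to increase speed further. And because Mondrian can be embedded in web applications, it can easily be scaled to serve hundreds or thousands of users.

Although the performance gains vary widely depending on the structure of the data warehouse, aggregate tables and in-memory caching can improve performance dramatically. For example, one user had a fact table with hundreds of millions of rows and eight dimension tables with up to 25 million rows. Running reports directly in SQL took about 10 minutes each. Adding Mondrian and aggregations brought the time down to a little over 8 seconds. Adding caching brought these queries down to an average of 2.4 seconds. Figure 1.10 illustrates the dramatic gains that are possible with Mondrian (in this case, more than 100 times faster with Mondrian than without it).

Figure 1.10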
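As a rough illustration of why aggregate tables help, the sketch below precomputes a summary table in SQLite and shows that it returns the same totals as scanning the fact table, from far fewer rows. The `sales_fact` and `agg_city_year` tables, their columns, and the generated data are hypothetical. The last lines simply work out the speedups implied by the timings quoted above (10 minutes, roughly 8 seconds, and 2.4 seconds); since the text says "a little over 8 seconds", the first ratio is approximate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (city TEXT, year INTEGER, amount REAL)")

cities = ("London", "Seattle")
years = (2003, 2004)
fact_rows = [(cities[i % 2], years[(i // 2) % 2], float(i % 5)) for i in range(4000)]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", fact_rows)

# Precompute the summary once, the way an aggregate table would be built.
conn.execute(
    "CREATE TABLE agg_city_year AS "
    "SELECT city, year, SUM(amount) AS amount, COUNT(*) AS fact_count "
    "FROM sales_fact GROUP BY city, year"
)

from_facts = conn.execute(
    "SELECT city, year, SUM(amount) FROM sales_fact "
    "GROUP BY city, year ORDER BY city, year"
).fetchall()
from_aggregate = conn.execute(
    "SELECT city, year, amount FROM agg_city_year ORDER BY city, year"
).fetchall()

# Speedups implied by the timings quoted in the text.
baseline_seconds = 10 * 60
speedup_with_aggregates = baseline_seconds / 8
speedup_with_cache = baseline_seconds / 2.4
print(from_aggregate)
print(speedup_with_aggregates, speedup_with_cache)
```

The aggregate table here holds 4 rows instead of 4,000, so a query at the city and year level touches a small fraction of the data.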

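Because Mondrian is fast, it lets analysts perform advanced analysis that would be very difficult or slow in SQL. For example, Mondrian's functions let analysts perform linear regression or compare performance across periods. It performs the calculations automatically at the required level, without additional queries or programs. This makes Mondrian an ideal solution for advanced analysis.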
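To show what these two calculations involve, here is a minimal pure-Python sketch of a least-squares trend line and a period-over-period comparison. The quarterly figures and function names are hypothetical, and this is not how Mondrian computes them; in Mondrian the equivalent work would be expressed through MDX functions rather than hand-written code.

```python
def linear_trend(values):
    """Fit y = slope * x + intercept by least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def period_over_period(values):
    """Fractional change from each period to the next."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


# Hypothetical quarterly sales totals.
quarterly_sales = [100.0, 110.0, 120.0, 130.0]
slope, intercept = linear_trend(quarterly_sales)
growth = period_over_period(quarterly_sales)
print(slope, intercept)
print(growth)
```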

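Fast results are essential for interactive analysis. Speed lets analysts explore data in many ways and discover information about the business, such as high-selling product lines, warehouse inventory problems, and which web marketing strategies are working. In later chapters, we show you how to organize data for the best performance. We also show you how to configure Mondrian to use aggregate tables and caching to improve performance further.

1.3.4 Mondrian is secure

Beyond performance, businesses have other considerations when handling company data, such as restricting access to specific users and supporting multi-tenant environments that serve multiple clients. Mondrian uses a role-based approach to restrict data access.

Role-based security means restricting data based on the roles associated with a user. For example, a human resources manager can access sensitive information about employees that should not be shared with other staff. A finance manager needs to know costs, but an inventory manager only needs to know inventory levels. By assigning distinct roles to these different types of users, Mondrian can have a single analytic database yet show each user only the data they need. Analysis and reporting tools can retrieve only the data appropriate for a particular user, so you don't need separate reports for different roles; you just restrict the data.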
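The idea of one shared dataset with role-specific views can be sketched as follows. The role names, the rule format, and the `view_for` helper are hypothetical and exist only to illustrate the concept; Mondrian's own role definitions are configured differently and are covered in the security chapter.

```python
# Hypothetical roles mapped to the measures and countries each may see.
# A value of None for "countries" means no country restriction.
ROLES = {
    "finance_manager": {"measures": {"cost", "inventory"}, "countries": None},
    "inventory_manager": {"measures": {"inventory"}, "countries": None},
    "us_manager": {"measures": {"cost", "inventory"}, "countries": {"US"}},
}

# Hypothetical fact rows shared by every role.
FACTS = [
    {"country": "US", "cost": 120.0, "inventory": 40},
    {"country": "GB", "cost": 80.0, "inventory": 25},
]


def view_for(role, facts):
    """Return only the rows and measures the given role is allowed to see."""
    rule = ROLES[role]
    visible = []
    for row in facts:
        if rule["countries"] is not None and row["country"] not in rule["countries"]:
            continue
        kept = {"country": row["country"]}
        kept.update({m: row[m] for m in sorted(rule["measures"])})
        visible.append(kept)
    return visible


print(view_for("inventory_manager", FACTS))
print(view_for("us_manager", FACTS))
```

Both calls read the same `FACTS` list, so there is one dataset and no per-role copy of the report.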

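In the chapter on security, we show you how to apply roles to restrict access to sensitive data. We also show you some advanced approaches for making roles dynamic and for separating data for multiple clients, protecting each client's data in a multi-tenant environment. The latter capability is useful for organizations that want to offer analysis to external customers rather than use it only internally.

1.3.5 Mondrian is based on open standards

Because Mondrian is based on open technology standards such as Java and web services, it can run on a wide variety of platforms and be included in both desktop and thin clients. This makes it easy to distribute the benefits of Mondrian and OLAP to users around the world. It also means Mondrian users are not tied to any particular hardware, operating system, or proprietary software.

Mondrian uses a variety of open, freely available standards. In particular, Mondrian supports the following standards: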

 


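• olap4j: a standard open API for OLAP

• XMLA (XML for Analysis): a SOAP-based standard for system-to-system interaction

• XML: a standard markup language, which lets you create Mondrian schemas with a simple text editor

Because it supports olap4j and XMLA, Mondrian is easy to embed and can be used to deliver many kinds of solutions, such as interactive analysis, reports, and dashboards.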
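Because a schema is plain XML, it can be written in any text editor and inspected with any XML library. The sketch below parses a small schema with Python's standard `xml.etree.ElementTree` module. The element and attribute names are written in the style of a Mondrian schema file, but the cube, table, and column names are hypothetical, and the details should be treated as assumptions to check against the Mondrian schema documentation covered later in the book.

```python
import xml.etree.ElementTree as ET

# A hypothetical schema in the style of a Mondrian schema file.
SCHEMA_XML = """
<Schema name="AdventureWorksSketch">
  <Cube name="Orders">
    <Table name="orders_fact"/>
    <Dimension name="Location" foreignKey="location_id">
      <Hierarchy hasAll="true" primaryKey="location_id">
        <Table name="location_dim"/>
        <Level name="Country" column="country"/>
        <Level name="State" column="state"/>
        <Level name="City" column="city"/>
      </Hierarchy>
    </Dimension>
    <Measure name="Quantity Ordered" column="order_qty" aggregator="sum"/>
  </Cube>
</Schema>
"""

root = ET.fromstring(SCHEMA_XML.strip())
cubes = [cube.get("name") for cube in root.iter("Cube")]
levels = [level.get("name") for level in root.iter("Level")]
measures = [measure.get("name") for measure in root.iter("Measure")]
print(cubes, levels, measures)
```

The dimension's levels (Country, State, City) and the measure (Quantity Ordered) correspond to the fields a user would drag onto the canvas in the earlier figures.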

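Mondrian works with most databases, including traditional relational databases such as Microsoft SQL Server, Oracle, PostgreSQL, and MySQL, as well as newer columnar databases such as Greenplum, Netezza, and LucidDB. This means that although businesses usually want to organize their data in a way that gets the most benefit from Mondrian, they usually won't need a new database solution to do so. Database administrators can also keep using the systems and tools they know.

Finally, Mondrian is open source software. You can download not only the binaries but also the source code, which lets you customize and extend Mondrian as needed. Open source enables a community of users and developers to help one another and contribute ideas back to the project. The community participates in local user groups, online forums, and conferences.

For businesses that need professional support and additional enterprise features, Mondrian is also available as part of Pentaho Enterprise Edition, a complete business analytics platform that includes data warehousing, reporting, and data mining tools.

In this book, we show you how to configure and use a number of tools with Mondrian. We also show you how to use Mondrian as a source of analytic information for reports and dashboards. Finally, we show you how to integrate Mondrian into your own applications, either directly or through web services.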

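1.4 Summary

This chapter introduced business analysis with Mondrian. It covered the problems with report-based analysis and showed how Mondrian addresses them, and how Mondrian fits into an analysis architecture as the analysis engine. Specifically, you saw how Mondrian provides the following:

• User-driven analysis, in which users are free to explore the data

• Improved performance through data warehouse structure, aggregation, and caching

• Enterprise features, such as role-based data access to restrict data to different users and groups

You should now have a good idea of what Mondrian can do and the problems it can help solve. You should also understand where Mondrian fits in the overall architecture of a BI solution. Finally, depending on your role, you should know which parts of the rest of the book are most relevant to your needs.

The next chapter gives a brief tour of Mondrian, showing how Mondrian delivers data to users and how data is structured and modeled to support analysis. You will have a chance to run the system and perform analysis using Pentaho and Saiku.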


Origin blog.csdn.net/wh_xia_jun/article/details/90198990