An introduction to four open source BI tools: SpagoBI, openI, JasperSoft and Pentaho

1 Brief description of a BI system

From a technical point of view, BI covers ETL, data warehousing (DW), OLAP, data mining (DM) and other stages. Simply put, ETL tools extract the data produced by transaction systems into a subject-oriented data warehouse. OLAP analysis then produces cubes or reports, which are presented to users through a portal. Users rely on these to classify, aggregate, describe and visualize the data in support of business decisions.
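The flow described above can be sketched in a few lines. This is only a minimal illustration, assuming an in-memory SQLite database as a stand-in for the warehouse; the `transactions` records, the `sales_fact` table and all function names are hypothetical and do not come from any of the tools discussed here.

```python
import sqlite3

# Hypothetical transaction records, standing in for rows from an operational system.
transactions = [
    {"order_id": 1, "region": "north", "product": "widget", "amount": "120.50"},
    {"order_id": 2, "region": "south", "product": "widget", "amount": "80.00"},
    {"order_id": 3, "region": "north", "product": "gadget", "amount": "42.25"},
    {"order_id": 4, "region": "north", "product": "widget", "amount": "n/a"},  # bad row
]

def extract(rows):
    """Extract: yield raw records from the source."""
    yield from rows

def transform(rows):
    """Transform: convert amounts to numbers and drop rows that fail to parse."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        yield (row["order_id"], row["region"], row["product"], amount)

def load(conn, rows):
    """Load: write the cleaned rows into a subject-oriented fact table."""
    conn.execute(
        "CREATE TABLE sales_fact (order_id INTEGER, region TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)

def sales_by_region(conn):
    """OLAP-style aggregation: total sales per region."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(transactions)))
report = sales_by_region(conn)
for region, total in report:
    print(f"{region}: {total:.2f}")
```

In a real deployment each stage would be a separate tool (an ETL tool, a warehouse database, an OLAP server and a portal), but the hand-off between stages follows the same shape.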

The many open source BI projects can be roughly divided into three types by the scale and completeness of their BI support: Framework, Stand-alone Tools and BI Suite.

  • Framework

Open source frameworks have no counterpart among commercial BI systems. We can use them to build our own BI tools, or to enhance and extend our BI solutions.

  • Stand-alone Tools

Stand-alone BI tools are the most numerous category of open source projects. Many of them focus on a single stage or aspect of a BI system, such as ETL, reporting, OLAP or the database.

  • BI Suite

A collection of tools that provides a range of BI features under a unified architecture. At present no suite, commercial or open source, provides a complete end-to-end BI solution. The open source BI suites are assembled by connecting multiple other components and tools. Because a BI system involves so many tools, integrating them into a complete BI solution is difficult.

 

2 Tools in a BI solution

A complete BI solution uses several tools, each handling the work of a particular stage of the BI system.

2.1 ETL tools

ETL tools extract, transform and load data. A good ETL tool should have the following characteristics:

  1. Workflow Management, Job Execution and Scheduling Manager. Processes are easy to define and ETL tasks can be automated;
  2. Centralized Metadata Repository and Management. Industry-standard metadata is stored and managed centrally;
  3. Data Profile and Validation. The quality of the data can be checked;
  4. High Performance. It still performs well when executing heavy workloads;
  5. Scalable, Platform Independent. It is flexible, supports a variety of operating systems and database systems, and can work with a variety of heterogeneous data sources;
  6. Open Architecture and API. It has an open architecture and an easy-to-use interface for custom development.
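To make the "Data Profile and Validation" characteristic concrete, here is a minimal sketch of what such a check might look like. The sample rows, the column names and the three rules (unique id, required country, age between 0 and 120) are all assumptions made for illustration; real ETL tools expose this through their own designers and APIs.

```python
from collections import Counter

# Hypothetical source rows; None marks a missing value.
rows = [
    {"customer_id": 1, "country": "IT", "age": 34},
    {"customer_id": 2, "country": "US", "age": None},
    {"customer_id": 3, "country": "IT", "age": 151},
    {"customer_id": 3, "country": None, "age": 28},
]

def profile(rows, column):
    """Profile one column: row count, missing values and distinct values."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }

def validate(rows):
    """Return (row_index, problem) pairs for every rule a row breaks."""
    problems = []
    id_counts = Counter(row["customer_id"] for row in rows)
    for index, row in enumerate(rows):
        if id_counts[row["customer_id"]] > 1:
            problems.append((index, "duplicate customer_id"))
        if row["country"] is None:
            problems.append((index, "missing country"))
        if row["age"] is not None and not 0 <= row["age"] <= 120:
            problems.append((index, "age out of range"))
    return problems

age_profile = profile(rows, "age")
problems = validate(rows)
print(age_profile)
for index, problem in problems:
    print(f"row {index}: {problem}")
```

Profiling summarizes what the data looks like, while validation flags the individual rows that should be fixed or rejected before loading.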

The better-known open source ETL tools at present are:

  1. KETL, developed by Kinetic Networks, whose background includes IBM and KPMG. It has more than three years of production history, has been used successfully in several products, and performs well in clickstream analysis. KETL uses a plug-in architecture and is written in Java;
  2. KETTLE, a metadata-driven ETL tool that has joined Pentaho;
  3. Clover ETL, a Java-based ETL framework that can be used to build your own ETL applications;
  4. Enhydra Octopus, a Java-based ETL tool that connects to various data sources through JDBC and is easy to use and deploy. It has been used in a telecommunications network resource analysis system.

2.2 Reporting tools

A good reporting tool usually has the following characteristics:

  1. Support for multiple data sources;
  2. An intuitive visual designer and easy-to-use report customization;
  3. Convenient data access and formatting, with rich ways of presenting data;
  4. Compliance with common data presentation standards and good integration with applications;
  5. Ease of scaling and deployment.
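One way to picture "rich ways of presenting data" is a single result set rendered through interchangeable output formats. The sketch below is only an illustration using the Python standard library; the `RENDERERS` table, the function names and the sample data are hypothetical and unrelated to the API of any reporting tool mentioned in this article.

```python
import csv
import html
import io

# Hypothetical report data: column names plus data rows.
columns = ["region", "total"]
rows = [("north", 162.75), ("south", 80.0), ("R&D <lab>", 12.5)]

def render_csv(columns, rows):
    """Render the report as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()

def render_html(columns, rows):
    """Render the report as an HTML table, escaping every cell value."""
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"

# One entry per output format; adding a format means adding one function here.
RENDERERS = {"csv": render_csv, "html": render_html}

def render(fmt, columns, rows):
    """Dispatch to the renderer registered for the requested format."""
    return RENDERERS[fmt](columns, rows)

print(render("csv", columns, rows))
print(render("html", columns, rows))
```

Keeping the data separate from the renderers is what lets a reporting engine offer PDF, HTML, XLS or CSV output from the same report definition.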

The better-known open source reporting tools at present are:

  1. JasperReports, an excellent Java reporting tool started in 2001 and now developed and supported by JasperSoft. It is similar to the commercial product Crystal Reports, supports PDF, HTML, XLS, CSV and XML output, and is now the reporting tool most commonly used by Java developers;
  2. OpenReports, a flexible web-based reporting solution that generates dynamic PDF, XLS, HTML, CSV and chart reports through the browser. It is written in Java, uses JasperReports as its reporting engine, and builds on open source technologies such as Hibernate, Velocity and WebWork;
  3. JFreeReport, now part of Pentaho, an excellent Java class library for generating reports. It gives Java applications flexible printing and can output to printers and to PDF, Excel, HTML, XHTML, plain text, XML and CSV files;
  4. Eclipse BIRT, a business intelligence and reporting tool under Eclipse that creates attractive PDF or HTML reports for J2EE web applications and provides the core reporting functionality.

 

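2.3 OLAP tools

Online analytical processing tools. Open source OLAP tools likewise fall into MOLAP (multidimensional), ROLAP (relational) and HOLAP (hybrid). A good OLAP tool usually has the following characteristics:

  1. Good execution performance, so that analysis runs quickly;
  2. Good applicability and scalability;
  3. Open interfaces and a rich API.

The better-known open source OLAP tools at present are:

  1. Mondrian, part of Pentaho, an OLAP server written in Java. It implements the MDX language and the XML for Analysis (XMLA) and JOLAP specifications, lets you analyze very large data sets stored in SQL databases without writing SQL, and can wrap a JDBC data source and present its data in multidimensional form;
  2. JPivot, a JSP custom tag library that renders an OLAP table and charts. Users can perform typical OLAP navigation such as drill-down and slice and dice. It uses Mondrian as its OLAP server and uses WCF (Web Component Framework) to render web UI components based on XML/XSLT. Because JPivot loads all metadata into its cache in a single, oversimplified initialization step, it can only handle very small cubes.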
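The navigation operations mentioned above (drill-down, slice and dice) can be illustrated on a toy in-memory cube. This is a conceptual sketch only, with hypothetical facts and function names; it says nothing about how Mondrian or JPivot implement these operations internally.

```python
from collections import defaultdict

# A tiny hypothetical cube: each fact has three dimensions and one measure.
facts = [
    {"year": 2006, "region": "north", "product": "widget", "sales": 100},
    {"year": 2006, "region": "south", "product": "widget", "sales": 60},
    {"year": 2006, "region": "north", "product": "gadget", "sales": 40},
    {"year": 2007, "region": "north", "product": "widget", "sales": 120},
    {"year": 2007, "region": "south", "product": "gadget", "sales": 30},
]

def aggregate(facts, dimensions):
    """Sum the sales measure, grouped by the given dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["sales"]
    return dict(totals)

def slice_cube(facts, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dimension] == value]

def dice_cube(facts, allowed):
    """Dice: keep facts whose dimensions fall inside the allowed value sets."""
    return [f for f in facts if all(f[d] in values for d, values in allowed.items())]

# Coarse view by year, then drill down by adding the region dimension.
by_year = aggregate(facts, ["year"])
by_year_region = aggregate(facts, ["year", "region"])

# Slice on year 2006, then view by product.
sales_2006 = aggregate(slice_cube(facts, "year", 2006), ["product"])

# Dice on two dimensions at once, then view by year.
north_widgets = aggregate(
    dice_cube(facts, {"region": {"north"}, "product": {"widget"}}), ["year"]
)

print(by_year, by_year_region, sales_2006, north_widgets, sep="\n")
```

A ROLAP server such as Mondrian answers the same kinds of questions by translating them into SQL against the relational database rather than scanning facts in memory.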

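2.4 Databases

There are also many open source databases. Most are general-purpose relational databases, and a few have been specifically optimized for data warehouse environments. Bizgres, for example, builds on PostgreSQL with optimizations for data warehousing that improve analytical query performance.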

 

3 Open source BI suites

The following open source BI suites are relatively mature and complete, and are worth learning from.

openI

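openI is a web application written in Java that provides analysis and reporting on OLAP servers, relational databases and data mining servers. It is very easy to use and deploy, has an attractive, friendly interface, and plans to add support for data mining and ETL. openI mainly consists of:

  1. OLAP presentation: JPivot
  2. Reporting and charting: JFreeChart
  3. Connectors for analytical data sources

openI architecture:

RDL stands for Report Definition Language.
openI has most of the features a BI system should have:
Report: JasperReports, JFreeChart
OLAP: Mondrian + JPivot
Data mining: Weka
Its layers are very tightly coupled. It appears to use Eigenbase for data management, although I am not sure about this part. openI has no scheduler for data mining jobs. Its Portlet Interface mainly means that, when JPivot is used, JPivot views can be placed anywhere. openI has no dedicated development tool of its own, and its barrier to entry is relatively low.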

JasperSoft

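The Jaspersoft business intelligence suite is built from modules, so it is easy to adopt piece by piece and prove its value incrementally. Jaspersoft mainly consists of:

  1. JasperServer: an interactive ad hoc and predefined query and reporting server for business users
  2. JasperAnalysis: interactive OLAP data analysis for business users
  3. JasperETL: high-performance graphical data integration for developers and database administrators
  4. JasperReports: a Java reporting library for developers

JasperSoft's core strength is reporting. It supports many output formats and many ways of managing reports, and it also uses Eigenbase for data management.

It has fairly complete access control, built on Acegi, and supports any data source that has a JDBC driver. Its products already form a product line, the best known of which is of course JasperReports.

To manage reports and data better, it has its own dedicated presentation platform, JasperServer. This platform was only created on 06/26/2006 and is an important step JasperSoft took specifically toward BI. JasperSoft has no data mining.

It has a task scheduler, built on Quartz;
It has its own ETL: JasperETL;
It has its own OLAP server: JasperAnalysis;
The presentation layer uses AJAX and applets, and also offers dashboards;
Supported query languages are SQL, Hibernate (HQL), XPath (XML), EJBQL and MDX (a multidimensional query language specific to OLAP; SQL Server uses XMLA).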

SpagoBI

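SpagoBI integrates Mondrian and JPivot and can produce real-time reports through OpenLaszlo. It is written in Java, does not depend on a particular operating system, and is highly extensible. It mainly includes:

  1. Reporting tools: JasperReports / Eclipse BIRT / iReport
  2. OLAP server: Mondrian
  3. OLAP presentation: JPivot
  4. Data mining component: Weka
  5. Map engine: Geo
  6. ETL: BIE
  7. Search engine: Lucene
  8. Dashboard: OpenLaszlo
  9. Portal server: JBoss / Tomcat / JOnAS

Its roadmap shows that SpagoBI will take on more BI features, and even features beyond BI.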

SpagoBI architecture:

The SpagoBI platform is very powerful, and also very complex.
Its components are well modularized and loaded as plug-ins. Here are its components:
Report: BirtReportDriver, BirtReportEngine, JasperReportDriver, JasperReportEngine;
GEO: GeoDriver, GeoEngine (displays and queries data on maps);
OLAP: JPivotDriver, JPivotEngine;
QBE: QbeDriver, QbeEngine;
Data Mining: WekaDriver, WekaEngine;
Security: ExoPortalSecurityProvider;
Booklet: BookletsComponent, a component for booklet generation. It mainly covers file upload, workflow and OpenOffice support;
It also has document management, built on Apache Jackrabbit, and search, built on Lucene. The team comes from a CMS, portlet and workflow background and is technically very strong.
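The Driver/Engine pairing in the component list suggests a plug-in design in which each document type is handled by a matched pair. The sketch below shows one way such a registry could work, assuming that a driver prepares a request and an engine executes it. That division of labor, and every class and function name here, is an assumption made for illustration and is not SpagoBI's actual API.

```python
class ReportDriver:
    """Hypothetical driver: turns a stored document into an engine request."""

    def build_request(self, document, parameters):
        return {"template": document["template"], "params": dict(parameters)}


class ReportEngine:
    """Hypothetical engine: executes a request and returns the rendered output."""

    def execute(self, request):
        params = ", ".join(f"{k}={v}" for k, v in sorted(request["params"].items()))
        return f"report[{request['template']}]({params})"


class OlapDriver:
    """Hypothetical driver for OLAP documents."""

    def build_request(self, document, parameters):
        return {"cube": document["cube"], "filters": dict(parameters)}


class OlapEngine:
    """Hypothetical engine for OLAP documents."""

    def execute(self, request):
        return f"olap[{request['cube']}] filtered by {sorted(request['filters'])}"


# Registry mapping a document type to its (driver, engine) pair.
PLUGINS = {}

def register(doc_type, driver, engine):
    """Add a plug-in without touching the code that runs documents."""
    PLUGINS[doc_type] = (driver, engine)

def run(document, parameters):
    """Look up the plug-in for the document type and execute the document."""
    driver, engine = PLUGINS[document["type"]]
    return engine.execute(driver.build_request(document, parameters))

register("report", ReportDriver(), ReportEngine())
register("olap", OlapDriver(), OlapEngine())

output = run({"type": "report", "template": "sales.jrxml"}, {"year": 2006})
print(output)
```

The appeal of this kind of design is that a new document type only requires registering one more pair, which fits the modular, plug-in-loaded structure described above.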
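SpagoBI also uses quite a few tools:
Report: BIRT, JasperReports;
ETL: Octopus and Talend;
OLAP: Mondrian and JPivot;
Data Mining: Weka;
Portal: eXoPortal;

Its presentation layer also uses AJAX, and its dashboards use OpenLaszlo (a framework whose Java-based compiler generates Flash applications; its home page is http://www.openlaszlo.org/, and the new 4.0 release is apparently going to support DHTML output as well), so SpagoBI's dashboard interface is very friendly.

SpagoBI's ETL is very strong. You can see that the data processing layer beneath it is split out as a separate layer.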

Pentaho

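Pentaho is a BI suite built around workflow that emphasizes solutions rather than tool components. It integrates multiple open source projects and aims to compete with commercial BI. It includes:

  1. Workflow engine: Shark and JaWE
  2. Database: Firebird RDBMS
  3. Integrated management and development environment: Eclipse
  4. Reporting tool: Eclipse BIRT
  5. ETL tools: Enhydra/Kettle
  6. OLAP server: Mondrian
  7. OLAP presentation: JPivot
  8. Data mining component: Weka
  9. Application server and portal server: JBoss
  10. Single sign-on service and LDAP authentication: JOSSO
  11. Custom scripting support: the Mozilla Rhino JavaScript engine

As the list shows, Pentaho is a very complete BI solution. It leans toward BI solutions that are integrated with business processes and focuses on medium and large enterprise applications.

Pentaho architecture: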

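Pentaho's architecture closely resembles SpagoBI's, but Pentaho likes to call its offerings solutions. The following is quoted from the Pentaho whitepaper:

The Pentaho BI platform differs from traditional BI products. It is a process-centric, solution-oriented framework with business intelligence (BI) components that lets companies develop complete solutions to business intelligence problems.

Pentaho likewise treats the data processing layer as very important. It offers many ways of presenting data, and even has RSS output.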
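A "process-centric, solution-oriented" design can be pictured as a solution made of ordered actions that pass results to one another. The following is a minimal sketch of that idea only; the shared `context` dictionary, the action functions and `run_solution` are hypothetical and are not how Pentaho defines or executes its solutions.

```python
def load_sales(context):
    """Action 1: fetch data (hard-coded here as a stand-in for a query or ETL step)."""
    context["rows"] = [("north", 162.75), ("south", 80.0)]

def total_sales(context):
    """Action 2: compute a figure from the data produced by the previous action."""
    context["total"] = sum(amount for _, amount in context["rows"])

def format_summary(context):
    """Action 3: produce the output that would be delivered to the user."""
    context["summary"] = f"{len(context['rows'])} regions, total {context['total']:.2f}"

def run_solution(actions, context=None):
    """Run the actions in order, sharing one context between them."""
    context = {} if context is None else context
    for action in actions:
        action(context)
    return context

sales_solution = [load_sales, total_sales, format_summary]
result = run_solution(sales_solution)
print(result["summary"])
```

The point of the sketch is that the unit of delivery is the whole sequence rather than any single tool, which is the distinction the quoted whitepaper passage draws.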

Pentaho is composed of various open source components:

ETL: Kettle (shown in the interface as Pentaho Data Integration, previously Kettle)
Report: Pentaho Report (it also supports integration with BIRT and JasperReports, with dedicated documentation)
OLAP: Mondrian and JPivot (Mondrian has joined Pentaho)
Platform: Pentaho Platform
Data Mining: Weka (Weka has also joined Pentaho)
