Enterprise data applications: traditional business intelligence vs. big data

In recent years, as big data systems have been adopted more and more widely, enterprise digital-management solutions have multiplied, ranging from the traditional BI model to the latest big data solutions. What are the similarities and differences between these approaches, and how are they related? How can a specific solution be implemented to optimize business operations and development costs? Below is my discussion, based on material about business intelligence and big data that I collected online. I hope this article offers some ideas for building enterprise systems.

Traditional business intelligence model

The concept of business intelligence (BI) was first put forward in 1996 by Howard Dresner of the Gartner Group. He defined it as: "Business intelligence describes a set of concepts and methods that use fact-based support systems to aid business decision-making."

As the field of business intelligence has developed, the concept has become richer. As Tom Soukup and Ian Davidson write in their book "Visual Data Mining: Techniques and Tools for Data Visualization and Mining": "Business intelligence solutions transform business data into clear, fact-based, actionable information, enabling business users to discover trends, build customer loyalty, strengthen relationships with suppliers, reduce financial risk, and uncover new sales opportunities."

Business intelligence covers information systems, data analysis, knowledge discovery, and strategy at every level of the enterprise. Popular concepts such as supply chain management (SCM), customer relationship management (CRM), and enterprise resource planning (ERP) can all be considered part of business intelligence.

Components of business intelligence systems

It is generally held that the data warehouse (DW), OLAP, and data mining (DM) are integral components of every business intelligence system:

A data warehouse (DW) is a collection of data that is valuable to every part of the business. A BI system takes useful data from a variety of platforms and cleans it. It then moves the data through the extract, transform, and load (ETL) process and stores it in the data warehouse, which gives a global view of enterprise data. The data in a data warehouse is usually detailed data that lacks aggregation and hierarchical relationships, so it is rarely used directly for analysis and decision-making.

Online analytical processing (OLAP) serves the need for online data access and analysis. A BI system must give decision-makers efficient, intuitive data querying and presentation to support their decisions, and OLAP arose for this reason. It turns raw, hard-to-use data into understandable multidimensional information. It also provides operations on that information such as drill-down, slicing, and dicing, so that users can query data along any dimension.
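The following minimal Python sketch makes roll-up, drill-down, slicing, and dicing concrete over an in-memory fact table. The table, its dimension names (year, quarter, city, product), and the helpers `roll_up`, `slice_cube`, and `dice_cube` are hypothetical illustrations, not the API of any OLAP product. A real OLAP engine would precompute and index these aggregates rather than scan rows.

```python
from collections import defaultdict

# Hypothetical fact table: dimensions (year, quarter, city, product) plus a sales measure.
DIMS = ("year", "quarter", "city", "product")
FACTS = [
    (2019, "Q1", "Beijing", "beer", 120),
    (2019, "Q1", "Shanghai", "beer", 80),
    (2019, "Q2", "Beijing", "diapers", 60),
    (2019, "Q2", "Shanghai", "beer", 90),
    (2020, "Q1", "Beijing", "beer", 150),
    (2020, "Q1", "Beijing", "diapers", 70),
]


def roll_up(facts, dims):
    """Sum the sales measure grouped by the given dimensions.

    Fewer dimensions give a coarser view (roll-up); adding one is a drill-down.
    """
    positions = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[p] for p in positions)] += row[-1]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    pos = DIMS.index(dim)
    return [row for row in facts if row[pos] == value]


def dice_cube(facts, **allowed):
    """Dice: keep rows whose values fall in an allowed set on several dimensions."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in values for d, values in allowed.items())
    ]
```

For example, `roll_up(FACTS, ["year"])` gives yearly totals. Adding `"quarter"` drills down a level. Passing `slice_cube(FACTS, "city", "Beijing")` to `roll_up` shows Beijing's sales by product.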

Data mining (DM) refers to techniques that use algorithms to find hidden information in massive amounts of data. It typically includes association analysis, cluster analysis, and anomaly analysis. The value of data mining is that it reasons inductively over enterprise data and uncovers latent patterns that help decision-makers make decisions and adjust strategy. Data mining is also the main feature that distinguishes a BI system from a traditional reporting system.
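Association analysis is easiest to see on toy data. The sketch below counts item pairs in a few hypothetical shopping baskets. It keeps a rule only if two measures clear chosen thresholds. Support is the share of baskets that contain both items. Confidence is the share of baskets with the left-hand item that also contain the right-hand item. This is a brute-force pair count, not the full Apriori algorithm, and the baskets and thresholds are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (toy data, not real retail data).
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
    {"chips", "milk"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs passing both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # sorted() gives each unordered pair one canonical form, e.g. ("beer", "diapers").
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Check the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

On this data, three rules pass the thresholds: beer → diapers (confidence 1.0), diapers → beer (0.75), and milk → diapers (about 0.67). The rule diapers → milk (0.5) does not.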

Traditionally, all three major components of a BI system can be implemented on a relational database (RDBMS). Many relational database vendors, such as Oracle, IBM, and Microsoft, are also business intelligence solution providers, which shows how closely the two are linked. In recent years, with the arrival of the big data era, the advantages of non-relational (NoSQL) databases have begun to show. Many IT companies, especially in the Internet industry, now use SQL and NoSQL together. A non-relational database such as HBase cleans and processes huge volumes of data, while a relational database such as Oracle serves user-facing multidimensional queries and presentation. Our data analysis platform also uses this model. Whatever the technology, however, the three main components of business intelligence correspond to the following three main functions.

Main functions of business intelligence systems

Data management: acquiring data from multiple sources, processing data in multiple formats, and storing huge volumes of data. As a supporting function, many BI systems include a metadata management module that manages the data that describes the data. As the business grows in scale, metric definitions become increasingly complex. In the near future we will need to improve our data management capabilities, and metadata management is the best way to do so.

Data analysis: a conventional BI system provides ad hoc queries, report generation, data visualization, and data analysis. The significance of the big data era is that the data gap is gradually closing. Corporate decision-makers can now work with operational data more conveniently and flexibly, and ordinary users also want access to data. If enterprises meet this need and let users analyze and manage their own data, both sides will benefit enormously. Sina Weibo's data analysis plug-in and Taobao's Data Cube are successful examples. Unfortunately, some old-fashioned businesses still go against the trend and try to raise data barriers. Their users cannot make sense of their own consumption records and get no answers when they ask, which will only accelerate user churn. I suggest that such enterprises abandon their old thinking as soon as possible and create a transparent, open data environment. Only by embracing change will they avoid being overtaken by it.

Knowledge discovery: extracting implicit, previously unknown, potentially useful, and interesting knowledge from data and preserving it. Extracted knowledge usually takes the form of concepts, rules, regularities, and patterns. I believe knowledge discovery mainly answers "who, where, and what": who the customers are, where they are, and what they want. In a big data environment, one can even find surprising sales patterns, such as Wal-Mart's classic "beer and diapers" case. Our company urgently needs to strengthen this capability. Company leaders have repeatedly said that we "understand neither our accounts nor our users", a clear sign that we lack knowledge discovery capabilities.

The big data revolution

1. Using many types of data for integrated decision-making. Take retail as an example. In the traditional offline sales model, enterprise information systems usually store only order data. Companies care only about order status and the financial statements generated from it, and discard the customer's personal characteristics, inquiry process, logistics, and similar information. In the online sales model, orders are only a small part of the data. The more valuable data are user browsing, searches, comparisons, favorites, inquiries, logistics, and reviews, exactly the data that traditional industries throw away. Many e-commerce sites even go to great lengths to collect users' activity on other websites, their location, their contacts, and other data. Setting aside whether such data is collected legally, it at least lets online retailers give customers more accurate recommendations and a more personalized experience. It is fair to say that e-commerce companies have used data to upend offline retail.

2. Exploring correlation rather than causation. Traditional industries like to guide management with causal reasoning, such as "the customer bought a basketball, so recommend basketball shoes" or "it is the off-season, so run a promotion". Such schemes require an understanding of the industry itself, but using them repeatedly makes business models increasingly alike. In a big data environment, what we need to explore is correlation rather than causation. In Wal-Mart's "beer and diapers" story, Wal-Mart's data analysts found a strong correlation between the two products and proposed a sales plan based on it. The analysts did not need to dig into the deeper reason, whether wives sent husbands to buy diapers and they picked up beer, or the other way round. It simply does not matter. In another example, Google's data scientists modeled search terms to predict which areas would see flu outbreaks, making a major contribution to US public health services. These data scientists did not even need to know how the influenza virus was spreading. That did not stop them from finding great value in correlated data.

3. Mining gold from abnormal and dirty data. In traditional data warehouse construction, abnormal and dirty data must be removed during ETL, otherwise they cause storage problems and other failures. In a big data environment, however, abnormal data may have value. In a previous job, I found that every day at 8:00 and 20:00 a large number of client requests returned errors, and ETL was cleaning these records out. Further investigation showed that the errors all came from the same interface. Checking the business code then revealed that the Android client sent handshake messages to the server at two fixed times each day: 8:00, designed to track silent client users, and 20:00. As the number of installed clients grew, these handshake messages overwhelmed the server, which then returned errors. We changed the client code to spread the handshakes across the day. This reduced server load and avoided a server expansion that would have been sized for a false "peak". As another example, a US credit agency found that 10% of its "dead" customers were still repaying their loans normally. By keeping these anomalies rather than simply cancelling the accounts, it earned additional profit.
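The fix described above spreads client handshakes over the day instead of firing them all at 8:00 and 20:00. One way to sketch it is deterministic jitter: each client hashes its own ID to a fixed offset within the day. The function names and the choice of SHA-256 are assumptions for illustration. The article does not say how its client code actually spread the load.

```python
import hashlib
from collections import Counter

SECONDS_PER_DAY = 24 * 60 * 60


def handshake_offset(client_id: str, window_seconds: int = SECONDS_PER_DAY) -> int:
    """Map a client ID to a stable number of seconds after midnight.

    A cryptographic digest is used instead of Python's built-in hash(), whose
    string hashing is randomized per process, so a client keeps the same slot
    across restarts.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


def hourly_load(client_ids):
    """Count how many handshakes would land in each hour of the day."""
    return Counter(handshake_offset(cid) // 3600 for cid in client_ids)
```

With 10,000 simulated client IDs, each hour receives roughly 1/24 of the handshakes instead of two sharp spikes.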

Of course, the big data revolution brings far more than these three points, and it brings challenges as well as opportunities. How to combine big data thinking with traditional BI to create new capabilities is a question we urgently need to consider.

BI (Business Intelligence)

Business intelligence (BI) refers to using modern data warehousing, online analytical processing, data mining, and data presentation techniques to analyze data and deliver business value.

As a tool, business intelligence processes a company's existing data and converts it into knowledge, analysis, and conclusions. These help business decision-makers make correct, well-informed decisions. Business intelligence is the set of techniques, from the data warehouse to the analytical systems, that helps businesses use data better and so improve the quality of their decisions.

Business Intelligence

Business intelligence is usually understood as a tool that turns a company's existing data into knowledge to help it make informed business decisions. The data include orders, inventory, transactions, accounts, customers, and suppliers from the company's business systems. They also include data about the company's industry, its competitors, and the wider external environment. Business intelligence can support decisions at the operational level as well as at the tactical and strategic levels. Turning data into knowledge requires data warehousing, online analytical processing (OLAP) tools, and data mining techniques. From a technical perspective, therefore, business intelligence is not a new technology but an integrated use of data warehousing, OLAP, and data mining.

Some hold that business intelligence is the process of collecting, managing, and analyzing business information. Its aim is to give decision-makers at every level knowledge or insight that leads them to decisions that make the enterprise more profitable. Business intelligence generally comprises data warehousing, online analytical processing, data mining, data backup and recovery, and other parts. Implementing it involves software, hardware, consulting services, and applications. Its basic architecture consists of three parts: data warehousing, online analytical processing, and data mining.

It is therefore more appropriate to view business intelligence as a solution. The key is to take data from the company's many different operational systems, clean it into useful data, and ensure its accuracy. The data then goes through extraction, transformation, and loading (the ETL process) and is merged into an enterprise data warehouse, giving a global view of enterprise data. On this basis, query and analysis tools, data mining tools (including big data mining), and OLAP tools analyze and process the data. At this stage, information becomes knowledge that supports decisions. Finally, this knowledge is presented to managers to support their decision-making.

Leading IT companies that provide business intelligence solutions include Microsoft, IBM, Oracle, SAP, Informatica, MicroStrategy, SAS, and Royalsoft.

Mainstream business intelligence tools include Style Intelligence (BI Sida), FineBI, BO (BusinessObjects), Cognos, and Brio. Some domestic software, such as the KCOM platform, also integrates essential business intelligence tools.

OLTP (online transaction processing)

Online transaction processing (OLTP), also called transaction-oriented processing, sends user data to a computing center for processing as soon as it is received. Results are returned within a very short processing time, so this is a mode of operation in which users get a fast response.

An online transaction processing system is an interactive computer application system that uses the transaction as its unit of data processing. It updates or otherwise operates on data immediately, so the data in the system is always up to date. A user can designate a sequence of operations that must keep the data consistent as a transaction. The transaction is entered through a terminal, a personal computer, or another input device, the system processes it, and a result is returned. OLTP is used in airline ticketing, bank tellers, stock trading, supermarket sales, and hotel front- and back-office management. [1]
Its greatest advantage is that input data can be processed immediately and answered in time, which is why it is also called a real-time system. An important measure of an OLTP system's performance is its real-time response time. This is the time from when a user enters data at a terminal until the computer replies to the request. OLTP is carried out jointly by the front end, the application, and the database. Its processing speed depends on the database engine, the server, and the application engine.
An OLTP database is designed so that transactional applications write only the data they need, in order to process each individual transaction as quickly as possible.


OLAP (online analytical processing)

Online analytical processing (OLAP) is a software technology that lets analysts view information quickly, consistently, and interactively from every angle to gain a deep understanding of the data. It has the FASMI characteristics (Fast Analysis of Shared Multidimensional Information):
- F stands for Fast: the system responds to most analysis requests within seconds.
- A stands for Analysis: users can define new ad hoc calculations as part of an analysis without programming, and get reports in the form they want.
- S stands for Shared: many users can work with the same data under appropriate security and concurrency controls.
- M stands for Multidimensional: the system provides multidimensional views of the data for analysis.
- I stands for Information: the system gives timely access to the information needed and can manage large volumes of it.


The difference between operational database systems and data warehouse systems

The main task of an operational database system is to perform online transaction and query processing, and such systems are called online transaction processing (OLTP) systems. They cover most of an organization's day-to-day operations, such as purchasing, inventory, and payroll, and are also called business systems. Data warehouse and data analysis systems, on the other hand, serve users in decision-making. Such systems are called online analytical processing (OLAP) systems.

The main differences between OLTP and OLAP lie in the following areas.

Users and system orientation: OLTP is customer-oriented and is used by clerks, customers, and IT professionals for transaction and query processing. OLAP is market-oriented and is used by knowledge workers (including managers, executives, and analysts) for data analysis.
Data contents: OLTP systems manage current data, which is usually too detailed to be used easily for decision-making. OLAP systems manage large amounts of historical data, provide summarization and aggregation mechanisms, and store and manage information at different levels of granularity. These features make the data easier to use for informed decisions.
View: OLTP systems focus on current data within an enterprise or department and do not consider historical data or data in other organizations. In contrast, OLAP systems often span multiple versions of a database schema, because organizations evolve. OLAP systems also handle information from different organizations and integrate information from multiple databases. Because the data volumes are huge, OLAP data is also typically stored on multiple storage media.
Access patterns: OLTP systems consist mainly of short, atomic transactions and therefore require concurrency control and recovery mechanisms. Access to OLAP systems, however, consists mostly of read-only operations, because most data warehouses store historical rather than up-to-date data. Many of these operations may still be complex queries.
Other differences between OLTP and OLAP include database size, frequency of operations, and performance metrics.

Why do we need a separate data warehouse?

Operational databases already store huge amounts of data. The reader may therefore wonder why OLAP is not run directly on them, instead of spending extra time and resources building a separate data warehouse. The main reason is that separating the two systems helps the performance of both. Operational databases are designed and tuned for known tasks and workloads. Typical examples are indexing on primary keys to retrieve specific records and optimizing predefined queries. Data warehouse queries, by contrast, are often complex. They compute over large amounts of data at summarized levels and may need special data organization, access, and implementation methods based on multidimensional views. Running OLAP queries on an operational database could severely degrade the performance of operational tasks.

The following example illustrates how OLTP and OLAP are related and how they differ.

Imagine this scenario:
Zhang deposits 2,000 dollars at an ATM.
Zhang inserts his bank card into the ATM, enters his password, and waits for it to be verified.
He selects the deposit operation and puts his 2,000 dollars in cash into the ATM, which verifies the banknotes.
While waiting, Zhang replies to a message from his girlfriend.
Once verification is complete, he confirms the deposit.
The ATM sends the 2,000-dollar deposit over the network to the bank's database, and Zhang waits for the ATM's response.
After the bank's central database has processed the deposit, it returns the result to Zhang's ATM: his balance has increased from 5,000 to 7,000.
Zhang takes his card, and the deposit is complete.
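Zhang's deposit is a textbook OLTP transaction: a short, atomic update to one account that must either fully succeed or leave nothing behind. The following minimal sketch uses Python's built-in `sqlite3` module. The table layout, the card number `zhang-card`, and the `deposit` helper are hypothetical. A real core-banking system would add authentication, auditing, and much more.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory account store seeded with Zhang's balance of 5000."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " card_no TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("CREATE TABLE ledger (card_no TEXT, amount INTEGER, kind TEXT)")
    conn.execute("INSERT INTO accounts VALUES ('zhang-card', 5000)")
    conn.commit()
    return conn


def deposit(conn: sqlite3.Connection, card_no: str, amount: int) -> int:
    """Credit one account and record it in the ledger as a single transaction."""
    if amount <= 0:
        raise ValueError("deposit amount must be positive")
    # The connection context manager commits on success and rolls back on error,
    # so the balance update and the ledger entry are applied together or not at all.
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE card_no = ?",
            (amount, card_no),
        )
        if cur.rowcount != 1:
            raise ValueError(f"unknown card {card_no!r}")
        conn.execute("INSERT INTO ledger VALUES (?, ?, 'deposit')", (card_no, amount))
    row = conn.execute(
        "SELECT balance FROM accounts WHERE card_no = ?", (card_no,)
    ).fetchone()
    return row[0]
```

Calling `deposit(open_bank(), "zhang-card", 2000)` returns 7000, which matches the balance change in the scenario.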

Another scenario:

At the end of the month, the bank wants to review how the ATMs at its various branches were used, and management assigns this task to Xiao Hong.
Hong opens the bank's accounting system and logs in to her account.
She opens the monthly deposit and withdrawal statement.
There is a lot of data and the export often takes half an hour, so Hong clicks "Export Data"
and goes to lunch.
When she gets back, the statement has been exported. She does some light processing, reviews the result, and passes it to management.
Management sees that the branches in area A tend to run out of cash every day, while the branches in area C always have a lot left over. On this basis, management decides to increase cash deliveries to area A and reduce them to area C, balancing the use of funds.
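Hong's month-end report, by contrast, is an OLAP-style query. It reads a whole month of history and aggregates it by area instead of touching one account. The sketch below uses `sqlite3` again with an invented `atm_events` table. The figures are made up to mirror the story: area A's cash almost runs out daily, while area C's mostly goes unused. This is not the bank's actual schema.

```python
import sqlite3

# Aggregate a month of events per area: a read-only scan plus GROUP BY.
REPORT_SQL = """
SELECT area,
       SUM(CASE WHEN kind = 'withdrawn' THEN amount ELSE 0 END) AS withdrawn,
       SUM(CASE WHEN kind = 'delivered' THEN amount ELSE 0 END) AS delivered
FROM atm_events
GROUP BY area
ORDER BY area
"""


def build_sample_month() -> sqlite3.Connection:
    """Load 30 days of hypothetical ATM cash deliveries and withdrawals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE atm_events (area TEXT, day INTEGER, kind TEXT, amount INTEGER)"
    )
    rows = []
    for day in range(1, 31):
        rows += [
            ("A", day, "delivered", 100_000), ("A", day, "withdrawn", 99_000),
            ("C", day, "delivered", 100_000), ("C", day, "withdrawn", 35_000),
        ]
    conn.executemany("INSERT INTO atm_events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def cash_usage_by_area(conn: sqlite3.Connection):
    """Return (area, share of delivered cash that was withdrawn) for each area."""
    return [
        (area, withdrawn / delivered)
        for area, withdrawn, delivered in conn.execute(REPORT_SQL)
    ]
```

On the sample data, this gives usage ratios of about 0.99 for area A and 0.35 for area C, the pattern that led management to shift cash deliveries from C to A.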

OLAP data comes from the OLTP stage. OLAP helps enterprises improve their operations, and technical staff can also use OLAP data to optimize their systems.

The differences between OLTP and OLAP can be summarized as follows:

  1. OLTP usually requires much faster response times than OLAP.
  2. Data scope: OLAP helps executives make decisions, so it usually needs to analyze comprehensive data. An OLTP user, by contrast, deals only with their own records and data.
  3. Degree of concurrency: a given OLTP transaction may be executed by tens of thousands of people at the same time, while often only one OLAP instance is running at a time.
  4. OLTP changes one person's data at a time. OLAP queries and computes over the data as a whole.

ETL

ETL, short for Extract-Transform-Load, describes moving data from a source to a destination through extraction, transformation, and loading. The term is most commonly used with data warehouses, but ETL is not limited to them.

ETL extracts data from business systems, cleans and transforms it, and loads it into the data warehouse. Its purpose is to integrate an enterprise's scattered, messy data, which follows no common standard, and so provide a basis for analysis and decision-making. ETL is an important part of any BI (business intelligence) project.
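A toy end-to-end ETL run shows what "cleaning and transforming" means in practice. Everything in the sketch below is hypothetical: the CSV extract, the cleaning rules, and the `fact_orders` target table. The rules drop rows missing an amount, drop duplicate order IDs, normalize city spelling, and store money as integer cents. Real pipelines usually run on dedicated ETL tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: stray spaces, inconsistent case, a missing amount, a duplicate.
RAW_ORDERS = """order_id,city,amount
1001, beijing ,120.50
1002,Shanghai,
1003,SHANGHAI,80
1001, beijing ,120.50
"""


def extract(text):
    """Extract: parse the source CSV into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean, deduplicate, and standardize the rows."""
    seen, clean = set(), []
    for row in rows:
        if not row["amount"].strip():       # drop rows missing a required field
            continue
        if row["order_id"] in seen:         # drop duplicate orders
            continue
        seen.add(row["order_id"])
        clean.append((
            int(row["order_id"]),
            row["city"].strip().title(),        # unify inconsistent spellings
            round(float(row["amount"]) * 100),  # store money as integer cents
        ))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse fact table."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS fact_orders ("
            " order_id INTEGER PRIMARY KEY, city TEXT NOT NULL,"
            " amount_cents INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)


def run_etl(text):
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn
```

After `run_etl(RAW_ORDERS)`, the fact table holds two clean rows: order 1001 for Beijing and order 1003 for Shanghai.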

MPP (massively parallel processing)

In computer architecture research, massively parallel processing (MPP) refers to a computer system built from a large number of simple processing elements (PEs). These PEs work in parallel to deliver high system performance. An MPP system generally has many PE nodes connected by a high-performance switched network. Each PE has its own local memory, and PEs communicate by message passing. Because of the high degree of parallelism among PEs, MPP systems avoid the overhead of shared-storage systems and scale well to large configurations. On the other hand, programming an MPP system is more complicated, mainly because computing tasks must be partitioned and mapped onto PE nodes.
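The shared-nothing idea behind MPP can be simulated in a single Python process. Rows are hash-partitioned across processing elements (PEs), each PE aggregates only its own local rows, and a coordinator merges the partial results it receives. This is only a conceptual sketch. The PEs here are plain function calls rather than separate nodes, and the partitioning scheme (CRC32 of the key) is an assumption.

```python
from collections import defaultdict
from zlib import crc32


def partition(rows, num_pes):
    """Distribute (key, value) rows across PEs by hashing the key."""
    local_memories = [[] for _ in range(num_pes)]
    for key, value in rows:
        pe = crc32(key.encode("utf-8")) % num_pes
        local_memories[pe].append((key, value))
    return local_memories


def local_aggregate(rows):
    """Work done by one PE using only its own local rows."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


def mpp_sum(rows, num_pes=4):
    """Partition, aggregate locally, then merge partial results at a coordinator."""
    # On a real MPP system each local_aggregate call runs on a separate node in parallel.
    partial_results = [local_aggregate(part) for part in partition(rows, num_pes)]
    merged = defaultdict(int)
    for partial in partial_results:  # stands in for messages sent to the coordinator
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)
```

The result does not depend on the number of PEs. Only the distribution of work changes, which is what lets MPP systems scale out by adding nodes.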

The relationship between BI and big data

Traditional BI technology tags: ETL, data warehousing, OLAP, visual reports.

Big data technology tags: Hadoop, MPP, HDFS, MapReduce, stream processing.

On the technical side, some traditional BI techniques (ETL, data warehousing, OLAP, visual reports) seem to be falling behind because they struggle with ever-growing data volumes. They cannot, however, be completely dismissed or replaced by big data. Some companies use SAP HANA or FineBI's direct-connection big data engine, which are solutions optimized for exactly this problem. The traditional BI stack will persist. Enterprise BI solutions are still very popular, and the adoption of big data is a long process.

Big data vs. business intelligence

Big data is not empty talk, and its first priority is solving business problems. To some extent, big data means using new data technologies to expand and optimize the business. Traditional companies need to gather a team to study this problem, with dedicated people doing research and exploration. Externally, they need to think clearly about new business models. Internally, they need to know in which scenarios big data techniques can improve efficiency.

Big data can already create value in specific areas. By industry, finance, banking, the Internet, healthcare, and scientific research have broad prospects. By function, advertising, marketing, risk control, and supply chains are all areas where big data is valuable. For certain businesses, such as telecom operators, big data can also provide new methods for network optimization.

Not every business needs to build its own big data platform. Enterprises need to weigh their level of informatization against the cost. Those with the capacity can develop their own platform, as BAT (Baidu, Alibaba, Tencent) do. Large traditional enterprises can buy one. SMEs can rent, for example by using Alibaba Cloud or AWS.

In practice, BI applications are far more widespread than big data applications, and there are general reasons for this. Big data is not simply "traditional BI plus" something, because it involves changes in thinking, tools, and people. BI practitioners should not dismiss big data as old wine in a new bottle, even though that is sometimes true. Nor should they sell themselves short by treating big data as something lofty, because it in fact inherits most of BI's thinking.

Combining business intelligence and big data

Business intelligence is well suited to being built as a system on top of big data applications.
The big data platform collects data, cleans it, converts it into structured data, and stores it in the data warehouse for the BI system to use.
Meanwhile, the platform's other big data service modules run on computing frameworks such as Spark and Flink.

1. Rapid analysis. Data volumes are surging and analysts make more and more ad hoc queries, so BI needs fast analysis. There are two ways to support this. The first is dimensional redundancy: precompute statistical summaries at different dimensional levels and allow the levels to overlap. For example, page views (PV) can be summarized by city, by city + user type, and by city + user type + business line. These three summary tables are redundant in their dimensions, trading space for time. The drawback is that every added dimension requires another table. When data volumes are large, the database must also be sharded and more hardware added. The second is in-memory computing: frequently queried data is kept in memory, backed by a secondary in-memory file system and a Storm-style processing mode. This can support queries that return in seconds or even milliseconds. The drawback is that this technique can handle only relatively small data volumes.
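The dimensional-redundancy approach can be sketched as one precomputed summary table per supported combination of dimensions, so that a query becomes a lookup instead of a scan. The three rollups (city; city + user type; city + user type + business line) follow the example in the text. The page-view log and helper names are invented.

```python
from collections import Counter

FIELDS = ("city", "user_type", "business_line")
# The three redundant summary levels from the text.
ROLLUPS = [
    ("city",),
    ("city", "user_type"),
    ("city", "user_type", "business_line"),
]
# Hypothetical raw page-view log: one tuple per PV.
PV_LOG = [
    ("Beijing", "new", "news"),
    ("Beijing", "new", "video"),
    ("Beijing", "returning", "news"),
    ("Shanghai", "new", "news"),
]


def build_rollups(log):
    """Precompute one PV count table per supported dimension combination."""
    tables = {}
    for dims in ROLLUPS:
        positions = [FIELDS.index(d) for d in dims]
        tables[dims] = Counter(tuple(row[p] for p in positions) for row in log)
    return tables


def query_pv(tables, **filters):
    """Answer a PV query with a single lookup in the matching summary table."""
    dims = tuple(d for d in FIELDS if d in filters)  # canonical dimension order
    key = tuple(filters[d] for d in dims)
    return tables[dims][key]  # KeyError if this combination was never precomputed
```

A query on a combination that was not precomputed, such as user type alone, raises `KeyError`. This is the drawback noted above: each new dimension combination needs another table.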

2. Layered computation. Data volumes and latency requirements differ, so computing capability can be organized into three layers, each implemented with different technologies. The streaming layer handles the strictest real-time requirements and the smallest data volumes. It is represented by Storm (https://storm.incubator.apache.org/), which can trigger a computation each time data arrives. This suits real-time scalar summaries such as real-time product sales. The block layer handles high real-time requirements and moderate data volumes. It runs on a conventional database such as Oracle, which meets the daily OLAP reporting needs of most applications. The batch layer handles the loosest real-time requirements and the largest data volumes. It is represented by Hadoop and is used for tasks such as daily processing of underlying data and accumulating long-period data.
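The streaming layer computes on every data arrival, and Storm provides this behavior at scale. The small in-process class below illustrates the idea. It updates a running total and a per-minute tumbling window each time a sale event arrives. The class and its parameters are hypothetical and do not use Storm's API.

```python
from collections import defaultdict


class RealTimeSalesCounter:
    """Update sales figures incrementally as each event arrives."""

    def __init__(self, window_seconds: int = 60):
        self.window_seconds = window_seconds
        self.running_total = defaultdict(int)  # product -> total since start
        self.per_window = defaultdict(int)     # (window index, product) -> total

    def on_event(self, timestamp: int, product: str, amount: int) -> int:
        """Process one sale event and return the product's updated running total."""
        window = timestamp // self.window_seconds  # tumbling window index
        self.running_total[product] += amount
        self.per_window[(window, product)] += amount
        return self.running_total[product]
```

Each call does a constant amount of work, which is why this style suits the low-latency, low-volume top layer. Block and batch layers instead recompute over stored data.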

3. Open services. Similar to the SaaS (Software as a Service) concept, data processing and data analysis are packaged as services that experienced data scientists can call directly. A service-oriented architecture also helps decouple the front end from the back end. When the front end needs a new metric or display, the back end needs only small changes to its interfaces, or none at all.


Source: blog.csdn.net/wangxudongx/article/details/104942293