Big Data Era: How Far Can Traditional BI Go?

I have been engaged in BI for many years and have lived through the construction and growth of business analysis systems. I am also fortunate to be working right at the moment when big data systems are replacing traditional BI, so let me talk about this question in particular: how far can traditional BI go?

<img src="http://p1.pstatp.com/large/e4900012f674b176306" alt="How far can traditional BI go?">

Technology serves business, so I will not talk about technology for its own sake here. Instead, I explain the reasons from the user's perspective, across eight aspects, each of which I have experienced personally. Of course, an incomplete enumeration cannot prove anything absolutely, but I hope it provokes some thought.

1. Resource application - from months to days, even the same day

Since the enterprise built three big data resource pools - MPP, HADOOP, and stream processing - tenants basically get what they see. The company even provides resource packages to make applying easier: you simply request a package of resources. This resource-application model is the basic guarantee for opening data up flexibly to the outside world. Within half a year, more than 100 internal and external tenants were opened (previously these might have been called data marts). Looking back, without this capability the company's external data monetization would have been basically impossible.
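Purely as an illustration, the "resource package" model can be thought of as a small, declarative request that names the pools and quotas a tenant wants. Here is a hypothetical sketch in Python; the field names, pools, and defaults are my assumptions, not the company's actual API:

```python
from dataclasses import dataclass


@dataclass
class ResourcePackage:
    """Hypothetical tenant resource request covering the three pools
    mentioned above: MPP, HADOOP, and stream processing."""
    tenant: str
    mpp_nodes: int = 2           # MPP compute nodes requested
    hadoop_storage_tb: int = 10  # HDFS quota in terabytes
    stream_cores: int = 4        # CPU cores reserved for stream jobs
    valid_days: int = 180        # how long the package is granted for

    def summary(self) -> str:
        return (f"{self.tenant}: {self.mpp_nodes} MPP nodes, "
                f"{self.hadoop_storage_tb} TB HDFS, "
                f"{self.stream_cores} stream cores for {self.valid_days} days")


# A tenant (what used to be called a data mart) can then be opened the
# same day the request is approved, instead of waiting months for hardware.
print(ResourcePackage(tenant="marketing_mart").summary())
```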

Whether it is Alibaba Cloud or AWS, the routine is the same. So why should a company do this itself? Because a large company is a huge market in its own right, with all kinds of application requirements, and in terms of data, security, interfaces, and technology those requirements are not well suited to external platforms.

In the minicomputer stage of traditional BI there was no concept of a resource pool. Resources were requested in units of hardware, so you needed to apply for a budget in advance, and even once the hardware arrived, integration took too long. Dividing the 570 minicomputers into 12 partitions took more than a month; the efficiency is simply not comparable.

Big data systems have left traditional BI far behind in resource granularity, speed of provisioning, and dynamic expansion, giving them an incomparable advantage in rapid business deployment and laying a good foundation for business innovation. If you have ever done a DB2 project integration, where every round involves planning, partitioning, installation, and so on, you know what waiting feels like.

2. Data collection - diversity creates more application scenarios

In traditional ETL, data is exported from the source database into text files and then imported into the target database through client tools: EXPORT for the export, FTP for the transfer, and IMPORT for the load. Between databases of the same type you might take shortcuts such as DBLINK, or connect through ODBC and the like from a program. Many companies have built special tools for shuttling data between multiple databases, but enterprise-level platforms rarely use them because their scalability and flexibility are too poor. Traditional ETL technology is well suited to static application requirements with an analysis cycle of days or months.
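To make that classic flow concrete, here is a minimal sketch of the FTP hop using only the Python standard library. The host name, credentials, and exported file are placeholders, and the database EXPORT step is assumed to have already produced the text file:

```python
from ftplib import FTP

# Assumed to exist: a flat file produced by the source database's EXPORT utility.
EXPORT_FILE = "orders_20170101.txt"

# Placeholder connection details for the staging FTP server.
with FTP("ftp.staging.example.com") as ftp:
    ftp.login(user="etl_user", passwd="etl_password")
    with open(EXPORT_FILE, "rb") as fh:
        # Ship the flat file; the target side would later run IMPORT/LOAD on it.
        ftp.storbinary(f"STOR {EXPORT_FILE}", fh)
```

Every hop in this chain is batch-oriented and file-based, which is exactly why it tops out at daily or monthly cycles.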

In most enterprises, I think, the BI data analysis cycle is still basically measured in days. I have been doing BI for ten years, and the enterprise's ETL has run on that kind of cycle for most of that time; from a business point of view it seemed enough. Some will ask how practical it really is to shorten the data cycle to hours, minutes, seconds, or even real time. But is there really no business need for shorter-cycle analysis? Or is it that everyone's BI analysis habits, or capabilities, simply have not caught up?

From the point of view of data retrieval, business staff always hope you can get them data as quickly and as timely as possible. We used to publish only monthly reports; later, as performance improved, complex daily reports became possible and then became standard. After daily reports, should real time become the next standard?

From the application point of view, beyond a pile of operational indicator reports, companies generally have concrete data needs in both marketing and risk control. Real-time marketing is obviously better than static marketing - BAT basically could not survive without it - and real-time risk control is obviously better than offline risk control. Take an anti-fraud system: without real-time monitoring, how do you intervene while the fraud is still happening?

From the perspective of trends, if you agree that the future is a world of personalized services, then only real-time data carries enough information to provide them, and you will find there are more scenarios requiring real-time collection than you can count.

Even if you have none of the requirements above, technology and business always interact. If you can provide hourly services, others will create hourly business scenarios; if you can provide real-time services, others will create real-time business scenarios. It is unclear which is the chicken and which is the egg, but if you want to serve the business better, you should be more forward-looking at the technical level.

But can traditional BI support this? Traditional enterprise BI is not real time - not because there is no demand, but probably because the capability is not there. I remember how hard it was to monitor real-time account-assignment indicators when CRM went live. We used to have only monthly reports; now, could we live without daily reports? I remember that many years ago the first daily report was pushed out by IT staff, simply because the capability had finally arrived. What about the next ten years?

ETL is a concept from traditional data warehousing, and I think it is time to upgrade it. Diversified collection is king; this is the general trend, and three things matter most. First, the collection methods: messages, data streams, crawlers, files, and incremental logs should all be supported. Second, data flow is no longer one-way: not just E for extract but also X for exchange, which greatly extends the meaning of ETL. Third, distributed data collection can scale out in parallel and dynamically, which solves the read/write bottlenecks of ETL far better. These are exactly the things traditional BI cannot do.
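As one concrete example of the "incremental log" collection method above, here is a minimal sketch (standard library only, file path invented) that follows a growing log file and yields new lines as they appear - the kind of continuous feed a batch EXPORT/IMPORT cycle cannot provide:

```python
import time


def follow(path: str, poll_seconds: float = 1.0):
    """Yield new lines appended to `path`, roughly like `tail -f`."""
    with open(path, "r", encoding="utf-8") as fh:
        fh.seek(0, 2)                     # start at end of file: only new increments
        while True:
            line = fh.readline()
            if not line:
                time.sleep(poll_seconds)  # nothing new yet, wait and retry
                continue
            yield line.rstrip("\n")


# Hypothetical usage: push each new record downstream in near real time.
# for record in follow("/var/log/app/orders.log"):
#     print(record)
```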

3. Computing performance - cost-performance is king, and change comes faster than expected

<img src="http://p3.pstatp.com/large/e49000130d1245bcbf9" alt="How far can traditional BI go?">

DB2 and Teradata have always held a huge share of the data warehouse market. It took us half a year to replace two P780s with GBASE+HADOOP. Overall performance is roughly 1.5 times the original, while the investment is only a fraction of the cost. Some tuning was needed early on and the code has to be written more carefully, but the cost-performance is excellent. The key points are dynamic expansion with multiple tenants and disaster recovery that is superior to DB2. I remember that whenever a DB2 node had a problem, it could fail over, but performance often dropped by half, which seriously affected the business.

Traditional data warehouses tend to treat all data processing the same way, but different processing stages have structurally different requirements. Simple transformations and summarizations are more cost-effective to handle outside the warehouse, yet traditional BI is used to importing everything into the data warehouse, wasting precious minicomputer resources at poor cost-performance. That is why the MPP+HADOOP hybrid data warehouse is becoming a trend: HADOOP is good at massive, simple batch processing, while MPP is good at correlation analysis. eBay, China Mobile, and others have adopted similar solutions.
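A rough sketch of that division of labour, using PySpark for the heavy but simple batch summarization and then handing the much smaller result to the MPP side over JDBC. The paths, table names, and JDBC URL are placeholders, and reaching GBase through a generic JDBC driver is my assumption, not a statement about the author's setup:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hadoop_batch_to_mpp").getOrCreate()

# Heavy lifting on the HADOOP side: scan raw detail records and summarize.
detail = spark.read.parquet("hdfs:///warehouse/raw/call_detail/")  # placeholder path
daily = (detail
         .groupBy("day_id", "city_id")
         .agg(F.count(F.lit(1)).alias("calls"),
              F.sum("duration").alias("total_duration")))

# Hand the compact summary to the MPP warehouse for correlation analysis.
(daily.write
      .format("jdbc")
      .option("url", "jdbc:gbase://mpp-host:5258/dw")   # placeholder JDBC URL
      .option("dbtable", "dw.daily_call_summary")
      .option("user", "etl")
      .option("password", "etl")
      .mode("append")
      .save())
```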

Taken as a whole, DB2 and similar data warehouses certainly have their strengths, such as the stability they are proud of, but these technologies depend too heavily on foreign vendors; operations and maintenance support seems to be getting worse, and solving critical problems is becoming harder and harder, so even "stability" deserves a big question mark. I do not know how other companies feel. Believe me, I am not advertising the domestic GBASE product - it has plenty of pitfalls - but it is worth having.

4. Reporting system - aesthetic fatigue is inevitable, personalization is the trend

Report systems such as BRIO, BO, and BIEE all provide a decent visual interface and are fine for presenting lightweight data, but I do not think they are attractive to large enterprises, for several reasons.





First, they are too easy to replace. There are now plenty of open-source components with similar functionality, so why use standardized, bundled products? For a company with some development capability, it seems unnecessary.

Second, their openness is too poor. Enterprises have many customized requirements, such as security controls, but these products are not open enough and often cannot meet them.

Third, they are inflexible. However general-purpose they are, can they beat EXCEL? Don't expect to pull a report straight out of a reporting system and paste it into a deliverable; it always needs secondary processing. In that case it is simpler to pour the data directly into EXCEL (a minimal sketch of this appears after the fourth point below).

Fourth, they are too slow. Today's reports are no longer reports in the traditional BI sense: dimensions and granularity are so fine that tables with more than 100 million records are common. Our indicator database, for example, adds 10 billion records a year. Traditional BI report tools simply cannot support this; a pretty interface is secondary, and what business staff care about most is always how fast the report comes back.
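For the "just pour it into EXCEL" point above, a minimal pandas sketch, assuming the query result is already in hand as rows; the file name and columns are invented:

```python
import pandas as pd

# Assume these rows came back from whatever SQL the analyst actually needed.
rows = [
    {"day_id": "20170101", "city": "Nanjing", "new_users": 1234, "revenue": 56789.0},
    {"day_id": "20170101", "city": "Suzhou",  "new_users": 987,  "revenue": 43210.5},
]

df = pd.DataFrame(rows)
# One line to hand business users a file they can reshape however they like.
df.to_excel("daily_ops_report.xlsx", index=False)  # requires openpyxl
```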

Of course, these products may still appeal to small businesses, but in this open era, new demands and new technologies emerge one after another. Can standardized products keep up with the changes? What if you want to combine HBASE with BIEE - wait for the vendor to slowly release a version, or just do it yourself?

5. Multidimensional analysis - poor adaptability, customization is the direction.

I have used several commercial multidimensional analysis systems, also called OLAP, such as IBM's ESSBASE. OLAP is a concept proposed abroad decades ago: by slicing along various dimensions you can quickly get the results you need. But how much practical value does such OLAP really have?

OLAP products try to solve specialized analysis problems with a generalized tool, and that is a flaw from birth, because analysis is fickle. Would you rather roam freely with SQL in the back end, or perform fixed yet complicated multidimensional operations through a rigid interface? As a technical person I do not like using them, and business people do not like using them either - the operating threshold is too high.

In terms of openness, the back-end engine of traditional OLAP is still a traditional database, which obviously cannot support massive big data systems. Building a CUBE is a design exercise and very time-consuming, and every time the data is updated the CUBE has to be rebuilt, which used to drive me crazy; I do not know what has improved since. Tens of millions of rows with about ten dimensions is probably its performance limit. And finally, can a CUBE built in the past really answer the analysis questions you have today?
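To show what analysis without a pre-built CUBE can look like, here is a small pandas sketch that aggregates on demand across whatever dimensions the question needs; the data is invented for illustration:

```python
import pandas as pd

# Invented fact records: in a real system these would come from the big data store.
facts = pd.DataFrame({
    "month":   ["201701", "201701", "201702", "201702"],
    "region":  ["East",   "West",   "East",   "West"],
    "product": ["A",      "A",      "B",      "A"],
    "amount":  [100.0,    80.0,     120.0,    90.0],
})

# Pick the dimensions at question time instead of rebuilding a cube after every load.
by_region = pd.pivot_table(facts, values="amount",
                           index="month", columns="region", aggfunc="sum")
print(by_region)
```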

Taobao's Data Cube illustrates, to some extent, the direction OLAP is heading: provide specific multidimensional data solutions for specific business problems. What we need to give users is a purpose-built system whose experience, performance, and speed all hold up.

Business-oriented front ends plus customized back-end data solutions (built on various big data components) are the future direction of OLAP.

6. Mining platform - from samples to full data, the whole toolkit needs an upgrade

SPSS and SAS are powerful tools for traditional data mining, but most of the time they can only run sampled analysis on a PC. Full-volume analysis of big data - social networks, time series, and the like - is clearly beyond them.

Traditional data mining platforms do not seem to help much either. IBM DB2 once had a Data Miner but later gave it up. Teradata can do it and has its own algorithm library, but its computing power falls a whole grade short in the face of massive data. Most of the partners we work with have started to use SPARK as the standard suite for massively parallel algorithms.

Even for traditional algorithms such as logistic regression and decision trees, SPARK can obviously train on far more samples, or even the full data set, which beats SPSS and SAS tinkering on a PC by a wide margin.
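A minimal PySpark ML sketch of the kind of full-data training mentioned above: logistic regression on a distributed DataFrame instead of a sampled extract on a PC. The input path and column names are my assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("full_data_lr").getOrCreate()

# Assumed layout: one row per customer, numeric feature columns plus a 0/1 label.
data = spark.read.parquet("hdfs:///warehouse/features/churn/")  # placeholder path

assembler = VectorAssembler(inputCols=["tenure", "arpu", "complaints"],
                            outputCol="features")
train = assembler.transform(data)

# Train on the full data set, distributed across the cluster.
lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=20)
model = lr.fit(train)
print("coefficients:", model.coefficients)
```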

Traditional BI's SAS and SPSS are still useful, but full-volume algorithms on big data platforms should also come into BI's field of vision.

7. Data management - not keeping pace with the times is a dead end

Data management systems are hard to build: production systems survive perfectly well without them, their value is hard to quantify even when they exist, and the cost of operating and maintaining them is too high.

I first encountered metadata management systems around 2006-2007; working on metadata then felt quite forward-looking. Many years later I understood one truth: if you treat metadata as a plug-in, the metadata system has no chance of succeeding. The seemingly workable approach of recording metadata after the fact - no matter how complete the system or how powerful its analysis capabilities - ends with the source systems and the metadata drifting into two separate worlds, and the metadata loses the value it should have had.

As long as this problem is not solved, I seriously doubt that traditional BI metadata management can ever succeed. In the big data era, as data volumes, data types, and technical components keep multiplying, after-the-fact metadata becomes even more impossible.

What does a new-era data management system look like? First, it advocates that production is management: the rules of metadata management are built into the production process itself, systematically. We advocate document-free data development, because documents are metadata - every metadata requirement is codified into rules and becomes part of the data development environment. For example, when you create a table through the visual development interface, the table definition forces you to enter the necessary descriptions online, and the code you write follows conventions so that metadata can be parsed automatically and feed into data quality monitoring.
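A toy sketch of the "production is management" idea: the development environment refuses to create a table unless the required metadata is supplied up front. Everything here (the required keys, the DDL shape) is hypothetical and only illustrates the principle:

```python
REQUIRED_METADATA = ("owner", "description", "retention_days")


def create_table(name: str, columns: dict, metadata: dict) -> str:
    """Emit CREATE TABLE DDL only if the mandatory metadata is present,
    so metadata is captured at production time, not documented afterwards."""
    missing = [k for k in REQUIRED_METADATA if not metadata.get(k)]
    if missing:
        raise ValueError(f"table {name!r} rejected, missing metadata: {missing}")

    cols = ",\n  ".join(f"{col} {dtype}" for col, dtype in columns.items())
    comment = metadata["description"].replace("'", "''")
    return f"CREATE TABLE {name} (\n  {cols}\n) COMMENT '{comment}';"


ddl = create_table(
    "dw.daily_call_summary",
    {"day_id": "VARCHAR(8)", "city_id": "INT", "calls": "BIGINT"},
    {"owner": "bi_team", "description": "Daily call volume by city",
     "retention_days": 365},
)
print(ddl)
```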

Second, it can evaluate the benefits of data. Through the first point, data can be linked to applications, so the value of an application transfers back to the value of the data, giving us a standard for data value management. The most depressing thing in data work is building a model without knowing what it is worth: the work becomes dispensable, you do not know how to improve it, and hundreds of thousands of tables rot away somewhere that nobody dares to clean up.

Third, cross-platform management. With so many technical components - HADOOP, MPP, stream processing, and so on - the management system must connect to them seamlessly and access them transparently, and every new type of component must be brought under management promptly; otherwise, any component left out turns its data into orphaned data, and data management falls apart.

What data management fears most is a half-finished project. If it is to be systematized, it must be done thoroughly; otherwise you might as well stick to documents, since there would not be much difference.

8. Re-examining our positioning - BI should do BI's work, each playing its proper role

In traditional BI there are too many reports and too few research platforms and algorithms, too much repetitive work and too little creative work. As the business develops, the people in BI grow old, yet little of their work is left behind in the systems, which is a great pity.

With the arrival of the big data era, this has to change, and it is time to re-examine our positioning. Reports and data pulls are indeed BI's basic work, but BI people should not forever play the donkey turning the millstone. We should be the ones holding the reins: I can pull for a while, but I need to figure out how to pull faster, eventually let a machine pull for me, or make the grinding so easy that those who need the flour can grind it themselves.

BI people have plenty to innovate and learn. Too many ad hoc data pulls? Build a data-retrieval robot. Too many reports? Build an indicator system. Too many requests? Build self-service tools or tenant environments and entice business people to do it themselves. Demand is endless and appetite is never satisfied; a hole plugged by human effort alone can never be filled, and that is exactly where BI people need to lead.

How far can traditional BI go? I have raised eight points. People at different stages may understand them differently. Of course, this is only one person's opinion, and I hope it offers some food for thought.


