The Great Way Is Simple: A Data System Methodology

The Great Way Is Simple, a methodology for building a data system: two steps to create the central pillar of your data operations!

Many companies have realized that a systematic data system will be a central pillar of data operations. So what should a company do if it is not clear how to build its own data system? Drawing on years of experience, the author summarizes a simple methodology for the reader in plain language.

This article is the second in the "Data Operation Methodology" series. The first, "The Great Way Is Simple: Data Analysis Methodology," dealt with the problem of "not knowing how to analyze"; this one deals with the problem of "not knowing what to analyze." The first article was more microscopic and took the individual analyst's point of view; this one is broader and explains things at the company level.

Like "I do not know how to analyze," "I do not know what to analyze" is one of the most frequently asked questions. In fact, once you know the method, a data system still cannot be built overnight, but the path to building a sound one step by step becomes clear.

Like the first article, this one lays out the path to building a data system in the plainest language possible. Simply put: first sort out the data system, then implement it in a BI (business intelligence) system.

One. Sort out the data system from the top down

1. Define the goal

 

This is the first question to ask yourself. You are putting a great deal of effort into data analysis, but what is it ultimately for? If this is not made clear, the data system certainly cannot get started.

Is it to increase user activity, grow the user base, increase sales, or some other goal? Often the answer is that we want all of them. There is nothing wrong with wanting everything, but it makes the boundaries of the work sprawl, so that nothing can move forward. Therefore, start from the goal / KPI you care about most.

So, which goal do we most need to care about?

The answer differs by industry, by company stage and by user role. For many company bosses, profit is the goal they care about most. For organizations whose products or services are not sold, or for government bodies, customer satisfaction may be the goal of greatest concern. For e-commerce trading platforms or early-stage companies, profit is not the point; transaction volume is what matters most.

Once the goal of greatest concern is set, should we try to solve every problem at once? No. The biggest misunderstanding about big data is that the more data and the more fields, the better. When solving a specific business problem, however, we must cut a relevant subset out of the full set of big data and use that.

A single person, whether the owner or an executive, cannot focus on too many goals / KPIs at once. Watching dozens of KPIs at the same time is, as you can imagine, dizzying and time-consuming. Yet many businesses really do have many important KPIs. What to do? Break them down across collaborating roles: each role watches its own goals, and together the roles cover the full set of goals / KPIs.

Suppose the goal the boss cares about most is profit. Since profit = revenue - cost, this goal can be broken down so that the sales director focuses on revenue and the director of operations focuses on costs. This is not to say that the boss cannot look at revenue, only that the goals each person watches routinely should stay within a practical range.

2. Break the goal down into indicators

With the goal identified, the next step is to break it down into the relevant indicators.

Which indicators need to be monitored or analyzed to reach the goal? Take profit: the related indicators are revenue and cost, but that is of course too coarse; the categories of revenue and the categories of cost should also be taken into account. Retail sales, for example, can be decomposed into traffic, store entry rate, purchase rate, average order value and repurchase rate.

There are many ways to decompose a goal, but all of them need to follow the MECE principle (mutually exclusive, collectively exhaustive).
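The retail decomposition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the multiplicative form and every figure are hypothetical, and the repurchase rate is left out because how it enters the formula depends on how a business defines it.

```python
# Hypothetical sketch: sales expressed as a chain of separately measurable indicators.
# All names and numbers are illustrative assumptions, not figures from the article.

def retail_sales(traffic, entry_rate, purchase_rate, avg_order_value):
    """Return sales as traffic x entry rate x purchase rate x average order value."""
    visitors = traffic * entry_rate        # people who actually enter the store
    buyers = visitors * purchase_rate      # visitors who buy something
    return buyers * avg_order_value        # money those buyers spend


sales = retail_sales(
    traffic=10_000,        # passers-by
    entry_rate=0.25,       # share of passers-by who enter
    purchase_rate=0.125,   # share of visitors who buy
    avg_order_value=80.0,  # average spend per buyer
)
print(sales)
```

Each factor covers a distinct stage of the funnel and together they account for the whole sales figure, which is the MECE property the decomposition is meant to have.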

3. Refine down to fields

For each indicator, work out its formula, which fields it involves, which tables in which databases those fields live in, whether data cleansing is needed, and what the cleansing rules are.

Take the purchase rate: it is calculated with the formula "number of buyers / number of people entering the store," and the number of buyers is obtained by counting "customer ID." Which field of which table in which database each indicator's fields correspond to needs to be sorted out clearly. This part requires the involvement and cooperation of IT staff or the database administrator.
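As a rough sketch of that formula, the snippet below counts distinct customer IDs on both sides of the ratio. The record layout and the field name "customer_id" are assumptions made for illustration; in a real system they would come from the field mapping agreed with IT.

```python
# Hypothetical records; "customer_id" stands in for whatever field the real tables use.
visits = [
    {"customer_id": "C1"},
    {"customer_id": "C2"},
    {"customer_id": "C3"},
    {"customer_id": "C1"},  # the same customer entering twice counts once
    {"customer_id": "C4"},
]
orders = [
    {"customer_id": "C1", "amount": 30.0},
    {"customer_id": "C1", "amount": 12.0},  # two orders, still one buyer
    {"customer_id": "C3", "amount": 50.0},
]


def distinct_customers(rows):
    """Collect the distinct customer IDs that appear in a list of records."""
    return {row["customer_id"] for row in rows}


def purchase_rate(visit_rows, order_rows):
    """Number of buyers / number of people entering the store, both counted by ID."""
    visitors = distinct_customers(visit_rows)
    buyers = distinct_customers(order_rows) & visitors  # only buyers who also visited
    if not visitors:
        return 0.0  # avoid dividing by zero when there were no visits
    return len(buyers) / len(visitors)


rate = purchase_rate(visits, orders)
print(rate)
```

Counting distinct IDs rather than rows is the design choice that matters here: counting order rows would inflate the rate whenever one customer places several orders.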

4. Non-functional requirements

Once the three steps above are complete, the indicator system can be considered sorted out and ready for implementation. But for the data system to end up more complete, friendly and usable, some non-functional requirements also need to be sorted out.

UI: what display style is preferred. This may not look important, but users will work with the data system every day, and an attractive, well-designed UI gives them a better experience.

Page flow: which related indicators should be placed on the same report page, what the hierarchy between pages is, and how users jump between pages.

Permissions: who can see which range of data, and who can see which fields and indicators. Unified access control is needed to avoid data security problems.

ETL: how often data is synchronized from the data sources into the system, and under what rules.

Integration: whether the system needs to integrate with other systems at the level of interfaces, alert messages and so on.

Performance: invisible, but it directly determines whether the system is usable. If it takes minutes or even tens of minutes to see a result when the data volume is large, no one will be willing to use the system.
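Of the requirements above, permissions are the easiest to make concrete. The sketch below is a minimal illustration, not a description of any particular BI product: the policy table, role names and sample rows are all hypothetical. It restricts both the data range (which rows) and the visible fields (which columns) per role.

```python
# Hypothetical access policy: which regions (rows) and fields (columns) each role may see.
POLICIES = {
    "sales_director": {"regions": {"North", "South"}, "fields": {"region", "revenue"}},
    "ops_director": {"regions": {"North"}, "fields": {"region", "cost"}},
}

# Hypothetical report data.
ROWS = [
    {"region": "North", "revenue": 100, "cost": 60},
    {"region": "South", "revenue": 80, "cost": 50},
]


def visible_rows(role, rows):
    """Apply one central policy: filter rows by data range, then columns by field."""
    policy = POLICIES[role]
    return [
        {name: value for name, value in row.items() if name in policy["fields"]}
        for row in rows
        if row["region"] in policy["regions"]
    ]


sales_view = visible_rows("sales_director", ROWS)
ops_view = visible_rows("ops_director", ROWS)
print(sales_view)
print(ops_view)
```

Keeping the rules in one table rather than scattering them across reports is what "unified access control" amounts to in practice.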

5. System implementation plan

With the four steps above complete, we have a "data operation system requirements document / plan" and can move on to implementing the data operation system. The workload and schedule can then be estimated from the number of report pages, the complexity of data preparation and so on.

Two. Implement the BI system from the bottom up

1. Connect the data

Following the requirements document / implementation plan, the system is built step by step. Some companies call this system a big data platform, others call it a BI system. A big data platform is the broader category, but for enterprise data operations, BI is inevitably at the core.

Whether the system is developed in-house or built quickly on a third-party tool such as Yonghong Tech's, the first step is to connect each data source and open up the paths between the data sources.

Enterprise data environments are often heterogeneous: data sources may include databases, Hadoop-family platforms, Excel files, log files, NoSQL databases and third-party interfaces. Each kind of data source needs a fast, friendly way to connect.

In the end, all the required tables and fields from every data source should be visible in the system.

2. Data processing

Data in the sources is often irregular to some degree: duplicate records, missing or null values, patently unreasonable outliers (for example, orders recorded as executed in 2020, which was still in the future at the time of writing), and cases where the same thing exists in the system under multiple names.

If the data is not processed, or "washed" as it is often called, the accuracy of the analysis will be significantly affected, so preprocessing is necessary. This step is often the most time-consuming and the most tedious, but it is also very important.
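The four kinds of problem listed above can each be handled by a small rule. The following is a minimal sketch using only the Python standard library; the sample rows, the alias table, the choice of "order_id" as the duplicate key and the reference date are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical raw rows showing each kind of problem.
RAW = [
    {"order_id": 1, "city": "Beijing", "order_date": date(2019, 3, 1), "amount": 100.0},
    {"order_id": 1, "city": "Beijing", "order_date": date(2019, 3, 1), "amount": 100.0},  # duplicate
    {"order_id": 2, "city": "BJ", "order_date": date(2019, 3, 2), "amount": 50.0},        # alias
    {"order_id": 3, "city": "Shanghai", "order_date": date(2019, 3, 3), "amount": None},  # null
    {"order_id": 4, "city": "Shanghai", "order_date": date(2020, 1, 1), "amount": 70.0},  # future date
]

# Hypothetical mapping from alternative names to one canonical name.
ALIASES = {"BJ": "Beijing"}


def clean(rows, today):
    """Drop duplicates, set aside null or future-dated rows, and unify names."""
    seen = set()
    kept, rejected = [], []
    for row in rows:
        if row["order_id"] in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add(row["order_id"])
        if row["amount"] is None or row["order_date"] > today:
            rejected.append(row)  # set aside for review rather than silently fixing
            continue
        kept.append({**row, "city": ALIASES.get(row["city"], row["city"])})
    return kept, rejected


# The reference date is an assumption standing in for "the day the job runs".
kept, rejected = clean(RAW, today=date(2019, 8, 25))
print([row["order_id"] for row in kept])
print([row["order_id"] for row in rejected])
```

Rejected rows are returned instead of discarded so that someone can decide whether they are errors or real exceptions; whether to drop, repair or flag such rows is itself one of the cleansing rules that needs to be agreed.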

Author's note: this topic will be discussed in more depth in the next article, "The Great Way Is Simple: Data Governance Methodology."

3. Data Modeling

With the data processed, the next step is data modeling.

Mention modeling and non-technical users tend to be daunted, finding it profound and incomprehensible. But what actually is the model that gets built? Simply put, associating multiple tables with each other gives you a data model.

For example, a company doing performance analysis needs indicators for each employee such as length of service, education, number of projects, project amount and project profitability. Length of service and education are in the personal information table, number of projects and project profitability are in the project table, and project amount is in the finance table. These three tables share a common field, "employee number"; associating the three tables through this field yields a data model for the performance analysis topic.
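The performance analysis model can be sketched with an in-memory SQLite database from the Python standard library. The table names, column names and sample values are hypothetical; the only thing taken from the example is that three tables are associated through a shared employee number.

```python
import sqlite3

# In-memory database with three hypothetical tables sharing "employee_no".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personal_info (employee_no TEXT PRIMARY KEY, years_of_service INTEGER, education TEXT);
CREATE TABLE projects (employee_no TEXT, project_count INTEGER, profit_margin REAL);
CREATE TABLE finance (employee_no TEXT, project_amount REAL);
INSERT INTO personal_info VALUES ('E001', 5, 'Bachelor'), ('E002', 2, 'Master');
INSERT INTO projects VALUES ('E001', 3, 0.25), ('E002', 1, 0.5);
INSERT INTO finance VALUES ('E001', 120000.0), ('E002', 40000.0);
""")

# The "data model": the three tables associated through the common field.
rows = conn.execute("""
SELECT p.employee_no, p.years_of_service, p.education,
       j.project_count, j.profit_margin, f.project_amount
FROM personal_info AS p
JOIN projects AS j ON j.employee_no = p.employee_no
JOIN finance AS f ON f.employee_no = p.employee_no
ORDER BY p.employee_no
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Once the join is defined, every indicator in the performance report can be read from a single combined row per employee, which is all that "modeling" means in this context.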

4. Build data reports

Based on the data model, we can start building data reports.

The data model supplies the underlying data fields. Combine them into formulas as required, display them with appropriate chart types, place related indicators on the same report page, and configure the hierarchy and jump relationships between pages.

[Figure: demo built on Yonghong Tech's one-stop big data analysis platform]

5. Implement the non-functional requirements

After step 4, the data system has basically taken shape; what remains is to implement each of the non-functional requirements above. With that, a comprehensive, friendly and usable data operation system goes live.

Going live is not the end of the work. As requirements change or new business arrives, the system must be able to adjust and iterate quickly at any time, so data processing, modeling, report building and other operations need to be highly tool-driven to keep them flexible and configurable. This is where the advantage of third-party tools over in-house development is especially evident.

After all, the purpose of working with data is either to improve management (cutting costs) or to drive business innovation (opening up new revenue). A systematic data system will be a central pillar of data operations.

 

 

 


Origin www.cnblogs.com/zwt20120701/p/11408827.html