Data science exploration process (CRISP-DM)

CRISP-DM (Cross-Industry Standard Process for Data Mining) provides a complete description of the process for a KDD (knowledge discovery in databases) project. The model divides a KDD project into six distinct phases, and the order of the phases is not strictly fixed.

  1. Business Understanding

    In this first phase we must understand, from a business perspective, the project's requirements and its ultimate goals, and then tie those goals to the definition of the data mining problem and its expected results.

    The main work includes: determining the business objectives; identifying the important factors that affect the outcome; describing the customer's primary objectives from a business perspective; assessing the situation, which means taking stock of all resources, constraints and assumptions, and of the other factors to weigh when setting the data analysis goals and the project plan, including risks and contingencies, terminology, and costs and benefits; and finally determining the data mining goals and drawing up the project plan.

  2. Data Understanding

    The data understanding phase starts with data collection. The next step is to become familiar with the data: checking its volume, forming a first impression of it, and probing for interesting subsets, so that hypotheses about the hidden information can be formed. The tasks are to collect the raw data, load it, describe it, explore its characteristics with simple summary statistics, and check its quality, including completeness, correctness, and how missing values should be filled.

  3. Data Preparation

    The data preparation phase covers all the work of building the final data set (the input to the modeling tools) from the raw data. Data preparation tasks may be performed several times, and in no prescribed order. The main tasks of this phase include selecting tables, records and variables, transforming them, and cleaning the data so that it meets the needs of the modeling tools.

    The data to analyze is selected according to its relevance to the mining goals, its quality, and the technical constraints. The selected data is then cleaned and further transformed: derived variables are constructed, data sources are integrated, and the data is formatted as the tools require.

  4. Modeling

    In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically, several techniques are available for the same type of data mining problem. If multiple techniques are used, this task is carried out separately for each of them. Some modeling techniques have specific requirements on the form of the data, so it is often necessary to go back to the data preparation phase.

  5. Evaluation

    From a data analysis perspective, one or more models that appear to be of high quality have been built by this phase. Before the final deployment, however, it is important to evaluate the model more thoroughly and to review each step carried out in building it, to make sure the model actually achieves the business objectives. A key goal is to determine whether any important business issue has not been given sufficient consideration. At the end of this phase, a decision on how to use the data mining results should be reached.

  6. Deployment

    Deployment means organizing what was discovered, and the process that produced it, into a readable form. Creating the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge about the data, that knowledge still needs to be organized and presented in a way the customer can use. This often involves applying "live" models within an organization's decision-making process, for example real-time personalization of web pages or repeated scoring of marketing databases.

    Depending on the requirements, the deployment phase can be as simple as writing a report or as complex as implementing a repeatable data mining process across the enterprise. In many cases it is the customer, not the data analyst, who carries out the deployment phase. Even when the data analyst does not handle the deployment work, it is important for the customer to understand in advance which activities must be carried out in order to use the built model correctly.
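As a rough illustration, the phases that follow business understanding can be sketched end to end in plain Python. Everything in this sketch is an assumption made for illustration and does not come from CRISP-DM itself: the toy customer records, the field names `spend` and `churned`, the mean imputation, and the single-cutoff rule standing in for a model. It only shows how the phases hand work to each other, not a recommended modeling approach.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
RAW = [
    {"spend": 120.0, "churned": 0},
    {"spend": 30.0, "churned": 1},
    {"spend": None, "churned": 0},
    {"spend": 200.0, "churned": 0},
    {"spend": 45.0, "churned": 1},
    {"spend": 90.0, "churned": 0},
    {"spend": 20.0, "churned": 1},
    {"spend": 160.0, "churned": 0},
]


def profile(rows, field):
    """Data understanding: row count, missing count and mean of one field."""
    present = [r[field] for r in rows if r[field] is not None]
    return {"rows": len(rows), "missing": len(rows) - len(present), "mean": mean(present)}


def prepare(rows, field):
    """Data preparation: fill missing values with the mean of the observed ones.

    The mean is computed over all rows for brevity; the input is not modified.
    """
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def accuracy(rows, field, cutoff):
    """Evaluation: share of rows where 'value below cutoff' matches the label."""
    hits = sum((r[field] < cutoff) == bool(r["churned"]) for r in rows)
    return hits / len(rows)


def fit_cutoff(rows, field, candidates):
    """Modeling: try each candidate cutoff and keep the most accurate one."""
    return max(candidates, key=lambda c: accuracy(rows, field, c))


def score(record, field, cutoff):
    """Deployment: apply the chosen rule to one new record (1 = flagged)."""
    return int(record[field] < cutoff)


clean = prepare(RAW, "spend")
train, holdout = clean[:6], clean[6:]  # simple split: first six rows for fitting
cutoff = fit_cutoff(train, "spend", candidates=[50.0, 100.0, 150.0])

print(profile(RAW, "spend"))
print("cutoff:", cutoff, "holdout accuracy:", accuracy(holdout, "spend", cutoff))
print("new customer flagged:", score({"spend": 40.0}, "spend", cutoff))
```

In practice the loop is rarely this linear: a poor holdout result in the evaluation step would send the work back to `prepare` or `fit_cutoff`, which matches the point that the order of the phases is not fixed.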

Origin www.cnblogs.com/JasonBUPT/p/11610469.html