A Peek at Data Mining Through a House Price Forecast Case

Data mining is the process of searching large amounts of data for useful information and then using that information to support decisions.

Data mining reveals unknown relationships in data and how they are likely to develop. Its main role is prediction, and it draws on computer technology, statistics, and model algorithms.

Model algorithms fall into classification, regression, and clustering algorithms, and each type contains a number of different algorithms. Classification, for example, includes logistic regression, naive Bayes, and decision trees. The programming languages used include Java and Python. Does that all sound very specialized and complicated? Today we recommend an easy-to-use tool: Smartbi Mining, a standalone product launched by Smartbi and designed to provide predictive analysis for decisions made by individuals, teams, and companies.

Smartbi Mining has a process-oriented, visual modeling interface with built-in practical, classical statistical mining algorithms and deep learning algorithms, and it supports extending the algorithms in Python. It is based on distributed cloud computing, and models can be published to the unified Smartbi platform, where they integrate with the BI platform.

A simple drag and drop is enough to complete a forecast, which is very convenient. Below we use the "Boston house price prediction" data as an example to take a peek at how data mining works.

1. Smartbi Mining interface

Click "Create a machine learning project" at the top right of the machine learning management interface. You can then mine the sample data source by building a data flow and performing operations on it (create a directory, set a file name).

The far left is the node tree: it contains all the nodes that have already been developed.

The middle region is the main working area, where you drag nodes in and connect them to each other.

The right side holds the parameter configuration and properties of the selected node.

2. Data mining process

A standard data mining process applies scientific processing and forecasting to the data in order to discover the patterns hidden in the data itself. The specific process is as follows:

Step 1: Business understanding. Clarify the objectives and the analysis requirements.

Step 2: Data preparation. Collect the raw data, check data quality, integrate the data, and format the data.

Step 3: Modeling. Select a modeling technique, tune the parameters, generate a test plan, and build the model.

Step 4: Model evaluation. Assess the model comprehensively, evaluate the results, and review the process.

3. Case presentation

Case background: house prices are something we have always been very concerned about. Whether you work in real estate or are a consumer preparing to buy, a reasonable assessment of the price trend can be beneficial.

Data preparation: house price prediction depends mainly on the fields contained in the sample data, which we have already preprocessed in advance.

Modeling: the objective of this case is to predict future house prices. Because price is a continuous value, a regression algorithm (linear regression) is selected to train the model.
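
To make "linear regression on a continuous value" more concrete, here is a minimal Python sketch of an ordinary least squares fit with a single feature. It is not how Smartbi Mining implements its linear regression node; the function name `fit_simple_linear_regression` and the `rooms`/`price` numbers are made up for illustration only.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of the cross products of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical, made-up data: one feature (rooms) and a continuous target (price).
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_simple_linear_regression(rooms, price)
predicted = slope * 6.5 + intercept  # continuous prediction for an unseen value
print(slope, intercept, predicted)
```

The point of the sketch is that the output is a number on a continuous scale rather than a class label, which is why a regression algorithm fits this case.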

1) Select the data source. This example uses the "Boston house price prediction" sample data source.

2) Data processing.

If the field names in the data are unclear, they can be renamed in the "Metadata editing" preprocessing node. When a lot of fields are involved, not all of them need an alias. The sample data has already been processed, so no further operation is needed here.

3) Algorithm model.

Note that at present the algorithm flow requires the features to be specified as input. There are two ways to do this: one is feature selection, the other is chi-square feature selection. The difference between the two is that if you already know which fields have the greatest influence, you can choose them directly with feature selection; if you are not sure which columns to select, use chi-square feature selection and set the number of most influential columns to keep.

Drag in the feature selection node.
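
As a rough illustration of the idea behind chi-square feature selection, the Python sketch below scores each categorical column against a categorical label with the textbook chi-square statistic and keeps the top `k` columns. How Smartbi Mining computes this internally is not described here, so treat this as an assumption-laden sketch: the column names, the `price_band` label (a continuous price binned into "high"/"low"), and the helper names are all hypothetical.

```python
from collections import Counter


def chi_square_score(feature, label):
    """Chi-square statistic of independence between two categorical columns."""
    n = len(feature)
    observed = Counter(zip(feature, label))  # missing pairs count as 0
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n
            score += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, label, k):
    """Return the names of the k columns with the highest chi-square score."""
    ranked = sorted(
        columns,
        key=lambda name: chi_square_score(columns[name], label),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical, made-up categorical data.
columns = {
    "near_river": ["y", "y", "n", "n", "y", "n"],
    "old_building": ["y", "n", "y", "n", "y", "n"],
}
price_band = ["high", "high", "low", "low", "high", "low"]

selected = select_top_k(columns, price_band, k=1)
print(selected)
```

Here `k` plays the role of the "number of columns" setting: you do not name the columns yourself, you only say how many of the highest-scoring ones to keep.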

In "Feature selection", remove the latitude and longitude fields as well as the field to be predicted (Rate). All the remaining fields take part in the prediction as relevant fields.

Select the algorithm node: Regression algorithms - Linear regression.

The algorithm node requires a training process and a validation process, so the data must be split into a training set and a test set. After the split, the linear regression algorithm can be trained, and the trained model then needs to be validated.

Model evaluation

Connect the left input of the prediction node to the trained model and the right input to the test data set. The predictions are then assessed by the evaluation node.
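
The split itself is simple to picture. The following Python sketch shuffles the rows and holds a fraction back as the test set; the function name `train_test_split`, the 20% ratio, and the fixed seed are assumptions for the example, not Smartbi Mining defaults.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and return (train, test) lists."""
    shuffled = rows[:]  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical rows; in practice each row would be one record of the data set.
rows = list(range(10))
train, test = train_test_split(rows, test_ratio=0.2)
print(len(train), len(test))
```

The model is trained only on `train`; `test` is kept aside so that the validation step measures how well the model handles data it has not seen.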

Look at the results of the evaluation node, mainly the r2 (R-squared) value.
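
For readers who have not met r2 before, this minimal Python sketch computes it with the standard definition, 1 minus the residual sum of squares divided by the total sum of squares. The numbers are made up, and the function name `r2_score` is only chosen for the example.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical, made-up values.
actual = [1.0, 2.0, 3.0, 4.0]

perfect = r2_score(actual, actual)       # predictions equal to the actual values
baseline = r2_score(actual, [2.5] * 4)   # always predicting the mean of the actual values
print(perfect, baseline)
```

A value close to 1 means the predictions explain most of the variation in the actual prices; a value near 0 means the model does no better than always predicting the average.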

The predicted price may come out negative, so we need to handle that case with a derived column.

New derived column: NewPrice. Add/edit the expression:

case when prediction < 0 then 0.5 else case when prediction > 6 then 6 else prediction end end

Predictions below 0 are uniformly set to the fixed value 0.5, and predictions that are too large (above 6) are uniformly set to the fixed value 6.
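
The same rule can be written in Python as a cross-check of the expression. The function name `new_price` and the sample predictions are made up for the example.

```python
def new_price(prediction):
    """Mirror of the derived-column expression: below 0 becomes 0.5, above 6 becomes 6."""
    if prediction < 0:
        return 0.5
    if prediction > 6:
        return 6
    return prediction


# Hypothetical predictions, including the boundary values 0 and 6.
predictions = [-1.2, 0.0, 3.4, 6.0, 7.9]
clipped = [new_price(p) for p in predictions]
print(clipped)
```

Note that the comparisons are strict, so a prediction of exactly 0 or exactly 6 passes through unchanged; only values below 0 are replaced by 0.5.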

You can use a graphical analysis to view the effect. (Statistical analysis - High-dimensional data visualization)

After running the flow, right-click to view the analysis result, choose the chart type (parallel coordinates plot), and drag the original price and the new predicted price to the X-axis region.

The resulting plot shows that the forecast does not deviate greatly from the original prices.

That is the process of predicting house prices with data mining. In the visual interface, every step of the data mining process is carried out by dragging nodes and configuring their parameters (properties). Even high-end functionality can be operated in a down-to-earth way.

Reproduced from: https://www.jianshu.com/p/415899597f3e

Origin blog.csdn.net/weixin_34082789/article/details/91271099