Introduction to classification and prediction in data mining

Classification and prediction are two ways of using data to make predictions, and both can be used to determine future outcomes.

Classification predicts the class of discrete data objects: the attribute value to be predicted is discrete and unordered.

Prediction predicts values for continuous data objects: the attribute value to be predicted is continuous and ordered.

For example, in banking, deciding from a loan applicant's information whether the applicant falls into the "safe" category or the "risky" category is a data mining classification task. Estimating the loan amount to grant to the applicant is a data mining prediction task.

This section introduces commonly used classification and prediction methods. Some algorithms can be used only for classification or only for prediction, while others can be used for both.

The basic concept of classification

A classification algorithm captures how to find the characteristics that things of the same type have in common and the characteristics that distinguish different types of things. A classification model is built by supervised learning on training data and is then used to classify unknown instances. The output attribute (the category) is discrete and unordered.

Classification technology has applications in many fields. For example, customer segmentation is currently a very important aspect of marketing. Using classification techniques from data mining, customers can be divided into different categories.

For example, a classification model can be built to classify customers for bank loan risk assessment. A call center can divide its customers into those who call frequently, those who occasionally make a large number of calls, those with a stable call volume, and others, which helps the call center find the characteristics that distinguish these types of customers. Such a classification model lets users understand how behavior is distributed across the different categories of customers.

Other applications of classification include automatic text classification in literature retrieval and search engines, and classification-based intrusion detection in the security field.

Classification learns from an existing data set (the training set) a target function f (the model) that maps each attribute set X to a target attribute y (the class). The value of y must be discrete.

Classification is a two-step process: the first step is the model-building phase, also known as the training phase, and the second step is the evaluation phase.

1) Training phase

The purpose of the training phase is to build a classification model that describes a predefined set of data classes or concepts. This phase requires selecting part of the known data set as the training set for building the model, with the remaining part used as the test set. Typically, 2/3 of the known data items are selected as the training set and the remaining 1/3 as the test set.

The training data set consists of a set of data tuples, each of which is assumed to belong to a previously specified category. The training phase can be seen as learning a mapping function: for a given tuple x, the mapping function predicts its class label. The mapping function is the model (also called a classifier) obtained from the training data set, as shown in Figure 1. The model can be expressed as classification rules, decision trees, mathematical formulas, and so on.
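The 2/3 and 1/3 split described above can be sketched in Python as follows. This is only a minimal illustration: the function name split_train_test, the fixed random seed, and the sample records are assumptions made for the example, not part of the original text.

```python
import random


def split_train_test(records, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the records and split it into (training set, test set)."""
    shuffled = list(records)
    # A fixed seed keeps the split reproducible; shuffling avoids taking
    # the first 2/3 of a data set that may be sorted in some way.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set of nine records, identified here only by an index.
records = list(range(9))
train, test = split_train_test(records)
print(len(train), len(test))
```

Every record ends up in exactly one of the two sets, so the test set stays independent of the data used to build the model.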

Figure 1 Training phase of a classification algorithm
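As a rough sketch of the training phase, the following Python code learns a single classification rule from labeled tuples. The data (an income figure paired with a "safe" or "risky" label), the function name train_threshold_rule, and the choice of a one-threshold rule are all assumptions made for illustration; real classifiers such as decision trees learn far richer models.

```python
def train_threshold_rule(training_set):
    """Learn the rule 'value >= threshold -> safe, otherwise risky'.

    training_set is a list of (value, label) pairs. The threshold is chosen
    from the observed values so that it misclassifies the fewest training tuples.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({value for value, _ in training_set}):
        errors = sum(
            ("safe" if value >= threshold else "risky") != label
            for value, label in training_set
        )
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors

    def classify(value):
        # The learned model: a mapping from an input value to a class label.
        return "safe" if value >= best_threshold else "risky"

    return classify


# Hypothetical training tuples: (annual income in thousands, known class label).
training_set = [
    (20, "risky"), (25, "risky"), (30, "risky"),
    (40, "safe"), (55, "safe"), (60, "safe"),
]
classify = train_threshold_rule(training_set)
print(classify(70), classify(10))  # safe risky
```

The returned classify function plays the role of the mapping function: given a tuple with an unknown label, it outputs a discrete class.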

2) Evaluation phase

In the evaluation phase, the model built in the first phase is used to classify the tuples of the test set in order to assess the prediction accuracy of the classification model, as shown in Figure 2.

The accuracy of a classifier is the percentage of tuples in a given test set that the classifier classifies correctly. If the accuracy is considered acceptable, the classifier can be used to classify data tuples whose class labels are unknown.

Figure 2 Evaluation phase of a classification algorithm
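The accuracy calculation used in the evaluation phase can be sketched as below. The classifier here is a hand-written stand-in (a fixed income threshold of 40), and the test tuples are made up for the example; only the accuracy formula itself, correctly classified tuples divided by all test tuples, comes from the text.

```python
def accuracy(classifier, test_set):
    """Return the fraction of test tuples whose predicted label matches the known label."""
    correct = sum(1 for value, label in test_set if classifier(value) == label)
    return correct / len(test_set)


def classifier(income):
    # Hypothetical model assumed to come from an earlier training phase.
    return "safe" if income >= 40 else "risky"


# Hypothetical test tuples that were not used for training.
test_set = [(15, "risky"), (45, "safe"), (35, "safe"), (80, "safe")]
print(accuracy(classifier, test_set))  # 0.75
```

Three of the four test tuples are classified correctly, so the accuracy is 75%. Whether that is acceptable depends on the application.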

The basic concept of prediction

A prediction model is similar to a classification model and can be viewed as a mapping or function y = f(x), where x is the input tuple and y is a continuous or ordered output value. Unlike a classification algorithm, a prediction algorithm predicts attribute values that are continuous and ordered, whereas classification predicts attribute values that are discrete and unordered.

Like classification, prediction in data mining is a two-step process. In a prediction task, the test data set and the training data set should be independent of each other. Prediction accuracy is assessed by the difference between the predicted value of y and its actual known value.

The difference between classification and prediction is that classification predicts the class label of a data object, whereas prediction estimates missing or unknown values. For example, predicting whether the Shanghai index's closing price tomorrow will be up or down is classification, whereas predicting what tomorrow's closing price will be is prediction.
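The two steps of a prediction task can be sketched as follows. The text does not name a specific algorithm or error measure, so this example assumes simple least-squares linear regression for the model and mean absolute error for the evaluation; the data points and function names are invented for illustration.

```python
def fit_line(training_set):
    """Fit y = slope * x + intercept to (x, y) pairs by ordinary least squares."""
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    sxx = sum((x - mean_x) ** 2 for x, _ in training_set)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_absolute_error(model, test_set):
    """Average absolute difference between predicted and actual y values."""
    return sum(abs(model(x) - y) for x, y in test_set) / len(test_set)


# Step 1: build the model from the training set (hypothetical values).
training_set = [(1, 2.0), (2, 4.0), (3, 6.0)]
model = fit_line(training_set)

# Step 2: evaluate on an independent test set.
test_set = [(4, 9.0), (5, 10.0)]
mae = mean_absolute_error(model, test_set)
print(mae)  # 0.5
```

Here the output is a continuous number rather than a class label, so the evaluation measures how far the predictions are from the known values instead of counting correct labels.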
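The stock index example can be made concrete by deriving both kinds of target from the same history. The closing prices below are invented, and the function name make_targets is hypothetical; the sketch only shows how the target attribute differs between the two tasks, not how to forecast a market.

```python
def make_targets(closes):
    """Build classification labels and prediction targets for each next-day close."""
    # Classification target: a discrete label comparing the next close to the current one.
    # A next close that is not higher (including an unchanged one) is labeled "down" here.
    class_labels = [
        "up" if nxt > cur else "down" for cur, nxt in zip(closes, closes[1:])
    ]
    # Prediction target: the continuous next-day closing value itself.
    value_targets = closes[1:]
    return class_labels, value_targets


# Hypothetical daily closing prices of an index.
closes = [3000.0, 3012.5, 3005.0, 3020.0]
class_labels, value_targets = make_targets(closes)
print(class_labels)   # ['up', 'down', 'up']
print(value_targets)  # [3012.5, 3005.0, 3020.0]
```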



Origin blog.csdn.net/yuidsd/article/details/92418178