Standardization and normalization (integrated)

part1:

【转】https://blog.csdn.net/weixin_40165004/article/details/89080968

Weka data preprocessing (a)

For data mining, we tend to focus only on substantial mining algorithms, such as classification, clustering, association rules, while ignoring the quality of data mining to be, but the quality of data in order to produce high-quality mining results, otherwise only " Garbage in garbage out "of. Ensure the quality of data to be an important step is the data preprocessing (Data Pre-Processing), in the actual operation, data preparation phase throughout the mining process can often take up to 6-8 times. In this paper, data preprocessing methods weka tool for some introduction.

 

Weka data preprocessing known as data filtering, they can be found in weka.filters in. The properties of the filter algorithm can be divided into supervised (SupervisedFilter) and unsupervised (UnsupervisedFilter). For the former, the filter needs to set a class attribute, consider the distribution of the attribute data set and the like, to determine the optimum number and size of the container; the latter class attributes may be absent. Meanwhile, the filtering algorithm in turn attributed to the attribute-based (attribute) and instance-based (instance). The method is mainly based on the attribute column for processing, such as adding or deleting columns; based on the example of a method for processing a main line, for example, add or delete rows.

Data filtering mainly addresses the following issues (commonplace):

Data missing values, standardization and discretized.

The value of missing data processing: weka.filters.unsupervised.attribute.ReplaceMissingValues. For numeric attribute, in place of the missing value with the average value for nominal attributes, with its MODE (most frequent value) to replace missing values.

Standardization (standardize): Class weka.filters.unsupervised.attribute.Standardize. Given dataset normalized values of all attributes values to a zero mean and unit variance of a normal distribution .

Normalization (Nomalize): Class weka.filters.unsupervised.attribute.Normalize. Normalization given dataset except for all values ​​of the attribute values, the attribute class. The default value results in the interval [0,1], but with zooming and panning parameters, we can attribute numerical values ​​to any specification interval. Such as: However, scale = 2.0, translation = -1.0, you can regulate the property value to the interval [-1, + 1].

Discrete (discretize): Class weka.filters.supervised.attribute.Discretize and weka.filters.unsupervised.attribute.Discretize. Properties were numerically supervised and unsupervised of discretization, for some numerical discretization dataset to categorical attributes.


part2:

【转】https://blog.csdn.net/u014381464/article/details/81101551

Normalization:
for database
normalization of the relations to meet the regulatory requirements are divided into several stages to meet the requirements of the lowest in first normal form (1NF), is again the second paradigm, the third paradigm, BC paradigm and 4NF, 5NF etc., norm the higher the level, the more stringent constraint set of conditions are met.

For data
normalized data includes normalization standardization regularization, it is a general term (as was also the standardization collectively).

Data normalization of the data transformation is a data mining, data conversion or the data is converted into a form suitable unified data mining, excavation attribute data will be scaled object, it falls within a certain small interval , such as [-1, 1] or [0, 1]

Normalized attribute values ​​commonly used for classification and clustering algorithms and neural network algorithm involves among distance metric. When classifying mining propagation algorithm such as the use of neural networks, a measure of the value of every property on the standardized training tuple help speed up the learning phase speed. A method for dissimilarity measure based on the distance data can be normalized so that all the attributes have the same weight value.

Data There are three common methods of normalization: minimum and maximum normalization, z-score normalization and standardization of fractional scaling

 

Standardization (standardization):
Data normalized data is scaled so that it falls within a small interval, the normalized data can be positive or negative, but the absolute value is generally not too large, generally z-score normalization method : after subtracting the expected divided by the standard deviation.

 

Normalization (normalization):

The scaling value to the inter-cell of 0-1 (attributed to the category of the digital signal processing), a general method is the largest minimum standard methods: min-max normalization

 

 


 

part.3

matlab function on

 

 


other:

weka algorithm source code gets way: https: //blog.csdn.net/renyiniki/article/details/79668870

1. First, download the official website weka source, there are two ways, one is to download the installation file, the installation will have a weka-src.jar in the installation directory, decompress after for the source, and the other is by downloading SVN:  https://svn.cms.waikato.ac.nz/svn/weka/trunk/weka   need tools such as SVN on the machine: TortoiseSVN

2. Import myeclipse

2.1 Create a working directory. Create a new directory weka.

2.2 Prepare the source code. Weka weka-src.jar found in the installation directory, extract to just build directory.

2.3 to create a project. Open the MyEclipse, File-> New-> Java Project, Project name fill weka, select create project from existing source, click next, click finish.

2.4 compile and run. Weka select the project you just created, run as Java Application, waiting for the pop-up dialog box, select the main class, weka.gui.main (main input to see). Soon, weka interface.

 

 

 

 

 

 

 

Guess you like

Origin www.cnblogs.com/rinroll/p/11986350.html