Big Data: Common Data Mining Techniques Worth Learning for Beginners

Are you interested in big data development and want to know more about data mining? In this article I share some common data mining techniques with fellow big data developers. I hope it helps.

1, statistical techniques

Data mining draws on many fields of science and technology, and statistics is one of them. The main idea of statistical data mining is this: assume that a given data set follows some distribution or probability model (for example, a normal distribution), and then mine the data using methods appropriate to that model.
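As a minimal sketch of this idea, assuming the data are roughly normal, the Python code below estimates a normal model's mean and standard deviation and then uses the fitted model to flag values that lie far from the mean. The order counts, the function names, and the two-standard-deviation cutoff are illustrative assumptions, not taken from the original article.

```python
import statistics

def fit_normal(values):
    """Estimate the parameters of a normal model: mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def flag_outliers(values, z_threshold=2.0):
    """Return values lying more than z_threshold standard deviations from the mean."""
    mu, sigma = fit_normal(values)
    return [v for v in values if abs(v - mu) / sigma > z_threshold]

# Hypothetical daily order counts; 250 looks unusual under a normal model.
orders = [102, 98, 101, 97, 103, 99, 100, 250]
print(flag_outliers(orders))
```

Note that the outlier itself inflates the estimated standard deviation. With more data or a robust estimator, the cutoff becomes more reliable.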

2, association rules

Data association is an important kind of knowledge that can be discovered in a database. If the values of two or more variables show some regularity, that regularity is called an association. Associations can be divided into simple associations, temporal associations, and causal associations. The purpose of association analysis is to uncover the network of associations hidden in a database. Often we do not know the association function in the data, and even when we do, it is uncertain, so the rules produced by association analysis carry a confidence level.
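Association rules are commonly scored by support (how often the items occur together) and confidence (how often the right-hand item appears when the left-hand item does). Confidence is the number the rules above carry. The sketch below assumes a small, hypothetical set of shopping baskets and enumerates single-item rules between item pairs by brute force. Real systems usually use algorithms such as Apriori or FP-Growth instead.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def rules(transactions, min_support=0.4, min_confidence=0.6):
    """Enumerate rules A -> B over item pairs that meet both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence = P(rhs | lhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
for lhs, rhs, s, c in rules(baskets):
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

Here "bread -> butter" is dropped because only half of the bread baskets also contain butter, while "butter -> milk" holds in every butter basket.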

3, Memory-based reasoning (MBR) analysis

First find similar past situations based on experience, then apply what is known about those situations to the present case. This is the essence of MBR (memory-based reasoning). MBR first finds the neighbors most similar to a new record, then uses those neighbors to classify the new record or estimate its value. Using MBR raises three main problems: finding the right historical data; deciding the most effective way to represent that historical data; and choosing the distance function, the combination function, and the number of neighbors.
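A minimal k-nearest-neighbor sketch of MBR follows. The three choices above map onto the hypothetical parameters `distance` (distance function), `combine` (combination function), and `k` (number of neighbors). The customer records and labels are made up for illustration.

```python
import math
import statistics
from collections import Counter

def euclidean(a, b):
    """Distance function: straight-line distance between two feature vectors."""
    return math.dist(a, b)

def majority_vote(labels):
    """Combination function for classification: the most common label wins."""
    return Counter(labels).most_common(1)[0][0]

def mbr_classify(history, record, k=3, distance=euclidean, combine=majority_vote):
    """Find the k historical cases closest to record and combine their outcomes."""
    neighbors = sorted(history, key=lambda case: distance(case[0], record))[:k]
    return combine([outcome for _, outcome in neighbors])

# Hypothetical historical cases: (age, monthly spend) -> churn label.
history = [
    ((25, 40.0), "stay"),
    ((30, 35.0), "stay"),
    ((28, 45.0), "stay"),
    ((60, 5.0), "churn"),
    ((55, 8.0), "churn"),
    ((58, 3.0), "churn"),
]
print(mbr_classify(history, (57, 6.0)))  # classification by majority vote

# Value estimation: predict spend from age by averaging the neighbors' values.
spend_history = [((age,), spend) for (age, spend), _ in history]
print(mbr_classify(spend_history, (57,), combine=statistics.mean))
```

The two features are on very different scales and are not normalized here, which is exactly the "representation" question above. In practice you would usually rescale them first.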

4, Genetic algorithms (GA)

Genetic algorithms are optimization techniques based on the theory of evolution, designed around genetic combination, genetic mutation, and natural selection. The main idea follows the principle of survival of the fittest: the fittest rules in the current population, together with their offspring, form a new population of rules. Typically, a rule's fitness is evaluated by its classification accuracy on the training sample set.
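The toy genetic algorithm below is a sketch, assuming a hypothetical one-dimensional training set and rules of the single form "if x >= threshold then class 1". As described above, fitness is the rule's classification accuracy on the training set. Selection keeps the fitter half, crossover blends two parent thresholds, and mutation adds Gaussian noise. The encoding and all parameter values are illustrative assumptions.

```python
import random

# Hypothetical training set: (feature value, class label).
TRAIN = [(1.0, 0), (2.0, 0), (2.5, 0), (3.0, 0), (6.0, 1), (6.5, 1), (7.0, 1), (8.0, 1)]

def fitness(threshold):
    """Classification accuracy of the rule 'x >= threshold -> class 1' on TRAIN."""
    correct = sum(1 for x, y in TRAIN if (x >= threshold) == (y == 1))
    return correct / len(TRAIN)

def crossover(a, b):
    """Combine two parent rules; with a single numeric gene, blend them."""
    w = random.random()
    return w * a + (1 - w) * b

def mutate(t, rate=0.3, scale=1.0):
    """With probability rate, perturb the threshold with Gaussian noise."""
    return t + random.gauss(0, scale) if random.random() < rate else t

def evolve(pop_size=20, generations=30, seed=0):
    random.seed(seed)
    population = [random.uniform(0, 10) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # survival of the fittest
        children = [mutate(crossover(*random.sample(survivors, 2)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children  # next generation: parents plus offspring
    return max(population, key=fitness)

best = evolve()
print(f"threshold={best:.2f} accuracy={fitness(best):.2f}")
```

Because the survivors carry over unchanged, the best fitness never decreases from one generation to the next.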

5, Cluster detection

Clustering is the process of grouping a set of physical or abstract objects into multiple classes made up of similar objects. A cluster is a collection of data objects that are similar to the other objects in the same cluster and dissimilar to the objects in other clusters. Dissimilarity is computed from the attribute values that describe the objects, and a distance metric is often used.
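k-means is one common clustering algorithm that uses a distance metric as its dissimilarity measure. The sketch below applies it to a handful of hypothetical 2-D points, assuming Euclidean distance and a known number of clusters `k`.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Group points into k clusters by alternating assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Dissimilarity = Euclidean distance; assign p to the nearest centroid.
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

k-means needs `k` up front and can settle in a local optimum on harder data, so it is common to try several seeds.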

6, link analysis

Link analysis is grounded in graph theory. The idea borrowed from graph theory is to look for algorithms that reliably produce good, though not perfect, results rather than algorithms that search for the perfect solution. Link analysis applies the same thinking: if an imperfect result is workable, the analysis is a good one. With link analysis, patterns can be extracted from the behavior of some users, and the resulting concepts can then be applied to a broader user population.
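Link analysis covers many methods. PageRank, computed by power iteration, is shown here as one well-known example chosen for illustration, not necessarily the method the article has in mind. It also reflects the "good enough" idea above: rather than solving for exact scores, the loop stops once the scores change by less than a tolerance. The user graph and all parameter values are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-6, max_iter=100):
    """Approximate PageRank by power iteration; stop when scores change by less than tol."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        new = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Each node passes its score evenly along its outgoing links.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node (no outgoing links): spread its score over all nodes.
                for t in nodes:
                    new[t] += damping * rank[node] / n
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

# Hypothetical "who recommended whom" links between users.
links = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol"],
}
scores = pagerank(links)
for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.3f}")
```

Carol ranks highest because three users link to her, while Dave, whom nobody links to, keeps only the baseline score.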

7, decision tree

A decision tree provides a way to express rules of the form "under these conditions, this value results."
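The following is a minimal decision-tree sketch, assuming numeric features and Gini impurity as the split criterion (the CART approach; the article does not name a specific algorithm). After training, the tree is flattened into readable "IF condition THEN value" rules. The purchase data and feature names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when all labels agree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (score, feature, threshold) with the lowest weighted Gini impurity."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until a node is pure, unsplittable, or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the conditions from the root until a leaf value is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

def to_rules(tree, names, conditions=()):
    """Flatten the tree into (condition, value) rules."""
    if not isinstance(tree, dict):
        return [(" AND ".join(conditions), tree)]
    name, t = names[tree["feature"]], tree["threshold"]
    return (to_rules(tree["left"], names, conditions + (f"{name} <= {t}",))
            + to_rules(tree["right"], names, conditions + (f"{name} > {t}",)))

# Hypothetical data: (income in thousands, age) -> bought the product?
rows = [(20, 25), (35, 40), (50, 30), (80, 45), (90, 35), (60, 50)]
labels = ["no", "no", "yes", "yes", "yes", "yes"]
tree = build_tree(rows, labels)
for condition, value in to_rules(tree, ["income", "age"]):
    print(f"IF {condition} THEN {value}")
```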

Origin blog.51cto.com/14296550/2407731