How to start learning big data mining and analysis from zero?

Recently, many people have asked me about learning big data: they want to learn, but do not know how to get started, where to begin, or what they need to learn. For a beginner, what is the logical path for learning big data mining and analysis? This article lays out the steps for learning big data mining and analysis from zero, as a suggested route you can follow.


Many people believe that doing data mining requires mastering complex, advanced algorithms and software development skills. In fact, this is not the case. If you bury yourself in complex algorithms and development techniques, you are likely to get stuck, find the learning harder and harder, and see little result. In actual work, the best big data mining engineers are the people who know and understand the business best. Based on my own experience of learning big data mining, I believe it must be learned together with a real business background, using cases set in that background; this is a problem-oriented way of learning. So what are the classic cases of big data mining analysis? In general, they include the following:

  1. Predict whether a product's users will churn in the coming period, and what the churn rate will be;
  2. The company has run a promotion: how to estimate its effect and how well users will respond to it;
  3. Assess the credit quality of users;
  4. Segment the existing customer market: which customers are really the target audience;
  5. After a product goes to market, what is the conversion rate, and which operating strategy is the most effective;
  6. Operations has done a lot of work and the company has invested a lot of resources: how to improve the product's input-output ratio;
  7. When users buy many items, which items have a high probability of being purchased together;
  8. Predict product sales and revenue for the coming year...

What big data mining does is turn business operation problems like the ones above into data mining problems.

First, how to turn business operation problems into big data mining problems

So the question is: how do we turn the business operation problems above into data mining problems? Data mining problems can be subdivided into four types: classification, clustering, association, and prediction.

1, classification problem

Churn, promotion response, and user assessment are classification problems in data mining. We need to know the characteristics of classification, understand what supervised learning is, and master the common classification methods: decision trees, Bayesian methods, KNN, support vector machines (SVM), neural networks, logistic regression, and so on.
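As a rough illustration of supervised classification, here is a minimal KNN sketch written with numpy. The churn data, the two features (monthly logins and support tickets), and the function name knn_predict are all made up for this example; they are not from any real product.

```python
import numpy as np

# Made-up toy data: [monthly logins, support tickets] for eight users.
X_train = np.array([
    [30, 0], [25, 1], [28, 0], [22, 1],   # users who stayed
    [3, 4], [5, 3], [2, 5], [6, 4],       # users who churned
], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = stayed, 1 = churned


def knn_predict(X_train, y_train, x_new, k=3):
    """Label x_new by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]                  # indices of the k closest
    votes = np.bincount(y_train[nearest])                # count votes per label
    return int(np.argmax(votes))


active_user = knn_predict(X_train, y_train, np.array([27.0, 1.0]))
inactive_user = knn_predict(X_train, y_train, np.array([4.0, 4.0]))
print(active_user, inactive_user)  # prints: 0 1
```

The labels in y_train are what make this supervised learning: the model only copies the known answers of similar users.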

2, clustering problem

Market segmentation and customer segmentation are clustering problems in data mining. We need to grasp the characteristics of clustering, understand what unsupervised learning is, and know the common clustering algorithms, such as partition-based clustering, hierarchical clustering, density-based clustering, grid-based clustering, and model-based clustering.
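To make partition-based clustering concrete, here is a small k-means sketch in numpy. The eight customer points are invented, and the initialization (simply taking the first k points) is a simplification chosen to keep the example deterministic; library implementations use more careful initialization.

```python
import numpy as np


def kmeans(X, k, n_iter=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = X[:k].copy()  # simplification: start from the first k points
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids


# Made-up customers described by two already-scaled features.
X = np.array([
    [1, 1], [1, 2], [2, 1], [2, 2],       # low-value group
    [9, 9], [9, 10], [10, 9], [10, 10],   # high-value group
], dtype=float)

labels, centroids = kmeans(X, k=2)
print(labels)     # prints: [0 0 0 0 1 1 1 1]
print(centroids)  # centroids end up at (1.5, 1.5) and (9.5, 9.5)
```

No labels are given to the algorithm; the two groups emerge from the distances alone, which is what distinguishes unsupervised from supervised learning.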

3, association problem

Cross-selling is an association problem. Association analysis is also called market basket analysis. We need to grasp the common association analysis algorithms: the Apriori algorithm, the Carma algorithm, and sequence algorithms.
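The two basic measures behind market basket analysis, support and confidence, can be sketched in plain Python. This is only the counting step that algorithms such as Apriori build on, not a full Apriori implementation, and the five baskets are made up.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions; each set is one shopping basket.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

# How many baskets contain each item, and each (alphabetically sorted) pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)


def confidence(antecedent, consequent):
    """Among baskets containing the antecedent, the fraction that also contain the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print(support("bread", "milk"))     # 3 of 5 baskets: 0.6
print(confidence("bread", "milk"))  # 3 of the 4 bread baskets: 0.75
```

A rule such as "bread implies milk" is usually kept only if both its support and its confidence exceed thresholds chosen by the analyst.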

4, prediction problem

For prediction problems, we must grasp simple linear regression, multiple linear regression, time series analysis, and so on.
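A simple linear regression can be sketched with numpy's polyfit. The monthly sales figures are invented and deliberately noise-free (sales = 2 * month + 10), so the fitted line can be checked by hand; real data would not fit this cleanly.

```python
import numpy as np

# Made-up, noise-free data: sales grow by 2 units per month from a base of 10.
months = np.array([1, 2, 3, 4, 5, 6], dtype=float)
sales = np.array([12, 14, 16, 18, 20, 22], dtype=float)

# Fit a degree-1 polynomial: sales ≈ slope * month + intercept.
slope, intercept = np.polyfit(months, sales, deg=1)

# Use the fitted line to predict the next month.
forecast_month_7 = slope * 7 + intercept
print(round(slope, 6), round(intercept, 6), round(forecast_month_7, 6))  # 2.0 10.0 24.0
```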

Second, which tools to use for hands-on big data mining

There are many tools and ways to do data mining: SPSS, SAS, Python, R, and others will all do. But which ones do we need to master, and how many, before we can say we have learned data mining? That depends on the level you are at and the path you want to follow.

First level: reaching an entry-level understanding

Understanding statistics and databases is enough.

Second level: reaching the junior workplace level

Statistics + databases + SPSS (or a substitute for SPSS)

Third level: reaching the intermediate workplace level

SAS or R

Fourth level: reaching the level of a data mining engineer

SAS or R + Python (or other programming language)

Third, how to learn big data mining with Python

As long as it solves the practical problem, it does not matter which data mining tool you learn; here I recommend Python first. So how do you learn data mining with Python? What Python knowledge do you need to master?

1, working with the pandas library

pandas is a particularly important data analysis library. We need to grasp the following three points:

  • pandas grouped computation;
  • pandas indexes and multi-level indexes;

Indexing is harder, but it is very important.

  • pandas multi-table operations and pivot tables
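The three points above can be tried out on a tiny example. The orders and users tables below are made up; groupby, merge, and pivot_table are standard pandas calls.

```python
import pandas as pd

# Made-up order and user tables.
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "category": ["book", "toy", "book", "book", "toy"],
    "amount": [20.0, 15.0, 30.0, 10.0, 25.0],
})
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "city": ["Beijing", "Shanghai", "Beijing"],
})

# 1) Grouped computation: total spend per user.
spend_per_user = orders.groupby("user_id")["amount"].sum()

# 2) Multi-level index: grouping by two keys yields a MultiIndex,
#    and a tuple selects one (user, category) combination.
by_user_category = orders.groupby(["user_id", "category"])["amount"].sum()
user2_books = by_user_category.loc[(2, "book")]

# 3) Multi-table operation (join) followed by a pivot table.
merged = orders.merge(users, on="user_id", how="left")
pivot = merged.pivot_table(index="city", columns="category",
                           values="amount", aggfunc="sum", fill_value=0)

print(spend_per_user)
print(user2_books)  # 40.0
print(pivot)
```

In the pivot table, Beijing has 20 for books and 40 for toys, while Shanghai has 40 for books and 0 for toys, because fill_value replaces the missing combination.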

2, numerical computing with numpy

numpy is mainly used for numerical computation in data mining. For later machine learning and deep learning it is also a library you must master. We need to master the following:

  • Understanding numpy arrays;
  • Array indexing;
  • Array computation;
  • Broadcasting (knowledge from linear algebra)
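A short sketch of the items above: creating an array, indexing it, computing on it, and broadcasting. The array contents are arbitrary.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns, values 0..11

# Indexing: a single element, a whole column, and a boolean mask.
element = a[1, 2]        # row 1, column 2 -> 6
second_col = a[:, 1]     # every row, column 1 -> [1 5 9]
evens = a[a % 2 == 0]    # 1-D array of the even values

# Array computation: a reduction along axis 0 gives one mean per column.
col_means = a.mean(axis=0)  # [4. 5. 6. 7.], shape (4,)

# Broadcasting: the (4,) vector is applied to each of the 3 rows.
centered = a - col_means

print(element, second_col, evens)
print(centered)
```

Broadcasting lets the shape (3, 4) array and the shape (4,) vector combine without an explicit loop or a copied matrix of means.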

3, data visualization: matplotlib and seaborn

  • matplotlib syntax

The basic visualization tool in Python is matplotlib. At first glance matplotlib looks a bit like MATLAB; figuring out how the two are related makes it easier to learn.

  • Using seaborn

seaborn is a very attractive visualization tool.

  • pandas plotting

As mentioned before, pandas is for data analysis, but it also provides some plotting APIs.
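As a small sketch of the first and third items, the same made-up revenue table is drawn once with plain matplotlib and once through the pandas plotting API. The Agg backend and the in-memory buffer are choices made here so the script runs without a display; seaborn is left out only to keep the example short.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly revenue.
sales = pd.DataFrame({"month": [1, 2, 3, 4], "revenue": [10, 14, 13, 18]})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Plain matplotlib: call plotting methods on an Axes object.
ax_left.plot(sales["month"], sales["revenue"], marker="o")
ax_left.set_xlabel("month")
ax_left.set_ylabel("revenue")
ax_left.set_title("matplotlib")

# pandas plotting API: DataFrame.plot draws onto the Axes passed as ax.
sales.plot(x="month", y="revenue", kind="bar", ax=ax_right, legend=False)
ax_right.set_title("pandas")

fig.tight_layout()
buffer = io.BytesIO()               # replace with a filename to write to disk
fig.savefig(buffer, format="png")
```

Because pandas draws onto ordinary matplotlib Axes, everything learned about matplotlib (titles, labels, layout) carries over to pandas plots.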

4, introduction to data mining

This is the most interesting part and also the hardest. We need to master the following:

  • The definition of machine learning

Here you need to distinguish it from data mining.

  • The definition of the cost function
  • Train/Test/Validate
  • The definition of overfitting and how to avoid it
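Two of the items above, the cost function and the train/validate/test split, can be sketched in a few lines of numpy. Mean squared error is used as one common cost function; the helper names mse_cost and split_indices and the 60/20/20 proportions are assumptions made for this example.

```python
import numpy as np


def mse_cost(y_true, y_pred):
    """Mean squared error: one common cost function for regression."""
    return float(np.mean((y_true - y_pred) ** 2))


def split_indices(n, train=0.6, validate=0.2, seed=0):
    """Shuffle 0..n-1 and cut it into train / validation / test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = round(n * train)
    n_val = round(n * validate)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # prints: 60 20 20

# The cost compares predictions with known answers: errors of 0, 0 and 2.
cost = mse_cost(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))
print(cost)  # (0 + 0 + 4) / 3
```

A typical sign of overfitting is a cost that keeps falling on the training indices while it rises on the validation indices; the test indices are held back for a final check.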

5, data mining algorithms

Data mining has developed to the point where there are a great many algorithms. It is enough to grasp the simplest, most central, and most commonly used ones:

  • Least squares algorithm;
  • Gradient descent;
  • Vectorization;
  • Maximum likelihood estimation;
  • Logistic Regression;
  • Decision Tree;
  • Random Forest;
  • XGBoost;
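The first three items in the list fit together in one small sketch: the same line is fitted once by closed-form least squares and once by vectorized gradient descent. The data is made up and noise-free (y = 3x + 2), and the learning rate and iteration count are values picked for this example rather than general recommendations.

```python
import numpy as np

# Made-up, noise-free data from y = 3x + 2, so the true answer is known.
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 2.0
X = np.column_stack([np.ones_like(x), x])  # design matrix with an intercept column

# Least squares solved directly.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent on the mean squared error, written without Python loops
# over samples: one matrix expression computes the whole gradient.
theta_gd = np.zeros(2)
learning_rate = 0.5
for _ in range(1000):
    gradient = (2.0 / len(y)) * X.T @ (X @ theta_gd - y)
    theta_gd -= learning_rate * gradient

print(np.round(theta_ls, 4))  # intercept 2, slope 3
print(np.round(theta_gd, 4))  # the same values, reached iteratively
```

Both routes minimize the same cost; gradient descent matters because it still works for models such as logistic regression, where no closed-form solution exists.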

6, data mining in practice

Understand the models through scikit-learn, the best-known library in machine learning.
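A minimal scikit-learn workflow might look like the sketch below: generate data, split it, fit two of the models named earlier, and score them on held-out data. The synthetic dataset from make_classification stands in for real business data, and the parameter values are arbitrary choices for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real labeled dataset.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=0)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # learn from the training rows
    scores[name] = model.score(X_test, y_test)   # accuracy on unseen rows

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

Every scikit-learn estimator follows the same fit and score pattern, so swapping in another model usually means changing one line of the models dictionary.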

The above sorts out a logical path for learning big data mining. However, this is just the beginning. On the road to becoming a data scientist or data mining engineer, you also need to learn text and natural language processing, Linux and Spark, deep learning, and so on, and you need to keep a sustained interest in learning data mining.


Origin www.cnblogs.com/dashjunih/p/11008837.html