Practical Techniques for Python Data Mining and Machine Learning

This course analyzes the practical experience and programming skills needed to apply machine learning, and uses real cases to show how to identify innovation points and publish high-level papers. It aims to build mastery of the basic principles and Python implementations of: Python programming; feature engineering (data cleaning, variable dimensionality reduction, feature selection, swarm optimization algorithms); regression fitting (linear regression, BP neural networks, extreme learning machines); classification (KNN, Bayesian classification, support vector machines, decision trees, random forests, AdaBoost, XGBoost, LightGBM, etc.); cluster analysis (K-means, DBSCAN, hierarchical clustering); and association analysis (association rules, collaborative filtering, the Apriori algorithm).


Module 1: Python Programming (Laying the Foundation)

Pre-class learning content; detailed materials are provided by Ai Shang Training.

Introduction to Python programming
1. Setting up the Python environment (download, installation, and version selection)
2. How to choose a Python editor (IDLE, Notepad++, PyCharm, Jupyter, ...)
3. Python basics (data types and variables, strings and encodings, lists and tuples, conditionals, loops, defining and calling functions, etc.)
4. Common errors and program debugging
5. Installing and using third-party modules
6. Reading and writing files (I/O)
7. Practical exercises

Python advancement and improvement
1. The NumPy module (installing NumPy; the ndarray type, its attributes, and array creation; array indexing and slicing; common NumPy functions and their use)
2. The Pandas module (the DataFrame data structure; table transformation, sorting, concatenation, merging, group-by operations, etc.)
3. Basic plotting with Matplotlib (line charts, bar charts, pie charts, bubble charts, histograms, box plots, scatter plots, etc.)
4. Styling and beautifying figures (colors, line styles, etc.)
5. Figure layout (drawing multiple subplots, regular and irregular layouts, adding axes anywhere on the canvas)
6. Advanced charts (3-D plots, contour plots, stem plots, dumbbell charts, funnel charts, treemaps, waffle charts, etc.)
7. Advanced axis usage (axes sharing a plotting area, tick-style settings, controlling axis visibility, moving the axis position)
8. Practical exercises
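As a taste of the NumPy material above, here is a minimal illustrative sketch (not from the course materials) of ndarray creation, attributes, indexing, slicing, and common aggregation functions:

```python
import numpy as np

# Create a 2-D ndarray and inspect its attributes
a = np.arange(12).reshape(3, 4)
print(a.shape)         # (3, 4)
print(a.ndim)          # 2

# Indexing and slicing: second row, last two columns
print(a[1, -2:])       # [6 7]

# Common functions: column-wise mean and overall sum
print(a.mean(axis=0))  # [4. 5. 6. 7.]
print(a.sum())         # 66
```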

Module 2: Feature Engineering

Data cleaning

1. Descriptive statistical analysis (frequency analysis: histograms; central tendency: arithmetic mean, geometric mean, mode; dispersion: range, interquartile range, mean absolute deviation, standard deviation, coefficient of variation; distribution shape: skewness coefficient and kurtosis; correlation analysis: correlation coefficients)
2. Data standardization and normalization (why are they needed?)
3. Handling outliers and missing values
4. Data discretization and encoding
5. Manually constructing new features
6. Practical exercises
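The standardization, normalization, and missing-value steps above can be sketched in a few lines of NumPy (an illustrative example with made-up data, not the course's own code):

```python
import numpy as np

# A small feature column with one missing value and one extreme value
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 100.0])

# Fill the missing entry with the median of the observed values
x_filled = np.where(np.isnan(x), np.nanmedian(x), x)

# Z-score standardization: zero mean, unit standard deviation
z = (x_filled - x_filled.mean()) / x_filled.std()

# Min-max normalization: rescale to the [0, 1] interval
x_norm = (x_filled - x_filled.min()) / (x_filled.max() - x_filled.min())

print(x_filled)                    # nan replaced by 4.0 (median of 1, 2, 4, 5, 100)
print(x_norm.min(), x_norm.max())  # 0.0 1.0
```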

Variable dimensionality reduction

1. The basic principle of principal component analysis (PCA)

2. The basic principle of partial least squares (PLS)

3. Case Practice

4. Practical exercises
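A minimal PCA sketch (illustrative only, with synthetic data) showing the usual steps: center the data, eigen-decompose the covariance matrix, and project onto the leading components:

```python
import numpy as np

# Toy data: three variables, the third nearly a copy of the first,
# so two principal components capture almost all the variance
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=100)

# 1) Center the data, 2) eigen-decompose the covariance matrix
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
order = np.argsort(eigvals)[::-1]        # largest variance first
explained = eigvals[order] / eigvals.sum()

# 3) Project onto the top two principal components
scores = Xc @ eigvecs[:, order[:2]]
print(explained.round(4))                # first two ratios sum to nearly 1.0
```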

Feature selection

1. Common feature selection methods (optimized search; Filter and Wrapper methods; forward and backward selection; interval methods; uninformative variable elimination; sparse regularization methods, etc.)

2. Case Practice

3. Practical exercises
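As one concrete instance of the Filter methods mentioned above, features can be ranked by their absolute Pearson correlation with the target. The synthetic data below is an illustrative assumption:

```python
import numpy as np

# Synthetic data: only columns 0 and 3 actually drive the target
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=200)

# Filter method: score each feature by |Pearson correlation| with y,
# then keep the top-scoring features
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
selected = np.argsort(scores)[::-1][:2]
print(sorted(selected.tolist()))  # → [0, 3]
```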

Swarm optimization algorithms

1. Basic principles of the genetic algorithm (GA) (What is the core idea behind the swarm optimization algorithms that the genetic algorithm represents? What are the differences and connections between the genetic algorithm and particle swarm optimization, the dragonfly algorithm, the bat algorithm, simulated annealing, etc.?)

2. Python code implementation of genetic algorithm

3. Case Practice 1: optimizing a univariate function

4. Case Practice 2: optimizing discrete variables (feature selection)

5. Practical exercises
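A toy real-coded genetic algorithm for the univariate case (Case Practice 1) might look like the sketch below. The test function, population size, and operator settings are illustrative assumptions, not the course's actual case:

```python
import math
import random

# Classic univariate test problem: maximize f(x) = x*sin(10*pi*x) + 2 on [-1, 2]
def fitness(x):
    return x * math.sin(10 * math.pi * x) + 2.0

random.seed(0)
LOW, HIGH, POP, GENS = -1.0, 2.0, 60, 80
pop = [random.uniform(LOW, HIGH) for _ in range(POP)]
best = max(pop, key=fitness)

for _ in range(GENS):
    # Tournament selection: each parent is the best of 3 random individuals
    parents = [max(random.sample(pop, 3), key=fitness) for _ in range(POP)]
    children = []
    for i in range(0, POP, 2):
        a, b = parents[i], parents[i + 1]
        w = random.random()  # arithmetic crossover
        children += [w * a + (1 - w) * b, (1 - w) * a + w * b]
    # Gaussian mutation, clipped back into the search interval
    pop = [min(HIGH, max(LOW, c + random.gauss(0, 0.05))) if random.random() < 0.2 else c
           for c in children]
    pop[0] = best                # elitism: never lose the best-so-far
    best = max(pop, key=fitness)

print(round(best, 3), round(fitness(best), 3))  # best lies near the right edge of the interval
```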

Module 3: Regression Fitting Model

Linear regression models

1. Simple and multiple linear regression models (estimating regression parameters, significance testing of the regression equation, residual analysis)

2. Ridge regression model (working principle, choosing the ridge parameter k, variable selection with ridge regression)

3. LASSO model (working principle, feature selection, modeling prediction, hyperparameter adjustment)

4. Elastic Net model (working principle, modeling prediction, hyperparameter adjustment)

5. Case Practice

6. Practical exercises
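To make the OLS-versus-ridge contrast concrete, a small NumPy sketch (synthetic data, illustrative only) fits both via the normal equations and shows ridge's shrinkage effect:

```python
import numpy as np

# Synthetic data with known coefficients [2, -1, 0.5] plus a little noise
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

# Ordinary least squares via the normal equations (first column = intercept)
Xb = np.column_stack([np.ones(len(X)), X])
beta_ols = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Ridge regression: adding k*I to the normal equations shrinks the coefficients
k = 5.0
beta_ridge = np.linalg.solve(Xb.T @ Xb + k * np.eye(Xb.shape[1]), Xb.T @ y)

print(beta_ols.round(2))  # close to [0, 2, -1, 0.5]
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True: ridge shrinks
```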

Feedforward neural networks

1. Basic principles of the BP neural network (What twists and turns has artificial intelligence experienced in its development? How are artificial neural networks classified? What are the topology and training process of a BP neural network? What is gradient descent? What is the essence of BP neural network modeling?)

2. Python implementation of a BP neural network (How should the training and test sets be split? Why is normalization needed, and is it always necessary? What are gradient explosion and gradient vanishing?)

3. Tuning BP neural network parameters (How should the number of hidden-layer neurons, the learning rate, and the initial weights and thresholds be set? What is cross-validation?)

4. Several issues worth studying (underfitting and overfitting, designing generalization performance metrics, class imbalance, etc.)

5. Working principle of Extreme Learning Machine (ELM)

6. Case demonstration

7. Practical exercises
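The forward and backward passes of a BP network can be sketched from scratch in NumPy. The XOR toy problem, layer sizes, and learning rate below are illustrative assumptions, not the course's case study:

```python
import numpy as np

# Tiny 2-4-1 BP network trained on XOR with plain batch gradient descent
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
lr = 0.5
losses = []
for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(((out - y) ** 2).mean()))
    # Backward pass: chain rule through both layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(losses[0], losses[-1])  # the loss should drop substantially
```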

Module 4: Classification Recognition Model

KNN, Bayesian classification, and support vector machines

1. The KNN classification model (the core idea of the KNN algorithm; choosing a distance metric; choosing the value of K; choosing the classification decision rule)

2. Naive Bayes classification models (Bernoulli naive Bayes BernoulliNB, categorical naive Bayes CategoricalNB, Gaussian naive Bayes GaussianNB, multinomial naive Bayes MultinomialNB, complement naive Bayes ComplementNB)

3. Working principles of the SVM (What problem does the SVM essentially solve? What are the four typical structures of an SVM? What is the role of the kernel function? What is a support vector?); extended SVM knowledge (How are multi-class problems handled? Besides building models, what else can SVMs help us do?)

4. Case Practice

5. Practical exercises
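The core idea of KNN above (majority vote among the K nearest neighbors under a chosen distance metric) fits in a few lines. The two-blob dataset is an illustrative assumption:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Two well-separated Gaussian blobs as a toy two-class problem
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(knn_predict(X, y, np.array([0.1, 0.2])))  # → 0
print(knn_predict(X, y, np.array([2.9, 3.1])))  # → 1
```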

Decision Trees, Random Forests, LightGBM, XGBoost

1. Working principles of the decision tree (the inspiration from Microsoft XiaoIce; What are information entropy and information gain? What are the differences and connections between the ID3 and C4.5 algorithms? Besides building models, what else can decision trees help us do?)

2. Working principles of the random forest (Why is the random forest algorithm needed? What does "random forest" refer to in the broad and narrow senses? Where is the "randomness" reflected? What is the essence of the random forest? How can random forest results be visualized and interpreted?)

3. The difference and connection between Bagging and Boosting

4. How AdaBoost and Gradient Boosting work

5. Commonly used GBDT algorithm framework (XGBoost, LightGBM)

6. Case Practice

7. Practical exercises
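Information entropy and information gain, the quantities behind ID3, can be computed directly. The "play tennis" split below is the textbook example, used here only for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon information entropy H = -sum p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction after splitting `labels` into `groups`."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

# Toy split: 'play tennis' labels split by outlook
labels = ['yes'] * 9 + ['no'] * 5
sunny = ['yes', 'yes', 'no', 'no', 'no']
overcast = ['yes'] * 4
rain = ['yes', 'yes', 'yes', 'no', 'no']
gain = information_gain(labels, [sunny, overcast, rain])
print(round(gain, 3))  # → 0.247, the classic ID3 gain for 'outlook'
```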

Module 5: Cluster Analysis Algorithms

K-means, DBSCAN, hierarchical clustering

1. Working principle of K-means clustering algorithm

2. Working principle of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm

3. Working principle of hierarchical clustering algorithm

4. Case explanation

5. Practical exercises
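A from-scratch sketch of Lloyd's K-means algorithm (illustrative only; the farthest-point initialization and two-blob data are simplifying assumptions, not the course's method):

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    # Farthest-point initialization: start at X[0], then repeatedly
    # pick the point farthest from all chosen centers
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids as cluster means
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Two well-separated blobs near (0, 0) and (5, 5)
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels, centers = kmeans(X, 2)
print(sorted(centers[:, 0].round(1).tolist()))  # centroids near x ≈ 0 and x ≈ 5
```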

Module 6: Association Analysis Algorithms

Association rules, collaborative filtering, and the Apriori algorithm

1. Working principle of association rule algorithm

2. Working principle of collaborative filtering algorithm

3. Working principle of Apriori algorithm

4. Case explanation

5. Practical exercises
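A minimal frequent-itemset miner illustrating the Apriori property (every subset of a frequent itemset must itself be frequent). The basket data is a toy assumption:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining based on the Apriori property."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    items = sorted({i for t in sets for i in t})
    frequent = {}
    current = [frozenset([i]) for i in items]  # L1 candidates: single items
    k = 1
    while current:
        # Count support of each candidate; keep those above the threshold
        counts = {c: sum(c <= t for t in sets) for c in current}
        level = {c: v / n for c, v in counts.items() if v / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemset candidates
        keys = list(level)
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
        k += 1
    return frequent

baskets = [['milk', 'bread'], ['milk', 'diaper', 'beer'],
           ['milk', 'bread', 'diaper'], ['bread', 'diaper', 'beer']]
freq = apriori(baskets, min_support=0.5)
print(freq[frozenset(['milk', 'bread'])])  # → 0.5
```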

Module 7: Summary and Q&A; Information Retrieval and Common Research Tools

1. How to access Google, YouTube, and other websites without obstruction (Google Access Assistant, VPNs, etc.)

2. How to search the literature, and how to keep track of the latest papers?

3. How to use Google Scholar and ResearchGate

4. Where can the data and code accompanying a paper be found?

5. Use of literature management tools (Endnote, Zotero, etc.)

6. How to troubleshoot code errors efficiently?

7. Practical exercises


Origin: blog.csdn.net/weixin_58566962/article/details/131193939