Python Data Mining and Machine Learning

In recent years, the Python programming language has become increasingly popular with researchers and has repeatedly topped the major programming-language rankings. At the same time, driven by the rapid development of deep learning, artificial intelligence is being applied ever more widely across many fields. Machine learning is the foundation of artificial intelligence, so understanding how the commonly used machine learning algorithms work, and being able to build real machine learning models proficiently in Python, is a prerequisite for carrying out research in artificial intelligence.

Learn Python programming, machine learning theory, and the corresponding code implementations, progressing step by step along the path "basic programming → machine learning → code implementation".

Using real-world cases, it also introduces how to identify and refine innovation points and how to publish high-quality papers. The aim is to help students master the fundamentals of Python programming, feature engineering (data cleaning, variable dimensionality reduction, feature selection, swarm optimization algorithms), regression fitting (linear regression, BP neural networks, extreme learning machines), classification (KNN, Bayesian classifiers, support vector machines, decision trees, random forests, AdaBoost, XGBoost, LightGBM, etc.), cluster analysis (K-means, DBSCAN, hierarchical clustering), and association analysis (association rules, collaborative filtering, the Apriori algorithm), together with how to implement each of them in Python.

Getting Started with Python Programming

1. Build the Python environment (download, installation and version selection).

2. How to choose a Python editor? (IDLE, Notepad++, PyCharm, Jupyter...)

3. Python basics (data types and variables, strings and encodings, lists and tuples, conditional statements, loops, function definition and calling, etc.); see the code sketch after this list

4. Common errors and program debugging

5. Installation and use of third-party modules

6. File reading and writing (I/O)

7. Practical exercises
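
As a small illustration of the basics listed above (variables, lists, loops, a function definition, and file I/O), here is a minimal sketch using only the standard library; the file name scores.txt is an arbitrary example, not something prescribed by the course.

```python
# A minimal sketch of the basics above: variables, lists, a loop,
# a function definition, and simple file I/O (the file name is arbitrary).

def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

scores = [72, 85, 90, 66]          # a list of integers
for i, s in enumerate(scores, 1):  # loop with an index
    print(f"sample {i}: {s}")

print("mean score:", mean(scores))

# File I/O: write the scores to a text file, then read them back.
with open("scores.txt", "w", encoding="utf-8") as f:
    for s in scores:
        f.write(f"{s}\n")

with open("scores.txt", "r", encoding="utf-8") as f:
    loaded = [int(line.strip()) for line in f]

print("loaded from file:", loaded)
```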

Advanced Python

1. The NumPy module (installing NumPy; the ndarray type, its attributes, and array creation; array indexing and slicing; introduction and use of common NumPy functions)

2. The Pandas module (the DataFrame data structure; reshaping, sorting, concatenating, merging, and group-by operations on tables, etc.)

3. Basic plotting with Matplotlib (line charts, bar charts, pie charts, bubble charts, histograms, box plots, scatter plots, etc.); see the combined sketch after this list

4. Styling and beautifying plots (adjusting colors, line styles, markers, fonts, and other attributes)

5. Figure layout (drawing multiple subplots, regular and irregular layouts, adding axes at arbitrary positions on the canvas)

6. Advanced plots (3D plots, contour plots, stem (lollipop) plots, dumbbell plots, funnel charts, treemaps, waffle charts, etc.)

7. Advanced use of axes (axes sharing a plotting area, styling axis ticks, controlling whether axes are displayed, repositioning axes)

8. Practical exercises
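
To make the NumPy, Pandas, and Matplotlib items above concrete, here is a minimal combined sketch on invented toy data; it assumes the three libraries are installed but does not reflect any specific course exercise.

```python
# A small sketch combining NumPy, Pandas and Matplotlib:
# array creation, a DataFrame group-by, and a two-subplot layout.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)            # ndarray creation
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "value": [1.0, 2.5, 3.0, 4.5, 5.0],
})
means = df.groupby("group")["value"].mean()   # group-by aggregation

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))  # regular layout
ax1.plot(x, np.sin(x), color="tab:blue", linestyle="--", label="sin(x)")
ax1.legend()
ax2.bar(means.index, means.values)            # bar chart of group means
fig.tight_layout()
plt.show()
```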

Data Cleaning

1. Descriptive statistical analysis (frequency analysis: histograms; central tendency: arithmetic mean, geometric mean, mode; dispersion: range, interquartile range, mean absolute deviation, standard deviation, coefficient of variation; distribution shape: skewness and kurtosis; correlation analysis: correlation coefficients)

2. Data standardization and normalization (why are standardization and normalization needed?); see the sketch after this list

3. Handling of data outliers and missing values

4. Data discretization and encoding

5. Manually generate new features

6. Practical exercises
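
A minimal cleaning sketch, assuming scikit-learn is available and using an invented four-row DataFrame: it imputes a missing value, flags a possible outlier with the interquartile-range rule (one of several reasonable choices), and then standardizes and normalizes the columns.

```python
# A toy sketch of common cleaning steps: imputation, outlier flagging,
# z-score standardization and min-max normalization.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

df = pd.DataFrame({"x1": [1.0, 2.0, np.nan, 4.0],
                   "x2": [10.0, 12.0, 14.0, 200.0]})   # 200 looks suspicious

df["x1"] = df["x1"].fillna(df["x1"].median())          # impute missing value

# Flag outliers with the 1.5*IQR rule (one possible convention).
q1, q3 = df["x2"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["x2"] < q1 - 1.5 * iqr) | (df["x2"] > q3 + 1.5 * iqr)]
print("flagged outliers:\n", outliers)

standardized = StandardScaler().fit_transform(df)      # zero mean, unit variance
normalized = MinMaxScaler().fit_transform(df)          # scaled to [0, 1]
print(standardized)
print(normalized)
```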

Variable Dimensionality Reduction

1. The basic principle of principal component analysis (PCA)

2. The basic principle of partial least squares (PLS)

3. Case Practice (a short PCA/PLS sketch follows this list)

4. Practical exercises
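
The sketch below applies PCA and PLS to synthetic data using scikit-learn's PCA and PLSRegression; the choice of three components and the simulated data are illustrative only.

```python
# A brief sketch of PCA and PLS on synthetic data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(scale=0.1, size=100)

pca = PCA(n_components=3).fit(X)
X_pca = pca.transform(X)                       # 3 unsupervised components
print("explained variance ratio:", pca.explained_variance_ratio_)

pls = PLSRegression(n_components=3).fit(X, y)  # components chosen w.r.t. y
print("PLS R^2:", pls.score(X, y))
```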

Feature Selection

1. Common feature selection methods (optimization-based search, filter and wrapper methods, etc.; forward and backward selection; interval methods; uninformative variable elimination; sparse regularization methods, etc.); see the sketch after this list

2. Case Practice

3. Practical exercises
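
As a compact example of the filter and wrapper families mentioned above, this sketch uses scikit-learn's SelectKBest (filter) and RFE (wrapper) on a synthetic classification problem; keeping five features is an arbitrary choice for illustration.

```python
# A filter method (SelectKBest) and a wrapper method (RFE) side by side.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=15,
                           n_informative=5, random_state=0)

# Filter: keep the 5 features with the highest ANOVA F-score.
filt = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print("filter keeps features:", filt.get_support(indices=True))

# Wrapper: recursive feature elimination around a logistic regression model.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
print("wrapper keeps features:", rfe.get_support(indices=True))
```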

Swarm Optimization Algorithms

1. The basic principles of the genetic algorithm (GA) (what is the basic idea behind swarm/population-based optimization algorithms, of which the genetic algorithm is a representative? What are the differences and connections between particle swarm optimization, the dragonfly algorithm, the bat algorithm, simulated annealing, and the genetic algorithm?)

2. Python implementation of a genetic algorithm; see the toy sketch after this list

3. Case Practice 1: optimizing a univariate function

4. Case Practice 2: optimizing discrete variables (feature selection)

5. Practical exercises
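
The toy sketch referenced above maximizes the univariate test function f(x) = x·sin(10πx) + 2 on [-1, 2], a common demo function rather than one prescribed by the course. It shows the selection, crossover, and mutation loop in plain Python, not a tuned library implementation.

```python
# A toy real-coded genetic algorithm maximizing f(x) = x*sin(10*pi*x) + 2 on [-1, 2].
import math
import random

def fitness(x):
    return x * math.sin(10 * math.pi * x) + 2.0

LOW, HIGH, POP, GENS = -1.0, 2.0, 50, 100
random.seed(0)
pop = [random.uniform(LOW, HIGH) for _ in range(POP)]

for _ in range(GENS):
    new_pop = []
    while len(new_pop) < POP:
        # Tournament selection: keep the better of two random individuals.
        p1 = max(random.sample(pop, 2), key=fitness)
        p2 = max(random.sample(pop, 2), key=fitness)
        # Arithmetic crossover.
        alpha = random.random()
        child = alpha * p1 + (1 - alpha) * p2
        # Gaussian mutation, clipped to the search interval.
        if random.random() < 0.1:
            child += random.gauss(0, 0.1)
        new_pop.append(min(max(child, LOW), HIGH))
    pop = new_pop

best = max(pop, key=fitness)
print(f"best x = {best:.4f}, f(x) = {fitness(best):.4f}")
```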

Linear Regression Models

1. Simple and multiple linear regression (estimating the regression parameters, significance testing of the regression equation, residual analysis)

2. Ridge regression model (working principle, selection of ridge parameter k, selection of variables with ridge regression)

3. LASSO model (working principle, feature selection, modeling prediction, hyperparameter adjustment)

4. Elastic Net model (working principle, modeling prediction, hyperparameter adjustment)

5. Case Practice (a Ridge/LASSO/Elastic Net sketch follows this list)

6. Practical exercises
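
The sketch referenced above compares Ridge, LASSO, and Elastic Net on synthetic data, with the regularization strength chosen by cross-validation via scikit-learn's RidgeCV, LassoCV, and ElasticNetCV; the candidate alphas are illustrative.

```python
# Ridge, LASSO and Elastic Net with cross-validated regularization strength.
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("ridge", RidgeCV(alphas=[0.1, 1.0, 10.0])),
                    ("lasso", LassoCV(cv=5)),
                    ("elastic net", ElasticNetCV(cv=5))]:
    model.fit(X_train, y_train)
    print(f"{name:>11}: test R^2 = {model.score(X_test, y_test):.3f}")

# Coefficients that LASSO/Elastic Net shrink to exactly zero mark features
# the model has effectively discarded (a form of feature selection).
```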

Feedforward Neural Networks

1. The basic principles of the BP neural network (what ups and downs has the development of artificial intelligence gone through? How are artificial neural networks classified? What are the topology and training process of a BP neural network? What is gradient descent? What, in essence, does BP neural network modeling do?)

2. Python implementation of a BP neural network (how to split the data into training and test sets? Why is normalization needed, and is it always necessary? What are exploding and vanishing gradients?); see the sketch after this list

3. Tuning BP neural network parameters (how to set the number of hidden-layer neurons, the learning rate, the initial weights and biases, etc.? What is cross-validation?)

4. Several issues worth studying (underfitting and overfitting, designing metrics to evaluate generalization performance, class imbalance, etc.)

5. Working principle of Extreme Learning Machine (ELM)

6. Case demonstration

7. Practical exercises
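
As a hedged illustration of a BP-style feedforward network, the sketch below uses scikit-learn's MLPRegressor inside a pipeline with StandardScaler (one common way to handle the normalization question raised above); the hidden-layer sizes and learning rate are arbitrary starting points, not tuned values.

```python
# A feedforward (BP-style) network with input scaling and a train/test split.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling the inputs first helps gradient-based training converge.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), learning_rate_init=0.01,
                 max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))
```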

KNN, Bayesian Classification and Support Vector Machines

1. The KNN classification model (the core idea of the KNN algorithm, choosing a distance metric, choosing K, choosing the classification decision rule)

2. Naive Bayes classification models (Bernoulli BernoulliNB, categorical CategoricalNB, Gaussian GaussianNB, multinomial MultinomialNB, and complement ComplementNB)

3. The working principle of SVM (what problem does an SVM essentially solve? What are the four typical SVM formulations? What is the role of the kernel function? What is a support vector?); SVM extensions (how are multi-class problems handled? Besides classification, what else can an SVM help us do?)

4. Case Practice (a KNN/naive Bayes/SVM comparison sketch follows this list)

5. Practical exercises
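
The comparison sketch referenced above fits KNN, Gaussian naive Bayes, and an RBF-kernel SVM to the iris dataset with scikit-learn; hyperparameters are left at illustrative defaults rather than tuned values.

```python
# KNN, Gaussian naive Bayes and an RBF-kernel SVM on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

models = {
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "GaussianNB": GaussianNB(),
    "SVM (RBF)": SVC(kernel="rbf", C=1.0),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(f"{name:>10}: accuracy = {clf.score(X_test, y_test):.3f}")
```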

Decision Trees, Random Forests, LightGBM, XGBoost

1. The working principle of decision trees (inspiration from Microsoft XiaoIce; what are information entropy and information gain? What are the differences and connections between the ID3 and C4.5 algorithms? Besides building a predictive model, what else can a decision tree help us do?)

2. The working principle of random forests (why do we need the random forest algorithm? What does "random forest" mean in the broad and the narrow sense? Where does the randomness come in? What is the essence of a random forest? How can the results of a random forest be visualized and interpreted?)

3. The difference and connection between Bagging and Boosting

4. How AdaBoost and Gradient Boosting work, and how they differ

5. Commonly used GBDT frameworks (XGBoost, LightGBM); see the ensemble comparison sketch after this list

6. Case Practice

7. Practical exercises
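
The ensemble comparison sketch referenced above uses scikit-learn's decision tree, random forest, and gradient boosting classifiers on synthetic data; XGBoost's XGBClassifier and LightGBM's LGBMClassifier follow a very similar fit/predict pattern and could be swapped in if those packages are installed.

```python
# A single decision tree vs. a random forest vs. gradient boosting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(f"{name:>17}: accuracy = {clf.score(X_test, y_test):.3f}")

# Feature importances are one simple way to interpret tree ensembles.
print("top RF importances:",
      sorted(models["random forest"].feature_importances_, reverse=True)[:3])
```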

K-means, DBSCAN, and Hierarchical Clustering

1. Working principle of K-means clustering algorithm

2. Working principle of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm

3. Working principle of hierarchical clustering algorithm

4. Case explanation (a clustering comparison sketch follows this list)

5. Practical exercises
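
The clustering comparison sketch referenced above runs K-means, DBSCAN, and agglomerative (hierarchical) clustering on the same synthetic blobs and reports silhouette scores; the eps value and cluster counts are illustrative settings for this toy data only.

```python
# K-means, DBSCAN and agglomerative clustering on the same synthetic blobs.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.7, min_samples=5),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
}
for name, model in models.items():
    labels = model.fit_predict(X)
    print(f"{name:>12}: silhouette = {silhouette_score(X, labels):.3f}")
```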

Association Rules, Collaborative Filtering, and the Apriori Algorithm

1. Working principle of association rule algorithm

2. Working principle of collaborative filtering algorithm

3. Working principle of the Apriori algorithm; see the toy support-counting sketch after this list

4. Case explanation

5. Practical exercises
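
The toy sketch referenced above illustrates the core Apriori idea on an invented five-transaction dataset: frequent 1-itemsets are found first, and only they are combined into candidate 2-itemsets. Libraries such as mlxtend provide full Apriori and association-rule implementations; this is just plain Python for clarity.

```python
# Apriori-style support counting and a single rule-confidence calculation.
from itertools import combinations

transactions = [{"milk", "bread", "eggs"},
                {"milk", "bread"},
                {"bread", "eggs"},
                {"milk", "eggs"},
                {"milk", "bread", "eggs"}]
min_support = 0.6  # an itemset must appear in at least 60% of transactions

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = {i for t in transactions for i in t}
freq1 = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
print("frequent 1-itemsets:", [set(s) for s in freq1])

# Apriori property: candidate 2-itemsets are built only from frequent 1-itemsets.
candidates = [a | b for a, b in combinations(freq1, 2)]
freq2 = [c for c in candidates if support(c) >= min_support]
print("frequent 2-itemsets:", [set(s) for s in freq2])

# Confidence of the rule {milk} -> {bread}: support({milk, bread}) / support({milk}).
rule_conf = support(frozenset({"milk", "bread"})) / support(frozenset({"milk"}))
print("confidence(milk -> bread) =", round(rule_conf, 2))
```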

Information Retrieval and Common Research Tools

1. How to access Google, YouTube and other websites without barriers? (Google Access Assistant, VPN, etc.)

2. How to search the literature, and how to keep track of the latest papers?

3. How to use Google Scholar and ResearchGate

4. Where to find the data and code that accompany a paper?

5. Use of literature management tools (Endnote, Zotero, etc.)

6. When code raises an error, how to resolve it efficiently?

7. Practical exercises

Paper Writing and Q&A

1. What are the differences between papers in the different SCI journal divisions (quartiles)? Do you know why your own paper may come across as thin?

2. From a reviewer's point of view, what elements does an SCI journal paper need? (What do reviewers care about? How should you respond to reviewers' comments?)

3. How to identify and refine innovation points? (If original work at the algorithm level is difficult, how can you refine innovation points from your own practical problems?)

4. Sharing of relevant learning materials (book recommendations, online course recommendations, etc.)

5. Establish a WeChat group for later discussion and Q&A

6. Q&A discussion (prepare questions in advance)
