Introduction to Data Science with Python

Link: https://pan.baidu.com/s/1r7Nncm8azuI0jlsFhVJZng

Extraction code: 4rfg

Translator's Preface

Foreword

Chapter 1 First steps

1.1 Introduction to data science and Python

1.2 Python installation

1.2.1 Python 2 or Python 3

1.2.2 Installation steps

1.2.3 A glance at the core Python packages

1.2.4 Package installation

1.2.5 Package upgrades

1.3 Scientific computing distributions

1.3.1 Anaconda

1.3.2 Enthought Canopy

1.3.3 PythonXY

1.3.4 WinPython

1.4 Introduction to IPython

1.4.1 IPython Notebook

1.4.2 Datasets and code used in the book

1.5 Summary

Chapter 2 Data munging

2.1 The data science process

2.2 Data loading and preprocessing with pandas

2.2.1 Fast data loading

2.2.2 Dealing with problematic data

2.2.3 Dealing with large datasets

2.2.4 Accessing other data formats

2.2.5 Data preprocessing

2.2.6 Data selection

2.3 Working with categorical and text data

2.4 Data processing with NumPy

2.4.1 NumPy's n-dimensional array

2.4.2 Basics of NumPy ndarray objects

2.5 Creating NumPy arrays

2.5.1 From lists to one-dimensional arrays

2.5.2 Controlling the memory size

2.5.3 Heterogeneous lists

2.5.4 From lists to multidimensional arrays

2.5.5 Resizing arrays

2.5.6 Generating arrays with NumPy functions

2.5.7 Getting an array directly from a file

2.5.8 Extracting data from pandas

2.6 NumPy fast operations and computations

2.6.1 Matrix operations

2.6.2 Slicing and indexing NumPy arrays

2.6.3 Stacking NumPy arrays

2.7 Summary

Chapter 3 The data science pipeline

3.1 Introduction to EDA

3.2 Feature creation

3.3 Dimensionality reduction

3.3.1 Covariance matrix

3.3.2 Principal component analysis

3.3.3 A variation of PCA for big data: RandomizedPCA

3.3.4 Latent factor analysis

3.3.5 Linear discriminant analysis

3.3.6 Latent semantic analysis

3.3.7 Independent component analysis

3.3.8 Kernel principal component analysis

3.3.9 Restricted Boltzmann machine

3.4 Detection and treatment of outliers

3.4.1 Univariate outlier detection

3.4.2 EllipticEnvelope

3.4.3 OneClassSVM

3.5 Scoring functions

3.5.1 Multilabel classification

3.5.2 Binary classification

3.5.3 Regression

3.6 Testing and validation

3.7 Cross-validation

3.7.1 Using cross-validation iterators

3.7.2 Sampling and bootstrapping

3.8 Hyperparameter optimization

3.8.1 Building custom scoring functions

3.8.2 Reducing grid search time

3.9 Feature selection

3.9.1 Univariate selection

3.9.2 Recursive elimination

3.9.3 Stability selection and L1-based selection

3.10 Summary

Chapter 4 Machine learning

4.1 Linear and logistic regression

4.2 Naive Bayes

4.3 K-nearest neighbors

4.4 Advanced nonlinear algorithms

4.4.1 SVM for classification

4.4.2 SVM for regression

4.4.3 Tuning SVM

4.5 Ensemble strategies

4.5.1 Pasting by random samples

4.5.2 Bagging with weak ensembles

4.5.3 Random subspaces and random patches

4.5.4 Sequences of models: AdaBoost

4.5.5 Gradient tree boosting

4.5.6 Dealing with big data

4.6 A glance at natural language processing

4.6.1 Word tokenization

4.6.2 Stemming

4.6.3 POS tagging

4.6.4 Named entity recognition (NER)

4.6.5 Stop words

4.6.6 A complete data science example: text classification

4.7 Overview of unsupervised learning

4.8 Summary

Chapter 5 Social network analysis

5.1 Introduction to graph theory

5.2 Graph algorithms

5.3 Graph loading, dumping, and sampling

5.4 Summary

Chapter 6 Visualization

6.1 Introduction to matplotlib

6.1.1 Curve plotting

6.1.2 Using panels

6.1.3 Scatter plots

6.1.4 Histograms

6.1.5 Bar charts

6.1.6 Image visualization

6.2 Selected graphical examples with pandas

6.2.1 Boxplots and histograms

6.2.2 Scatter plots

6.2.3 Parallel coordinates

6.3 Advanced data learning representation

6.3.1 Learning curves

6.3.2 Validation curves

6.3.3 Feature importance

6.3.4 GBT partial dependence plot

6.4 Summary

Origin blog.csdn.net/u014211007/article/details/93732957