A detailed introduction to Python's three core data analysis libraries

1. The commonly used trio

The three essential tools are numpy, pandas, and matplotlib. Their commonly used functions are listed below one by one.

Before diving in, make sure you understand two essentials: enumerate() and list comprehensions.

enumerate() is one of Python's built-in functions. It wraps an iterable (such as a list, tuple, or string) in an enumerate object that yields (index, value) pairs, so each iteration gives you both the element and its position in the sequence.

In data analysis it is most often used in a for loop to unpack the index i and the value name together.
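A minimal sketch of the loop pattern described above. The names list is made-up sample data, and start=1 is an optional argument shown only to illustrate that the count does not have to begin at 0.

```python
# Hypothetical sample data, used only for illustration.
names = ["Alice", "Bob", "Charlie"]

# enumerate() yields (index, value) pairs; indexing starts at 0 by default.
for i, name in enumerate(names):
    print(i, name)

# The optional start argument changes the first index.
pairs = list(enumerate(names, start=1))
print(pairs)
```

The loop variables i and name are unpacked from each pair, which avoids writing range(len(names)) and indexing by hand.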

A list comprehension that generates Pythagorean triples:

>>> [(x,y,z) for x in range(1,30) for y in range(x,30) for z in range(y,30) if x**2 + y**2 == z**2]
[(3, 4, 5), (5, 12, 13), (6, 8, 10), (7, 24, 25), (8, 15, 17), (9, 12, 15), (10, 24, 26), (12, 16, 20), (15, 20, 25), (20, 21, 29)]

## Or a simpler example (the random module must be imported first)
>>> import random
>>> scores = [[random.randrange(50,101) for _ in range(3)] for _ in range(5)]
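To make the comprehension syntax easier to read, the sketch below writes the Pythagorean triple example twice: once as explicit nested loops and once as a comprehension. The variable names triples_loop and triples_comp are chosen here for illustration.

```python
# Explicit nested loops: each "for" and the "if" map one-to-one
# onto a clause of the list comprehension below.
triples_loop = []
for x in range(1, 30):
    for y in range(x, 30):
        for z in range(y, 30):
            if x**2 + y**2 == z**2:
                triples_loop.append((x, y, z))

# Equivalent list comprehension: clauses appear in the same order as the loops.
triples_comp = [(x, y, z)
                for x in range(1, 30)
                for y in range(x, 30)
                for z in range(y, 30)
                if x**2 + y**2 == z**2]

print(triples_loop == triples_comp)
```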

2. NumPy library

NumPy is a scientific computing library for Python. It provides a high-performance multidimensional array object and related tools for working with arrays, matrices, and numerical calculations. Here are some common functions in the NumPy library and their usage:

axis is not a function but a parameter accepted by many NumPy functions and methods. It specifies the axis of the array along which the operation is applied, so the same calculation can be run over different dimensions of the array. Commonly used as:

# On a 2-D array, axis=0 works down each column; axis=1 works across each row.
# scores must be a NumPy array here, e.g. scores = np.array(scores)
scores.max(axis=0)
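A small worked example of the axis argument. The scores array below is hypothetical (3 students as rows, 2 subjects as columns) and is only meant to show which dimension each call collapses.

```python
import numpy as np

# Hypothetical scores: 3 students (rows) x 2 subjects (columns).
scores = np.array([[90, 60],
                   [70, 85],
                   [80, 75]])

col_max = scores.max(axis=0)    # one value per column (best score per subject)
row_max = scores.max(axis=1)    # one value per row (best score per student)
row_mean = scores.mean(axis=1)  # average score per student

print(col_max)   # [90 85]
print(row_max)   # [90 85 80]
print(row_mean)  # [75.  77.5 77.5]
```

One way to remember it: the axis you pass is the one that disappears from the result's shape.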

np.array(): Creates a NumPy array. For example:

import numpy as np

a = np.array([1, 2, 3, 4])
print(a)

The output is: [1 2 3 4].

np.arange(): Creates an arithmetic progression within the specified range; the stop value itself is excluded. For example:

import numpy as np

a = np.arange(0, 10, 2)
print(a)

The output result is: [0 2 4 6 8].

np.linspace(): Creates a given number of equally spaced values within the specified range; both endpoints are included by default. For example:

import numpy as np

a = np.linspace(0, 10, 5)
print(a)

The output is: [ 0.   2.5  5.   7.5 10. ].
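np.arange() and np.linspace() are easy to confuse, so here is a short side-by-side sketch. The third call uses the endpoint=False keyword of np.linspace(), which is not mentioned above and is included only for comparison.

```python
import numpy as np

# arange takes a step; the stop value (10) is not included.
a = np.arange(0, 10, 2)

# linspace takes the number of points; the stop value is included by default.
b = np.linspace(0, 10, 5)

# With endpoint=False the stop value is left out, like arange.
c = np.linspace(0, 10, 5, endpoint=False)

print(a)  # [0 2 4 6 8]
print(b)
print(c)
```

Note that arange with integer arguments returns integers, while linspace always returns floats.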

np.zeros(): Creates an all-zero array of the specified shape. For example:

import numpy as np

a = np.zeros((3, 3))
print(a)

The output is:

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

np.ones(): Creates an array of the specified shape filled with ones. For example:

import numpy as np

a = np.ones((2, 2))
print(a)

The output is:
[[1. 1.]
 [1. 1.]]

np.eye(): Creates an identity matrix of the specified size. For example:

import numpy as np

a = np.eye(3)
print(a)

The output is:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

np.random.rand(): Creates an array of the specified shape filled with random numbers drawn uniformly from [0, 1). For example:

import numpy as np

a = np.random.rand(2, 3)
print(a)

The output is similar to the following (the values differ on every run):
[[0.96828581 0.29255347 0.82946626]
 [0.84055973 0.39246847 0.51868462]]
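Because random output changes between runs, it can help to fix a seed when results need to be reproduced. The sketch below uses NumPy's Generator interface (np.random.default_rng), which is an alternative to np.random.rand() and not something the text above relies on; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

a = rng_a.random((2, 3))  # uniform samples in [0, 1), shape (2, 3)
b = rng_b.random((2, 3))

print(a)
print(np.array_equal(a, b))  # True: same seed, same sequence
```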

np.max(): Returns the maximum value in the array. For example:

import numpy as np

a = np.array([1, 2, 3, 4])
max_value = np.max(a)
print(max_value)

The output is: 4.

np.min(): Returns the minimum value in the array. For example:

import numpy as np

a = np.array([1, 2, 3, 4])
min_value = np.min(a)
print(min_value)

The output is: 1.

np.mean(): Returns the arithmetic mean of the array elements. For example:

import numpy as np

a = np.array([1, 2, 3, 4])
mean_value = np.mean(a)
print(mean_value)

The output is: 2.5.

np.sum(): Returns the sum of all elements in the array. For example:

import numpy as np

a = np.array([1, 2, 3, 4])
sum_value = np.sum(a)
print(sum_value)

The output is: 10.

np.dot(): Computes the dot product of two arrays. For 2-D arrays, as in this example, that is matrix multiplication:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
dot_value = np.dot(a, b)
print(dot_value)

The output is:
[[19 22]
 [43 50]]
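A common source of confusion is the difference between np.dot() and the * operator. The sketch below reuses the same two matrices; the @ operator and the 1-D vectors u and v are additions made here for comparison.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

matmul = a @ b         # matrix product, same result as np.dot(a, b) for 2-D arrays
elementwise = a * b    # multiplies matching elements, NOT a matrix product

# For 1-D arrays, np.dot() is the ordinary inner product.
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
inner = np.dot(u, v)   # 1*4 + 2*5 + 3*6

print(matmul)
print(elementwise)
print(inner)  # 32
```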

The functions above are some of the most commonly used in NumPy. They cover array creation, array operations, and array calculations, and make numerical and scientific computing more convenient.

3. pandas

It is often said that 70% to 80% of a data analyst's day-to-day work goes into understanding and cleaning data, in other words data exploration and data cleaning.

Pandas is mainly used for data analysis and it is one of the most used Python libraries. It provides you with some of the most useful tools for exploring, cleaning and analyzing data. Using Pandas, you can load, prepare, manipulate, and analyze all kinds of structured data. Here are some common functions in the Pandas library and their usage:

pd.DataFrame(): Creates a Pandas DataFrame. For example:

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M']
}
df = pd.DataFrame(data)
print(df)

The output is:

      name  age gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M
3    David   40      M

pd.read_csv(): Reads data from a CSV file and creates a Pandas DataFrame. For example:

import pandas as pd

df = pd.read_csv('data.csv')
print(df)
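Since data.csv is not included with this article, the sketch below feeds pd.read_csv() a small in-memory CSV instead. The CSV contents are made up for illustration; io.StringIO simply makes a string behave like a file.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
csv_text = """name,age,gender
Alice,25,F
Bob,30,M
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # age is parsed as an integer column, the others as text
```

The first line of the CSV becomes the column names, and each following line becomes one row.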

df.head(): Returns the first few rows of a DataFrame (5 by default). For example:

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M']
}
df = pd.DataFrame(data)
print(df.head(2))

The output is:

    name  age gender
0  Alice   25      F
1    Bob   30      M

df.tail(): Returns the last few rows of a DataFrame (5 by default). For example:

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M']
}
df = pd.DataFrame(data)
print(df.tail(2))

The output is:


      name  age gender
2  Charlie   35      M
3    David   40      M

df.describe(): Returns summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns of a DataFrame. For example:


import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M']
}
df = pd.DataFrame(data)
print(df.describe())

The output is:


             age
count   4.000000
mean   32.500000
std     6.454972  # standard deviation
min    25.000000
25%    28.750000
50%    32.500000
75%    36.250000
max    40.000000

df.groupby(): Groups the DataFrame by the specified column. For example:


import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Frank'],
    'age': [25, 30, 35, 40, 25, 30],
    'gender': ['F', 'M', 'M', 'M', 'F', 'M']
}
df = pd.DataFrame(data)
grouped = df.groupby('age')
for name, group in grouped:
    print(name)
    print(group)

The output is:

25
    name  age gender
0  Alice   25      F
4   Emma   25      F
30
    name  age gender
1    Bob   30      M
5  Frank   30      M
35
      name  age gender
2  Charlie   35      M
40
    name  age gender
3  David   40      M
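Iterating over groups is useful for inspection, but in practice groupby() is usually followed by an aggregation. The sketch below reuses the same sample data and groups by gender instead of age; the choice of column and of the mean, size, min, and max aggregations is an illustration, not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Frank'],
    'age': [25, 30, 35, 40, 25, 30],
    'gender': ['F', 'M', 'M', 'M', 'F', 'M'],
})

# Average age per gender: one row per group.
mean_age = df.groupby('gender')['age'].mean()

# Number of rows in each group.
group_sizes = df.groupby('gender').size()

# Several aggregations at once via agg().
summary = df.groupby('gender')['age'].agg(['min', 'max'])

print(mean_age)
print(group_sizes)
print(summary)
```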

df.pivot_table(): Creates a pivot table. For example:

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Frank'],
    'age': [25, 30, 35, 40, 25, 30],
    'gender': ['F', 'M', 'M', 'M', 'F', 'M'],
    'score': [90, 85, 80, 75, 70, 65]
}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='score', index='name', columns='age')
print(pivot_table)

The output is:

age        25    30    35    40
name
Alice    90.0   NaN   NaN   NaN
Bob       NaN  85.0   NaN   NaN
Charlie   NaN   NaN  80.0   NaN
David     NaN   NaN   NaN  75.0
Emma     70.0   NaN   NaN   NaN
Frank     NaN  65.0   NaN   NaN

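In a pivot table, the rows come from the index argument, the columns from the columns argument, and the cell values from the values argument. In the above code, the values are score, the rows are name, and the columns are age. Combinations that do not occur in the data are shown as NaN. When several rows fall into the same cell, pivot_table() aggregates them, using the mean by default.

Pivot tables make it easy to summarize and compare data.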
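Because every name is unique in the example above, no aggregation is visible there. The sketch below pivots the same data by gender instead, so that several scores land in one cell; passing aggfunc='mean' explicitly is a choice made here to show where the aggregation happens.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Frank'],
    'age': [25, 30, 35, 40, 25, 30],
    'gender': ['F', 'M', 'M', 'M', 'F', 'M'],
    'score': [90, 85, 80, 75, 70, 65],
})

# Rows: gender, columns: age, cells: mean score of the matching rows.
by_gender = df.pivot_table(values='score', index='gender',
                           columns='age', aggfunc='mean')

print(by_gender)
# Alice (90) and Emma (70) are both F and 25, so that cell is 80.0.
# Bob (85) and Frank (65) are both M and 30, so that cell is 75.0.
```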

pd.merge() (also available as the df.merge() method): Merges two DataFrames on one or more key columns. For example:


import pandas as pd

data1 = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M']
}
data2 = {
    'name': ['Alice', 'Bob', 'Charlie', 'Emma'],
    'score': [90, 85, 80, 75],
    'grade': ['A', 'B', 'C', 'B']
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
merged = pd.merge(df1, df2, on='name')
print(merged)

The output is:


      name  age gender  score grade
0    Alice   25      F     90     A
1      Bob   30      M     85     B
2  Charlie   35      M     80     C

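In the above code, df1 and df2 are two DataFrames that share the column name, so they can be merged on it with pd.merge(). By default the merge is an inner join: the result contains the columns of both DataFrames, but only the rows whose name appears in both. That is why David (only in df1) and Emma (only in df2) are missing from the output.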
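The join type is controlled by the how parameter of pd.merge(). The sketch below reruns the merge on the same two DataFrames with how='left' and how='outer' to show how unmatched rows are handled; the variable names inner, left, and outer are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M'],
})
df2 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Emma'],
    'score': [90, 85, 80, 75],
    'grade': ['A', 'B', 'C', 'B'],
})

inner = pd.merge(df1, df2, on='name')               # only names found in both
left = pd.merge(df1, df2, on='name', how='left')    # every row of df1 is kept
outer = pd.merge(df1, df2, on='name', how='outer')  # every row of both is kept

print(len(inner), len(left), len(outer))  # 3 4 5
print(outer)  # missing values appear as NaN
```

With how='left', David is kept with NaN for score and grade; with how='outer', Emma is kept as well, with NaN for age and gender.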

In addition to the functions listed above, the pandas library also provides many other functions and methods, which can be used according to different needs.

4. matplotlib

matplotlib is a plotting library for Python that can be used to create static, animated, and interactive data visualizations. Here are some commonly used matplotlib functions and their usage:

plt.plot(): Draw a line graph. For example:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
y = np.sin(x)
plt.plot(x, y)
plt.show()

In the above code, np.linspace() generates 1000 equally spaced points, np.sin() computes the sine of each point, plt.plot() connects the points into a line, and plt.show() displays the figure.

plt.scatter(): Draw a scatterplot. For example:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(1000)
y = np.random.randn(1000)
plt.scatter(x, y)
plt.show()

In the above code, np.random.randn() draws 1000 samples from the standard normal distribution for each of x and y, and plt.scatter() plots the resulting points as a scatter plot.

plt.bar(): Draw a bar chart. For example:

import matplotlib.pyplot as plt
import numpy as np

x = ['A', 'B', 'C', 'D', 'E']
y = [20, 35, 30, 25, 40]
plt.bar(x, y)
plt.show()

In the above code, the x list contains five categories and the y list contains the corresponding quantities. plt.bar() draws them as a bar chart, with one bar per category.

plt.pie(): Draw a pie chart. For example:


import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D', 'E']
sizes = [20, 35, 30, 25, 40]
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.axis('equal')
plt.show()

In the above code, the labels list contains five categories and the sizes list contains the corresponding quantities. plt.pie() draws them as a pie chart, and the autopct parameter sets the format of the percentage labels. plt.axis('equal') keeps the aspect ratio equal so that the pie is drawn as a circle.

plt.hist(): Draw a histogram. For example:


import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(1000)
plt.hist(x, bins=20)
plt.show()

In the above code, np.random.randn() generates 1000 samples from the standard normal distribution, and plt.hist() splits their range into 20 bins and draws the number of samples in each bin as a histogram.
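Unlike plt.bar(), which draws values you supply, plt.hist() first counts how many samples fall into each bin. The sketch below reproduces that counting step with np.histogram() so the numbers behind the bars can be inspected; the seeded generator is an assumption made here so that the run is repeatable.

```python
import numpy as np

# Seeded generator so the sample is the same on every run (seed is arbitrary).
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# counts[i] is the number of samples between edges[i] and edges[i + 1].
counts, edges = np.histogram(x, bins=20)

print(counts)
print(len(edges))    # 21 edges delimit 20 bins
print(counts.sum())  # 1000: every sample falls into exactly one bin
```

Passing counts and the bin centers to plt.bar() would draw roughly the same picture that plt.hist(x, bins=20) produces directly.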

plt.imshow(): Display an image. For example:

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

img = Image.open('test.jpg')
plt.imshow(np.array(img))
plt.show()

In the above code, use the Image.open() function of the PIL library to open an image, and then use the plt.imshow() function to display it.

plt.subplot(): Create a subplot. For example:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.tan(x)

plt.subplot(2, 2, 1)
plt.plot(x, y1)

plt.subplot(2, 2, 2)
plt.plot(x, y2)

plt.subplot(2, 1, 2)
plt.plot(x, y3)

plt.show()

In the above code, plt.subplot() creates a figure containing three subplots. The first sits in the first row and first column of a 2x2 grid, and the second sits in the first row and second column. The third, created with plt.subplot(2, 1, 2), uses a 2x1 grid instead and therefore spans the entire second row. A different function is then plotted in each subplot.
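For regular grids, matplotlib also offers plt.subplots(), which creates the figure and all axes in one call. The sketch below is an alternative to the plt.subplot() example, not a replacement for it. It selects the non-interactive Agg backend, an assumption made here so that the code also runs on machines without a display; remove that line and call plt.show() to view the figure interactively.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

# One row, two columns: axes is an array holding the two Axes objects.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(x, np.sin(x))
axes[0].set_title("sin(x)")

axes[1].plot(x, np.cos(x))
axes[1].set_title("cos(x)")

fig.tight_layout()  # avoid overlapping titles and tick labels
```

Working with the Axes objects directly makes it explicit which subplot each call draws on.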

The above are some common functions and usages of matplotlib, and there are many other functions and usages that can be flexibly used according to specific needs.

To be continued and updated.


Origin blog.csdn.net/qq_54015136/article/details/129516657