[Proficient in Python in 100 Days] Day 53: Python Data Analysis, Advanced NumPy Data Operations and Analysis

Table of contents

1 Broadcasting
2 File input and output
3 Random number generation
4 Linear algebra operations
5 Advanced operations
6 Data analysis example


1 Broadcasting

Broadcasting is the mechanism NumPy uses to perform element-wise operations on arrays of different shapes. When the shapes differ, NumPy treats the smaller array as if it were stretched (repeated) along the mismatched dimensions until both operands share a common shape. The stretching is virtual: no data is actually copied, so broadcasting is both convenient and memory-efficient.

Broadcasting follows these rules:

  1. If the two arrays have different numbers of dimensions, the shape of the array with fewer dimensions is padded with 1s on its left (leading) side until both arrays have the same number of dimensions.
  2. If the sizes of the two arrays differ in a dimension but one of them has size 1, that dimension is stretched to match the other.
  3. If the sizes differ in any dimension and neither of them is 1, broadcasting fails and NumPy raises a ValueError.
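To see the rules in action without building full arrays, the sketch below uses np.broadcast_shapes (available in NumPy 1.20 and later; this assumes a reasonably recent NumPy) to compute the result shape of each pairing. The names col, row and grid are illustrative and not part of the original examples.

```python
import numpy as np

# Rule 1: shape (3,) is left-padded to (1, 3) to match the 2-D operand.
# Rule 2: the size-1 axis is then stretched to 2, giving a common shape of (2, 3).
print(np.broadcast_shapes((3,), (2, 3)))   # (2, 3)

# Both operands can be stretched at once: (4, 1) and (3,) combine to (4, 3).
col = np.arange(4).reshape(4, 1)
row = np.arange(3)
grid = col + row
print(grid.shape)                           # (4, 3)

# Rule 3: (3,) vs (2,) differ in an axis where neither size is 1.
try:
    np.broadcast_shapes((3,), (2,))
except ValueError as exc:
    print("incompatible:", exc)
```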

Example:

import numpy as np

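# Broadcasting example 1: multiply a scalar by an array
scalar = 2
array = np.array([1, 2, 3])
result = scalar * array
print("Broadcasting example 1 result:", result)  # Output: [2 4 6]

# Broadcasting example 2: add a 1-D array to a 2-D array
a = np.array([1, 2, 3])
b = np.array([[10, 20, 30], [40, 50, 60]])
result = a + b  # a, shape (3,), is broadcast across both rows of b, shape (2, 3)
print("Broadcasting example 2 result:\n", result)
# Output:
# [[11 22 33]
#  [41 52 63]]

# Broadcasting example 3: incompatible shapes
a = np.array([1, 2, 3])
b = np.array([10, 20])
try:
    result = a + b
except ValueError as e:
    print("Broadcasting example 3 result (exception):", e)
# Output: Broadcasting example 3 result (exception): operands could not be broadcast together with shapes (3,) (2,)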
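A common practical use of broadcasting is centering data. The sketch below assumes a small, made-up matrix X and subtracts its column means and row means. keepdims=True and np.newaxis are two standard ways to give a vector the size-1 axis it needs to broadcast along the intended direction.

```python
import numpy as np

# Hypothetical 3x2 data matrix: 3 samples (rows), 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Column means have shape (2,); broadcasting stretches them across the 3 rows.
col_means = X.mean(axis=0)
centered = X - col_means
print(centered)

# Row means with keepdims=True have shape (3, 1), so they stretch across the columns.
row_means = X.mean(axis=1, keepdims=True)
print((X - row_means).shape)   # (3, 2)

# np.newaxis turns a (3,) vector into a (3, 1) column, producing a (3, 4) table.
x = np.array([0, 10, 20])
y = np.array([1, 2, 3, 4])
table = x[:, np.newaxis] + y
print(table)
```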

2 File input and output

Read a text file:

  • np.loadtxt(): Used to read data from a text file and return a NumPy array.
  • np.genfromtxt(): Reads data from a text file like np.loadtxt(), but can also handle missing values and infer data types.

Write to text file:

  • np.savetxt(): Used to write NumPy arrays to text files.

Read and write binary files:

  • np.save(): Save a NumPy array to a disk file in binary format.
  • np.load(): Load a saved NumPy array from a disk file.
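To illustrate how np.genfromtxt() treats a missing value, here is a minimal sketch that reads from an in-memory io.StringIO buffer instead of a real file; the CSV contents are made up for illustration. By default the empty field becomes nan, and the filling_values parameter substitutes another value.

```python
import io
import numpy as np

# In-memory CSV text standing in for a file; the second data row has an empty field.
csv_text = io.StringIO("a,b,c\n1,2,3\n4,,6\n")

# skip_header=1 skips the column-name row; the missing entry becomes nan.
data = np.genfromtxt(csv_text, delimiter=",", skip_header=1)
print(data)

# filling_values replaces missing entries with a chosen default (0 here).
filled = np.genfromtxt(io.StringIO("1,2,3\n4,,6\n"), delimiter=",", filling_values=0)
print(filled)
```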

Example:

import numpy as np

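# Create a small sample file so the example runs on its own
np.savetxt('data.txt', np.array([[1.0, 2.0], [3.0, 4.0]]))

# Read a text file (whitespace-delimited by default)
data = np.loadtxt('data.txt')

# Write a text file, using a comma as the delimiter
np.savetxt('output.txt', data, delimiter=',')

# Read and write binary files
arr = np.array([1, 2, 3])
np.save('array_data.npy', arr)  # save the array in NumPy's binary .npy format
loaded_arr = np.load('array_data.npy')  # load the array back from the .npy file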
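The sketch below checks that data survives a round trip, using in-memory buffers (io.StringIO, io.BytesIO) in place of output.txt and array_data.npy so it runs without touching disk. It shows two practical points: text written with delimiter=',' must be read back with the same delimiter, and .npy files preserve the array's dtype and shape exactly, whereas a text file stores only the formatted values.

```python
import io
import numpy as np

data = np.array([[1.5, 2.0], [3.25, 4.0]])

# savetxt accepts a file-like object; a StringIO stands in for output.txt.
buf = io.StringIO()
np.savetxt(buf, data, delimiter=",", fmt="%.2f")
buf.seek(0)

# Text written with delimiter=',' must be read back with the same delimiter.
restored = np.loadtxt(buf, delimiter=",")
print(np.array_equal(restored, data))   # True

# A .npy stream (here an in-memory BytesIO) keeps the exact dtype and shape.
bin_buf = io.BytesIO()
np.save(bin_buf, np.array([1, 2, 3], dtype=np.int16))
bin_buf.seek(0)
loaded = np.load(bin_buf)
print(loaded.dtype, loaded.shape)       # int16 (3,)
```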

3 Random number generation

Generate random numbers:

  • np.random.rand(): Generates an array of random numbers uniformly distributed over [0, 1).
  • np.random.randn(): Generates an array of random numbers from the standard normal distribution (mean 0, standard deviation 1).
  • np.random.randint(): Generates random integers from the half-open range [low, high); high itself is never returned.

Random seed:

  • np.random.seed(): Sets the seed of the random number generator so that the generated random numbers are reproducible.
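The value ranges of these functions are easy to get wrong, particularly the exclusive upper bound of np.random.randint(). This quick check draws many samples and inspects their ranges; the sample sizes and the seed 0 are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)

# rand draws from the half-open interval [0, 1).
u = np.random.rand(1000)
print(u.min() >= 0.0, u.max() < 1.0)   # True True

# randint's upper bound is exclusive: randint(1, 10) yields 1..9, never 10.
ints = np.random.randint(1, 10, size=1000)
print(ints.min(), ints.max())

# randn samples a standard normal; the sample mean and std land near 0 and 1.
z = np.random.randn(10_000)
print(round(z.mean(), 2), round(z.std(), 2))
```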

Example:

import numpy as np

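# Generate random numbers
random_numbers = np.random.rand(3, 3)  # 3x3 array of uniform random numbers in [0, 1)
standard_normal = np.random.randn(2, 2)  # 2x2 array drawn from the standard normal distribution
random_integers = np.random.randint(1, 10, size=(2, 3))  # 2x3 array of integers from 1 to 9 (high=10 is exclusive)

# Set the random seed so the same random numbers can be regenerated
np.random.seed(42)
random_a = np.random.rand(3)
np.random.seed(42)  # reuse the same seed
random_b = np.random.rand(3)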

When you use the same seed value (42 in the example above), the np.random module generates the same sequence of random numbers. This is useful for research, experimentation, and debugging, because it makes randomness reproducible. For example:

import numpy as np

np.random.seed(42)
random_a = np.random.rand(3)

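# Reset to the same seed to regenerate the same sequence
np.random.seed(42)
random_b = np.random.rand(3)

# random_a and random_b should be identical
print(random_a)
print(random_b)
print(np.array_equal(random_a, random_b))  # True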

This generates the same sequence twice, so random_a and random_b hold equal values.

The same seed always produces the same sequence, wherever it is set; changing the seed produces a different sequence.

Random number generation with fixed seeds matters in simulations, machine learning experiments, and any application that requires reproducibility: fixing the seed makes results repeatable from run to run, even though the numbers are still pseudo-random.
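Newer NumPy code often uses the Generator interface created by np.random.default_rng() rather than the global np.random.seed(). The sketch below goes slightly beyond the functions covered above and shows that two generators built from the same seed yield identical streams. Note that default_rng(42) uses a different underlying bit generator than np.random.seed(42), so the two approaches do not produce the same numbers.

```python
import numpy as np

# Two independent generators created from the same seed produce identical streams.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
a_vals = rng_a.random(3)
b_vals = rng_b.random(3)
print(np.array_equal(a_vals, b_vals))   # True

# Generator methods roughly mirror the legacy functions used above.
rng = np.random.default_rng(42)
uniform = rng.random((3, 3))                  # like np.random.rand(3, 3)
normal = rng.standard_normal((2, 2))          # like np.random.randn(2, 2)
integers = rng.integers(1, 10, size=(2, 3))   # like np.random.randint(1, 10, size=(2, 3)); 10 is excluded
```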

4 Linear Algebra Operations

Linear algebra plays a key role in scientific computing, and NumPy provides many linear algebra operations for manipulating matrices and vectors.

  • Matrix multiplication: np.dot() and the @ operator
  • Inverse and pseudo-inverse matrices: np.linalg.inv(), np.linalg.pinv()
  • Eigenvalues and eigenvectors: np.linalg.eig()
  • Singular value decomposition (SVD): np.linalg.svd()

Matrix multiplication: Use the np.dot() function or the @ operator.

Example:

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

result = np.dot(A, B)  # or equivalently result = A @ B; gives [[19, 22], [43, 50]]
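A frequent pitfall is confusing the matrix product with element-wise multiplication. Using the same A and B as above, this sketch contrasts @ with *:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# @ (like np.dot for 2-D arrays) is the matrix product: rows of A times columns of B.
print(A @ B)   # [[19 22]
               #  [43 50]]

# * multiplies element by element; it is not the matrix product.
print(A * B)   # [[ 5 12]
               #  [21 32]]

# For 1-D arrays, @ computes the inner (dot) product.
v = np.array([1, 2])
print(v @ v)   # 5
```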

Inverse and pseudo-inverse matrices: np.linalg.inv() computes the inverse of a square, non-singular matrix, and np.linalg.pinv() computes the Moore-Penrose pseudo-inverse, which also exists when the matrix is singular or non-square.

Example:

import numpy as np

A = np.array([[1, 2], [3, 4]])
inverse_A = np.linalg.inv(A)
pseudo_inverse_A = np.linalg.pinv(A)
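To check these results, the sketch below verifies that A @ inv(A) is the identity up to floating-point rounding. It then uses a hypothetical singular matrix S (its second row is twice the first) to show that np.linalg.inv() raises np.linalg.LinAlgError, while np.linalg.pinv() still returns a matrix P satisfying the pseudo-inverse property S @ P @ S = S.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
inverse_A = np.linalg.inv(A)

# A @ inv(A) equals the identity matrix up to floating-point rounding.
print(np.allclose(A @ inverse_A, np.eye(2)))   # True

# A singular matrix has no inverse, so inv raises LinAlgError.
S = np.array([[1, 2], [2, 4]])
try:
    np.linalg.inv(S)
    raised = False
except np.linalg.LinAlgError as exc:
    raised = True
    print("inv failed:", exc)

# pinv still returns the Moore-Penrose pseudo-inverse, which satisfies S @ P @ S == S.
P = np.linalg.pinv(S)
print(np.allclose(S @ P @ S, S))   # True
```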

Eigenvalues and eigenvectors: np.linalg.eig() computes the eigenvalues and eigenvectors of a square matrix.

Example:

import numpy as np

A = np.array([[1, 2], [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
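np.linalg.eig() returns the eigenvectors as the columns of its second result, so eigenvectors[:, i] belongs to eigenvalues[i]. This sketch verifies the defining relation A v = λ v for each pair. Because this particular A is symmetric, np.linalg.eigh() (the routine for symmetric/Hermitian matrices) also applies and returns the eigenvalues in ascending order. From the trace (4) and determinant (-1) of A, the eigenvalues are 2 ± √5.

```python
import numpy as np

A = np.array([[1, 2], [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column eigenvectors[:, i] pairs with eigenvalues[i]: A @ v == lambda * v.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(A @ v, eigenvalues[i] * v))   # True

# For a symmetric matrix, eigh returns real eigenvalues in ascending order.
w, V = np.linalg.eigh(A)
print(w)
```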

Singular value decomposition (SVD): np.linalg.svd() decomposes a matrix A into the product of three matrices, A = U · diag(S) · VT, where U and VT are orthogonal and S is a 1-D array of singular values.

Example:

import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
U, S, VT = np.linalg.svd(A)
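S is returned as a 1-D array rather than a matrix, and by default U is square (3x3 here), so U @ np.diag(S) @ VT does not line up directly. The sketch below prints the shapes and reconstructs A with the reduced decomposition (full_matrices=False).

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])

# Default full_matrices=True: U is 3x3, S holds the 2 singular values, VT is 2x2.
U, S, VT = np.linalg.svd(A)
print(U.shape, S.shape, VT.shape)   # (3, 3) (2,) (2, 2)

# The reduced form makes U 3x2, so A == U_r @ diag(S_r) @ VT_r (up to rounding).
U_r, S_r, VT_r = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U_r @ np.diag(S_r) @ VT_r))   # True

# Singular values are returned in descending order.
print(S_r)
```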

5 Advanced Operations

5.1 Indexing and slicing techniques

NumPy offers several ways to access and modify array elements: basic slicing, plus advanced indexing with boolean masks and integer arrays.

  1. Basic slicing:

    • Basic slicing extracts a subarray by specifying a start index, an end index (exclusive), and a step.
    • Example: arr[2:5] extracts the elements at indices 2 to 4; arr[1:5:2] uses a step of 2.
  2. Boolean masking:

    • A boolean mask selects the elements of an array that satisfy a condition, usually written as a boolean expression.
    • Example: arr[arr > 2] selects the elements greater than 2.
  3. Integer array indexing:

    • Using an array of integers as an index selects, or reorders, the elements at those positions.
    • Example: arr[indices] selects the elements at the positions listed in the integer array indices.
  4. Multidimensional slicing:

    • When slicing a multidimensional array, you can give a separate slice for each dimension.
    • Example: arr2[1:3, 0:2] selects the first 2 columns of rows 2 and 3.

Code example:

import numpy as np

# Basic slicing
arr = np.array([0, 1, 2, 3, 4, 5])
sub_array1 = arr[2:5]  # extract a subarray: [2, 3, 4]
sub_array2 = arr[1:5:2]  # with a step of 2: [1, 3]

# Boolean mask
mask = arr > 2
result = arr[mask]  # select the elements greater than 2: [3, 4, 5]

# Integer array indexing
indices = np.array([0, 2, 4])
result2 = arr[indices]  # select the elements at indices 0, 2, 4: [0, 2, 4]

# Multidimensional slicing
arr2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
sub_array3 = arr2[1:3, 0:2]  # the first 2 columns of rows 2 and 3
# Result:
# [[4, 5],
#  [7, 8]]

# Print the results
print("Basic slicing example 1:", sub_array1)
print("Basic slicing example 2:", sub_array2)
print("Boolean mask example:", result)
print("Integer array indexing example:", result2)
print("Multidimensional slicing example:\n", sub_array3)
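The examples above do not show whether a result shares memory with the original array. In this sketch, which rebuilds the same arr, a basic slice returns a view, so writing to it changes arr, while integer-array and boolean indexing return copies. Assigning through a mask, however, writes into the original array.

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5])

# Basic slices are views: writing through them changes the original array.
view = arr[2:5]
view[0] = 99
print(arr)                          # [ 0  1 99  3  4  5]
print(np.shares_memory(view, arr))  # True

# Integer-array (and boolean) indexing returns a copy.
picked = arr[np.array([0, 1])]
picked[0] = -1
print(arr[0])                       # 0 (unchanged)

# Assigning through a boolean mask modifies the original in place.
arr[arr > 50] = 0
print(arr)                          # [0 1 0 3 4 5]
```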

5.2 Array sorting

NumPy provides np.sort(), which returns a sorted copy of an array, and np.argsort(), which returns the indices that would sort it.

Example:

import numpy as np

arr = np.array([3, 1, 2, 4, 5])
sorted_arr = np.sort(arr)  # return a sorted copy of the array
sorted_indices = np.argsort(arr)  # return the indices that would sort the array

Example 1: Sort by value

import numpy as np

arr = np.array([3, 1, 2, 4, 5])
sorted_arr = np.sort(arr)  # ascending sort by value; result is [1, 2, 3, 4, 5]

Example 2: Sort using the indices from np.argsort()

import numpy as np

arr = np.array([3, 1, 2, 4, 5])
indices = np.argsort(arr)  # indices that would sort the array: [1, 2, 0, 3, 4]
sorted_arr = arr[indices]  # reorder by those indices: [1, 2, 3, 4, 5]
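Two related idioms are sketched below with hypothetical data. np.sort() only sorts in ascending order, so reversing the result with [::-1] is a common way to get descending order; and np.argsort() can order one array by the values of another, for example student names by score. On 2-D arrays, the axis argument controls the sort direction.

```python
import numpy as np

arr = np.array([3, 1, 2, 4, 5])

# Descending order: reverse the ascending result.
desc = np.sort(arr)[::-1]
print(desc)                     # [5 4 3 2 1]

# Order one array by the values of another (hypothetical names and scores).
names = np.array(["Carol", "Alice", "Bob"])
scores = np.array([88, 95, 72])
order = np.argsort(scores)[::-1]   # indices from highest to lowest score
print(names[order])                # ['Alice' 'Carol' 'Bob']

# On 2-D arrays, axis=0 sorts each column and axis=1 (the default, -1) sorts each row.
m = np.array([[3, 1, 2], [9, 7, 8]])
print(np.sort(m, axis=0))
print(np.sort(m, axis=1))
```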

5.3 Structured arrays

Structured arrays allow storing and manipulating data of different data types, similar to tables in a database.

Example:

import numpy as np

data = np.array([(1, 'Alice', 25), (2, 'Bob', 30)],
                dtype=[('ID', 'i4'), ('Name', 'U10'), ('Age', 'i4')])

# Access a field of the structured array by name
print(data['Name'])  # Output: ['Alice' 'Bob']
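Structured arrays support many of the same operations as ordinary arrays. The sketch below extends the example with a third, hypothetical record and shows filtering on a field with a boolean mask, sorting records by a named field with np.sort(..., order='Age'), and reading a single record.

```python
import numpy as np

# Same dtype as above; the 'Carol' record is added for illustration.
data = np.array([(1, 'Alice', 25), (2, 'Bob', 30), (3, 'Carol', 22)],
                dtype=[('ID', 'i4'), ('Name', 'U10'), ('Age', 'i4')])

# Boolean masks work on individual fields, much like a WHERE clause in SQL.
older = data[data['Age'] > 24]
print(older['Name'])                      # ['Alice' 'Bob']

# np.sort can order whole records by a named field.
by_age = np.sort(data, order='Age')
print(by_age['Name'])                     # ['Carol' 'Alice' 'Bob']

# A record is indexed by position, and its fields by name.
print(data[1]['Name'], data[1]['Age'])    # Bob 30
```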

6 Data analysis example

We'll load a CSV file of student test scores, compute the mean, median, and standard deviation of the scores, and plot a histogram of the score distribution.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV data (assumes student_scores.csv exists and has a 'Score' column)
data = pd.read_csv('student_scores.csv')

# Extract the score column as a NumPy array
scores = data['Score'].to_numpy()

# Compute summary statistics
mean_score = np.mean(scores)
median_score = np.median(scores)
std_deviation = np.std(scores)  # population standard deviation (ddof=0)

# Plot a histogram of the score distribution
plt.hist(scores, bins=10, edgecolor='k', alpha=0.7)
plt.title('Score Distribution')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

# Print the statistics
print(f"Mean Score: {mean_score}")
print(f"Median Score: {median_score}")
print(f"Standard Deviation: {std_deviation}")

In this example, we first load the CSV file with pandas and extract the score column as a NumPy array. Next, NumPy computes the mean, median, and standard deviation. Finally, Matplotlib plots a histogram of the scores.

This example shows how NumPy works together with other libraries on a complete data analysis task: loading data, computing statistics, and visualizing the results.
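Because the example above depends on student_scores.csv, here is a self-contained sketch that uses a small array of hypothetical scores instead. It computes the same statistics and uses np.histogram() to get bin counts like those plt.hist() draws, without opening a plot window. It also shows that np.std() defaults to the population standard deviation (ddof=0); pass ddof=1 for the sample standard deviation.

```python
import numpy as np

# Hypothetical scores standing in for the 'Score' column of student_scores.csv.
scores = np.array([55, 62, 68, 71, 74, 78, 81, 85, 90, 96])

print(np.mean(scores))     # 76.0
print(np.median(scores))   # 76.0

# Population (ddof=0, the default) vs sample (ddof=1) standard deviation.
print(np.std(scores), np.std(scores, ddof=1))

# Five 10-point bins from 50 to 100; the last bin includes its right edge.
counts, edges = np.histogram(scores, bins=5, range=(50, 100))
print(counts)   # [1 2 3 2 2]
print(edges)
```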


Origin blog.csdn.net/qq_35831906/article/details/132646943