02 - Creating Arrays: More Ways to Play

Previous section: 01 - NumPy First Experience: Array Creation and Data Types

In the previous section, we created a sequence object first and then generated an array from it. This approach has its limits: for a large array, say 100×100, writing out the sequence by hand quickly becomes tedious.

In fact, NumPy has many built-in methods for creating arrays. Beyond the ones covered in the previous section, it also provides methods for quickly creating arithmetic sequences, geometric sequences, and arrays of random numbers.

1. Creating arithmetic sequence arrays

  • arange() method

If we need an arithmetic sequence of the 100 integers from 1 to 100, Python's range(1, 101, 1) does the job quickly. NumPy provides a similar method, arange(), whose usage is very close to that of range().

import numpy as np
# Specify start, stop, and step. Like range(), arange() uses a half-open interval [start, stop).
arr_uniform0 = np.arange(1,10,1)
# You can also pass a single argument; in that case start=0 and step=1 by default
arr_uniform1 = np.arange(10)
arr_uniform0
Out:
    array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr_uniform1
Out:
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The results are just what we expected.

So what is the difference between arange() and range()?

The first difference is the return type: arange() returns a numpy.ndarray, while range() returns a range object. Second, arange() accepts floating-point arguments and can create an array of floats, whereas range() accepts only integers and produces an integer sequence. In that sense, arange() has a wider range of applications.

arr_uniform2 = np.arange(1.2, 3.8, 0.3)
arr_uniform2
Out:
    array([1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3. , 3.3, 3.6])
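To make the two differences concrete, here is a small side-by-side sketch; the variable names are just for illustration. It checks the return types and shows that range() rejects a float step while np.arange() accepts one.

```python
import numpy as np

r = range(1, 10, 1)        # a lazy range object of integers
a = np.arange(1, 10, 1)    # an ndarray holding the same values

print(type(r).__name__)        # range
print(type(a).__name__)        # ndarray
print(list(r) == a.tolist())   # True

# range() only accepts integers, so a float step raises TypeError
range_failed = False
try:
    range(1, 4, 0.5)
except TypeError:
    range_failed = True
print(range_failed)  # True

# np.arange() happily builds a float array from a float step
f = np.arange(1, 4, 0.5)
print(f)
```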

Here is a question: what are the limitations of arange()?

In real engineering problems, the step is often a long, non-terminating decimal, or is not known up front at all. To use arange() in such cases, you would first have to compute the step precisely before generating the sequence. Floating-point rounding adds another wrinkle: with a non-integer step, the number of elements arange() returns can be off by one, which is why the NumPy documentation recommends linspace() for non-integer steps. It doesn't have to be that troublesome: NumPy can take care of it for you.
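To see why the manual route is clunky, here is a sketch that builds an 11-point sequence from 1 to 99 using arange() alone. The helper arange_by_count is hypothetical, not a NumPy function; it computes the step itself and extends stop by half a step so rounding cannot drop the endpoint.

```python
import numpy as np

def arange_by_count(start, stop, num):
    """Hypothetical helper: num evenly spaced values from start to stop, inclusive."""
    # Step 1: work out the step by hand (num points means num - 1 gaps)
    step = (stop - start) / (num - 1)
    # Step 2: arange() excludes stop, and float rounding may drop or add the
    # last point, so extend stop by half a step as a safety margin
    return np.arange(start, stop + step / 2, step)

manual = arange_by_count(1, 99, 11)
print(manual)
print(len(manual))  # 11
```

linspace(), introduced next, does both steps for you in a single call.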

  • linspace() method

The linspace() method takes a few more parameters. Its signature is as follows:

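np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
# start, stop: same meaning as in arange()
# num: the number of elements in the array to create; defaults to 50
# endpoint: if True (the default), the interval is closed [start, stop]; if False, it is half-open [start, stop)
# retstep: controls the return value. If False (the default), only the array is returned; if True, a tuple of (array, step) is returned
# dtype: the data type of the output array; if None, it is inferred from start and stop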

Let's look at a few examples:

# endpoint is not set, so it defaults to True and the interval is closed
arr_uniform3 = np.linspace(1,99, 11)
arr_uniform3
Out:
    array([ 1. , 10.8, 20.6, 30.4, 40.2, 50. , 59.8, 69.6, 79.4, 89.2, 99. ])
# With retstep=True, both the array and the step are returned
arr_uniform4 = np.linspace(1,99, 11, retstep=True)
arr_uniform4
Out:
    (array([ 1. , 10.8, 20.6, 30.4, 40.2, 50. , 59.8, 69.6, 79.4, 89.2, 99. ]), 9.8)
# With endpoint=False, the interval is half-open
arr_uniform5 = np.linspace(1,99, 11, endpoint=False)
arr_uniform5
Out:
    array([ 1.        ,  9.90909091, 18.81818182, 27.72727273, 36.63636364,
           45.54545455, 54.45454545, 63.36363636, 72.27272727, 81.18181818,
           90.09090909])
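The steps in these outputs follow from a simple rule, sketched below and checked with retstep: with endpoint=True the num points leave num - 1 gaps, while with endpoint=False the interval is cut into num gaps.

```python
import numpy as np

start, stop, num = 1, 99, 11

# endpoint=True: num points include both ends, so there are num - 1 equal gaps
_, step_closed = np.linspace(start, stop, num, retstep=True)
# endpoint=False: stop is excluded, so the interval is cut into num equal gaps
_, step_open = np.linspace(start, stop, num, endpoint=False, retstep=True)

print(step_closed, (stop - start) / (num - 1))  # both about 9.8
print(step_open, (stop - start) / num)          # both about 8.909
```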

The standout feature of linspace() is that you specify the length of the array directly, which makes it easy to control the array's size. This pairs well with the reshape() method:

arr_uniform6 = np.linspace(1,100, 20)
# Create an arithmetic array of length 20, then use reshape() to turn it into a 5×4 array
arr_uniform6.reshape(5,4)
Out:
    array([[  1.        ,   6.21052632,  11.42105263,  16.63157895],
           [ 21.84210526,  27.05263158,  32.26315789,  37.47368421],
           [ 42.68421053,  47.89473684,  53.10526316,  58.31578947],
           [ 63.52631579,  68.73684211,  73.94736842,  79.15789474],
           [ 84.36842105,  89.57894737,  94.78947368, 100.        ]])

reshape() adjusts the shape of an array very flexibly, but it cannot change the total number of elements. For example, an array of 100 elements can easily be reshaped to 1×100, 4×25, or 5×20, but not to 3×30. This is very handy for turning a row vector into a column vector. Note that reshape() returns a reshaped array and leaves the shape of the original array unchanged.
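A short sketch of these reshape() rules on a 100-element array: valid shapes, letting NumPy infer one dimension with -1, building a column vector, and the ValueError raised when the element count does not match.

```python
import numpy as np

arr = np.arange(100)  # a 1-D array with 100 elements

# Any shape works as long as the element count stays at 100
print(arr.reshape(1, 100).shape)  # (1, 100)
print(arr.reshape(4, 25).shape)   # (4, 25)
print(arr.reshape(5, 20).shape)   # (5, 20)

# Passing -1 lets NumPy infer that dimension from the total size
print(arr.reshape(10, -1).shape)  # (10, 10)

# A 1-D array turned into a column vector (4 rows, 1 column)
col = np.arange(4).reshape(-1, 1)
print(col.shape)  # (4, 1)

# reshape() returns a reshaped array; the original keeps its shape
print(arr.shape)  # (100,)

# 3 × 30 = 90, not 100, so this reshape is rejected
reshape_failed = False
try:
    arr.reshape(3, 30)
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```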

2. Creating geometric sequence arrays

Geometric sequences are also widely used in calculations, for example when computing compound interest. Here we introduce two methods for creating them.

  • geomspace(): create a geometric sequence from its start and end values

For example, suppose I want a geometric sequence from 2 to 16. I don't know the common ratio, but I want the sequence to have 4 elements. Then I can do this:

# The first term is 2, the last term is 16, and the length is 4. Note that the interval is closed by default
arr_geo0 = np.geomspace(2,16,4)
arr_geo0
Out: array([ 2.,  4.,  8., 16.])

geomspace() is very simple to use. Its parameters are described below; adjust them to fit your own needs when you use it later:

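geomspace(start, stop, num=50, endpoint=True, dtype=None)
# start, stop: the start and end values of the interval; both are required
# num: the length of the geometric sequence to generate; NumPy computes the common ratio automatically
# endpoint: if True (the default), the interval is closed [start, stop]; if False, it is half-open [start, stop)
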
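The common ratio geomspace() works out can be reproduced by hand, assuming endpoint=True: there are num - 1 multiplications between start and stop, so the ratio is (stop / start) ** (1 / (num - 1)). A small sketch checking this against the example above:

```python
import numpy as np

start, stop, num = 2, 16, 4

# With endpoint=True, stop = start * ratio ** (num - 1)
ratio = (stop / start) ** (1 / (num - 1))
print(ratio)  # approximately 2.0

# Rebuild the sequence term by term: start, start*ratio, start*ratio**2, ...
manual = start * ratio ** np.arange(num)
auto = np.geomspace(start, stop, num)
print(np.allclose(manual, auto))  # True
```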
  • logspace(): create a geometric sequence on a logarithmic scale

logspace() is similar to geomspace(). The only difference is that the start and end of the interval are given as exponents of a base. Its usage is as follows:

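logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)
# start: the sequence starts at base ** start
# stop: the sequence ends at base ** stop (whether this value is included depends on endpoint)
# num: the length of the sequence; the exponents from start to stop are divided evenly. Defaults to 50
# endpoint: if True (the default), base ** stop is included, i.e. a closed interval; same rule as above
# base: the base of the power; defaults to 10.0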

How would we produce the same geometric sequence as above with logspace()?

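# The first term is 2^1, the last term is 2^4, and the length is 4. Note that the first term is base raised to the power start.
# Because logspace() has many parameters, it is a good idea to pass everything except start and stop as keyword arguments to avoid mistakes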
arr_geo1 = np.logspace(1, 4, num=4, base=2)
arr_geo1
Out: array([ 2.,  4.,  8., 16.])
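The relationship between the two methods can be checked directly. In this sketch, logspace() is compared with base raised to evenly spaced exponents from linspace(), and with geomspace() called on the actual endpoints base ** start and base ** stop.

```python
import numpy as np

start, stop, num, base = 1, 4, 4, 2

via_logspace = np.logspace(start, stop, num=num, base=base)
# logspace raises base to evenly spaced exponents
via_power = base ** np.linspace(start, stop, num)
# geomspace takes the actual endpoint values instead of exponents
via_geomspace = np.geomspace(base ** start, base ** stop, num=num)

print(via_logspace)
print(np.allclose(via_logspace, via_power))      # True
print(np.allclose(via_logspace, via_geomspace))  # True
```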

That covers creating geometric sequences. Even this small sample shows how much convenience NumPy's encapsulation provides, and that is exactly the appeal of Python for data analysis.

3. Creating arrays of random numbers

Random numbers have many clever uses in programming. For example, in the block-matching puzzle games we have all played, a randomly colored block drops from the top of the screen after blocks are cleared; and when we play Dou Dizhu (Fight the Landlord), shuffling the deck is a random process.

Sometimes, though, we have specific requirements for the random numbers we generate. In such a game, for instance, blocks of different colors may appear with different probabilities. Especially in the hard levels, the program seems to deliberately raise the "game difficulty". In fact, these random numbers are carefully calculated and designed. So let's look at how to generate some more "advanced" random numbers.

  • Create a random array uniformly distributed on [0, 1)
# The inputs are integers d0, d1, ..., dn giving the shape of the output array, d0×d1×...×dn
# With no arguments, a single float is returned
numpy.random.rand(d0, d1, ..., dn)
# Generate a 3×2 array uniformly distributed on [0, 1)
arr_rand0 = np.random.rand(3, 2)
arr_rand0
Out: 
    array([[0.07870998, 0.09327187],
           [0.49848953, 0.07535019],
           [0.64401283, 0.11176563]])
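Because these numbers are random, your output will differ from the one above. A sketch of how to make runs repeatable with np.random.seed(), plus a check that rand() stays within [0, 1) and returns a plain float when called without arguments:

```python
import numpy as np

np.random.seed(0)              # fix the global seed
first = np.random.rand(3, 2)

np.random.seed(0)              # same seed, same sequence of numbers
second = np.random.rand(3, 2)

print(first.shape)                          # (3, 2)
print(np.array_equal(first, second))        # True
print(((first >= 0) & (first < 1)).all())   # True: values lie in [0, 1)

scalar = np.random.rand()      # no arguments: a single float
print(type(scalar))
```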
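  • Create a random array uniformly distributed on [low, high)
# uniform() draws from the range [low, high); size is the array shape, given as an integer (1-D) or a tuple of integers
# If size is not given, a single random number from the distribution is returned
numpy.random.uniform(low=0.0, high=1.0, size=None)
# Generate a 3×2 array uniformly distributed on [1, 10)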
arr_rand1 = np.random.uniform(1, 10, (3, 2))
arr_rand1
Out: 
    array([[6.72617294, 5.32504844],
           [7.6895909 , 6.97631457],
           [1.3057397 , 3.51288886]])
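Six numbers say little about the distribution. A sketch that draws many samples from [1, 10) and checks the bounds, the mean (which should be near (low + high) / 2 = 5.5), and that each unit-width bin holds roughly 1/9 of the samples; the seed and sample size are arbitrary choices.

```python
import numpy as np

np.random.seed(42)
low, high = 1, 10
samples = np.random.uniform(low, high, size=100_000)

print(samples.min() >= low, samples.max() < high)  # True True
# The mean of a uniform distribution on [low, high) is (low + high) / 2
print(samples.mean())
# Split [1, 10) into 9 equal bins; each should hold about 1/9 of the samples
counts, _ = np.histogram(samples, bins=9, range=(low, high))
print(counts / samples.size)
```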
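  • Create an array following the standard normal distribution (mean 0, variance 1)
# Like rand(), the inputs are integers giving the output shape d0×d1×...×dn
# With no arguments, a single float drawn from the standard normal distribution is returned
numpy.random.randn(d0, d1, ..., dn)
# Generate a 3×2 array drawn from the standard normal distribution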
arr_rand2 = np.random.randn(3, 2)
arr_rand2
Out: 
    array([[-0.70354968, -0.85339511],
           [ 0.22804958,  0.28517509],
           [ 0.736904  , -2.98846222]])
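  • Create an array following a normal distribution with μ = loc and σ = scale
# loc: the mean μ; scale: the standard deviation σ
# size: an integer (1-D) or a tuple of integers giving the array shape
numpy.random.normal(loc=0.0, scale=1.0, size=None)
# Generate a 3×2 array from a normal distribution with mean 5 and standard deviation 10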
arr_rand3 = np.random.normal(5, 10, (3, 2))
arr_rand3
Out:
    array([[ -7.77480714,  -2.68529581],
           [  4.40425363,  -8.39891281],
           [-13.08126657,  -9.74238828]])
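randn() and normal() are closely related: shifting standard normal draws by loc and scaling them by scale gives a normal distribution with mean loc and standard deviation scale. This sketch compares the two empirically; the seed and sample size are arbitrary.

```python
import numpy as np

np.random.seed(7)
loc, scale = 5, 10

direct = np.random.normal(loc, scale, size=200_000)
# Shift and scale standard normal draws to get the same distribution
shifted = loc + scale * np.random.randn(200_000)

print(direct.mean(), direct.std())    # close to 5 and 10
print(shifted.mean(), shifted.std())  # close to 5 and 10
```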

So far we have generated random floating-point numbers within a given interval or following a given distribution. Can we generate random integers too? Of course.

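  • Array of discrete uniform samples from the interval [low, high)
# Returns discrete uniform samples from [low, high); dtype sets the data type of the result, an integer type by default
# If high is None, samples are drawn from [0, low) instead
# size: an integer (1-D) or a tuple of integers giving the array shape
numpy.random.randint(low, high=None, size=None, dtype=int)
# Sample uniformly from the integers in [1, 5), with a shape of 3 rows and 2 columns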
arr_rand4 = np.random.randint(1, 5, (3, 2))
arr_rand4
Out:
    array([[4, 4],
           [3, 3],
           [4, 2]])

The result of np.random.randint(1, 5, (3, 2)) can be read this way: suppose there are four balls numbered 1, 2, 3, and 4. We draw a ball, record its number, and put it back (sampling with replacement), repeating this six times. The six recorded numbers are then arranged into a 3×2 array.
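The ball analogy can be checked with many draws: the value 5 never appears because high is excluded, and each of the four balls shows up about a quarter of the time. A sketch, with an arbitrary seed and sample size:

```python
import numpy as np

np.random.seed(1)
draws = np.random.randint(1, 5, size=60_000)   # balls numbered 1 to 4

values, counts = np.unique(draws, return_counts=True)
print(values)                 # [1 2 3 4], the upper bound 5 is excluded
print(counts / draws.size)    # each close to 0.25

# With only low given, the range becomes [0, low)
small = np.random.randint(3, size=1000)
print(np.unique(small))
```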

numpy.random.randint makes sampling with replacement from an integer range very convenient. What if I want to sample from a specific set of objects? Read on.

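  • Sampling specific items with or without replacement
# Samples from a, which can be an array, a list, or an integer; an integer a means sampling from [0, a)
# replace=False means sampling without replacement; replace=True means sampling with replacement
# size is the shape of the output sample
# p gives the probability of each element of a being drawn; if omitted, all elements are equally likely
numpy.random.choice(a, size=None, replace=True, p=None)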
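Before the example, a quick sketch of how the replace, a, and p parameters behave. The six numbered balls are just illustrative data.

```python
import numpy as np

np.random.seed(3)
balls = np.array([1, 2, 3, 4, 5, 6])

# Without replacement each ball can be drawn at most once,
# so drawing all 6 amounts to a random permutation
no_repeat = np.random.choice(balls, size=6, replace=False)
print(np.sort(no_repeat))  # [1 2 3 4 5 6]

# An integer a means sampling from 0, 1, ..., a - 1
from_int = np.random.choice(10, size=3)
print(from_int)

# Asking for more items than exist, without replacement, raises ValueError
too_many_failed = False
try:
    np.random.choice(balls, size=7, replace=False)
except ValueError:
    too_many_failed = True
print(too_many_failed)  # True

# p needs one probability per element of a, and they must sum to 1
weighted = np.random.choice(["A", "B"], size=10, p=[0.9, 0.1])
print(weighted)
```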

Let's look at an example. Xiao Ming makes a free throw with probability 0.65, and he takes 10 shots in total. Here is the computer simulation:

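# Ideally, each shot does not affect the next, so the problem reduces to sampling with replacement, 10 times in total
# shoot_lst stores the results of the shots
# Sample with replacement from ["hit", "miss"] with a hit probability of 0.65, 10 draws in total; the result is a numpy.ndarray
shoot_lst = np.random.choice(["hit", "miss"], size=10, replace=True, p=[0.65, 0.35])
shoot_lst
Out:
    array(['miss', 'hit', 'miss', 'hit', 'hit', 'hit', 'hit', 'hit', 'hit',
           'miss'], dtype='<U4')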

The results show 7 hits out of 10 shots, an observed hit rate of 70%. If the simulation continues with more and more shots, the observed hit rate will approach 65%.
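A sketch of that convergence: simulate increasingly many shots and compute the fraction of hits. The seed and the shot counts are arbitrary choices.

```python
import numpy as np

np.random.seed(2020)
rates = {}
for n_shots in (10, 1_000, 100_000):
    shots = np.random.choice(["hit", "miss"], size=n_shots, replace=True, p=[0.65, 0.35])
    # shots == "hit" is a boolean array; its mean is the fraction of hits
    rates[n_shots] = np.mean(shots == "hit")
    print(n_shots, rates[n_shots])
```

With 100,000 shots, the printed rate should sit very close to 0.65.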

4. A quick exercise: the secret of sampling

Statistics is a methodological science that studies random phenomena and draws its conclusions by inference. The idea of "inferring the whole from a part" runs through all of statistics.

For example, the relevant authorities publish a report on college students' physical fitness every year, and one of its most basic measurements is height. Where do these heights come from? Measuring every college student would be very costly, so the simplest way to reduce the cost is random sampling. Here we will study how well random sampling reflects the population as a whole.

From published data, we know that college students' heights follow a normal distribution. Assume the average height is 175 cm and the standard deviation is 10 cm; with NumPy we can then generate height samples for 100,000 students. Our study is based on these 100,000 samples: by sampling repeatedly, we observe how the sample mean changes as the number of samples grows. We use numpy.random.choice to simulate the sampling process. (The program uses some simple NumPy computation and matplotlib plotting. Just take a look for now; the details will be covered in later sections of this column.)

# Display matplotlib figures inline in Jupyter Notebook
%matplotlib inline
# Import packages
import numpy as np
import matplotlib.pyplot as plt

# Generate height samples for 100,000 college students
arr_height = np.random.normal(175, 10, size=100000)
# First draw; the sampled value is stored in sample_height as an ndarray
sample_height = np.random.choice(arr_height, size=1, replace=True)
# average stores the mean height computed after each draw
average = []
# Run 10,000 rounds of sampling. Each round draws 1 sample independently from the
# full population, so the process as a whole is sampling with replacement
n = 10000
for _ in range(n):
    sample = np.random.choice(arr_height, size=1, replace=True)
    sample_height = np.append(sample_height, sample)
    average.append(np.average(sample_height))

# Plot the results; plotting is covered in detail in Chapter 4
plt.figure(figsize=(8,6))
plt.plot(np.arange(n), average, alpha=0.6, color='blue')
plt.plot(np.arange(n), [175 for i in range(n)], alpha=0.6, color='red', linestyle='--')
plt.xlabel("Sample Rounds", fontsize=10)
plt.ylabel("Average Height", fontsize=10)
plt.show()

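[Figure: running average of the sampled heights (blue) against the population mean of 175 cm (red dashed line); x-axis: Sample Rounds, y-axis: Average Height]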

The plot shows that during roughly the first 2,000 rounds, the sample mean fluctuates sharply; as the number of samples grows, however, it gets closer and closer to 175 cm. So with a sound, scientifically designed sampling method, the relevant authorities can greatly reduce the cost of routine statistical work.
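The loop above calls np.append() every round, which copies the whole array each time. Here is a vectorized sketch of the same experiment, assuming we only need the running means: draw all samples at once and divide the cumulative sum by the number of draws so far. Unlike the loop, it does not seed the sample with one extra initial draw.

```python
import numpy as np

np.random.seed(0)
arr_height = np.random.normal(175, 10, size=100000)

n = 10000
# Draw all n samples at once (with replacement) instead of one per iteration
samples = np.random.choice(arr_height, size=n, replace=True)
# Running mean after k draws = (sum of the first k draws) / k
average = np.cumsum(samples) / np.arange(1, n + 1)

print(average[0], average[-1])  # the last value should be close to 175
```

The resulting average array can be passed to the same plt.plot() calls as before.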

5. Summary

This chapter introduced new ways to create arrays, including how to flexibly create arithmetic, geometric, and random arrays, and a simple demo gave a first taste of Python's flexibility for statistics and visualization.

Note that NumPy's random number functions (numpy.random) are more capable and should not be confused with Python's built-in random module.

One more thing worth pointing out: we no longer have to worry about a lack of practice data. From now on, we can flexibly create all kinds of large arrays to practice the syntax.

That wraps up how to create array objects. In the next chapter, I will explain the many transformations of arrays. Stay tuned for the next section: "Indexing and Slicing: The Seventy-two Transformations of Arrays".


Source: blog.csdn.net/qq_33254766/article/details/108362487