The Big NumPy Skills Assessment: 70 Exercises

Original article: https://www.machinelearningplus.com/101-numpy-exercises-python/

1. Import NumPy as np and check the version
Difficulty: L1
Question: Import NumPy as np and print the version number.
Answer:

import numpy as np
print(np.__version__)

2. How do you create a 1D array?

Difficulty: L1
Question: Create a 1D array of the numbers 0 through 9.

Expected output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr = np.arange(10)
OR
arr = np.arange(0,10,1)

3. How do you create a boolean array?

Difficulty: L1
Question: Create a 3×3 NumPy array of all True values.

np.full((3, 3), True, dtype=bool)
#> array([[ True,  True,  True],
#>        [ True,  True,  True],
#>        [ True,  True,  True]], dtype=bool)

# Alternate method:
np.ones((3,3), dtype=bool)

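bool is a data type in the same way float and double are: float denotes a floating-point type and double a double-precision floating-point type.
Objective-C offers a similar type, BOOL, with the values YES and NO; a Boolean value of 1 means true.
numpy.full(shape, fill_value, dtype=None, order='C')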

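4. How do you extract the items of a 1D array that satisfy a given condition?

Difficulty: L1
Question: Extract all odd numbers from arr.

Input:

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output: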

array([1, 3, 5, 7, 9])

arr[arr % 2 == 1]
#> array([1, 3, 5, 7, 9])

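5. How do you replace the items of a NumPy array that satisfy a condition with another value?

Difficulty: L1
Question: Replace all odd numbers in arr with -1.

Input:

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output: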

array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])

arr[arr % 2 == 1] = -1

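6. How do you replace the items that satisfy a condition without affecting the original array?

Difficulty: L2
Question: Replace all odd numbers in arr with -1 without changing arr.

Input:

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output: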

out

array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

out = np.where(arr % 2 == 1, -1, arr)

numpy.where(condition[, x, y])

>>> x = np.arange(9.).reshape(3, 3)
>>> np.where( x > 5 )
(array([2, 2, 2]), array([0, 1, 2]))
>>> x[np.where( x > 3.0 )]               # Note: result is 1D.
array([ 4.,  5.,  6.,  7.,  8.])
>>> np.where(x < 5, x, -1)               # Note: broadcasting.
array([[ 0.,  1.,  2.],
       [ 3.,  4., -1.],
       [-1., -1., -1.]])

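7. How do you reshape an array?

Difficulty: L1
Question: Convert a 1D array into a 2D array with two rows.

Input:

arr = np.arange(10)
arr

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output:

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

arr.reshape(2, -1)  # -1 lets NumPy infer that dimension; here it resolves to 5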
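
To make the -1 placeholder concrete, here is a minimal sketch (the variable names two_rows and five_cols are illustrative, not from the original article). It shows that NumPy derives the missing dimension from the total element count, whichever axis carries the -1.

```python
import numpy as np

arr = np.arange(10)

# -1 asks NumPy to infer this dimension from the total size: 10 / 2 = 5 columns
two_rows = arr.reshape(2, -1)

# The placeholder works on the other axis too: 10 / 5 = 2 rows
five_cols = arr.reshape(-1, 5)

print(two_rows.shape)                        # (2, 5)
print(np.array_equal(two_rows, five_cols))   # True
```

Only one dimension may be -1, and the total size must divide evenly by the dimensions that are given.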

8. How do you stack two arrays vertically?

Difficulty: L2
Question: Stack arrays a and b vertically.

Input:

a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)

Expected output:

array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])

# Method 1:
np.concatenate([a, b], axis=0)

# Method 2:
np.vstack([a, b])

# Method 3:
np.r_[a, b]

np.r_

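# The string 'axis,ndmin,pos' sets: the axis to concatenate along (0 = vertical, 1 = horizontal), the minimum number of dimensions each input is promoted to (the bracket nesting depth, 2 here), and where the original axis is placed in the promoted shape before concatenation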
>>> np.r_['0,2,0', [1,2,3], [4,5,6]]
array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])
>>> np.r_['1,2,0', [1,2,3], [4,5,6]]
array([[1, 4],
       [2, 5],
       [3, 6]])

9. How do you stack two arrays horizontally?

Difficulty: L2
Question: Stack arrays a and b horizontally.

Input:

a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)

Expected output:

array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
[5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

# Method 1:
np.concatenate([a, b], axis=1)

# Method 2:
np.hstack([a, b])

# Method 3:
np.c_[a, b]
or
np.r_['1', a, b]

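10. How do you generate custom sequences in NumPy without hardcoding?

Difficulty: L2
Question: Create the following pattern without hardcoding. Use only NumPy functions and the input array a below.

Input:

a = np.array([1,2,3])

Expected output: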

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

np.r_[np.repeat(a, 3), np.tile(a, 3)]

11. How do you get the items common to two NumPy arrays?

Difficulty: L2
Question: Get the items common to arrays a and b.

Input:

a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

Expected output:

array([2, 4])

np.intersect1d(a,b)

numpy.intersect1d(ar1, ar2, assume_unique=False)

# intersect1d compares only two inputs at a time; to intersect more than two arrays, combine it with reduce
>>> from functools import reduce
>>> reduce(np.intersect1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]))

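12. How do you remove from one array the items that exist in another?

Difficulty: L2
Question: Remove from array a all items that appear in array b.

Input:

a = np.array([1,2,3,4,5])
b = np.array([5,6,7,8,9])

Expected output: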

array([1,2,3,4])

# From 'a' remove all of 'b'
np.setdiff1d(a,b)

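13. How do you get the positions where the elements of two arrays match?

Difficulty: L2
Question: Get the positions where the elements of a and b match.

Input:

a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

Expected output: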

(array([1, 3, 5, 7]),)

np.where(a == b)

14. How do you extract all numbers within a given range from a NumPy array?

Difficulty: L2
Question: Extract all items between 5 and 10 from array a.

Input:

a = np.arange(15)

Expected output:

array([ 5, 6, 7, 8, 9, 10])

# Method 1
index = np.where((a >= 5) & (a <= 10))
a[index]

# Method 2:
index = np.where(np.logical_and(a>=5, a<=10))
a[index]
#> array([ 5,  6,  7,  8,  9, 10])

# Method 3: (thanks loganzk!)
a[(a >= 5) & (a <= 10)]

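15. How do you make a Python function that handles scalars work element-wise on NumPy arrays?

Difficulty: L2
Question: Convert the function maxx, which compares two scalars, so that it works on two arrays.

Input:

def maxx(x, y):
    """Get the maximum of two items"""
    if x >= y:
        return x
    else:
        return y

maxx(1, 5)

5

Expected output: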

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])
pair_max(a, b)

array([ 6., 7., 9., 8., 9., 7., 5.])

def maxx(x, y):
    """Get the maximum of two items"""
    if x >= y:
        return x
    else:
        return y

pair_max = np.vectorize(maxx, otypes=[float])

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

pair_max(a, b)
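
As a cross-check, the sketch below compares the np.vectorize wrapper with NumPy's built-in element-wise maximum, np.maximum. The names via_vectorize and via_ufunc are illustrative. np.vectorize is essentially a convenience loop over a scalar function, so when a native ufunc exists for the operation it is usually the simpler choice.

```python
import numpy as np

def maxx(x, y):
    """Get the maximum of two scalars."""
    return x if x >= y else y

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

# Wrap the scalar function so it broadcasts over arrays
pair_max = np.vectorize(maxx, otypes=[float])
via_vectorize = pair_max(a, b)

# Built-in element-wise maximum of two arrays
via_ufunc = np.maximum(a, b)

print(via_vectorize)
print(via_ufunc)
```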

16. How do you swap two columns in a 2D NumPy array?

Difficulty: L2
Question: Swap columns 1 and 2 (the first and second columns) in the array arr.

arr = np.arange(9).reshape(3,3)
arr

arr[:, [1,0,2]]
#> array([[1, 0, 2],
#>        [4, 3, 5],
#>        [7, 6, 8]])

17. How do you swap two rows in a 2D NumPy array?

Difficulty: L2
Question: Swap rows 1 and 2 (the first and second rows) in the array arr.

arr = np.arange(9).reshape(3,3)
arr

arr[[1,0,2], :]
#> array([[3, 4, 5],
#>        [0, 1, 2],
#>        [6, 7, 8]])

18. How do you reverse the rows of a 2D array?

Difficulty: L2
Question: Reverse the rows of the 2D array arr.

# Input
arr = np.arange(9).reshape(3,3)

arr = np.arange(9).reshape(3,3)
arr[::-1]

19. How do you reverse the columns of a 2D array?

Difficulty: L2
Question: Reverse the columns of the 2D array arr.

# Input
arr = np.arange(9).reshape(3,3)

arr[:, ::-1]

20. How do you create a 2D array of random floats between 5 and 10?

Difficulty: L2
Question: Create a 2D array of shape 5×3 containing random floats between 5 and 10.

# Solution Method 1:
rand_arr = np.random.randint(5, 10, size=(5,3)) + np.random.random((5,3))
# print(rand_arr)

# Solution Method 2:
rand_arr = np.random.uniform(5,10, size=(5,3))
print(rand_arr)

21. How do you print only 3 decimal places in a NumPy array?

Difficulty: L1
Question: Print or show only 3 decimal places of the NumPy array rand_arr.

Input:

rand_arr = np.random.random((5,3))

np.set_printoptions(precision=3)

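22. How do you print a NumPy array with scientific notation (like 1e10) suppressed?

Difficulty: L1
Question: Print the NumPy array rand_arr with scientific notation (like 1e10) suppressed.

Input:

# Create the random array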
np.random.seed(100)
rand_arr = np.random.random([3,3])/1e3
rand_arr

array([[ 5.434049e-04, 2.783694e-04, 4.245176e-04],
[ 8.447761e-04, 4.718856e-06, 1.215691e-04],
[ 6.707491e-04, 8.258528e-04, 1.367066e-04]])

Expected output:

array([[ 0.000543, 0.000278, 0.000425],
[ 0.000845, 0.000005, 0.000122],
[ 0.000671, 0.000826, 0.000137]])

np.set_printoptions(suppress=True)  # precision is optional

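23. How do you limit the number of items printed in a NumPy array's output?

Difficulty: L1
Question: Limit the number of items printed for the NumPy array a to a maximum of 6 elements.

Input: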

a = np.arange(15)

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

Expected output:

array([ 0, 1, 2, ..., 12, 13, 14])

np.set_printoptions(threshold=6)

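24. How do you print the full NumPy array without truncation?

Difficulty: L1
Question: Print the full NumPy array a without truncating it.

Input:

np.set_printoptions(threshold=6)
a = np.arange(15)
a

array([ 0, 1, 2, ..., 12, 13, 14])

Expected output:

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

import sys
np.set_printoptions(threshold=sys.maxsize)  # threshold=np.nan raises ValueError in recent NumPy releases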

25. How do you import a dataset with numbers and text into NumPy while keeping the text intact?

Difficulty: L2
Question: Import the iris dataset, keeping the text intact.

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

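26. How do you extract a particular column from a 1D array of tuples?

Difficulty: L2
Question: Extract the text column species from the 1D iris imported in the previous question.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)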

species = np.array([row[4] for row in iris_1d])

27. How do you convert a 1D array of tuples to a 2D NumPy array?

Difficulty: L2
Question: Convert the 1D iris to a 2D array iris_2d by omitting the species text field.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Method 1: Convert each row to a list and get the first 4 items
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
iris_2d[:4]

# Alt Method 2: Import only the first 4 columns from source url
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[:4]

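28. How do you compute the mean, median, and standard deviation of a NumPy array?

Difficulty: L1
Question: Find the mean, median, and standard deviation of iris sepallength (first column).

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')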

sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
mu, med, sd = np.mean(sepallength), np.median(sepallength), np.std(sepallength)

29. How do you normalize an array so its values range between 0 and 1?

Difficulty: L2
Question: Create a normalized form of iris sepallength whose values range from 0 to 1.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

Smax, Smin = sepallength.max(), sepallength.min()
S = (sepallength - Smin)/(Smax - Smin)
# or
S = (sepallength - sepallength.min())/np.ptp(sepallength)  # Thanks, David Ojeda! (np.ptp is used because the ndarray.ptp method was removed in NumPy 2.0)

30. How do you compute the softmax score?

Difficulty: L3
Question: Compute the softmax score of sepallength.

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

def softmax(x):
    """Compute softmax values for each sets of scores in x.
    https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python"""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
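
To see why the function subtracts np.max(x) before exponentiating, here is a small sketch on made-up scores (the arrays scores and shifted are illustrative, not iris data). Shifting every input by the same constant leaves softmax unchanged, so subtracting the maximum only guards against overflow in np.exp.

```python
import numpy as np

def softmax(x):
    """Softmax of a 1D array, shifted by the maximum for numerical stability."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

scores = np.array([1.0, 2.0, 3.0])      # hypothetical input scores
probs = softmax(scores)

# Adding a constant to every score gives the same probabilities,
# and the max-shift keeps np.exp from overflowing on large inputs
shifted = softmax(scores + 1000.0)

print(probs)
print(probs.sum())
print(np.allclose(probs, shifted))
```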

31. How do you find the percentiles of a NumPy array?

Difficulty: L1
Question: Find the 5th and 95th percentiles of iris sepallength (first column).

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

np.percentile(sepallength, q=[5, 95])

32. How do you insert values at random positions in an array?

Difficulty: L2
Question: Insert np.nan values at 20 random positions in the iris_2d dataset.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')

# Method 1
i, j = np.where(iris_2d)

# i, j contain the row numbers and column numbers of 600 elements of iris_x
np.random.seed(100)
iris_2d[np.random.choice((i), 20), np.random.choice((j), 20)] = np.nan

# Method 2
np.random.seed(100)
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

33. How do you find the positions of missing values in a NumPy array?

Difficulty: L2
Question: Find the number and positions of missing values in sepallength (first column) of iris_2d.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float')
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

print("Number of missing values: \n", np.isnan(iris_2d[:, 0]).sum())
print("Position of missing values: \n", np.where(np.isnan(iris_2d[:, 0])))

34. How do you filter a NumPy array based on two or more conditions?

Difficulty: L3
Question: Filter the rows of iris_2d that have petallength (third column) > 1.5 and sepallength (first column) < 5.0.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# condition is a boolean mask (True or False for each row)
condition = (iris_2d[:, 2] > 1.5) & (iris_2d[:, 0] < 5.0)
iris_2d[condition]

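35. How do you drop the rows that contain a missing value from a NumPy array?

Difficulty: L3
Question: Select the rows of iris_2d that do not contain any nan value.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# The mask is True for rows WITHOUT any nan, despite the variable name
any_nan_in_row = np.array([~np.any(np.isnan(row)) for row in iris_2d])
iris_2d[any_nan_in_row]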

36. How do you find the correlation between two columns of a NumPy array?

Difficulty: L2
Question: Find the correlation between SepalLength (first column) and PetalLength (third column) in iris_2d.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# Solution 1
np.corrcoef(iris_2d[:, 0], iris_2d[:, 2])[0, 1]

# Solution 2
from scipy.stats import pearsonr
corr, p_value = pearsonr(iris_2d[:, 0], iris_2d[:, 2])
print(corr)

# Correlation coef indicates the degree of linear relationship between two numeric variables.
# It can range between -1 to +1.

# The p-value roughly indicates the probability of an uncorrelated system producing 
# datasets that have a correlation at least as extreme as the one computed.
# The lower the p-value (<0.01), stronger is the significance of the relationship.
# It is not an indicator of the strength.
#> 0.871754157305
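
The coefficient described in the comments above can also be computed by hand, which makes the definition explicit. The sketch below uses small made-up arrays x and y (not the iris columns) and checks the manual value against np.corrcoef.

```python
import numpy as np

# Hypothetical, roughly linear sample data (not taken from iris)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Pearson r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
xc = x - x.mean()
yc = y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# Same quantity from the off-diagonal entry of the correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)
```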

37. How do you determine whether a given array has any null values?

Difficulty: L2
Question: Determine whether iris_2d has any missing values.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

np.isnan(iris_2d).any()

38. How do you replace all missing values with 0 in a NumPy array?

Difficulty: L2
Question: Replace all nan values in the NumPy array with 0.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

iris_2d[np.isnan(iris_2d)] = 0

39. How do you find the unique values and their counts in a NumPy array?

Difficulty: L2
Question: Find the unique values and their counts in the species column of iris.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Extract the species column as an array
species = np.array([row.tolist()[4] for row in iris])

# Get the unique values and the counts
np.unique(species, return_counts=True)
#> (array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
#>        dtype='|S15'), array([50, 50, 50]))

40. How do you convert a numeric column into a categorical (text) array?

Difficulty: L2
Question: Bin the petallength (third column) of iris_2d to form a text array, using these rules:

Less than 3 -> 'small'
3-5 -> 'medium'
>=5 -> 'large'

# Bin petallength 
petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])

# Map it to respective category
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petal_length_cat = [label_map[x] for x in petal_length_bin]
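
The binning step relies on how np.digitize assigns indices: with the default right=False, a value x gets index i when bins[i-1] <= x < bins[i]. The sketch below shows this on a handful of made-up petal lengths (the sample values are illustrative, not read from the dataset), including the boundary values 3.0 and 5.0.

```python
import numpy as np

# Hypothetical sample values, chosen to include the bin edges 3.0 and 5.0
petal_length = np.array([1.4, 2.9, 3.0, 4.7, 5.0, 6.9])
bins = [0, 3, 5, 10]

# Index i means bins[i-1] <= value < bins[i]
idx = np.digitize(petal_length, bins)

label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
labels = [label_map[i] for i in idx]

print(idx)
print(labels)
```

Because the left edge is inclusive, 3.0 lands in 'medium' and 5.0 in 'large', which matches the rules stated in the question.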

41. How do you create a new column from the existing columns of a NumPy array?

Difficulty: L2
Question: Create a new column for volume in iris_2d, where volume is (pi x petallength x sepal_length^2)/3.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Compute volume
sepallength = iris_2d[:, 0].astype('float')
petallength = iris_2d[:, 2].astype('float')
volume = (np.pi * petallength * (sepallength**2))/3

# Introduce new dimension to match iris_2d's
volume = volume[:, np.newaxis]

# Add the new column
out = np.hstack([iris_2d, volume])

42. How do you do probabilistic sampling in NumPy?

Difficulty: L3
Question: Randomly sample the species column of the iris dataset so that setosa occurs twice as often as versicolor and virginica.

# Import iris keeping the text column intact
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

# Approach 1: Generate probabilistically
np.random.seed(100)
a = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
species_out = np.random.choice(a, 150, p=[0.5, 0.25, 0.25])

# Approach 2: Probabilistic sampling (preferred)
np.random.seed(100)
species = iris[:, 4]  # species column; assumed here, since the snippet uses it without defining it
probs = np.r_[np.linspace(0, 0.500, num=50), np.linspace(0.501, .750, num=50), np.linspace(.751, 1.0, num=50)]
index = np.searchsorted(probs, np.random.random(150))
species_out = species[index]
print(np.unique(species_out, return_counts=True))

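43. How do you find the second-largest value of one column in a multi-dimensional array?

Difficulty: L2
Question: Find the second-largest value of petallength for the species setosa.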

# Get the species and petal length columns
petal_len_setosa = iris[iris[:, 4] == b'Iris-setosa', [2]].astype('float')

# Get the second last value
np.unique(np.sort(petal_len_setosa))[-2]

44. How do you sort a 2D array by a given column?

Difficulty: L2
Question: Sort the iris dataset by the sepallength column.

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

iris[iris[:,0].argsort()]

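45. How do you find the most frequent value in a NumPy array?

Difficulty: L1
Question: Find the most frequent value of petallength (third column) in the iris dataset.

vals, counts = np.unique(iris[:, 2], return_counts=True)  # index 2 is the third column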
print(vals[np.argmax(counts)])

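46. How do you find the position of the first value greater than a given value?

Difficulty: L2
Question: Find the position of the first value greater than 1.0 in the petalwidth (fourth column) of the iris dataset.

# Input:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')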

print(np.argmax(iris[:, 3].astype('float') > 1.0))

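47. How do you replace all values beyond a given cutoff with that cutoff?

Difficulty: L2
Question: In array a, replace all values greater than 30 with 30 and all values less than 10 with 10.

Input: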

np.random.seed(100)
np.random.uniform(1,50, 20)

# Input
np.set_printoptions(precision=2)
np.random.seed(100)
a = np.random.uniform(1,50, 20)

# Solution 1: Using np.clip
np.clip(a, a_min=10, a_max=30)

# Solution 2: Using np.where
print(np.where(a < 10, 10, np.where(a > 30, 30, a)))
#> [ 27.63  14.64  21.8   30.    10.    10.    30.    30.    10.    29.18  30.
#>   11.25  10.08  10.    11.77  30.    30.    10.    30.    14.43]

numpy.clip(a, a_min, a_max, out=None)

# The bounds can also be given per element (here an array for a_min and a scalar for a_max)
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.clip(a, [3, 4, 1, 1, 1, 4, 4, 4, 4, 4], 8)
array([3, 4, 2, 3, 4, 5, 6, 7, 8, 8])

48. How do you get the positions of the top n values in a NumPy array?

Difficulty: L2
Question: Get the positions of the top 5 maximum values in the given array a.

np.random.seed(100)
a = np.random.uniform(1,50, 20)

# Solution 1: take the last 5 indices of the ascending argsort, largest first
print(a.argsort()[-5:][::-1])
#> [18 7 3 10 15]

# Solution 2:
np.argpartition(-a, 5)[:5]
#> [15 10  3  7 18]

# Ways to get the corresponding values
# Method 1:
a[a.argsort()][-5:]

# Method 2:
np.sort(a)[-5:]

# Method 3:
np.partition(a, kth=-5)[-5:]

# Method 4:
a[np.argpartition(-a, 5)][:5]

49. How do you count all values row-wise in an array?

Difficulty: L4
Question: Compute the counts of unique values row-wise.

Input:

np.random.seed(100)
arr = np.random.randint(1,11,size=(6, 10))
arr

array([[ 9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
[ 3, 3, 2, 1, 9, 5, 1, 10, 7, 3],
[ 5, 2, 6, 4, 5, 5, 4, 8, 2, 2],
[ 8, 8, 1, 3, 10, 10, 4, 3, 6, 9],
[ 2, 1, 8, 7, 3, 1, 9, 3, 6, 2],
[ 9, 2, 6, 5, 3, 9, 4, 6, 1, 10]])

Expected output:

[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
[2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
[0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
[1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
[2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]

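The output has 10 columns, representing the numbers 1 to 10. Each value is the count of that number in the corresponding row. For example, Cell(0,2) holds the value 2, meaning the number 3 occurs twice in the first row.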

# Solution 1
def counts_of_all_values_rowwise(arr2d):
    # Unique values and its counts row wise
    num_counts_array = [np.unique(row, return_counts=True) for row in arr2d]

    # Counts of all values row wise
    return([[int(b[a==i][0]) if i in a else 0 for i in np.unique(arr2d)] for a, b in num_counts_array])

# Print
print(np.arange(1,11))
counts_of_all_values_rowwise(arr)

# Solution 2
c = np.zeros((6, 10)).astype(int)
for i, row in enumerate(arr):
    a, b = np.unique(row, return_counts=True)
    c[i][a-1] = b
print(c)
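
A loop-free alternative is to compare every element against every candidate value with broadcasting and then sum along the row axis. This is a sketch rather than the article's method; it hardcodes the first two rows of the input above so it does not depend on the random seed, and the names rows, values and counts are illustrative.

```python
import numpy as np

# First two rows of the example input, written out literally
rows = np.array([[9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
                 [3, 3, 2, 1, 9, 5, 1, 10, 7, 3]])
values = np.arange(1, 11)   # the numbers 1..10, one output column each

# rows[:, :, None] has shape (2, 10, 1); comparing with values (10,)
# broadcasts to (2, 10, 10), where entry [i, j, k] is rows[i, j] == values[k].
# Summing over axis 1 counts the matches per row and per value.
counts = (rows[:, :, None] == values).sum(axis=1)

print(counts)
```

The intermediate boolean array has rows x columns x values entries, so this trades memory for brevity on large inputs.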

50. How do you flatten an array of arrays into a 1D array?

Difficulty: L2
Question: Convert array_of_arrays into a flat, linear 1D array.

# Input:
arr1 = np.arange(3)
arr2 = np.arange(3,7)
arr3 = np.arange(7,10)
array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)  # dtype=object is required for ragged input in recent NumPy
array_of_arrays

array([array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])], dtype=object)

Expected output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Solution 1
arr_2d = np.array([a for arr in array_of_arrays for a in arr])

# Solution 2:
arr_2d = np.concatenate(array_of_arrays)

51. How do you compute one-hot encodings for a NumPy array?

Difficulty: L4
Question: Compute the one-hot encodings.

Input:

np.random.seed(101)
arr = np.random.randint(1,4, size=6)
arr

array([2, 3, 2, 2, 2, 1])

Output:

array([[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 1., 0.],
[ 0., 1., 0.],
[ 1., 0., 0.]])

# Solution:
def one_hot_encodings(arr):
    uniqs = np.unique(arr)
    out = np.zeros((arr.shape[0], uniqs.shape[0]))
    for i, k in enumerate(arr):
        out[i, k-1] = 1
    return out

one_hot_encodings(arr)
#> array([[ 0.,  1.,  0.],
#>        [ 0.,  0.,  1.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 1.,  0.,  0.]])

# Method 2:
(arr[:, None] == np.unique(arr)).view(np.int8)
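
Another common idiom indexes the rows of an identity matrix. The sketch below assumes the categories are the consecutive integers 1..k, as in this example; that assumption and the names n_classes and one_hot are not from the original article.

```python
import numpy as np

arr = np.array([2, 3, 2, 2, 2, 1])   # the example input, written out literally

# Assumption: categories are the integers 1..k with no gaps
n_classes = arr.max()

# Row i of the identity matrix is the one-hot vector for category i + 1
one_hot = np.eye(n_classes)[arr - 1]

print(one_hot)
```

If the categories are not consecutive integers, mapping them to positions first (for example with np.unique and return_inverse=True) would be needed before indexing.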

52. How do you create row numbers grouped by a categorical variable?

Difficulty: L3
Question: Create row numbers grouped by a categorical variable. Use the following sample from iris species as input.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
species_small = np.sort(np.random.choice(species, size=20))
species_small

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica'],
      dtype='<U15')

# Input:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
species_small
#> array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
#>        'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica'],
#>       dtype='<U15')

print([i for val in np.unique(species_small) for i, grp in enumerate(species_small[species_small==val])])

53. How do you create group ids from a given categorical variable?

Difficulty: L4
Question: Create group ids based on a given categorical variable. Use the following sample from iris species as input.

Input:

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica'],
      dtype='<U15')

# Solution:
output = [np.argwhere(np.unique(species_small) == s).tolist()[0][0] for val in np.unique(species_small) for s in species_small[species_small==val]]

# Solution: For Loop version
output = []
uniqs = np.unique(species_small)

for val in uniqs:  # uniq values in group
    for s in species_small[species_small==val]:  # each element in group
        groupid = np.argwhere(uniqs == s).tolist()[0][0]  # groupid
        output.append(groupid)

54. How do you rank the items of an array using NumPy?

Difficulty: L2
Question: Create the ranks for the given numeric array a.

Input:

np.random.seed(10)
a = np.random.randint(20, size=10)
print(a)
#> [ 9 4 15 0 17 16 17 8 9 0]

Expected output:

[4 2 6 0 8 7 9 3 5 1]

np.random.seed(10)
a = np.random.randint(20, size=10)
print('Array: ', a)

# Solution
print(a.argsort().argsort())
print('Array: ', a)
#> Array:  [ 9  4 15  0 17 16 17  8  9  0]
#> [4 2 6 0 8 7 9 3 5 1]
#> Array:  [ 9  4 15  0 17 16 17  8  9  0]
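
The double argsort can look opaque, so here is a short walk-through on a five-element array (a hand-picked example, not the seeded input). The first argsort gives the indices that would sort the array; applying argsort to that result inverts the permutation, which yields each element's rank.

```python
import numpy as np

a = np.array([9, 4, 15, 0, 17])

# Indices that would sort a: the smallest value sits at index 3, then 1, 0, 2, 4
order = a.argsort()

# Inverting that permutation gives the rank of each element
ranks = order.argsort()

# Equivalent construction: scatter 0..n-1 into the sorted positions
ranks_alt = np.empty_like(order)
ranks_alt[order] = np.arange(len(a))

print(order)      # [3 1 0 2 4]
print(ranks)      # [2 1 3 0 4]
```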

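55. How do you rank the items of a multi-dimensional array using NumPy?

Difficulty: L3
Question: Given a numeric array a, create a rank array of the same shape.

Input: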

np.random.seed(10)
a = np.random.randint(20, size=[2,5])
print(a)

[[ 9 4 15 0 17]
[16 17 8 9 0]]

Expected output:

[[4 2 6 0 8]
[7 9 3 5 1]]

# Solution
print(a.ravel().argsort().argsort().reshape(a.shape))

56. How do you find the maximum value of each row in a 2D NumPy array?

Difficulty: L2
Question: Find the maximum value of each row in the given array.

np.random.seed(100)
a = np.random.randint(1,10, [5,3])
a

array([[9, 9, 4],
[8, 8, 1],
[5, 3, 6],
[3, 3, 3],
[2, 1, 9]])

# Solution 1
np.amax(a, axis=1)

# Solution 2
np.apply_along_axis(np.max, arr=a, axis=1)

# Solution 3
[np.max(row) for row in a]

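57. How do you compute the min-by-max for each row of a 2D NumPy array?

Difficulty: L3
Question: Given a 2D NumPy array, compute the min-by-max (minimum divided by maximum) for each row.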

np.random.seed(100)
a = np.random.randint(1,10, [5,3])
a

array([[9, 9, 4],
[8, 8, 1],
[5, 3, 6],
[3, 3, 3],
[2, 1, 9]])

# Solution
np.apply_along_axis(lambda x: np.min(x)/np.max(x), arr=a, axis=1)
#> array([ 0.44444444,  0.125     ,  0.5       ,  1.        ,  0.11111111])

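58. How do you find duplicate entries in a NumPy array?

Difficulty: L3
Question: Find the duplicate entries (from the second occurrence onwards) in the given NumPy array and mark them True. First occurrences should be marked False.

# Input
np.random.seed(100)
a = np.random.randint(0, 5, 10)
print('Array: ', a)

Array: [0 0 3 0 2 4 2 2 2 2]

Expected output: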

[False True False True False False True True True True]

## Solution
# There is no direct function to do this as of 1.13.3

# Create an all True array
out = np.full(a.shape[0], True)

# Find the index positions of unique elements
unique_positions = np.unique(a, return_index=True)[1]

# Mark those positions as False
out[unique_positions] = False

print(out)
#> [False  True False  True False False  True  True  True  True]

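59. How do you compute grouped means in NumPy?

Difficulty: L3
Question: Find the mean of a numeric column grouped by a categorical column in a 2D NumPy array.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

Expected solution:

[[b'Iris-setosa', 3.418],
 [b'Iris-versicolor', 2.770],
 [b'Iris-virginica', 2.974]]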

# Solution
# No direct way to implement this. Just a version of a workaround.
numeric_column = iris[:, 1].astype('float')  # sepalwidth
grouping_column = iris[:, 4]  # species

# List comprehension version
[[group_val, numeric_column[grouping_column==group_val].mean()] for group_val in np.unique(grouping_column)]

# For Loop version
output = []
for group_val in np.unique(grouping_column):
    output.append([group_val, numeric_column[grouping_column==group_val].mean()])

60. How do you convert a PIL image to a NumPy array?

Difficulty: L3
Question: Import the image from the following URL and convert it to a NumPy array.

URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'

from io import BytesIO
from PIL import Image
import PIL, requests

# Import image from URL
URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'
response = requests.get(URL)

# Read it as Image
I = Image.open(BytesIO(response.content))

# Optionally resize
I = I.resize([150,150])

# Convert to numpy array
arr = np.asarray(I)

# Optionally convert it back to an image and show it
im = PIL.Image.fromarray(np.uint8(arr))
Image.Image.show(im)

61. How do you drop all missing values from a NumPy array?

Difficulty: L2
Question: Drop all nan values from a 1D NumPy array.

Input:

np.array([1,2,3,np.nan,5,6,7,np.nan])

Expected output:

array([ 1., 2., 3., 5., 6., 7.])

a = np.array([1,2,3,np.nan,5,6,7,np.nan])
a[~np.isnan(a)]

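62. How do you compute the Euclidean distance between two arrays?

Difficulty: L3
Question: Compute the Euclidean distance between the two arrays a and b.

Input: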

a = np.array([1,2,3,4,5])
b = np.array([4,5,6,7,8])

# Solution
dist = np.linalg.norm(a-b)
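
For reference, np.linalg.norm of the difference is the square root of the sum of squared differences. A minimal sketch that spells this out for the same inputs (dist_norm and dist_manual are illustrative names):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])

# Library call: 2-norm of the difference vector
dist_norm = np.linalg.norm(a - b)

# Same quantity written out: sqrt(sum((a_i - b_i)^2))
dist_manual = np.sqrt(np.sum((a - b) ** 2))

print(dist_norm, dist_manual)
```

Every component differs by 3 here, so the distance is sqrt(5 * 9) = sqrt(45), roughly 6.708.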

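63. How do you find all local maxima (peaks) in a 1D array?

Difficulty: L4
Question: Find all peaks in the 1D array a. A peak is a number larger than the numbers on both sides of it.

Input:

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

Expected output: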

array([2, 5])

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
doublediff = np.diff(np.sign(np.diff(a)))
peak_locations = np.where(doublediff == -2)[0] + 1
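
The intermediate arrays make the trick easier to follow. In the sketch below (the names steps and signs are illustrative), a peak is a place where the slope sign goes from +1 to -1, which shows up as -2 in the second difference.

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

steps = np.diff(a)             # [ 2  4 -6  1  4 -6  1]
signs = np.sign(steps)         # [ 1  1 -1  1  1 -1  1]
doublediff = np.diff(signs)    # [ 0 -2  2  0 -2  2]

# A rise followed by a fall (+1 then -1) produces -2;
# the +1 shifts from "position in doublediff" to "position in a"
peak_locations = np.where(doublediff == -2)[0] + 1

print(peak_locations)          # [2 5]
```

Note that a flat-topped peak such as [1, 3, 3, 1] is not detected by this approach, because the sign sequence passes through 0 and never yields -2.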

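64. How do you subtract a 1D array from a 2D array so that each item of the 1D array is subtracted from the corresponding row?

Difficulty: L2
Question: Subtract the 1D array b_1d from the 2D array a_2d so that each item of b_1d is subtracted from the corresponding row of a_2d.

Input:

a_2d = np.array([[3,3,3],[4,4,4],[5,5,5]])
b_1d = np.array([1,2,3])

Expected output: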

[[2 2 2]
[2 2 2]
[2 2 2]]

# Input
a_2d = np.array([[3,3,3],[4,4,4],[5,5,5]])
b_1d = np.array([1,2,3])

# Solution
print(a_2d - b_1d[:,None])
#> [[2 2 2]
#>  [2 2 2]
#>  [2 2 2]]
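
The [:, None] matters because plain broadcasting aligns a 1D array with the last axis. The sketch below contrasts the two forms on the same inputs (per_row and per_col are illustrative names).

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

# b_1d[:, None] has shape (3, 1): item i is subtracted from every entry of row i
per_row = a_2d - b_1d[:, None]

# Without the extra axis, b_1d (shape (3,)) lines up with the columns instead
per_col = a_2d - b_1d

print(per_row)
print(per_col)
```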

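65. How do you find the index of the n-th repetition of an item in an array?

Difficulty: L2
Question: Find the index of the 5th repetition of the number 1 in array x.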

x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])

x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])
n = 5

# Solution 1: List comprehension
[i for i, v in enumerate(x) if v == 1][n-1]

# Solution 2: Numpy version
np.where(x == 1)[0][n-1]
#> 8

66. How do you convert NumPy's datetime64 object to datetime's datetime object?

Difficulty: L2
Question: Convert NumPy's datetime64 object to datetime's datetime object.

# Input: a numpy datetime64 object
dt64 = np.datetime64('2018-02-25 22:10:10')

# Input: a numpy datetime64 object
dt64 = np.datetime64('2018-02-25 22:10:10')

# Solution
from datetime import datetime
dt64.tolist()

# or

dt64.astype(datetime)
#> datetime.datetime(2018, 2, 25, 22, 10, 10)

67. How do you compute the moving average of a NumPy array?

Difficulty: L3
Question: Given a 1D array, compute the moving average with a window size of 3.

Input:

np.random.seed(100)
Z = np.random.randint(10, size=10)

# Solution
# Source: https://stackoverflow.com/questions/14313510/how-to-calculate-moving-average-using-numpy
def moving_average(a, n=3) :
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

np.random.seed(100)
Z = np.random.randint(10, size=10)
print('array: ', Z)
# Method 1
moving_average(Z, n=3).round(2)

# Method 2:  # Thanks AlanLRH!
# np.ones(3)/3 gives equal weights. Use np.ones(4)/4 for window size 4.
np.convolve(Z, np.ones(3)/3, mode='valid')


#> array:  [8 8 3 7 7 0 4 2 5 2]
#> moving average:  [ 6.33  6.    5.67  4.67  3.67  2.    3.67  3.  ]
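
As a sanity check, the two methods can be compared directly. This sketch writes out the array shown in the output above instead of relying on the random seed, and the names ma_cumsum and ma_convolve are illustrative.

```python
import numpy as np

def moving_average(a, n=3):
    """Moving average via cumulative sums: each window sum is a difference of two prefix sums."""
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

Z = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])   # the array printed above, written literally

ma_cumsum = moving_average(Z, n=3)
# Equal weights of 1/3 over a window of 3; 'valid' keeps only full windows
ma_convolve = np.convolve(Z, np.ones(3) / 3, mode='valid')

print(ma_cumsum.round(2))
print(np.allclose(ma_cumsum, ma_convolve))
```

Both return len(Z) - n + 1 values, one per full window.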

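68. Given a start, a length, and a step, how do you create a NumPy array sequence?

Difficulty: L2
Question: Create a NumPy array of length 10, starting from 5, with a step of 3 between consecutive numbers.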

# 1
length = 10
start = 5
step = 3

def seq(start, length, step):
    end = start + (step*length)
    return np.arange(start, end, step)

seq(start, length, step)
#> array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32])

# 2
np.arange(5,5+3*10,3)

69. How do you fill in missing dates in an irregular series of NumPy dates?

Difficulty: L3
Question: Given an array of a non-continuous sequence of dates, make it a continuous sequence by filling in the missing dates.

Input:

dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
print(dates)

['2018-02-01' '2018-02-03' '2018-02-05' '2018-02-07' '2018-02-09'
 '2018-02-11' '2018-02-13' '2018-02-15' '2018-02-17' '2018-02-19'
 '2018-02-21' '2018-02-23']

# Solution ---------------
filled_in = np.array([np.arange(date, (date+d)) for date, d in zip(dates, np.diff(dates))]).reshape(-1)

# add the last day
output = np.hstack([filled_in, dates[-1]])
output

# For loop version -------
out = []
for date, d in zip(dates, np.diff(dates)):
    out.append(np.arange(date, (date+d)))

filled_in = np.array(out).reshape(-1)

# add the last day
output = np.hstack([filled_in, dates[-1]])

# Alternative (translator's own): sort, then build the full daily range directly
dates.sort()
np.arange(dates[0],dates[-1]+1,1)

70. How do you create strides from a given 1D array?

Difficulty: L4
Question: Given the 1D array arr, use strides to generate a 2D matrix with a window length of 4 and a stride of 2, e.g. [[0,1,2,3], [2,3,4,5], [4,5,6,7]..].

Input:

arr = np.arange(15)
arr

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

Expected output:

[[ 0 1 2 3]
[ 2 3 4 5]
[ 4 5 6 7]
[ 6 7 8 9]
[ 8 9 10 11]
[10 11 12 13]]

def gen_strides(arr, stride_len=5, window_len=5):
    # Number of full windows that fit when stepping by stride_len
    n_strides = ((arr.size-window_len)//stride_len) + 1
    # return np.array([arr[s:(s+window_len)] for s in np.arange(0, arr.size, stride_len)[:n_strides]])
    return np.array([arr[s:(s+window_len)] for s in np.arange(0, n_strides*stride_len, stride_len)])

print(gen_strides(np.arange(15), stride_len=2, window_len=4))
#> [[ 0  1  2  3]
#>  [ 2  3  4  5]
#>  [ 4  5  6  7]
#>  [ 6  7  8  9]
#>  [ 8  9 10 11]
#>  [10 11 12 13]]
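
On NumPy 1.20 or later, the same windows can be built as views with numpy.lib.stride_tricks.sliding_window_view. This is an alternative sketch, not the article's solution: it creates every window of length 4 and then keeps every second one to get a stride of 2.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # available since NumPy 1.20

arr = np.arange(15)

# All 12 windows of length 4 (start positions 0..11), then every 2nd window
windows = sliding_window_view(arr, window_shape=4)[::2]

print(windows)
```

The result is a read-only view of arr, so call .copy() on it before modifying the values.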
