Python Data Analysis (1): NumPy Basics

1. Introduction
2. Usage

2.1 ndarray
2.2 Data Types
2.3 Indexing and Slicing
2.4 Copies and Views
2.5 The Concept of Axes
2.6 Basic Operations
2.7 Common Operations

1. Introduction

NumPy (Numerical Python) is an open-source scientific computing extension for Python, mainly used to process arrays and matrices of any dimension. For the same computational task, NumPy is generally simpler and much more efficient than working directly with Python's built-in data structures. Install it with the command pip install numpy.
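As a rough illustration of the "simpler" claim, the sketch below squares a handful of numbers first with a plain Python list comprehension and then with a NumPy array. The variable names are chosen for illustration only, and no timing is measured here.

```python
import numpy as np

values = list(range(5))

# Plain Python: loop over the list explicitly
squares_list = [v * v for v in values]

# NumPy: the operation is applied to every element at once
squares_arr = np.array(values) ** 2

print(squares_list)
print(squares_arr)
```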

2. Usage

2.1 ndarray

ndarray is the N-dimensional array type: a collection of elements of the same data type, indexed starting from 0.

An array can be created with NumPy's array function, which has the following format:

array(p_object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

p_object: an array or a (nested) sequence (named object in the NumPy documentation)
dtype: data type of the array elements
copy: whether to copy the input object
order: memory layout of the array; C is row-major, F is column-major, A and K follow the layout of the input array (K is the default)
subok: if True, subclasses are passed through; by default the returned array is forced to be a base-class array
ndmin: minimum number of dimensions of the resulting array
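A minimal sketch of how a few of these parameters behave; the variable names are illustrative.

```python
import numpy as np

# dtype: request a specific element type
a = np.array([1, 2, 3], dtype=np.float64)
print(a.dtype)   # float64

# ndmin: force at least two dimensions
b = np.array([1, 2, 3], ndmin=2)
print(b.shape)   # (1, 3)

# order: 'F' asks for column-major (Fortran-style) memory layout
c = np.array([[1, 2], [3, 4]], order='F')
print(c.flags['F_CONTIGUOUS'])   # True

# copy: with the default copy=True the result does not share data with the input
d = np.array(a)
d[0] = 100.0
print(a[0])      # 1.0
```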

The arange function can also be used. The examples below show the specifics.

Creating an array

First, create a one-dimensional array:

import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array(range(1, 6))
arr3 = np.arange(1, 6)
print(arr1)
print(arr2)
print(arr3)

Next, create a multidimensional array, taking a two-dimensional array as the example:

import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])
print(arr)

Common properties

The following example shows the common attributes of an ndarray object:

import numpy as np

arr = np.array([1, 2, 3])
# element type
print(arr.dtype)
# shape
print(arr.shape)
# number of elements
print(arr.size)
# number of dimensions
print(arr.ndim)
# size of each element in bytes
print(arr.itemsize)

Change the shape of the array

import numpy as np

arr = np.arange(30)
print(arr)
# reshape into a two-dimensional array (in place)
arr.shape = (5, 6)
print(arr)
# reshape into a three-dimensional array
arr = arr.reshape((2, 3, 5))
print(arr)
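A new shape is only valid if it holds the same number of elements as the original array. The sketch below illustrates this, and also uses -1, which asks NumPy to infer one dimension from the total size.

```python
import numpy as np

arr = np.arange(30)

# 2 * 3 * 5 == 30, so this shape is valid
print(arr.reshape((2, 3, 5)).shape)   # (2, 3, 5)

# -1 asks NumPy to infer that dimension from the total size
print(arr.reshape((5, -1)).shape)     # (5, 6)

# A shape whose product is not 30 raises a ValueError
try:
    arr.reshape((4, 8))
except ValueError as err:
    print('invalid shape:', err)
```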

2.2 Data Types

The table below lists NumPy's common data types.

Type	Description
int_	The default integer type (similar to C long; usually int32 or int64)
intc	Same as the C int type; generally int32 or int64
intp	Integer type used for indexing (similar to C ssize_t; generally int32 or int64)
int8	Byte (-128 to 127)
int16	Integer (-32768 to 32767)
int32	Integer (-2147483648 to 2147483647)
int64	Integer (-9223372036854775808 to 9223372036854775807)
uint8	Unsigned integer (0 to 255)
uint16	Unsigned integer (0 to 65535)
uint32	Unsigned integer (0 to 4294967295)
uint64	Unsigned integer (0 to 18446744073709551615)
bool_	Boolean data type (True or False)
float_	Shorthand for float64 (alias removed in NumPy 2.0)
float16	Half-precision floating point: 1 sign bit, 5 exponent bits, 10 mantissa bits
float32	Single-precision floating point: 1 sign bit, 8 exponent bits, 23 mantissa bits
float64	Double-precision floating point: 1 sign bit, 11 exponent bits, 52 mantissa bits
complex_	Shorthand for complex128, i.e. a 128-bit complex number (alias removed in NumPy 2.0)
complex64	Complex number represented by two 32-bit floats (real and imaginary parts)
complex128	Complex number represented by two 64-bit floats (real and imaginary parts)
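The ranges and bit counts in the table can be checked against NumPy itself. The sketch below uses np.iinfo and np.finfo for a few of the types; the loop variable names are illustrative.

```python
import numpy as np

# Integer ranges reported by NumPy
for t in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Total bits and mantissa bits of the floating-point types
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.bits, 'bits,', info.nmant, 'mantissa bits')
```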

The following example shows how to inspect and change the data type.

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1.111, 2.222, 3.333])
# current data type
print(arr1.dtype)
# change the data type
arr1 = arr1.astype(np.int64)
print(arr1.dtype)
# round to one decimal place
arr2 = np.round(arr2, 1)
print(arr2)
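One detail worth noting, shown here as a small sketch: converting floats to an integer type with astype discards the fractional part instead of rounding, whereas np.round keeps the floating-point type.

```python
import numpy as np

arr = np.array([1.9, 2.5, -3.7])

# astype to an integer type truncates toward zero
as_int = arr.astype(np.int64)
print(as_int)          # [ 1  2 -3]

# np.round returns floats rounded to the given number of decimals
rounded = np.round(arr, 0)
print(rounded.dtype)   # float64
```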

2.3 Indexing and Slicing

NumPy arrays support indexing and slicing, and can also be iterated over. Start with a one-dimensional array.

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
print(arr[3])
# modify the value of an element
arr[3] = 10
print(arr[3])
print(arr[2:])
print(arr[2:4])
print(arr[4:])
for i in arr:
    print(i)

Now look at the same operations on a multidimensional array.

import numpy as np

arr = np.arange(12).reshape(3, 4)
print(arr)
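# get a single value
print(arr[2, 3])
# get several non-contiguous values
print(arr[[0, 2],[1, 3]])
# get one row
print(arr[0])
# get several consecutive rows
print(arr[1:])
# get several non-consecutive rows
print(arr[[0, 2]])
# get one column
print(arr[:, 0])
# get several consecutive columns
print(arr[:, 2:])
# get several non-consecutive columns
print(arr[:, [0, 2]])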
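The form with two index lists is easy to misread, so here is a short sketch of what it selects. The names picked and block are illustrative.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Two index lists are paired element by element:
# this picks arr[0, 1] and arr[2, 3]
picked = arr[[0, 2], [1, 3]]
print(picked)   # [ 1 11]

# To take rows 0 and 2 restricted to columns 1 and 3, index in two steps
block = arr[[0, 2]][:, [1, 3]]
print(block)
```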

2.4 Copies and Views

A view (shallow copy) is just a reference to the original data, through which that data can be accessed and manipulated. Modifying the data through a view affects the original, because the view shares memory with it.

A copy (deep copy) is a complete duplicate of the data. Changes made to the copy do not affect the original, because the copy does not share memory with it.

Calling the ndarray view() method produces a view, as the following example shows.

import numpy as np

a = np.arange(6).reshape(2,3)
# create a view
b = a.view()
print('id of a:', id(a))
print('id of b:', id(b))
# change the shape of b
b.shape = (3, 2)
print('shape of a:')
print(a)
print('shape of b:')
print(b)
print(a is b)

Calling the ndarray copy() method produces a copy, as the following example shows.

import numpy as np

a = np.arange(1, 6)
# create a copy
b = a.copy()
print(a is b)
b[1] = 10
print(a[1])
print(b[1])
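The memory sharing described in this section can also be checked directly. The sketch below writes through a view and through a copy, then uses np.shares_memory to confirm which one is tied to the original; the variable names are illustrative.

```python
import numpy as np

a = np.arange(6)
v = a.view()   # view: shares the underlying data with a
c = a.copy()   # copy: owns its own data

v[0] = 100     # writing through the view changes a
c[1] = 200     # writing to the copy leaves a untouched

print(a[0], a[1])               # 100 1
print(np.shares_memory(a, v))   # True
print(np.shares_memory(a, c))   # False
```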

2.5 The Concept of Axes

In NumPy, an axis can simply be understood as a direction, denoted by the numbers 0, 1, 2, and so on. A one-dimensional array has only axis 0; a two-dimensional array has axes 0 and 1; a three-dimensional array has axes 0, 1 and 2. Understanding axes helps us perform the corresponding calculations correctly.
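A small sketch of what the axis argument means in practice, using sum on a two-dimensional array:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
# arr is [[0, 1, 2],
#         [3, 4, 5]]

# axis=0 runs down the rows, so the result has one entry per column
col_sums = arr.sum(axis=0)
print(col_sums)   # [3 5 7]

# axis=1 runs across the columns, so the result has one entry per row
row_sums = arr.sum(axis=1)
print(row_sums)   # [ 3 12]
```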

2.6 Basic Operations

Operations between an array and a number

Look at addition, subtraction, multiplication and division between an array and a number.

import numpy as np

arr = np.arange(12).reshape(3, 4)
print(arr + 3)
print(arr - 1)
print(arr * 2)
print(arr / 3)

Operations between arrays

Now look at operations between two arrays.

import numpy as np

# same number of rows and same number of columns
a = np.arange(12).reshape(3, 4)
b = np.arange(20, 32).reshape(3, 4)
print(a + b)
print(b * a)
# same number of rows
c = np.arange(12).reshape(3, 4)
d = np.arange(3).reshape(3, 1)
print(c + d)
print(c - d)
# same number of columns
e = np.arange(12).reshape(4, 3)
f = np.arange(3).reshape(1, 3)
print(e * f)
print(e - f)
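The second and third cases above work because NumPy stretches the dimension of size 1 to match the other array, a mechanism NumPy calls broadcasting. A minimal sketch of the shapes involved, with d_repeated as an illustrative name:

```python
import numpy as np

c = np.arange(12).reshape(3, 4)
d = np.arange(3).reshape(3, 1)

# d has shape (3, 1); its single column is reused for each of the 4 columns of c
print((c + d).shape)   # (3, 4)

# Adding d is equivalent to adding an explicitly repeated copy of it
d_repeated = np.repeat(d, 4, axis=1)
print(np.array_equal(c + d, c + d_repeated))   # True

# Shapes that cannot be matched raise a ValueError
try:
    c + np.arange(5)
except ValueError as err:
    print('cannot broadcast:', err)
```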

Common Math

import numpy as np

arr = np.array([[33, 55], [11, 66], [22, 99]])
print(arr)
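# maximum
print(np.max(arr))
# minimum
print(np.min(arr))
# maximum along an axis (axis 1: maximum of each row)
print(np.max(arr, 1))
# minimum along an axis (axis 1: minimum of each row)
print(np.min(arr, 1))
# mean
print(np.mean(arr))
# mean along an axis (axis 1: mean of each row)
print(np.mean(arr, axis=1))
# index of the maximum
print(np.argmax(arr))
print(np.argmax(arr, axis=1))
# index of the minimum
print(np.argmin(arr))
print(np.argmin(arr, axis=1))
# range (maximum minus minimum)
print(np.ptp(arr))
print(np.ptp(arr, axis=1))
# variance
print(np.var(arr))
# standard deviation
print(np.std(arr))
# median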
print(np.median(arr))

2.7 Common operations

Adding elements

NumPy's append() function adds values to the end of an array. The operation allocates an entirely new array and copies the original array into it, and the dimensions of the inputs must match. The following example shows its use.

import numpy as np

arr = np.array([[1, 3, 5], [2, 4, 6]])
# append elements (without an axis, the result is flattened)
print(np.append(arr, [1, 1, 3]))
# append along axis 0
print(np.append(arr, [[1, 1, 3]], axis=0))
# append along axis 1
print(np.append(arr, [[1, 1, 3], [2, 1, 5]], axis=1))
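Two points from the description can be checked with a short sketch: append leaves the original array unchanged, and when an axis is given the inputs must have matching dimensions. The name result is illustrative.

```python
import numpy as np

arr = np.array([[1, 3, 5], [2, 4, 6]])
result = np.append(arr, [[7, 8, 9]], axis=0)

# The original array is unchanged; the result is a new, larger array
print(arr.shape)      # (2, 3)
print(result.shape)   # (3, 3)

# With an axis given, a one-dimensional input does not match and raises a ValueError
try:
    np.append(arr, [7, 8, 9], axis=0)
except ValueError as err:
    print('dimension mismatch:', err)
```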

The insert() function can also be used to add elements: it inserts values along the given axis, before the given index. The following example shows its use.

import numpy as np

arr = np.array([[1, 3, 5], [2, 4, 6]])
# insert elements (without an axis, the array is flattened first)
print(np.insert(arr, 1, [1, 1, 3]))
# insert along axis 0
print(np.insert(arr, 1, [1, 1, 3], axis=0))
# insert along axis 1
print(np.insert(arr, 1, [1, 5], axis=1))

Deleting elements

NumPy's delete() function removes elements from an array. The following example shows its use.

import numpy as np

arr = np.array([[1, 3, 5], [2, 4, 6]])
# delete an element (without an axis, the array is flattened first)
print(np.delete(arr, 1))
# delete along axis 0
print(np.delete(arr, 1, axis=0))
# delete along axis 1
print(np.delete(arr, 1, axis=1))
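Like append, delete returns a new array and does not modify the original. It also accepts a list of indices, as in this small sketch (the name trimmed is illustrative):

```python
import numpy as np

arr = np.array([[1, 3, 5], [2, 4, 6]])

# Drop columns 0 and 2 in a single call; arr itself keeps its shape
trimmed = np.delete(arr, [0, 2], axis=1)
print(trimmed)      # the remaining column holds 3 and 4
print(arr.shape)    # (2, 3)
```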

Removing duplicates

NumPy's unique() function removes duplicate elements from an array and returns the unique values in sorted order.

import numpy as np

arr = np.array([1, 3, 5, 2, 4, 6, 1, 5, 3])
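# remove duplicate elements
print(np.unique(arr))
# indices in the original array where each unique value first appears
u, indices = np.unique(arr, return_index=True)
print(indices)
# number of times each unique value occurs
u, counts = np.unique(arr, return_counts=True)
print(counts)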
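The two optional return values can be requested together. The sketch below shows how they relate to the original array; first_idx and occurrences are illustrative names.

```python
import numpy as np

arr = np.array([1, 3, 5, 2, 4, 6, 1, 5, 3])
u, first_idx, counts = np.unique(arr, return_index=True, return_counts=True)

# u is sorted; first_idx gives where each unique value first appears in arr
print(u)                                    # [1 2 3 4 5 6]
print(np.array_equal(arr[first_idx], u))    # True

# Pair each unique value with how often it occurs
occurrences = dict(zip(u.tolist(), counts.tolist()))
print(occurrences)
```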