Numpy study notes (Part II)


The road is long, so enjoy it and stay grounded! Previous part: Numpy study notes (Part I)

One, Numpy array merge and split operations

Both operations come up often when working with machine learning algorithms.

1, merge operations

import numpy as np
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = np.array([666, 666, 666])
  • np.concatenate ([,], axis = ) The default is axis = 0. The joined result is returned as a new array; the original arrays are not changed.
np.concatenate([x, y])

Run output: array ([1, 2, 3, 3, 2, 1])

np.concatenate([x, y, z])

Run output: array ([1, 2, 3, 3, 2, 1, 666, 666, 666])

The examples above join one-dimensional arrays. Now let's look at two-dimensional arrays.

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # A.shape=(2, 3); joining along the first axis gives (4, 3)
np.concatenate([A, A])

Run output:

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])
np.concatenate([A, A], axis=1)      # joining along the second axis gives (2, 6)

Run output:

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

So, can A and z be joined together? Not directly: z is a 1-dimensional array with z.shape = (3,), while A is 2-dimensional, so the operation raises an error. We can first use reshape to turn z into a 2-dimensional array and then join them.

np.concatenate([A, z.reshape(1, -1)])

Run output:

array([[  1,   2,   3],
       [  4,   5,   6],
       [666, 666, 666]])

In fact, Numpy already provides functions that handle merging arrays of different dimensions.

  • np.vstack ()
np.vstack([A, z])

Run output:

array([[  1,   2,   3],
       [  4,   5,   6],
       [666, 666, 666]])
  • np.hstack ()
B = np.full((2, 2), 100)
np.hstack([A, B])

Run output:

array([[  1,   2,   3, 100, 100],
       [  4,   5,   6, 100, 100]])
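
As a quick recap of the merge functions above, the following sketch checks the resulting shapes. The arrays mirror the ones used in this section; the names rows, stacked and wide are made up for this illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
z = np.array([666, 666, 666])
B = np.full((2, 2), 100)

# concatenate needs matching dimensions, so z is reshaped to (1, 3) first
rows = np.concatenate([A, z.reshape(1, -1)], axis=0)

# vstack treats the 1-D z as a single row automatically
stacked = np.vstack([A, z])

# hstack joins along the second axis; the row counts must match
wide = np.hstack([A, B])

print(rows.shape, stacked.shape, wide.shape)  # (3, 3) (3, 3) (2, 5)
```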

2, the split operation

  • np.split(x, [,], axis=)

    The first parameter is the array to split and the second is the list of split points; there can be more than one split point. The default is axis = 0.

x = np.arange(10)
x

Run output: array ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.split(x, [3, 7])

Run output: [array ([0, 1, 2]), array ([3, 4, 5, 6]), array ([7, 8, 9])]

Also, try the following two-dimensional array.

A = np.arange(16).reshape(4, 4)
A

Run output:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
np.split(A, [3])

Run output:

[array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]), array([[12, 13, 14, 15]])]
np.split(A, [3], axis=1)

Run output:

[array([[ 0,  1,  2],
        [ 4,  5,  6],
        [ 8,  9, 10],
        [12, 13, 14]]), array([[ 3],
        [ 7],
        [11],
        [15]])]

Since Numpy has vertical and horizontal stacking, it also has vertical and horizontal splitting.

  • np.vsplit ()
np.vsplit(A, [3])

Run output:

[array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]), array([[12, 13, 14, 15]])]
  • np.hsplit ()
np.hsplit(A, [3])

Run output:

[array([[ 0,  1,  2],
        [ 4,  5,  6],
        [ 8,  9, 10],
        [12, 13, 14]]), array([[ 3],
        [ 7],
        [11],
        [15]])]

Comparing the results above shows that vsplit is simply split with axis = 0, and hsplit is split with axis = 1!

Now a short exercise. In the data set below, the first three columns are the sample features and the last column is the label. We need to split them apart and convert the label (the last column) into a vector:

array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])

data = np.arange(16).reshape(4, 4)
x, y = np.hsplit(data, [-1])

Run output:

array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10],
       [12, 13, 14]])
array([[ 3],
       [ 7],
       [11],
       [15]])

Next, convert the label column into a vector.

y[:, 0]

Run output: array ([3, 7, 11, 15])
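
A possible alternative to the hsplit approach above is plain slicing, which yields the label vector directly. This is only a minimal sketch; the names features, labels and y_vec are made up for the example.

```python
import numpy as np

data = np.arange(16).reshape(4, 4)

# Split with hsplit, then flatten the single label column
x, y = np.hsplit(data, [-1])
y_vec = y[:, 0]

# Same result with slicing: every column but the last, and the last column
features = data[:, :-1]
labels = data[:, -1]

print(features.shape, labels)  # (4, 3) [ 3  7 11 15]
```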

Two, Numpy matrix operations

Consider this problem: given a vector, multiply each of its elements by 2. For example, a = (0, 1, 2) should give a * 2 = (0, 2, 4).

L = [i for i in range(10)]
L * 2

Run output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

This is obviously not the result we want: multiplying a Python list by 2 repeats the list. So how do we get what we want?

A = []
for i in L:
    A.append(i * 2)
A

Run output: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

This is certainly not the best way. Let's compare the execution speed of a few approaches.

%%time
L = [i for i in range(100000)]
A = []
for i in L:
    A.append(i * 2)
A

Run output: Wall time: 14.6 ms

%%time
L = [i*2 for i in range(100000)]

Run output: Wall time: 6.83 ms

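%%time
import numpy as np
A = np.array(i*2 for i in range(100000000000000))
A

%%time
L = np.arange(10)
L * 2

Run output: Wall time: 0 ns

Why 0? In the first cell, np.array() receives a generator expression and does not iterate over it. It only wraps the generator object in a 0-dimensional object array, so the size of the range makes no difference and nothing is actually computed. The second cell is the real Numpy approach: L * 2 multiplies every element in compiled code, and with only 10 elements it finishes too quickly for the timer to register. These vectorized operations are where Numpy's advantage on large data comes from.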

import numpy as np
L = np.arange(10)
L * 2

Run output: array ([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
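
To compare the three approaches outside a notebook, something like the following could be used. It is only a sketch: the size n, the repeat count and the function names are arbitrary choices for this illustration, and the timings depend on the machine, so no numbers are shown.

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size chosen for this illustration

def with_loop():
    out = []
    for i in range(n):
        out.append(i * 2)
    return out

def with_comprehension():
    return [i * 2 for i in range(n)]

def with_numpy():
    return np.arange(n) * 2

# Time each approach over the same number of runs
for fn in (with_loop, with_comprehension, with_numpy):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1000:.3f} ms per run")
```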

1、Universal Function

X = np.arange(1, 16).reshape(3, 5)
X

Run output:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])
  • addition
X + 1

Run output:

array([[ 2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16]])
  • Subtraction
X - 1

Run output:

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
  • multiplication
X * 2

Run output:

array([[ 2,  4,  6,  8, 10],
       [12, 14, 16, 18, 20],
       [22, 24, 26, 28, 30]])
  • division
X / 2

Run output:

array([[0.5, 1. , 1.5, 2. , 2.5],
       [3. , 3.5, 4. , 4.5, 5. ],
       [5.5, 6. , 6.5, 7. , 7.5]])
X // 2

Run output:

array([[0, 1, 1, 2, 2],
       [3, 3, 4, 4, 5],
       [5, 6, 6, 7, 7]], dtype=int32)
  • Remainder
X % 2

Run output:

array([[1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1]], dtype=int32)
  • Reciprocal --1 / X
  • Absolute value --np.abs ()
  • Sine function --np.sin ()
  • Cosine function --np.cos ()
  • Tangent function --np.tan ()
  • Inverse trigonometric functions --np.arcsin (), np.arccos (), np.arctan ()
  • Exponential function --np.exp ()
  • Power function --np.power ()
  • Natural logarithm --np.log ()
  • Base-2 logarithm --np.log2 ()
  • Base-10 logarithm --np.log10 ()
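
A few of the functions listed above, applied element by element. This is just an illustrative sketch; the input values were picked so the results are easy to check by hand.

```python
import numpy as np

X = np.arange(1, 16).reshape(3, 5)

reciprocal = 1 / X                          # element-wise 1/x, returns floats
squares = np.power(X, 2)                    # each element raised to the power 2
logs = np.log2(np.array([1, 2, 4, 8]))      # base-2 logarithm
sines = np.sin(np.array([0.0, np.pi / 2]))  # sine of 0 and pi/2

print(squares[0])   # [ 1  4  9 16 25]
print(logs)         # [0. 1. 2. 3.]
```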

2, matrix operation

A = np.arange(4).reshape(2, 2)
B = np.full((2, 2), 10)
  • A+B

  • A-B

  • A*B

  • A/B

    The operations above are all applied element by element. So how do we perform real matrix multiplication?

  • Matrix multiplication --np.dot ()

A.dot(B)

Run output:

array([[10, 10],
       [50, 50]])
  • Matrix transpose
A.T

Run output:

array([[0, 2],
       [1, 3]])
  • Inverse matrix --np.linalg.inv ()
np.linalg.inv(A)

Run output:

array([[-1.5,  0.5],
       [ 1. ,  0. ]])
np.linalg.inv(A).dot(A)

Run output:

array([[1., 0.],
       [0., 1.]])

This verifies that A^(-1) · A = E, where E is the identity matrix.
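
Because of floating-point rounding, a check like this is usually done with a tolerance rather than exact equality. A minimal sketch, assuming np.allclose and np.eye are acceptable for the comparison; the @ operator is used here as an equivalent of .dot() for 2-D arrays.

```python
import numpy as np

A = np.arange(4).reshape(2, 2)
A_inv = np.linalg.inv(A)

# @ is matrix multiplication, the same as A_inv.dot(A) for 2-D arrays
product = A_inv @ A

# Compare against the identity matrix with a tolerance instead of ==
print(np.allclose(product, np.eye(2)))  # True
```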

  • Pseudo-inverse matrix --np.linalg.pinv ()

    In many cases the matrix is not square, and then an ordinary inverse matrix does not exist. A pseudo-inverse matrix, however, can still be computed.

C = np.arange(0, 16).reshape(2, 8)
np.linalg.pinv(C)

Run output:

array([[-1.35416667e-01,  5.20833333e-02],
       [-1.01190476e-01,  4.16666667e-02],
       [-6.69642857e-02,  3.12500000e-02],
       [-3.27380952e-02,  2.08333333e-02],
       [ 1.48809524e-03,  1.04166667e-02],
       [ 3.57142857e-02, -1.04083409e-17],
       [ 6.99404762e-02, -1.04166667e-02],
       [ 1.04166667e-01, -2.08333333e-02]])
C.dot(np.linalg.pinv(C))

Run output:

array([[ 1.00000000e+00, -2.49800181e-16],
       [ 0.00000000e+00,  1.00000000e+00]])

The results show that C multiplied by its pseudo-inverse is approximately the identity matrix, with deviations only at the level of floating-point error. For the details of how the pseudo-inverse is defined and computed, look it up yourself!
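
The order of the product matters for a non-square matrix. The sketch below, using the same C, checks both orders with np.allclose; the variable name C_pinv is made up for this illustration.

```python
import numpy as np

C = np.arange(0, 16).reshape(2, 8)
C_pinv = np.linalg.pinv(C)

print(C_pinv.shape)                         # (8, 2)

# The two rows of C are linearly independent, so C @ C_pinv is close to the 2x2 identity
print(np.allclose(C @ C_pinv, np.eye(2)))   # True

# In the other order the product is 8x8 with rank 2, so it cannot be the identity
print(np.allclose(C_pinv @ C, np.eye(8)))   # False
```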

3, vector and matrix operations

A = np.arange(4).reshape(2, 2)
v = np.array([1, 2])
  • v + A
v + A

Run output:

array([[1, 3],
       [3, 5]])
np.vstack([v] * A.shape[0])

Run output:

array([[1, 2],
       [1, 2]])
np.vstack([v] * A.shape[0]) + A

Run output:

array([[1, 3],
       [3, 5]])

The two results are the same. In fact, Numpy already provides a function for this kind of stacking.

  • np.tile()
np.tile(v, (2, 1))

Run output:

array([[1, 2],
       [1, 2]])
np.tile(v, (2, 1)) + A

Run output:

array([[1, 3],
       [3, 5]])
  • v * A
v * A

Run output:

array([[0, 2],
       [2, 6]])
  • A.dot(v)
A.dot(v)

Run output: array ([2, 8])

  • v.dot(A)
v.dot(A)

Run output: array ([4, 7])
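
The v + A result in this section relies on what Numpy calls broadcasting: the 1-D v is treated as if it were repeated for each row of A, without the copy actually being made. A small sketch, where np.broadcast_to is used only to make the implied expansion visible:

```python
import numpy as np

A = np.arange(4).reshape(2, 2)
v = np.array([1, 2])

# The explicit expansion that broadcasting performs implicitly
v_expanded = np.broadcast_to(v, A.shape)

print(np.array_equal(v + A, v_expanded + A))          # True
print(np.array_equal(v + A, np.tile(v, (2, 1)) + A))  # True
```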

Three, Numpy aggregation operations

import numpy as np
L = np.random.random(100)
L

Run output:

array([0.21395159, 0.90268106, 0.88705369, 0.11517909, 0.62676208,
       0.56121013, 0.62103571, 0.2418181 , 0.13781453, 0.66670862,
       0.51939238, 0.99679432, 0.06384017, 0.5974129 , 0.22196488,
       0.93826983, 0.83706847, 0.63491905, 0.48828241, 0.85424059,
       0.86514318, 0.47937265, 0.34254143, 0.89577197, 0.14823176,
       0.94488872, 0.57030248, 0.57643624, 0.08268558, 0.8237711 ,
       0.21887705, 0.46440547, 0.9338367 , 0.132422  , 0.4867988 ,
       0.6545799 , 0.36226663, 0.01641314, 0.67876507, 0.35811434,
       0.36533195, 0.12174504, 0.37477359, 0.98791281, 0.20553232,
       0.65235494, 0.13567244, 0.92317556, 0.82237976, 0.62747037,
       0.41160535, 0.46839494, 0.06753446, 0.22386476, 0.20821765,
       0.11778734, 0.8643039 , 0.77497708, 0.9884161 , 0.65142779,
       0.2374325 , 0.32467954, 0.81959546, 0.9863651 , 0.54072234,
       0.21293241, 0.92733881, 0.98738362, 0.90565471, 0.23441948,
       0.05477787, 0.69157053, 0.49194796, 0.12415383, 0.55427813,
       0.29040539, 0.20166942, 0.30054924, 0.30772375, 0.90932004,
       0.84668024, 0.51970052, 0.67773186, 0.37401172, 0.43911304,
       0.98495573, 0.42493635, 0.83658015, 0.35920119, 0.91977698,
       0.95094167, 0.03354397, 0.92045222, 0.80083071, 0.03480189,
       0.22378161, 0.21437509, 0.33268728, 0.51601075, 0.61235958])
  • Summing --np.sum ()
sum(L)

Run output: 52.28029464862967

np.sum(L)

Run output: 52.28029464862967

So what is the difference between the two? Here the results are the same; the difference is purely in efficiency, and np.sum is much faster on Numpy arrays.
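
One way to see the efficiency gap is to time both calls on a larger array. This is a rough sketch; the array size and repeat count are arbitrary, and the actual numbers depend on the machine.

```python
import timeit
import numpy as np

big = np.random.random(200_000)  # arbitrary size for this illustration

# Built-in sum iterates element by element; np.sum runs in compiled code
t_builtin = timeit.timeit(lambda: sum(big), number=3)
t_numpy = timeit.timeit(lambda: np.sum(big), number=3)

print(f"built-in sum: {t_builtin:.4f} s, np.sum: {t_numpy:.4f} s")
```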

  • Minimum --np.min ()
np.min(L)

Run output: 0.016413139615859218

  • Maximum --np.max ()
np.max(L)

Run output: 0.9967943174823842

Next, try a two-dimensional array.

X = np.arange(16).reshape(4, -1)
X

Run output:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
np.sum(X)

Run output: 120

However, we often do not need the sum of all elements, only the sum of each row or each column.

np.sum(X, axis=0)

Run output: array ([24, 28, 32, 36])

np.sum(X, axis=1)

Run output: array ([6, 22, 38, 54])

Here is a small trick for remembering this. axis = 0 collapses the rows: no matter how many rows there are, they are squeezed into a single row, which gives one sum per column. axis = 1 collapses the columns, so the result is one number per row. Thinking of it this way is easier than trying to remember "row sum" versus "column sum".
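
The keepdims argument of np.sum can make the effect of axis easier to see, because the collapsed axis stays in the result with length 1. A short sketch; the names col_sums and row_sums are made up for this example.

```python
import numpy as np

X = np.arange(16).reshape(4, -1)   # shape (4, 4)

col_sums = np.sum(X, axis=0)       # collapses the rows: one value per column
row_sums = np.sum(X, axis=1)       # collapses the columns: one value per row

# keepdims=True keeps the collapsed axis with length 1
print(np.sum(X, axis=0, keepdims=True).shape)  # (1, 4)
print(np.sum(X, axis=1, keepdims=True).shape)  # (4, 1)
```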

  • Product --np.prod ()
np.prod(X)

Run output: 0

np.prod(X + 1)

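Run output: 2004189184

Note that 2004189184 is not the true product. The array here has dtype int32 (see the dtype=int32 outputs earlier), so the result overflowed. The true value is 16! = 20922789888000, which np.prod(X + 1, dtype=np.int64) returns.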

  • Mean --np.mean ()
np.mean(X)

Run output: 7.5

  • Median --np.median ()
np.median(X)

Run output: 7.5

  • Percentile --np.percentile ()
X = np.arange(16).reshape(4, -1)
for percent in [0, 25, 50, 75, 100]:
    print(np.percentile(X, q=percent))

Run output:

0.0
3.75
7.5
11.25
15.0
  • Variance --np.var ()
np.var(X)

Run output: 21.25

  • Standard deviation --np.std ()
np.std(X)

Run output: 4.6097722286464435

Four, Numpy arg operations

1, the index operation

  • Index of the minimum value --np.argmin ()
  • Index of the maximum value --np.argmax ()
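
A small example of both functions, with values chosen so the positions are easy to check by eye. The array here is made up for the illustration.

```python
import numpy as np

x = np.array([4, 2, 8, 14, 0, 15, 6])

i_min = np.argmin(x)   # position of the smallest value
i_max = np.argmax(x)   # position of the largest value

print(i_min, x[i_min])  # 4 0
print(i_max, x[i_max])  # 5 15
```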

2, sorting and indexing use

# First generate a shuffled array
import numpy as np
x = np.arange(16)
np.random.shuffle(x)
x

Run output: array ([4, 2, 8, 14, 0, 15, 6, 3, 11, 7, 13, 1, 12, 10, 9, 5])

  • np.sort (x, axis =) Default axis = -1 (the last axis)
np.sort(x)

Run output: array ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])

At this point x itself has not changed and is still shuffled. To sort x in place:

x.sort()
x

Run output: array ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])

What about a two-dimensional matrix?

X = np.random.randint(10, size=(4, 4))
X

Run output:

array([[5, 3, 9, 2],
       [3, 7, 5, 7],
       [0, 6, 2, 0],
       [8, 7, 4, 8]])
np.sort(X, axis=0)

Run output:

array([[0, 3, 2, 0],
       [3, 6, 4, 2],
       [5, 7, 5, 7],
       [8, 7, 9, 8]])
  • np.argsort () returns the index positions that would sort the array
import numpy as np
x = np.arange(16)
np.random.shuffle(x)
np.argsort(x)

Run output:

array([ 1, 15,  9,  0, 10,  8, 12, 13,  5,  4,  6,  2,  3, 14, 11,  7],
      dtype=int64)
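
To make the meaning of the returned indices concrete, here is a sketch with a fixed array instead of a random shuffle. The names array is hypothetical and only shows a typical use: reordering a second array by the first.

```python
import numpy as np

x = np.array([4, 2, 8, 14, 0, 15, 6, 3])
order = np.argsort(x)          # indices that put x in ascending order

print(order)                   # [4 1 7 0 6 2 3 5]
print(x[order])                # [ 0  2  3  4  6  8 14 15]

# Reorder a second array by the same indices
names = np.array(list("abcdefgh"))
print(names[order])
```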
  • np.partition()

    In many cases we do not need to sort all the values. We only need to pick a pivot so that the values smaller than it end up on its left and the larger values on its right. The second argument is the index of that pivot in the sorted order.

np.partition(x, 3)

Run output: array ([0, 1, 2, 3, 9, 8, 10, 12, 5, 11, 4, 14, 6, 7, 13, 15])

  • np.argpartition ()
np.argpartition(x, 3)

Run output:

array([ 1, 15,  9,  0,  4,  5,  6,  3,  8,  2, 10, 11, 12, 13, 14,  7],
      dtype=int64)
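
A common use of these two functions is picking the k smallest values without a full sort. The following is a sketch with a fixed array; k and the variable names are made up for the example, and the order inside each partition is not guaranteed, so the results are sorted before printing.

```python
import numpy as np

x = np.array([4, 2, 8, 14, 0, 15, 6, 3, 11, 7])
k = 3

part = np.partition(x, k)
# The element at position k is in its sorted place; everything before it is <= it
smallest_k = part[:k]

# argpartition gives the positions of those values in the original array
idx = np.argpartition(x, k)[:k]

print(sorted(smallest_k.tolist()))  # [0, 2, 3]
print(sorted(x[idx].tolist()))      # [0, 2, 3]
```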

Five, Fancy Indexing

import numpy as np
x = np.arange(16)

What if we want every second element from index 3 up to index 9?

x[3:9:2]

Run output: array ([3, 5, 7])

What if the elements we need are not equally spaced?

idx = [3, 5, 8]
x[idx]

Run output: array ([3, 5, 8])

ind = np.array([[2, 3],
               [4, 5]])
x[ind]

Run output:

array([[2, 3],
       [4, 5]])
X = x.reshape(4, -1)
row = np.array([0, 1, 2])
col = np.array([1, 2, 3])
X[row, col]

Run output: array ([1, 6, 11])

col = [True, False, True, True]
X[1:3, col]

Run output:

array([[ 4,  6,  7],
       [ 8, 10, 11]])
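
One detail worth knowing about fancy indexing is that reading returns a copy, while assigning through a fancy index modifies the original array. A minimal sketch; the name picked is made up for the example.

```python
import numpy as np

x = np.arange(16)

picked = x[[3, 5, 8]]   # fancy indexing returns a copy, not a view
picked[0] = -1
print(x[3])             # 3, the original is unchanged

x[[3, 5, 8]] = 0        # assigning through a fancy index does modify x
print(x[:9])            # [0 1 2 0 4 0 6 7 0]
```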

Six, Numpy.array comparison

import numpy as np
x = np.arange(16)
x > 3

Run output:

array([False, False, False, False,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True])
  • '>'

  • '<'

  • '>='

  • '<='

  • '=='

  • '!='

    Combining these with the aggregation operations covered earlier, here are a few exercises.

np.sum(x <= 3)

Run output: 4

np.sum((x >= 3) & (x <= 10))

Run output: 8

np.count_nonzero(x <= 3)        # True counts as 1, False as 0

Run output: 4

  • np.any()
  • np.all()
  • And --&
  • Or --|
  • Not --~
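
A short sketch of the functions and operators listed above, applied to the same x. The name even_or_big is made up for the example.

```python
import numpy as np

x = np.arange(16)

print(np.any(x == 0))        # True, at least one element is 0
print(np.all(x >= 0))        # True, every element is non-negative

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
even_or_big = (x % 2 == 0) | (x > 10)
print(np.sum(even_or_big))   # 11
print(x[~(x > 3)])           # [0 1 2 3]
```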

Origin www.cnblogs.com/zhangkanghui/p/11280845.html