Data Mining Tools: NumPy (Part 1), NumPy Basics

I. Advantages of NumPy

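An ndarray object consists of a contiguous one-dimensional region of computer memory, combined with an indexing scheme that maps each element to a location in that memory block. The block stores its elements either row by row (C style) or column by column (FORTRAN or MATLAB style).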
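
To make the two layouts concrete, here is a minimal sketch that stores the same values in C order and in Fortran order and inspects how each sits in memory. The array names and the explicit `int64` dtype are choices made for this illustration only.

```python
import numpy as np

# The same 2x3 values stored in the two layouts described above
c_arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64, order="C")
f_arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64, order="F")

# strides: how many bytes to step in memory to move one position along each axis
print(c_arr.strides)  # (24, 8): the elements of a row are adjacent
print(f_arr.strides)  # (8, 16): the elements of a column are adjacent

# ravel(order="K") reads the elements in the order they sit in memory
print(c_arr.ravel(order="K"))  # [1 2 3 4 5 6]
print(f_arr.ravel(order="K"))  # [1 4 2 5 3 6]
```

Both arrays compare equal element by element; only the physical ordering of the underlying memory block differs.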

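1. Advantages of NumPy

NumPy's main advantage is speed. It is designed for numeric data and is mostly used to perform numerical operations on large, multidimensional arrays.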

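NumPy stores array data in a contiguous block of memory, laid out in one of two orders: C order (row-major) or Fortran order (column-major).
NumPy's core is implemented in C, and its linear algebra routines (matrix products, for example) are delegated to a BLAS library. Depending on which BLAS build NumPy is linked against, those routines can run on several threads at once; most other element-wise operations run on a single thread.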
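
As a rough illustration of the kind of operation that is handed to BLAS, the sketch below compares NumPy's matrix product with a pure-Python triple loop. The helper `matmul_loops` and the matrix sizes are made up for this example, and whether the product actually runs multithreaded depends on the BLAS library of your installation (`np.show_config()` reports which one is in use).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((50, 40))
y = rng.random((40, 30))

def matmul_loops(p, q):
    # Hypothetical reference implementation using explicit Python loops
    rows, inner, cols = len(p), len(q), len(q[0])
    return [[sum(p[i][k] * q[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

fast = x @ y                                        # matrix product computed by NumPy
slow = np.array(matmul_loops(x.tolist(), y.tolist()))

# Both approaches give the same result up to floating-point rounding
print(np.allclose(fast, slow))  # True
```

Wrapping the two calls in `timeit`, as in the benchmark in this section, is one way to see the speed difference on your own machine.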

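2. Code example

The example below shows how large the speed advantage is. In the run recorded here, np.sum is roughly 10 times faster than the built-in sum (about 0.047 s versus 0.505 s for 1,000 repetitions over 100,000 floats).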

import timeit
import random
import numpy as np

# Build a list of 100,000 random floats and an ndarray holding the same values
a = [random.random() for _ in range(100000)]
b = np.array(a)

def normal_add():
    # Built-in sum over a Python list
    sum1 = sum(a)

def numpy_add():
    # np.sum over an ndarray
    sum3 = np.sum(b)

# Run each function 1000 times and print the total elapsed time
timer = timeit.Timer(normal_add)
print("%s: %f seconds" % (normal_add, timer.timeit(number=1000)))
timer = timeit.Timer(numpy_add)
print("%s: %f seconds" % (numpy_add, timer.timeit(number=1000)))

# -----------output-----------------
<function normal_add at 0x000001CADA40C1E0>: 0.504544 seconds
<function numpy_add at 0x000001CAE3276510>: 0.047483 seconds

II. NumPy Attributes

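NumPy's most important feature is its N-dimensional array object (the ndarray), a fast and flexible container for large data sets. With it you can apply mathematical operations to a whole block of data at once.
An ndarray is a general-purpose multidimensional container for homogeneous data: every element must have the same type. Each array has a shape (a tuple giving the size of each dimension) and a dtype (an object describing the data type of the array):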
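
A small sketch of these points: whole-block arithmetic without a loop, plus the `shape` and `dtype` of the array. The array contents and variable names are arbitrary values chosen for illustration.

```python
import numpy as np

grid = np.array([[1.5, 2.0, 3.5], [4.0, 5.5, 6.0]])
print(grid.shape)   # (2, 3)
print(grid.dtype)   # float64

# One expression is applied to every element, with no explicit loop
print(grid * 10)
print(grid + grid)

# Because all elements share one type, mixed inputs are converted to a common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```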

1. The array's class name

temp1 = np.arange(12)
print(temp1,type(temp1))

# -------------output---------------------
[ 0  1  2  3  4  5  6  7  8  9 10 11] <class 'numpy.ndarray'>

2. Data type (the type of the elements stored in the array)

temp3 = np.array(range(1,6))
print(temp3.dtype)

# -------------output---------------------
int32     # the default integer type depends on the platform and NumPy version; 64-bit Linux and macOS builds typically give int64
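
Since the default integer type varies between platforms and NumPy versions, passing `dtype` explicitly is one way to get the same result everywhere. The following is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

# Without dtype, NumPy picks the platform's default integer type
default_ints = np.array(range(1, 6))
print(default_ints.dtype)      # platform-dependent (int32 or int64)

# With an explicit dtype, the result is the same on every platform
fixed_ints = np.array(range(1, 6), dtype=np.int64)
print(fixed_ints.dtype)        # int64
print(fixed_ints.itemsize)     # 8
```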

3. Changing the data type

temp2 = np.array([[1,2,3,4],[3,4,5,6],[7,8,9,0]],dtype='i4')
print(temp2.dtype)
temp2 = temp2.astype('i8')
print(temp2.dtype)

# -------------output---------------------
int32
int64
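
One detail worth knowing when changing types: `astype` returns a new array, and converting floats to integers drops the fractional part instead of rounding. A short sketch with made-up values:

```python
import numpy as np

prices = np.array([1.9, -1.9, 2.5])

# astype returns a converted copy; the fractional part is truncated toward zero
as_int = prices.astype(np.int64)
print(as_int)        # [ 1 -1  2]

# The original array keeps its values and dtype
print(prices.dtype)  # float64
```

If rounding is what you want, apply `np.round` before converting.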

4. Limiting the number of decimal places in a float array

temp4 = np.array([random.random() for i in range(10)])
print(temp4,temp4.dtype)
temp4 = np.round(temp4,2)
print(temp4,temp4.dtype)

# -------------output---------------------
[0.03505807 0.30070143 0.81331086 0.80476998 0.88999505 0.59220155
 0.6514705  0.11714838 0.53510445 0.09625571] float64
[0.04 0.3  0.81 0.8  0.89 0.59 0.65 0.12 0.54 0.1 ] float64

# Limiting the decimal places of a float in plain Python
import random
a = random.random()
print(a)
a = round(a,2)
print(a)

# -------------output---------------------
0.2379073047892517
0.24
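
Two things about rounding are easy to miss, sketched below with illustrative values. Both `np.round` and Python's `round` send exact halves to the nearest even number, and rounding changes the stored data, whereas print options only change how the array is displayed.

```python
import numpy as np

# Exact halves are rounded to the nearest even value
halves = np.array([0.5, 1.5, 2.5, 3.5])
print(np.round(halves))                          # [0. 2. 2. 4.]
print([round(v) for v in [0.5, 1.5, 2.5, 3.5]])  # [0, 2, 2, 4]

# To shorten only the display, leave the data alone and set print options
values = np.array([0.03505807, 0.30070143, 0.81331086])
with np.printoptions(precision=2):
    print(values)      # shown with two decimal places
print(values[0])       # 0.03505807, the stored value is unchanged
```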

5. Number of dimensions

temp2 = np.array([[1,2,3,4],[3,4,5,6],[7,8,9,0]])
print(temp2.ndim)

# -------------output---------------------
2     # the number of dimensions of the array

6. Number of elements and memory usage

import numpy as np
temp = np.array([[1,2,3],[4,5,6],[7,8,9],[5,6,7]])
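# Shape of the array (its size along each dimension)
print(temp.shape)
# Total number of elements in the array
print(temp.size)
# Memory used by a single element, in bytes
print(temp.itemsize)
# Memory used by all elements together, in bytes
print(temp.nbytes)
# Information about the array's memory layout
print(temp.flags)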

# -------------output---------------------
(4, 3)
12
4
48
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
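
These attributes are related: `nbytes` equals `size * itemsize`, so the choice of dtype directly determines memory use. A brief sketch using the same values as the listing above, with the dtype fixed to `int32` so the numbers do not depend on the platform:

```python
import numpy as np

temp = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [5, 6, 7]], dtype=np.int32)
print(temp.size, temp.itemsize, temp.nbytes)   # 12 4 48
assert temp.nbytes == temp.size * temp.itemsize

# The values 1..9 fit in a single byte, so int8 needs a quarter of the memory
smaller = temp.astype(np.int8)
print(smaller.nbytes)   # 12

# float64 uses 8 bytes per element
wider = temp.astype(np.float64)
print(wider.nbytes)     # 96
```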

7. Array shape (shape, reshape)

import numpy as np

# Two ways to create an array: np.arange and np.array
temp1 = np.arange(12)
temp2 = np.array([[1,2,3,4],[3,4,5,6],[7,8,9,0]],dtype='i4')

# Inspect the shape of temp2, then reshape temp1 into a 3-D array
rst = temp2.shape
rst2 = temp1.reshape((2,2,3))
print(rst,rst2.shape)

# -------------output---------------------
(3, 4) (2, 2, 3)
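
A few more reshape behaviours, shown as a sketch with illustrative variable names: one dimension may be given as `-1` so NumPy infers it, the total element count must stay the same, and reshaping a contiguous array returns a view that shares memory with the original.

```python
import numpy as np

temp1 = np.arange(12)

# -1 asks NumPy to work out that dimension from the total size
auto = temp1.reshape((3, -1))
print(auto.shape)                 # (3, 4)

# A shape whose element count differs from 12 is rejected
try:
    temp1.reshape((5, 3))
except ValueError as err:
    print("reshape failed:", err)

# For a contiguous array the result is a view: writing through it changes temp1
auto[0, 0] = 99
print(temp1[0])                   # 99
```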

8. Two ways to convert an array to one dimension

temp2 = np.array([[1,2,3,4],[3,4,5,6],[7,8,9,0]],dtype='i4')
print(temp2.shape,temp2.ndim)

# 1. flatten returns the multidimensional array as a one-dimensional copy
flat1 = temp2.flatten()
print(flat1.shape,flat1.ndim)
# 2. reshape can also convert the original 2-D array to one dimension
flat2 = temp2.reshape(12,)
print(flat2.shape,flat2.ndim)

# -------------output---------------------
(3, 4) 2
(12,) 1
(12,) 1
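
The two methods differ in one practical way, sketched below with illustrative names: `flatten` always returns a copy, while `reshape` returns a view of the same memory when the layout allows it.

```python
import numpy as np

source = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [7, 8, 9, 0]], dtype="i4")

flat_copy = source.flatten()      # independent copy
flat_view = source.reshape(-1)    # view onto the same memory for this contiguous array

# Writing to the copy leaves the source untouched
flat_copy[0] = 100
print(source[0, 0])               # 1

# Writing to the view changes the source
flat_view[0] = 200
print(source[0, 0])               # 200
```

Use `flatten` when the result will be modified independently, and `reshape` when avoiding a copy matters.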

9. Extension: custom data structures

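A NumPy array normally stores data of a single type.
However, np.dtype can also describe a structured data type, so that each element holds several named fields.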

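import numpy as np

# np.bytes_ is used here because the np.string_ alias was removed in NumPy 2.0
mytype = np.dtype([("name", np.bytes_, 10), ('height', np.float64)])
print(mytype)

arr = np.array([("Sarsh", 8.3), ("John", 6.345)], dtype=mytype)
print(arr)
print(arr[0]["name"])
# For data with more complex relationships, pandas is usually the more convenient tool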

# -------------output---------------------
[('name', 'S10'), ('height', '<f8')]
[(b'Sarsh', 8.3  ) (b'John', 6.345)]
b'Sarsh'
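
Once a structured dtype is defined, a field can be read as a whole column by name. The sketch below reuses the two records from the listing above; the variable names and the height threshold are arbitrary.

```python
import numpy as np

person = np.dtype([("name", "S10"), ("height", np.float64)])
people = np.array([(b"Sarsh", 8.3), (b"John", 6.345)], dtype=person)

# Indexing with a field name returns that field for every record
print(people["height"])
print(people["height"].mean())

# Fields work with boolean masks like any other array
tall = people[people["height"] > 7]
print(tall["name"])                  # [b'Sarsh']

# 'S' fields hold bytes; decode to get a Python str
print(people["name"][0].decode())    # Sarsh
```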
