pandas

pandas.Categorical

>>> pd.Categorical([1, 2, 3, 1, 2, 3])
[1, 2, 3, 1, 2, 3]
Categories (3, int64): [1, 2, 3]
>>> pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
[a, b, c, a, b, c]
Categories (3, object): [a, b, c]
#### Ordering the categories
>>> c = pd.Categorical(['a','b','c','a','b','c'], ordered=True,
...                    categories=['c', 'b', 'a'])
>>> c
[a, b, c, a, b, c]
Categories (3, object): [c < b < a]
>>> c.min()
'c'

categorical.codes
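The codes attribute holds, for each element, the integer position of its category. The following is a minimal sketch that reuses the ordered categorical from the example above; the second categorical `d` is made up here to show the code assigned to a value that is not among the categories.

```python
import pandas as pd

# Same ordered categorical as in the example above
c = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'], ordered=True,
                   categories=['c', 'b', 'a'])

# codes[i] is the position of c[i] within c.categories
codes = c.codes
print(codes)                          # [2 1 0 2 1 0]

# Indexing the categories with the codes recovers the original values
print(c.categories[codes].tolist())   # ['a', 'b', 'c', 'a', 'b', 'c']

# A value outside the declared categories becomes NaN and gets code -1
d = pd.Categorical(['a', 'x'], categories=['a', 'b'])
print(d.codes)                        # [ 0 -1]
```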

Explanation from the official Categorical documentation

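The map function

Its main purpose is to apply a function to every element of a Series. Usage is shown below (here `format` is assumed to be a user-defined function that formats a number to two decimal places, for example `lambda x: '%.2f' % x`):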

In [122]: frame['e'].map(format)
Out[122]: 
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object

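In short: apply() applies a function along the columns or rows of a DataFrame, applymap() applies a function to every element of a DataFrame, and map() applies a function to every element of a Series.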
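The difference between element-wise and row/column-wise application can be sketched as follows. The frame and its values are made up for illustration. applymap() is left out of the sketch because, as far as I know, recent pandas releases deprecate it in favour of DataFrame.map.

```python
import pandas as pd

# Hypothetical frame, made up for illustration
frame = pd.DataFrame({'b': [1.0, 2.0, 3.0],
                      'e': [1.2846, -1.5549, 0.2011]},
                     index=['Utah', 'Ohio', 'Texas'])

# Series.map: the function receives one element at a time
formatted = frame['e'].map(lambda x: '%.2f' % x)
print(formatted.tolist())   # ['1.28', '-1.55', '0.20']

# DataFrame.apply: the function receives a whole column (default axis=0) ...
col_range = frame.apply(lambda col: col.max() - col.min())
print(col_range['b'])       # 2.0

# ... or a whole row (axis=1)
row_sum = frame.apply(lambda row: row.sum(), axis=1)
print(row_sum.index.tolist())   # ['Utah', 'Ohio', 'Texas']
```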

numpy

argsort()

Returns the indices that would sort the array in ascending order; it does not return the sorted values themselves.

## 1-D array
>>> from numpy import array, argsort
>>> a = array([3,1,2])
>>> argsort(a)
array([1, 2, 0])
## 2-D array
>>> b = array([[1,2],[2,3]])
>>> argsort(b,axis=1) # sort within each row
array([[0, 1],
       [0, 1]])
>>> argsort(b,axis=0) # sort within each column
array([[0, 0],
       [1, 1]])
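Because argsort returns indices rather than values, the result is usually fed back into indexing. A small sketch, with the `names` array made up for illustration:

```python
import numpy as np

a = np.array([3, 1, 2])
order = np.argsort(a)        # indices that sort a in ascending order
print(order)                 # [1 2 0]
print(a[order])              # [1 2 3]

# Descending order: reverse the ascending index array
print(a[order[::-1]])        # [3 2 1]

# Reorder a parallel array by the same key
names = np.array(['c', 'a', 'b'])
print(names[order])          # ['a' 'b' 'c']
```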

numpy's ravel()

Flattens a multi-dimensional array into a 1-D array; this is commonly done to labels. The result is a 1-D numpy.ndarray.

>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> print(np.ravel(x))
[1 2 3 4 5 6]
>>> print(x.reshape(-1))
[1 2 3 4 5 6]

numpy distinctions (5): numpy.ravel() vs numpy.flatten()
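The main practical difference between the two is that flatten() always returns a copy, while ravel() returns a view whenever it can. A minimal sketch on a contiguous array, where ravel() does return a view:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

r = x.ravel()     # a view here, because x is contiguous in memory
f = x.flatten()   # always a copy

print(np.shares_memory(x, r))   # True
print(np.shares_memory(x, f))   # False

r[0] = 100        # writes through to x
f[1] = -1         # leaves x untouched
print(x[0, 0], x[0, 1])         # 100 2
```

If the flattened result is going to be modified, flatten() is the safer choice; ravel() avoids the copy when the data is only read.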

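NumPy may print floating-point arrays in scientific notation, for example when the values are very small or span a wide range. To turn this off, add one line to the code: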

np.set_printoptions(suppress=True)
## values are now printed in plain decimal form
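The effect can be checked with a small array. This sketch uses np.printoptions, a context manager that applies the same suppress option only inside the with-block; that is a substitution made here so the global setting is left unchanged. The array values are made up.

```python
import numpy as np

a = np.array([0.00001234, 123.456])

# With default settings, the very small element triggers scientific notation
default_text = str(a)
print(default_text)

# np.printoptions applies the option only inside the with-block,
# unlike np.set_printoptions, which changes it globally
with np.printoptions(suppress=True):
    suppressed_text = str(a)
print(suppressed_text)
```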

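The meshgrid function

meshgrid is typically used for vectorized computation. It generates grid data: given two 1-D arrays, it returns two 2-D matrices that together cover every (x, y) pair from the two arrays. The short script below demonstrates it, followed by its output.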

import numpy as np
N, M = 5, 5
x1_min, x1_max = 1, 10  # range of column 0
x2_min, x2_max = 3, 15  # range of column 1
t1 = np.linspace(x1_min, x1_max, N)
t2 = np.linspace(x2_min, x2_max, M)
x1, x2 = np.meshgrid(t1, t2)  # generate the grid sample points
print(x1)
print(x2)
x_show = np.stack((x1.flat, x2.flat), axis=1)  # test points
print(x_show)  ## shape: (25, 2)

[[  1.     3.25   5.5    7.75  10.  ]
 [  1.     3.25   5.5    7.75  10.  ]
 [  1.     3.25   5.5    7.75  10.  ]
 [  1.     3.25   5.5    7.75  10.  ]
 [  1.     3.25   5.5    7.75  10.  ]]
[[  3.   3.   3.   3.   3.]
 [  6.   6.   6.   6.   6.]
 [  9.   9.   9.   9.   9.]
 [ 12.  12.  12.  12.  12.]
 [ 15.  15.  15.  15.  15.]]
[[  1.     3.  ]
 [  3.25   3.  ]
 [  5.5    3.  ]
 [  7.75   3.  ]
 [ 10.     3.  ]
 [  1.     6.  ]
 [  3.25   6.  ]
 [  5.5    6.  ]
 [  7.75   6.  ]
 [ 10.     6.  ]
 [  1.     9.  ]
 [  3.25   9.  ]
 [  5.5    9.  ]
 [  7.75   9.  ]
 [ 10.     9.  ]
 [  1.    12.  ]
 [  3.25  12.  ]
 [  5.5   12.  ]
 [  7.75  12.  ]
 [ 10.    12.  ]
 [  1.    15.  ]
 [  3.25  15.  ]
 [  5.5   15.  ]
 [  7.75  15.  ]
 [ 10.    15.  ]]
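The grids are useful because a function can then be evaluated at every grid point without loops. A minimal sketch, using made-up ranges with different lengths so the shape convention is visible, and a made-up function z = x1 + x2:

```python
import numpy as np

t1 = np.linspace(0, 3, 4)     # 4 points along x1: 0, 1, 2, 3
t2 = np.linspace(0, 20, 3)    # 3 points along x2: 0, 10, 20
x1, x2 = np.meshgrid(t1, t2)  # both have shape (len(t2), len(t1))
print(x1.shape)               # (3, 4)

# x1 varies along columns, x2 varies along rows
z = x1 + x2                   # evaluated at all 12 grid points at once
print(z[2, 1])                # t1[1] + t2[2] = 1 + 20 = 21.0

# Flatten to an (n_points, 2) array, compute per point, reshape back
points = np.stack((x1.ravel(), x2.ravel()), axis=1)
z_flat = points[:, 0] + points[:, 1]
print(np.array_equal(z_flat.reshape(x1.shape), z))   # True
```

The flatten-then-reshape step is the same pattern needed when a model expects one sample per row and the result has to be drawn on the grid afterwards.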

np.split

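Splits an array into sub-arrays at the given indices. For training data, this is a convenient way to separate the features from the labels.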

import numpy as np
data = [[1,2,3,4],[5,6,7,8]]
data = np.array(data)
x, y = np.split(data, (2,), axis=1)
## axis=1 splits along the column axis: the cut falls before column index 2
print(x)
print(y)
## Result:
[[1 2]
 [5 6]]
[[3 4]
 [7 8]]
x, y = np.split(data, (3,), axis=1)
## Result:
[[1 2 3]
 [5 6 7]]
[[4]
 [8]]
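Applied to a training set, the same call separates feature columns from the label column. A minimal sketch on a made-up dataset with three feature columns followed by one label column:

```python
import numpy as np

# Made-up dataset: 3 feature columns followed by 1 label column
data = np.array([[5.1, 3.5, 1.4, 0],
                 [4.9, 3.0, 1.4, 0],
                 [6.3, 3.3, 6.0, 1]])

# Cut before column index 3: columns 0..2 are features, column 3 is the label
X, y = np.split(data, (3,), axis=1)
print(X.shape, y.shape)      # (3, 3) (3, 1)

# y is still 2-D after the split; flatten it to 1-D integer labels
y = y.ravel().astype(int)
print(y)                     # [0 0 1]

# Plain slicing gives the same result
X2, y2 = data[:, :3], data[:, 3].astype(int)
print(np.array_equal(X, X2), np.array_equal(y, y2))   # True True
```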

matplotlib

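plt.legend

plt.legend(loc='upper right')  # show the legend, i.e. what each line in the figure represents; loc sets the legend position
pcolormesh draws the two grid matrices x1 and x2, together with the corresponding predictions y_show_hat, onto the figure:
plt.pcolormesh(x1, x2, y_show_hat, cmap=cm_light)  # display the predicted values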

plt.scatter: draws a scatter plot
The parameter c sets the color; it can be a single color or a sequence of colors.
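Putting pcolormesh, scatter and legend together gives the usual decision-region plot. The sketch below is only an illustration under stated assumptions: the predictions in `y_show_hat` come from a made-up rule instead of a trained model, the colormaps `cm_light` and `cm_dark` and the sample points are invented, and the Agg backend is selected so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

# Grid of test points, built with meshgrid
t1 = np.linspace(0, 4, 50)
t2 = np.linspace(0, 4, 50)
x1, x2 = np.meshgrid(t1, t2)

# Stand-in for model predictions: class 1 where x1 + x2 > 4 (made-up rule)
y_show_hat = (x1 + x2 > 4).astype(int)

cm_light = ListedColormap(['#A0FFA0', '#FFA0A0'])  # hypothetical background colors
cm_dark = ListedColormap(['g', 'r'])               # hypothetical point colors

plt.figure()
plt.pcolormesh(x1, x2, y_show_hat, cmap=cm_light)  # colored prediction regions

# Made-up samples; c takes one class value per point, mapped through cmap
pts = np.array([[1.0, 1.0], [1.0, 2.0], [3.0, 3.0], [3.5, 2.0]])
labels = np.array([0, 0, 1, 1])
plt.scatter(pts[:, 0], pts[:, 1], c=labels, cmap=cm_dark,
            edgecolors='k', label='samples')

plt.legend(loc='upper right')  # uses the label= given to scatter
# plt.show() would display the figure in an interactive session
```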

plt.xlim()
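- Called without arguments, it returns the current value. For example, plt.xlim() returns the current plotting range of the X axis.
- Called with arguments, it sets the value. For example, plt.xlim([0,10]) sets the X-axis range to 0 through 10.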
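Both call forms can be seen in a short sketch. The plotted data is made up, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt

plt.figure()
plt.plot([0, 5, 20], [1, 3, 2])   # made-up data

# No arguments: read the current (autoscaled) X range
left, right = plt.xlim()
print(left, right)

# With arguments: set the X range, then read it back
plt.xlim([0, 10])
new_left, new_right = plt.xlim()
print(new_left, new_right)
```

plt.ylim() follows the same getter/setter pattern for the Y axis.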

Python data analysis: pandas, numpy, matplotlib