Python learning notes (3)

Q1.  UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd5' in position 84: surrogates not allowed

This error appeared while running a TensorFlow program on Windows that reads csv data. At first I thought it was an encoding problem, but adding #coding:utf-8 at the top of the file still produced the same error.

After checking various sources, it turned out that the cause was the single backslashes in the path used in the program:

E:\code\TensorFlow\reader.csv

Written this way the path looks fine, but it cannot actually be used, because a single backslash starts an escape sequence in a Python string literal. Changing it to

E:\\code\\TensorFlow\\reader.csv

makes the program run successfully.
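A minimal sketch of the three equivalent ways to spell such a Windows path in Python (the drive and folder names follow the example above):

```python
# Backslashes in ordinary string literals start escape sequences
# ('\u', '\t', ...), so a Windows path needs one of these forms:
p1 = "E:\\code\\TensorFlow\\reader.csv"  # escaped backslashes
p2 = r"E:\code\TensorFlow\reader.csv"    # raw string literal
p3 = "E:/code/TensorFlow/reader.csv"     # forward slashes also work on Windows

print(p1 == p2)  # the first two are the same string
```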

  Q2: the tf.unstack() command

unstack(
    value,
    num=None,
    axis=0,
    name='unstack'
)

tf.unstack()

  Splits the given R-dimensional tensor into (R-1)-dimensional tensors.

  It decomposes value into num tensors along the given axis and returns a Python list. If num is not specified, it is inferred from the size of value along axis.

DEMO:

import tensorflow as tf

a = tf.constant([3, 2, 4, 5, 6])
b = tf.constant([1, 6, 7, 8, 0])
c = tf.stack([a, b], axis=0)    # shape (2, 5)
d = tf.stack([a, b], axis=1)    # shape (5, 2)
e = tf.unstack([a, b], axis=0)  # list of 2 tensors, each of shape (5,)
f = tf.unstack([a, b], axis=1)  # list of 5 tensors, each of shape (2,)

with tf.Session() as sess:
    print(sess.run(c))
    print(sess.run(d))
    print(sess.run(e))
    print(sess.run(f))
output:

[[3 2 4 5 6]
[1 6 7 8 0]]

--------------------
[[3 1]
[2 6]
[4 7]
[5 8]
[6 0]]

----------------------
[array([3, 2, 4, 5, 6]), array([1, 6, 7, 8, 0])]

----------------------
[array([3, 1]), array([2, 6]), array([4, 7]), array([5, 8]), array([6, 0])]
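The stack/unstack semantics in the demo can be checked with NumPy alone, without a TensorFlow session (a sketch of the equivalent operations on the same data):

```python
import numpy as np

a = np.array([3, 2, 4, 5, 6])
b = np.array([1, 6, 7, 8, 0])

c = np.stack([a, b], axis=0)              # shape (2, 5), like tf.stack axis=0
d = np.stack([a, b], axis=1)              # shape (5, 2), like tf.stack axis=1
e = [c[i] for i in range(c.shape[0])]     # unstack along axis=0: 2 arrays of shape (5,)
f = [c[:, j] for j in range(c.shape[1])]  # unstack along axis=1: 5 arrays of shape (2,)
```

The list lengths show the rule from Q2: unstacking along an axis of size num yields num tensors of rank R-1.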

Q3.  What is os.path.splitext in Python used for?

Function: it separates a file name from its extension, returning a (root, extension) tuple, which is useful for splitting paths apart.

For example:

import os

path_01 = 'D:/User/wgy/workplace/data/notMNIST_large.tar.gar'
path_02 = 'D:/User/wgy/workplace/data/notMNIST_large'

root_01 = os.path.splitext(path_01)  # splits on the last dot
root_02 = os.path.splitext(path_02)  # no dot in the name, so the extension is ''

print(root_01)
print(root_02)

os.path module usage in Python:

  1. dirname() strips the file name and returns the directory part of the path

    Such as:

    >>> import os
    >>> os.path.dirname('d:\\library\\book.txt')
    'd:\\library'

  2. basename() strips the directory part and returns only the file name

    Such as:

    >>> import os
    >>> os.path.basename('d:\\library\\book.txt')
    'book.txt'

  3. join() combines separate parts into a single path name

    Such as:

    >>> import os
    >>> os.path.join('d:\\library', 'book.txt')
    'd:\\library\\book.txt'

  4. split() returns a (directory, file name) tuple

    Such as:

    >>> import os
    >>> os.path.split('d:\\library\\book.txt')
    ('d:\\library', 'book.txt')

  5. splitdrive() returns a (drive, path) tuple

    >>> import os
    >>> os.path.splitdrive('d:\\library\\book.txt')
    ('d:', '\\library\\book.txt')

  6. splitext() returns a (file name, extension) tuple

    Such as:

    >>> os.path.splitext('d:\\library\\book.txt')
    ('d:\\library\\book', '.txt')
    >>> os.path.splitext('book.txt')
    ('book', '.txt')
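The >>> examples above assume Windows path rules. To reproduce them on any OS, the ntpath module (the Windows implementation behind os.path) can be used explicitly; a sketch:

```python
import ntpath  # Windows path semantics on any platform

path = 'd:\\library\\book.txt'
print(ntpath.dirname(path))      # d:\library
print(ntpath.basename(path))     # book.txt
print(ntpath.split(path))        # ('d:\\library', 'book.txt')
print(ntpath.splitdrive(path))   # ('d:', '\\library\\book.txt')
print(ntpath.splitext(path))     # ('d:\\library\\book', '.txt')
```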

Q4: the pandas.read_csv command
See https://www.cnblogs.com/datablog/p/6127000.html
Note: in addition, Python's csv library makes it easy to move data between different applications: data can be exported in bulk to CSV format and then imported into another application.
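A sketch of that round trip using only the standard-library csv module and an in-memory buffer (the column names are made up for the example):

```python
import csv
import io

# Export a small table to CSV text, then read it back
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows([["name", "score"], ["ann", "90"], ["bob", "85"]])

buf.seek(0)
rows = list(csv.reader(buf))
print(rows)  # every field comes back as a string
```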
Q5: the os library in Python

import os
help(os)  # list what the os library can do

If you know basic Linux operations, the following os methods will feel familiar, because they mirror the corresponding Linux commands. A few commonly used ones:

1. os.getcwd() returns the current working directory
2. os.listdir(path) lists the contents of a directory
3. os.mkdir(path) creates a directory
4. os.rmdir(path) removes a directory
5. os.path.isdir(path) and os.path.isfile(path) test whether a path is a directory or a file
6. os.remove(path) deletes a file
7. os.rename(old, new) renames a file or directory
8. os.name identifies the platform: 'nt' on Windows, 'posix' on Linux/Unix
9. os.system() runs a shell command
10. os.path.split() returns the directory name and file name of a path
11. os.path.splitext() separates the file name and extension
12. os.path.getsize(name) returns the file size; a directory returns 0L
13. os.getegid() returns the effective group id of the current process (Unix only)
14. os.geteuid() returns the effective user id of the current process (Unix)
15. os.getgid() returns the real group id of the current process
16. os.getlogin() returns the name of the currently logged-in user
17. os.getpgrp() returns the id of the current process group (Unix)
18. os.getpid() returns the PID of the current process (Unix, Windows)
19. os.getppid() returns the id of the parent of the current process (Unix)
20. os.getuid() returns the user id of the current process (Unix)
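A short sketch exercising a few of these calls inside a throwaway directory (the file names are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "data")
    os.mkdir(sub)                                  # create a directory
    open(os.path.join(sub, "a.txt"), "w").close()  # create an empty file
    names = os.listdir(sub)                        # contents of the directory
    os.rename(os.path.join(sub, "a.txt"),
              os.path.join(sub, "b.txt"))          # rename the file
    size = os.path.getsize(os.path.join(sub, "b.txt"))  # 0 bytes, file is empty
    os.remove(os.path.join(sub, "b.txt"))          # delete the file
    os.rmdir(sub)                                  # remove the directory
```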

Q6. Other common libraries: numpy, pandas, matplotlib

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

---------------numpy-----------------------
arr = np.array([1, 2, 3], dtype=np.float64)
np.zeros((3, 6))   np.empty((2, 3, 2))   np.arange(15)
arr.dtype   arr.ndim   arr.shape
arr.astype(np.int32)   # also np.float64, np.string_, np.unicode_
arr * arr   arr - arr   1 / arr
arr = np.arange(32).reshape((8, 4))
arr[1:3, :]   # ordinary slicing
arr[[1, 2, 3]]   # fancy indexing
arr.T   arr.transpose((...))   arr.swapaxes(...)   # transpose
arr.dot   # matrix inner product
np.sqrt(arr)   np.exp(arr)   np.random.randn(8)   # normally distributed samples
np.maximum(x, y)
np.where(cond, xarr, yarr)   # take xarr where cond is true, otherwise yarr
arr.mean()   arr.mean(axis=1)   # arithmetic mean
arr.sum()   arr.std()   arr.var()   # sum, standard deviation, variance
arr.min()   arr.max()   # minimum, maximum
arr.argmin()   arr.argmax()   # index of the minimum / maximum
arr.cumsum()   arr.cumprod()   # cumulative sum / product over all elements
arr.all()   arr.any()   # whether all / any elements are true
arr.sort()   arr.sort(1)   # sort; sort along axis 1
np.unique(arr)   # drop duplicates
np.in1d(arr1, arr2)   # whether each value of arr1 appears in arr2
np.load()   np.loadtxt()   np.save()   np.savez()   # read and save files
np.concatenate([arr, arr], axis=1)   # join two arrays along axis 1
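A small sketch of a few of the NumPy calls listed above:

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# np.where(cond, xarr, yarr): take from xarr where cond holds, else from yarr
signed = np.where(arr % 2 == 0, arr, -arr)   # negate the odd entries

row_means = arr.mean(axis=1)   # one mean per row
total = arr.sum()              # sum of all elements
top = arr.argmax()             # flat index of the maximum
```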
---------------pandas-----------------------
ser = Series()   ser = Series([...], index=[...])   # 1-D array; a dict converts directly to a Series
ser.values   ser.index   ser.reindex([...], fill_value=0)   # values, index, redefine the index
ser.isnull()   pd.isnull(ser)   pd.notnull(ser)   # detect missing data
ser.name =   ser.index.name =   # name of the series itself / of its index
ser.drop('x')   # drop the value at index x
ser + ser   # arithmetic
ser.sort_index()   ser.order()   # sort by index / by value
df = DataFrame(data, columns=[...], index=[...])   # tabular structure with both row and column indexes
df.ix['x']   # the row with index x; for a Series, use ser['x'] directly
del df['ly']   # delete column 'ly' with del
df.T   # transpose
df.index.name   df.columns.name   df.values
df.drop([...])
df + df   df1.add(df2, fill_value=0)   # arithmetic
df - ser   # arithmetic between a DataFrame and a Series
f = lambda x: x.max() - x.min()   df.apply(f)
df.sort_index(axis=1, ascending=False)   # sort by the index along axis 1
df.sort_index(by=['a', 'b'])   # sort by columns a and b
ser.rank()   df.rank(axis=1)   # ranking: assigns a rank value
df.sum()   df.sum(axis=1)   # sum by column / by row
df.mean(axis=1, skipna=False)   # mean of each row, taking NaN into account
df.idxmax()   # index of the maximum value
df.cumsum()   # cumulative sum
df.describe()   ser.describe()   # returns count, mean, std, min, max, etc.
ser.unique()   # drop duplicates
ser.value_counts()   df.value_counts()   # a Series whose index is the unique values and whose values are their frequencies
ser.isin(['x', 'y'])   # boolean: is each value of ser equal to x or y
ser.dropna()   ser.isnull()   ser.notnull()   ser.fillna(0)   # handle missing data; same for df
df.unstack()   # swap row/column index and values;  df.unstack().stack()
df.swaplevel('key1', 'key2')   # take two level numbers or names and swap them
df.sortlevel(1)   # sort by level 1; df row and column indexes can have two levels
df.set_index(['c', 'd'], drop=False)   # turn columns c and d into the row index; drop=False keeps c and d as columns
read_csv   read_table   read_fwf   # read comma-separated, tab-separated ('\t'), and fixed-width files
pd.read_csv('...', nrows=5)   # read the first 5 rows of a file
pd.read_csv('...', chunksize=1000)   # read in chunks, so a huge file does not fill memory
pd.load()   # pd also has a load method, for reading binary files
pd.ExcelFile('...xls').parse('Sheet1')   # read Sheet1 of an Excel file
df.to_csv('...csv', sep='|', index=False, header=False)   # write to csv with | as separator (default is ,); suppress row and column labels
pd.merge(df1, df2, on='key', suffixes=('_left', '_right'))   # merge two datasets, like a database inner join, on the shared key column; suffixes names the two keys key_left and key_right
pd.merge(df1, df2, left_on='lkey', right_on='rkey')   # merge, like an inner join, when the two sides use different column names for the join key
pd.merge(df1, df2, how='outer')   # merge as an outer join; how='left' and how='inner' are also available; multiple keys can be merged on
df1.join(df2, on='key', how='outer')   # also a merge
pd.concat([ser1, ser2, ser3], axis=1)   # concatenate three series side by side (axis=1)
ser1.combine_first(ser2)   df1.combine_first(df2)   # patch 1 with values from 2, aligning on the index
df.stack()   df.unstack()   # pivot columns to rows / rows to columns
df.pivot()
df.duplicated()   df.drop_duplicates()   # flag / drop duplicate rows
df[''].map(lambda x: abs(x))   # map a function over a given column of df
ser.replace(-999, np.nan)   # replace every -999 with NaN
df.rename(index={}, columns={}, inplace=True)   # rename indexes; inplace=True modifies the dataset in place
pd.cut(ser, bins)   # assign each value of ser to a bin; has labels and levels attributes
df[(np.abs(df) > 3).any(1)]   # rows containing values above 3 or below -3
permutation   take   # used for random reordering
pd.get_dummies(df['key'], prefix='key')   # indicator (dummy) columns for df['key'], prefixed with key
df[...].str.contains()   df[...].str.findall(pattern, flags=re.IGNORECASE)   df[...].str.match(pattern, flags=...)   df[...].str.get()   # vectorized string functions

---- plotting
ser.plot()   df.plot()   # pandas plotting; parameters include label, ax, style, alpha, kind, logy, use_index, rot, xticks, xlim, grid, etc. (see page 257)
kind='kde'   # density plot
kind='bar'   kind='barh'   # vertical / horizontal bar chart; stacked=True gives a stacked chart
ser.hist(bins=50)   # histogram
plt.scatter(x, y)   # scatter plot of x against y
pd.scatter_matrix(df, diagonal='kde', color='k', alpha=0.3)   # scatter plots of every pair of df columns
---- aggregation and grouping
groupby() groups along axis=0 by default, but can also group along axis=1; groups can be iterated over with for
df.groupby(df['key1'])   # group df by key1
df['key2'].groupby(df['key1'])   # group column key2 by key1
df['key3'].groupby([df['key1'], df['key2']])   # group column key3 by key1, then by key2
df['key2'].groupby(df['key1']).size()   # size() returns a Series of group sizes
df.groupby(df['key1'])['data1']   is equivalent to   df['data1'].groupby(df['key1'])
df.groupby(df['key1'])[['data1']]   is equivalent to   df[['data1']].groupby(df['key1'])
df.groupby(mapping, axis=1)   ser.groupby(mapping)   # define a mapping dict and group according to it
df.groupby(len)   # group with a function, e.g. len
df.groupby(level='...', axis=1)   # group by index level
df.groupby([...], as_index=False)   # disable the group keys as index; return the result without an index
df.groupby(...).agg(['mean', 'std'])   # use agg to apply several aggregation functions at once
df.groupby(...).transform(np.mean)   # transform() applies its function within each group
df.groupby(...).apply(...)   # apply splits the object into pieces, calls the function on each piece, then tries to glue the pieces back together
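A sketch of the basic grouping pattern (the column names key1 and data1 are made up, following the examples above):

```python
import pandas as pd

df = pd.DataFrame({"key1": ["a", "a", "b"],
                   "data1": [1, 2, 3]})

g = df.groupby("key1")["data1"]
means = g.mean()                # mean of data1 per group
sizes = g.size()                # number of rows per group
stats = g.agg(["mean", "sum"])  # several aggregates at once
```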
---- pivot tables and cross-tabulation
df.pivot_table(['', ''], rows=['', ''], cols='', margins=True)   # margins=True adds an 'All' row/column
pd.crosstab(df.col1, df.col2, margins=True)   # margins works as above
---------------matplotlib---------------
fig = plt.figure()   # the base object that holds the figure
ax = fig.add_subplot(2, 2, 1)   # a 2x2 grid of subplots; select the 1st
fig, axes = plt.subplots(nrows, ncols, sharex, sharey)   # create a figure with the given rows and columns, sharing x / y tick scales
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
# adjust the spacing between subplots; wspace and hspace control the width / height percentages
ax.plot(x, y, linestyle='--', color='g')   # plot y against x with the given line style and color
ax.set_xticks([...])   ax.set_xticklabels([...])   # set x-axis ticks
ax.set_xlabel('...')   # set the x-axis label
ax.set_title('....')   # set the figure title
ax.legend(loc='best')   # add a legend; loc places it in a suitable position
ax.text(x, y, 'hello', family='monospace', fontsize=10)   # put the annotation 'hello' at (x, y), font size 10
ax.add_patch()   # add a patch to the figure
plt.savefig('...png', dpi=400, bbox_inches='tight')   # save the figure; dpi is the resolution; bbox_inches='tight' trims surrounding whitespace

------------------------------------------
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
# can be used to draw maps
-----------------time series--------------------------
pd.to_datetime(datestrs)   # parse string dates into datetime objects
pd.date_range('1/1/2000', periods=1000)   # generate a time-series index
ts.resample('D', how='mean')   # resample to a fixed daily frequency and take the mean; how='ohlc' gives the four stock prices (open-high-low-close)
# downsampling aggregates: a short frequency (day) becomes a long one (month), with the values combined
# upsampling interpolates: a long frequency becomes a short one, creating new values in between
ts.shift(2, freq='D')   ts.shift(-2, freq='D')   # shift 2 days later / earlier
now + Day()   now + MonthEnd()
import pytz   pytz.timezone('US/Eastern')   # timezone operations; requires pytz to be installed
pd.Period('2010', freq='A-DEC')   # a Period represents a span of time
pd.PeriodIndex   # an index of periods
ts.to_period('M')   # convert timestamps to periods
pd.rolling_mean(...)   pd.rolling_std(...)   # moving-window functions: mean, standard deviation
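A sketch of the shifting behaviour on a small daily series. (Note that in modern pandas, resample('D', how='mean') is written resample('D').mean(), and pd.rolling_mean(ser, n) is ser.rolling(n).mean().)

```python
import pandas as pd

idx = pd.date_range("1/1/2000", periods=6)  # 6 consecutive days
ts = pd.Series(range(6), index=idx)

later = ts.shift(2)               # values move 2 rows later; first 2 become NaN
same_vals = ts.shift(2, freq="D") # the index moves 2 days later; no NaN appears
```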
