[python] Common data-import problems with pd.read_csv( ) and open( ): Chinese text and escape characters

- 1. When the CSV file contains no Chinese characters:

1.1 Using pd.read_csv( ):

import pandas as pd
from pandas import DataFrame,Series
df = pd.read_csv('C:\Users\yingfei-wjc\Desktop\index_kchart.csv')   # fails: the backslashes are treated as escapes

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

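The error arises because the backslash [\] is the escape character in Python string literals. The \U in C:\Users (positions 2-3 of the string) is read as the start of a \UXXXXXXXX Unicode escape, which must be followed by eight hexadecimal digits. For background see https://blog.csdn.net/brucewong0516/article/details/79027674. One fix is to replace every [\] with [/]: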

import pandas as pd
from pandas import DataFrame,Series
df = pd.read_csv('C:/Users/yingfei-wjc/Desktop/index_kchart.csv')   # tip: Ctrl+R (replace) in the editor changes every backslash at once

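Alternatively, use the [r] prefix: putting r in front of a string literal that contains backslashes makes it a raw string, so every character is taken literally and no escape processing happens. This is the same r prefix that is commonly seen on regular-expression patterns, but it is a feature of Python string literals, not of regular expressions.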
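To make the options above concrete, here is a minimal sketch comparing the three spellings of the same path. The path is the one used in this post and is only built as a string here; nothing is opened. The variable names are made up for illustration.

```python
# Three ways to write the same Windows path without triggering an escape error.
escaped = 'C:\\Users\\yingfei-wjc\\Desktop\\index_kchart.csv'   # every backslash doubled
raw = r'C:\Users\yingfei-wjc\Desktop\index_kchart.csv'          # raw string literal
forward = 'C:/Users/yingfei-wjc/Desktop/index_kchart.csv'        # forward slashes

# The escaped and raw spellings produce the identical string.
print(escaped == raw)                      # True

# The forward-slash spelling differs only in the separator character.
print(forward.replace('/', '\\') == raw)   # True

# In a normal literal '\n' is one newline character; in a raw literal
# it stays as two characters, a backslash followed by 'n'.
print(len('\n'), len(r'\n'))               # 1 2
```

Windows accepts both separators in file paths, which is why the forward-slash version works with pd.read_csv( ) and open( ) alike.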

1.2 Using open( )

df = open('c:/Users/yingfei-wjc/Desktop/index_kchart.csv','r')  # Windows paths are case-insensitive, so c: works as well as C:
print(df.read())

The open file object can then be passed to pd.read_csv( ):

df = open('C:/Users/yingfei-wjc/Desktop/index_kchart.csv','r')
dff = pd.read_csv(df)
print(dff.describe())
df.close()

                pre          open          high           low         close  \
count  1.048575e+06  1.048575e+06  1.048575e+06  1.048575e+06  1.048575e+06   
mean   2.852272e+03  2.850896e+03  2.892983e+03  2.809985e+03  2.854650e+03   
std    2.134117e+03  2.133180e+03  2.168087e+03  2.098019e+03  2.135198e+03   
min    1.000000e+00  1.000000e+00  1.000000e+00  0.000000e+00  1.000000e+00   
25%    1.209940e+03  1.209415e+03  1.225700e+03  1.194380e+03  1.211960e+03   
50%    2.446300e+03  2.444680e+03  2.480360e+03  2.409870e+03  2.448550e+03   
75%    3.934960e+03  3.933370e+03  3.987460e+03  3.878180e+03  3.937355e+03   
max    2.195854e+04  2.206328e+04  2.206328e+04  2.149338e+04  2.195854e+04   

       change_price  change_percent        volume        amount  
count  1.048575e+06    1.048575e+06  1.048575e+06  1.048575e+06  
mean   2.378354e+00    8.023494e-01  4.863823e+07  6.048913e+10  
std    1.169148e+02    3.502789e+02  6.515467e+08  2.867983e+11  
min   -4.104120e+03   -9.655170e+01  0.000000e+00  0.000000e+00  
25%   -1.916000e+01   -9.372000e-01  1.480358e+06  1.627033e+09  
50%    1.380000e+00    1.081000e-01  8.872906e+06  1.051434e+10  
75%    2.720000e+01    1.250400e+00  2.999871e+07  3.712923e+10  
max    5.195610e+03    2.066000e+05  4.570000e+11  3.630000e+13
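As a variation on the open( ) plus pd.read_csv( ) pattern above, a with block closes the file automatically, so the explicit close( ) call is not needed. This is only a sketch: the file sample.csv and its two columns are hypothetical stand-ins for index_kchart.csv, written to a temporary directory so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for index_kchart.csv.
sample = "open,close\n1.0,2.0\n3.0,4.0\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")
    with open(csv_path, "w", newline="") as f:
        f.write(sample)

    # The with block closes the file even if read_csv raises an exception.
    with open(csv_path, "r") as f:
        sample_df = pd.read_csv(f)

print(f.closed)          # True
print(sample_df.shape)   # (2, 2)
```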

- 2. When the file contains Chinese characters

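Specify the encoding explicitly: encoding = 'gbk', 'gb2312', or 'gb18030'. GB18030 is a superset of GBK, which in turn extends GB2312, so 'gb18030' is the most permissive of the three.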

import pandas as pd
from pandas import DataFrame,Series
#df = pd.read_csv(r'C:/Users/yingfei-wjc/Desktop/index_kchart.csv',encoding = 'gbk')
#df = pd.read_csv(r'C:/Users/yingfei-wjc/Desktop/index_kchart.csv',encoding = 'gb2312')
df = pd.read_csv(r'C:/Users/yingfei-wjc/Desktop/index_kchart.csv',encoding = 'gb18030')

Output of print(df.info()):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 11 columns):
time              1048575 non-null object
pre               1048575 non-null float64
open              1048575 non-null float64
high              1048575 non-null float64
low               1048575 non-null float64
close             1048575 non-null float64
change_price      1048575 non-null float64
change_percent    1048575 non-null float64
volume            1048575 non-null float64
amount            1048575 non-null float64
Unnamed: 10       1 non-null object
dtypes: float64(9), object(2)
memory usage: 88.0+ MB
None
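The effect of the encoding argument can be reproduced on a small file. The sketch below assumes a hypothetical two-column CSV saved as GBK (the file name, header, and values are invented for illustration). It shows that the default UTF-8 decoding fails on such a file, while encoding='gb18030' reads it, both with pd.read_csv( ) and with plain open( ).

```python
import os
import tempfile

import pandas as pd

# Hypothetical table with a Chinese header and a Chinese value.
text = "名称,close\n上证指数,3000.5\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "gbk_sample.csv")
    # Save the file as GBK, as many Windows tools in a Chinese locale do.
    with open(csv_path, "w", encoding="gbk", newline="") as f:
        f.write(text)

    # Without an encoding argument pandas decodes as UTF-8 and fails.
    try:
        pd.read_csv(csv_path)
        default_failed = False
    except UnicodeDecodeError:
        default_failed = True

    # Naming a matching encoding fixes the read.
    gbk_df = pd.read_csv(csv_path, encoding="gb18030")

    # open( ) takes the same encoding argument.
    with open(csv_path, "r", encoding="gb18030") as f:
        first_line = f.readline().strip()

print(default_failed)
print(gbk_df.columns.tolist())
print(first_line)
```

If none of the three encodings works, the file was probably saved in some other encoding, and it is worth checking how it was exported before guessing further.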
