- 1. When the CSV file contains no Chinese characters:
- 1.1 Using pd.read_csv():
import pandas as pd
from pandas import DataFrame,Series
df = pd.read_csv('C:\Users\yingfei-wjc\Desktop\index_kchart.csv')
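SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
The cause of this error is explained at https://blog.csdn.net/brucewong0516/article/details/79027674 : the backslash [\] is the escape character in Python string literals, so the \U in C:\Users is read as the start of a Unicode escape. One fix is to replace every [\] with [/]: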
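To see that the failure comes from the string literal itself and not from pandas, the following minimal sketch compiles a one-line piece of source text containing the same path. The variable names (`bad_source`, `error_message`) are illustrative only, and the exact wording of the message may vary slightly between Python versions.

```python
# Source text of a single assignment whose string literal contains single backslashes.
# The doubled backslashes here are only needed to build that source text as a Python string.
bad_source = "path = 'C:\\Users\\yingfei-wjc\\Desktop\\index_kchart.csv'"

error_message = None
try:
    # The error is raised while the literal is parsed, before any code runs.
    compile(bad_source, '<example>', 'exec')
except SyntaxError as exc:
    error_message = str(exc)

print(error_message)
```

Because the error is raised at parse time, no call to `pd.read_csv` is ever reached; only the way the path is written needs to change.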
import pandas as pd
from pandas import DataFrame,Series
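df = pd.read_csv('C:/Users/yingfei-wjc/Desktop/index_kchart.csv') # tip: Ctrl+R (find and replace) in the editor swaps all the backslashes at once
Alternatively, use the 'r' prefix: placing 'r' before a string that contains backslashes makes it a raw string, so its contents are taken literally and no escape processing is done. This is the same raw-string prefix that is commonly used when writing regular expression patterns.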
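The two fixes above (forward slashes and the 'r' prefix) can be compared directly. This is a small sketch using only the standard library; `PureWindowsPath` is used so the comparison behaves the same on any operating system, and the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# Three ways to write the same Windows path without triggering the \U escape error.
raw_path = r'C:\Users\yingfei-wjc\Desktop\index_kchart.csv'            # raw string
escaped_path = 'C:\\Users\\yingfei-wjc\\Desktop\\index_kchart.csv'     # doubled backslashes
slash_path = 'C:/Users/yingfei-wjc/Desktop/index_kchart.csv'           # forward slashes

# The raw string and the doubled-backslash string contain identical characters.
same_text = raw_path == escaped_path

# Windows treats both separators alike, so both spellings name the same file.
same_file = PureWindowsPath(raw_path) == PureWindowsPath(slash_path)

print(same_text, same_file)
```

One caveat worth knowing: a raw string cannot end with a single backslash, so a directory path such as `r'C:\Users\'` is itself a syntax error.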
- 1.2 Opening the file with open()
with open('c:/Users/yingfei-wjc/Desktop/index_kchart.csv', 'r') as f: # Windows paths are case-insensitive, so 'c:' works too
    print(f.read())
Then combine it with pd.read_csv(), which also accepts an open file object:
with open('C:/Users/yingfei-wjc/Desktop/index_kchart.csv', 'r') as f: # the file is closed automatically when the block ends
    dff = pd.read_csv(f)
print(dff.describe())
                pre          open          high           low         close  \
count  1.048575e+06  1.048575e+06  1.048575e+06  1.048575e+06  1.048575e+06
mean   2.852272e+03  2.850896e+03  2.892983e+03  2.809985e+03  2.854650e+03
std    2.134117e+03  2.133180e+03  2.168087e+03  2.098019e+03  2.135198e+03
min    1.000000e+00  1.000000e+00  1.000000e+00  0.000000e+00  1.000000e+00
25%    1.209940e+03  1.209415e+03  1.225700e+03  1.194380e+03  1.211960e+03
50%    2.446300e+03  2.444680e+03  2.480360e+03  2.409870e+03  2.448550e+03
75%    3.934960e+03  3.933370e+03  3.987460e+03  3.878180e+03  3.937355e+03
max    2.195854e+04  2.206328e+04  2.206328e+04  2.149338e+04  2.195854e+04

        change_price  change_percent        volume        amount
count   1.048575e+06    1.048575e+06  1.048575e+06  1.048575e+06
mean    2.378354e+00    8.023494e-01  4.863823e+07  6.048913e+10
std     1.169148e+02    3.502789e+02  6.515467e+08  2.867983e+11
min    -4.104120e+03   -9.655170e+01  0.000000e+00  0.000000e+00
25%    -1.916000e+01   -9.372000e-01  1.480358e+06  1.627033e+09
50%     1.380000e+00    1.081000e-01  8.872906e+06  1.051434e+10
75%     2.720000e+01    1.250400e+00  2.999871e+07  3.712923e+10
max     5.195610e+03    2.066000e+05  4.570000e+11  3.630000e+13
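- 2. When the file contains Chinese characters
Pass an explicit encoding: encoding = 'gbk', 'gb2312', or 'gb18030'. Of the three, gb18030 covers the most characters (it is a superset of gbk, which in turn extends gb2312), so it is the safest one to try.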
import pandas as pd
from pandas import DataFrame,Series
#df = pd.read_csv(r'C:/Users/yingfei-wjc/Desktop/index_kchart.csv',encoding = 'gbk')
#df = pd.read_csv(r'C:/Users/yingfei-wjc/Desktop/index_kchart.csv',encoding = 'gb2312')
df = pd.read_csv(r'C:/Users/yingfei-wjc/Desktop/index_kchart.csv',encoding = 'gb18030')
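# -*- coding: utf-8 -*-
print(df.info()) # info() prints the summary itself and returns None, hence the trailing None in the output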
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 11 columns):
time              1048575 non-null object
pre               1048575 non-null float64
open              1048575 non-null float64
high              1048575 non-null float64
low               1048575 non-null float64
close             1048575 non-null float64
change_price      1048575 non-null float64
change_percent    1048575 non-null float64
volume            1048575 non-null float64
amount            1048575 non-null float64
Unnamed: 10       1 non-null object
dtypes: float64(9), object(2)
memory usage: 88.0+ MB
None
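When it is not known in advance which encoding a file uses, one option is to try candidates in order instead of commenting lines in and out as in section 2. The sketch below is only an illustration: `detect_encoding` is a hypothetical helper, not part of pandas, the sample data is made up, and a successful decode does not prove the encoding is the intended one. It also reads the whole file into memory, which may be slow for very large files.

```python
import os
import tempfile


def detect_encoding(path, candidates=('utf-8', 'gb18030')):
    """Return the first candidate encoding that decodes the whole file, or None."""
    with open(path, 'rb') as f:
        raw = f.read()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next one
        return enc
    return None


# Build a small GBK-encoded sample file (made-up data, for illustration only).
sample_text = 'name,close\n中文,3300.5\n'
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write(sample_text.encode('gbk'))
    sample_path = tmp.name

detected = detect_encoding(sample_path)

# Read the file back with the detected encoding to confirm the text survives.
with open(sample_path, 'r', encoding=detected) as f:
    decoded_text = f.read()
os.remove(sample_path)

print(detected)
```

The detected name can then be passed on, for example `pd.read_csv(path, encoding=detected)`. utf-8 is tried first because GBK-encoded Chinese text is rarely valid utf-8, so a wrong match at that step is unlikely.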