5. Data Cleaning: Reading and Writing CSV Files

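Notes:

*CSV and TXT files are read and written the same way.

*pandas has more than ten built-in reader functions for different data sources; CSV and Excel are the most common.

*Read with the read_csv method; keep file names in English where possible.

*read_csv takes many parameters that can be tuned as needed, but the defaults are often enough.

*Pay attention to the encoding when reading a CSV file; common encodings are utf-8, gbk, gb2312 and gb18030.

*Save quickly with the to_csv method.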

df=pd.read_csv('meal_order_info.csv',encoding='gbk')
df=pd.read_csv('meal_order_info.csv',encoding='gbk',nrows=10)
df.to_csv('df.csv',index=False)
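
To illustrate the first note, here is a minimal sketch of reading tab-separated text with read_csv. The content and the column names order_id and amount are made up for the example, and an in-memory buffer stands in for a .txt file on disk.

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for a .txt file on disk.
txt_content = "order_id\tamount\n1001\t25.5\n1002\t40.0\n"

# read_csv accepts a path or a file-like object; sep selects the delimiter.
orders = pd.read_csv(io.StringIO(txt_content), sep="\t")
print(orders)

# nrows limits how many data rows are parsed (the header row is not counted).
first_row = pd.read_csv(io.StringIO(txt_content), sep="\t", nrows=1)
print(first_row.shape)
```

For a real file, pass its path instead of the buffer; the other arguments stay the same.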

import numpy as np 
import pandas as pd
import os
# get the current working directory
os.getcwd()
'C:\\Users\\Administrator'
# change the current working directory
os.chdir('C:\\Users\\Administrator')
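baby=pd.read_csv('Vehicle No.0_Fragment No.0.csv',encoding='utf-8')  # mind the encoding; different files may need a different one
baby.head(5)  # by default the first row becomes the header (column labels); the row index is shown on the left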
time	total_voltage	total_current	soc	temp_max	temp_min	motor_voltage	motor_current	mileage
0	20180205195412	520.1	60	38	20	13	519	62	37918.5
1	20180205195422	521.2	32	38	20	13	521	36	NaN
2	20180205195432	521.1	43	38	20	13	521	46	NaN
3	20180205195442	523.8	38	38	20	13	522	41	NaN
4	20180205195452	528.3	-55	38	20	13	533	-118	NaN

order=pd.read_csv('Vehicle No.0_Fragment No.0.csv',encoding='gbk')
order.head(5)

time	total_voltage	total_current	soc	temp_max	temp_min	motor_voltage	motor_current	mileage
0	20180205195412	520.1	60	38	20	13	519	62	37918.5
1	20180205195422	521.2	32	38	20	13	521	36	NaN
2	20180205195432	521.1	43	38	20	13	521	46	NaN
3	20180205195442	523.8	38	38	20	13	522	41	NaN
4	20180205195452	528.3	-55	38	20	13	533	-118	NaN
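
Both utf-8 and gbk load this file identically, which is what one would expect if the file holds only ASCII characters (an assumption; the source does not say). When a file does contain Chinese text and its encoding is unknown, one possible approach is to try a few candidates in order. The helper read_csv_with_fallback below is hypothetical, not part of pandas, and the demo file is made up.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "gbk", "gb18030"), **kwargs):
    """Hypothetical helper: try each encoding in turn.

    Returns a (DataFrame, encoding_used) pair, or re-raises the last
    decoding error if none of the candidates works.
    """
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as err:
            last_error = err  # wrong encoding, try the next candidate
    raise last_error


# Demo on a throwaway file written as GBK (made-up data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_gbk.csv")
with open(demo_path, "w", encoding="gbk") as f:
    f.write("菜品,数量\n米饭,2\n")

demo_df, used_encoding = read_csv_with_fallback(demo_path)
print(used_encoding)
print(demo_df)
```

The order of the candidates matters. utf-8 goes first because bytes that are not valid UTF-8 fail loudly, whereas a more permissive encoding may decode the wrong bytes into garbled text without raising.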

order.info()  # print per-column information (non-null counts and dtypes)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 212 entries, 0 to 211
Data columns (total 9 columns):
time             212 non-null int64
total_voltage    212 non-null float64
total_current    212 non-null int64
soc              212 non-null int64
temp_max         212 non-null int64
temp_min         212 non-null int64
motor_voltage    212 non-null int64
motor_current    212 non-null int64
mileage          1 non-null float64
dtypes: float64(2), int64(7)
memory usage: 15.0 KB
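
The info() output shows that mileage has a single non-null value out of 212 rows. The same counts can be obtained directly as numbers, which is handy for a quick check while cleaning. A sketch on three rows copied from the head() output (column subset only):

```python
import io

import pandas as pd

# Three rows taken from the head() output; only a subset of the columns.
sample = (
    "time,total_voltage,mileage\n"
    "20180205195412,520.1,37918.5\n"
    "20180205195422,521.2,\n"
    "20180205195432,521.1,\n"
)
df = pd.read_csv(io.StringIO(sample))

# Empty fields are parsed as NaN, so they can be counted per column.
non_null = df.notna().sum()
missing = df.isna().sum()
print(non_null)
print(missing)
```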

order=pd.read_csv('Vehicle No.0_Fragment No.0.csv',encoding='utf-8',dtype={'time':str,'soc':str})  # set column dtypes while reading; info() reports string columns as object
order.head(5)
order.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 212 entries, 0 to 211
Data columns (total 9 columns):
time             212 non-null object
total_voltage    212 non-null float64
total_current    212 non-null int64
soc              212 non-null object
temp_max         212 non-null int64
temp_min         212 non-null int64
motor_voltage    212 non-null int64
motor_current    212 non-null int64
mileage          1 non-null float64
dtypes: float64(2), int64(5), object(2)
memory usage: 15.0+ KB
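
Reading time as a string keeps the digits intact, and it also makes it easy to parse them as timestamps afterwards. The sketch below assumes the values are laid out as year, month, day, hour, minute, second (that reading of the column is an assumption), and uses two rows copied from the head() output.

```python
import io

import pandas as pd

# Two rows taken from the head() output; only a subset of the columns.
sample = (
    "time,total_voltage,soc\n"
    "20180205195412,520.1,38\n"
    "20180205195422,521.2,38\n"
)

# Keep time as text on read, then parse it with an explicit format.
df = pd.read_csv(io.StringIO(sample), dtype={"time": str})
df["time"] = pd.to_datetime(df["time"], format="%Y%m%d%H%M%S")  # assumed layout

print(df.dtypes)
print(df["time"].iloc[0])
```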
# read only the first 100 rows
baby=pd.read_csv('Vehicle No.0_Fragment No.0.csv',encoding='utf-8',nrows=100)
# display at most n columns
pd.set_option('display.max_columns',5)

# display at most n rows
pd.set_option('display.max_rows',20)
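
set_option changes the display settings for the whole session. If the limit is only needed for one printout, pd.option_context is an alternative. A small sketch with a made-up eight-column frame:

```python
import pandas as pd

# Made-up frame with more columns than we want to print.
wide = pd.DataFrame({f"c{i}": range(3) for i in range(8)})

before = pd.get_option("display.max_columns")

# option_context applies the settings only inside the with-block.
with pd.option_context("display.max_columns", 5, "display.max_rows", 20):
    inside = pd.get_option("display.max_columns")
    print(wide)

# After the block, the previous value is in effect again.
after = pd.get_option("display.max_columns")
print(inside, after == before)

# reset_option restores the library default after a global set_option call.
pd.reset_option("display.max_columns")
```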
# save; the row index is usually left out (index=False)
os.getcwd()  # the file is written to the current working directory
baby.to_csv('a.csv',encoding='utf-8',index=False)
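
To see what index=False changes, the sketch below writes a tiny made-up frame both ways and reads the result back. Calling to_csv without a path returns the CSV text as a string, so nothing is written to disk.

```python
import io

import pandas as pd

# Tiny made-up frame for the comparison.
small = pd.DataFrame({"time": ["20180205195412"], "soc": [38]})

# The default (index=True) writes the row index as an unnamed first column.
with_index = small.to_csv()
# index=False writes only the data columns.
without_index = small.to_csv(index=False)

cols_with = pd.read_csv(io.StringIO(with_index)).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index)).columns.tolist()
print(cols_with)
print(cols_without)
```

With the index written out, reading the file back yields an extra column named "Unnamed: 0", which is why index=False is the usual choice when the index is just the default row numbers.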
Reposted from blog.csdn.net/l641208111/article/details/104221051