DataCamp Data Scientist with Python track: study notes

Importing Data in Python: 

Customizing your pandas import: 

# Import pandas as pd and matplotlib.pyplot as plt
import pandas as pd
import matplotlib.pyplot as plt

# Assign filename: file
file = 'titanic_corrupt.txt'

# Import file: data
data = pd.read_csv(file, sep='\t', comment='#', na_values='Nothing')

# Print the head of the DataFrame
print(data.head())

# Plot 'Age' variable in a histogram
pd.DataFrame.hist(data[['Age']])
plt.xlabel('Age (years)')
plt.ylabel('count')
plt.show()

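Sometimes the values that pandas treats as missing by default are not enough. In that case we can set na_values so that the values we specify are read in as NaN. In the statement above, na_values='Nothing' means that every occurrence of 'Nothing' in the file is replaced with NaN.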
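To see the effect of na_values (together with comment='#') without needing the course file, here is a minimal sketch. The inline data and the names raw and sample are made up for illustration and only stand in for 'titanic_corrupt.txt'.

```python
import io

import pandas as pd

# Hypothetical tab-delimited data standing in for 'titanic_corrupt.txt'.
raw = (
    "# this comment line is skipped because of comment='#'\n"
    "Name\tAge\n"
    "Alice\t29\n"
    "Bob\tNothing\n"
    "Carol\t41\n"
)

# 'Nothing' is added to the set of strings that pandas reads as NaN.
sample = pd.read_csv(io.StringIO(raw), sep='\t', comment='#', na_values='Nothing')

print(sample)
print(sample['Age'].isna().sum())  # number of missing ages
```

Because one entry in 'Age' becomes NaN, pandas stores the column as floats rather than integers.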

'sep' is the pandas counterpart of the 'delimiter' argument used by functions such as NumPy's np.loadtxt. Here it is '\t' because the file is tab-delimited.
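As a rough side-by-side, the sketch below reads the same tab-delimited text once with NumPy and once with pandas. The inline data and the names raw, arr and df are assumptions for illustration, not part of the course exercise.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-delimited numeric data without a header row.
raw = "1.0\t2.0\n3.0\t4.0\n"

# NumPy names the argument 'delimiter' ...
arr = np.loadtxt(io.StringIO(raw), delimiter='\t')

# ... while pandas names it 'sep'. header=None because the data has no header row.
df = pd.read_csv(io.StringIO(raw), sep='\t', header=None)

print(arr.shape)
print(df.shape)
```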

data.head()  # Returns the first 5 rows by default; pass a number in the parentheses to get a different row count.
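A short sketch of the default and of an explicit row count, using a made-up ten-row DataFrame (the name numbers is hypothetical):

```python
import pandas as pd

# Hypothetical DataFrame with ten rows.
numbers = pd.DataFrame({'n': range(10)})

first_five = numbers.head()    # default: first 5 rows
first_three = numbers.head(3)  # explicit: first 3 rows

print(len(first_five), len(first_three))
```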

Introduction to other file types: 

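pickle provides a simple form of persistence: it can store objects on disk as files. Almost every Python data type (lists, dictionaries, sets, classes, and so on) can be serialized with pickle, but the pickled data is not human-readable.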

If you merely want to be able to import such objects back into Python, you can serialize them. All this means is converting the object into a sequence of bytes, or a bytestream.
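The round trip can be sketched with the standard library's pickle module. The dictionary record and the file name 'record.pkl' are made up for illustration. The sketch serializes to bytes in memory first, then to a file in a temporary directory.

```python
import os
import pickle
import tempfile

# Hypothetical object mixing several built-in types.
record = {'name': 'Alice', 'ages': [29, 41], 'tags': {'a', 'b'}}

# Serialize to a bytestream in memory and restore it.
payload = pickle.dumps(record)
restored = pickle.loads(payload)

# Serialize to a file on disk and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'record.pkl')
    with open(path, 'wb') as fh:   # pickle files are binary: use 'wb' / 'rb'
        pickle.dump(record, fh)
    with open(path, 'rb') as fh:
        from_disk = pickle.load(fh)

print(type(payload))
print(restored == record, from_disk == record)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.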

Customizing your spreadsheet import: 

# xl is assumed to be a pd.ExcelFile opened earlier (its creation is not shown in these notes)
# Parse the first sheet and rename the columns: df1
df1 = xl.parse(0, skiprows=[0], names=['Country', 'AAM due to War (2002)'])

# Print the head of the DataFrame df1
print(df1.head())

# Parse the first column of the second sheet and rename the column: df2
# (usecols replaces the deprecated parse_cols, which newer pandas versions no longer accept)
df2 = xl.parse(1, usecols=[0], skiprows=[0], names=['Country'])

# Print the head of the DataFrame df2
print(df2.head())

Reposted from blog.csdn.net/weixin_41803041/article/details/84316784