Importing Data in Python:
Customizing your pandas import:
# Import pandas as pd and matplotlib.pyplot as plt
import pandas as pd
import matplotlib.pyplot as plt
# Assign filename: file
file = 'titanic_corrupt.txt'
# Import file: data
data = pd.read_csv(file, sep='\t', comment='#', na_values='Nothing')
# Print the head of the DataFrame
print(data.head())
# Plot 'Age' variable in a histogram
data[['Age']].hist()
plt.xlabel('Age (years)')
plt.ylabel('count')
plt.show()
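Sometimes the values that pandas treats as missing by default are not enough. The na_values argument lets us name additional values that should be read as NaN. In the statement above, na_values='Nothing' means that every occurrence of 'Nothing' in the file is replaced with NaN.
'sep' is the pandas name for the delimiter argument (read_csv also accepts 'delimiter' as an alias). Here it is set to '\t' because the file is tab-delimited.
data.head() # Returns the first 5 rows by default; pass a different number in the parentheses to change that.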
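A minimal sketch of how sep, comment and na_values work together. The in-memory text below is made up for illustration and stands in for a file such as titanic_corrupt.txt; the names and ages are assumptions, not data from the original file.

```python
import io
import pandas as pd

# Hypothetical tab-delimited content with a comment line and a custom missing marker
raw = (
    "# this whole line is skipped because it starts with the comment character\n"
    "Name\tAge\n"
    "Alice\t29\n"
    "Bob\tNothing\n"
    "Carol\t41\n"
)

# sep='\t' splits on tabs, comment='#' drops comment text,
# na_values='Nothing' adds 'Nothing' to the strings read as NaN
data = pd.read_csv(io.StringIO(raw), sep='\t', comment='#', na_values='Nothing')

print(data)
# Count the missing values in the 'Age' column
print(data['Age'].isna().sum())
# head(2) returns only the first two rows instead of the default five
print(data.head(2))
```

Because 'Nothing' becomes NaN, the 'Age' column is read as numeric rather than as text, which is what makes the histogram above possible.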
Introduction to other file types:
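pickle provides simple persistence: it can store objects on disk as files. Almost every Python data type (lists, dictionaries, sets, classes, and so on) can be serialized with pickle, but the serialized data is not human-readable.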
If you only need to load such objects back into Python later, you can serialize them. Serializing means converting the object into a sequence of bytes, or a bytestream.
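A small sketch of a pickle round trip, assuming a made-up dictionary and a temporary file path (both are illustrative, not from the course data). Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical object mixing several built-in types
record = {'name': 'Alice', 'scores': [90, 85], 'tags': {'a', 'b'}}

# Write the object to disk as a bytestream ('wb' = write binary)
path = os.path.join(tempfile.mkdtemp(), 'record.pkl')
with open(path, 'wb') as f:
    pickle.dump(record, f)

# Read it back ('rb' = read binary)
with open(path, 'rb') as f:
    restored = pickle.load(f)

# The restored object compares equal to the original
print(restored == record)
# dumps() returns the serialized form directly as a bytes object
print(type(pickle.dumps(record)))
```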
Customizing your spreadsheet import:
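# Assumes xl is a pandas ExcelFile created earlier with pd.ExcelFile() on the spreadsheet's path
# Parse the first sheet, skip its first row, and rename the columns: df1
df1 = xl.parse(0, skiprows=[0], names=['Country', 'AAM due to War (2002)'])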
# Print the head of the DataFrame df1
print(df1.head())
# Parse the first column of the second sheet and rename the column: df2
# (usecols replaces the older parse_cols argument, which current pandas no longer accepts)
df2 = xl.parse(1, usecols=[0], skiprows=[0], names=['Country'])
# Print the head of the DataFrame df2
print(df2.head())