Python Data Analysis (III): acquaintance Pandas

1 Introduction

Pandas based on NumPy development, it provides a fast, flexible, clear data structure designed simple and intuitive process data.

Pandas adapted to process the following types of data:

  • Ordered and unordered time-series data

  • Ranks of the matrix data with a tag, comprising a homogeneous or heterogeneous type data

  • Excel table with SQL or the like, including isomeric form data columns

  • Any other form of observation, statistical data set, the data structure of data transferred without prior numerals Pandas

Pandas main data structure is Series (one-dimensional data) and DataFrame (two-dimensional data), these two data structures enough to handle financial, statistical and other areas where the most typical use cases.

2. Series

Series can be custom labels (index), and then access the data in the array by an index, to find out the following by way of example.

from pandas import Series

'''
创建 Series 对象
如果不指定索引,则使用默认索引,范围是:[0,...,len(数据)-1]
'''
s1 = Series([1, 2, 3, 4, 5])
s2 = Series([1, 2, 3, 4, 5], index=['6', '7', '8', '9', '10'])
print(s1)
# 获取索引
print(s1.index)
# 获取值
print(s1.values)
# 获取索引和值
print(s1.iteritems)
# 取指定值
print(s2[0])
print(s2['6'])
# 连续取值
print(s2[1:3])
print(s2['7':'8'])
# 取不连续取值
print(s2[[1,4]])
print(s2[['7','10']])
# 基本运算
print(s1 + s2)
print(s1 - s2)
print(s1 * s2)
print(s1 / s1)

3. DataFrame

DataFrame is a two-dimensional data structure, similar to Excel, SQL table or dictionary object constituents Series, DataFrame Pandas is the most common objects, and as Series, DataFrame supports multiple types of input data, the following is further understood by way of example do .

3.1 Create

Let's look at how to create DataFrame.

from pandas import DataFrame
import numpy as np

# 直接创建
df1 = DataFrame(np.random.randn(5,5), index=list('abcde'), columns=list('abcde'))
print(df1)
# 使用字典创建
dic = {'name':['张三', '李四', '王五', '赵六', '朱七'], 'age':[20, 18, 30, 40, 50]}
df2 = DataFrame(dic)
print(df2)
df3 = DataFrame.from_dict(dic)
print(df3)
# 转为字典
d = df3.to_dict()
print(d)

3.2 Basic Operations

We look DataFrame of common operations point of view by example.

from pandas import DataFrame

dic = {'name':['张三', '李四', '王五', '赵六', '朱七'], 'age':[20, 18, 30, 40, 50], 'gender':['男', '女', '男', '女', '男']}
df = DataFrame(dic)
# 数据类型
print(df.dtypes)
# 维度
print(df.ndim)
# 概览
print(df.info())
# 行、列数
print(df.shape)
# 行索引
print(df.index.tolist())
# 列索引
print(df.columns.tolist())
# 数据(二维数组形式)
print(df.values)
# 前几行
print(df.head(2))
# 后几行
print(df.tail(2))
# 获取一列
print(df['name'])
# 类型为 Series
print(type(df['name']))
# 获取多列
print(df[['name', 'age']])
# 类型为 DataFrame
print(type(df[['name', 'age']]))
# 获取一行
print(df[1:2])
# 获取多行
print(df[1:4])
# 多行的某一列数据
print(df[1:4][['name']])
# 某一行某一列数据
print(df.loc[1, 'name'])
# 某一行指定列数据
print(df.loc[1, ['name', 'age']])
# 某一行所有列数据
print(df.loc[1, :])
# 连续多行和间隔的多列
print(df.loc[0:2, ['name', 'gender']])
# 间隔多行和间隔的多列
print(df.loc[[0, 2], ['name', 'gender']])
# 取一行
print(df.iloc[1])
# 取连续多行
print(df.iloc[0:3])
# 取间断的多行
print(df.iloc[[1, 3]])
# 取某一列
print(df.iloc[:, 0])
# 取某一个值
print(df.iloc[0, 1])

3.3 Add Remove

We look at how to add data to DataFrame example of view and how to delete data from it.

from pandas import DataFrame
import pandas as pd
import numpy as np

df1 = DataFrame([['张三', '22'], ['李四', '33'], ['王五', '11']], columns=['name', 'age'])
df2 = DataFrame([['张三', '22'], ['李四', '33'], ['王五', '11']], columns=['name', 'age'])
# 在某位置插入一列
# 方式 1
col = df1.columns.tolist()
col.insert(1, 'gender')
df1.reindex(columns=col)
df1['gender'] = ['男', '女', '保密']
print(df1)
# 方式 2
df1.insert(0, 'id', ['001', '002', '003'])
print(df1)
# 在某位置插入一行
row = ['004', '赵六', '66', '男']
df1.iloc[2] = row
print(df1)
df3 = DataFrame({'name':'赵六', 'age':'55'}, index=[0])
df2 = df2.append(df3, ignore_index=True)
print(df2)
# 合并
df4 = DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])
df5 = DataFrame(np.arange(6).reshape(2, 3), columns=['c', 'd', 'e'])
# 按行
pd6 = pd.concat([df4, df5], axis=1)
print(pd6)
# 按列
pd7 = pd.concat([df4, df5], axis=0, ignore_index=True)
print(pd7)
'''
删除
参数1:要删除的标签
参数2:0 表示行,1 表示列
参数3:是否在当前 df 中执行该操作
'''
df5.drop(['c'], axis=1, inplace=True)
print(df5)
df5.drop([1], axis=0, inplace=True)
print(df5)

Published 64 original articles · won praise 1276 · Views 300,000 +

Guess you like

Origin blog.csdn.net/ityard/article/details/104873042