10.pandas

Pandas

  • Two essential data structures: Series and DataFrame

    • Series
      • Similar to a one-dimensional NumPy array
    • DataFrame
      • Similar to a two-dimensional NumPy array
  • Creating a Series

    • From a one-dimensional array
    • From a dictionary
    • From a row or a column of a DataFrame
  • Creating a DataFrame

    • From a two-dimensional array
    • From a dictionary
    • From another DataFrame
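The creation routes listed above can be sketched as follows. The variable names and sample values are made up for illustration. Selecting columns by passing `columns=` to the constructor of an existing DataFrame is one way to build a DataFrame "from a DataFrame". Copying it outright with `DataFrame.copy()` would be another.

```python
import numpy as np
import pandas as pd

# Series from a one-dimensional array (default integer index 0..n-1)
s_from_array = pd.Series(np.array([1.5, 2.5, 3.5]))

# Series from a dict: the keys become the index labels
s_from_dict = pd.Series({'eggs': 30, 'apples': 6})

# DataFrame from a two-dimensional array, with explicit row and column labels
df_from_array = pd.DataFrame(np.arange(6).reshape(2, 3),
                             index=['r0', 'r1'], columns=['a', 'b', 'c'])

# DataFrame from a dict: the keys become the column labels
df_from_dict = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})

# DataFrame from an existing DataFrame, keeping only some of its columns
df_from_df = pd.DataFrame(df_from_array, columns=['a', 'c'])

# Series taken from one column and from one row of a DataFrame
col_series = df_from_array['b']        # indexed by the row labels
row_series = df_from_array.loc['r1']   # indexed by the column labels

print(s_from_dict)
print(col_series.tolist(), row_series.tolist())
```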
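  • Advantages

    • Rows and columns can be addressed by label
    • Rolling statistics can be computed on time-series data
    • NaN values are easy to handle
    • Data in many different formats can be loaded into a DataFrame
    • Different datasets can be joined and merged
    • Integrates with NumPy and matplotlib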
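Two items from the list above, NaN handling and rolling statistics, can be shown on a small time series. This is a minimal sketch. The price series is hypothetical, and forward-filling is only one of several ways to treat the gap. `fillna` with a constant or `dropna` are alternatives.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices with one missing observation
prices = pd.Series([10.0, 11.0, np.nan, 13.0, 14.0],
                   index=pd.date_range('2024-01-01', periods=5, freq='D'))

# NaN handling: count the gaps, then fill each one with the previous value
n_missing = prices.isna().sum()
filled = prices.ffill()

# Rolling statistic: 3-day moving average (the first two windows are incomplete -> NaN)
rolling_mean = filled.rolling(window=3).mean()
print(rolling_mean)
```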
  • Series attributes

    • shape
      • the shape
    • ndim
      • the number of dimensions
    • size
      • the total number of elements
    • index
      • returns the Series index (the labels)
    • values
      • returns the data
import pandas as pd
import numpy as np
groceries = pd.Series(data=[30,6,'Yes','No'],index=['eggs','apples','milk','bread'])
# the data mixes integers and strings, so the dtype is object
print(groceries)
eggs       30
apples      6
milk      Yes
bread      No
dtype: object
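The attributes listed above can be checked directly on the `groceries` Series. The Series is rebuilt here so the snippet runs on its own. The printed form of `index` and `values` varies slightly between pandas versions, so only the scalar results are annotated.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

print(groceries.shape)   # (4,)
print(groceries.ndim)    # 1
print(groceries.size)    # 4
print(groceries.index)   # the index labels
print(groceries.values)  # the data as a NumPy array (dtype object here)
```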
# `in` checks the index labels, not the values
'banana' in groceries
False
'apples' in groceries
True

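One important way Pandas differs from NumPy is that each element of a Pandas Series can be given an index label.

  • This lets us access the data in several ways
    • by index label
    • by position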
groceries['eggs']
30
groceries[['eggs','apples']]
eggs      30
apples     6
dtype: object
# groceries[-1] relies on a deprecated positional fallback; use iloc for positions
groceries.iloc[-1]
No
  • Telling index labels apart from integer positions

    • loc (location)
      • label-based indexing
    • iloc (integer location)
      • integer position-based indexing
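The difference between `loc` and `iloc` is easiest to see when the labels are themselves integers that do not match the positions. The Series below is a made-up example. Note also that a `loc` slice includes its end label, while an `iloc` slice excludes its end position.

```python
import pandas as pd

# Integer labels that differ from the positions 0, 1, 2
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

print(s.loc[20])                # label 20 -> 'b'
print(s.iloc[0])                # position 0 -> 'a'
print(s.loc[10:30].tolist())    # label slice, end included -> ['a', 'b', 'c']
print(s.iloc[0:2].tolist())     # position slice, end excluded -> ['a', 'b']
```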
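  • Deleting elements

    • pd_name.drop('idx_name', inplace=True or False)
      • inplace=True modifies the original Series (and returns None); inplace=False, the default, leaves it unchanged and returns a new Series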
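The effect of `inplace` can be sketched with a small `fruits` Series. It uses the same values as the arithmetic example in the next section.

```python
import pandas as pd

fruits = pd.Series([10, 6, 3], index=['apples', 'oranges', 'bananas'])

# inplace=False (default): a new Series is returned, fruits is unchanged
without_apples = fruits.drop('apples')
print(without_apples.index.tolist())  # ['oranges', 'bananas']
print(len(fruits))                    # 3

# inplace=True: fruits itself is modified and drop returns None
result = fruits.drop('apples', inplace=True)
print(result)                 # None
print(fruits.index.tolist())  # ['oranges', 'bananas']
```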
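  • Pandas arithmetic

    • Operations work element-wise, the same as in NumPy
fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

print("fruits + 2 = \n{}".format(fruits + 2))
print("fruits - 2 = \n{}".format(fruits - 2))
print("fruits * 2 = \n{}".format(fruits * 2))
print("fruits / 2 = \n{}".format(fruits / 2))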
fruits + 2 = 
apples     12
oranges     8
bananas     5
dtype: int64
fruits - 2 = 
apples     8
oranges    4
bananas    1
dtype: int64
fruits * 2 = 
apples     20
oranges    12
bananas     6
dtype: int64
fruits / 2 = 
apples     5.0
oranges    3.0
bananas    1.5
dtype: float64
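Because arithmetic is element-wise "the same as NumPy", NumPy's universal functions also accept a Series and keep its index labels. This is a short sketch reusing the `fruits` values from above. `np.sqrt` and `np.power` are just two representative ufuncs.

```python
import numpy as np
import pandas as pd

fruits = pd.Series([10, 6, 3], index=['apples', 'oranges', 'bananas'])

# NumPy ufuncs apply element-wise and return a Series with the same labels
roots = np.sqrt(fruits)
squares = np.power(fruits, 2)
print(roots)
print(squares)

# A single labelled element can be updated like a NumPy array element
fruits['bananas'] += 2
print(fruits['bananas'])  # 5
```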

DataFrame

  • Creating a DataFrame
# 3 DataFrame
items = {"Bob" : pd.Series([245,25,55],['bike','pants','watch']),
        'Alice' : pd.Series([40,110,500,45],['book','glasses','bike','pants'])}


shopping_carts = pd.DataFrame(items)
shopping_carts
	Bob	Alice
bike	245.0	500.0
book	NaN	40.0
glasses	NaN	110.0
pants	25.0	45.0
watch	55.0	NaN
    • If the Series are given no index, a default integer index (0, 1, 2, ...) is used
# 3 DataFrame
items = {"Bob" : pd.Series([245,25,55]),
        'Alice' : pd.Series([40,110,500,45])}


shopping_carts = pd.DataFrame(items)
shopping_carts
	Bob	Alice
0	245.0	40
1	25.0	110
2	55.0	500
3	NaN	45
    • df_name.values
    • df_name.size
    • df_name.ndim
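The attributes above, together with `shape`, `index` and `columns`, can be inspected on the labelled `shopping_carts` example. It is rebuilt here so the snippet is self-contained. The row labels are the union of both Series' labels.

```python
import pandas as pd

items = {'Bob': pd.Series([245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series([40, 110, 500, 45], index=['book', 'glasses', 'bike', 'pants'])}
shopping_carts = pd.DataFrame(items)

print(shopping_carts.shape)             # (5, 2)
print(shopping_carts.ndim)              # 2
print(shopping_carts.size)              # 10 = 5 rows * 2 columns
print(shopping_carts.values)            # 2-D NumPy array; missing entries are nan
print(shopping_carts.index.tolist())    # row labels
print(shopping_carts.columns.tolist())  # ['Bob', 'Alice']
```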
  • Loading only selected data
    • columns
    • index
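# items must be the labelled version from the first example, not the one with default integer indexes
items = {"Bob" : pd.Series([245,25,55],['bike','pants','watch']),
        'Alice' : pd.Series([40,110,500,45],['book','glasses','bike','pants'])}

bob_shopping_carts = pd.DataFrame(items, columns=['Bob'])
bob_shopping_carts
	Bob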
bike	245
pants	25
watch	55
sel_shopping_carts = pd.DataFrame(items,index=['pants','book'])
sel_shopping_carts
	Bob	Alice
pants	25.0	45
book	NaN	40
  • From a dictionary of lists
  • From a list of dictionaries
# We create a dictionary of lists (arrays)
data = {'Integers' : [1,2,3],
        'Floats' : [4.5, 8.2, 9.6]}

# We create a DataFrame 
df = pd.DataFrame(data)

# We display the DataFrame
df
	Integers	Floats
0	1	4.5
1	2	8.2
2	3	9.6
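# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame with row labels store 1 and store 2 (used by the examples below)
store_items = pd.DataFrame(items2, index=['store 1', 'store 2'])

# We display the DataFrame
store_items
	bikes	pants	watches	glasses
store 1	20	30	35	NaN
store 2	15	5	10	50.0

  • Accessing a DataFrame
    • via df_name['col_key']['row_key']
    • column label first, then row label
x = []
for item in store_items.columns:
    x.append(item)
print(store_items[x]['store 1'])

The code above does not work: store_items[x] returns a DataFrame, so ['store 1'] is looked up as a column label and raises a KeyError. To select a row by its label, use .loc:

store_items.loc['store 1']
bikes      20.0
pants      30.0
watches    35.0
glasses     NaN
Name: store 1, dtype: float64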
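The column-then-row access described above can be compared with the single `.loc[row, column]` lookup, which takes the row label first. A small sketch on the same `store_items` data:

```python
import pandas as pd

items2 = [{'bikes': 20, 'pants': 30, 'watches': 35},
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]
store_items = pd.DataFrame(items2, index=['store 1', 'store 2'])

# df[column][row]: column label first, then row label
a = store_items['bikes']['store 2']

# The same value with one .loc call: row label first, then column label
b = store_items.loc['store 2', 'bikes']

# Several columns of a single row
c = store_items.loc['store 1', ['bikes', 'pants']]
print(a, b, c.tolist())  # 15 15 [20, 30]
```

Reading a value is safe either way. When assigning a value, the single `.loc[row, column]` call is the safer form, because `df[col][row] = value` is chained indexing and may not update the original DataFrame.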
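  • Adding a column by assigning a list
store_items['shirts'] = [15, 2]
# suits = pants + shirts; this column appears in the tables that follow
store_items['suits'] = store_items['pants'] + store_items['shirts']
store_items
  • Adding a new row (older pandas used DataFrame.append, which was removed in pandas 2.0; use pd.concat)
new_items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4}]
new_store = pd.DataFrame(new_items, index=['store 3'])
# sort=True sorts the column labels, because new_store lacks some of the columns
store_items = pd.concat([store_items, new_store], sort=True)
store_items
	bikes	glasses	pants	shirts	suits	watches
store 1	20	NaN	30	15.0	45.0	35
store 2	15	50.0	5	2.0	7.0	10
store 3	20	4.0	30	NaN	NaN	35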
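  • Adding a new column, method 1: assign a Series (aligned by row label)
# .iloc[1:] takes the watches of every store except the first, so store 1 gets NaN
store_items['new watches'] = store_items['watches'].iloc[1:]
store_items

	bikes	glasses	pants	shirts	suits	watches	new watches
store 1	20	NaN	30	15.0	45.0	35	NaN
store 2	15	50.0	5	2.0	7.0	10	10.0
store 3	20	4.0	30	NaN	NaN	35	35.0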
  • Adding a new column, method 2
    • df_name.insert(loc, label, data)
      • loc: position of the new column (0-based)
      • label: column label
      • data: the column values
# We insert a new column with label shoes right before the column with numerical index 4
store_items.insert(4, 'shoes', [8,5,0])

# we display the modified DataFrame
store_items
	bikes	glasses	pants	shirts	shoes	suits	watches	new watches
store 1	20	NaN	30	15.0	8	45.0	35	NaN
store 2	15	50.0	5	2.0	5	7.0	10	10.0
store 3	20	4.0	30	NaN	0	NaN	35	35.0
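  • Deleting rows or columns
    • df_name.pop(label)
      • deletes columns only: removes one column in place and returns it
    • df_name.drop(['key1', 'key2'], axis=1 or 0)
      • can delete rows (axis=0) or columns (axis=1)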
# We remove the new watches column
store_items.pop('new watches')

# we display the modified DataFrame
store_items

	bikes	glasses	pants	shirts	shoes	suits	watches
store 1	20	NaN	30	15.0	8	45.0	35
store 2	15	50.0	5	2.0	5	7.0	10
store 3	20	4.0	30	NaN	0	NaN	35
# We remove the watches and shoes columns
store_items = store_items.drop(['watches', 'shoes'], axis = 1)

# we display the modified DataFrame
store_items

	bikes	glasses	pants	shirts	suits
store 1	20	NaN	30	15.0	45.0
store 2	15	50.0	5	2.0	7.0
store 3	20	4.0	30	NaN	NaN
# We remove the store 2 and store 1 rows
store_items = store_items.drop(['store 2', 'store 1'], axis = 0)

# we display the modified DataFrame
store_items

	bikes	glasses	pants	shirts	suits
store 3	20	4.0	30	NaN	NaN
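The contrast between `pop` and `drop` described above is mainly about side effects and return values. This is a small self-contained sketch with made-up store data.

```python
import pandas as pd

df = pd.DataFrame({'bikes': [20, 15], 'pants': [30, 5], 'watches': [35, 10]},
                  index=['store 1', 'store 2'])

# pop: removes one column from df in place and returns it as a Series
watches = df.pop('watches')
print(watches.tolist())     # [35, 10]
print(df.columns.tolist())  # ['bikes', 'pants']

# drop: returns a new object and leaves df untouched; axis=0 drops rows
only_store_1 = df.drop('store 2', axis=0)
print(only_store_1.index.tolist())  # ['store 1']
print(df.index.tolist())            # ['store 1', 'store 2']
```

This is why the examples above reassign the result of `drop` to `store_items` but not the result of `pop`.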
  • Renaming column or row labels
    • df_name.rename(columns={'old_name': 'new_name'}) or df_name.rename(index={'old_name': 'new_name'})
# We change the column label bikes to hats
store_items = store_items.rename(columns = {'bikes': 'hats'})

# we display the modified DataFrame
store_items
	hats	glasses	pants	shirts	suits
store 3	20	4.0	30	NaN	NaN
# We change the row label from store 3 to last store
store_items = store_items.rename(index = {'store 3': 'last store'})

# we display the modified DataFrame
store_items
	hats	glasses	pants	shirts	suits
last store	20	4.0	30	NaN	NaN
  • Using a column as the index
# We change the row index to be the data in the pants column
store_items = store_items.set_index('pants')

# we display the modified DataFrame
store_items
	hats	glasses	shirts	suits
pants
30	20	4.0	NaN	NaN


Reposted from blog.csdn.net/a245293206/article/details/89956306