pandas Basics (Part 1): Series

Study notes; these notes are mostly example-driven.
Development tool: Spyder



Introduction to pandas

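pandas is a library built on NumPy, created for data analysis tasks. It incorporates a number of libraries and standard data models, and provides the tools needed to work efficiently with large structured datasets.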

Series

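A Series can be thought of as a one-dimensional array whose index labels you can set yourself. It resembles a fixed-length ordered dictionary, with an index and values.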
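To make the "ordered dictionary" comparison concrete, here is a minimal sketch. The names and scores are made up for illustration; only the `index`, `to_numpy()`, `in`, and label-lookup behaviour of a Series is being shown.

```python
import pandas as pd

# A small Series with custom labels (labels and numbers are illustrative).
s = pd.Series([90, 85, 77], index=["Ada", "Bunny", "Jack"])

labels = list(s.index)   # the index holds the labels, in order
values = s.to_numpy()    # the values are stored as a NumPy array
has_ada = "Ada" in s     # membership tests check the labels, like dict keys
ada_score = s["Ada"]     # label lookup, like dict access

print(labels, values, has_ada, ada_score)
```

Unlike a plain dict, the Series keeps a fixed length and a single dtype for its values.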

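Creating a Series

  • Syntax
import numpy as np
import pandas as pd

# Create an empty Series (an explicit dtype avoids the default-dtype warning on some pandas versions)
s = pd.Series(dtype="float64")
# Create a Series from an ndarray
data = np.array(['a','b','c','d'])
s = pd.Series(data)
s = pd.Series(data,index=[100,101,102,103])
# Create a Series from a dict
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
# Create a Series from a scalar
s = pd.Series(5, index=[0, 1, 2, 3])
  • Examples

Code 1 (creating a Series from an ndarray):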

import numpy as np
import pandas as pd

data = np.array(['Ada', 'Bunny', 'Jack', 'Black'])

s1 = pd.Series(data)
print(s1)

Result 1:

0      Ada
1    Bunny
2     Jack
3    Black
dtype: object

Code 2 (custom index):

s2 = pd.Series(data, index = [10, 20, 30, 40])
print(s2)

Result 2:

10      Ada
20    Bunny
30     Jack
40    Black
dtype: object

Code 3 (creating a Series from a dict):

data = {"a": 0, "b": 1, "c": 2, "e": 3}
# The dict keys become the Series index
s3 = pd.Series(data)
print(s3)

Result 3:

a    0
b    1
c    2
e    3
dtype: int64

Code 4 (creating a Series from a scalar):

s4 = pd.Series(10, index = [0, 1, 2, 3])
print(s4)

Result 4:

0    10
1    10
2    10
3    10
dtype: int64
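One detail the creation examples do not cover is what happens when a dict is combined with an explicit `index`. The sketch below (with an illustrative dict) shows that pandas aligns on the labels: keys not listed in `index` are dropped, and labels missing from the dict get NaN.

```python
import pandas as pd

data = {"a": 0, "b": 1, "c": 2}

# Ask for labels "b", "c", "d": key "a" is dropped, and "d" has no value in the dict.
s = pd.Series(data, index=["b", "c", "d"])
print(s)

missing = s.isna()  # True only for the label that was not in the dict
```

Because NaN is a float, the resulting Series has a floating-point dtype even though the dict held integers.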

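Accessing data in a Series

  • Syntax
# Retrieve elements by position (use .iloc; plain s[0] on a labelled index is deprecated in recent pandas)
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s.iloc[0], s.iloc[:3], s.iloc[-3:])
# Retrieve data by label
print(s['a'], s[['a','c','d']])
  • Examples

Code 1: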

import numpy as np
import pandas as pd

data = np.array(['Ada', 'Bunny', 'Jack', 'Black'])

s = pd.Series(data, index = ["a", "b", "c", "d"])
print(s.iloc[0], '\n\n', s.iloc[:3], '\n\n', s.iloc[-3:])

Result 1:

Ada 

 a      Ada
b    Bunny
c     Jack
dtype: object 

 b    Bunny
c     Jack
d    Black
dtype: object

Code 2:

print(s["a"], '\n\n',s[["a", "b", "c"]])

Result 2:

Ada 

 a      Ada
b    Bunny
c     Jack
dtype: object
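The two access styles above map onto the explicit `.iloc` (position) and `.loc` (label) indexers. A minimal sketch, reusing the same sample data, highlights one difference that is easy to miss: position slices exclude the end, while label slices include it.

```python
import pandas as pd

s = pd.Series(["Ada", "Bunny", "Jack", "Black"], index=["a", "b", "c", "d"])

first = s.iloc[0]              # by position
by_label = s.loc["a"]          # by label
pos_slice = s.iloc[0:2]        # positions 0 and 1; the end position is excluded
label_slice = s.loc["a":"c"]   # labels a, b and c; the end label is included

print(first, by_label)
print(pos_slice)
print(label_slice)
```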

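Date handling in pandas

  • Syntax
# Date string formats that pandas can recognize
dates = pd.Series(['2011', '2011-02', '2011-03-01', '2011/04/01', '2011/05/01 01:01:01', '01 Jun 2011'])
# to_datetime() converts the strings to the datetime dtype
# (format='mixed' requires pandas 2.0 or later and parses each string on its own; omit it on older versions)
dates = pd.to_datetime(dates, format='mixed')
  • Examples

Code 1 (recognizing dates):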

import numpy as np
import pandas as pd

dates = pd.Series(['1997', '2015-09', '2019-03-01',
                   '2019/04/01', '2019/05/01 01:01:01',
                   '01 Jun 2019'])

print(dates)
print("-"*20)
# format='mixed' requires pandas 2.0 or later; omit it on older versions
dates = pd.to_datetime(dates, format='mixed')
print(dates)

Result 1:

0                   1997
1                2015-09
2             2019-03-01
3             2019/04/01
4    2019/05/01 01:01:01
5            01 Jun 2019
dtype: object
--------------------
0   1997-01-01 00:00:00
1   2015-09-01 00:00:00
2   2019-03-01 00:00:00
3   2019-04-01 00:00:00
4   2019-05-01 01:01:01
5   2019-06-01 00:00:00
dtype: datetime64[ns]

Code 2 (date arithmetic):

delta = dates - pd.to_datetime('1970-01-01')
print(delta)
print("-"*20)
# The Series .dt accessor exposes the components of the time offsets
print(delta.dt.days)

Result 2:

0    9862 days 00:00:00
1   16679 days 00:00:00
2   17956 days 00:00:00
3   17987 days 00:00:00
4   18017 days 01:01:01
5   18048 days 00:00:00
dtype: timedelta64[ns]
--------------------
0     9862
1    16679
2    17956
3    17987
4    18017
5    18048
dtype: int64
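Besides `days`, the `.dt` accessor on a timedelta Series also exposes the leftover seconds and the total duration. The sketch below uses two of the dates from the example above, written in one consistent format so that parsing does not depend on the pandas version.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2019-05-01 01:01:01", "2019-06-01 00:00:00"]))
delta = dates - pd.Timestamp("1970-01-01")

days = delta.dt.days               # whole days in each offset
seconds = delta.dt.seconds         # seconds left over after the whole days
total = delta.dt.total_seconds()   # the full offset expressed in seconds

print(days.tolist(), seconds.tolist(), total.tolist())
```

Note that `seconds` is only the remainder within the last day; `total_seconds()` is the one to use for the full duration.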

Series.dt provides many date-related operations; some of them are listed below:

Series.dt attribute           Meaning
Series.dt.year                The year of the datetime.
Series.dt.month               The month as January=1, December=12.
Series.dt.day                 The day of the datetime.
Series.dt.hour                The hours of the datetime.
Series.dt.minute              The minutes of the datetime.
Series.dt.second              The seconds of the datetime.
Series.dt.microsecond         The microseconds of the datetime.
Series.dt.week                The week ordinal of the year (removed in pandas 2.0; use Series.dt.isocalendar().week).
Series.dt.weekofyear          The week ordinal of the year (removed in pandas 2.0; use Series.dt.isocalendar().week).
Series.dt.dayofweek           The day of the week with Monday=0, Sunday=6.
Series.dt.weekday             The day of the week with Monday=0, Sunday=6.
Series.dt.dayofyear           The ordinal day of the year.
Series.dt.quarter             The quarter of the date.
Series.dt.is_month_start      Indicates whether the date is the first day of the month.
Series.dt.is_month_end        Indicates whether the date is the last day of the month.
Series.dt.is_quarter_start    Indicates whether the date is the first day of a quarter.
Series.dt.is_quarter_end      Indicates whether the date is the last day of a quarter.
Series.dt.is_year_start       Indicates whether the date is the first day of a year.
Series.dt.is_year_end         Indicates whether the date is the last day of a year.
Series.dt.is_leap_year        Indicates whether the date belongs to a leap year.
Series.dt.days_in_month       The number of days in the month.

Code 3 (demonstrating the dt accessor):

print(dates.dt.month)

Result 3:

0    1
1    9
2    3
3    4
4    5
5    6
dtype: int64
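A few more of the `.dt` attributes can be tried the same way. This is a small sketch with three illustrative dates (not taken from the examples above); it uses `dt.isocalendar().week`, available from pandas 1.1, for the week number.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2019-03-01", "2020-02-29", "2020-12-31"]))

years = dates.dt.year.tolist()
weekdays = dates.dt.dayofweek.tolist()        # Monday=0 ... Sunday=6
leap = dates.dt.is_leap_year.tolist()
month_end = dates.dt.is_month_end.tolist()
iso_weeks = dates.dt.isocalendar().week.tolist()

print(years, weekdays, leap, month_end, iso_weeks)
```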

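DatetimeIndex

By specifying the number of periods and the frequency, you can create a date sequence with the pd.date_range() function.

  • Syntax
import pandas as pd
# Daily frequency (the default), starting at 2019/08/21, 5 timestamps
datelist = pd.date_range('2019/08/21', periods = 5)

# Month-end frequency (pandas 2.2 and later spell this alias 'ME')
datelist = pd.date_range('2019/08/21', periods=5,freq='M')

# Build a time series over a given interval
# (pd.datetime was removed in pandas 2.0; pd.Timestamp works across versions)
start = pd.Timestamp(2017, 11, 1)
end = pd.Timestamp(2017, 11, 5)
dates = pd.date_range(start, end)
  • Examples

Code 1: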

import numpy as np
import pandas as pd


dates1 = pd.date_range('2020-01-01', periods = 5,
                       freq = 'D')
print(dates1)

print("-"*20)

# Month-end frequency; pandas 2.2 and later spell this alias 'ME'
dates2 = pd.date_range('2015-01-10', periods = 5,
                       freq = 'M')
print(dates2)

print("-"*20)

# pd.datetime was removed in pandas 2.0; pd.Timestamp works across versions
start_num = pd.Timestamp(2019, 1, 1)
end_num = pd.Timestamp(2019, 1, 5)
dates3 = pd.date_range(start_num, end_num)
print(dates3)

Result 1:

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')
--------------------
DatetimeIndex(['2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30',
               '2015-05-31'],
              dtype='datetime64[ns]', freq='M')
--------------------
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05'],
              dtype='datetime64[ns]', freq='D')
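A DatetimeIndex becomes most useful as the index of a Series. The following sketch (the values are illustrative) builds one with `pd.date_range()` and then selects by date; as with other label slices, both endpoints are included.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=5, freq="D")
s = pd.Series([10, 20, 30, 40, 50], index=idx)

one_day = s.loc[pd.Timestamp("2020-01-03")]   # a single value by timestamp
window = s.loc["2020-01-02":"2020-01-04"]     # date strings work as slice bounds

print(one_day)
print(window)
```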

Code 2:

dates1 = pd.bdate_range('2020-01-01', periods = 10)
print(dates1)

Note: bdate_range() generates a range of business days; unlike date_range(), it excludes Saturdays and Sundays.

Result 2:

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06',
               '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10',
               '2020-01-13', '2020-01-14'],
              dtype='datetime64[ns]', freq='B')
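To see the difference between the two functions directly, this sketch builds both ranges from the same start date and checks for weekend days. `dayofweek` uses Monday=0, so values of 5 and 6 are Saturday and Sunday.

```python
import pandas as pd

all_days = pd.date_range("2020-01-01", periods=10)
business_days = pd.bdate_range("2020-01-01", periods=10)

weekend_in_all = bool((all_days.dayofweek >= 5).any())
weekend_in_business = bool((business_days.dayofweek >= 5).any())

# Calendar days from the first range that the business-day range leaves out
skipped = all_days.difference(business_days)

print(weekend_in_all, weekend_in_business)
print(skipped)
```

Both ranges have 10 entries, so the business-day range extends further into the calendar to make up for the weekend it skips.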

Reposted from blog.csdn.net/m0_37422217/article/details/105217225