pandas study notes (Windows)

These pandas study notes are a personal learning record only. Please cite the source if you reproduce them.

1. Install pandas

pandas can be installed with pip, which comes with the Python installation. Change the first line of the code below to the Python installation directory on your computer, copy the code into a TXT file, save it as a .bat file, and double-click it to run.
cd "C:\Program Files\Python38"
python -m pip install --upgrade pip
python -m pip install numpy
python -m pip install pandas
Description: the first line changes to the Python directory, the second line upgrades the local pip, and the last two lines install the modules.
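
To check that the installation worked, a short script like the one below can be run afterwards. It is only a sketch: it imports both packages and prints their versions, and the variable names are mine.

```python
import numpy as np
import pandas as pd

# If either import fails, the corresponding pip install did not succeed.
numpy_version = np.__version__
pandas_version = pd.__version__

print("numpy", numpy_version)
print("pandas", pandas_version)
```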

2. pandas structures:

The core structures of pandas are Series and DataFrame.
Series: a one-dimensional array, similar to a NumPy array. All of its values share a single dtype, such as int, str or float, which improves access speed; when values of different types are mixed, pandas stores them under the generic object dtype.
Time-Series: a Series indexed by time.
DataFrame: a two-dimensional structure made up of Series.
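
The single-dtype behaviour of a Series can be checked with a small sketch like this one. The variable names and sample values are made up for illustration.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])         # all integers
floats = pd.Series([1.0, 2.5, 3.0]) # all floats
mixed = pd.Series([1, "a", 2.5])    # mixed types

# Each Series reports exactly one dtype for all of its values.
print(ints.dtype)
print(floats.dtype)
print(mixed.dtype)  # mixed values fall back to the generic object dtype
```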

3. Series creation:

3.1 Creating from a one-dimensional array:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#__author__ = '株洲市易美智能工程有限责任公司'
#Email:[email protected]

import numpy as np
import pandas as pd
series = pd.Series([x for x in range(10, 20, 1)], index=None)  # build the data with a list comprehension
# series = pd.Series(['A', 'B', 'C', 'D'], index=[x for x in range(4)])  # build the data from a list
print(series, series.dtype)

Use either one of the two ways of creating the Series shown above. The index argument can be omitted; it defaults to None, in which case pandas generates an integer index starting at 0.
The output is not shown here.

3.2 Creating from a dictionary:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import pandas as pd
# The dictionary keys become the index; each value here is a whole list stored as one element.
series = pd.Series({'A': [x for x in range(5)], 'B': [x for x in range(20, 30, 2)]})
print(series)
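
In the example above each dictionary value is a whole list, so the resulting Series has just two elements. For comparison, here is a sketch with one scalar per key, where the keys become labels for single values. The names and numbers are invented.

```python
import pandas as pd

# One scalar per key: the keys become the index, the values become the data.
prices = pd.Series({"A": 10.5, "B": 20.0, "C": 7.25})

print(prices)
print(prices["B"])         # look up a value by its label
print(list(prices.index))  # the labels come from the dictionary keys
```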

4. DataFrame creation:

A DataFrame is a tabular structure; I think it is a bit like an XLS spreadsheet.
Each row and each column of a DataFrame is a Series.
Ways to create a DataFrame:
4.1 Creating from a two-dimensional array:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import pandas as pd
Data = pd.DataFrame(np.arange(100).reshape(20, -1))  # 20 rows; -1 lets NumPy work out the 5 columns
print(Data)
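
The note at the start of this section, that each row and each column of a DataFrame is a Series, can be checked with a small sketch. The frame here is a made-up 3x4 example, not the 20x5 one above.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("abcd"))

column = frame["b"]  # select one column by name
row = frame.iloc[1]  # select one row by integer position

# Both selections come back as Series objects.
print(type(column).__name__, column.tolist())
print(type(row).__name__, row.tolist())
```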

4.2 Creating from a dictionary

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import pandas as pd
# Each key becomes a column name; column B deliberately mixes strings, a bool and integers.
Data = pd.DataFrame({'A': [x for x in range(10)], 'B': ['A', True, 'C', 'D', 88, 'F', 'G', 'H', 'I', 5]})
print(Data)

The columns do not need to hold the same data type when created this way, but all columns must be the same length.
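
A short sketch of both points: columns of mixed type are accepted, while columns of different lengths are rejected with a ValueError. The variable names are mine.

```python
import pandas as pd

# Columns of different types but equal length: accepted.
ok = pd.DataFrame({"A": [1, 2, 3], "B": ["x", True, 3.5]})
print(ok.dtypes)

# Columns of different lengths: pandas raises ValueError.
try:
    pd.DataFrame({"A": [1, 2, 3], "B": [1, 2]})
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(length_error)
```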

5. Importing stock data

5.1 Importing JoinQuant data
Installing jqdatasdk:
install Anaconda, open Anaconda Prompt, and cd to the Anaconda installation directory,
then copy and run the command below.

pip install git+https://github.com/JoinQuant/jqdatasdk.git -i https://mirrors.aliyun.com/pypi/simple/

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import pandas as pd
import jqdatasdk as jq
x = jq.auth('***********', '******')  # log in with your account and password
data = jq.get_price('000001.XSHE', start_date='2020-01-01', end_date='2020-02-29')  # query stock 000001 from 2020-01-01 to 2020-02-29

Other SDKs for getting stock data include QUANTAXIS, baostock, Tushare, OpenDataTools, etc.

6. DataFrame operations

**When using this code, decide which lines to run and which to leave out; otherwise it raises errors**
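#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np
import jqdatasdk as jq
x = jq.auth('1887318','00aa')
data = jq.get_price('000001.XSHE',start_date = '2020-01-01',end_date = '2020-02-29')
for col in data.columns:
    print(col, end = '  ')  # list all column names
print('\n')
print(data[['close','high']].head(n = 15))  # first 15 rows of several columns; n defaults to 5 if omitted
print(data.close.tail(n = 15))  # last 15 values of a single column; n defaults to 5 if omitted
print(data.index)  # print all row labels
data = data.T  # transpose rows and columns
data.columns = ['a','b','c','d','e','f']  # rename columns; the list must name every column, otherwise an error is raised
data = data.rename(columns = {'open':'o'})  # rename a single column by passing a dictionary
Series = data.open  # take the open column as a Series
data1 = data.copy()  # copy the DataFrame; copy() is deep by default, so later changes to one do not affect the other
print(data)
data2 = data[['open','close']]  # selecting with a list of columns returns a new DataFrame, so changing the source does not
# affect it; a single pair of [] with one name returns a Series and cannot take two or more names
# selecting a single column this way is not guaranteed to be a deep copy
data3 = data['2020-02-07':'2020-02-17']
data3 = data[1:5]  # rows can be sliced by label or by position; label slices include both ends, position slices exclude the end
data4 = data.iloc[0:5,0:15]  # iloc selects by integer position, so the arguments are integers;
# slice bounds beyond the data are ignored
data5 = data.loc['2020-01-02':'2020-03-12','open':'low']  # loc selects by row label and column label; date bounds in a slice
# need not match exactly (rows inside the range are returned), but column labels must exist, otherwise an error is raised
data6 = data[1:5]  # this form can only slice rows
data7 = data.iat[3,1]  # iat reads one value by integer position, like iloc
data8 = data.at[data.index[5],'open']  # at reads one value by row label and column label; the rows are labelled by date,
# so the integer 5 is not a valid row label and data.at[5,'open'] raises KeyError
data9 = data[data.close < 16]  # filter rows by a condition; returns every row that satisfies it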

**When using this code, decide which lines to run and which to leave out; otherwise it raises errors**
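
Because the code above needs a JoinQuant account, here is a self-contained sketch of the same selection methods on a small synthetic table. The dates and prices are made up and only stand in for the result of jq.get_price.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the jq.get_price result (made-up values).
dates = pd.date_range("2020-01-01", periods=6, freq="D")
prices = pd.DataFrame(
    {"open": np.arange(10.0, 16.0), "close": np.arange(11.0, 17.0)},
    index=dates,
)

# loc: label based, both ends of the slice are included.
by_label = prices.loc["2020-01-02":"2020-01-04", "open":"close"]
# iloc: position based, the end of the slice is excluded.
by_position = prices.iloc[1:4, 0:2]

# at needs the row label (a date here), iat needs integer positions.
one_by_label = prices.at[dates[5], "open"]
one_by_position = prices.iat[5, 0]

# Boolean filter: keep the rows whose close is below 13.
filtered = prices[prices.close < 13]

print(by_label)
print(one_by_label, one_by_position)
print(filtered)
```

Both slices select the same three rows, which shows the difference between label slicing (end included) and position slicing (end excluded).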

DataFrame sorting

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import jqdatasdk as jq
jq.auth('1887','0aa')
data = jq.get_price('000001.XSHE',start_date = '2020-01-01',end_date = '2020-02-29')
data1 = data.sort_values(by = ['close','open'],ascending = [True,False])  # sort: True applies to close, False applies to open,
# so rows are sorted by close in ascending order and ties are ordered by open in descending order; axis is omitted, so rows are sorted (the default)
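
The two-key ordering is easier to see on a tiny table with a tie in the first key. This is a sketch with invented values and row labels.

```python
import pandas as pd

quotes = pd.DataFrame(
    {"close": [10, 12, 10, 11], "open": [9, 11, 13, 10]},
    index=list("wxyz"),
)

# close ascending first; rows with equal close are ordered by open descending.
ordered = quotes.sort_values(by=["close", "open"], ascending=[True, False])
print(ordered)
```

Rows w and y share close = 10, so the second key decides their order: y (open 13) comes before w (open 9). The original quotes table is left unchanged because sort_values returns a new DataFrame.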

DataFrame: inserting and deleting columns

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import jqdatasdk as jq
x = jq.auth('188718','0a')
data = jq.get_price('000001.XSHE',start_date = '2020-01-01',end_date = '2020-02-29')
data.insert(0,'aa',1)  # insert a column: 0 is the column position, 'aa' is the column name and 1 is the value; the value may be a list,
# but its length must equal the number of rows of the DataFrame; a single value fills the whole column, a list is filled in row order
data.insert(0,'inster',np.arange(len(data)))  # insert an array of values, one per row
del data['close']  # delete a whole column in place
data3 = data.drop('open',axis = 1)  # returns a new DataFrame without the column; the original data is not affected
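
The same operations on a small local table, as a sketch that runs without a JoinQuant account. The column names flag and row_no are made up.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({"open": [1.0, 2.0, 3.0], "close": [1.5, 2.5, 3.5]})

table.insert(0, "flag", 1)                        # a scalar fills every row
table.insert(1, "row_no", np.arange(len(table)))  # one value per row
without_open = table.drop("open", axis=1)         # new DataFrame, table keeps open
del table["close"]                                # removes the column in place

print(table)
print(without_open)
```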

pandas - the rolling (moving window) concept

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
index = pd.date_range('2020-01-01',periods = 200)  # 200 consecutive dates starting from 2020-01-01
data = pd.DataFrame(np.arange(len(index)),index = index,columns = ['test'])  # build the DataFrame
print(data)
data['sum'] = data.test.rolling(5).sum()  # sum over a moving window of 5 rows (the current row and the 4 before it); stored in the sum column
data['mean'] = data.test.rolling(5).mean()  # mean over the same 5-row window; stored in the mean column
data['mean-2'] = data.test.rolling(5,min_periods = 2).mean()  # mean over a 5-row window, computed as soon as at least 2 values are available; stored in the mean-2 column
print(data)
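
With 200 rows the effect of the window is hard to follow by eye, so here is a sketch on six values with a window of 3, small enough to check by hand. The variable names are mine.

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(6, dtype=float))  # 0.0, 1.0, ..., 5.0

# Window of 3: the first two positions have too few values and give NaN.
window_sum = values.rolling(3).sum()

# min_periods=2: a result is produced as soon as 2 values are in the window.
window_mean_min2 = values.rolling(3, min_periods=2).mean()

print(window_sum.tolist())
print(window_mean_min2.tolist())
```

For example, the sum at position 2 is 0 + 1 + 2 = 3, and with min_periods=2 the mean at position 1 is (0 + 1) / 2 = 0.5.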

DataFrame join

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.ones(12).reshape(3,4),columns = list('abcd'),index = list('ABC'))
df2 = pd.DataFrame(np.zeros((3,5)),columns = list('efghi'),index = list('ABC'))
df3 = pd.DataFrame(np.zeros((5,2)),columns = list('jk'),index = list('AEBGH'))

print(df2.join(df1))  # both frames have the same row labels
print(df1.join(df3))  # df1 and df3 share rows A and B, so those rows join; df3 has no row C, so C is NaN in the joined columns
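
join keeps the rows of the calling DataFrame by default (how='left'). The how parameter changes which row labels survive; here is a sketch on two small made-up frames.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.ones((3, 2)), columns=list("ab"), index=list("ABC"))
right = pd.DataFrame(np.zeros((3, 1)), columns=["j"], index=list("AEB"))

left_join = left.join(right)                # default how="left": rows A, B, C
inner_join = left.join(right, how="inner")  # only labels present in both
outer_join = left.join(right, how="outer")  # union of the labels

print(left_join)
print(inner_join)
print(outer_join)
```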

Origin blog.csdn.net/yinghu5312/article/details/104577466