I. A first look at Pandas
Pandas is a widely used data analysis library built on NumPy. It provides two core data structures, Series (one-dimensional) and DataFrame (two-dimensional), that make data manipulation easier. Because Pandas is still an ordinary Python library, Python's built-in data types remain usable alongside these structures, and you can also define your own data types with classes.
Pandas plays an important role in financial data analysis, for example in quantitative trading. It builds on a number of other libraries and some standard data models, and it provides the tools needed to manipulate large data sets efficiently, which makes it well suited to processing large volumes of data.
II. Basic Pandas operations
1. Creating a Series
There are three main ways to create a Series:
1) Create a Series from a one-dimensional array
import numpy as np
import pandas as pd

# Create a one-dimensional array
a = np.arange(10)
print(a)
s = pd.Series(a)
print(s)
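The output shows the array, [0 1 2 3 4 5 6 7 8 9], followed by the Series, which pairs each value with a default integer index running from 0 to 9.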
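To make the relationship between the array and the Series more concrete, here is a minimal sketch that inspects the index and values of the Series built above. The labels x, y and z in the second part are illustrative and do not come from the original example.

```python
import numpy as np
import pandas as pd

a = np.arange(10)
s = pd.Series(a)

# The Series keeps the array values and adds a default integer index.
print(s.index)    # the default index, running from 0 to 9
print(s.values)   # the underlying NumPy array
print(s[3])       # look up a value by its index label

# An explicit index can be supplied instead (these labels are illustrative).
labeled = pd.Series(a[:3], index=["x", "y", "z"])
print(labeled["y"])
```

With a default index, the label and the position of each element coincide; with an explicit index, values are looked up by label.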
2) Create a Series by means of a dictionary
import pandas as pd

# Create a dictionary
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print(d)
s = pd.Series(d)
print(s)
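The output shows the dictionary followed by the Series, in which the keys a through e become the index labels and the values 1 through 5 become the data.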
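Because the dictionary keys become index labels, the Series can be queried by label much like the dictionary itself. The following sketch, which reuses the same dictionary, shows a few common lookups; it is an illustration rather than part of the original example.

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
s = pd.Series(d)

print(s['c'])          # a single value, selected by label
print(s[['a', 'e']])   # a smaller Series, selected by a list of labels
print(s[s > 3])        # boolean filtering keeps the matching labels
```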
3) Create a Series from a row or column of a DataFrame
See s = df3['one'] under the third way of creating a DataFrame below.
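As a small sketch of both cases, the code below takes a column with square brackets and a row with .loc. It assumes a DataFrame built from part of the nested dictionary used later in this article; the variable names col and row are illustrative.

```python
import pandas as pd

d2 = {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
      'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8}}
df = pd.DataFrame(d2)

col = df['one']     # a column: a Series indexed by the row labels a..d
row = df.loc['b']   # a row: a Series indexed by the column names

print(type(col).__name__, type(row).__name__)
print(col)
print(row)
```

In both cases the result is a Series; only the index differs (row labels for a column, column names for a row).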
2. Creating a DataFrame
There are three main ways to create a DataFrame:
1) Create a DataFrame from a 2D array
import numpy as np
import pandas as pd

# Create a two-dimensional array
a = np.arange(12).reshape(3, 4)
print(a)
df1 = pd.DataFrame(a)
print(df1)
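The output shows the 3 by 4 array holding the integers 0 through 11, followed by a DataFrame with the same values and default integer labels for both rows and columns.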
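The default integer labels can be replaced when the DataFrame is created. The sketch below passes index and columns to the constructor; the label names (r1 to r3, w to z) are made up for illustration.

```python
import numpy as np
import pandas as pd

a = np.arange(12).reshape(3, 4)

# Row and column labels are illustrative, not part of the original example.
df = pd.DataFrame(a, index=['r1', 'r2', 'r3'], columns=['w', 'x', 'y', 'z'])

print(df)
print(df.shape)             # (number of rows, number of columns)
print(df.loc['r2', 'y'])    # one cell, selected by row and column label
```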
2) Create a DataFrame by means of a dictionary
The following creates DataFrames from two kinds of dictionaries: a dictionary of lists and a nested dictionary.
import pandas as pd

# A dictionary of lists: each key becomes a column
d1 = {'a':[1,2,3,4],'b':[5,6,7,8],'c':[9,10,11,12],'d':[13,14,15,16]}
print(d1)
df1 = pd.DataFrame(d1)
print(df1)
d2 = {'one':{'a':1,'b':2,'c':3,'d':4},'two':{'a':5,'b':6,'c':7,'d':8},'three':{'a':9,'b':10,'c':11,'d':12}}
print(d2)
df2 = pd.DataFrame(d2)
print(df2)
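In the output, df1 uses the keys a through d as column names with a default integer row index, while df2 uses the outer keys (one, two, three) as column names and the inner keys (a through d) as row labels.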
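With a nested dictionary, the inner keys do not have to match across columns. The following sketch uses a deliberately incomplete dictionary (an illustrative variation on d2, not the original data) to show that pandas aligns rows on the inner keys and fills the gaps with NaN.

```python
import pandas as pd

# Illustrative data: 'one' has no entry for c, 'two' has no entry for a.
nested = {'one': {'a': 1, 'b': 2},
          'two': {'b': 6, 'c': 7}}
df = pd.DataFrame(nested)

print(df)
print(df.isna().sum())   # number of missing cells per column
```

Columns that contain NaN are stored as floating-point numbers, which is why the integers are printed with a decimal point.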
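3) Create a DataFrame from another DataFrame
We select two columns of df2, created in 2), to build df3, and then take a single column of df3 as a Series.
df2 = pd.DataFrame(d2)
print(df2)
# Selecting a list of columns returns a new DataFrame
df3 = df2[['one', 'two']]
print(df3)
# Selecting a single column returns a Series
s = df3['one']
print(s)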
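The output shows df2 with all three columns, df3 with only the columns one and two, and s, which is the column one as a Series indexed by a through d.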
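The type of the result depends on how the columns are requested. This short sketch, reusing the nested dictionary d2, compares the three forms; the variable names are illustrative.

```python
import pandas as pd

d2 = {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
      'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8},
      'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}}
df2 = pd.DataFrame(d2)

two_cols = df2[['one', 'two']]   # list of labels -> DataFrame
one_col_frame = df2[['one']]     # one-element list -> still a DataFrame
one_col = df2['one']             # single label -> Series

print(type(two_cols).__name__, two_cols.shape)
print(type(one_col_frame).__name__, one_col_frame.shape)
print(type(one_col).__name__, one_col.shape)
```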
III. Processing stock data
Next, we work through an example of using Pandas to process stock data.
We use pandas_datareader to fetch Alibaba's stock data.
1) Import the following libraries:
import pandas as pd
import pandas_datareader.data as web
# Used for plotting
import matplotlib.pyplot as plt
# Used for dates
import datetime
2) Set stock name and time parameters
name = "BABA"
start = datetime.datetime(2015, 1, 1)
end = datetime.date.today()
3) Get stock data
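# Note: the "google" source was available when this was written. Later
# pandas-datareader releases dropped it, so a currently supported source
# may have to be substituted.
prices = web.DataReader(name, "google", start, end)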
4) View the type of prices
print(type(prices))
It prints as follows:
<class 'pandas.core.frame.DataFrame'>
As you can see, the returned object is a DataFrame.
5) View summary information for stocks
print(prices.describe())
It prints as follows:
             Open        High         Low       Close        Volume
count  791.000000  791.000000  792.000000  792.000000  7.920000e+02
mean   106.632099  107.793186  105.355164  106.614520  1.610571e+07
std     38.191772   38.539981   37.719848   38.156416  9.941683e+06
min     57.300000   58.650000   57.200000   57.390000  2.457439e+06
25%     79.855000   80.945000   79.157500   79.935000  1.003487e+07
50%     91.000000   91.740000   89.925000   90.705000  1.350020e+07
75%    119.315000  120.400000  118.462500  120.205000  1.879724e+07
max    204.830000  206.200000  202.800000  205.220000  9.704593e+07
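In the summary above, Open and High have a count of 791 while the other columns have 792. Since describe() counts only non-missing values, this would be consistent with one row lacking an opening and a high price, although the source does not say so. The sketch below reproduces the effect on a tiny frame: the Close values and two of the Open values are copied from the tail() output in this article, and the NaN is inserted on purpose as an assumption.

```python
import numpy as np
import pandas as pd

# The NaN is added deliberately to imitate a missing opening price.
sample = pd.DataFrame({
    "Open":  [189.37, np.nan, 190.18],
    "Close": [188.82, 188.75, 193.29],
})

summary = sample.describe()
print(summary)
print(summary.loc["count"])   # Open has one fewer value than Close
```

The other rows of the summary (mean, std, min, quartiles, max) are likewise computed from the non-missing values of each column.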
Next, print the three most recent rows:
print(prices.tail(3))
              Open    High     Low   Close    Volume
Date
2018-02-21  189.37  193.17  188.46  188.82  22071585
2018-02-22  190.20  190.74  187.77  188.75  12282843
2018-02-23  190.18  193.40  189.95  193.29  16937275
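Because the rows are indexed by date, they can also be selected by date rather than by position. The following sketch rebuilds the three rows shown above as a small DataFrame (so it runs without a network connection) and demonstrates tail(), date-based slicing and idxmax(); the variable name sample is illustrative.

```python
import pandas as pd

# The three rows copied from the tail() output above.
dates = pd.DatetimeIndex(["2018-02-21", "2018-02-22", "2018-02-23"], name="Date")
sample = pd.DataFrame({
    "Open":   [189.37, 190.20, 190.18],
    "High":   [193.17, 190.74, 193.40],
    "Low":    [188.46, 187.77, 189.95],
    "Close":  [188.82, 188.75, 193.29],
    "Volume": [22071585, 12282843, 16937275],
}, index=dates)

print(sample.tail(2))               # the last two rows
print(sample.loc["2018-02-22":])    # every row from this date onward
print(sample["Close"].idxmax())     # date of the highest closing price
```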
6) Plotting
We plot Alibaba's opening price over time.
plt.plot(prices.index, prices["Open"])
plt.show()
The figure shows that Alibaba's stock price rose overall during this period. Looking closely, there also appears to be a high point around November of each year.
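An observation such as the November high can be checked numerically instead of by eye. The sketch below defines a hypothetical helper, yearly_high_dates, that returns the date of the highest value in each calendar year. The prices used to demonstrate it are made up for illustration and are not real BABA data; with the real data one would pass prices["Open"] instead.

```python
import pandas as pd

def yearly_high_dates(open_prices):
    """Return, for each calendar year, the date of the highest value."""
    return open_prices.groupby(open_prices.index.year).idxmax()

# Made-up values for illustration only; these are NOT real BABA prices.
idx = pd.to_datetime(["2015-03-02", "2015-11-10", "2016-06-01", "2016-11-11"])
fake_open = pd.Series([80.0, 85.0, 78.0, 95.0], index=idx)

highs = yearly_high_dates(fake_open)
print(highs)            # one date per year
print(highs.dt.month)   # the month in which each yearly high occurred
```

If the months printed for the real data are mostly 11, the pattern seen in the figure holds; otherwise the visual impression may be misleading.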
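IV. Summary
Pandas is a data analysis library built on NumPy that works together with Matplotlib for plotting. It is very useful for financial data analysis and quantitative trading, and its visualizations can help us analyze stock market trends to some extent.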