Python stock data analysis

I. A first look at Pandas

Pandas is a very useful library built on NumPy. It provides two basic data structures, Series (one-dimensional) and DataFrame (two-dimensional), which make data manipulation easier. Because Pandas is still a Python library, ordinary Python data types can be used alongside these structures, and you can also define your own types with classes.
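To make the one-dimensional versus two-dimensional distinction concrete, here is a minimal sketch using small made-up values (they are not from the original article). It checks ndim and shape, and shows that a single DataFrame column is itself a Series:

```python
import pandas as pd

# A Series is one-dimensional: a single column of values plus an index.
s = pd.Series([10, 20, 30])

# A DataFrame is two-dimensional: rows and columns sharing one row index.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# Each column of a DataFrame is a Series and can have its own dtype.
print(type(df["x"]).__name__)  # Series
print(df.dtypes)
```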

Pandas plays an important role in financial data analysis, for example in quantitative trading. It builds on a number of other libraries and standard data models and provides the tools needed to manipulate large data sets efficiently, which makes it well suited to processing very large volumes of data.

II. Basic operations in Pandas

1. Creating a Series

There are three main ways to create a Series:

1) Create a Series from a one-dimensional array
import numpy as np
import pandas as pd

# Create a one-dimensional array
a = np.arange(10)
print(a)
s = pd.Series(a)
print(s)

The output is as follows:


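The Series above uses the default integer labels 0 to 9. As a small sketch (the shorter array and the letter labels are illustrative choices, not from the original), the index parameter of pd.Series lets you supply your own labels, as long as there is one label per value:

```python
import numpy as np
import pandas as pd

a = np.arange(5)

# Without an index argument, pandas labels the values 0, 1, 2, ...
s_default = pd.Series(a)
print(s_default.index.tolist())  # [0, 1, 2, 3, 4]

# A custom index must have the same length as the array.
s_labeled = pd.Series(a, index=["a", "b", "c", "d", "e"])
print(s_labeled["c"])  # 2
```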

2) Create a Series from a dictionary
import numpy as np
import pandas as pd

# Create a dictionary
d = {'a':1,'b':2,'c':3,'d':4,'e':5}
print(d)

s = pd.Series(d)
print(s)

The output is as follows:


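When a Series is built from a dictionary, the keys become the index labels. The sketch below reuses the dictionary d from above; the extra label 'z' is a hypothetical example of a key that does not exist, which pandas fills with NaN (turning the integer values into floats):

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# Dictionary keys become the index labels.
s = pd.Series(d)
print(s.index.tolist())  # ['a', 'b', 'c', 'd', 'e']

# index= picks and orders keys; a label with no matching key becomes NaN.
s2 = pd.Series(d, index=['e', 'a', 'z'])
print(s2)
```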

3) Create a Series from a row or column of a DataFrame

See s = df3['one'] in the third method of creating a DataFrame below.
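Both a column and a row of a DataFrame come back as a Series; only the labels of the resulting index differ. A minimal sketch on a small hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])

# One column -> a Series indexed by the row labels.
col = df['one']
# One row via .loc -> a Series indexed by the column names.
row = df.loc['b']

print(type(col).__name__, col.index.tolist())  # Series ['a', 'b', 'c']
print(type(row).__name__, row.index.tolist())  # Series ['one', 'two']
```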

2. Creating a DataFrame

There are three main ways to create a DataFrame:

1) Create a DataFrame from a 2D array
import numpy as np
import pandas as pd

# Create a two-dimensional (3 x 4) array
a = np.arange(12).reshape(3, 4)
print(a)

df1 = pd.DataFrame(a)
print(df1)

The output is as follows:


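pd.DataFrame labels the rows and columns of a 2D array 0, 1, 2, ... by default. As a sketch (the labels r1 to r3 and A to D are illustrative), the index and columns parameters name them, which makes label-based lookups with .loc possible:

```python
import numpy as np
import pandas as pd

a = np.arange(12).reshape(3, 4)

# index= names the rows, columns= names the columns.
df = pd.DataFrame(a, index=['r1', 'r2', 'r3'], columns=['A', 'B', 'C', 'D'])
print(df)

# Look up a single cell by row and column label.
print(df.loc['r2', 'C'])  # 6
```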

2) Create a DataFrame from dictionaries

The following creates two data frames from dictionaries: one from a dictionary of lists and the other from a nested dictionary (a dictionary of dictionaries).

import numpy as np
import pandas as pd

d1 = {'a':[1,2,3,4],'b':[5,6,7,8],'c':[9,10,11,12],'d':[13,14,15,16]}
print(d1)

df1 = pd.DataFrame(d1)
print(df1)

d2 = {'one':{'a':1,'b':2,'c':3,'d':4},'two':{'a':5,'b':6,'c':7,'d':8},'three':{'a':9,'b':10,'c':11,'d':12}}
print(d2)

df2 = pd.DataFrame(d2)
print(df2)

The output is as follows:


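In the nested-dictionary case, the outer keys ('one', 'two', 'three') become columns and the inner keys ('a' to 'd') become the row index. If you want the opposite orientation, pd.DataFrame.from_dict accepts orient='index'. A minimal sketch using the same d2:

```python
import pandas as pd

d2 = {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
      'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8},
      'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}}

# Default constructor: outer keys -> columns, inner keys -> row index.
df_cols = pd.DataFrame(d2)
print(df_cols.loc['c', 'two'])  # 7

# from_dict with orient='index' flips this: outer keys -> rows.
df_rows = pd.DataFrame.from_dict(d2, orient='index')
print(df_rows.loc['two', 'c'])  # 7
```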

3) Create a DataFrame from an existing DataFrame

We take df2 from 2) and select two of its columns to create df3. Selecting a single column of df3 then gives a Series:

df2 = pd.DataFrame(d2)
print(df2)

df3 = df2[['one','two']]
print(df3)

s = df3['one']
print(s)

The output is as follows:


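The double brackets in df2[['one','two']] matter: a list of column names returns a DataFrame, while a single name returns a Series. A sketch rebuilding df2 from 2) that also shows .loc selecting rows and columns at once; the .copy() call is an added precaution so that editing the new DataFrame cannot affect df2:

```python
import pandas as pd

df2 = pd.DataFrame({'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
                    'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8},
                    'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}})

# A list of column names (double brackets) returns a DataFrame ...
df3 = df2[['one', 'two']]
# ... while a single column name returns a Series.
s = df2['one']

# Select rows 'a', 'b' and columns 'one', 'three' together, as an independent copy.
df4 = df2.loc[['a', 'b'], ['one', 'three']].copy()
df4['one'] = 0
print(df2.loc['a', 'one'])  # still 1
```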

III. Processing stock data

Next, we walk through an example of using Pandas to process stock data.
We use pandas_datareader to fetch Alibaba (BABA) stock data.

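1) Import the following libraries:
import pandas as pd
import pandas_datareader.data as web
# For plotting
import matplotlib.pyplot as plt
# For working with dates
import datetime
2) Set the stock ticker and the date range
name = "BABA"
start = datetime.datetime(2015, 1, 1)
end = datetime.date.today()
3) Get the stock data
# The original used the "google" data source. pandas_datareader has since
# deprecated it, so newer versions may reject this call and another
# supported data source may be needed.
prices = web.DataReader(name, "google", start, end)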
4) View the type of prices
print(type(prices))

It prints as follows:

<class 'pandas.core.frame.DataFrame'>

You can see that the returned data type is the DataFrame type.
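If you prefer to work from a local file, for example a CSV export of the same prices, pd.read_csv can produce the same kind of DataFrame. A sketch, assuming a file with Date, Open, High, Low, Close and Volume columns; to keep it self-contained the file is inlined with io.StringIO, and the three rows are the ones printed by prices.tail(3) later in this section:

```python
import io
import pandas as pd

# Hypothetical CSV contents; the rows come from prices.tail(3) in this article.
csv_text = """Date,Open,High,Low,Close,Volume
2018-02-21,189.37,193.17,188.46,188.82,22071585
2018-02-22,190.20,190.74,187.77,188.75,12282843
2018-02-23,190.18,193.40,189.95,193.29,16937275
"""

# index_col="Date" plus parse_dates=True turns the Date column into a DatetimeIndex.
prices = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
print(type(prices))
print(prices.index)
```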

5) View summary statistics for the stock
print(prices.describe())

It prints as follows:

             Open        High         Low       Close        Volume
count  791.000000  791.000000  792.000000  792.000000  7.920000e+02
mean   106.632099  107.793186  105.355164  106.614520  1.610571e+07
std     38.191772   38.539981   37.719848   38.156416  9.941683e+06
min     57.300000   58.650000   57.200000   57.390000  2.457439e+06
25%     79.855000   80.945000   79.157500   79.935000  1.003487e+07
50%     91.000000   91.740000   89.925000   90.705000  1.350020e+07
75%    119.315000  120.400000  118.462500  120.205000  1.879724e+07
max    204.830000  206.200000  202.800000  205.220000  9.704593e+07
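describe() reports, per column, the number of non-missing values (count), the mean, the standard deviation (std), the minimum, the 25%, 50% (median) and 75% percentiles, and the maximum. Note that Open and High have a count of 791 while Low, Close and Volume have 792; because count excludes NaN, this suggests one missing value in each of the first two columns. A small sketch on made-up data (not the BABA prices) showing the effect and how to count missing values directly:

```python
import numpy as np
import pandas as pd

# Made-up mini dataset with one missing Open value.
df = pd.DataFrame({
    "Open": [10.0, np.nan, 12.0],
    "Close": [10.5, 11.5, 12.5],
})

stats = df.describe()
# count skips NaN, so it differs between columns when data is missing.
print(stats.loc["count"])

# Number of missing values per column.
print(df.isna().sum())
```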

Then print the three most recent rows:

print(prices.tail(3))

              Open    High     Low   Close    Volume
Date
2018-02-21  189.37  193.17  188.46  188.82  22071585
2018-02-22  190.20  190.74  187.77  188.75  12282843
2018-02-23  190.18  193.40  189.95  193.29  16937275
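tail(n) and head(n) select rows by position. Because the index of prices is a DatetimeIndex, rows can also be selected by date. A sketch that rebuilds the three rows printed above by hand, so it runs without downloading anything:

```python
import pandas as pd

# The three rows printed by prices.tail(3) above, rebuilt by hand.
prices = pd.DataFrame(
    {"Open": [189.37, 190.20, 190.18],
     "High": [193.17, 190.74, 193.40],
     "Low": [188.46, 187.77, 189.95],
     "Close": [188.82, 188.75, 193.29],
     "Volume": [22071585, 12282843, 16937275]},
    index=pd.to_datetime(["2018-02-21", "2018-02-22", "2018-02-23"]),
)
prices.index.name = "Date"

# Positional: the last two rows.
print(prices.tail(2))
# Label-based: a date-string slice on the DatetimeIndex (both ends inclusive).
print(prices.loc["2018-02-22":"2018-02-23"])
```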
6) Plotting

We plot Alibaba's opening price over time.

# Dates on the x-axis, opening prices on the y-axis
plt.plot(prices.index, prices["Open"])
plt.show()



The figure shows that Alibaba's stock climbed steadily over this period. Looking closely, there also appears to be a local high around November of each year.
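The November observation can be checked numerically rather than by eye: group the closing prices by year and use idxmax to find the date of each year's high. The sketch below uses a synthetic series (a steady climb with an artificial November bump), so it only demonstrates the method and says nothing about the real BABA data; with the downloaded data you would apply the last two steps to prices["Close"] instead.

```python
import numpy as np
import pandas as pd

# Synthetic business-day closes (NOT real BABA data): a steady climb
# plus an artificial +15 bump in every November.
dates = pd.bdate_range("2015-01-01", "2017-12-31")
trend = np.linspace(80.0, 200.0, len(dates))
bump = np.where(dates.month == 11, 15.0, 0.0)
close = pd.Series(trend + bump, index=dates)

# For each calendar year, find the date of the highest close ...
peak_dates = close.groupby(close.index.year).idxmax()
# ... and report which month it fell in.
peak_months = peak_dates.map(lambda ts: ts.month)
print(peak_months)
```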

IV. Summary

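Pandas is a data analysis library built on NumPy, with plotting based on Matplotlib, and it is well suited to financial data. It is very useful for quantitative trading, and its visualizations can, to some extent, help us analyze the direction of the stock market.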
