Python Stock Analysis Series: Basic Stock Data Operations (Part 2)
This video series has been reposted to bilibili.
Welcome to Part 4 of the Python for Finance tutorial series. In this tutorial, we'll create a candlestick / OHLC chart based on the Adj Close column, which will let me introduce resampling and a few other data visualization concepts.
An OHLC chart, also called a candlestick chart, condenses the open, high, low, and close prices of a dataset into one nice graphical format. It also makes pretty colors, and remember what I told you about good-looking charts?
In the previous tutorials, we got as far as this:
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import pandas_datareader.data as web

style.use('ggplot')

df = pd.read_csv('tsla.csv', parse_dates=True, index_col=0)
Unfortunately, making candlestick charts directly from pandas is not built in, even though creating the OHLC data is. I am sure this chart type will be made available some day, but it is not there now. That's okay, we can make it ourselves! First, we need two new imports:
# On matplotlib 2.2+, matplotlib.finance is gone; use: from mpl_finance import candlestick_ohlc
from matplotlib.finance import candlestick_ohlc
import matplotlib.dates as mdates
The first import is the OHLC chart type from matplotlib, and the second is the special mdates type, which is mostly just a pain, but it is the date type that matplotlib plots with. pandas handles this for you automatically, but, like I said, we don't have that luxury with candlesticks.
First, we need proper OHLC data. Our current data does have OHLC values and, unless I'm mistaken, Tesla had never had a split at the time of writing, but you won't always be this lucky. So we will create our own OHLC data, which also lets us show another data transformation that comes from pandas:
df_ohlc = df['Adj Close'].resample('10D').ohlc()
What we have done here is create a new DataFrame based on the df['Adj Close'] column, resampled with a 10-day window, where the resampling is an OHLC (open, high, low, close). We could also use .mean() or .sum() to get the 10-day average or the 10-day sum. Keep in mind that this 10-day average would be the average of each fixed 10-day block, not a rolling average. Since our data is daily, resampling it to 10-day data shrinks the size of the data significantly. This is also how you can normalize multiple datasets. Sometimes you have data that is recorded once a month on the first of the month, other data that is recorded at the end of each month, and finally some data that is recorded weekly. You can resample all of it to the end of the month, every month, and effectively normalize it! This is one of the more advanced pandas features, and you can learn more about it from the pandas series if you like.
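To make the difference between resampling and a rolling window concrete, here is a minimal sketch on made-up data. The series values and names such as start_logged and end_logged are illustrative assumptions and do not come from tsla.csv. It resamples 30 daily values into 10-day bins, compares the binned mean with a rolling mean, and then lines up two monthly series on month-end dates. pd.offsets.MonthEnd() is passed as the resampling rule here instead of a string alias, since the month-end alias differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Made-up daily "prices": the values 1..30 on 30 consecutive days.
days = pd.date_range("2020-01-01", periods=30, freq="D")
prices = pd.Series(np.arange(1.0, 31.0), index=days)

# Resampling cuts the data into fixed, non-overlapping 10-day bins,
# so 30 daily rows shrink to 3 rows.
ohlc = prices.resample("10D").ohlc()
bin_mean = prices.resample("10D").mean()

# A rolling mean slides an overlapping window and keeps one row per day.
rolling_mean = prices.rolling(window=10).mean()

print(ohlc)
print(len(bin_mean), len(rolling_mean))

# Two made-up monthly series recorded on different days of the month.
start_logged = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)
end_logged = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
)

# Resampling both to month-end gives them the same index, so they line up.
month_end = pd.offsets.MonthEnd()
aligned = pd.DataFrame({
    "start_logged": start_logged.resample(month_end).last(),
    "end_logged": end_logged.resample(month_end).last(),
})
print(aligned)
```

The binned result has one row per 10-day block, while the rolling result still has one row per day, which is why only the former reduces the size of the data.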
We want to graph both the candlestick data and the volume data. We don't have to resample the volume data, but we should, because otherwise it would be too granular compared to our 10D pricing data.
df_volume = df['Volume'].resample('10D').sum()
We use sum here because we really want to know the total volume traded over those 10 days, but you could also use the mean. Now if we do this:
print(df_ohlc.head())
We get:
                 open       high        low      close
Date
2010-06-29  23.889999  23.889999  15.800000  17.459999
2010-07-09  17.400000  20.639999  17.049999  20.639999
2010-07-19  21.910000  21.910000  20.219999  20.719999
2010-07-29  20.350000  21.950001  19.590000  19.590000
2010-08-08  19.600000  19.600000  17.600000  19.150000
This is expected. However, we now want to move this information to matplotlib and convert the dates to the mdates version. Since we are just going to graph the columns in matplotlib, we actually don't want the date to be the index anymore, so we can do this:
df_ohlc = df_ohlc.reset_index()
The date is now just an ordinary column. Next, we want to convert it:
df_ohlc['Date'] = df_ohlc['Date'].map(mdates.date2num)
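As a small illustration of what this conversion does, the sketch below runs date2num on three made-up dates spaced 10 days apart and then reverses it with num2date. The dates and variable names are assumptions for the example only. The absolute numbers are not shown because they depend on the matplotlib version (the default epoch changed in matplotlib 3.3); only the spacing between them is relied on.

```python
import datetime as dt
import matplotlib.dates as mdates

# Three made-up dates, 10 days apart, similar to a 10D resampled index.
dates = [dt.datetime(2010, 6, 29), dt.datetime(2010, 7, 9), dt.datetime(2010, 7, 19)]

# date2num turns each date into a float that counts days from matplotlib's epoch.
nums = mdates.date2num(dates)

# The 10-day spacing survives the conversion as a difference of 10.0.
gaps = [float(b - a) for a, b in zip(nums[:-1], nums[1:])]
print(gaps)

# num2date reverses the conversion; it returns timezone-aware UTC datetimes,
# so the timezone is dropped before comparing with the naive originals.
back = [d.replace(tzinfo=None) for d in mdates.num2date(nums)]
print(back == dates)
```

Because the result is a plain float measured in days, anything plotted against it (including widths) is also expressed in days.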
Now we set up the figure:
fig = plt.figure()
ax1 = plt.subplot2grid((6,1), (0,0), rowspan=5, colspan=1)
ax2 = plt.subplot2grid((6,1), (5,0), rowspan=1, colspan=1, sharex=ax1)
ax1.xaxis_date()
You have seen everything here before except ax1.xaxis_date(). This call tells the axis to display the raw mdate numbers as dates for us.
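The layout and the effect of xaxis_date() can be tried without any price data. The following is a rough sketch with made-up x and y values; the Agg backend is an assumption so that it runs without a display, and you would drop that line for interactive use.

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up x values, already converted to matplotlib date numbers.
start = dt.datetime(2010, 6, 29)
x = mdates.date2num([start + dt.timedelta(days=10 * i) for i in range(5)])
y = [17.5, 20.6, 20.7, 19.6, 19.2]

fig = plt.figure()
# A 6x1 grid: the top axes spans 5 rows, the bottom axes takes the last row.
ax1 = plt.subplot2grid((6, 1), (0, 0), rowspan=5, colspan=1)
ax2 = plt.subplot2grid((6, 1), (5, 0), rowspan=1, colspan=1, sharex=ax1)

ax1.plot(x, y)
# Tell the x axis to treat its float values as dates when labelling ticks.
ax1.xaxis_date()

# Render once so the tick labels are computed, then inspect them.
fig.canvas.draw()
print([tick.get_text() for tick in ax1.get_xticklabels()])
```

Commenting out the xaxis_date() call and re-running should show plain floats in the printed labels instead of dates.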
Now we can graph the candlestick chart:
candlestick_ohlc(ax1, df_ohlc.values, width=2, colorup='g')
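If candlestick_ohlc is not importable in your environment, the general idea of a candlestick can still be sketched with plain matplotlib primitives. The helper below, draw_candles, is hypothetical and is not how candlestick_ohlc is implemented; it only assumes the same row order (date number, open, high, low, close) and uses made-up values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt

# Made-up rows in the order (date number, open, high, low, close).
rows = [
    (1.0, 10.0, 12.0, 9.0, 11.0),  # close above open: an "up" candle
    (2.0, 11.0, 11.5, 8.0, 9.0),   # close below open: a "down" candle
    (3.0, 9.0, 10.5, 8.5, 10.0),   # another "up" candle
]

def draw_candles(ax, rows, width=0.6, colorup="g", colordown="r"):
    """Hypothetical helper: draw one wick and one body per OHLC row."""
    for x, open_, high, low, close in rows:
        color = colorup if close >= open_ else colordown
        # Wick: a thin vertical line from the low to the high.
        ax.vlines(x, low, high, color=color, linewidth=1)
        # Body: a rectangle spanning the open and close prices.
        ax.bar(x, abs(close - open_), width=width,
               bottom=min(open_, close), color=color)

fig, ax = plt.subplots()
draw_candles(ax, rows)
```

On a date axis the x values are day counts, so a width is measured in days; with 10-day bins, a width of 2 leaves visible gaps between candles, while a width close to 10 makes them touch.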
Then do the volume:
ax2.fill_between(df_volume.index.map(mdates.date2num), df_volume.values, 0)
The fill_between function graphs x and y, and then takes the value to fill to (or between). In our case, we choose 0.
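A quick way to see what the third argument does is to fill the same curve down to two different baselines. This is a minimal sketch with made-up volume numbers; the names and the Agg backend line are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
volume = [5, 9, 4, 7, 6]  # made-up totals

fig, ax = plt.subplots()
# Shade the region between the volume curve and the constant baseline 0.
area_to_zero = ax.fill_between(x, volume, 0)
# With a different baseline, only the part above that level is shaded.
area_to_four = ax.fill_between(x, volume, 4, alpha=0.3)
```

For volume, a baseline of 0 is the natural choice, since the shaded height then reads directly as the traded amount.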
plt.show()
Complete code:
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
# On matplotlib 2.2+, matplotlib.finance is gone; use: from mpl_finance import candlestick_ohlc
from matplotlib.finance import candlestick_ohlc
import matplotlib.dates as mdates
import pandas as pd
import pandas_datareader.data as web

style.use('ggplot')

df = pd.read_csv('tsla.csv', parse_dates=True, index_col=0)

# Resample daily data into 10-day OHLC prices and 10-day total volume
df_ohlc = df['Adj Close'].resample('10D').ohlc()
df_volume = df['Volume'].resample('10D').sum()

# Move the dates out of the index and convert them to matplotlib date numbers
df_ohlc.reset_index(inplace=True)
df_ohlc['Date'] = df_ohlc['Date'].map(mdates.date2num)

ax1 = plt.subplot2grid((6,1), (0,0), rowspan=5, colspan=1)
ax2 = plt.subplot2grid((6,1), (5,0), rowspan=1, colspan=1, sharex=ax1)
ax1.xaxis_date()

candlestick_ohlc(ax1, df_ohlc.values, width=5, colorup='g')
ax2.fill_between(df_volume.index.map(mdates.date2num), df_volume.values, 0)
plt.show()