Hejing Community Data Analysis Weekly Challenge [Issue 95: Netflix Stock Price Prediction and Analysis]
I. Introduction
This week's challenge: Netflix stock price prediction and analysis.
You can view this project on my Hejing Community homepage.
1. Background description
This dataset contains Netflix stock price data from 2002 to 2022; the data source is Yahoo Finance.
Netflix (NFLX) is a highly successful video streaming company that operates one of the world's largest subscription streaming platforms, currently with more than 230 million paid members.
Its business model is relatively simple: it earns revenue mainly by charging customers a monthly or annual membership fee. Subscribers get access to TV series and movies of many genres and languages on the Netflix platform.
Netflix currently provides streaming services to customers in more than 190 countries, who can watch its content anytime, anywhere, on any device.
2. Data Description
Field | Description |
---|---|
Date | Trading date. |
Open | Opening price: the price at which the stock begins trading when the market opens. |
High | Highest price at which the stock traded within the period. |
Low | Lowest price at which the stock traded within the period. |
Close | Closing price: the last price at which the stock traded during normal trading hours. |
Adj Close | Adjusted closing price: starts from the closing price and corrects it for factors such as dividends, stock splits and new stock issuances, so it reflects the stock's value more accurately than the raw close. |
Volume | Trading volume: the number of shares traded (or, for futures and options, the number of contracts traded). |
3. Dataset preview
The dataset is provided by this challenge:
II. Data reading and preprocessing
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# Load the dataset
df = pd.read_csv('/content/Netflix Stock Price Data set 2002-2022.csv')
# Convert the Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])
df
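Before modeling, it can help to confirm that the rows are in chronological order and that nothing is missing, since the later train/test splits rely on row order. The following is a minimal sketch on a tiny made-up frame; `check_price_frame` is a hypothetical helper, not part of the original project, and the sample values are for illustration only.

```python
import pandas as pd

# Tiny stand-in for the Netflix CSV; the values are made up for illustration.
sample = pd.DataFrame({
    "Date": ["2002-05-24", "2002-05-23", "2002-05-28"],
    "Open": [1.21, 1.16, 1.20],
    "Adj Close": [1.21, 1.20, 1.16],
    "Volume": [11104800, 104790000, 6609400],
})
sample["Date"] = pd.to_datetime(sample["Date"])


def check_price_frame(frame):
    """Return a chronologically sorted copy plus a summary of basic checks."""
    ordered = frame.sort_values("Date").reset_index(drop=True)
    summary = {
        "rows": len(ordered),
        "missing_values": int(ordered.isna().sum().sum()),
        "duplicate_dates": int(ordered["Date"].duplicated().sum()),
        "first_date": ordered["Date"].min(),
        "last_date": ordered["Date"].max(),
    }
    return ordered, summary


ordered, summary = check_price_frame(sample)
print(summary)
```

The same helper could be called on `df` after loading; if `missing_values` or `duplicate_dates` is non-zero, the data would need cleaning before the steps that follow.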
III. Visualization of historical stock price data
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Adj Close'])
plt.xlabel('Date')
plt.ylabel('Adjusted Close Price')
plt.title('Netflix Stock Price')
plt.grid(True)
plt.show()
IV. Use the LinearRegression model in sklearn for stock price prediction analysis
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
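# Build the feature matrix and the target variable
X = df[['Open', 'High', 'Low', 'Volume']]
y = df['Adj Close']
# Split chronologically (no shuffling) so the test set is the most recent 20% of rows,
# which is what the plot below assumes when it uses the last len(y_test) dates
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
# Train the linear regression model and predict on the test set
model = LinearRegression()
model.fit(X_train, y_train)
predicted_prices = model.predict(X_test)
# Visualize the predictions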
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Adj Close'], label='Actual')
plt.plot(df['Date'].iloc[-len(y_test):], predicted_prices, label='Predicted')
plt.xlabel('Date')
plt.ylabel('Adjusted Close Price')
plt.title('Netflix Stock Price Prediction')
plt.legend()
plt.grid(True)
plt.show()
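A plot alone makes it hard to judge how close the predictions are, so it may be useful to report numeric metrics as well. The sketch below is an assumption-laden illustration, not the author's evaluation: it uses synthetic random-walk data in place of the Netflix prices, a chronological split via `shuffle=False`, and `mean_absolute_error` and `r2_score` from `sklearn.metrics`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 200
# Synthetic stand-in: a random-walk "close" with noisy open/high/low around it
close = 100 + np.cumsum(rng.normal(0, 1, n))
open_ = close + rng.normal(0, 0.5, n)
high = np.maximum(open_, close) + rng.uniform(0, 1, n)
low = np.minimum(open_, close) - rng.uniform(0, 1, n)
X_demo = np.column_stack([open_, high, low])
y_demo = close

# shuffle=False keeps row order, so the test set is the final 20% of the series
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)
demo_model = LinearRegression().fit(X_tr, y_tr)
pred = demo_model.predict(X_te)

mae = mean_absolute_error(y_te, pred)
r2 = r2_score(y_te, pred)
print(f"MAE: {mae:.3f}, R^2: {r2:.3f}")
```

Note that same-day Open, High and Low are very close to the same-day closing price, so high scores from this kind of model say little about forecasting future prices.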
V. Use the LSTM model in PyTorch for stock price prediction analysis
1. Set random seed and check GPU availability
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Set the random seed
torch.manual_seed(42)
# Check GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
device(type='cuda')
2. Data preprocessing
# Use the adjusted closing price as the target variable
data = df['Adj Close'].values.reshape(-1, 1)
# Normalize the data to the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(data)
# Split into training and test sets (first 80% / last 20%)
train_size = int(len(scaled_data) * 0.8)
train_data = scaled_data[:train_size, :]
test_data = scaled_data[train_size:, :]
# Build input windows and targets for the training and test sets
def create_dataset(data, lookback):
    X, y = [], []
    for i in range(len(data) - lookback):
        X.append(data[i:i+lookback, 0])
        y.append(data[i+lookback, 0])
    return np.array(X), np.array(y)
lookback = 60  # use the previous 60 time steps as input features
X_train, y_train = create_dataset(train_data, lookback)
X_test, y_test = create_dataset(test_data, lookback)
# Convert the data to PyTorch tensors and move them to the selected device
X_train = torch.from_numpy(X_train).float().to(device)
y_train = torch.from_numpy(y_train).float().to(device)
X_test = torch.from_numpy(X_test).float().to(device)
y_test = torch.from_numpy(y_test).float().to(device)
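To see what the sliding-window step produces, here is a small sketch that applies the same windowing idea as `create_dataset` to a toy series with `lookback=3`. The name `make_windows` and the toy array are assumptions made for illustration.

```python
import numpy as np


def make_windows(series, lookback):
    """Same windowing idea as create_dataset, on a 2-D (n, 1) array."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback, 0])   # the previous `lookback` values
        y.append(series[i + lookback, 0])     # the value that follows them
    return np.array(X), np.array(y)


toy = np.arange(10, dtype=float).reshape(-1, 1)  # 0.0, 1.0, ..., 9.0
X_toy, y_toy = make_windows(toy, lookback=3)

print(X_toy.shape, y_toy.shape)  # (7, 3) (7,)
print(X_toy[0], y_toy[0])        # [0. 1. 2.] 3.0
print(X_toy[-1], y_toy[-1])      # [6. 7. 8.] 9.0
```

A series of length n yields n - lookback samples, which is why the first `lookback` rows of the test split have no prediction and the final plot starts at `train_size+lookback`.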
3. Build a simple LSTM model
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    def forward(self, x):
        out, _ = self.lstm(x)
        # Use only the output of the last time step for the prediction
        out = self.fc(out[:, -1, :])
        return out
input_size = 1
hidden_size = 64
output_size = 1
# Initialize the model and move it to the selected device
model = LSTMModel(input_size, hidden_size, output_size).to(device)
model
LSTMModel(
  (lstm): LSTM(1, 64, batch_first=True)
  (fc): Linear(in_features=64, out_features=1, bias=True)
)
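The tensor shapes flowing through this model can be checked with a random batch. The sketch below rebuilds the same layout under a different name (`TinyLSTM`, an illustrative stand-in) so that it runs on its own on the CPU; the batch size of 32 and window length of 60 mirror the values used in this article.

```python
import torch
import torch.nn as nn


class TinyLSTM(nn.Module):
    """Same layout as LSTMModel: one LSTM layer followed by a linear head."""

    def __init__(self, input_size=1, hidden_size=64, output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])  # keep only the last time step


torch.manual_seed(0)
demo = TinyLSTM()
batch = torch.randn(32, 60, 1)  # (batch, lookback, features)

with torch.no_grad():
    all_steps, _ = demo.lstm(batch)
    prediction = demo(batch)

print(all_steps.shape)   # torch.Size([32, 60, 64])
print(prediction.shape)  # torch.Size([32, 1])

n_params = sum(p.numel() for p in demo.parameters())
print(n_params)          # 17217
```

The trailing dimension of size 1 in the prediction is why the training and test code squeeze the output before comparing it with the 1-D targets.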
4. Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
5. Train the model
num_epochs = 100
batch_size = 32
train_loss_history = []
for epoch in range(num_epochs):
    for i in range(0, len(X_train), batch_size):
        inputs = X_train[i:i+batch_size]
        targets = y_train[i:i+batch_size]
        # Forward pass: add a feature dimension -> (batch, lookback, 1)
        outputs = model(inputs.unsqueeze(2))
        loss = criterion(outputs.squeeze(-1), targets)
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Record the loss of the last mini-batch in this epoch
    train_loss_history.append(loss.item())
    if epoch % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.6f}')
Epoch [1/100], Loss: 0.007953
Epoch [11/100], Loss: 0.000076
Epoch [21/100], Loss: 0.000081
Epoch [31/100], Loss: 0.000081
Epoch [41/100], Loss: 0.000136
Epoch [51/100], Loss: 0.000155
Epoch [61/100], Loss: 0.000073
Epoch [71/100], Loss: 0.000125
Epoch [81/100], Loss: 0.000320
Epoch [91/100], Loss: 0.000099
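The logged values fluctuate from one checkpoint to the next, which is what one would expect if the loop records only the loss of the last mini-batch in each epoch, as the code above appears to do. One common alternative is to average over all mini-batches, weighting by batch size because the final batch is usually smaller. The sketch below shows that arithmetic only; `epoch_mean_loss` and the numbers are hypothetical.

```python
def epoch_mean_loss(batch_losses, batch_sizes):
    """Size-weighted mean of per-batch mean losses over one epoch."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical per-batch MSE values for one epoch (the last batch is smaller)
losses = [0.0004, 0.0002, 0.0001]
sizes = [32, 32, 16]

mean_loss = epoch_mean_loss(losses, sizes)
print(f"{mean_loss:.6f}")  # 0.000260
```

In the training loop, this would mean collecting `loss.item()` and `len(inputs)` for every mini-batch and appending the weighted mean to `train_loss_history` once per epoch.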
6. Test the model
model.eval()
with torch.no_grad():
    test_inputs = X_test.unsqueeze(2)
    test_outputs = model(test_inputs)
    test_loss = criterion(test_outputs.squeeze(-1), y_test)
    # Map the predictions back to the original price scale
    predicted_prices = scaler.inverse_transform(test_outputs.cpu().numpy())
print(f'Test Loss: {test_loss.item():.6f}')
Test Loss: 0.000395
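The test loss is measured on the 0 to 1 scale produced by `MinMaxScaler`, so it is not directly readable as a price error. Because min-max scaling divides by the price range, an RMSE in price units can be recovered as sqrt(MSE) times (max - min). The sketch below shows the conversion; the helper name and the price range of 0 to 100 are hypothetical, and with the fitted scaler the real bounds would come from `scaler.data_min_` and `scaler.data_max_`.

```python
import math


def scaled_mse_to_price_rmse(scaled_mse, price_min, price_max):
    """Convert an MSE on min-max scaled prices to an RMSE in price units."""
    return math.sqrt(scaled_mse) * (price_max - price_min)


# 0.000395 is the test loss reported above; the price range is hypothetical.
rmse = scaled_mse_to_price_rmse(0.000395, price_min=0.0, price_max=100.0)
print(f"{rmse:.2f}")  # 1.99
```

Put differently, a scaled MSE of 0.000395 corresponds to a typical error of roughly 2% of whatever price range the scaler was fitted on.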
7. Visualize training loss
plt.figure(figsize=(12, 6))
plt.plot(train_loss_history, label='Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss History')
plt.legend()
plt.show()
8. Visualize prediction results
plt.figure(figsize=(12, 6))
plt.plot(df['Date'][train_size+lookback:], scaler.inverse_transform(test_data[lookback:]), label='Actual')
plt.plot(df['Date'][train_size+lookback:], predicted_prices, label='Predicted')
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.title('Netflix Stock Price Prediction')
plt.legend()
plt.show()