Hejing Community Data Analysis Weekly Challenge [Ninety-fifth Issue: Netflix Stock Price Prediction and Analysis]

I. Introduction

This week's challenge is Netflix stock price prediction and analysis.

You can view this project on my Hejing Community homepage.

1. Background description

This dataset contains Netflix stock price data from 2002 to 2022, sourced from Yahoo Finance.

Netflix (NFLX) is a highly successful video streaming company that operates one of the world's largest streaming subscription platforms, currently with more than 230 million paid members.
Its business model is relatively simple: it earns most of its revenue from monthly or annual membership fees. Subscribers get access to TV series and movies of many genres and languages on the Netflix platform.
Netflix currently serves customers in more than 190 countries, who can watch its content anytime, anywhere, and on any device.

2. Data Description

Field       Description
Date        Trading date.
Open        Opening price: the price at which the security first trades when the market opens.
High        Highest trading price of the stock during the period.
Low         Lowest trading price of the stock during the period.
Close       Closing price: the last trading price of the stock during normal trading hours.
Adj Close   Adjusted closing price: the closing price corrected for factors such as dividends, stock splits, and new share issuances. It takes the closing price as a starting point and gives a more accurate reflection of the stock's value.
Volume      Trading volume: the number of shares traded for a stock, or the number of contracts traded for futures or options.
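
To make the difference between Close and Adj Close concrete, here is a minimal sketch of back-adjusting closing prices for a stock split. The prices, the 2-for-1 split date, and the helper name `back_adjust_for_split` are all hypothetical, and the sketch ignores dividends, which a real adjusted close also accounts for.

```python
import pandas as pd


def back_adjust_for_split(close: pd.Series, split_date: str, ratio: float) -> pd.Series:
    """Divide every close strictly before split_date by the split ratio (hypothetical helper)."""
    adjusted = close.astype(float).copy()
    before_split = adjusted.index < pd.Timestamp(split_date)
    adjusted[before_split] = adjusted[before_split] / ratio
    return adjusted


# Hypothetical closing prices around a hypothetical 2-for-1 split on 2020-01-03.
close = pd.Series(
    [100.0, 102.0, 51.5, 52.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"]),
)
adj_close = back_adjust_for_split(close, "2020-01-03", ratio=2.0)
print(adj_close)  # pre-split prices become 50.0 and 51.0; later prices are unchanged
```

After the adjustment the series no longer shows an artificial 50% drop on the split date, which is why the rest of this analysis works with Adj Close rather than Close.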

3. Dataset preview

The dataset is provided as part of this challenge:

[Figure: preview of the dataset]

II. Data reading and preprocessing

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

# Load the dataset
df = pd.read_csv('/content/Netflix Stock Price Data set 2002-2022.csv')

# Convert the Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])
df

[Figure: the loaded DataFrame]
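
Before modeling, it can help to confirm that the rows are in chronological order and that nothing is missing, since the later train/test splits rely on row position. The sketch below runs these checks on a tiny hypothetical sample with the same column layout. The values are made up and are not taken from the Netflix file.

```python
import io

import pandas as pd

# Hypothetical three-row sample using the dataset's column names (values are made up).
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2020-01-03,10.0,10.5,9.8,10.2,10.2,1000\n"
    "2020-01-02,9.9,10.1,9.7,10.0,10.0,1200\n"
    "2020-01-06,10.2,10.6,10.1,10.4,10.4,900\n"
)
sample = pd.read_csv(sample_csv, parse_dates=["Date"])

# Sort chronologically and reset the index so positional slicing follows time order.
sample = sample.sort_values("Date").reset_index(drop=True)

missing_per_column = sample.isna().sum()
duplicate_dates = int(sample["Date"].duplicated().sum())

print(sample.dtypes)
print(missing_per_column)
print("duplicate dates:", duplicate_dates)
```

The same three checks can be applied to `df` directly after loading it.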

III. Visualization of historical stock price data

plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Adj Close'])
plt.xlabel('Date')
plt.ylabel('Adjusted Close Price')
plt.title('Netflix Stock Price')
plt.grid(True)
plt.show()

[Figure: Netflix adjusted close price over time]

IV. Use the LinearRegression model in sklearn for stock price prediction analysis

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Create the features and the target variable
X = df[['Open', 'High', 'Low', 'Volume']]
y = df['Adj Close']

# Split into training and test sets. shuffle=False keeps the rows in time order,
# so the test set is the last 20% of the dates and lines up with the plot below.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Train a linear regression model and predict on the test set
model = LinearRegression()
model.fit(X_train, y_train)
predicted_prices = model.predict(X_test)

# Visualize the predictions
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Adj Close'], label='Actual')
plt.plot(df['Date'].iloc[-len(y_test):], predicted_prices, label='Predicted')
plt.xlabel('Date')
plt.ylabel('Adjusted Close Price')
plt.title('Netflix Stock Price Prediction')
plt.legend()
plt.grid(True)
plt.show()

[Figure: actual vs. predicted adjusted close price, linear regression]
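
A plot alone makes it hard to judge how close the predictions are, so a numeric score is a useful companion. The following is a minimal sketch, assuming a chronological 80/20 split and using synthetic random-walk prices as a stand-in for the Netflix columns. All `*_demo` names are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(42)

# Synthetic stand-in for Open/High/Low and the closing price (not the Netflix data).
n = 500
close_demo = 100 + np.cumsum(rng.normal(0, 1, n))
open_demo = close_demo + rng.normal(0, 0.5, n)
high_demo = np.maximum(open_demo, close_demo) + rng.uniform(0, 1, n)
low_demo = np.minimum(open_demo, close_demo) - rng.uniform(0, 1, n)

X_demo = np.column_stack([open_demo, high_demo, low_demo])
y_demo = close_demo

# Chronological split: first 80% for training, last 20% for testing.
split = int(n * 0.8)
X_train_demo, X_test_demo = X_demo[:split], X_demo[split:]
y_train_demo, y_test_demo = y_demo[:split], y_demo[split:]

model_demo = LinearRegression().fit(X_train_demo, y_train_demo)
pred_demo = model_demo.predict(X_test_demo)

mae_demo = mean_absolute_error(y_test_demo, pred_demo)
r2_demo = r2_score(y_test_demo, pred_demo)
print(f"MAE: {mae_demo:.3f}, R^2: {r2_demo:.3f}")
```

On the real data, the same two metric calls can be applied to `y_test` and `predicted_prices`. Note that the features are same-day prices, so a good score here measures how well the model fits the day's price range rather than how well it forecasts a future day.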

V. Use the LSTM model in PyTorch for stock price prediction analysis

1. Set random seed and check GPU availability

import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Set the random seed
torch.manual_seed(42)

# Check GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
device(type='cuda')

2. Data preprocessing

# Use the adjusted close price as the target variable
data = df['Adj Close'].values.reshape(-1, 1)

# Normalize the data to the range [0, 1].
# Note: the scaler is fit on the full series here; fitting it on the training
# slice only would keep test-set min/max values out of the training data.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(data)

# Split into training and test sets (first 80% / last 20%, in time order)
train_size = int(len(scaled_data) * 0.8)
train_data = scaled_data[:train_size, :]
test_data = scaled_data[train_size:, :]

# Build the features and targets for the training and test sets
def create_dataset(data, lookback):
    X, y = [], []
    for i in range(len(data) - lookback):
        X.append(data[i:i+lookback, 0])
        y.append(data[i+lookback, 0])
    return np.array(X), np.array(y)

lookback = 60  # Use the previous 60 time steps as input features
X_train, y_train = create_dataset(train_data, lookback)
X_test, y_test = create_dataset(test_data, lookback)

# Convert the data to PyTorch tensors and move them to the selected device
X_train = torch.from_numpy(X_train).float().to(device)
y_train = torch.from_numpy(y_train).float().to(device)
X_test = torch.from_numpy(X_test).float().to(device)
y_test = torch.from_numpy(y_test).float().to(device)
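
To see what `create_dataset` produces, the sliding window can be traced on a tiny array. This sketch repeats the function from above and applies it to the toy values 0 through 5 with a lookback of 3. The toy data and the names `toy`, `X_toy`, and `y_toy` are only for illustration.

```python
import numpy as np


def create_dataset(data, lookback):
    X, y = [], []
    for i in range(len(data) - lookback):
        X.append(data[i:i + lookback, 0])   # window of `lookback` past values
        y.append(data[i + lookback, 0])     # the value right after the window
    return np.array(X), np.array(y)


toy = np.arange(6, dtype=float).reshape(-1, 1)  # column vector with values 0..5
X_toy, y_toy = create_dataset(toy, lookback=3)

print(X_toy)  # [[0. 1. 2.] [1. 2. 3.] [2. 3. 4.]]
print(y_toy)  # [3. 4. 5.]
print(X_toy.shape, y_toy.shape)  # (3, 3) (3,)
```

A series of length N yields N - lookback samples, so the first `lookback` rows of each split have no prediction. This is why the final plot starts at index `train_size + lookback`.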

3. Build a simple LSTM model

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        out = self.fc(out[:, -1, :])
        return out

input_size = 1
hidden_size = 64
output_size = 1

# Initialize the model and move it to the selected device
model = LSTMModel(input_size, hidden_size, output_size).to(device)
model
LSTMModel(
  (lstm): LSTM(1, 64, batch_first=True)
  (fc): Linear(in_features=64, out_features=1, bias=True)
)
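
A quick way to check the model wiring is to pass a dummy batch through it and inspect the output shape. The sketch below repeats the class definition so it runs on its own. The batch size of 8 and the names `demo_model` and `dummy_batch` are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn


class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])  # keep only the last time step


demo_model = LSTMModel(input_size=1, hidden_size=64, output_size=1)

# With batch_first=True the LSTM expects (batch, seq_len, features).
dummy_batch = torch.randn(8, 60, 1)
with torch.no_grad():
    dummy_out = demo_model(dummy_batch)

n_params = sum(p.numel() for p in demo_model.parameters())
print(dummy_out.shape)  # torch.Size([8, 1])
print("trainable parameters:", n_params)
```

The `(batch, 60, 1)` input shape is the reason the training and test code calls `unsqueeze(2)` on the 2-D window tensors before feeding them to the model.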

4. Define the loss function and optimizer

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

5. Train the model

num_epochs = 100
batch_size = 32
train_loss_history = []

for epoch in range(num_epochs):
    for i in range(0, len(X_train), batch_size):
        inputs = X_train[i:i+batch_size]
        targets = y_train[i:i+batch_size]

        # Forward pass: add a feature dimension so the input is (batch, lookback, 1)
        outputs = model(inputs.unsqueeze(2))
        loss = criterion(outputs.squeeze(-1), targets)

        # Backward pass and optimization step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Record the loss of the last mini-batch in this epoch
    train_loss_history.append(loss.item())

    if epoch % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.6f}')
Epoch [1/100], Loss: 0.007953
Epoch [11/100], Loss: 0.000076
Epoch [21/100], Loss: 0.000081
Epoch [31/100], Loss: 0.000081
Epoch [41/100], Loss: 0.000136
Epoch [51/100], Loss: 0.000155
Epoch [61/100], Loss: 0.000073
Epoch [71/100], Loss: 0.000125
Epoch [81/100], Loss: 0.000320
Epoch [91/100], Loss: 0.000099
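
The loop above records only the loss of the last mini-batch in each epoch, which may explain why the logged values jump around between epochs. One alternative is to average the loss over all samples in the epoch. The sketch below shows that pattern with `DataLoader` on synthetic data, using a tiny linear model as a stand-in for the LSTM. The data, the model, and the `*_demo` names are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in: 256 windows of 60 steps; the target is the mean of each window.
X_demo = torch.rand(256, 60, 1)
y_demo = X_demo.mean(dim=1).squeeze(-1)

loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(60, 1))  # tiny stand-in for the LSTM
demo_criterion = nn.MSELoss()
demo_optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(5):
    running_loss, n_seen = 0.0, 0
    for inputs, targets in loader:
        demo_optimizer.zero_grad()
        loss = demo_criterion(demo_model(inputs).squeeze(-1), targets)
        loss.backward()
        demo_optimizer.step()
        # Weight each batch loss by its size so the epoch value is a per-sample mean.
        running_loss += loss.item() * inputs.size(0)
        n_seen += inputs.size(0)
    epoch_losses.append(running_loss / n_seen)

print(epoch_losses)
```

Appending `running_loss / n_seen` instead of the last batch loss usually gives a smoother training curve that is easier to compare across epochs.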

6. Test the model

model.eval()
with torch.no_grad():
    test_inputs = X_test.unsqueeze(2)
    test_outputs = model(test_inputs)
    test_loss = criterion(test_outputs.squeeze(), y_test)
    predicted_prices = scaler.inverse_transform(test_outputs.cpu().numpy())

print(f'Test Loss: {test_loss.item():.6f}')
Test Loss: 0.000395
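
The test loss is computed on values scaled to [0, 1], so it is not expressed in price units. A minimal sketch of converting such an error back to the price scale follows. The prices, the predictions, and the `demo_scaler` name are hypothetical and chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical prices used to fit the scaler (min 100, max 500, so the range is 400).
prices = np.array([[100.0], [200.0], [300.0], [400.0], [500.0]])
demo_scaler = MinMaxScaler(feature_range=(0, 1)).fit(prices)

# Hypothetical scaled targets and predictions that are off by 0.05 in scaled units.
actual_scaled = demo_scaler.transform(np.array([[300.0], [400.0]]))
predicted_scaled = actual_scaled + np.array([[0.05], [-0.05]])

# Map both back to price units before measuring the error.
actual = demo_scaler.inverse_transform(actual_scaled)
predicted = demo_scaler.inverse_transform(predicted_scaled)

mse_scaled = float(np.mean((predicted_scaled - actual_scaled) ** 2))
rmse_price = float(np.sqrt(np.mean((predicted - actual) ** 2)))

print(f"MSE (scaled): {mse_scaled:.4f}")     # 0.0025
print(f"RMSE (price units): {rmse_price:.2f}")  # 20.00
```

Because min-max scaling is linear, the RMSE in price units equals the square root of the scaled MSE multiplied by the price range seen by the scaler (here 0.05 × 400 = 20).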

7. Visualize training loss

plt.figure(figsize=(12, 6))
plt.plot(train_loss_history, label='Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss History')
plt.legend()
plt.show()

[Figure: training loss history]

8. Visualize prediction results

plt.figure(figsize=(12, 6))
plt.plot(df['Date'][train_size+lookback:], scaler.inverse_transform(test_data[lookback:]), label='Actual')
plt.plot(df['Date'][train_size+lookback:], predicted_prices, label='Predicted')
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.title('Netflix Stock Price Prediction')
plt.legend()
plt.show()

[Figure: actual vs. predicted stock price on the test set, LSTM]

Origin blog.csdn.net/qq_52417436/article/details/131394823