Python Quantitative Trading Study Notes (42): Deep Learning for Short-Term Stock Picking, Part 2

The previous article covered the data preprocessing part of picking short-term stocks with deep learning; this article covers model training.

Model Training

The training procedure mainly follows the official Keras example Structured data classification from scratch (https://keras.io/examples/structured_data/structured_data_classification_from_scratch/), which demonstrates classifying structured data and shows how to handle string, integer, and floating-point features. For the stock training features we selected, only the floating-point case is needed.

First, check whether the preprocessed data already exists. If it does, read it in directly; otherwise, run the preprocessing computation:

for stk_code in stk_list:
    print('processing {}...'.format(stk_code))
    # Check whether preprocessing has already been done (i.e. the file exists)
    data_file = './baostock/data_pre/{}.csv'.format(stk_code)
    if os.path.exists(data_file):
        df = pd.read_csv(data_file)
    else:
        df = pd.read_csv('./baostock/data_ext/{}.csv'.format(stk_code))
        df = df[df['date'] <= '2017-12-31']
        df = data_preprocessing(df, stk_code, FEATURE_N)

Next, get the feature dimensionality, split the data into training and validation sets, convert the DataFrames to tf.data.Dataset objects, and set the batch size.

    # Feature dimensionality (all columns except the target)
    ft_num = df.shape[1] - 1
    # Split into training and validation sets
    val_df = df.sample(frac=0.2, random_state=1337)
    train_df = df.drop(val_df.index)
    print(
        "Using %d samples for training and %d for validation"
        % (len(train_df), len(val_df))
    )
    # Build tf.data.Dataset objects
    train_ds = dataframe_to_dataset(train_df)
    val_ds = dataframe_to_dataset(val_df)
    # Batch
    train_ds = train_ds.batch(32)
    val_ds = val_ds.batch(32)
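As a small optional tweak (not in the original code), tf.data pipelines are often finished with prefetch, so the next batch is prepared while the current one is being trained on; tf.data.experimental.AUTOTUNE lets TensorFlow pick the buffer size:

    # Optional: overlap data preparation with training
    train_ds = train_ds.batch(32).prefetch(tf.data.experimental.AUTOTUNE)
    val_ds = val_ds.batch(32).prefetch(tf.data.experimental.AUTOTUNE)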

After preprocessing, each sample has 220 feature values (22 features per day × 10 days), all floating point. Following the Keras example, we call the encode_numerical_feature method to normalize each feature to zero mean and unit standard deviation.

# Normalize floating-point features (zero mean, unit variance)
def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = Normalization()
    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))
    # Learn the statistics of the data
    normalizer.adapt(feature_ds)
    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature

Back in the per-stock loop, we create one keras.Input per feature column and encode it with the function above:

    # Feature processing: build an input and a normalized feature per column
    all_inputs = []
    ft_list = []
    for i in range(ft_num):
        name = '{}'.format(i)
        ki = keras.Input(shape = (1, ), name = name)
        all_inputs.append(ki)
        ft_list.append(encode_numerical_feature(ki, name, train_ds))
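To make the effect of Normalization concrete, here is a minimal standalone sketch (not part of the original script): adapt() learns the mean and variance of a column, and calling the layer afterwards standardizes values against those statistics:

import numpy as np
from tensorflow.keras.layers.experimental.preprocessing import Normalization

# Toy column with mean 10 and a standard deviation of about 1.63
data = np.array([[8.0], [10.0], [12.0]], dtype='float32')
normalizer = Normalization()
normalizer.adapt(data)           # learn mean/variance from the data
print(normalizer(data).numpy())  # roughly [[-1.22], [0.0], [1.22]]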

Next, we create and train a model for each stock and save the trained model locally. The simplest fully connected architecture is used here, with relatively few nodes per layer, to keep computation time down. Even so, training more than 2,600 stocks took over two days, and each saved model is roughly 5 MB. The code is as follows:

    # Create the model
    all_features = layers.concatenate(ft_list)
    x = layers.Dense(128, activation="relu")(all_features)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    output = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(all_inputs, output)
    model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
    # Train the model
    model.fit(train_ds, epochs=50, validation_data=val_ds)
    model.save('./model/{}'.format(stk_code))
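If training time is a concern, one optional tweak (not used in the original) is to stop early once the validation loss stops improving, via the standard Keras EarlyStopping callback:

    # Hypothetical variant: stop when val_loss has not improved for 5 epochs
    early_stop = keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=5, restore_best_weights=True)
    model.fit(train_ds, epochs=50, validation_data=val_ds, callbacks=[early_stop])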

Finally, one piece of housekeeping that absolutely must be done: inside each loop iteration, release the memory Keras used to train that stock's model. Without the line below, memory usage keeps climbing as training progresses until the program exhausts RAM and crashes.

    # Free memory
    backend.clear_session()
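If clear_session alone does not keep memory flat on a given setup, a common extra measure (an addition, not part of the original) is to drop the Python reference to the model and trigger garbage collection:

    import gc      # normally placed at the top of the script
    del model      # drop the last reference to the trained model
    gc.collect()   # reclaim the memory right away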

This completes the process of training the stock models with Keras; the complete code is at the end of this article. Follow-up articles will cover using the trained models for prediction and running a quantitative backtest.
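As a quick preview (a minimal sketch, not the follow-up article's actual method), a saved model can be reloaded with keras.models.load_model and fed a dict keyed by the input names '0' to '219' defined above; the stock code below is purely illustrative:

import tensorflow as tf
from tensorflow import keras

model = keras.models.load_model('./model/sh.600000')  # hypothetical stock code
# One sample: each of the 220 features is a batch of shape (1, 1)
sample = {str(i): tf.constant([[0.0]]) for i in range(220)}
print(model.predict(sample))  # predicted probability that 'buy' is 1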

The complete code follows:

import tensorflow as tf
import numpy as np
import pandas as pd
import os
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental.preprocessing import Normalization
from tensorflow.keras import backend

FEATURE_N = 10  # number of daily rows stacked into one training sample

# Normalize floating-point features (zero mean, unit variance)
def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = Normalization()
    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))
    # Learn the statistics of the data
    normalizer.adapt(feature_ds)
    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature

# Preprocessing: stack n rows of data into a single input feature row
def data_preprocessing(df, stk_code, n):
    df = df.copy()
    # Drop non-feature columns, keep only the feature data
    ft_df = df.drop(columns = ['date', 'buy'])
    # Collect the new feature rows in a list;
    # DataFrame.append is deprecated (removed in pandas 2.0), so concat once at the end
    rows = []
    for i in range(n, df.shape[0]):
        # Take n rows and flatten them into a single row
        part_df = ft_df.iloc[i - n : i]
        rows.append(pd.DataFrame(part_df.values.reshape(1, -1)))
    out_df = pd.concat(rows, ignore_index = True)
    # Use string column names so they match what read_csv returns for cached files
    out_df.columns = out_df.columns.map(str)
    out_df['target'] = df.iloc[n:df.shape[0]]['buy'].values
    out_df.to_csv('./baostock/data_pre/{}.csv'.format(stk_code), index = False)
    return out_df

def dataframe_to_dataset(dataframe):
    dataframe = dataframe.copy()
    labels = dataframe.pop("target")
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    ds = ds.shuffle(buffer_size=len(dataframe))
    return ds

stk_code_file = './stk_data/dp_stock_list.csv'
stk_list = pd.read_csv(stk_code_file)['code'].tolist()
for stk_code in stk_list:
    print('processing {}...'.format(stk_code))
    # Check whether preprocessing has already been done (i.e. the file exists)
    data_file = './baostock/data_pre/{}.csv'.format(stk_code)
    if os.path.exists(data_file):
        df = pd.read_csv(data_file)
    else:
        df = pd.read_csv('./baostock/data_ext/{}.csv'.format(stk_code))
        df = df[df['date'] <= '2017-12-31']
        df = data_preprocessing(df, stk_code, FEATURE_N)
    # Feature dimensionality (all columns except the target)
    ft_num = df.shape[1] - 1
    # Split into training and validation sets
    val_df = df.sample(frac=0.2, random_state=1337)
    train_df = df.drop(val_df.index)
    print(
        "Using %d samples for training and %d for validation"
        % (len(train_df), len(val_df))
    )
    # Build tf.data.Dataset objects
    train_ds = dataframe_to_dataset(train_df)
    val_ds = dataframe_to_dataset(val_df)
    # Batch
    train_ds = train_ds.batch(32)
    val_ds = val_ds.batch(32)
    # Feature processing: build an input and a normalized feature per column
    all_inputs = []
    ft_list = []
    for i in range(ft_num):
        name = '{}'.format(i)
        ki = keras.Input(shape = (1, ), name = name)
        all_inputs.append(ki)
        ft_list.append(encode_numerical_feature(ki, name, train_ds))
    # Create the model
    all_features = layers.concatenate(ft_list)
    x = layers.Dense(128, activation="relu")(all_features)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    output = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(all_inputs, output)
    model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
    # Train the model
    model.fit(train_ds, epochs=50, validation_data=val_ds)
    model.save('./model/{}'.format(stk_code))
    # Free memory
    backend.clear_session()

Feel free to follow, like, share, and comment. Thanks for your support!
To make it easier to learn from each other, a WeChat group has been set up; interested readers are welcome to add me on WeChat.


Reprinted from: blog.csdn.net/m0_46603114/article/details/109153550