Tool Series: TensorFlow Decision Forests (8) — Combining Decision Forest and Neural Network Models

Introduction

Welcome to the model composition tutorial for TensorFlow Decision Forests (TF-DF). This tutorial shows you how to combine several decision forest and neural network models together using a common preprocessing layer and the Keras functional API.

You might want to compose models together to improve predictive performance (ensembling), to get the best of different modeling technologies (heterogeneous model ensembling), to train different parts of the model on different datasets (e.g. pre-training), or to create stacked models (e.g. a model that operates on the predictions of another model).

This tutorial covers an advanced use case of model composition using the functional API. You can find examples of simpler model composition scenarios in the "Feature Preprocessing" and "Using Pretrained Text Embeddings" tutorials.

Here is the structure of the model you will build:

# Install the graphviz library
!pip install graphviz -U --quiet

# Import the Source class from the graphviz library
from graphviz import Source

# Create a Source object from a DOT-language description of the model structure
Source("""
digraph G {
  raw_data [label="Input features"];  // Node for the raw input features
  preprocess_data [label="Learnable NN pre-processing", shape=rect];  // Node for the learnable NN preprocessing

  raw_data -> preprocess_data  // The raw features feed the NN preprocessing

  subgraph cluster_0 {
    color=grey;
    a1[label="NN layer", shape=rect];  // Neural network layer
    b1[label="NN layer", shape=rect];  // Neural network layer
    a1 -> b1;  // Connection between the two NN layers
    label = "Model #1";
  }

  subgraph cluster_1 {
    color=grey;
    a2[label="NN layer", shape=rect];  // Neural network layer
    b2[label="NN layer", shape=rect];  // Neural network layer
    a2 -> b2;  // Connection between the two NN layers
    label = "Model #2";
  }

  subgraph cluster_2 {
    color=grey;
    a3[label="Decision Forest", shape=rect];  // Decision forest
    label = "Model #3";
  }

  subgraph cluster_3 {
    color=grey;
    a4[label="Decision Forest", shape=rect];  // Decision forest
    label = "Model #4";
  }

  preprocess_data -> a1;  // The preprocessing output feeds every model of stage 2
  preprocess_data -> a2;
  preprocess_data -> a3;
  preprocess_data -> a4;

  b1 -> aggr;  // Every model's output feeds the aggregation node
  b2 -> aggr;
  a3 -> aggr;
  a4 -> aggr;

  aggr [label="Aggregation (mean)", shape=rect]  // Aggregation node (mean of the predictions)
  aggr -> predictions  // The aggregated output is the final prediction
}
""")


Your combined model has three stages:

  1. The first stage is a preprocessing layer made of a neural network and common to all the models in the next stage. In practice, such a preprocessing layer could be a pretrained embedding to fine-tune, or a randomly initialized neural network.
  2. The second stage is an ensemble of two decision forests and two neural network models.
  3. The final stage averages the predictions of the models in the second stage. It does not contain any learnable weights.

Neural networks are trained with the backpropagation algorithm and gradient descent. This algorithm has two important properties: (1) a layer of neural network can be trained if it receives a loss gradient (more precisely, the gradient of the loss with respect to the layer's output), and (2) the algorithm "propagates" the loss gradient from the layer's output back to the layer's input (this is the "chain rule"). For these two reasons, backpropagation can train multiple layers of neural networks stacked on top of each other at the same time.
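
As a small illustration of these two properties (a sketch, not part of the original notebook), the following tf.GradientTape snippet shows that the loss gradient reaches the weights of both a top layer and the layer below it:

import tensorflow as tf

# Two stacked layers; backpropagation provides gradients for both, because the
# loss gradient is chained from the output of `top` back through `bottom`.
bottom = tf.keras.layers.Dense(8, activation="relu")
top = tf.keras.layers.Dense(1, activation="sigmoid")

x = tf.random.uniform((4, 3))
y = tf.ones((4, 1))
_ = top(bottom(x))  # build the layers' weights

with tf.GradientTape() as tape:
    pred = top(bottom(x))  # forward pass through both layers
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, pred))

grads = tape.gradient(loss, bottom.trainable_variables + top.trainable_variables)
print([g.shape for g in grads])  # a gradient exists for every weight of both layers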

In this example, the decision forests are trained with the Random Forest (RF) algorithm. Unlike backpropagation, RF training does not propagate a loss gradient from its output to its input. Therefore, a classical RF algorithm cannot be used to train or fine-tune a neural network underneath it. In other words, the "decision forest" stages cannot be used to train the "learnable NN preprocessing block" above them.

Therefore, the model will be trained in two stages (a condensed sketch follows this list):

  1. Train the preprocessing and neural network stages.
  2. Train the decision forest stage.
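
A condensed sketch of these two stages, assuming the names (preprocessor, ensemble_nn_only, model_3, model_4, train_dataset) defined later in this tutorial; the full, runnable cells appear below:

# Stage 1: train the shared preprocessing layer together with the two
# neural networks, end-to-end, with backpropagation.
ensemble_nn_only.compile(optimizer="adam",
                         loss=tf.keras.losses.BinaryCrossentropy(),
                         metrics=["accuracy"])
ensemble_nn_only.fit(train_dataset, epochs=20, validation_data=test_dataset)

# Stage 2: the preprocessing layer is now fixed; feed its output to the
# two Random Forests and train them (no gradients flow back to stage 1).
train_ds_pre = train_dataset.map(lambda x, y: (preprocessor(x), y))
model_3.fit(train_ds_pre)
model_4.fit(train_ds_pre)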

Install TensorFlow Decision Forests

Install TF-DF by running the following cells.

!pip install tensorflow_decision_forests -U --quiet

Wurlitzer is needed to display the detailed training logs in Colab (when using verbose=2 in the model constructor).

# Install the wurlitzer library, used to surface command-line output in notebooks
!pip install wurlitzer -U --quiet
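
For instance, requesting the detailed logs looks like this (a minimal sketch; verbose is a standard constructor argument of TF-DF models):

import tensorflow_decision_forests as tfdf

# verbose=2 prints the full training logs of the underlying learner;
# in Colab, the wurlitzer package is needed to surface these C++ logs.
model = tfdf.keras.RandomForestModel(verbose=2)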

Import the libraries

# Import the required libraries

# Import the tensorflow_decision_forests library
import tensorflow_decision_forests as tfdf

# Import the other libraries
import os
import numpy as np
import pandas as pd
import tensorflow as tf
import math
import matplotlib.pyplot as plt

Dataset

In this tutorial, you will use a simple synthetic dataset to make the final model easier to interpret.

# Define the function make_dataset, which generates the dataset
# Arguments:
#   - num_examples: number of examples in the dataset
#   - num_features: number of features per example
#   - seed: random seed used to generate the data
# Returns:
#   - features: generated feature matrix of shape (num_examples, num_features)
#   - labels: generated label vector of shape (num_examples,)

def make_dataset(num_examples, num_features, seed=1234):
    # Set the random seed
    np.random.seed(seed)
    
    # Generate the feature matrix of shape (num_examples, num_features)
    features = np.random.uniform(-1, 1, size=(num_examples, num_features))
    
    # Generate the noise vector of shape (num_examples,)
    noise = np.random.uniform(size=(num_examples))
    
    # Compute the left-hand side of the labeling rule
    left_side = np.sqrt(
        np.sum(np.multiply(np.square(features[:, 0:2]), [1, 2]), axis=1))
    
    # Compute the right-hand side of the labeling rule
    right_side = features[:, 2] * 0.7 + np.sin(
        features[:, 3] * 10) * 0.5 + noise * 0.0 + 0.5
    
    # The label is defined by comparing the two sides
    labels = left_side <= right_side
    
    # Convert the boolean labels to integers and return the features and labels
    return features, labels.astype(int)
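
In other words, an example is labeled 1 when sqrt(x0^2 + 2*x1^2) <= 0.7*x2 + 0.5*sin(10*x3) + 0.5 (the noise term is multiplied by 0.0, so the labels are deterministic). A quick sketch that re-applies this rule by hand on the first generated example:

import numpy as np

features, labels = make_dataset(num_examples=5, num_features=4)

x = features[0]
# Recompute the two sides of the labeling rule for the first example.
left = np.sqrt(x[0] ** 2 + 2 * x[1] ** 2)
right = 0.7 * x[2] + 0.5 * np.sin(10 * x[3]) + 0.5
print(int(left <= right) == labels[0])  # should print True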

Generate some examples:

make_dataset(num_examples=5, num_features=4)
(array([[-0.6169611 ,  0.24421754, -0.12454452,  0.57071717],
        [ 0.55995162, -0.45481479, -0.44707149,  0.60374436],
        [ 0.91627871,  0.75186527, -0.28436546,  0.00199025],
        [ 0.36692587,  0.42540405, -0.25949849,  0.12239237],
        [ 0.00616633, -0.9724631 ,  0.54565324,  0.76528238]]),
 array([0, 0, 0, 1, 0]))

You can also plot them to get an idea of the pattern the models will have to learn:

# Generate the dataset
plot_features, plot_label = make_dataset(num_examples=50000, num_features=4)

# Set the figure size
plt.rcParams["figure.figsize"] = [8, 8]

# Common arguments for the scatter plots
common_args = dict(c=plot_label, s=1.0, alpha=0.5)

# Subplot 1
plt.subplot(2, 2, 1)
plt.scatter(plot_features[:, 0], plot_features[:, 1], **common_args)

# Subplot 2
plt.subplot(2, 2, 2)
plt.scatter(plot_features[:, 1], plot_features[:, 2], **common_args)

# Subplot 3
plt.subplot(2, 2, 3)
plt.scatter(plot_features[:, 0], plot_features[:, 2], **common_args)

# Subplot 4
plt.subplot(2, 2, 4)
plt.scatter(plot_features[:, 0], plot_features[:, 3], **common_args)
<matplotlib.collections.PathCollection at 0x7fad984548e0>

Note that this pattern is smooth and not axis-aligned. This favors the neural network models, because it is easier for a neural network than for a decision tree to produce round, non-axis-aligned decision boundaries.

On the other hand, we will train the models on a small dataset of 2500 examples. This favors the decision forest models, because decision forests make much more efficient use of all the available information in the examples (decision forests are "sample efficient").

Our neural network and decision forest ensemble will have the best of both worlds.

Let's create the training and testing tf.data.Datasets:

# Define the function make_tf_dataset, with a batch_size argument plus the arguments of make_dataset
def make_tf_dataset(batch_size=64, **args):
  # Call make_dataset to get the features and labels
  features, labels = make_dataset(**args)
  # Convert the features and labels into a tf.data.Dataset, grouped into batches of batch_size
  return tf.data.Dataset.from_tensor_slices(
      (features, labels)).batch(batch_size)

# Number of input features
num_features = 10

# Training dataset: 2500 examples with num_features features each, batches of 100 examples, seed 1234
train_dataset = make_tf_dataset(
    num_examples=2500, num_features=num_features, batch_size=100, seed=1234)

# Test dataset: 10000 examples with num_features features each, batches of 100 examples, seed 5678
test_dataset = make_tf_dataset(
    num_examples=10000, num_features=num_features, batch_size=100, seed=5678)
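
A quick sanity check of the resulting datasets (a small sketch; it only prints the shape of the first batch):

for features, labels in train_dataset.take(1):
  # Each batch holds 100 examples with num_features=10 raw input features.
  print(features.shape, labels.shape)  # (100, 10) (100,)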

Model structure

Define the model structure as follows:

# Input features
raw_features = tf.keras.layers.Input(shape=(num_features,))

# Stage 1
# =======

# Common learnable preprocessing
preprocessor = tf.keras.layers.Dense(10, activation=tf.nn.relu6)
preprocess_features = preprocessor(raw_features)

# Stage 2
# =======

# Model #1: neural network
m1_z1 = tf.keras.layers.Dense(5, activation=tf.nn.relu6)(preprocess_features)
m1_pred = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(m1_z1)

# Model #2: neural network
m2_z1 = tf.keras.layers.Dense(5, activation=tf.nn.relu6)(preprocess_features)
m2_pred = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(m2_z1)


# Model #3: random forest
model_3 = tfdf.keras.RandomForestModel(num_trees=1000, random_seed=1234)
m3_pred = model_3(preprocess_features)

# Model #4: random forest
model_4 = tfdf.keras.RandomForestModel(
    num_trees=1000,
    #split_axis="SPARSE_OBLIQUE", # Uncomment this line to improve the quality of this model
    random_seed=4567)
m4_pred = model_4(preprocess_features)

# Since TF-DF uses deterministic learning algorithms, the two models must be given
# different training seeds; otherwise the two `tfdf.keras.RandomForestModel` would be exactly identical.

# Stage 3
# =======

mean_nn_only = tf.reduce_mean(tf.stack([m1_pred, m2_pred], axis=0), axis=0)
mean_nn_and_df = tf.reduce_mean(
    tf.stack([m1_pred, m2_pred, m3_pred, m4_pred], axis=0), axis=0)

# Keras models
# ============

ensemble_nn_only = tf.keras.models.Model(raw_features, mean_nn_only)
ensemble_nn_and_df = tf.keras.models.Model(raw_features, mean_nn_and_df)
Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


Use /tmpfs/tmp/tmpeqn1u3t4 as temporary training directory
Warning: The model was called directly (i.e. using `model(data)` instead of using `model.predict(data)`) before being trained. The model will only return zeros until trained. The output shape might change after training Tensor("inputs:0", shape=(None, 10), dtype=float32)


WARNING:absl:The model was called directly (i.e. using `model(data)` instead of using `model.predict(data)`) before being trained. The model will only return zeros until trained. The output shape might change after training Tensor("inputs:0", shape=(None, 10), dtype=float32)


Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


Use /tmpfs/tmp/tmpzrq7x74t as temporary training directory
Warning: The model was called directly (i.e. using `model(data)` instead of using `model.predict(data)`) before being trained. The model will only return zeros until trained. The output shape might change after training Tensor("inputs:0", shape=(None, 10), dtype=float32)


WARNING:absl:The model was called directly (i.e. using `model(data)` instead of using `model.predict(data)`) before being trained. The model will only return zeros until trained. The output shape might change after training Tensor("inputs:0", shape=(None, 10), dtype=float32)

Before training the models, you can plot the ensemble to check that it matches the diagram at the top of this tutorial.

# Import the plot_model function
from keras.utils import plot_model

# Use plot_model to render the ensemble_nn_and_df model and save it as an image
# to_file sets the output path to /tmp/model.png
# show_shapes=True displays the input/output shape of every layer
plot_model(ensemble_nn_and_df, to_file="/tmp/model.png", show_shapes=True)

Model training

First, train the preprocessing layer and the two neural network models with the backpropagation algorithm.

%%time
# Compile the model
ensemble_nn_only.compile(
    optimizer=tf.keras.optimizers.Adam(),  # Adam optimizer for the model parameters
    loss=tf.keras.losses.BinaryCrossentropy(),  # Binary cross-entropy loss
    metrics=["accuracy"]  # Track accuracy as the evaluation metric
)

# Train the model
ensemble_nn_only.fit(
    train_dataset,  # Train on the training dataset
    epochs=20,  # Train for 20 epochs
    validation_data=test_dataset  # Validate on the test dataset
)
Epoch 1/20

 1/25 [>.............................] - ETA: 1:49 - loss: 0.5916 - accuracy: 0.7200
18/25 [====================>.........] - ETA: 0s - loss: 0.5695 - accuracy: 0.7556  
25/25 [==============================] - 5s 15ms/step - loss: 0.5691 - accuracy: 0.7500 - val_loss: 0.5662 - val_accuracy: 0.7392
Epoch 2/20

 1/25 [>.............................] - ETA: 0s - loss: 0.5743 - accuracy: 0.7200
19/25 [=====================>........] - ETA: 0s - loss: 0.5510 - accuracy: 0.7574
25/25 [==============================] - 0s 9ms/step - loss: 0.5542 - accuracy: 0.7500 - val_loss: 0.5554 - val_accuracy: 0.7392
Epoch 3/20

 1/25 [>.............................] - ETA: 0s - loss: 0.5623 - accuracy: 0.7200
19/25 [=====================>........] - ETA: 0s - loss: 0.5396 - accuracy: 0.7574
25/25 [==============================] - 0s 9ms/step - loss: 0.5434 - accuracy: 0.7500 - val_loss: 0.5467 - val_accuracy: 0.7392
Epoch 4/20

 1/25 [>.............................] - ETA: 0s - loss: 0.5525 - accuracy: 0.7200
17/25 [===================>..........] - ETA: 0s - loss: 0.5362 - accuracy: 0.7529
25/25 [==============================] - 0s 10ms/step - loss: 0.5342 - accuracy: 0.7500 - val_loss: 0.5384 - val_accuracy: 0.7392
Epoch 5/20

 1/25 [>.............................] - ETA: 0s - loss: 0.5433 - accuracy: 0.7200
18/25 [====================>.........] - ETA: 0s - loss: 0.5244 - accuracy: 0.7556
25/25 [==============================] - 0s 10ms/step - loss: 0.5250 - accuracy: 0.7500 - val_loss: 0.5298 - val_accuracy: 0.7392
Epoch 6/20

 1/25 [>.............................] - ETA: 0s - loss: 0.5338 - accuracy: 0.7200
18/25 [====================>.........] - ETA: 0s - loss: 0.5152 - accuracy: 0.7556
25/25 [==============================] - 0s 10ms/step - loss: 0.5154 - accuracy: 0.7500 - val_loss: 0.5205 - val_accuracy: 0.7392
Epoch 7/20

 1/25 [>.............................] - ETA: 0s - loss: 0.5241 - accuracy: 0.7200
19/25 [=====================>........] - ETA: 0s - loss: 0.5023 - accuracy: 0.7574
25/25 [==============================] - 0s 10ms/step - loss: 0.5053 - accuracy: 0.7500 - val_loss: 0.5107 - val_accuracy: 0.7392
Epoch 8/20

 1/25 [>.............................] - ETA: 0s - loss: 0.5137 - accuracy: 0.7200
19/25 [=====================>........] - ETA: 0s - loss: 0.4921 - accuracy: 0.7574
25/25 [==============================] - 0s 10ms/step - loss: 0.4947 - accuracy: 0.7500 - val_loss: 0.5007 - val_accuracy: 0.7392
Epoch 9/20

 1/25 [>.............................] - ETA: 0s - loss: 0.5029 - accuracy: 0.7200
18/25 [====================>.........] - ETA: 0s - loss: 0.4854 - accuracy: 0.7556
25/25 [==============================] - 0s 10ms/step - loss: 0.4841 - accuracy: 0.7500 - val_loss: 0.4909 - val_accuracy: 0.7392
Epoch 10/20

 1/25 [>.............................] - ETA: 0s - loss: 0.4916 - accuracy: 0.7200
19/25 [=====================>........] - ETA: 0s - loss: 0.4717 - accuracy: 0.7574
25/25 [==============================] - 0s 10ms/step - loss: 0.4738 - accuracy: 0.7500 - val_loss: 0.4815 - val_accuracy: 0.7392
Epoch 11/20

 1/25 [>.............................] - ETA: 0s - loss: 0.4799 - accuracy: 0.7200
19/25 [=====================>........] - ETA: 0s - loss: 0.4618 - accuracy: 0.7574
25/25 [==============================] - 0s 9ms/step - loss: 0.4637 - accuracy: 0.7500 - val_loss: 0.4724 - val_accuracy: 0.7392
Epoch 12/20

 1/25 [>.............................] - ETA: 0s - loss: 0.4680 - accuracy: 0.7200
19/25 [=====================>........] - ETA: 0s - loss: 0.4522 - accuracy: 0.7574
25/25 [==============================] - 0s 9ms/step - loss: 0.4541 - accuracy: 0.7500 - val_loss: 0.4639 - val_accuracy: 0.7392
Epoch 13/20

 1/25 [>.............................] - ETA: 0s - loss: 0.4559 - accuracy: 0.7200
18/25 [====================>.........] - ETA: 0s - loss: 0.4473 - accuracy: 0.7556
25/25 [==============================] - 0s 9ms/step - loss: 0.4453 - accuracy: 0.7500 - val_loss: 0.4561 - val_accuracy: 0.7392
Epoch 14/20

 1/25 [>.............................] - ETA: 0s - loss: 0.4441 - accuracy: 0.7200
18/25 [====================>.........] - ETA: 0s - loss: 0.4392 - accuracy: 0.7556
25/25 [==============================] - 0s 9ms/step - loss: 0.4373 - accuracy: 0.7500 - val_loss: 0.4491 - val_accuracy: 0.7398
Epoch 15/20

 1/25 [>.............................] - ETA: 0s - loss: 0.4332 - accuracy: 0.7300
19/25 [=====================>........] - ETA: 0s - loss: 0.4280 - accuracy: 0.7621
25/25 [==============================] - 0s 10ms/step - loss: 0.4300 - accuracy: 0.7552 - val_loss: 0.4426 - val_accuracy: 0.7439
Epoch 16/20

 1/25 [>.............................] - ETA: 0s - loss: 0.4227 - accuracy: 0.7300
18/25 [====================>.........] - ETA: 0s - loss: 0.4252 - accuracy: 0.7667
25/25 [==============================] - 0s 10ms/step - loss: 0.4234 - accuracy: 0.7624 - val_loss: 0.4366 - val_accuracy: 0.7508
Epoch 17/20

 1/25 [>.............................] - ETA: 0s - loss: 0.4132 - accuracy: 0.7400
19/25 [=====================>........] - ETA: 0s - loss: 0.4153 - accuracy: 0.7753
25/25 [==============================] - 0s 9ms/step - loss: 0.4173 - accuracy: 0.7692 - val_loss: 0.4310 - val_accuracy: 0.7608
Epoch 18/20

 1/25 [>.............................] - ETA: 0s - loss: 0.4047 - accuracy: 0.7500
19/25 [=====================>........] - ETA: 0s - loss: 0.4095 - accuracy: 0.7800
25/25 [==============================] - 0s 9ms/step - loss: 0.4115 - accuracy: 0.7764 - val_loss: 0.4255 - val_accuracy: 0.7752
Epoch 19/20

 1/25 [>.............................] - ETA: 0s - loss: 0.3966 - accuracy: 0.7600
18/25 [====================>.........] - ETA: 0s - loss: 0.4076 - accuracy: 0.7922
25/25 [==============================] - 0s 10ms/step - loss: 0.4059 - accuracy: 0.7880 - val_loss: 0.4201 - val_accuracy: 0.7847
Epoch 20/20

 1/25 [>.............................] - ETA: 0s - loss: 0.3887 - accuracy: 0.7900
19/25 [=====================>........] - ETA: 0s - loss: 0.3981 - accuracy: 0.8053
25/25 [==============================] - 0s 9ms/step - loss: 0.4003 - accuracy: 0.7988 - val_loss: 0.4148 - val_accuracy: 0.7913
CPU times: user 8.67 s, sys: 1.46 s, total: 10.1 s
Wall time: 9.49 s





<keras.src.callbacks.History at 0x7fac640c79a0>

Let's evaluate this part of the ensemble, which contains only the preprocessing layer and the two neural networks:

# Evaluate the neural-network-only ensemble (NN #1 and NN #2)
evaluation_nn_only = ensemble_nn_only.evaluate(test_dataset, return_dict=True)

# Print the accuracy (NN #1 and #2 only)
print("Accuracy (NN #1 and #2 only): ", evaluation_nn_only["accuracy"])

# Print the loss (NN #1 and #2 only)
print("Loss (NN #1 and #2 only): ", evaluation_nn_only["loss"])
  1/100 [..............................] - ETA: 0s - loss: 0.3536 - accuracy: 0.8400
 30/100 [========>.....................] - ETA: 0s - loss: 0.4103 - accuracy: 0.7967
 59/100 [================>.............] - ETA: 0s - loss: 0.4093 - accuracy: 0.7920
 88/100 [=========================>....] - ETA: 0s - loss: 0.4119 - accuracy: 0.7917
100/100 [==============================] - 0s 2ms/step - loss: 0.4148 - accuracy: 0.7913
Accuracy (NN #1 and #2 only):  0.7912999987602234
Loss (NN #1 and #2 only):  0.4147580564022064

Now let's train the two decision forest components, one after the other, on the preprocessed features.

%%time
# Preprocess the training dataset
# map applies the (already trained) preprocessor to every example of train_dataset
# The result is a new dataset, train_dataset_with_preprocessing, of preprocessed examples
train_dataset_with_preprocessing = train_dataset.map(lambda x,y: (preprocessor(x), y))

# Preprocess the test dataset
# map applies the preprocessor to every example of test_dataset
# The result is a new dataset, test_dataset_with_preprocessing, of preprocessed examples
test_dataset_with_preprocessing = test_dataset.map(lambda x,y: (preprocessor(x), y))

# Train model_3 on the preprocessed training dataset
model_3.fit(train_dataset_with_preprocessing)

# Train model_4 on the preprocessed training dataset
model_4.fit(train_dataset_with_preprocessing)
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7fad5d4b6700> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7fad5d4b6700>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert


WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7fad5d4b6700> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7fad5d4b6700>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert


WARNING: AutoGraph could not transform <function <lambda> at 0x7fad5d4b6700> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7fad5d4b6700>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7facb40f80d0> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7facb40f80d0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert


WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7facb40f80d0> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7facb40f80d0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert


WARNING: AutoGraph could not transform <function <lambda> at 0x7facb40f80d0> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7facb40f80d0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Reading training dataset...
Training dataset read in 0:00:03.527053. Found 2500 examples.
Training model...


[INFO 23-07-10 11:10:25.0183 UTC kernel.cc:1243] Loading model from path /tmpfs/tmp/tmpeqn1u3t4/model/ with prefix 03256340d0ca40b0


Model trained in 0:00:01.894803
Compiling model...


[INFO 23-07-10 11:10:25.9915 UTC decision_forest.cc:660] Model loaded with 1000 root(s), 314626 node(s), and 10 input feature(s).
[INFO 23-07-10 11:10:25.9915 UTC abstract_model.cc:1311] Engine "RandomForestOptPred" built
[INFO 23-07-10 11:10:25.9916 UTC kernel.cc:1075] Use fast generic engine


WARNING:tensorflow:AutoGraph could not transform <function simple_ml_inference_op_with_handle at 0x7fac685de700> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert


WARNING:tensorflow:AutoGraph could not transform <function simple_ml_inference_op_with_handle at 0x7fac685de700> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert


WARNING: AutoGraph could not transform <function simple_ml_inference_op_with_handle at 0x7fac685de700> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Model compiled.
Reading training dataset...
Training dataset read in 0:00:00.210194. Found 2500 examples.
Training model...


[INFO 23-07-10 11:10:28.3455 UTC kernel.cc:1243] Loading model from path /tmpfs/tmp/tmpzrq7x74t/model/ with prefix a093792264d04fac


Model trained in 0:00:01.800354
Compiling model...


[INFO 23-07-10 11:10:29.2816 UTC decision_forest.cc:660] Model loaded with 1000 root(s), 316314 node(s), and 10 input feature(s).
[INFO 23-07-10 11:10:29.2816 UTC kernel.cc:1075] Use fast generic engine


Model compiled.
CPU times: user 20.1 s, sys: 1.49 s, total: 21.6 s
Wall time: 8.92 s





<keras.src.callbacks.History at 0x7fac5073e430>
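
The AutoGraph warnings above come from the Python lambdas passed to Dataset.map. As the warnings themselves suggest, they can be silenced by decorating the mapping function; a sketch (an alternative to the lambdas used above, assuming the preprocessor defined earlier):

import tensorflow as tf

@tf.autograph.experimental.do_not_convert
def apply_preprocessing(x, y):
  # Run the already-trained preprocessing layer on the raw features.
  return preprocessor(x), y

train_dataset_with_preprocessing = train_dataset.map(apply_preprocessing)
test_dataset_with_preprocessing = test_dataset.map(apply_preprocessing)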

Evaluate the decision forests

Let's evaluate the decision forests individually.

# Add an evaluation metric to the models
model_3.compile(["accuracy"])
model_4.compile(["accuracy"])

# Evaluate model_3 on the preprocessed test data and return the results as a dictionary
evaluation_df3_only = model_3.evaluate(test_dataset_with_preprocessing, return_dict=True)

# Evaluate model_4 on the preprocessed test data and return the results as a dictionary
evaluation_df4_only = model_4.evaluate(test_dataset_with_preprocessing, return_dict=True)

# Print the accuracy of model_3
print("Accuracy (DF #3 only): ", evaluation_df3_only["accuracy"])

# Print the accuracy of model_4
print("Accuracy (DF #4 only): ", evaluation_df4_only["accuracy"])
  1/100 [..............................] - ETA: 29s - loss: 0.0000e+00 - accuracy: 0.8600
  6/100 [>.............................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8200 
 12/100 [==>...........................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8300
 17/100 [====>.........................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8218
 22/100 [=====>........................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8173
 28/100 [=======>......................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8129
 34/100 [=========>....................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8124
 40/100 [===========>..................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8138
 46/100 [============>.................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8161
 52/100 [==============>...............] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8173
 58/100 [================>.............] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8178
 64/100 [==================>...........] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8156
 69/100 [===================>..........] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8165
 75/100 [=====================>........] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8175
 80/100 [=======================>......] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8166
 86/100 [========================>.....] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8166
 92/100 [==========================>...] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8153
 98/100 [============================>.] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8152
100/100 [==============================] - 1s 10ms/step - loss: 0.0000e+00 - accuracy: 0.8150

  1/100 [..............................] - ETA: 12s - loss: 0.0000e+00 - accuracy: 0.8500
  6/100 [>.............................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8250 
 12/100 [==>...........................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8325
 18/100 [====>.........................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8228
 24/100 [======>.......................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8158
 30/100 [========>.....................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8127
 36/100 [=========>....................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8122
 42/100 [===========>..................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8148
 48/100 [=============>................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8144
 54/100 [===============>..............] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8176
 60/100 [=================>............] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8153
 66/100 [==================>...........] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8150
 71/100 [====================>.........] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8169
 76/100 [=====================>........] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8176
 81/100 [=======================>......] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8167
 86/100 [========================>.....] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8162
 91/100 [==========================>...] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8149
 96/100 [===========================>..] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8147
100/100 [==============================] - 1s 10ms/step - loss: 0.0000e+00 - accuracy: 0.8149
Accuracy (DF #3 only):  0.8149999976158142
Accuracy (DF #4 only):  0.8148999810218811

Let's evaluate the whole model composition:

# Compile the ensemble
ensemble_nn_and_df.compile(
    loss=tf.keras.losses.BinaryCrossentropy(), metrics=["accuracy"])

# Evaluate the ensemble
evaluation_nn_and_df = ensemble_nn_and_df.evaluate(
    test_dataset, return_dict=True)

# Print the accuracy and the loss
print("Accuracy (2xNN and 2xDF): ", evaluation_nn_and_df["accuracy"])
print("Loss (2xNN and 2xDF): ", evaluation_nn_and_df["loss"])
  1/100 [..............................] - ETA: 23s - loss: 0.3324 - accuracy: 0.8600
  6/100 [>.............................] - ETA: 0s - loss: 0.3850 - accuracy: 0.8267 
 12/100 [==>...........................] - ETA: 0s - loss: 0.3650 - accuracy: 0.8317
 18/100 [====>.........................] - ETA: 0s - loss: 0.3679 - accuracy: 0.8261
 24/100 [======>.......................] - ETA: 0s - loss: 0.3723 - accuracy: 0.8229
 30/100 [========>.....................] - ETA: 0s - loss: 0.3752 - accuracy: 0.8200
 35/100 [=========>....................] - ETA: 0s - loss: 0.3742 - accuracy: 0.8200
 40/100 [===========>..................] - ETA: 0s - loss: 0.3736 - accuracy: 0.8198
 46/100 [============>.................] - ETA: 0s - loss: 0.3723 - accuracy: 0.8207
 52/100 [==============>...............] - ETA: 0s - loss: 0.3716 - accuracy: 0.8213
 58/100 [================>.............] - ETA: 0s - loss: 0.3722 - accuracy: 0.8193
 64/100 [==================>...........] - ETA: 0s - loss: 0.3754 - accuracy: 0.8178
 70/100 [====================>.........] - ETA: 0s - loss: 0.3745 - accuracy: 0.8184
 76/100 [=====================>........] - ETA: 0s - loss: 0.3753 - accuracy: 0.8170
 82/100 [=======================>......] - ETA: 0s - loss: 0.3757 - accuracy: 0.8151
 88/100 [=========================>....] - ETA: 0s - loss: 0.3760 - accuracy: 0.8147
 94/100 [===========================>..] - ETA: 0s - loss: 0.3785 - accuracy: 0.8130
100/100 [==============================] - ETA: 0s - loss: 0.3795 - accuracy: 0.8133
100/100 [==============================] - 1s 10ms/step - loss: 0.3795 - accuracy: 0.8133
Accuracy (2xNN and 2xDF):  0.8133000135421753
Loss (2xNN and 2xDF):  0.37953513860702515

To finish, let's fine-tune the neural network layers a little more. Note that we do not fine-tune the pretrained embedding (the shared preprocessing layer), as the DF models depend on it (unless we also retrained them afterwards).
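
The notebook output for that step is not reproduced here; a minimal sketch of what such fine-tuning could look like, under the assumption that the shared preprocessor is frozen so the decision forests' inputs stay unchanged:

# Freeze the shared preprocessing layer: the decision forests were trained on
# its current output, so it must not move while we fine-tune the neural nets.
preprocessor.trainable = False

ensemble_nn_only.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"])

# A few extra epochs of fine-tuning for the two neural network heads.
ensemble_nn_only.fit(train_dataset, epochs=5, validation_data=test_dataset)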

To summarize, you have:

# Print the accuracy of NN #1 and #2
print(f"Accuracy (NN #1 and #2 only):\t{evaluation_nn_only['accuracy']:.6f}")
# Print the accuracy of DF #3
print(f"Accuracy (DF #3 only):\t\t{evaluation_df3_only['accuracy']:.6f}")
# Print the accuracy of DF #4
print(f"Accuracy (DF #4 only):\t\t{evaluation_df4_only['accuracy']:.6f}")
# Separator
print("----------------------------------------")
# Print the accuracy of the 2xNN and 2xDF ensemble
print(f"Accuracy (2xNN and 2xDF):\t{evaluation_nn_and_df['accuracy']:.6f}")

# Helper that prints the accuracy difference between the ensemble and a sub-model
def delta_percent(src_eval, key):
  # Accuracy of the sub-model
  src_acc = src_eval["accuracy"]
  # Accuracy of the full ensemble
  final_acc = evaluation_nn_and_df["accuracy"]
  # Absolute accuracy difference
  increase = final_acc - src_acc
  # Print the difference
  print(f"\t\t\t\t  {increase:+.6f} over {key}")

# Accuracy difference of the ensemble over NN #1 and #2, DF #3, and DF #4
delta_percent(evaluation_nn_only, "NN #1 and #2 only")
delta_percent(evaluation_df3_only, "DF #3 only")
delta_percent(evaluation_df4_only, "DF #4 only")
Accuracy (NN #1 and #2 only):	0.791300
Accuracy (DF #3 only):		0.815000
Accuracy (DF #4 only):		0.814900
----------------------------------------
Accuracy (2xNN and 2xDF):	0.813300
				  +0.022000 over NN #1 and #2 only
				  -0.001700 over DF #3 only
				  -0.001600 over DF #4 only

Here you can see that the combined model clearly outperforms the two neural networks alone and performs comparably to the individual decision forests. This is why composing models in this way can work well.

What's next?

In this example, you saw how to combine decision forests with neural networks. An extra step would be to further train the neural networks and the decision forests together.

Furthermore, for clarity, the decision forests here only receive the preprocessed input. However, decision forests are generally very good at consuming raw data, so the model could be improved by also feeding the raw features to the decision forest models, as in the sketch below.
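
For example (a sketch reusing the layers defined above; the names model_3_bis and m3_bis_pred are hypothetical and not part of the original notebook), a decision forest could receive the raw features concatenated with the learned preprocessing:

# Give the decision forest both the raw features and the NN preprocessing output.
df_inputs = tf.keras.layers.Concatenate()([raw_features, preprocess_features])

# Hypothetical variant of Model #3 trained on the concatenated features.
model_3_bis = tfdf.keras.RandomForestModel(num_trees=1000, random_seed=1234)
m3_bis_pred = model_3_bis(df_inputs)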

In this example, the final model is the average of the individual model predictions. This solution works well if all the models perform more or less equally well. However, if one of the sub-models is very good (or very bad), aggregating it with the other models may actually be harmful (or vice versa). For example, try reducing the number of training examples to 1k and see how severely it affects the neural networks, or enable the "SPARSE_OBLIQUE" split in the second random forest model.
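
For reference, enabling the oblique splits mentioned above only requires the split_axis argument already hinted at (commented out) in the model-definition cell; a sketch:

# Oblique splits let the trees cut across feature combinations, which tends to
# help on smooth, non-axis-aligned patterns like this dataset.
model_4 = tfdf.keras.RandomForestModel(
    num_trees=1000,
    split_axis="SPARSE_OBLIQUE",
    random_seed=4567)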


Origin blog.csdn.net/wjjc1017/article/details/135189684