Machine Learning Notes - A way to estimate the sample size needed for model training

1. Brief introduction

        Sufficient, high-quality data is fundamental to developing any machine learning model. Without an up-front estimate of how much data is needed to model a particular system, data collection tends to produce either too little data for effective training or so much that resources are wasted.

        I am often asked how many images are needed for a given scene. I usually give an estimated range based on the complexity of the scene, but the question is hard to answer directly, because the amount of data required depends both on the complexity of the problem and on the complexity of the chosen algorithm.

        In many practical scenarios, however, the amount of image data available to train deep learning models is very limited. If the required sample size can be estimated with reasonable accuracy, a lot of labor and material cost can be saved.

        Researchers have proposed a number of methods for estimating how many images are required to reach the best model performance. Here we look at one of them.

2. Balanced subsampling

        A balanced subsampling scheme is used here to determine the optimal sample size for our model. A random subsample of Y images is selected and used to train the model, and the trained model is then evaluated on an independent test set. This process is repeated N times for each subsample size, with replacement, so that a mean and a confidence interval can be constructed for the observed performance.
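
        The scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used later in this post: the helper names (balanced_subsample, subsample_curve) and the train_and_evaluate callback are hypothetical, "balanced" is taken to mean an equal number of images per class, "with replacement" is taken to mean that every repetition draws again from the full pool, and the interval is a simple normal-approximation 95% interval.

```python
import numpy as np


def balanced_subsample(labels, n_per_class, rng):
    """Return shuffled indices holding n_per_class random items of each class."""
    labels = np.asarray(labels)
    picked = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        # No duplicates inside one draw; separate draws may share images.
        picked.append(rng.choice(cls_idx, size=n_per_class, replace=False))
    return rng.permutation(np.concatenate(picked))


def subsample_curve(labels, sizes_per_class, n_repeats, train_and_evaluate, seed=42):
    """Map each per-class size to (mean, lower, upper) of the test scores.

    train_and_evaluate is a hypothetical callback: it receives an array of
    training indices, trains a fresh model on them, and returns one score
    measured on an independent test set.
    """
    if n_repeats < 2:
        raise ValueError("n_repeats must be at least 2 to estimate a spread")
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes_per_class:
        scores = np.array([
            train_and_evaluate(balanced_subsample(labels, n, rng))
            for _ in range(n_repeats)
        ])
        mean = scores.mean()
        # Assumed normal-approximation 95% interval for the mean score.
        half_width = 1.96 * scores.std(ddof=1) / np.sqrt(n_repeats)
        results[n] = (mean, mean - half_width, mean + half_width)
    return results
```

        To use it, pass the training labels, a list of per-class subsample sizes, the number of repetitions N, and a function that trains and scores a model on the given indices. Plotting the returned means and intervals against the subsample size gives a rough picture of how performance grows with the amount of data.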

1. Import packages

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
from tensorflow.keras import layers

# Define seed and fixed variables
seed = 42
tf.random.set_seed(seed)
np.random.seed(seed)
AUTO = tf.data.AUTOTUNE

2. Load the dataset

# Spe


Reposted from: blog.csdn.net/bashendixie5/article/details/131181148