Apple SiliconM1機械学習パフォーマンスの簡単なテスト

Appleは、M1チップを搭載したマシンがハードウェアアクセラレーションを使用できるように、テンソルフローを公式にサポートしています。この記事では、テストと比較のために、Macbook Air M1、2015 Macbook Pro 13 "、およびGoogleが提供するCoLabプラットフォームGPUとTPUを使用します。

試験方法

Tensorflow_macosプロジェクトのIssuesでWillian-Zhangによって提供されたBenchmark:CNNスクリプトを使用して、上記のプラットフォームを個別テストし、スクリプトの実行に使用された合計時間を計算します。スクリプトはgithubで入手するか、この記事の最後にあります。

スクリプトを実行する前に、macOSインストール用のconda環境とtensorflow構成を構成する必要があります。具体的な方法については、「macOS M1(Apple Silicon)のインストールと構成Conda環境」の記事を参照してください。

「MacOSM1(AppleSilicon)はTensorFlow環境をインストールします」

「MacOSIntel(x86_64)インストールTensorFlow環境」

試験結果

4つのプラットフォームでのテストの結果は次のとおりです。
試験結果
上の図の結果からわかるように、Macbook Pro13 2015はスクリプトを完了するのに50分以上かかりますが、M1 MacbookAirは約2と30分。ColabGPU84秒ほど良くはありませんが、Colabの不安定性と地域の使用制限を考慮すると、一般に、単純な機械学習にAppleSIliconのMacを使用することをお勧めします。

付録1-テストスクリプト

import tensorflow.compat.v2 as tf
import tensorflow_datasets as tfds
import time
from datetime import timedelta


tf.enable_v2_behavior()


from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

# 使用Colab或者其他平台,应注释一下这行
# 非AppleSilicon的Mac只要配置了Apple提供的tensorflow,可以正常运行不需要作修改
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')




(ds_train, ds_test), ds_info = tfds.load(
'mnist',
split=['train', 'test'],
shuffle_files=True,
as_supervised=True,
with_info=True,
)


def normalize_img(image, label):
"""Normalizes images: `uint8` -> `float32`."""
return tf.cast(image, tf.float32) / 255., label


batch_size = 128


ds_train = ds_train.map(
normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)




ds_test = ds_test.map(
normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)




model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
activation='relu'),
tf.keras.layers.Conv2D(64, kernel_size=(3, 3),
activation='relu'),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
# tf.keras.layers.Dropout(0.25),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
# tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
loss='sparse_categorical_crossentropy',
optimizer=tf.keras.optimizers.Adam(0.001),
metrics=['accuracy'],
)


start = time.time()


model.fit(
ds_train,
epochs=10,
# validation_steps=1, 
# steps_per_epoch=469,
# validation_data=ds_test # 此处如果按原脚本添加这行,脚本无法运行,暂时未有解决方法
)


delta = (time.time() - start)
elapsed = str(timedelta(seconds=delta))
print('Elapsed Time: {}'.format(elapsed))

付録2-テスト結果

# Macbook Air 2020 M1

2021-02-16 14:46:25.367705: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-02-16 14:46:25.368515: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
Train on None steps
Epoch 1/10
2021-02-16 14:46:25.721520: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.7121 - accuracy: 0.8496
Epoch 2/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1144 - accuracy: 0.9731
Epoch 3/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0732 - accuracy: 0.9820
Epoch 4/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0546 - accuracy: 0.9858
Epoch 5/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0452 - accuracy: 0.9889
Epoch 6/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0371 - accuracy: 0.9907
Epoch 7/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0325 - accuracy: 0.9918
Epoch 8/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0272 - accuracy: 0.9937
Epoch 9/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0243 - accuracy: 0.9941
Epoch 10/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0209 - accuracy: 0.9957
Elapsed Time: 0:03:23.720809
# Macbook Pro 13 2015
2021-02-12 12:58:05.524324: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-12 12:58:05.524539: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-12 12:58:05.675857: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
Train on None steps
Epoch 1/10
2021-02-12 12:58:06.441954: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
469/469 [==============================] - 310s 652ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1541 - accuracy: 0.9539
Epoch 2/10
469/469 [==============================] - 309s 649ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0400 - accuracy: 0.9878
Epoch 3/10
469/469 [==============================] - 312s 656ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0252 - accuracy: 0.9920
Epoch 4/10
469/469 [==============================] - 312s 655ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0183 - accuracy: 0.9938
Epoch 5/10
469/469 [==============================] - 306s 642ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0127 - accuracy: 0.9959
Epoch 6/10
469/469 [==============================] - 306s 642ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0091 - accuracy: 0.9971
Epoch 7/10
469/469 [==============================] - 306s 642ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0083 - accuracy: 0.9973
Epoch 8/10
469/469 [==============================] - 311s 655ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0050 - accuracy: 0.9982
Epoch 9/10
469/469 [==============================] - 312s 655ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9986
Epoch 10/10
469/469 [==============================] - 313s 658ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0050 - accuracy: 0.9983
Elapsed Time: 0:51:38.627279
Colab GPU
Train on 469 steps
Epoch 1/10
469/469 [==============================] - 14s 7ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1589 - accuracy: 0.9530
Epoch 2/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0436 - accuracy: 0.9868
Epoch 3/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0266 - accuracy: 0.9917
Epoch 4/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0192 - accuracy: 0.9938
Epoch 5/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0131 - accuracy: 0.9959
Epoch 6/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0109 - accuracy: 0.9964
Epoch 7/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0065 - accuracy: 0.9980
Epoch 8/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0067 - accuracy: 0.9980
Epoch 9/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0060 - accuracy: 0.9981
Epoch 10/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9978
Elapsed Time: 0:01:24.286260
# Colab TPU
Train on 469 steps
Epoch 1/10
469/469 [==============================] - 136s 275ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1504 - accuracy: 0.9553
Epoch 2/10
469/469 [==============================] - 135s 276ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0412 - accuracy: 0.9874
Epoch 3/10
469/469 [==============================] - 134s 274ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0272 - accuracy: 0.9915
Epoch 4/10
469/469 [==============================] - 135s 275ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9944
Epoch 5/10
469/469 [==============================] - 134s 274ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0127 - accuracy: 0.9957
Epoch 6/10
469/469 [==============================] - 134s 273ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0096 - accuracy: 0.9967
Epoch 7/10
469/469 [==============================] - 134s 273ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0070 - accuracy: 0.9977
Epoch 8/10
469/469 [==============================] - 133s 272ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0062 - accuracy: 0.9981
Epoch 9/10
469/469 [==============================] - 133s 272ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9980
Epoch 10/10
469/469 [==============================] - 133s 271ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9986
Elapsed Time: 0:22:22.303190

おすすめ

転載: blog.csdn.net/weixin_37272286/article/details/113825532