Apple Silicon M1 machine learning performance simple test

Apple officially supports tensorflow so that machines with M1 chips can use hardware acceleration. This article will use the Macbook Air M1, 2015 Macbook Pro 13" and the CoLab platform GPU and TPU provided by Google for testing and comparison.

testing method

Use the Benchmark: CNN script provided by Willian-Zhang in Issues of the tensorflow_macos project to test the above platforms separately, and calculate the total time used to run the script. The script can be obtained on github or found at the end of this article.

Before running the script, you need to configure the conda environment and tensorflow configuration for macOS installation. For specific methods, please refer to the article:
"macOS M1 (Apple Silicon) Installation and Configuration Conda Environment"

"MacOS M1 (AppleSilicon) install TensorFlow environment"

"MacOS Intel (x86_64) install TensorFlow environment"

Test Results

The results of the tests on the 4 platforms are as follows: As
Test Results
can be seen from the above results, the Macbook Pro13 2015 takes more than 50 minutes to complete the script, while the M1 Macbook Air only took about 2 and a half minutes, although it is not as good as the Colab GPU. 84 seconds, but considering the instability of Colab and regional usage restrictions, in general, using AppleSIlicon's Mac for simple machine learning may be a good choice.

Appendix 1-Test Script

import tensorflow.compat.v2 as tf
import tensorflow_datasets as tfds
import time
from datetime import timedelta


tf.enable_v2_behavior()


from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

# 使用Colab或者其他平台,应注释一下这行
# 非AppleSilicon的Mac只要配置了Apple提供的tensorflow,可以正常运行不需要作修改
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')




(ds_train, ds_test), ds_info = tfds.load(
'mnist',
split=['train', 'test'],
shuffle_files=True,
as_supervised=True,
with_info=True,
)


def normalize_img(image, label):
"""Normalizes images: `uint8` -> `float32`."""
return tf.cast(image, tf.float32) / 255., label


batch_size = 128


ds_train = ds_train.map(
normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)




ds_test = ds_test.map(
normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)




model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
activation='relu'),
tf.keras.layers.Conv2D(64, kernel_size=(3, 3),
activation='relu'),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
# tf.keras.layers.Dropout(0.25),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
# tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
loss='sparse_categorical_crossentropy',
optimizer=tf.keras.optimizers.Adam(0.001),
metrics=['accuracy'],
)


start = time.time()


model.fit(
ds_train,
epochs=10,
# validation_steps=1, 
# steps_per_epoch=469,
# validation_data=ds_test # 此处如果按原脚本添加这行,脚本无法运行,暂时未有解决方法
)


delta = (time.time() - start)
elapsed = str(timedelta(seconds=delta))
print('Elapsed Time: {}'.format(elapsed))

Appendix 2-Test Results

# Macbook Air 2020 M1

2021-02-16 14:46:25.367705: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-02-16 14:46:25.368515: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
Train on None steps
Epoch 1/10
2021-02-16 14:46:25.721520: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.7121 - accuracy: 0.8496
Epoch 2/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1144 - accuracy: 0.9731
Epoch 3/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0732 - accuracy: 0.9820
Epoch 4/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0546 - accuracy: 0.9858
Epoch 5/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0452 - accuracy: 0.9889
Epoch 6/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0371 - accuracy: 0.9907
Epoch 7/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0325 - accuracy: 0.9918
Epoch 8/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0272 - accuracy: 0.9937
Epoch 9/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0243 - accuracy: 0.9941
Epoch 10/10
469/469 [==============================] - 20s 40ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0209 - accuracy: 0.9957
Elapsed Time: 0:03:23.720809
# Macbook Pro 13 2015
2021-02-12 12:58:05.524324: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-12 12:58:05.524539: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-12 12:58:05.675857: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
Train on None steps
Epoch 1/10
2021-02-12 12:58:06.441954: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
469/469 [==============================] - 310s 652ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1541 - accuracy: 0.9539
Epoch 2/10
469/469 [==============================] - 309s 649ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0400 - accuracy: 0.9878
Epoch 3/10
469/469 [==============================] - 312s 656ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0252 - accuracy: 0.9920
Epoch 4/10
469/469 [==============================] - 312s 655ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0183 - accuracy: 0.9938
Epoch 5/10
469/469 [==============================] - 306s 642ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0127 - accuracy: 0.9959
Epoch 6/10
469/469 [==============================] - 306s 642ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0091 - accuracy: 0.9971
Epoch 7/10
469/469 [==============================] - 306s 642ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0083 - accuracy: 0.9973
Epoch 8/10
469/469 [==============================] - 311s 655ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0050 - accuracy: 0.9982
Epoch 9/10
469/469 [==============================] - 312s 655ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9986
Epoch 10/10
469/469 [==============================] - 313s 658ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0050 - accuracy: 0.9983
Elapsed Time: 0:51:38.627279
Colab GPU
Train on 469 steps
Epoch 1/10
469/469 [==============================] - 14s 7ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1589 - accuracy: 0.9530
Epoch 2/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0436 - accuracy: 0.9868
Epoch 3/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0266 - accuracy: 0.9917
Epoch 4/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0192 - accuracy: 0.9938
Epoch 5/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0131 - accuracy: 0.9959
Epoch 6/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0109 - accuracy: 0.9964
Epoch 7/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0065 - accuracy: 0.9980
Epoch 8/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0067 - accuracy: 0.9980
Epoch 9/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0060 - accuracy: 0.9981
Epoch 10/10
469/469 [==============================] - 8s 6ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9978
Elapsed Time: 0:01:24.286260
# Colab TPU
Train on 469 steps
Epoch 1/10
469/469 [==============================] - 136s 275ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1504 - accuracy: 0.9553
Epoch 2/10
469/469 [==============================] - 135s 276ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0412 - accuracy: 0.9874
Epoch 3/10
469/469 [==============================] - 134s 274ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0272 - accuracy: 0.9915
Epoch 4/10
469/469 [==============================] - 135s 275ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9944
Epoch 5/10
469/469 [==============================] - 134s 274ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0127 - accuracy: 0.9957
Epoch 6/10
469/469 [==============================] - 134s 273ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0096 - accuracy: 0.9967
Epoch 7/10
469/469 [==============================] - 134s 273ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0070 - accuracy: 0.9977
Epoch 8/10
469/469 [==============================] - 133s 272ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0062 - accuracy: 0.9981
Epoch 9/10
469/469 [==============================] - 133s 272ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9980
Epoch 10/10
469/469 [==============================] - 133s 271ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9986
Elapsed Time: 0:22:22.303190

Guess you like

Origin blog.csdn.net/weixin_37272286/article/details/113825532
Recommended