[tensorflow]: Use Docker to compile TensorFlow from source. The environment in the image is already configured; just run the build and set the memory and CPU parameters.

Preface


The original link to this article is:
https://blog.csdn.net/freewebsys/article/details/129331455

Do not reproduce without the blogger's permission.
The blogger's CSDN address is: https://blog.csdn.net/freewebsys
The blogger's Juejin address is: https://juejin.cn/user/585379920479288
The blogger's Zhihu address is: https://www.zhihu.com/people/freewebsystem

1. About GPT-2: it previously took 1 hour 20 minutes to run on an Intel i7; this time, try it on an AMD CPU


https://yanghuaiyuan.blog.csdn.net/article/details/129327664

The result was tragic. The model trained for one step and was then killed. The default TensorFlow build does not seem to support the AMD CPU.

# time python demo.py 
2023-03-04 01:10:38.853476: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-04 01:10:40.558346: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
########### init start ###########
2023-03-04 01:10:48.376120: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:357] MLIR V1 optimization pass is not enabled
Loading checkpoint models/124M/model.ckpt
Loading dataset...
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.19s/it]
dataset has 338025 tokens
Training...
[1 | 35.05] loss=4.39 avg=4.39
Killed

real	1m16.797s
user	11m24.346s
sys	1m55.479s

So the process simply exited. No matter: just recompile TensorFlow and fix the CPU problem along the way.
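A bare `Killed` with no Python traceback means the process received SIGKILL, and on Linux a common sender is the kernel's out-of-memory killer, so free memory is worth ruling out alongside the CPU theory. The sketch below is not from the original post: it parses text in the `/proc/meminfo` format and reports values in MiB. The function name `parse_meminfo` and the sample numbers are illustrative assumptions.

```python
def parse_meminfo(text):
    """Return the kB-valued fields of /proc/meminfo-style text, converted to MiB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Only accept lines shaped like "Name:   12345 kB".
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[name.strip()] = int(parts[0]) // 1024
    return fields


# Illustrative sample in the /proc/meminfo format (made-up values).
SAMPLE = """MemTotal:       16384000 kB
MemAvailable:    2048000 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

info = parse_meminfo(SAMPLE)
print("available MiB:", info["MemAvailable"], "free swap MiB:", info["SwapFree"])
```

On a real machine, pass `open("/proc/meminfo").read()` instead of `SAMPLE`.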

2. Use Docker to compile TensorFlow


Refer directly to Google's official Chinese documentation:
https://tensorflow.google.cn/install/docker?hl=zh-cn

Pay special attention here: the devel tag is the latest environment for building TensorFlow from source, and it supports CPU-only packages.
It does not include an installed TensorFlow library, and it is very large.

docker pull tensorflow/tensorflow:devel
docker pull tensorflow/tensorflow:devel-gpu
docker run --name tfbuild -itd -v $PWD:/mnt \
    -e HOST_PERMS="$(id -u):$(id -g)" tensorflow/tensorflow:devel

Then log in to the container to run the build. This can be done step by step by following the instructions on the official website.

docker exec -it tfbuild bash

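# Accepting the default values is fine.
# First switch to the 2.11.0 tag; the default is master, which is the development version, not an official release.
# Switch the checked-out code; master is the development version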

cd /tensorflow_src/
# This is a large project with many branches, so be patient.
git fetch --tags
git checkout v2.11.0

# git status
HEAD detached at v2.11.0

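# Just accept the defaults for every prompt.
./configure  
# Limit memory to 8 GB; setting it too high gets the build killed
# https://github.com/tensorflow/tensorflow/issues/41480

# Note the two slashes (//)! In Bazel, // marks a target label relative to the workspace root. The script behind this target is: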
# /tensorflow_src/tensorflow/tools/pip_package/build_pip_package.sh
# 
bazel build --config=mkl --local_ram_resources=8000  --local_cpu_resources=4 //tensorflow/tools/pip_package:build_pip_package

# create package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /mnt/out  

chown $HOST_PERMS /mnt/out/tensorflow-version-tags.whl
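The final `chown` step hands the wheel back to the host user whose `uid:gid` was passed in through `HOST_PERMS`. As a minimal sketch (not from the original post), the same step can be done for every wheel in the output directory without knowing the exact file name. The helper names `parse_host_perms` and `chown_wheels` are hypothetical, and `/mnt/out` is assumed to be the output directory used above.

```python
import os


def parse_host_perms(value):
    """Split a 'uid:gid' string, as produced by "$(id -u):$(id -g)"."""
    uid, gid = value.split(":")
    return int(uid), int(gid)


def chown_wheels(out_dir, host_perms):
    """chown every .whl file in out_dir to 'uid:gid'; return the paths changed."""
    uid, gid = parse_host_perms(host_perms)
    changed = []
    for name in sorted(os.listdir(out_dir)):
        if name.endswith(".whl"):
            path = os.path.join(out_dir, name)
            os.chown(path, uid, gid)
            changed.append(path)
    return changed


if __name__ == "__main__":
    # HOST_PERMS is set by the `docker run -e HOST_PERMS=...` command above.
    perms = os.environ.get("HOST_PERMS")
    if perms and os.path.isdir("/mnt/out"):
        print(chown_wheels("/mnt/out", perms))
```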

This completes the build optimized for the AMD CPU; the .whl file is the binary package. Then install it directly in the container and use it.
Once the build starts, expect a long wait.

Because the official Python site offers no Ubuntu-based image, simply reuse the TensorFlow image: uninstall any existing TensorFlow first, then install the new wheel.

The build fails with this error:

# bazel build --config=opt --local_ram_resources=8000 //tensorflow/tools/pip_package:build_pip_package
Extracting Bazel installation...

Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=150
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false
INFO: Reading rc options for 'build' from /tensorflow_src/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python3 --action_env PYTHON_LIB_PATH=/usr/lib/python3/dist-packages --python_path=/usr/bin/python3
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /tensorflow_src/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /tensorflow_src/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:opt in file /tensorflow_src/.tf_configure.bazelrc: --copt=-Wno-sign-compare --host_copt=-Wno-sign-compare
INFO: Found applicable config definition build:linux in file /tensorflow_src/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-unknown-warning --copt=-Wno-array-parameter --copt=-Wno-stringop-overflow --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /tensorflow_src/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/4ce3e4da2e21ae4dfcee9366415e55f408c884ec.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
WARNING: Download from https://mirror.bazel.build/github.com/bazelbuild/rules_cc/archive/081771d4a0e9d7d3aa0eed2ef389fa4700dfb23e.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/openxla/stablehlo/archive/fdd47908468488cbbb386bb7fc723dc19321cb83.zip failed: class java.io.FileNotFoundException GET returned 404 Not Found
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/XNNPACK/archive/e8f74a9763aa36559980a0c2f37f587794995622.zip failed: class java.io.FileNotFoundException GET returned 404 Not Found
WARNING: Download from https://golang.org/dl/?mode=json&include=all failed: class java.io.IOException connect timed out
INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (544 packages loaded, 31179 targets configured).
INFO: Found 1 target...
ERROR: /tensorflow_src/tensorflow/core/kernels/BUILD:3680:18: Compiling tensorflow/core/kernels/cwise_op_greater.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 248 arguments skipped)
gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 28285.593s, Critical Path: 21597.74s
INFO: 6601 processes: 1056 internal, 5545 local.
FAILED: Build did NOT complete successfully
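In a log this long, the decisive lines are the `Compiling ... failed` error and the `Killed signal terminated program cc1plus` message that follows it. A minimal sketch, not part of the original post, for pulling those pairs out of a saved Bazel log; the function name `find_killed_compiles` is hypothetical, and the sample text is abbreviated from the log above.

```python
import re

# Message GCC prints when one of its subprocesses is terminated by a signal.
KILLED_RE = re.compile(r"fatal error: Killed signal terminated program (\S+)")
# Bazel's "Compiling <file> failed" error line.
FAILED_RE = re.compile(r"Compiling (\S+) failed")


def find_killed_compiles(log_text):
    """Return (source_file, killed_program) pairs found in a Bazel build log."""
    hits = []
    last_source = None
    for line in log_text.splitlines():
        failed = FAILED_RE.search(line)
        if failed:
            last_source = failed.group(1)
        killed = KILLED_RE.search(line)
        if killed:
            hits.append((last_source, killed.group(1)))
    return hits


LOG = """ERROR: /tensorflow_src/tensorflow/core/kernels/BUILD:3680:18: Compiling tensorflow/core/kernels/cwise_op_greater.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc ...
gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
"""

print(find_killed_compiles(LOG))
```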

Later I found that this was caused by insufficient swap space during compilation. The following command builds successfully:

time bazel build -c opt //tensorflow/tools/pip_package:build_pip_package --local_ram_resources=4000 --local_cpu_resources=4


# bazel build -c opt //tensorflow/tools/pip_package:build_pip_package --local_ram_resources=4000 --local_cpu_resources=4
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=140
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false
INFO: Reading rc options for 'build' from /tensorflow_src/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python3 --action_env PYTHON_LIB_PATH=/usr/lib/python3/dist-packages --python_path=/usr/bin/python3
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /tensorflow_src/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /tensorflow_src/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:linux in file /tensorflow_src/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-unknown-warning --copt=-Wno-array-parameter --copt=-Wno-stringop-overflow --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /tensorflow_src/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/4ce3e4da2e21ae4dfcee9366415e55f408c884ec.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/openxla/stablehlo/archive/fdd47908468488cbbb386bb7fc723dc19321cb83.zip failed: class java.io.FileNotFoundException GET returned 404 Not Found
WARNING: Download from https://mirror.bazel.build/github.com/bazelbuild/rules_cc/archive/081771d4a0e9d7d3aa0eed2ef389fa4700dfb23e.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/XNNPACK/archive/e8f74a9763aa36559980a0c2f37f587794995622.zip failed: class java.io.FileNotFoundException GET returned 404 Not Found
WARNING: Download from https://golang.org/dl/?mode=json&include=all failed: class java.io.IOException connect timed out
INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (544 packages loaded, 31179 targets configured).
INFO: Found 1 target...
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 12172.184s, Critical Path: 319.39s
INFO: 13069 processes: 1519 internal, 11550 local.
INFO: Build completed successfully, 13069 total actions


real	97m39.864s
user	0m0.620s
sys	0m0.355s
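The two flags that made the difference are `--local_ram_resources` (in MB) and `--local_cpu_resources`. As a rough sketch, not from the original post, the command line can be assembled from those two numbers so that different limits are easy to try. The helper name `bazel_build_args` is hypothetical; the target and flags are the ones used in the successful run above.

```python
import shlex

PIP_TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def bazel_build_args(ram_mb, cpus, target=PIP_TARGET):
    """Build the argv list for the `bazel build` invocation used in this post."""
    if ram_mb <= 0 or cpus <= 0:
        raise ValueError("ram_mb and cpus must be positive")
    return [
        "bazel", "build", "-c", "opt", target,
        "--local_ram_resources=%d" % ram_mb,
        "--local_cpu_resources=%d" % cpus,
    ]


# Print the command instead of running it; pass the list to subprocess.run to execute.
print(shlex.join(bazel_build_args(4000, 4)))
```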

3. The build finally completes. Along the way it was killed several times, most likely because of insufficient memory and CPU. Next, package and install it.


After 97 minutes of compilation it is finally done; next, package it:

# Package the build into an installable file
# ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /mnt/out  

  check.warn(importable)
/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
Mon Mar 6 06:12:35 UTC 2023 : === Output wheel file is in: /mnt/out

Test whether the tf library is available:

   python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

Before installing:
    raise ImportError("Could not import tensorflow. Do not import tensorflow "
After installing:
mkdir -p /root/.pip/

# Add a pip mirror, then install tf
echo "[global]" > ~/.pip/pip.conf
echo "index-url = https://mirrors.aliyun.com/pypi/simple/" >> ~/.pip/pip.conf
echo "[install]" >> ~/.pip/pip.conf
echo "trusted-host=mirrors.aliyun.com" >> ~/.pip/pip.conf

# Install the compiled tf .whl file and its dependencies directly:
pip3 install tensorflow-2.11.0-cp38-cp38-linux_x86_64.whl 
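The wheel's file name encodes what it was built for: `cp38` is the Python and ABI tag, and `linux_x86_64` is the platform, so pip will only install it under a matching interpreter. A small sketch, not from the original post, that splits the name following the wheel naming convention from PEP 427; the helper names `parse_wheel_name` and `matches_interpreter` are hypothetical.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, platform = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, py_tag, abi_tag, platform = parts
    else:
        raise ValueError("unexpected wheel name: " + filename)
    return {"name": name, "version": version, "build": build,
            "python": py_tag, "abi": abi_tag, "platform": platform}


def matches_interpreter(py_tag, version_info=sys.version_info):
    """True if a CPython tag such as 'cp38' matches the given interpreter version."""
    return py_tag == "cp%d%d" % (version_info[0], version_info[1])


tags = parse_wheel_name("tensorflow-2.11.0-cp38-cp38-linux_x86_64.whl")
print(tags)
print("matches this interpreter:", matches_interpreter(tags["python"]))
```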


python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2023-03-06 06:26:01.652690: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-06 06:26:03.657543: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
tf.Tensor(-155.91696, shape=(), dtype=float32)
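The `cpu_feature_guard` line changed between the stock wheel (`AVX2 FMA`) and the rebuilt one (`SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA`). A short sketch, not from the original post, that extracts the instruction list from such a line so the two runs can be compared. The function name `listed_instructions` is hypothetical, and the sample lines are shortened from the logs in this article. How to read the difference is left open here; the message's own wording ("To enable them in other operations, rebuild ...") is the best hint.

```python
MARKER = "CPU instructions in performance-critical operations:"


def listed_instructions(log_line):
    """Return the instruction-set names at the end of a cpu_feature_guard line."""
    _, sep, tail = log_line.partition(MARKER)
    return tail.split() if sep else []


stock = ("This TensorFlow binary is optimized with oneAPI Deep Neural Network Library "
         "(oneDNN) to use the following CPU instructions in performance-critical "
         "operations:  AVX2 FMA")
rebuilt = ("This TensorFlow binary is optimized with oneAPI Deep Neural Network Library "
           "(oneDNN) to use the following CPU instructions in performance-critical "
           "operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA")

# Instructions that appear only in the rebuilt wheel's message.
only_in_rebuilt = sorted(set(listed_instructions(rebuilt)) - set(listed_instructions(stock)))
print(only_in_rebuilt)
```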

4. Summary


https://u.jd.com/IsILuAE

I plan to find a cheap host machine, swap out its system disk, and use it for model training. When the configuration needs an upgrade, I will sell it and buy a new one.
Why choose such a cheap computer? Mainly for the 12 GB of video memory on the RTX 3060.
Several of these machines could be bought and combined into a TensorFlow cluster for training. The main reason is simply a tight budget.

TensorFlow's devel image supports building from source, which is very convenient.
Bazel is fully configured inside the image; as long as the memory and CPU limits are set, just wait patiently for the build to finish:
--local_ram_resources=4000
--local_cpu_resources=4
The resulting wheel can be installed directly in the current image.
