xgboost multi-threading: fixing the default thread count, which equals the number of CPU cores

environment

python 3.6
xgboost 1.0.1

symptom

On a 48-core server, simply importing xgboost, before any training starts, brings the thread count of the process to 48.
Code:

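import time
import xgboost  # the import alone is enough to trigger the extra threads

if __name__ == '__main__':
    # Sleep so there is time to inspect the process from outside.
    print("sleep started")
    time.sleep(15)
    print("sleep finished")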

The script runs inside a Docker container started from an image, and the thread count is read from /proc/<pid>/status on Linux:

pid=`docker top fad7c792ccf35b65ddd | grep test.py |  awk '{print $2}'`;cat /proc/$pid/status  

The number of threads queried is 48
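The same check can be done from Python instead of the shell. The following is a minimal sketch, assuming a Linux system where /proc/<pid>/status contains a "Threads:" line; the helper names parse_thread_count and thread_count are made up for this illustration.

```python
import os


def parse_thread_count(status_text):
    """Return the integer on the 'Threads:' line of a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def thread_count(pid=None):
    """Read the thread count of a process (the current one by default). Linux only."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as fh:
        return parse_thread_count(fh.read())
```

Calling thread_count() once before and once after `import xgboost` in the same process should make the jump visible without leaving the interpreter.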

principle

In XGBoost, single-machine multi-threading is implemented with OpenMP rather than with explicit pthreads. This is probably because the multi-threaded logic in XGBoost is relatively simple and needs no complicated synchronization between threads, so OpenMP supports it well and also reduces the development and maintenance burden.

OpenMP

OpenMP is short for Open Multi-Processing. It is a cross-platform API for multi-threaded parallel programming on shared memory.
An existing program does not need significant source changes: you only add special pragmas that state your intent, and the compiler parallelizes the marked code, adding synchronization, mutual exclusion and communication where necessary.
For example: #pragma omp parallel for

solution

OMP_NUM_THREADS

For a program compiled with OpenMP support and linked against the OpenMP library, code marked with #pragma omp runs by default with as many threads as the machine has CPU cores. The number of threads can be controlled by setting the environment variable OMP_NUM_THREADS.
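The rule just described can be written down as a small function. This is only a rough sketch of the decision, not the OpenMP runtime's actual logic: it assumes os.cpu_count() is a fair stand-in for "number of CPU cores", it ignores factors such as CPU affinity that a real runtime may take into account, and the name expected_openmp_threads is made up for this illustration.

```python
import os


def expected_openmp_threads(environ=os.environ):
    """Estimate the default OpenMP thread count from the environment.

    Simplified: uses OMP_NUM_THREADS if it holds a positive integer,
    otherwise falls back to the CPU count reported by Python.
    """
    raw = environ.get("OMP_NUM_THREADS", "").strip()
    if raw:
        # The variable may be a comma-separated list (one value per nesting
        # level); only the first value applies to the outermost region.
        first = raw.split(",")[0].strip()
        if first.isdigit() and int(first) > 0:
            return int(first)
    return os.cpu_count() or 1
```

On the 48-core server from this post, with the variable unset, the estimate would match the 48 threads observed in /proc.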

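In Python, the environment variable OMP_NUM_THREADS can be set as follows. Set it before importing xgboost, so that the OpenMP runtime sees the value when it initializes:

import os
# Must run before "import xgboost".
os.environ['OMP_NUM_THREADS'] = "1"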

xgboost multithreading

For the scikit-learn wrappers XGBClassifier and XGBModel, the number of threads is controlled by setting n_jobs.

For the native xgboost API, the number of threads is controlled through nthread.

If OMP_NUM_THREADS is not set and only n_jobs or nthread is configured, the 48 threads created at import time are still there, and the thread count is then adjusted starting from that 48-thread baseline.
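Putting the two controls together might look like the sketch below. It is an illustration under assumptions rather than a recommended setup: the budget of 4 threads, the "binary:logistic" objective and the helper names native_params, sklearn_kwargs and build_classifier are all chosen for this example. xgboost is imported inside the function so that the environment variable is already set by the time the library loads.

```python
import os

N_THREADS = 4  # hypothetical thread budget for this example

# Cap the OpenMP runtime first, before xgboost is imported anywhere.
os.environ["OMP_NUM_THREADS"] = str(N_THREADS)


def native_params(n_threads):
    """Parameter dict for the native API, e.g. xgboost.train(params, dtrain)."""
    return {"objective": "binary:logistic", "nthread": n_threads}


def sklearn_kwargs(n_threads):
    """Keyword arguments for the scikit-learn wrappers such as XGBClassifier."""
    return {"n_jobs": n_threads}


def build_classifier(n_threads=N_THREADS):
    """Create an XGBClassifier limited to n_threads threads."""
    import xgboost as xgb  # imported late, after OMP_NUM_THREADS is set
    return xgb.XGBClassifier(**sklearn_kwargs(n_threads))
```

With this ordering the process should not start from the 48-thread baseline described above, since the OpenMP cap is already in place when xgboost loads.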

Origin blog.csdn.net/qq_33873431/article/details/108362471