Preface
In PyTorch, we often need to measure the running time of a code segment, a class, a function, or an operator in order to locate the speed bottleneck of a model. This article introduces some common methods.
Contact:
E-mail: [email protected]
QQ: 973 926 198
GitHub: https://github.com/FesianXu
Generally speaking, there are two major ways to measure a code segment:
- timeit: record the start and end times of the code and take the difference.
- profile: use a profiling tool, either built into PyTorch or third-party.
timeit
The basic pattern looks like the following code:
import time
begin = time.perf_counter()  # time.clock() is deprecated and was removed in Python 3.8
run_main_code()
end = time.perf_counter()
print(end - begin)  # elapsed time in seconds
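If the same pattern is needed in many places, wrapping it in a decorator keeps the call sites clean. Below is a minimal sketch using time.perf_counter; the decorator name timed and the workload function are my own illustrative choices, not part of any library:

```python
import time
from functools import wraps

def timed(fn):
    """Illustrative decorator: print how long fn takes (wall clock)."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        begin = time.perf_counter()
        result = fn(*args, **kwargs)
        end = time.perf_counter()
        print(f"{fn.__name__}: {end - begin:.6f} s")
        return result
    return wrapper

@timed
def run_main_code():
    # stand-in workload for demonstration
    return sum(i * i for i in range(100_000))

run_main_code()
```

This only avoids repetition; it does not fix the GPU-asynchrony problem discussed next.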
However, PyTorch code often runs on the GPU, and GPU operations are asynchronous: a kernel launch returns control to Python before the kernel has actually finished, so the naive timeit approach cannot capture the real runtime. We therefore generally need PyTorch's built-in timing and synchronization tools. The code looks like [1]:
import torch

x = torch.randn(64, 64, device="cuda")
y = torch.randn(64, 64, device="cuda")
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
z = x + y
end.record()
# Waits for everything to finish running
torch.cuda.synchronize()
print(start.elapsed_time(end))  # in milliseconds
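A single measurement like this is noisy, and the first call often includes one-off CUDA initialization cost. A common pattern is to warm up first and average over many iterations. Below is a sketch; the helper name cuda_time_ms and its warmup/iters parameters are my own, and it falls back to wall-clock timing when no GPU is available:

```python
import time
import torch

def cuda_time_ms(fn, warmup=10, iters=100):
    """Illustrative helper: average runtime of fn, in milliseconds."""
    for _ in range(warmup):  # warm-up so one-off initialization is not measured
        fn()
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        torch.cuda.synchronize()  # wait for all queued kernels to finish
        return start.elapsed_time(end) / iters  # elapsed_time is in ms
    # CPU fallback: plain wall-clock timing
    begin = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - begin) * 1000 / iters

x = torch.randn(512, 512)
y = torch.randn(512, 512)
if torch.cuda.is_available():
    x, y = x.cuda(), y.cuda()
print(f"{cuda_time_ms(lambda: x @ y):.4f} ms per matmul")
```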
profile
The timeit approach is barely workable for small snippets, but it becomes very cumbersome in large-scale testing: timing code has to be inserted around every piece you care about. Decorators can reduce some of that boilerplate, but this is still not the best solution.
Fortunately, PyTorch comes with a tool that measures how long each part of a model takes, on CPU or GPU, which is very practical. This tool is called profile [3] and lives in torch.autograd; its usage is very simple:
import torch

x = torch.randn((1, 1), requires_grad=True)
with torch.autograd.profiler.profile(enabled=True) as prof:
    for _ in range(100):  # any normal python code, really!
        y = x ** 2
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
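The table can be sorted by other columns, and the profiler can also record GPU time if asked to. Below is a sketch assuming the legacy torch.autograd.profiler API (the use_cuda flag and the row_limit argument of table() exist there, but newer PyTorch versions steer users toward torch.profiler instead, so check your version):

```python
import torch

x = torch.randn((1, 1), requires_grad=True)
# record CUDA kernel times as well, when a GPU is present
with torch.autograd.profiler.profile(use_cuda=torch.cuda.is_available()) as prof:
    for _ in range(100):
        y = x ** 2

# "cuda_time_total" only carries data when use_cuda=True actually took effect
key = "cuda_time_total" if torch.cuda.is_available() else "self_cpu_time_total"
print(prof.key_averages().table(sort_by=key, row_limit=10))
```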
References
[1]. https://discuss.pytorch.org/t/how-to-measure-time-in-pytorch/26964/2
[2]. https://blog.csdn.net/LoseInVain/article/details/82055524
[3]. https://pytorch.org/docs/stable/autograd.html?highlight=autograd%20profiler#torch.autograd.profiler.profile