3D CNN false positive model debugging (2)

PS: This 3D CNN has already driven me crazy; it is far too slow, beyond my tolerance. With the data from the second method in the previous post, more than 60 hours only got through 1/5 of an epoch (equivalent to about 6 hours per epoch on 1/50 of the data).

1. Simplify the input data of the second method down to the simpler first method and take only 1/50 of the previous data. It is still very slow: only 1.5 epochs ran in 12 hours (equivalent to about 8 hours per epoch, which is even slower than the above). I can only calm down, look at where the speed is being limited, and try to tune it to speed things up. My feeling is that data reading and writing is what makes it so slow.
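To confirm that suspicion, a rough way is to time the data-loading step and the GPU step separately inside the training loop. Below is a minimal sketch, not code from the previous post: train_loader, model, criterion and optimizer are placeholder names for the objects in the actual training script, and volumes/labels are just assumed batch names.

import time
import torch

# train_loader, model, criterion, optimizer are assumed to already exist
# in the training script; only the timing logic is shown here.
data_time, gpu_time = 0.0, 0.0
t0 = time.time()
for step, (volumes, labels) in enumerate(train_loader):
    data_time += time.time() - t0          # cumulative time spent waiting on the DataLoader

    t1 = time.time()
    volumes, labels = volumes.cuda(), labels.cuda()
    loss = criterion(model(volumes), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()               # wait for the GPU so the timing is honest
    gpu_time += time.time() - t1           # cumulative time spent on compute

    if step % 50 == 0:
        print("step %d: data %.1fs (total), gpu %.1fs (total)" % (step, data_time, gpu_time))
    t0 = time.time()

If data_time keeps growing much faster than gpu_time, the bottleneck really is reading/preprocessing rather than the 3D convolutions themselves.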

 

2. First, look at the CPU situation following the instructions in this blog: https://www.cnblogs.com/bugutian/p/6138880.html

Check CPU info (model)
pacs@pacs-Z170X-UD3:~$ cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
      8  Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Check the number of physical CPUs
pacs@pacs-Z170X-UD3:~$ cat /proc/cpuinfo| grep "physical id"| sort| uniq| wc -l
1
Check the number of cores per physical CPU (i.e. the core count)
pacs@pacs-Z170X-UD3:~$ cat /proc/cpuinfo| grep "cpu cores"| uniq
cpu cores    : 4
Check the number of logical CPUs
pacs@pacs-Z170X-UD3:~$ cat /proc/cpuinfo| grep "processor"| wc -l
8
Total physical cores = number of physical CPUs * cores per physical CPU
Total logical CPUs = number of physical CPUs * cores per physical CPU * hyperthreads per core

From the output above, this machine has one physical CPU with 1 * 4 = 4 physical cores, and each physical core runs 2 hyperthreads, so there are 8 logical CPUs.
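The same logical-CPU count can also be read from Python, which is handy later when choosing num_workers (standard library only):

import os
import multiprocessing

print(os.cpu_count())               # 8 logical CPUs on this machine
print(multiprocessing.cpu_count())  # same value, via multiprocessing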

Now let's take a look at my own CPU and GPU.

CPU:

top - 11:29:53 up 17:25,  1 user,  load average: 15.53, 15.42, 15.22
Tasks: 292 total,   1 running, 289 sleeping,   0 stopped,   2 zombie
%Cpu(s):  8.3 us,  4.0 sy,  0.0 ni, 33.6 id, 54.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 49409720 total,  2158676 free, 14970332 used, 32280712 buff/cache
KiB Swap: 15625212 total, 15476768 free,   148444 used. 33094040 avail Mem 

From the top output above, the load average is very high, reaching about 15.

Also, 33.6 id is the percentage of idle CPU time and 54.1 wa is the percentage of CPU time spent waiting for I/O; both the idle and the I/O-wait percentages are far too high.

The 32280712 KiB of memory used for buff/cache is also huge.

 

3. The command "grep -c 'model name' /proc/cpuinfo" directly returns the total number of logical CPUs.

pacs@pacs-Z170X-UD3:~$ grep 'model name' /proc/cpuinfo | wc -l
8

This confirms that 8 logical cores are in use (per point 2: one physical CPU, 4 physical cores per CPU, two hyperthreads per core, so 1 CPU with 8 logical cores in total).

In terms of system load, a multi-core CPU behaves much like multiple CPUs, so when judging the load you must take into account how many CPUs the machine has and how many cores each CPU has. Divide the load average by the total number of cores; as long as the load per core does not exceed 1.0, the machine is operating normally.

Is 1.0 the ideal value for system load?
Not necessarily; system administrators usually leave some headroom and start paying attention once the value reaches 0.7. The rule of thumb is:
when the load per core stays above 0.7, start investigating where the problem is before it gets worse;
when the load per core stays above 1.0, start looking for a way to bring the value back down;
when the load per core reaches 5.0, the system is in serious trouble: unresponsive for long stretches or close to crashing. It should never be allowed to get that far.

Reference: https://blog.csdn.net/jackliu16/article/details/79382993

My computer has 8 logical cores, so a load of around 5.6 (8 * 0.7) is appropriate, and 8 is about the upper limit. It has now reached roughly 15, so the machine is badly congested.
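The same check can be scripted with the standard library; a small sketch using the 0.7 and 1.0 rules of thumb quoted above:

import os

load_1min, load_5min, load_15min = os.getloadavg()   # same numbers as top
cores = os.cpu_count()                               # 8 logical cores here

per_core = load_1min / cores
print("load per core: %.2f" % per_core)              # ~15 / 8 = 1.9 on this machine
if per_core > 1.0:
    print("overloaded: tasks are queuing up")
elif per_core > 0.7:
    print("getting busy, start investigating")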

Therefore, reset num_workers so that its value falls between 5.6 and 8, i.e. 6 to 8 worker processes (see the sketch after the parameter list below).

Posting it again; the DataLoader constructor is defined as follows:

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, num_workers=0, collate_fn=default_collate, pin_memory=False, drop_last=False)

1. dataset: the dataset to load from (a Dataset object)
2. batch_size: batch size
3. shuffle: whether to shuffle the data
4. sampler: the sampling strategy, described in detail later
5. num_workers: the number of worker processes for multi-process loading; 0 means no multi-processing
6. collate_fn: how to collate multiple samples into one batch; the default collation is usually fine
7. pin_memory: whether to put the data into pinned (page-locked) memory; data in pinned memory transfers to the GPU faster
8. drop_last: the number of samples in the dataset may not be an integer multiple of batch_size; if drop_last is True, the last incomplete batch is dropped
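Putting this together with the load analysis above, the loader could be built roughly as follows; this is only a sketch, where train_dataset is a placeholder name for the Dataset object in the training script and batch_size=4 is an illustrative value, not the exact setting of the previous post.

import os
from torch.utils.data import DataLoader

# train_dataset is the Dataset object from the training script (placeholder name).
num_workers = min(8, os.cpu_count())   # stay at or below the 8 logical cores

train_loader = DataLoader(
    train_dataset,
    batch_size=4,              # illustrative value only
    shuffle=True,
    num_workers=num_workers,   # multi-process loading so the GPU is not starved
    pin_memory=True,           # pinned memory transfers to the GPU faster
    drop_last=True,            # drop the last incomplete batch
)

If the load average still sits far above the core count afterwards, the 54.1 wa seen in top suggests the per-sample reading and preprocessing itself also needs attention, not just the worker count.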