Pushing scp transfer speed to the limit

First, read these two articles by an expert on improving scp transfer speed:

Use tar+lz4/pigz+ssh for faster data transfer:

http://www.orczhou.com/index.php/2013/11/tranfer-data-faster-on-the-fly/

Speed up scp transfers:

http://www.orczhou.com/index.php/2013/11/make-scp-faster-with-cipher-and-compression/

[root@mysql141 binlog]# time scp -r mysql-bin.000214 172.25.2.142:/opt/soft/test_scp/
mysql-bin.000214                                                                 100% 1024MB 126.0MB/s   00:08    
real    0m8.331s
user    0m6.597s
sys     0m2.721s
[root@mysql141 binlog]# time scp -r -c aes192-cbc mysql-bin.000214 172.25.2.142:/opt/soft/test_scp/
mysql-bin.000214                                                                 100% 1024MB 177.4MB/s   00:05    
real    0m5.959s
user    0m4.444s
sys     0m2.825s
[root@mysql141 binlog]# time tar -c mysql-bin.000214|lz4 -B4|ssh -c aes192-cbc 172.25.2.142 "lz4 -d |tar -xC /opt/soft/test_scp"
using blocks of size 64 KB 
real    0m4.024s
user    0m5.049s
sys     0m2.634s
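
The -c aes192-cbc option only helps if the ssh installation still offers that cipher. As a hedged aside, the ciphers known to the local OpenSSH client can be listed with ssh -Q cipher. The helper name has_cipher below is hypothetical (it is not from the articles above) and simply checks a list read from standard input. This reflects the client only, not what the remote server accepts.

```shell
#!/bin/bash
# has_cipher (hypothetical helper, not from the article): succeeds when the
# cipher name in $1 matches a whole line of the list read from standard input.
has_cipher() {
  grep -qx -- "$1"
}

# Typical use (needs an OpenSSH client; lists client-side ciphers only):
#   ssh -Q cipher | has_cipher aes192-cbc && echo "aes192-cbc is available"
```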


With some free time on my hands, I wondered: scp transfers files serially, so could running it in parallel increase the speed further? I ran the following experiment.

Read speed on 192.168.11.81:

dd if=20210331.txt of=/dev/null bs=1M count=1000

1.8 GB/s

scp transfer:

scp -c aes192-cbc 20210331.txt 192.168.11.82:~/

20210331.txt                     100% 1000MB 114.2MB/s   00:08

Write speed on 192.168.11.82:

dd if=/dev/zero of=20210331.txt bs=1M count=1000

892 MB/s

Overall process:

Disk read -------> scp transfer -------> Disk write

1.8 GB/s           114.2 MB/s            892 MB/s

The bottleneck is still the scp transfer. The iperf tests below show that 114.2 MB/s is well short of the available bandwidth.

ethtool eth0|grep Speed

 Speed: 1000Mb/s 

iperf -c 192.168.11.82 -t 60 -f M

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-60.0 sec  28010 MBytes   467 MBytes/sec

iperf -c 192.168.11.82 -P 10 -t 60 -f M 

[ ID] Interval       Transfer     Bandwidth

[SUM]  0.0-60.0 sec  29660 MBytes   494 MBytes/sec
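
To put the scp figure next to the iperf result, the ratio can be worked out directly. This is only a sketch: the helper name pct_of is hypothetical, and the numbers are the ones reported above. Note that the ethtool line reports 1000 Mb/s for eth0, which is about 125 MB/s, so the iperf run presumably went over a different or aggregated interface. The sketch simply takes the iperf figure as given.

```shell
#!/bin/bash
# pct_of USED TOTAL: print USED as a percentage of TOTAL, one decimal place.
pct_of() {
  awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used / total * 100 }'
}

pct_of 114.2 467    # single scp stream vs. single iperf stream, both in MB/s
```

By this measure a single scp stream uses roughly a quarter of what one iperf stream achieved.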

Next, I tried to push the scp transfer itself as fast as possible by comparing serial and parallel execution.

Create the test data, 10 files of 1000 MB each:

for i in {1..10}
do
 dd if=/dev/zero of=${i}_20210331.txt bs=1M count=1000
done

Conventional serial execution:

time scp -r -c aes192-cbc *_20210331.txt 192.168.11.82:~/
10_20210331.txt                   100% 1000MB  70.1MB/s   00:14    
1_20210331.txt                    100% 1000MB  75.7MB/s   00:13    
2_20210331.txt                    100% 1000MB  83.3MB/s   00:12    
3_20210331.txt                    100% 1000MB  76.6MB/s   00:13    
4_20210331.txt                    100% 1000MB  78.1MB/s   00:12    
5_20210331.txt                    100% 1000MB  76.6MB/s   00:13    
6_20210331.txt                    100% 1000MB  76.8MB/s   00:13    
7_20210331.txt                    100% 1000MB  83.2MB/s   00:12    
8_20210331.txt                    100% 1000MB  77.3MB/s   00:12    
9_20210331.txt                    100% 1000MB  83.3MB/s   00:12

real    2m11.144s

user    1m2.778s

sys     1m20.256s

Next, write a parallel execution script, parallel_scp.sh. Its usage and contents are as follows:

# Arguments, in order: file name (wildcards supported), target IP, target path, number of concurrent scp processes

sh parallel_scp.sh '*_20210331.txt' 192.168.11.82 '/root/' 10

cat parallel_scp.sh
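#!/bin/bash
# Positional arguments: file pattern, target host, target directory, max concurrent scp processes
fileName=$1
remoteHost=$2
remoteDir=$3
maxScp=$4
# Expand the quoted wildcard into a list of files, one per line
ls ${fileName} > filelist.lst
cat filelist.lst | while read line
do
# Count the running scp processes (column 8 of ps -ef is the command)
scpNum=`ps -ef | awk '{print $8}' | grep '^scp' | egrep -v 'grep|tail' | wc -l`
# Throttle: wait until the count drops below the limit
while [ ${scpNum} -ge ${maxScp} ]
do
  sleep 5
  scpNum=`ps -ef | awk '{print $8}' | grep '^scp' | egrep -v 'grep|tail' | wc -l`
done
# Write a one-line wrapper script for this file and run it in the background
scpFileName=`echo ${line}| awk '{print "scp_"$0".sh"}'`
echo "scp -r -c aes192-cbc ${line} ${remoteHost}:${remoteDir}" >  ${scpFileName}
time nohup sh ${scpFileName} > ${scpFileName}.out 2>&1 &
#echo "Current progress: nohup sh ${scpFileName}> ${scpFileName}.out &"
done
#echo "Transfer complete."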
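
The same bounded concurrency can also be sketched with xargs -P, which is available in both GNU and BSD xargs. This is an alternative to the polling loop in parallel_scp.sh, not the script used for the measurements in this article. The names parallel_copy and COPY_CMD are hypothetical and introduced only for this example.

```shell
#!/bin/bash
# Sketch: run at most $4 copies at a time using xargs -P.
# parallel_copy and COPY_CMD are hypothetical names, not part of the article.
parallel_copy() {
  local pattern=$1 remoteHost=$2 remoteDir=$3 maxJobs=$4
  # COPY_CMD defaults to the scp invocation used throughout this article
  local copyCmd=${COPY_CMD:-scp -r -c aes192-cbc}
  # $pattern and $copyCmd are left unquoted on purpose: the shell expands the
  # wildcard and splits the command into words (file names with spaces break this)
  printf '%s\n' $pattern |
    xargs -P "$maxJobs" -I{} $copyCmd {} "${remoteHost}:${remoteDir}"
}

# Usage, mirroring the arguments of parallel_scp.sh:
#   time parallel_copy '*_20210331.txt' 192.168.11.82 /root/ 10
```

Because xargs returns only after all of its child processes have finished, timing the call covers the whole transfer, and no ps polling or per-file wrapper scripts are needed.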


sh parallel_scp.sh '*_20210331.txt' 192.168.11.82 '/root/' 10

real    1m12.449s

user    0m5.911s

sys     0m5.150s

Both in theory and in the measured times (1m12s versus 2m11s), parallel scp is faster than serial scp.
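
As a rough worked comparison, the aggregate throughput of each run can be derived from the total data size and the reported real times. This sketch takes those times at face value; mb_per_s is a hypothetical helper name.

```shell
#!/bin/bash
# mb_per_s TOTAL_MB SECONDS: aggregate throughput in MB/s, one decimal place.
mb_per_s() {
  awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# 10 files x 1000 MB; real times 2m11.144s (serial) and 1m12.449s (parallel)
echo "serial:   $(mb_per_s 10000 131.144) MB/s"
echo "parallel: $(mb_per_s 10000 72.449) MB/s"
```

On these numbers the parallel run moves the same data at roughly 1.8 times the serial rate.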


All of the above assumes files of the same size. When file sizes differ, a batch that runs in parallel may consist entirely of large files or entirely of small files, so the transfer speed will be uneven. The next step is to order the files so that each parallel batch adds up to a similar total size.

ls -lS /root/*_20210331.txt | awk '{print $9}' > 123456.txt    # sorted by file size, largest first

123456.txt contains 100 lines:

/root/1_20210331.txt      (10 GB)

/root/2_20210331.txt      (9 GB)

....

/root/100_20210331.txt    (1 MB)


awk '{lines[NR]=$0} END{i=50; while(i>0){print lines[i];--i}}' 123456.txt > p1.txt    # lines 1-50 (the 50 largest files), in reverse order

awk '{lines[NR]=$0} END{j=51; while(j<101){print lines[j];++j}}' 123456.txt > p2.txt    # lines 51-100 (the 50 smallest files), in their original order

Method one:

paste -d "\n" p1.txt p2.txt > c.txt    # interleave the two files line by line; c.txt then lists the files in an order that evens out the sizes

Method two (the paste command may not be available on every Unix system):

Interleave the contents of the two files line by line:

line=$(wc -l < p1.txt)    # number of lines in each file (50 here)

for ((i=1;i<=$line;i++));do

cat p1.txt | tail -n +$i |head -n 1 >>c.txt    # extract line i of p1.txt and append it to c.txt

cat p2.txt | tail -n +$i |head -n 1 >>c.txt    # extract line i of p2.txt and append it to c.txt

done

Finally, use c.txt in place of the filelist.lst that the script generates, and then run parallel_scp.sh.
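
The p1.txt/p2.txt split and the line-by-line merge can also be done in a single awk pass. This is a minimal sketch, assuming the input is already sorted largest first as in 123456.txt. The name interleave_by_size is hypothetical, and the handling of an odd number of lines is an addition that the steps above do not cover.

```shell
#!/bin/bash
# interleave_by_size [FILE]: read a list sorted largest first and print it in
# the same order that paste produces from p1.txt and p2.txt.
interleave_by_size() {
  awk '{ lines[NR] = $0 }
       END {
         half = int(NR / 2)
         if (NR % 2) print lines[half + 1]   # odd count: middle entry goes first
         for (i = half; i >= 1; i--) {
           print lines[i]                    # from the larger half, walking up
           print lines[NR - i + 1]           # from the smaller half, walking down
         }
       }' "$@"
}

# Usage:
#   interleave_by_size 123456.txt > c.txt
```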


Summary:

1. Parallelism is used here purely to maximize scp speed, without regard to the performance impact on the hosts. Use it with caution in production.

2. For very large binary (data) files, consider transferring them with tar+lz4/pigz+ssh.

3. By analogy, the same approach can be applied to other transfer tools that run serially, such as ftp and sftp, with minor changes to the script.


Origin blog.51cto.com/wyzwl/2678452