Two articles worth reading on improving scp transfer speed:
Use tar+lz4/pigz+ssh for faster data transfer:
http://www.orczhou.com/index.php/2013/11/tranfer-data-faster-on-the-fly/
Make scp faster with cipher selection and compression:
http://www.orczhou.com/index.php/2013/11/make-scp-faster-with-cipher-and-compression/
[root@mysql141 binlog]# time scp -r mysql-bin.000214 172.25.2.142:/opt/soft/test_scp/
mysql-bin.000214    100% 1024MB 126.0MB/s   00:08
real    0m8.331s
user    0m6.597s
sys     0m2.721s
[root@mysql141 binlog]# time scp -r -c aes192-cbc mysql-bin.000214 172.25.2.142:/opt/soft/test_scp/
mysql-bin.000214    100% 1024MB 177.4MB/s   00:05
real    0m5.959s
user    0m4.444s
sys     0m2.825s
[root@mysql141 binlog]# time tar -c mysql-bin.000214 | lz4 -B4 | ssh -c aes192-cbc 172.25.2.142 "lz4 -d | tar -xC /opt/soft/test_scp"
using blocks of size 64 KB
real    0m4.024s
user    0m5.049s
sys     0m2.634s
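The tar-over-ssh pipeline above streams an archive through a compressor on the sender and a decompressor on the receiver. A minimal local sketch of the same pipe structure, using gzip as a stand-in for lz4/pigz and a local pipe in place of the ssh hop, so it runs without a remote host:

```shell
# Scratch source tree and destination directory (stand-ins for the real paths).
src=$(mktemp -d); dst=$(mktemp -d)
echo "hello binlog" > "$src/mysql-bin.test"

# Sender side: tar -> compressor.  Receiver side: decompressor -> tar.
# Over the network, the middle pipe would instead pass through:
#   ssh -c aes192-cbc 172.25.2.142 "lz4 -d | tar -xC /opt/soft/test_scp"
tar -C "$src" -c . | gzip -1 | gzip -d | tar -xC "$dst"
```

The key point is that compression, transfer, and extraction all overlap in one stream, so no intermediate archive ever touches disk.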
Since scp transfers files serially, I wondered whether running transfers in parallel could push the speed further, so I ran the following experiment:
Read speed on 192.168.11.81:
dd if=20210331.txt of=/dev/null bs=1M count=1000
1.8 GB/s
scp transmission:
scp -c aes192-cbc 20210331.txt 192.168.11.82:~/
20210331.txt 100% 1000MB 114.2MB/s 00:08
Write speed on 192.168.11.82:
dd if=/dev/zero of=20210331.txt bs=1M count=1000
892 MB/s
Overall pipeline:
disk read -------> scp transfer -------> disk write
1.8 GB/s           114.2 MB/s           892 MB/s
The bottleneck is clearly the scp transfer itself. An iperf test shows the link can carry far more than 114.2 MB/s:
ethtool eth0|grep Speed
Speed: 1000Mb/s
iperf -c 192.168.11.82 -t 60 -f M
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 28010 MBytes 467 MBytes/sec
iperf -c 192.168.11.82 -P 10 -t 60 -f M
[ ID] Interval Transfer Bandwidth
[SUM] 0.0-60.0 sec 29660 MBytes 494 MBytes/sec
Next, let's try to maximize scp throughput and compare serial against parallel transfers.
Create test data, 10 files of 1000 MB each:
for i in {1..10}; do dd if=/dev/zero of=${i}_20210331.txt bs=1M count=1000; done
Conventional serial execution:
time scp -r -c aes192-cbc *_20210331.txt 192.168.11.82:~/
10_20210331.txt    100% 1000MB  70.1MB/s   00:14
1_20210331.txt     100% 1000MB  75.7MB/s   00:13
2_20210331.txt     100% 1000MB  83.3MB/s   00:12
3_20210331.txt     100% 1000MB  76.6MB/s   00:13
4_20210331.txt     100% 1000MB  78.1MB/s   00:12
5_20210331.txt     100% 1000MB  76.6MB/s   00:13
6_20210331.txt     100% 1000MB  76.8MB/s   00:13
7_20210331.txt     100% 1000MB  83.2MB/s   00:12
8_20210331.txt     100% 1000MB  77.3MB/s   00:12
9_20210331.txt     100% 1000MB  83.3MB/s   00:12
real 2m11.144s
user 1m2.778s
sys 1m20.256s
Write a parallel execution script, parallel_scp.sh; its contents and invocation are as follows:
# Arguments, in order: file name (wildcards supported), target IP, target path, scp concurrency
sh parallel_scp.sh '*_20210331.txt' 192.168.11.82 '/root/' 10
cat parallel_scp.sh
#!/bin/bash
fileName=$1
remoteHost=$2
remoteDir=$3
maxScp=$4
ls ${fileName} > filelist.lst
cat filelist.lst | while read line
do
    # Count running scp processes; wait while at the concurrency cap.
    scpNum=`ps -ef | awk '{print $8}' | grep ^scp | egrep -v 'grep|tail' | wc -l`
    while [ ${scpNum} -ge ${maxScp} ]
    do
        sleep 5
        scpNum=`ps -ef | awk '{print $8}' | grep ^scp | egrep -v 'grep|tail' | wc -l`
    done
    # Generate a one-line wrapper script per file and launch it in the background.
    scpFileName=`echo ${line} | awk '{print "scp_"$0".sh"}'`
    echo "scp -r -c aes192-cbc ${line} ${remoteHost}:${remoteDir}" > ${scpFileName}
    time nohup sh ${scpFileName} > ${scpFileName}.out 2>&1 &
    #echo "Current progress: nohup sh ${scpFileName} > ${scpFileName}.out &"
done
#echo "Transfer complete."
sh parallel_scp.sh '*_20210331.txt' 192.168.11.82 '/root/' 10
real 1m12.449s
user 0m5.911s
sys 0m5.150s
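Most of the script above is bookkeeping to cap the number of concurrent scp processes; `xargs -P` can do the same job in one line. A sketch of that alternative, using a local `cp` as a stand-in for the scp command so it runs without a remote host (the scratch directories and file names here are made up for the demo):

```shell
# Scratch source and destination; in the real case the destination is remote.
src=$(mktemp -d); dst=$(mktemp -d)
for i in 1 2 3 4; do echo "data$i" > "$src/${i}_20210331.txt"; done

# Run up to 10 copies at once; each {} is one file name from the list.
# For a real transfer, replace the cp with something like:
#   scp -c aes192-cbc "$src/{}" 192.168.11.82:/root/
ls "$src" | xargs -P 10 -I {} cp "$src/{}" "$dst/{}"
```

`xargs -P` blocks until all children finish, so there is no need to poll `ps` for the process count.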
Both in theory and in this test, parallel scp finishes faster than serial (1m12s vs 2m11s).
The tests above all used files of identical size. When file sizes differ, a parallel batch might happen to contain all large files or all small files, making throughput uneven across batches. The next step is to group the files so that each parallel batch moves a similar total amount of data.
ls -lS /root/*_20210331.txt | awk '{print $9}' > 123456.txt  # list files sorted by size, largest first
Suppose 123456.txt contains 100 lines:
/root/1_20210331.txt --10G
/root/2_20210331.txt --9G
....
/root/100_20210331.txt --1MB
awk '{lines[NR]=$0} END{i=50; while(i>0){print lines[i];--i}}' 123456.txt > p1.txt  # first 50 lines (the 50 largest files) in reverse order
awk '{lines[NR]=$0} END{j=51; while(j<101){print lines[j];++j}}' 123456.txt > p2.txt  # lines 51-100 in their original order
Method one:
paste -d "\n" p1.txt p2.txt > c.txt  # interleave the two files line by line; c.txt now lists the files in size-balanced order
Method two (the paste command is not guaranteed to exist in every unix environment):
Interleave the contents of the two files line by line:
line=$(wc -l < p1.txt)
for ((i=1;i<=line;i++)); do
    tail -n +$i p1.txt | head -n 1 >> c.txt  # append the i-th line of p1.txt to c.txt
    tail -n +$i p2.txt | head -n 1 >> c.txt  # append the i-th line of p2.txt to c.txt
done
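Both methods produce the same interleaving. As an equivalent alternative (not from the original article), a single awk pass over the size-sorted list can emit the balanced order directly:

```shell
# Demo list: four files already sorted largest-first (f1 biggest .. f4 smallest).
# Emits lines n/2, n/2+1, n/2-1, n/2+2, ..., 1, n -- pairing large files with
# small ones.  Assumes an even line count, as in the 100-file example above.
balanced=$(printf 'f1\nf2\nf3\nf4\n' |
  awk '{l[NR]=$0} END{for(i=1;i<=NR/2;i++){print l[NR/2-i+1]; print l[NR/2+i]}}')
echo "$balanced"    # f2, f3, f1, f4 -- each adjacent pair has similar total size
```

In the real case the input would be 123456.txt and the output redirected to c.txt.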
Finally, use this c.txt as the filelist.lst in the script above, then call parallel_scp.sh as before.
Summary:
1. Parallelism is used here purely to maximize scp speed, without considering the performance impact; use with caution in a production environment.
2. For very large binary files (such as data files), consider using tar+lz4/pigz+ssh for the transfer.
3. By analogy, the same approach works for other tools that transfer serially, such as ftp and sftp, with minor changes to the script.