[Shell-Multiple Concurrency] Use Shell scripts to perform multiple concurrent operations in a loop

1) Method 1 (using for loop)

1.1. Case 1 (lack of wait leads to wrong execution sequence)

for i in `seq 1 10`
do
    sleep 1 &
    echo $i
done
echo "all wake up"

This example asks us to print "all wake up" only after all the commands in the for loop (sleep 1) have finished. If you run this script, you will find that this is not the case: the for loop does not wait for each sleep command to end, but moves on as soon as the command is submitted to the system, so "all wake up" is printed before any sleep has completed.

1.2. Case 2 (no throttling, so all tasks are submitted at once)

To meet the requirement, add a wait command before the echo "all wake up" command: wait blocks until all background tasks started above with & have finished before continuing.

for i in `seq 1 10`
do
    sleep 1 &
    echo $i
done
wait
echo "all wake up"

1.3. Case 3 (final version)

In the example above, the for loop puts every command into the background. Obviously this approach is not advisable when each command carries a relatively large overhead and the loop count is large. Instead, the loop should pause periodically: after submitting a certain number of tasks, wait for that batch to finish before submitting the next batch, and so on. Although this solution is not elegant, at least it avoids submitting too many background tasks to the system at once.

degree=4
for i in `seq 1 10`
do
    sleep 1 & # task submitted to the background
    echo $i
    [ `expr $i % $degree` -eq 0 ] && wait
done
wait # catch the tasks of a final partial batch

In the above example, the variable degree sets the degree of parallelism. The key to throttling the loop is the statement [ `expr $i % $degree` -eq 0 ] && wait: on the i-th iteration, if i modulo degree equals 0, the loop blocks and waits for the previous degree background tasks to finish before continuing, and so on.
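The batching behaviour can be seen with a shorter sleep (a minimal sketch; the 0.2-second sleep and the batch counter are illustrative stand-ins for real work):

```shell
#!/bin/sh
# Run 8 background tasks in batches of 4: each batch runs concurrently,
# and the loop blocks after every 4th submission until the batch finishes.
degree=4
batch=0
for i in `seq 1 8`
do
    sleep 0.2 &                               # stand-in for a real task
    if [ `expr $i % $degree` -eq 0 ]; then
        wait                                  # block until this batch is done
        batch=`expr $batch + 1`
        echo "batch $batch finished"
    fi
done
wait                                          # catch a final partial batch
echo "all wake up"
```

Because each batch of four sleeps runs concurrently, the whole script takes about two sleep durations rather than eight.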

2) Method 2 (using named pipes as task queues)

cosDbName=${cosDbName}
x8vDbName=${x8vDbName}
x5lTableName=${x5lTableName}
x8vTableName=${x8vTableName}
sampleDateFile=${sampleDateFile}
parallelism=${parallelism}
start_time=`date +%s`              # record the script's start time

[ -e /tmp/fd1 ] || mkfifo /tmp/fd1 # create the named pipe
exec 3<>/tmp/fd1                   # open file descriptor 3 for both reading (<) and writing (>) on the pipe file; fd 3 now has all the properties of the named pipe
rm -rf /tmp/fd1                    # the associated descriptor keeps those properties, so the pipe file itself can be deleted; we only need the descriptor from here on
for ((i=1;i<=${parallelism};i++))
do
        echo >&3                   # &3 refers to file descriptor 3; this writes one "token" into the pipe
done

for date in `cat /opt/corns/${sampleDateFile}`
do
read -u3
{
kinit -kt /opt/conf/x9e.keytab [email protected]

yarn jar /opt/corns/cos-distcp-1.12-3.1.0.jar -libjars /opt/corns/cos_api-bundle-5.6.69.jar,/opt/corns/hadoop-cos-3.1.0-8.1.7.jar \
-Dfs.cosn.credentials.provider=org.apache.hadoop.fs.auth.SimpleCredentialProvider \
-Dfs.cosn.userinfo.secretId=******************************** \
-Dfs.cosn.userinfo.secretKey=******************************** \
-Dfs.cosn.bucket.region=ap-guangzhou \
-Dfs.cosn.impl=org.apache.hadoop.fs.CosFileSystem \
-Dfs.AbstractFileSystem.cosn.impl=org.apache.hadoop.fs.CosN \
-Dmapred.job.queue.name=*** \
--bandWidth=50 \
--taskNumber=10 \
--workerNumber=1 \
--jobName=cos2hdfs-${x5lTableName}-${date} \
--skipMode=length \
--checkMode=length \
--src cosn://buckets-name/user/x5l/hive/${cosDbName}/${x5lTableName}/sample_date=${date}/ \
--dest hdfs://prdns/warehouse/tablespace/external/hive/${x8vDbName}.db/${x8vTableName}/sample_date=${date}/

hadoop distcp \
-D mapred.task.timeout=60000000 \
-D mapreduce.job.name=hdfs2s3-${x8vTableName}-${date} \
-Dmapred.job.queue.name=x9e \
-Dfs.s3a.access.key=******************* \
-Dfs.s3a.secret.key=************************************** \
-Dfs.s3a.endpoint=test01obs.gaccloud.com.cn \
-Dfs.s3a.connection.ssl.enabled=true \
-Dfs.s3a.signing-algorithm=S3SignerType \
-Dfs.s3a.ssl.channel.mode=default_jsse_with_gcm \
-direct \
-bandwidth=150 \
-m=20 \
-numListstatusThreads=40 \
hdfs://prdns/warehouse/tablespace/external/hive/${x8vDbName}.db/${x8vTableName}/sample_date=${date}/* \
s3a://buckets-name/prd/data/${x8vDbName}/${x8vTableName}/sample_date=${date}/

if [ $? -eq 0 ]; then
    echo ${date}": succeed"
else
    break
fi

echo >&3
} &
done
wait

stop_time=`date +%s`  # record the script's end time

echo "TIME:`expr $stop_time - $start_time`"
exec 3<&-                       # close the read end of fd 3
exec 3>&-                       # close the write end of fd 3
  • Named pipe

    The idea behind the named pipe:
    imagine 10 boiling-water rooms with 10 keys, and 100 people who want to fetch water. The first 10 people grab the keys and go in first. Each of the remaining 90 must wait for someone ahead of them to come out and return a key, then take that key and go in. In this way the task of fetching water for 100 people stays under control: system resources are not exhausted all at once, which would increase pressure and slow down overall processing.

  • Knowledge points:

    1. Properties of named pipes
    A read blocks when the pipe is empty.
    Each read consumes one item ("save one, read one"); a token written back into the pipe can be taken out again.
    This makes queue control possible.

    2. A write to a bare named pipe blocks until someone reads it.
    To work around this, use a file descriptor: once a descriptor is opened on the pipe with both read and write ends, it has all the properties of the pipe, plus one the bare pipe lacks: writes do not block even when nothing is reading, and you can read and write freely without worrying about the pipe's current contents.
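This difference is easy to check with a short sketch (the fd number 4 and the temp path are arbitrary choices here; `read -u` is a bash feature):

```shell
#!/bin/bash
# Writing into a fifo through a read/write descriptor does not block,
# even though nobody is reading yet; a bare `echo > fifo` would hang here.
fifo=/tmp/fd_demo.$$
mkfifo "$fifo"
exec 4<>"$fifo"     # hold both ends of the pipe open on fd 4
rm -f "$fifo"       # the descriptor keeps the pipe alive after unlinking
echo token1 >&4     # returns immediately instead of blocking
echo token2 >&4
read -u4 first      # tokens come back out in FIFO order
echo "got: $first"
exec 4<&-           # close the read end
exec 4>&-           # close the write end
```

Running this prints `got: token1` and exits immediately, confirming that neither write blocked.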

  • How to create a named pipe

    # 1. create the named pipe
    mkfifo /tmp/fl
    # 2. open file descriptor 100 and associate it with the pipe file
    exec 100<>/tmp/fl
    # 3. write to the descriptor, i.e. put a token into the pipe (also used to return a finished token)
    echo >&100
    # 4. read one token out of the pipe through the descriptor
    read -u100
    # 5. close the read and write ends of the descriptor
    exec 100<&-
    exec 100>&-
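Putting the five steps together gives a minimal worker-pool sketch (the job count, fd number, and the sleep stand-in for real work are all illustrative):

```shell
#!/bin/bash
# Token-queue pattern: 10 jobs, at most $parallelism running at once.
parallelism=3
fifo=/tmp/pool_demo.$$
mkfifo "$fifo"
exec 3<>"$fifo"              # associate fd 3 with the pipe
rm -f "$fifo"
for ((i = 1; i <= parallelism; i++)); do
    echo >&3                 # seed one token per allowed worker
done
for job in $(seq 1 10); do
    read -u3                 # take a token; blocks while all are in use
    {
        sleep 0.1            # stand-in for the real work
        echo "job $job done"
        echo >&3             # return the token for the next job
    } &
done
wait                         # let the last batch finish
echo "all jobs finished"
exec 3<&-
exec 3>&-
```

At any moment at most three jobs are in flight; the `read -u3` in the submitting loop blocks until a finished job returns its token.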
    

3) The meaning of $ in Shell script

$0 the name under which the script was invoked
$n the n-th argument to the script, n = 1…9
$* all of the script's arguments (there may be more than 9)
$# the number of arguments to the script
$$ the PID of the script (the current process ID)
$! the PID of the last background command (the process ID of the most recent process started with &)
$? the return value of the last command (its exit status: 0 means no error, any other value indicates an error)
$- the current shell options, the same information shown by the set command
$@ similar to $*, but each argument stays a separate word, so it can be used as an array
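A quick sketch exercising most of these (save it as, say, demo.sh and run `bash demo.sh one two three`; the file name is just an example):

```shell
#!/bin/bash
# Prints the common special parameters for the current invocation.
echo "script: $0"
echo "first:  $1"
echo "count:  $#"
echo "all:    $*"
true
echo "status: $?"    # exit status of the previous command (true -> 0)
sleep 0.1 &
echo "bg pid: $!"    # PID of the background sleep
wait
for arg in "$@"; do  # "$@" keeps each argument as a separate word
    echo "arg: $arg"
done
```

With the three arguments above, `count:` prints 3, `status:` prints 0, and the final loop prints one `arg:` line per argument.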

Origin blog.csdn.net/weixin_53543905/article/details/131420670