Android malicious application identification (extra chapter): running cmd commands in parallel with Python (multi-threaded or multi-process)

Preface

To decompile APK files in batches, I needed to execute commands in batches. I had never fully understood multi-threading or how Python supports it, so I worked through it step by step and wrote this article as a record.

1. Processes and Threads

1.1 What is a process?

1.1.1 Concept

A process is one execution of a program with independent functions over a set of data. The emphasis is on running: only a running program counts as a process. Here is a screenshot of the Task Manager on my computer:
[Screenshot: Task Manager]

You can see that the browser, PyCharm, and Task Manager are the applications I currently have open, and each of them is a process. WeChat and QQ in the lower-right corner are likewise background processes, because they are also running. Each process has an identifier called the PID. It comes up often when learning Linux, for example in error messages, and the kill command (kill PID) uses it to terminate a process.
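As a small illustration that is not part of the original workflow, Python can report the PID of the current process and of a child process it launches. The child command below is a placeholder that simply starts another Python interpreter and exits.

```python
import os
import subprocess
import sys

# PID of the Python process running this script.
parent_pid = os.getpid()

# Launch a child process; the command is a placeholder chosen for illustration.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()  # block until the child has exited

print("parent PID:", parent_pid)
print("child PID :", child.pid)
```

The two numbers differ because the operating system treats the child as a separate process.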

1.1.2 Three states

Ready: The process has been allocated every resource it needs except the CPU, and it can run as soon as it is given the CPU.
Blocked: A running process cannot continue because it is waiting for an event or a message, so it gives up the processor and is suspended. This state is also called the waiting state.
Running: The process has obtained the CPU and its program is executing.
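The three states can be sketched as a tiny state machine. This is only a model for illustration: the names State, TRANSITIONS, and can_move are made up here, and the transition from running back to ready (when a time slice ends) comes from the classic textbook model rather than from the text above.

```python
from enum import Enum

class State(Enum):
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"

# Allowed transitions in the classic three-state model.
TRANSITIONS = {
    State.READY: {State.RUNNING},                 # the scheduler gives the process the CPU
    State.RUNNING: {State.READY, State.BLOCKED},  # time slice ends / process waits for an event
    State.BLOCKED: {State.READY},                 # the awaited event occurs
}

def can_move(src, dst):
    """Return True if the three-state model allows the transition src -> dst."""
    return dst in TRANSITIONS[src]

print(can_move(State.READY, State.RUNNING))    # True
print(can_move(State.BLOCKED, State.RUNNING))  # False: it must become ready first
```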

1.2 What is a thread?

A thread is the smallest unit of execution within a process. It is an entity inside the process, depends on that process, and is the basic unit that the system schedules and dispatches independently. A thread owns no system resources of its own beyond the few that are indispensable for running (a program counter, a set of registers, and a stack), but it shares all the resources owned by the process with the other threads of the same process.
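A minimal sketch of that sharing, using Python's threading module: every thread appends to the same list and reports the same PID, because they all live inside one process. The names results, lock, and worker are illustrative only.

```python
import os
import threading

results = []             # shared by every thread in this process
lock = threading.Lock()  # protects the shared list

def worker(name):
    # Each thread writes to the same list and sees the same PID.
    with lock:
        results.append((name, os.getpid()))

threads = [threading.Thread(target=worker, args=(f"Thread-{i}",)) for i in range(1, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```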

1.3 Advantages of multi-threading

  1. Concurrency can be achieved easily and efficiently through threads. A process can create multiple threads to execute different parts of the same program.

  2. Creating threads is faster than creating processes, requires less overhead, and takes up fewer resources.

  3. A multi-threaded process makes an application concurrent: its threads can run on different processors, so that every processor is kept busy.

References: Operating Systems: Differences and Connections between Processes and Threads

2. Python multi-thread learning

The following example is from a beginner tutorial:

import _thread  # Python 3 name of the Python 2 "thread" module
import time

# Define the function that each thread runs
def print_time(threadName, delay):
   count = 0
   while count < 5:
      time.sleep(delay)
      count += 1
      print("%s: %s" % (threadName, time.ctime(time.time())))

# Create two threads
try:
   _thread.start_new_thread(print_time, ("Thread-1", 2))
   _thread.start_new_thread(print_time, ("Thread-2", 4))
except Exception:
   print("Error: unable to start thread")

# Keep the main thread alive so the two threads can run; stop with Ctrl+C
while True:
   pass

The output is as follows:

Thread-1: Thu Jan 22 15:42:17 2009
Thread-1: Thu Jan 22 15:42:19 2009
Thread-2: Thu Jan 22 15:42:19 2009
Thread-1: Thu Jan 22 15:42:21 2009
Thread-2: Thu Jan 22 15:42:23 2009
Thread-1: Thu Jan 22 15:42:23 2009
Thread-1: Thu Jan 22 15:42:25 2009
Thread-2: Thu Jan 22 15:42:27 2009
Thread-2: Thu Jan 22 15:42:31 2009
Thread-2: Thu Jan 22 15:42:35 2009

The output shows that each thread prints in its own order, but the two threads do not wait for each other. Thread-1 starts first and prints every 2 seconds. Thread-2 prints every 4 seconds, and its lines are interleaved with those of Thread-1, so both threads make progress over the same period. After reading a detailed explanation of processes, threads, multi-threading, concurrency, and parallelism and comparing it with this result, I concluded that the two threads run at the same time. So is multi-threading concurrent or parallel? The threads may be scheduled on a single CPU core or on different cores. The operating system makes that decision, and the programmer cannot control it. Either is therefore possible. In CPython, the global interpreter lock additionally lets only one thread execute Python bytecode at a time, although blocking calls such as time.sleep release the lock while they wait.
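The same experiment can be sketched with the higher-level threading module, which offers join() so the main thread does not need an endless loop. The delays are shortened here so the sketch finishes quickly, and the events list and its lock are additions made only to record the order of the prints.

```python
import threading
import time

events = []                     # (thread name, tick) in the order they were recorded
events_lock = threading.Lock()  # protects the shared list

def print_time(thread_name, delay, repeats=3):
    for tick in range(repeats):
        time.sleep(delay)       # sleeping releases the GIL, so the other thread can run
        with events_lock:
            events.append((thread_name, tick))
        print(f"{thread_name}: {time.ctime()}")

t1 = threading.Thread(target=print_time, args=("Thread-1", 0.1))
t2 = threading.Thread(target=print_time, args=("Thread-2", 0.2))
t1.start()
t2.start()

# join() blocks until the thread has finished, replacing the busy loop.
t1.join()
t2.join()
print(events)
```

The exact interleaving depends on scheduling, so it may differ from run to run.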

3. Practical operations

My initial concern was that executing cmd commands one by one would take too long. Searching online, I found many ways to run commands, such as os and subprocess, but they all create child processes rather than threads. Starting a thread costs far less than starting a process, but I tried it anyway. I first used a plain for loop instead of starting threads. The commands ran sequentially, like a queue: each cmd command had to finish before the next one could start.

subprocess.Popen(cmd_str, shell=True, stdout=None, stderr=None).wait()

As you can imagine, the decompilation speed is very slow.
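The sequential behaviour comes from calling .wait() immediately after each Popen. Popen itself returns as soon as the child process has started, so one possible sketch is to start every command first and wait afterwards. The commands below are placeholders that only sleep briefly; they stand in for the real decompilation commands.

```python
import subprocess
import sys

# Placeholder commands for illustration; the apktool calls would go here instead.
cmds = [[sys.executable, "-c", "import time; time.sleep(0.2)"] for _ in range(3)]

# Popen returns once the child is started, so all three children run at the same time.
procs = [subprocess.Popen(cmd) for cmd in cmds]

# Only now block until each child has finished, collecting the exit codes.
exit_codes = [p.wait() for p in procs]
print(exit_codes)
```

This starts everything at once, so it has the same memory cost as one thread per command.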
Then I looked up a multi-threaded approach in the article "[Python] Run multiple cmd commands in parallel":

    # Whether to run in parallel
    if_parallel = True

    # List of commands to execute
    # model_list = ['yolo', 'centernet']
    # cmds = ['python main.py --model ' + i for i in model_list]
    cmds = ["F: & cd F:\\benign_apk & " + "apktool.bat d -f " + "benign" + str(i) + ".apk" for i in range(65,70)]
    if if_parallel:
        # Parallel: one thread per command
        threads = []
        for cmd in cmds:
            th = threading.Thread(target=execCmd, args=(cmd,))
            th.start()
            threads.append(th)

When I set the number of parallel commands to 5, memory usage rose immediately. With a larger number, memory was almost full, and I did not know how to stop the threads. Even after I closed the IDE, the commands kept running. That made me wonder whether multi-threading was really in effect and how these threads could be stopped.
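On the question of stopping the work: Python threads cannot be killed from outside, but a child process can be terminated through its Popen object. The following is a minimal sketch under stated assumptions. The child is a placeholder that sleeps for a minute, and the command is passed as a list without shell=True, because with a shell the handle refers to the shell rather than to the program the shell launches.

```python
import subprocess
import sys

# Placeholder for a long-running command (stands in for a decompilation job).
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

try:
    # Wait only briefly here; a real job would get a much longer timeout.
    proc.wait(timeout=0.5)
except subprocess.TimeoutExpired:
    proc.terminate()  # ask the operating system to stop the child process
    proc.wait()       # collect the exit status so no process is left behind

print("exit code:", proc.returncode)
```

Keeping the Popen objects in a list makes it possible to terminate every running child in the same way.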

With these doubts in mind, I raised the number of parallel cmd commands to 15 and opened Resource Monitor:
[Screenshot: Resource Monitor]
This method actually starts multiple processes. Task Manager confirmed my guess:
[Screenshot: Task Manager]
Seen this way, the slowness is understandable. Why? Look at the following picture:
[Screenshot]
The cause is clear: each cmd command calls a jar package, which needs additional memory to run. With more than 900 files to process, memory naturally ran out. Because the computer's RAM is limited, the only option was to process the files in batches.
Finally, here is the complete code:

import datetime
import os
import subprocess
import threading


def execCmd(cmd):
    """Run one cmd command in a child process and wait for it to finish."""
    try:
        print("Command %s started at %s" % (cmd, datetime.datetime.now()))
        subprocess.Popen(cmd, shell=True, stdout=None, stderr=None).wait()
        print("Command %s finished at %s" % (cmd, datetime.datetime.now()))
    except Exception:
        print('%s\t failed to run' % (cmd))


if __name__ == '__main__':
    # Whether to run in parallel
    if_parallel = True

    # List of commands to execute
    # model_list = ['yolo', 'centernet']
    # cmds = ['python main.py --model ' + i for i in model_list]
    cmds = ["F: & cd F:\\benign_apk & " + "apktool.bat d -f " + "benign" + str(i) + ".apk" for i in range(70,85)]
    if if_parallel:
        # Parallel: one thread per command; each thread starts a child process
        threads = []
        for cmd in cmds:
            th = threading.Thread(target=execCmd, args=(cmd,))
            th.start()
            threads.append(th)

        # Wait for all threads to finish
        for th in threads:
            # What .join() does: given tasks A, B, and C, where C can only start
            # after A and B are done, A and B can run in parallel.
            th.join()
            # This thread (task A) is guaranteed to have finished
            print("OK!!!!!!!!!!!")
    else:
        # Serial
        for cmd in cmds:
            try:
                print("Command %s started at %s" % (cmd, datetime.datetime.now()))
                os.system(cmd)
                print("Command %s finished at %s" % (cmd, datetime.datetime.now()))
            except Exception:
                print('%s\t failed to run' % (cmd))
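The batching described above could also be automated by capping how many commands run at once. The following is a sketch, not the method used in this article: it relies on ThreadPoolExecutor from the standard library, the limit MAX_PARALLEL is an assumed value that would need tuning to the available memory, and the commands are placeholders that stand in for the apktool calls.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 3  # assumed limit; tune it to the memory each job needs

def run_cmd(cmd):
    """Run one command in a child process and return its exit code."""
    return subprocess.run(cmd).returncode

# Placeholder commands; the decompilation commands would go here instead.
cmds = [[sys.executable, "-c", f"print('job {i} done')"] for i in range(10)]

# The pool never runs more than MAX_PARALLEL commands at once;
# a new command starts only when a worker becomes free.
with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
    exit_codes = list(pool.map(run_cmd, cmds))

print(exit_codes)
```

With this shape, all 900 commands can be submitted in one go while memory use stays bounded by the number of workers.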

Conclusion and Thoughts

Batch execution of cmd commands is actually multi-process execution, not multi-threading. Although each cmd process uses little memory by itself, you also have to consider whether the command starts other processes. If those processes use a lot of memory, the commands cannot all be run in parallel or concurrently.
I also wonder whether the .bat file could be modified so that a single cmd window runs the decompilation commands in batches in parallel. That needs more thought, and at my current coding level I probably cannot work it out in the short term. There is still a lot I do not understand, and I welcome criticism and corrections.

Origin blog.csdn.net/weixin_44165950/article/details/133197732