Application of IDAPython in malware analysis

Like yara, IDA can recognize and process samples in batches. It is somewhat slower, but it exposes a powerful interface for static analysis and processing. Once a virus analyst has recovered the key encryption and decryption code through reverse engineering or dynamic debugging, IDAPython can be used to analyze and process samples in batches and extract valuable threat intelligence.

In malware analysis, the main use of IDA described here is to run idat64.exe, IDA's text-mode version without the graphical interface, and rely on its automated analysis to batch-process the samples of a family, statically decrypt or de-obfuscate them, and finally extract the URLs and IP addresses, which are the most valuable information in a virus sample.
The prerequisites are:
1. The encryption functions are regular and shared across the samples of the virus family.
2. The encryption function of the sample is relatively simple or standard, so that the decryption process can be reproduced in code after reverse analysis.
3. The data to be decrypted is HEX data built into the PE, not memory that is requested dynamically at run time through functions such as VirtualAlloc.

Decrypt URL address

This article draws on Parts 1, 2, and 6 of Josh Grunzweig's 2016 blog series "Using IDAPython to Make Your Life Easier".
The most valuable part for this purpose is Part 6. It shows that IDAPython can automatically decrypt the encrypted URLs in a malware sample, which gives a quick and effective way to recover the URL directly from the PE file.
Sample MD5: 4BEFA0F5B3F981E498ACD676EB352D45
The sample's code has regular, recurring characteristics.
In addition, Parts 1 and 2 of the series use IDAPython to decrypt names hidden by a custom (self-written) encryption function and to map integer values (such as CRC32 hashes) back to their string representation (function names). From this point of view, IDAPython's main role in malware analysis is that of a lightweight, automated decryption tool that can be used in batch to de-obfuscate strings, decrypt function names, and decrypt URLs.
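
The hash-to-name mapping described above can be sketched outside IDA as follows. This is a minimal illustration that assumes the sample hashes API names with standard CRC32; the candidate name list and the helper names build_crc32_table and resolve are hypothetical and are not taken from the referenced blog.

```python
import zlib

# Hypothetical candidate list; in practice this would hold the export
# names of the DLLs the sample is expected to import from.
CANDIDATE_NAMES = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "CreateFileA"]

def build_crc32_table(names):
    """Map the CRC32 of each name to the name itself."""
    return {zlib.crc32(n.encode("ascii")) & 0xFFFFFFFF: n for n in names}

def resolve(hash_value, table):
    """Return the name for a hash, or None when the hash is unknown."""
    return table.get(hash_value & 0xFFFFFFFF)

table = build_crc32_table(CANDIDATE_NAMES)

# Stands in for an integer constant found in the sample's code.
wanted = zlib.crc32(b"VirtualAlloc") & 0xFFFFFFFF
print(hex(wanted), "->", resolve(wanted, table))
```

A family that uses a different hash only needs a different function in place of zlib.crc32; the lookup table idea stays the same.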

The URL decryption script, slightly modified from the original, is as follows:

# Written for Python 2 and the legacy (pre-7.4) IDAPython function names.
import idautils, idc, idaapi

def decode(data):
    # Family-specific decoder: rewrite this for each virus family.
    out = ""
    c = 0
    for d in data:
        out += chr(ord(d) - c - 10)
        c += 1
    return out


def main():
    url = ""
    for func in idautils.Functions():
        flags = idc.GetFunctionFlags(func)
        # Ignore thunk (jump) functions and library functions
        if flags & idc.FUNC_LIB or flags & idc.FUNC_THUNK:
            continue
        dism_addr = list(idautils.FuncItems(func))
        for c in range(len(dism_addr)):
            try:
                # Look at four instructions at a time
                v1 = dism_addr[c]
                v2 = dism_addr[c+1]
                v3 = dism_addr[c+2]
                v4 = dism_addr[c+3]
                # Look for known markers indicating we're seeing the
                # encoded strings being copied to a variable.
                if idc.GetMnem(v1) == 'mov' and idc.GetOpnd(v1, 0) == 'esi':
                    if idc.GetMnem(v2) == 'pop' and idc.GetOpnd(v2, 0) == 'ecx':
                        if idc.GetMnem(v3) == 'lea' and idc.GetOpnd(v3, 0) == 'edi':
                            if idc.GetDisasm(v4) == 'rep movsd':
                                print(hex(v1))
                                # Operand 1 of the mov is the address of the encoded data
                                addr = idc.GetOperandValue(v1, 1)
                                print(hex(addr))
                                # Read the NUL-terminated encoded string
                                data = ""
                                while idc.Byte(addr) != 0x0:
                                    data += chr(idc.Byte(addr))
                                    addr += 1
                                print(data)
                                decoded = decode(data)
                                url += decoded
            except IndexError:
                # Sliding window went past the end of the function
                pass

    current_file = idaapi.get_root_filename()
    print(current_file)
    print(url)
    # Append the result so that one output file collects the whole batch
    if url != "":
        with open("D:/Books/output.txt", 'ab') as f:
            f.write("[+] {0} : {1}\n".format(current_file, url))


if __name__ == "__main__":
    idaapi.autoWait()   # wait for IDA's auto-analysis to finish
    main()
    idc.Exit(0)         # close IDA so the next sample can be processed

Here, FUNC_THUNK indicates whether a function is a thunk function, that is, a simple jump function. For example:
.text:1A710606 Process32Next proc near
.text:1A710606 jmp ds:__imp_Process32Next
.text:1A710606 Process32Next endp
The script filters on two function attributes: FUNC_THUNK and FUNC_LIB.
For details, see the introduction to the ten function flags in "IDAPython-Book Translation byfoyjog.pdf".
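
The flag filter is a plain bitmask test, which the following sketch shows outside IDA. The two constants mirror the values IDA defines for FUNC_LIB and FUNC_THUNK so that the example can run on its own; the helper name should_skip is an assumption made for this illustration.

```python
# Values mirrored from IDA's function-flag definitions so the sketch
# can run without IDA; inside IDA use idc.FUNC_LIB and idc.FUNC_THUNK.
FUNC_LIB = 0x00000004
FUNC_THUNK = 0x00000080

def should_skip(flags):
    """True when a function is a library function or a thunk."""
    return bool(flags & (FUNC_LIB | FUNC_THUNK))

# 0x10 is an unrelated flag bit, used here only to show that other
# bits do not affect the decision.
for flags in (0x0, FUNC_LIB, FUNC_THUNK, 0x10 | FUNC_THUNK, 0x10):
    print(hex(flags), should_skip(flags))
```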

Tapping the potential

The potential of IDAPython lies in making it simple to batch-decrypt a large number of samples.

1. Extend the decryption methods and standardize them.
The decode function in the IDAPython script has to be written by the analyst, and all decryption functions can be collected in a single Python module. In practice, a security researcher then only needs to pass a parameter to the script's main function to select the one to call. Much of the work is already done by existing libraries: the base64 module handles base64 decoding, zlib and binascii provide CRC32 for hash enumeration, and PyCrypto and cryptography implement RC4. Simple routines such as ROR13 take only a few lines to write by hand.
Of course, you can also write the decryption method yourself, including the most common byte-wise loops and exclusive ORs, and tailor the decryption code to your needs.
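
A collection of decoders selected by name could look like the sketch below. It is written in Python 3 and runs outside IDA; the registry layout and the function names are assumptions made for illustration. RC4 is written out in pure Python here only to keep the example free of third-party dependencies, and sub_index mirrors the decode() routine shown earlier except that it wraps modulo 256 instead of raising on negative values.

```python
import base64

def sub_index(data, base=10):
    """Subtract (position + base) from every byte, wrapping modulo 256."""
    return bytes((b - i - base) & 0xFF for i, b in enumerate(data))

def xor_key(data, key):
    """XOR the data with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def rc4(data, key):
    """Plain RC4; encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key scheduling
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for b in data:                            # keystream generation
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def ror13_hash(name):
    """Rotate-right-by-13 additive hash over the bytes of a name."""
    h = 0
    for b in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + b) & 0xFFFFFFFF
    return h

# Registry: the driver script picks a decoder by name.
DECODERS = {
    "sub_index": lambda data, key=b"": sub_index(data),
    "xor": xor_key,
    "rc4": rc4,
    "base64": lambda data, key=b"": base64.b64decode(data),
}

def decode(method, data, key=b""):
    """Dispatch to the decoder registered under the given name."""
    return DECODERS[method](data, key)

print(decode("rc4", bytes.fromhex("bbf316e8d940af0ad3"), b"Key"))
print(decode("base64", b"aHR0cDovLw=="))
```

Keeping every decoder behind the same (data, key) signature means that the script run by IDA only needs the method name and, where relevant, the key as parameters.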

2. Identify the characteristic instructions that precede the fetch of the data to be decrypted

Being able to locate and extract the address of the HEX data embedded in the PE plays an important role in the decryption process. If the code before and after the data has obvious characteristics, this is much easier, and the benefit is greatest when decrypting the samples of a family in batches. For example, the code characteristics in this example are as follows:

mov esi,xxx          # xxx here is the address of the HEX data embedded in the PE
pop ecx
lea edi,xxx
rep movsd

Each family fetches its data with different code, although most should use mov to load a global variable from the .data section. You could pass the address of the data directly as a parameter to the script run by idat64.exe, but that address would not locate the data in other samples. The unique characteristics of the sample must therefore be combined with the assembly around the data: the recommended approach is a yara rule plus an instruction signature, as in the example above, to locate the data to be decrypted. The code features can be supplied to IDAPython in a structured format (such as JSON).
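
One possible shape for such a JSON-driven signature is sketched below. The JSON layout, the field names (family, data_insn, pattern, mnem, op0), and the helper functions are all hypothetical. The matcher works on plain dictionaries so that it can run outside IDA; inside IDA, the list of instructions would be built from idautils.FuncItems() and the disassembly of each item.

```python
import json

# Hypothetical signature format: one entry per instruction. A field
# set to null means "don't care". data_insn is the index, within the
# pattern, of the instruction whose operand holds the data address.
SIGNATURE_JSON = """
{
  "family": "example",
  "data_insn": 0,
  "pattern": [
    {"mnem": "mov", "op0": "esi"},
    {"mnem": "pop", "op0": "ecx"},
    {"mnem": "lea", "op0": "edi"},
    {"mnem": "rep movsd", "op0": null}
  ]
}
"""

def matches(insn, rule):
    """An instruction matches when every field the rule sets is equal."""
    return all(insn.get(k) == v for k, v in rule.items() if v is not None)

def find_matches(insns, pattern):
    """Yield every start index at which the whole pattern matches."""
    n = len(pattern)
    for start in range(len(insns) - n + 1):
        if all(matches(insns[start + k], pattern[k]) for k in range(n)):
            yield start

sig = json.loads(SIGNATURE_JSON)

# Stand-in for the instructions of one function.
insns = [
    {"mnem": "push", "op0": "ebp"},
    {"mnem": "mov", "op0": "esi"},
    {"mnem": "pop", "op0": "ecx"},
    {"mnem": "lea", "op0": "edi"},
    {"mnem": "rep movsd", "op0": ""},
    {"mnem": "retn", "op0": ""},
]

for start in find_matches(insns, sig["pattern"]):
    print("pattern at index", start, "data instruction at index", start + sig["data_insn"])
```

Because the range is computed from the pattern length, the window never runs past the end of the function, so no IndexError handling is needed.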

3. This approach can save time when dealing with APT attacks, since it is fast and simple. A new sample may reuse a URL encryption method that has been seen before. In that case, the URL can be decrypted directly from the PE with the decryption function already written. Alternatively, every previously written decryption method can be tried against the data: once http:// appears in the decrypted output, the decryption can be considered successful.
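
The "try every known decoder" idea can be sketched as follows. The decoder list is a hypothetical stand-in for routines written for earlier samples, and checking for https:// in addition to http:// is an assumption added here.

```python
def identity(data):
    """No decoding; catches URLs stored in plain text."""
    return data

def xor_5a(data):
    """Single-byte XOR with 0x5A (an arbitrary example key)."""
    return bytes(b ^ 0x5A for b in data)

def sub_index(data):
    """Subtract (position + 10) from every byte, wrapping modulo 256."""
    return bytes((b - i - 10) & 0xFF for i, b in enumerate(data))

# Hypothetical collection of decoders written for earlier samples.
KNOWN_DECODERS = [("identity", identity), ("xor_5a", xor_5a), ("sub_index", sub_index)]
MARKERS = (b"http://", b"https://")

def try_all(blob):
    """Return (decoder name, plaintext) for the first decoder whose
    output contains a URL marker, or None when none of them does."""
    for name, fn in KNOWN_DECODERS:
        plain = fn(blob)
        if any(marker in plain for marker in MARKERS):
            return name, plain
    return None

# Build a test blob by applying the inverse of sub_index.
blob = bytes((b + i + 10) & 0xFF for i, b in enumerate(b"http://c2.example.test/a"))
print(try_all(blob))
```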

Results

Batch decryption of the Cmstar family URLs with idat64.exe.
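
A batch driver for this step might look like the sketch below. It relies on IDA's -A switch (autonomous mode, no dialogs) and -S switch (run a script after the database is opened). The install path, script name, and sample directory are hypothetical, and the function names are assumptions made for illustration. The driver defaults to a dry run that only returns the command lines.

```python
import os
import subprocess

def build_command(idat_path, script_path, sample_path):
    """Build the idat64 command line for one sample."""
    return [idat_path, "-A", "-S" + script_path, sample_path]

def run_batch(idat_path, script_path, sample_dir, dry_run=True):
    """Build (and optionally run) one command per regular file in sample_dir."""
    commands = []
    for name in sorted(os.listdir(sample_dir)):
        path = os.path.join(sample_dir, name)
        if not os.path.isfile(path):
            continue
        cmd = build_command(idat_path, script_path, path)
        commands.append(cmd)
        if not dry_run:
            # Blocks until IDA exits, which the script triggers itself
            # by calling idc.Exit(0) at the end.
            subprocess.call(cmd)
    return commands

# Hypothetical paths, shown only to illustrate the resulting command.
print(build_command(r"C:\IDA\idat64.exe", "decrypt_url.py", r"D:\samples\sample.bin"))
```

Running the samples one at a time keeps the appended output file consistent, at the cost of speed.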

References

https://unit42.paloaltonetworks.com/unit42-using-idapython-to-make-your-life-easier-part-1/
https://unit42.paloaltonetworks.com/unit42-using-idapython-to-make-your-life-easier-part-2/
https://unit42.paloaltonetworks.com/unit42-using-idapython-to-make-your-life-easier-part-6/

Origin blog.csdn.net/qq_43312649/article/details/110055723