VUzzer: an in-depth analysis of how it works

Table of Contents

1. Installation (under VMware 15.01)

2. VUzzer instructions

3. VUzzer principles

3.1 Generating the weight file and the cmp-information file

3.2 VUzzer seed generation and mutation

3.2.1 runfuzzer.py

3.2.2 gautils.py


1. Installation (under VMware 15.01)

VUzzer is an older project that is no longer maintained, so it needs an old environment. Install Ubuntu 14.04 and downgrade its kernel to 3.13.0-24, as follows:

# Install the 3.13.0-24 kernel
sudo apt-get install linux-image-3.13.0-24-generic
# Reboot
sudo reboot

When the boot screen appears, press Esc to open the kernel selection menu and choose 3.13.0-24. (The display misbehaves at this point: in full-screen mode the screen goes black, so keep the VM in a small window.) After booting, check the kernel version with uname -r, then uninstall the original kernel:

sudo apt-get purge linux-image-<version>
sudo apt-get purge linux-headers-<version>

Next, install VUzzer:

# Download the VUzzer source
git clone https://github.com/vusec/vuzzer
gcc --version
g++ --version
# Check the gcc and g++ versions; if they are not 4.8, install 4.8:
sudo apt-get install gcc-4.8
sudo apt-get install g++-4.8
# Download Pin 2.14 from the official Pin website, then go back to the vuzzer folder and create a link to Pin there
ln -s /path-to-pin-homes pin

python --version
# Check that Python 2.7 is available; if it is not installed, install it:
sudo apt-get install python2.7
# Download the EWAHBoolArray source
git clone https://github.com/lemire/EWAHBoolArray
# Copy the 4 header files in EWAHBoolArray's headers folder to /usr/include/ (run this inside the EWAHBoolArray directory)
sudo cp headers/* /usr/include/
# Install BitMagic
sudo apt-get install bmagic
# Install BitVector: download it from https://engineering.purdue.edu/kak/dist/BitVector-2.2.html, extract it, and run the following command in the BitVector directory
sudo python setup.py install
# Build VUzzer: first go back to the vuzzer folder
export PIN_ROOT=$(pwd)/pin
cd ./support/libdft/src
make clean
# Then go back to the vuzzer folder
make support-libdft
make
make -f mymakefile
# If obj-ia32/dtracker.so and obj-ia32/bbcounts2.so exist, the installation succeeded
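Because the build has several prerequisites, a quick post-install check can save time. The following Python 3 sketch is a hypothetical helper (not part of VUzzer) that checks the two conditions described above: the running kernel release (what uname -r reports, obtained here through platform.release()) and the presence of obj-ia32/dtracker.so and obj-ia32/bbcounts2.so in the vuzzer folder.

```python
import os
import platform

# Build artifacts that should exist after a successful build
# (paths relative to the vuzzer checkout).
EXPECTED_ARTIFACTS = ["obj-ia32/dtracker.so", "obj-ia32/bbcounts2.so"]
EXPECTED_KERNEL = "3.13.0-24"


def missing_artifacts(vuzzer_dir):
    """Return the expected build artifacts that are absent under vuzzer_dir."""
    return [p for p in EXPECTED_ARTIFACTS
            if not os.path.isfile(os.path.join(vuzzer_dir, p))]


def kernel_ok(release=None):
    """True if the kernel release (as printed by `uname -r`) is 3.13.0-24."""
    if release is None:
        release = platform.release()
    return release.startswith(EXPECTED_KERNEL)


if __name__ == "__main__":
    missing = missing_artifacts(".")
    print("kernel ok:", kernel_ok())
    print("missing artifacts:", missing if missing else "none")
```

Run it from the vuzzer folder; an empty "missing artifacts" list together with "kernel ok: True" matches the success condition stated above.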

2. VUzzer instructions

The entry point of VUzzer is runfuzzer.py. Running python runfuzzer.py -h lists its options; the main ones are described below.

  • -s: the command line of the program under test, e.g. -s './bin/a %s'. Put %s where the input file name goes; VUzzer substitutes each generated input file at that position, so this input is what it mutates to find bugs.
  • -i: the folder holding the initial seeds, e.g. -i 'datatemp/a/'. There must be at least three initial seed files.
  • -w: the .pkl file generated by the IDA script (the basic-block weight file); -n: the .names file it generates (the cmp-instruction information file).
  • -l: the number of binaries to monitor; -o: the base address of the application and, if used, the library; -b: the name of the library to monitor.

The rest of this section shows how VUzzer tests a binary.
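runfuzzer.py builds each target command by substituting the input path into the -s string and splitting on single spaces (`(config.SUT % tfl).split(' ')` in its main()), and it splits -w, -n and -o on commas. The Python 3 sketch below reproduces both steps with hypothetical helper names, which is handy for checking a command line before starting a long run.

```python
def build_argv(sut_template, input_path):
    """Mirror runfuzzer.py: substitute the input file for %s, then split on spaces."""
    if sut_template.count("%s") != 1:
        raise ValueError("the -s command line must contain exactly one %s")
    return (sut_template % input_path).split(" ")


def split_option(value):
    """-w, -n and -o take one comma-separated entry per monitored binary."""
    return value.split(",")


if __name__ == "__main__":
    print(build_argv("./bin/a %s", "data/seed-1.txt"))
    print(split_option("idafiles/a.pkl,idafiles/libfoo.pkl"))
```

Because the split is on single spaces, neither the program path nor the seed path may contain spaces, and a doubled space in the -s string produces an empty argument.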

First, write a test program in C:

#include<stdio.h>
#include<stdlib.h>
int main(int argc,char** argv)
{
	char s[30];
	FILE* fp;
	fp=fopen(argv[1],"r+");
	if(fp==NULL)
	{
		exit(1);
	}
	fscanf(fp,"%29s",s);
	if(s[0]=='W')
	{
		if(s[10]=='A')
		{
			fscanf(fp,"%s",s); /* no width limit: can overflow s (the bug VUzzer should find) */
			printf("%s\n",s);
		}
		else
		{
			printf("wrong");
		}
	}
	else
	{
		printf("wrong");
	}
	return 0;
}

Compile it with gcc a.c -o a to produce a 32-bit binary (on a 64-bit system, add -m32, which needs the gcc-multilib package). The program is written this way because VUzzer can only drive the target through its command line and only supports file input. Note that the second fscanf has no width limit, so it can overflow s.
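The branch structure is what makes this target hard for blind mutation: the overflow is reached only when the first token starts with 'W' and has 'A' at offset 10, which two random bytes match with probability of roughly 1/65536. Constants like these are what VUzzer's cmp analysis is meant to extract. The Python 3 model below is an illustration, not part of VUzzer: it mimics fscanf's %29s and reports whether an input reaches the second fscanf. It assumes that a token shorter than 11 characters never has 'A' at s[10], since that byte would be the terminator or uninitialized stack data.

```python
def scan_token(data, width=29):
    """Mimic fscanf("%29s"): skip leading whitespace, then read up to
    `width` non-whitespace bytes."""
    i = 0
    while i < len(data) and data[i:i + 1].isspace():
        i += 1
    j = i
    while j < len(data) and j - i < width and not data[j:j + 1].isspace():
        j += 1
    return data[i:j]


def reaches_second_fscanf(data):
    """True if the input takes the s[0]=='W' && s[10]=='A' path."""
    tok = scan_token(data)
    # Assumption: tokens shorter than 11 bytes fail the s[10] check.
    return len(tok) > 10 and tok[:1] == b"W" and tok[10:11] == b"A"


if __name__ == "__main__":
    for seed in (b"hello", b"W123456789A", b"W123456789AAAA payload"):
        print(seed, reaches_second_fscanf(seed))
```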

Open the binary a in IDA, choose File > Script file..., and select the BB-weightv4.py script. IDA runs the script and generates the basic-block weight file a.pkl and the cmp-instruction information file a.names. Put the newly generated files under vuzzer/idafiles, put the program a under vuzzer/bin, create a new folder a under vuzzer/datatemp, and place three initial seed files in it. Then run python runfuzzer.py -s './bin/a %s' -i 'datatemp/a/' -w 'idafiles/a.pkl' -n 'idafiles/a.names' to start fuzzing. The output is shown in the figure below.
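The directory setup can be scripted. The Python 3 sketch below uses hypothetical helper names, and the seed contents are placeholders to be replaced with realistic inputs. It creates datatemp/<name>/ with the seed files, refusing fewer than three because VUzzer's dry run aborts with "not sufficient initial files" otherwise, and it builds the runfuzzer.py command used in this article.

```python
import os
import tempfile


def prepare_seeds(vuzzer_dir, target, seeds):
    """Write seed files into <vuzzer_dir>/datatemp/<target>/ and return their paths."""
    if len(seeds) < 3:
        raise ValueError("VUzzer needs at least three initial seed files")
    seed_dir = os.path.join(vuzzer_dir, "datatemp", target)
    os.makedirs(seed_dir, exist_ok=True)
    paths = []
    for i, content in enumerate(seeds):
        path = os.path.join(seed_dir, "seed%d.txt" % i)
        with open(path, "wb") as f:
            f.write(content)
        paths.append(path)
    return paths


def fuzz_command(target):
    """The runfuzzer.py invocation for a binary placed in bin/."""
    return ("python runfuzzer.py -s './bin/%s %%s' -i 'datatemp/%s/' "
            "-w 'idafiles/%s.pkl' -n 'idafiles/%s.names'"
            % (target, target, target, target))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(prepare_seeds(d, "a", [b"hello", b"world", b"Wxyz"]))
    print(fuzz_command("a"))
```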

All seed files used during testing are placed in the data directory, and seeds that cause crashes are placed in outd/crashInputs. Crash records are written to error.log, per-generation statistics to stats.log, and the cmp analysis results to cmp.out.
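stats.log is easiest to inspect programmatically. Judging from main() in runfuzzer.py, each generation appends one row of ten integers (generation, min/max/average fitness, min/max/average file length, number of BBs seen in the generation, application coverage, overall coverage), interleaved with banner lines such as "Heavy mutate happens now..". This Python 3 sketch (not part of VUzzer) keeps only the numeric rows.

```python
FIELDS = ["gen", "minfit", "maxfit", "avgfit", "minlen", "maxlen",
          "avglen", "bbs", "appcov", "allcov"]


def parse_stats(lines):
    """Return one dict per generation row; skip the header and banner lines."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue
        try:
            values = [int(p) for p in parts]
        except ValueError:
            continue  # e.g. the column header line
        rows.append(dict(zip(FIELDS, values)))
    return rows
```

For example, `with open("stats.log") as f: rows = parse_stats(f)` gives one dict per generation, so the growth of rows[-1]["bbs"] or "maxfit" can be tracked over a run.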

3. VUzzer principles

3.1 Generating the weight file and the cmp-information file

  • def findCMPopnds(): uses the IDA API to find cmp instructions, reads their immediate operands, and returns them converted to [set(strings), set(characters)]
  • def get_children(BB): uses breadth-first search to collect the start addresses of all blocks reachable from block BB of a function, and returns them as a list
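Both helpers are easy to model outside IDA. In the Python 3 sketch below, the control-flow graph is a plain dict from block start address to successor addresses (standing in for IDA's flow chart), and the cmp immediates are assumed to be 32-bit x86 values. Turning each immediate into its little-endian byte string and separating one-byte values from longer ones is an assumption about what [set(strings), set(characters)] contains, not a copy of BB-weightv4.py.

```python
from collections import deque


def get_children(succs, start):
    """BFS: start addresses of all blocks reachable from `start`, in BFS order.
    `start` itself appears only if it lies on a cycle."""
    order, seen = [], set()
    queue = deque(succs.get(start, []))
    while queue:
        block = queue.popleft()
        if block in seen:
            continue
        seen.add(block)
        order.append(block)
        queue.extend(succs.get(block, []))
    return order


def split_cmp_immediates(immediates):
    """Return [set of multi-byte strings, set of single bytes] for cmp immediates."""
    strings, chars = set(), set()
    for imm in immediates:
        imm &= 0xFFFFFFFF  # assumption: treat as a 32-bit x86 operand
        size = max(1, (imm.bit_length() + 7) // 8)
        raw = imm.to_bytes(size, "little")  # byte order as it appears in memory
        (chars if size == 1 else strings).add(raw)
    return [strings, chars]
```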

  • def calculate_weight(func, fAddr): computes the probability of reaching each block, based on a Markov model of the function's control-flow graph

def calculate_weight(func, fAddr):
    ''' This function calculates weight for each BB, in the given function func,
    using a breadth-first traversal of its control-flow graph.
    '''
    # We start by iterating all BBs and assigning weights to each outgoing edges.
    # we assign a weight 0 to loopback edge because it does not point (i.e., leading) to "new" BB.
    edges.clear()
    temp = deque([]) # work queue
    rootFound= False
    visited=[] # blocks whose weight has already been computed
    shadow=[]
    noorphan=True
    # First compute the probability of moving from each block to each of its successors
    for block in func:
        pLen=len(list(block.succs()))
        if pLen == 0: # exit BB
            continue
        eProb=1.0/pLen # a block with n successors gives each outgoing edge probability 1/n
        #print "probability = %3.1f"%(eProb,), eProb
        for succBB in block.succs():
            if (succBB.startEA <= block.startEA) and (len(list(succBB.preds()))>1):
                #this is for backedge. this is not entirely correct as BB which are shared or are at lower
                #addresses are tagged as having zero value!! TO FIX.,
                # In the CFG, if the current block starts above its successor, there is probably a loop, so this edge's probability is reassigned
                edges[(block.startEA,succBB.startEA)]=1.0
            else:
                edges[(block.startEA,succBB.startEA)]=eProb
    print "[*] Finished edge probability calculation"
    #for edg in edges:
        #print " %x -> %x: %3.1f "%(edg[0],edg[1],edges[edg])
    # lets find the root BB
    #orphanage=[]#home for orphan BBs
    orphID=[]
    for block in func:
        if len(list(block.preds())) == 0:
        #Note: this only check was not working as there are orphan BB in code. Really!!!
            if block.startEA == fAddr:
                rootFound=True
                root = block
            else:
                if rootFound==True:
                    noorphan=False
                    break
                pass
    #now, all the BBs should be children of root node and those that are not children are orphans. This check is required only if we have orphans.
    if noorphan == False:
        rch=get_children(root)
        rch.append(fAddr)# add root also as a non-orphan BB
        for blk in func:
            if blk.startEA not in rch:
                weight[blk.startEA]=(1.0,blk.endEA)
                visited.append(blk.id)
                orphID.append(blk.id)
        #print "[*] orphanage calculation done."
        del rch
    # Block probability = sum over predecessors of (predecessor probability * probability of the edge from that predecessor to this block)
    if rootFound==True:
        #print "[*] found root BB at %x"%(root.startEA,)
        weight[root.startEA] = (1.0,root.endEA)
        visited.append(root.id)
        print "[*] Root found. Starting weight calculation."
        for sBlock in root.succs():
            #if sBlock.id not in shadow:
            #print "Pushing successor %x"%(sBlock.startEA,)
            temp.append(sBlock)
            shadow.append(sBlock.id)
        loop=dict()# this is a temp dictionary to avoid get_children() call everytime a BB is analysed.
        while len(temp) > 0:
            current=temp.popleft()
            shadow.remove(current.id)
            print "current: %x"%(current.startEA,)
            if current.id not in loop:
                loop[current.id]=[]
            # we check for orphan BB and give them a lower score
            # by construction and assumptions, this case should not hit!
            if current.id in orphID:
                #weight[current.startEA]=(0.5,current.endEA)
                #visited.append(current.id)
                continue

            tempSum=0.0
            stillNot=False
            chCalculated=False
            for pb in current.preds():
                #print "[*] pred of current %x"%(pb.startEA,)
                if pb.id not in visited:
                    if edges[(pb.startEA,current.startEA)]==0.0:
                        weight[pb.startEA]=(0.5,pb.endEA)
                        #artificial insertion
                        #print "artificial insertion branch"
                        continue
                    # The predecessor has no probability yet: if it is reachable from the current block, there is a loop, so its probability is set to 0.5
                    if pb.id not in [k[0] for k in loop[current.id]]:
                        if chCalculated == False:
                            chCurrent=get_children(current)
                            chCalculated=True
                        if pb.startEA in chCurrent:
                            # this BB is in a loop. we give less score to such BB
                            weight[pb.startEA]=(0.5,pb.endEA)
                            loop[current.id].append((pb.id,True))
                            #print "loop branch"
                            continue
                        else:
                            loop[current.id].append((pb.id,False))
                    else:
                        if (pb.id,True) in loop[current.id]:
                            weight[pb.startEA]=(0.5,pb.endEA)
                            continue
                            
                    #print "not pred %x"%(pb.startEA,)
                    if current.id not in shadow:
                        temp.append(current)
                        #print "pushed back %x"%(current.startEA,)
                        shadow.append(current.id)
                    stillNot=True
                    break
            # Compute this block's probability: sum over predecessors of (predecessor probability * edge probability)
            if stillNot == False:
                # as we sure to get weight for current, we push its successors
                for sb in current.succs():
                    if sb.id in visited:
                        continue
                    if sb.id not in shadow:
                        temp.append(sb)
                        shadow.append(sb.id)
                for pb in current.preds():
                    tempSum = tempSum+ (weight[pb.startEA][0]*edges[(pb.startEA,current.startEA)])
                weight[current.startEA] = (tempSum,current.endEA)
                visited.append(current.id)
                del loop[current.id]
                print "completed %x"%(current.startEA,)
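Stripped of the loop and orphan handling, the core of calculate_weight is a forward propagation over the CFG: an edge out of a block with n successors gets probability 1/n, a block's probability is the sum over its predecessors of (predecessor probability × edge probability), and main() later turns each probability p into the weight 1/p, so rarely reached blocks weigh more. The Python 3 sketch below applies this to an acyclic graph only, visiting blocks in topological order (Kahn's algorithm) instead of using VUzzer's work queue, and assumes each successor is listed once and every block is reachable from the root.

```python
from collections import deque


def block_weights(succs, root):
    """Return {block: (probability, weight)} for an acyclic CFG given as
    {block: [successor, ...]}; weight = 1 / probability."""
    preds = {b: [] for b in succs}
    for b, targets in succs.items():
        for s in targets:
            preds.setdefault(s, []).append(b)
    # Edge probability: uniform over a block's successors.
    edge = {(b, s): 1.0 / len(targets)
            for b, targets in succs.items() for s in targets}
    remaining = {b: len(p) for b, p in preds.items()}
    prob = {}
    ready = deque([root])
    while ready:  # a block becomes ready once all its predecessors are done
        cur = ready.popleft()
        if cur == root:
            prob[cur] = 1.0
        else:
            prob[cur] = sum(prob[p] * edge[(p, cur)] for p in preds[cur])
        for s in succs.get(cur, []):
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return {b: (p, 1.0 / p) for b, p in prob.items()}
```

For the graph A→{B, C}, B→D, C→{D, E}, block E (probability 0.25) gets weight 4 while the join block D (probability 0.75) gets about 1.33, which is how the weights push fitness toward hard-to-reach code.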

  • def analysis(): splits the program into functions, builds a control-flow graph for each function, and passes it to calculate_weight(func, fAddr) to compute the block weights

  • def main():

def main():
    strings=[]
    start = timeit.default_timer()
    # compute the probability of each block
    analysis()
    # collect the cmp information
    strings=findCMPopnds()
    stop = timeit.default_timer()
    # weight of each block = 1/probability; fweight maps block start address -> (block weight, address just past the block's end)
    for bb in weight:
        fweight[bb]=(1.0/weight[bb][0],weight[bb][1])
    print"[**] Printing weights..."
    for bb in fweight:
        print "BB [%x-%x] -> %3.2f"%(bb,fweight[bb][1],fweight[bb][0])
    print " [**] Total Time: ", stop - start
    print "[**] Total functions analyzed: %d"%(fCount,)
    print "[**] Total BB analyzed: %d"%(len(fweight),)
    outFile=GetInputFile() # name of the file being analysed
    strFile=outFile+".names"
    outFile=outFile+".pkl"
    fd=open(outFile,'w')
    # save the block weights in the .pkl file
    pickle.dump(fweight,fd)
    fd.close()
    strFD=open(strFile,'w')
    # save the cmp information in the .names file
    pickle.dump(strings,strFD)
    strFD.close()
    print "[*] Saved results in pickle files: %s, %s"%(outFile,strFile)
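The two output files are ordinary pickles: the .pkl file holds fweight, a dict mapping each block's start address to (weight, end address), and the .names file holds the value returned by findCMPopnds. The Python 3 sketch below writes and reads files of that shape under hypothetical helper names. It opens them in binary mode (required for pickle on Python 3, unlike the 'w' mode used in the Python 2 script) and passes encoding='latin1' when loading, which lets Python 3 read str objects pickled by Python 2.

```python
import os
import pickle
import tempfile


def save_ida_results(base, fweight, strings):
    """Write <base>.pkl (block weights) and <base>.names (cmp operands)."""
    with open(base + ".pkl", "wb") as fd:
        pickle.dump(fweight, fd)
    with open(base + ".names", "wb") as fd:
        pickle.dump(strings, fd)


def load_ida_results(base):
    """Read both files back; latin1 decodes Python 2 str objects byte-for-byte."""
    with open(base + ".pkl", "rb") as fd:
        fweight = pickle.load(fd, encoding="latin1")
    with open(base + ".names", "rb") as fd:
        strings = pickle.load(fd, encoding="latin1")
    return fweight, strings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        base = os.path.join(d, "a")
        save_ida_results(base, {0x8048400: (2.0, 0x8048410)}, [{b"ZM"}, {b"W"}])
        fweight, strings = load_ida_results(base)
        for start, (weight, end) in fweight.items():
            print("BB [%x-%x] -> %3.2f" % (start, end, weight))
```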

3.2 VUzzer seed generation and mutation

This functionality is implemented mainly in runfuzzer.py, gautils.py, and operators.py. The following sections look at how they work.

3.2.1 runfuzzer.py

  • def main():
def main():
    check_env()
    # Parse the command-line options into the config variables
    parser = argparse.ArgumentParser(description='VUzzer options')
    parser.add_argument('-s','--sut', help='SUT commandline',required=True)
    parser.add_argument('-i','--inputd', help='seed input directory (relative path)',required=True)
    parser.add_argument('-w','--weight', help='path of the pickle file(s) for BB weights (separated by comma, in case there are two) ',required=True)
    parser.add_argument('-n','--name', help='Path of the pickle file(s) containing strings from CMP inst (separated by comma if there are two).',required=True)
    parser.add_argument('-l','--libnum', help='Number of binaries to monitor (only application or used libraries)',required=False, default=1)
    parser.add_argument('-o','--offsets',help='base-address of application and library (if used), separated by comma', required=False, default='0x00000000')
    parser.add_argument('-b','--libname',help='library name to monitor',required=False, default='')
    args = parser.parse_args()
    config.SUT=args.sut
    config.INITIALD=os.path.join(config.INITIALD, args.inputd)
    config.LIBNUM=int(args.libnum)
    config.LIBTOMONITOR=args.libname
    config.LIBPICKLE=[w for w in args.weight.split(',')]
    config.NAMESPICKLE=[n for n in args.name.split(',')]
    config.LIBOFFSETS=[o for o in args.offsets.split(',')]
    ih=config.PINCMD.index("#") # this is just to find the index of the placeholder in PINCMD list to replace it with the libname
    config.PINCMD[ih]=args.libname


    ###################################

    config.minLength=get_min_file(config.INITIALD)
    # clear out the working directories
    try:
        shutil.rmtree(config.KEEPD)
    except OSError:
        pass
    os.mkdir(config.KEEPD)
    
    try:
        os.mkdir("outd")
    except OSError:
        pass
    
    try:
        os.mkdir("outd/crashInputs")
    except OSError:
        gau.emptyDir("outd/crashInputs")

    crashHash=[]
    try:
        os.mkdir(config.SPECIAL)
    except OSError:
        gau.emptyDir(config.SPECIAL)
    
    try:
        os.mkdir(config.INTER)
    except OSError:
        gau.emptyDir(config.INTER)
	
    ###### open names pickle files
    # Load the contents of the .pkl and .names files
    gau.prepareBBOffsets()
    if config.PTMODE:
        pt = simplept.simplept()
    else:
        pt = None
    if config.ERRORBBON==True:
        # identify the program's error-handling blocks
        gbb,bbb=dry_run()
    else:
        gbb=0
   # gau.die("dry run over..")
    import timing
    #selftest()
    noprogress=0
    currentfit=0
    lastfit=0
    
    config.CRASHIN.clear()
    stat=open("stats.log",'w')
    stat.write("**** Fuzzing started at: %s ****\n"%(datetime.now().isoformat('+'),))
    stat.write("**** Initial BB for seed inputs: %d ****\n"%(gbb,))
    stat.flush()
    os.fsync(stat.fileno())
    stat.write("Genaration\t MINfit\t MAXfit\t AVGfit MINlen\t Maxlen\t AVGlen\t #BB\t AppCov\t AllCov\n")
    stat.flush()
    os.fsync(stat.fileno())
    starttime=time.clock()
    allnodes = set()
    alledges = set()
    try:
        shutil.rmtree(config.INPUTD)
    except OSError:
        pass
    shutil.copytree(config.INITIALD,config.INPUTD)
    # first we get taint of the initial inputs
    # (the initial seed files were copied into the data directory above)
    get_taint(config.INITIALD)
    
    print "MOst common offsets and values:", config.MOSTCOMMON
    #gg=raw_input("press enter to continue..")
    config.MOSTCOMFLAG=True
    crashhappend=False
    filest = os.listdir(config.INPUTD)
    filenum=len(filest)
    if filenum < config.POPSIZE:
        gau.create_files(config.POPSIZE - filenum)
    
    if len(os.listdir(config.INPUTD)) != config.POPSIZE:
        gau.die("something went wrong. number of files is not right!")

    efd=open(config.ERRORS,"w")
    gau.prepareBBOffsets()
    writecache = True
    genran=0
    bbslide=10 # this is used to call run_error_BB() functions
    keepslide=3
    keepfilenum=config.BESTP
    # Generate seeds with the genetic algorithm and run the fuzzing loop
    while True:
        print "[**] Generation %d\n***********"%(genran,)
        del config.SPECIALENTRY[:]
        del config.TEMPTRACE[:]
        del config.BBSEENVECTOR[:]
        config.SEENBB.clear()
        config.TMPBBINFO.clear()
        config.TMPBBINFO.update(config.PREVBBINFO)
        
        fitnes=dict()
        execs=0
        config.cPERGENBB.clear()
        config.GOTSTUCK=False
       
        if config.ERRORBBON == True:
            if genran > config.GENNUM/5:
                bbslide = max(bbslide,config.GENNUM/20)
                keepslide=max(keepslide,config.GENNUM/100)
                keepfilenum=keepfilenum/2
        #config.cPERGENBB.clear()
        #config.GOTSTUCK=False
            if 0< genran < config.GENNUM/5 and genran%keepslide == 0:
                copy_files(config.INPUTD,config.KEEPD,keepfilenum)
                
        #lets find out some of the error handling BBs
            if  genran >20 and genran%bbslide==0:
                stat.write("\n**** Error BB cal started ****\n")
                stat.flush()
                os.fsync(stat.fileno())
                run_error_bb(pt)
                copy_files(config.KEEPD,config.INPUTD,len(os.listdir(config.KEEPD))*1/10)
            #copy_files(config.INITIALD,config.INPUTD,1)
        files=os.listdir(config.INPUTD)
        # Run the program on each seed file, check whether it crashes, and compute each seed's fitness
        for fl in files:
                # Substitute each seed file into the command line, run it, and collect the result
                tfl=os.path.join(config.INPUTD,fl)
                iln=os.path.getsize(tfl)
                args = (config.SUT % tfl).split(' ')
                progname = os.path.basename(args[0])
                #print ''
                #print 'Input file sha1:', sha1OfFile(tfl)
                #print 'Going to call:', ' '.join(args)
                (bbs,retc)=execute(tfl)
                # compute the fitness score
                if config.BBWEIGHT == True:
                    fitnes[fl]=gau.fitnesCal2(bbs,fl,iln)
                else:
                    fitnes[fl]=gau.fitnesNoWeight(bbs,fl,iln)

                execs+=1
                # the code below runs when the seed made the program crash
                if retc < 0 and retc != -2:
                    print "[*]Error code is %d"%(retc,)
                    efd.write("%s: %d\n"%(tfl, retc))
                    efd.flush()
                    os.fsync(efd)
                    tmpHash=sha1OfFile(config.CRASHFILE)
                    # save the seed in outd/crashInputs and in the SPECIAL folder
                    if tmpHash not in crashHash:
                            crashHash.append(tmpHash)
                            tnow=datetime.now().isoformat().replace(":","-")
                            nf="%s-%s.%s"%(progname,tnow,gau.splitFilename(fl)[1])
                            npath=os.path.join("outd/crashInputs",nf)
                            shutil.copyfile(tfl,npath)
                            shutil.copy(tfl,config.SPECIAL)
                            config.CRASHIN.add(fl)
                    # with STOPONCRASH enabled, fuzzing stops at the first crash
                    if config.STOPONCRASH == True:
                        #efd.close()
                        crashhappend=True
                        break
        # Compute fitness and file-size statistics for this generation
        fitscore=[v for k,v in fitnes.items()]
        maxfit=max(fitscore)
        avefit=sum(fitscore)/len(fitscore)
        mnlen,mxlen,avlen=gau.getFileMinMax(config.INPUTD)
        print "[*] Done with all input in Gen, starting SPECIAL. \n"
        #### copy special inputs in SPECIAL directory and update coverage info ###
        spinputs=os.listdir(config.SPECIAL)
        # delete previous-generation seeds that were superseded by new seeds with better coverage
        for sfl in spinputs:
                if sfl in config.PREVBBINFO and sfl not in config.TMPBBINFO:
                        tpath=os.path.join(config.SPECIAL,sfl)
                        os.remove(tpath)
                        if sfl in config.TAINTMAP:
                            del config.TAINTMAP[sfl]
        config.PREVBBINFO=copy.deepcopy(config.TMPBBINFO)
        spinputs=os.listdir(config.SPECIAL)
        # Add this generation's higher-coverage seed files
        for inc in config.TMPBBINFO:
                config.SPECIALENTRY.append(inc)
                if inc not in spinputs:
                        incp=os.path.join(config.INPUTD,inc)
                        shutil.copy(incp,config.SPECIAL)
                        #del fitnes[incp]
        # Compute the code coverage for this generation
        appcov,allcov=gau.calculateCov()
        stat.write("\t%d\t %d\t %d\t %d\t %d\t %d\t %d\t %d\t %d\t %d\n"%(genran,min(fitscore),maxfit,avefit,mnlen,mxlen,avlen,len(config.cPERGENBB),appcov,allcov))
        stat.flush()
        os.fsync(stat.fileno())
        print "[*] Wrote to stat.log\n"
        if crashhappend == True:
            break
        #lets find out some of the error handling BBs
        #if genran >20 and genran%5==0:
         #   run_error_bb(pt)
        genran += 1
        #this part is to get initial fitness that will be used to determine if fuzzer got stuck.
        # Check whether the best score improves; if it stays the same for more than 20 generations, the fuzzer is considered stuck
        lastfit=currentfit
        currentfit=maxfit
        if currentfit==lastfit:#lastfit-config.FITMARGIN < currentfit < lastfit+config.FITMARGIN:
            noprogress +=1
        else:
            noprogress =0
        if noprogress > 20:
            config.GOTSTUCK=True
            stat.write("Heavy mutate happens now..\n")
            noprogress =0
        if (genran >= config.GENNUM) and (config.STOPOVERGENNUM == True):
            break
        # copy inputs to SPECIAL folder (if they do not yet included in this folder
        #spinputs=os.listdir(config.SPECIAL)
        #for sfl in spinputs:
        #        if sfl in config.PREVBBINFO and sfl not in config.TMPBBINFO:
        #                tpath=os.path.join(config.SPECIAL,sfl)
        #                os.remove(tpath)
        #config.PREVBBINFO=copy.deepcopy(config.TMPBBINFO)
        #spinputs=os.listdir(config.SPECIAL)
        #for inc in config.TMPBBINFO:
        #        config.SPECIALENTRY.append(inc)
        #        if inc not in spinputs:
        #                incp=os.path.join(config.INPUTD,inc)
        #                shutil.copy(incp,config.SPECIAL)
        #                #del fitnes[incp]
        # Run taint analysis on the seeds in SPECIAL to collect the cmp comparison information
        if len(os.listdir(config.SPECIAL))>0:
            if len(os.listdir(config.SPECIAL))<config.NEWTAINTFILES:
                get_taint(config.SPECIAL)
            else:
                try:
                    os.mkdir("outd/tainttemp")
                except OSError:
                    gau.emptyDir("outd/tainttemp")
                if conditional_copy_files(config.SPECIAL,"outd/tainttemp",config.NEWTAINTFILES) == 0:
                    get_taint("outd/tainttemp")
            #print "MOst common offsets and values:", config.MOSTCOMMON
            #gg=raw_input("press any key to continue..")
        print "[*] Going for new generation creation.\n" 
        # Create the next generation of seeds
        gau.createNextGeneration3(fitnes,genran)
        #raw_input("press any key...")

    efd.close()
    stat.close()
    libfd_mm.close()
    libfd.close()
    endtime=time.clock()
    
    print "[**] Total time %f sec."%(endtime-starttime,)
    print "[**] Fuzzing done. Check %s to see if there were crashes.."%(config.ERRORS,)
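Two pieces of bookkeeping in this loop are worth isolating. Crashes are de-duplicated by the SHA-1 of config.CRASHFILE, and the fuzzer is declared stuck (config.GOTSTUCK) once the best fitness of a generation has equalled the previous generation's for more than 20 generations in a row, which makes createNextGeneration3 call heavyMutate. The Python 3 sketch below reproduces that logic with hypothetical names; sha1_of_file stands in for VUzzer's sha1OfFile, whose implementation is not shown here.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=65536):
    """Hex SHA-1 of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_new_crash(crash_hashes, crash_report):
    """Record the report's hash; True only the first time a hash is seen."""
    digest = sha1_of_file(crash_report)
    if digest in crash_hashes:
        return False
    crash_hashes.add(digest)
    return True


class StallDetector:
    """Same counter logic as main(): compare each generation's max fitness
    with the previous one and signal 'stuck' after more than `limit` repeats."""

    def __init__(self, limit=20):
        self.limit = limit
        self.noprogress = 0
        self.currentfit = 0

    def update(self, maxfit):
        lastfit, self.currentfit = self.currentfit, maxfit
        self.noprogress = self.noprogress + 1 if maxfit == lastfit else 0
        if self.noprogress > self.limit:
            self.noprogress = 0  # reset, as main() does after setting GOTSTUCK
            return True
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        report = os.path.join(d, "crash.txt")
        with open(report, "wb") as f:
            f.write(b"abc")
        seen = set()
        print(is_new_crash(seen, report), is_new_crash(seen, report))  # True False
```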
  • def dry_run(): identifies the error-handling blocks
def dry_run():
    ''' this function executes the initial test set to determine error handling BBs in the SUT. Such BBs are given zero weights during actual fuzzing.
    It records the blocks executed when the program runs normally and when it runs on invalid input.
    '''
    print "[*] Starting dry run now..."
    tempbad=[]
    dfiles=os.listdir(config.INITIALD)
    if len(dfiles) <3:
        gau.die("not sufficient initial files")
    # Run the program on the initial seeds and mark the blocks executed during normal runs
    for fl in dfiles:
        tfl=os.path.join(config.INITIALD,fl)
        try:
            f=open(tfl, 'r')
            f.close()
        except:
            gau.die("can not open our own input %s!"%(tfl,))
        (bbs,retc)=execute(tfl)
        if retc < 0:
            gau.die("looks like we already got a crash!!")
        config.GOODBB |= set(bbs.keys())
    print "[*] Finished good inputs (%d)"%(len(config.GOODBB),)
    #now lets run SUT of probably invalid files. For that we need to create them first.
     
    print "[*] Starting bad inputs.."
    lp=0
    badbb=set()
    while lp <2:
        try:
                shutil.rmtree(config.INPUTD)
        except OSError:
                pass

        os.mkdir(config.INPUTD)
        # generate some seed files with random content for testing
        gau.create_files_dry(30)
        dfiles=os.listdir(config.INPUTD)
        # blocks reached now that were never reached with the good seeds are treated as error-handling blocks
        for fl in dfiles:
            tfl=os.path.join(config.INPUTD,fl)
            (bbs,retc)=execute(tfl)
            if retc < 0:
                gau.die("looks like we already got a crash!!")
            tempbad.append(set(bbs.keys()) - config.GOODBB)
            
        tempcomn=set(tempbad[0])
        for di in tempbad:
            tempcomn.intersection_update(set(di))
        badbb.update(tempcomn)
        lp +=1
    #else:
    #  tempcomn = set()
    ###print "[*] finished bad inputs (%d)"%(len(tempbad),)
    config.ERRORBBALL=badbb.copy()
    print "[*] finished common BB. TOtal such BB: %d"%(len(badbb),)
    for ebb in config.ERRORBBALL:
        print "error bb: 0x%x"%(ebb,)
    time.sleep(5)
    if config.LIBNUM == 2:
        baseadr=config.LIBOFFSETS[1]
        for ele in tempcomn:
            if ele < baseadr:
                config.ERRORBBAPP.add(ele)
            else:
                config.ERRORBBLIB.add(ele-baseadr)
                         
    del tempbad
    del badbb
    #del tempgood
    # GOODBB holds the start addresses of normal blocks, ERRORBBALL those of error-handling blocks; return their counts
    return len(config.GOODBB),len(config.ERRORBBALL)
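The idea behind dry_run is set arithmetic: blocks executed by the valid seeds are "good"; among the random, probably invalid, inputs, the blocks outside the good set that every such input executes are taken to be error handling, and they receive zero weight during fuzzing. The Python 3 sketch below uses hypothetical names, gives each run as a collection of block addresses, and intersects over all invalid inputs at once. In the code above, tempbad is never cleared between the two rounds, so ERRORBBALL ends up equal to the intersection over the first round's inputs; this sketch is slightly stricter.

```python
def find_error_blocks(good_runs, bad_runs):
    """good_runs / bad_runs: iterables of block-address collections, one per input.
    Returns (good_blocks, error_blocks)."""
    good = set()
    for blocks in good_runs:
        good |= set(blocks)
    # Blocks each invalid input hits that no valid seed hits.
    diffs = [set(blocks) - good for blocks in bad_runs]
    errors = set.intersection(*diffs) if diffs else set()
    return good, errors
```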
  • def read_taint(fpath): returns the cmp information encountered when running the current seed file
  • def get_taint(dirin): collects the cmp-instruction information each seed reaches while the program runs, stores it in config.TAINTMAP, and puts the cmp information of each seed file into config.MAXOFFSET
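The exact layout of config.TAINTMAP is not shown here, so the following Python 3 sketch assumes a simple one: for each seed, a dict from input offset to the set of values that offset was compared with. Under that assumption it computes the offsets compared in every seed, with the values seen there. This is the kind of information (compare config.MOSTCOMMON, printed after get_taint in main()) that lets mutation write known cmp operands at the right positions; the function name is hypothetical.

```python
def common_offsets(taint_maps):
    """taint_maps: {seed_name: {offset: set_of_compared_values}}.
    Return {offset: union of values} for offsets present in every seed."""
    maps = list(taint_maps.values())
    if not maps:
        return {}
    shared = set(maps[0])
    for m in maps[1:]:
        shared &= set(m)
    return {off: set().union(*(m[off] for m in maps)) for off in sorted(shared)}
```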

3.2.2 gautils.py

  • def create_files_dry(num): uses the initial seed files in the datatemp directory as templates and calls the totally_random method of the GA operator class to generate random strings of random length. The num parameter has no real effect.
  • def create_files(num): creates the first generation
def create_files(num):
    ''' This function creates num number of files in the input directory. This is called if we do not have enough initial population.
    Addition: once a new file is created by mutation/crossover, we query MOSTCOMMON dict to find offsets that replace values at those offsets in the new files. In the case of mutation, we also use taintmap of the parent input to get other offsets that are used in CMP and change them. For crossover, as there are two parents involved, we cannot query just one, so we do a random change on those offsets from any of the parents in resulting children.
'''
    #files=os.listdir(config.INPUTD)
    files=os.listdir(config.INITIALD)
    # Initialize the GA operator class; note that the cmp comparison information (config.ALLSTRINGS) is passed in
    ga=operators.GAoperator(random.Random(),config.ALLSTRINGS)
    while (num != 0):
        # When this condition holds, two seed files are selected for crossover
        if random.uniform(0.1,1.0)>(1.0 - config.PROBCROSS) and (num >1):
            #we are going to use crossover, so we get two parents.
            par=random.sample(files, 2)
            bn, ext = splitFilename(par[0])
            #fp1=os.path.join(config.INPUTD,par[0])
            #fp2=os.path.join(config.INPUTD,par[1])
            fp1=os.path.join(config.INITIALD,par[0])
            fp2=os.path.join(config.INITIALD,par[1])
            p1=readFile(fp1)
            p2=readFile(fp2)
            # perform the crossover
            ch1,ch2 = ga.crossover(p1,p2)
            # now we make changes according to taintflow info.
            ch1=taint_based_change(ch1,par[0])
            ch2=taint_based_change(ch2,par[1])
            np1=os.path.join(config.INPUTD,"ex-%d.%s"%(num,ext))
            np2=os.path.join(config.INPUTD,"ex-%d.%s"%(num-1,ext))
            writeFile(np1,ch1)
            writeFile(np2,ch2)
            num -= 2
        # Otherwise, mutate a single seed file
        else:
            fl=random.choice(files)
            bn, ext = splitFilename(fl)
            #fp=os.path.join(config.INPUTD,fl)
            fp=os.path.join(config.INITIALD,fl)
            p1=readFile(fp)
            # mutate the seed using a randomly chosen strategy
            ch1= ga.mutate(p1,fl)
            ch1=taint_based_change(ch1,fl)
            np1=os.path.join(config.INPUTD,"ex-%d.%s"%(num,ext))
            writeFile(np1,ch1)
            num -= 1
    return 0
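The operators themselves live in operators.py and are not reproduced in this article, so the following Python 3 sketch uses stand-ins: single-point crossover, random byte replacement as the mutation, and a hypothetical taint_based_change that writes a known cmp operand at each tainted offset (offsets mapped to sets of candidate values). Note also that create_files draws from random.uniform(0.1, 1.0), so for PROBCROSS ≤ 0.9 the crossover branch is taken with probability PROBCROSS/0.9 rather than exactly PROBCROSS.

```python
import random


def crossover(p1, p2, rng):
    """Single-point crossover: swap the tails of two parents at a random cut."""
    cut = rng.randint(1, max(1, min(len(p1), len(p2)) - 1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate_bytes(data, rng, count=1):
    """Overwrite `count` random positions with random byte values."""
    buf = bytearray(data)
    for _ in range(count):
        if buf:
            buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def taint_based_change(data, offset_values, rng):
    """Write one known cmp operand at each tainted offset, padding with zero
    bytes if the input is too short."""
    buf = bytearray(data)
    for off, values in sorted(offset_values.items()):
        val = rng.choice(sorted(values))
        end = off + len(val)
        if end > len(buf):
            buf.extend(b"\x00" * (end - len(buf)))
        buf[off:end] = val
    return bytes(buf)


if __name__ == "__main__":
    rng = random.Random(1)
    c1, c2 = crossover(b"hello world", b"HELLO WORLD", rng)
    child = taint_based_change(mutate_bytes(c1, rng), {0: {b"W"}, 10: {b"A"}}, rng)
    print(c1, c2, child)
```

Applied to the example program, writing b"W" at offset 0 and b"A" at offset 10 is exactly the change that random mutation alone is unlikely to make.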
  • def prepareBBOffsets(): adds the comparison information from .names to config.ALLSTRINGS and the weights from .pkl to config.ALLBB and config.cAPPBB
  • def fitnesCal2(bbdict, cinput, ilen): records the seeds that discover new blocks in TMPBBINFO, records all blocks found in cPERGENBB, and computes the seed's score as $\mathrm{score} = N \cdot \sum_{b} \log(w_b) \cdot \log(c_b)$, where $N$ is the number of blocks the seed reaches, $w_b$ is the weight of block $b$, and $c_b$ is the number of times $b$ is executed when the program runs on the seed
  • def calculateCov(): computes coverage
  • def createNextGeneration3(fit, gn): creates the next generation
def createNextGeneration3(fit,gn):
    ''' this function generates new generation. This is the implementation of the standard elitism approach. We are also addressing "input bloating" issue by selecting inputs based on its length. The idea is to select inputs for crossover only if their length is less than the best input's length. Otherwise, such inputs directly go for mutation, whereby having a chance to reduce their lengths.'''
    files=os.listdir(config.INPUTD)
    # initialize the GA operator class
    ga=operators.GAoperator(random.Random(),config.ALLSTRINGS)
    sfit=sorted(fit.items(),key=itemgetter(1),reverse=True)
    bfp=os.path.join(config.INPUTD,sfit[0][0])
    bestLen=os.path.getsize(bfp)
    fitnames=[k for k,v in sfit]
    # as our selection policy requires that each input that triggered a new BB must go to the next generation, we need to find a set of BEST BBs and merge it with this set of inputs.
    best=set(fitnames[:config.BESTP])#.union(set(config.SPECIALENTRY))
    #best.update(config.CRASHIN)
    #print "best",best, len(best)
    if len(best)%2 !=0:
        for nm in fitnames:
            if nm not in best:
                best.add(nm)
                break
   
    if config.GOTSTUCK==True:
        heavyMutate(config.INPUTD,ga,best)
    #here we check for file length and see if we can reduce lengths of some.
    if gn%config.skipGen ==0:
        mn,mx,avg=getFileMinMax(config.INPUTD)
        filesTrim(config.INPUTD,avg,bestLen,config.minLength,ga, best)
    i=0
    bn, ext = splitFilename(sfit[i][0])
    #limit=config.POPSIZE - config.BESTP
    limit=config.POPSIZE - len(best)
    #print "nextgen length %d - %d\n"%(limit, len(best))
    #raw_input("enter key")
    crashnum=0 #this variable is used to count new inputs generated with crashing inputs. 
    emptyDir(config.INTER)
    copyd2d(config.SPECIAL,config.INTER)
    if config.ERRORBBON==True:
        copyd2d(config.INITIALD,config.INTER)
    while i< limit:
        # select parents from the previous generation
        cutp=int(random.uniform(0.4,0.8)*len(fitnames))
        # We use crossover such that the best parents are chosen frequently, while less fit parents also get a chance to breed. The cut above gives the offset to choose parents from. Note that the last 20% of inputs never get a chance to breed.
        #print "crossover"
        par=random.sample(fitnames[:cutp], 2)
        fp1=os.path.join(config.INPUTD,par[0])
        fp2=os.path.join(config.INPUTD,par[1])
        inpsp=os.listdir(config.INTER)
        #if len(config.SPECIALENTRY)>0 and random.randint(0,9) >6:
        #    fp1=os.path.join(config.INPUTD,random.choice(config.SPECIALENTRY))
        #if len(config.CRASHIN)>0 and random.randint(0,9) >4 and crashnum<5:
        #    fp2=os.path.join(config.INPUTD,random.choice(config.CRASHIN))
        #    crashnum += 1
        sin1='xxyy'
        sin2='yyzz'
        if len(inpsp)>0:
            if random.randint(0,9) >config.SELECTNUM:
                sin1=random.choice(inpsp)
                fp1=os.path.join(config.INTER,sin1)
            if random.randint(0,9) >config.SELECTNUM:
                sin2=random.choice(inpsp)
                fp2=os.path.join(config.INTER,sin2)
        np1=os.path.join(config.INPUTD,"new-%d-g%d.%s"%(i,gn,ext))
        np2=os.path.join(config.INPUTD,"new-%d-g%d.%s"%(i+1,gn,ext))
        p1=readFile(fp1)
        p2=readFile(fp2)
        # If either parent is longer than the best input, skip crossover and mutate the selected seeds directly
        if (len(p1) > bestLen) or (len(p2) > bestLen):
            #print "no crossover"
            #mch1= ga.mutate(p1)
            if sin1 != 'xxyy':
                mch1= ga.mutate(p1,sin1)
                mch1=taint_based_change(mch1,sin1)
            else:
                mch1= ga.mutate(p1,par[0])
                mch1=taint_based_change(mch1,par[0])
            #mch2= ga.mutate(p2)
            if sin2 !='yyzz':
                mch2= ga.mutate(p2,sin2)
                mch2=taint_based_change(mch2,sin2)
            else:
                mch2= ga.mutate(p2,par[1])
                mch2=taint_based_change(mch2,par[1])
            if len(mch1)<3 or len(mch2)<3:
                die("zero input created")
            writeFile(np1,mch1)
            writeFile(np2,mch2)
            i+=2
            #continue
        # Otherwise, first cross over the two selected parents, then (with probability PROBMUT) mutate the children
        else:
            #print "crossover"
            ch1,ch2 = ga.crossover(p1,p2)
            #now we do mutation on these children, one by one
            if random.uniform(0.1,1.0)>(1.0 - config.PROBMUT):
                #mch1= ga.mutate(ch1)
                if sin1 !='xxyy':
                    mch1= ga.mutate(ch1,sin1)
                    mch1=taint_based_change(mch1,sin1)
                else:
                    mch1= ga.mutate(ch1,par[0])
                    mch1=taint_based_change(mch1,par[0])
                if len(mch1)<3:
                    die("zero input created")
                writeFile(np1,mch1)
            else:
                if sin1 != 'xxyy':
                    ch1=taint_based_change(ch1,sin1)
                else:
                    ch1=taint_based_change(ch1,par[0])
                writeFile(np1,ch1)
            if random.uniform(0.1,1.0)>(1.0 - config.PROBMUT):
                #mch2= ga.mutate(ch2)
                if sin2 !='yyzz':
                    mch2= ga.mutate(ch2,sin2)
                    mch2=taint_based_change(mch2,sin2)
                else:
                    mch2= ga.mutate(ch2,par[1])
                    mch2=taint_based_change(mch2,par[1])

                if len(mch2)<3:
                    die("zero input created")
                writeFile(np2,mch2)
            else:
                if sin2 != 'yyzz':
                    ch2=taint_based_change(ch2,sin2)
                else:
                    ch2=taint_based_change(ch2,par[1])

                writeFile(np2,ch2)
            i += 2
    
    # now we need to delete last generation inputs from INPUTD dir, preserving BEST inputs.
    #best=[k for k,v in sfit][:config.BESTP]
    #print "gennext loop ",i
    #raw_input("enterkey..")
    for fl in files:
        if fl in best:
            continue
        os.remove(os.path.join(config.INPUTD,fl))
    #lets check if everything went well!!!
    if len(os.listdir(config.INPUTD))!=config.POPSIZE:
        die("Something went wrong while creating next gen inputs.. check it!")
    return 0
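
The scoring and selection steps above can be sketched in Python as follows. This is a minimal, hypothetical sketch: fitness_score implements the score formula exactly as written in the fitnesCal2 bullet, while select_elite_and_parents and breeding_plan are illustrative helpers (not VUzzer functions) that mirror the ranking, elitism, parent sampling and length check in createNextGeneration3. Hit counts and block weights are assumed to be plain dicts keyed by block address, and a block with no known weight is assumed to weigh 1.0.

```python
import math
import random


def fitness_score(bb_hits, bb_weight):
    """Score one seed with the formula from the fitnesCal2 description (sketch).

    bb_hits:   {block_address: times the block ran for this seed}
    bb_weight: {block_address: static block weight}
    Blocks without a known weight default to 1.0 (an assumption), so they add log(1.0) = 0.
    """
    n_blocks = len(bb_hits)
    total = sum(math.log(bb_weight.get(bb, 1.0)) * math.log(count)
                for bb, count in bb_hits.items())
    return n_blocks * total


def select_elite_and_parents(fit, best_p, rng):
    """Hypothetical helper mirroring the selection logic of createNextGeneration3."""
    # Rank seed names by fitness, best first.
    ranked = [name for name, _ in sorted(fit.items(), key=lambda kv: kv[1], reverse=True)]
    # Elitism: the top best_p seeds survive unchanged.
    best = set(ranked[:best_p])
    # Keep the elite set even-sized by adding the next-best seed.
    if len(best) % 2 != 0:
        for name in ranked:
            if name not in best:
                best.add(name)
                break
    # Parents are drawn from the top 40-80% of the ranking.
    cutp = int(rng.uniform(0.4, 0.8) * len(ranked))
    parents = rng.sample(ranked[:cutp], 2)
    return best, parents


def breeding_plan(len1, len2, best_len):
    """Anti-bloat rule: parents longer than the best seed skip crossover."""
    return "mutate" if len1 > best_len or len2 > best_len else "crossover"


if __name__ == "__main__":
    rng = random.Random(0)
    fit = {"seed-%d" % i: float(i) for i in range(10)}
    best, parents = select_elite_and_parents(fit, 3, rng)
    print(sorted(best), parents)
```

Taken literally, the formula gives a block executed exactly once a contribution of log(1) = 0, so the real fitnesCal2 may scale hit counts differently; treat this as an approximation. Also note that rng.sample needs at least two candidates, so the ranking should hold at least five seeds (int(0.4 × 5) = 2).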

3.2.3 operators.py

  • def get_cut(): inserts the values obtained from taint tracking of cmp instructions into the seed
  • def mutate(): mutates a single seed
mutators = [eliminate_random, change_bytes, change_bytes, add_random, add_random, change_random, single_change_random, lower_single_random, raise_single_random, eliminate_null, eliminate_double_null, totally_random, int_slide, double_fuzz, change_random_full, change_random_full, eliminate_random, add_random, change_random]  # mutation strategies; entries listed more than once are picked more often
  
    def mutate(self, original,fl):
        result=self.r.choice(self.mutators)(self, original,fl)
        while len(result)<3:
            result= self.r.choice(self.mutators)(self, original,fl)
        assert len(result)>2, "elimination failed to reduce size %d" % (len(result),)
        return result
  • def crossover(self, original1, original2): crosses over two seeds
    crossovers=[single_crossover, double_crossover]  # crossover strategies
    def crossover(self, original1, original2):
        minlen=min(len(original1), len(original2))
        if minlen <20:
            return original1, original2 # we don't do any crossover as the parents are too young to have babies ;)
        return self.r.choice(self.crossovers)(self, original1,original2)
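
The retry loop in mutate and the length guard in crossover can be illustrated with a self-contained Python sketch. The two mutators below are toy stand-ins named after VUzzer's eliminate_random and change_random but are not its implementations; they also omit the fl (file name) argument VUzzer passes for taint information. single_crossover here is a standard single-point crossover assumed for illustration, not VUzzer's own version.

```python
import random

MIN_LEN = 3         # mutate() keeps retrying until the result has at least 3 bytes
MIN_CROSS_LEN = 20  # crossover() returns the parents unchanged below 20 bytes


def eliminate_random(rng, data):
    """Toy mutator (hypothetical): delete a random slice of the input."""
    start = rng.randrange(len(data))
    end = rng.randrange(start, len(data) + 1)
    return data[:start] + data[end:]


def change_random(rng, data):
    """Toy mutator (hypothetical): overwrite one random byte."""
    pos = rng.randrange(len(data))
    return data[:pos] + bytes([rng.randrange(256)]) + data[pos + 1:]


# Listing a mutator twice doubles its chance of being picked, as in VUzzer's list.
MUTATORS = [eliminate_random, change_random, change_random]


def mutate(rng, original):
    """Apply a random mutator, retrying until the result is long enough."""
    result = rng.choice(MUTATORS)(rng, original)
    while len(result) < MIN_LEN:
        result = rng.choice(MUTATORS)(rng, original)
    return result


def single_crossover(rng, p1, p2):
    """Standard single-point crossover: swap tails after a shared cut point."""
    cut = rng.randrange(1, min(len(p1), len(p2)))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def crossover(rng, p1, p2):
    """Skip crossover when either parent is shorter than MIN_CROSS_LEN."""
    if min(len(p1), len(p2)) < MIN_CROSS_LEN:
        return p1, p2
    return single_crossover(rng, p1, p2)


if __name__ == "__main__":
    rng = random.Random(1)
    print(mutate(rng, b"GIF89a header"))
    print(crossover(rng, b"A" * 25, b"B" * 30))
```

Because the toy change_random preserves length, the loop terminates whenever the original is at least three bytes long; an original shorter than that would make this sketch loop forever, so callers should guard against it.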

 

Origin blog.csdn.net/zhang14916/article/details/100103103