A few extremely slow snippets for reading text files and for concatenating and splitting strings

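Over the past few days I have been improving my Base64 & UUE encoded-file generator, and I found that it is very slow on large files. Some analysis showed that the string concatenation and splitting code is the bottleneck. Consider the following code: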

Private Sub Command1_Click()
    Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
    Dim B() As Byte, tmpstr As String, outStr As String
    Dim timx As Single
    
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    enfp = FreeFile
    Open enfn For Binary As #enfp
        fL = LOF(enfp)
        ReDim B(fL - 1)
        Get #enfp, , B
    Close #enfp
    tmpstr = StrConv(B, vbUnicode)
    defp = FreeFile
    Open defn For Output As #defp
        Do While Len(tmpstr) > 60
            outStr = "M" & Mid(tmpstr, 1, 60)
            tmpstr = Mid(tmpstr, 61)   'this statement is what makes it slow 20220522
            Print #defp, outStr
            DoEvents
        Loop
        Print #defp, tmpstr
    Close #defp
    MsgBox "Processed: " & fL & " bytes, elapsed: " & Timer - timx & " s"
End Sub

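When the encoded string is split into fixed-length pieces, this statement:

tmpstr = Mid(tmpstr, 61)   'this statement is what makes it slow 20220522

is meant to keep whatever is left after the cut. With short strings it makes no noticeable difference, but as the string grows it gets slower and slower, because every call copies the entire remainder into a new string, so the total work grows roughly with the square of the length. I therefore came up with a different approach: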

Private Sub Command2_Click()
    Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
    Dim B() As Byte, tmpstr As String, outStr As String
    Dim E
    Dim timx As Single
    
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    enfp = FreeFile
    Open enfn For Binary As #enfp
        fL = LOF(enfp)
        ReDim B(fL - 1)
        Get #enfp, , B
    Close #enfp
    tmpstr = StrConv(B, vbUnicode)
    defp = FreeFile
    E = 1
    Open defn For Output As #defp
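        ' Len(tmpstr) - E + 1 is the number of characters not yet written
        Do While (Len(tmpstr) - E + 1) > 60
            outStr = "M" & Mid(tmpstr, E, 60)
            Print #defp, outStr
            DoEvents
            E = E + 60
        Loop
        outStr = Mid(tmpstr, E)   ' whatever remains (at most 60 characters)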
        Print #defp, outStr
    Close #defp
    MsgBox "Processed: " & fL & " bytes, elapsed: " & Format(Timer - timx, "0.000000") & " s"
End Sub

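This version only extracts a fixed-length piece from the original string and never modifies the original, and efficiency immediately improved several hundredfold (the longer the string, the bigger the gain).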
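To make the difference concrete, the following is a minimal sketch that only counts how many characters each strategy copies; it does not measure time. The function names and parameters are hypothetical and are not part of the original tool.

```vb
' Hypothetical helpers: count the characters copied by each splitting strategy.
' Double is used for the running total so it cannot overflow a Long.
Function CharsCopiedByTrimming(ByVal totalLen As Long, ByVal chunk As Long) As Double
    Dim remaining As Long, copied As Double
    remaining = totalLen
    Do While remaining > chunk
        copied = copied + chunk                  ' Mid(tmpstr, 1, 60)
        copied = copied + (remaining - chunk)    ' tmpstr = Mid(tmpstr, 61)
        remaining = remaining - chunk
    Loop
    CharsCopiedByTrimming = copied
End Function

Function CharsCopiedByOffset(ByVal totalLen As Long, ByVal chunk As Long) As Double
    Dim pos As Long, copied As Double
    pos = 1
    Do While (totalLen - pos + 1) > chunk
        copied = copied + chunk                  ' Mid(tmpstr, E, 60)
        pos = pos + chunk
    Loop
    ' The final short piece is copied once
    CharsCopiedByOffset = copied + (totalLen - pos + 1)
End Function
```

For a 600-character string and a chunk size of 60, the first function returns 3240 and the second returns 600. The gap widens as the string grows, which is consistent with the observation that the gain depends on the string length. The measured speedup will be smaller than this ratio, since file output and DoEvents also take time.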

'================================================================

In addition, to read a whole file I originally used Line Input:

    Open defn For Input As #defp
        Do While Not EOF(defp)
            Line Input #defp, tmpstr
            EnStr = EnStr & tmpstr
        Loop
    Close #defp

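Likewise, the concatenation statement EnStr = EnStr & tmpstr makes reading extremely slow, so I thought of using Adodb.Stream to read the whole file in one go. Again, the problem is not noticeable with small files, but for files over 2 MB the obj.ReadText call turns out to be surprisingly slow: an 8.27 MB file can take 7.32 seconds.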

Private Sub Command3_Click()
    Dim str, stm, enfn, defn
    Dim timx As Single, tmpstr As String
    timx = Timer
    
    enfn = Text1.Text
    defn = Text2.Text
    
    Set stm = CreateObject("Adodb.Stream")
    stm.Type = 2 '1 bin,2 txt
    stm.Mode = 3
    stm.Open
    stm.Charset = "GB2312"
    stm.LoadFromFile enfn
    
        str = stm.readtext     '------ slow: 7.32 s

'        str = stm.Read          '-------- fast: 0.015 s

    stm.Close
    Set stm = Nothing
'    tmpstr = StrConv(str, vbUnicode)
    MsgBox "Finished reading the file, elapsed: " & Timer - timx & " s" '& Chr(str(0))
End Sub

So I switched to Obj.Read and found that it was immediately almost 500 times faster.

Private Sub Command3_Click()
    Dim str, stm, enfn, defn
    Dim timx As Single, tmpstr As String
    timx = Timer
    
    enfn = Text1.Text
    defn = Text2.Text
    
    Set stm = CreateObject("Adodb.Stream")
    stm.Type = 1 '1 bin,2 txt
    stm.Mode = 3
    stm.Open
'    stm.Charset = "GB2312"
    stm.LoadFromFile enfn
    
'         str = stm.readtext     '------ slow: 7.32 s

       str = stm.Read          '-------- fast: 0.015 s

    stm.Close
    Set stm = Nothing
    tmpstr = StrConv(str, vbUnicode)
    MsgBox "Finished reading the file, elapsed: " & Timer - timx & " s" '& Chr(str(0))
End Sub

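So the slowdown again appears to come from string concatenation. At the same time, compared with my earlier method of reading the whole file directly into a Byte array, Adodb.Stream Obj.Read is still slower. With the same 8.27 MB file, the code below finishes too quickly to measure; the elapsed time is practically 0. I therefore switched to a 75.7 MB file: Adodb.Stream Obj.Read took 0.109 seconds, while the code below took 0.023 seconds. Reading a whole file with the Open statement is thus at least 4 times as fast as Adodb.Stream Obj.Read.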

Private Sub Command4_Click()
    Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
    Dim B() As Byte, tmpstr As String, outStr As String
    Dim timx As Single
    
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    enfp = FreeFile
    Open enfn For Binary As #enfp
        fL = LOF(enfp)
        ReDim B(fL - 1)     '---- faster than Adodb.Stream
        Get #enfp, , B
    Close #enfp
'    tmpstr = StrConv(B, vbUnicode)
    MsgBox "Finished reading the file, elapsed: " & Format((Timer - timx), "0.000000") & " s"  '& Chr(B(0))
End Sub

'============================================

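Also, when assembling the Base64 encoding result, I originally appended characters directly (see "Source code for a Base64 + UUE encoder written in VBS, with a customizable encoding table" on jessezappy's CSDN blog): ret = ret & Chr(Base64EncMap((first \ 4) And 63)). The whole string was returned once everything had been appended, and this too turned out to be extremely slow as the amount of data grew. I later changed it to store the encoded output in a Byte array first: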

ReDim Preserve ret(retLength + 4)
ret(retLength + 1) = (Base64EncMap((first \ 4) And 63))
ret(retLength + 2) = (Base64EncMap(((first * 16) And 48) + ((second \ 16) And 15)))
ret(retLength + 3) = (Base64EncMap(((second * 4) And 60) + ((third \ 64) And 3)))
ret(retLength + 4) = (Base64EncMap(third And 63))

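Finally the Byte array is converted straight to a string with StrConv(ret, vbUnicode). By comparison, efficiency improved nearly a thousandfold (the exact factor depends on the length of the data being encoded).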
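The fragment above still calls ReDim Preserve once per group, which may itself reallocate and copy the array. Since Base64 always produces 4 output bytes for every 3 input bytes, the output can be sized once up front. The following is a minimal sketch of that idea, not the author's encoder: the function name, the 0-based indexing, and the use of the standard Base64 alphabet are all assumptions, and it expects a non-empty input array.

```vb
' Hypothetical helper, not the author's encoder: standard Base64 alphabet,
' output array sized once up front, no ReDim Preserve inside the loop.
Function Base64EncodeSketch(src() As Byte) As String
    Const MAP As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    Dim encMap(0 To 63) As Byte
    Dim ret() As Byte
    Dim srcLen As Long, i As Long, p As Long
    Dim first As Long, second As Long, third As Long

    ' Build the lookup table of output byte values
    For i = 0 To 63
        encMap(i) = Asc(Mid$(MAP, i + 1, 1))
    Next i

    srcLen = UBound(src) - LBound(src) + 1
    ReDim ret(0 To ((srcLen + 2) \ 3) * 4 - 1)   ' 4 output bytes per 3 input bytes

    i = LBound(src)
    Do While i <= UBound(src)
        first = src(i)
        If i + 1 <= UBound(src) Then second = src(i + 1) Else second = 0
        If i + 2 <= UBound(src) Then third = src(i + 2) Else third = 0

        ret(p) = encMap((first \ 4) And 63)
        ret(p + 1) = encMap(((first * 16) And 48) + ((second \ 16) And 15))
        ret(p + 2) = encMap(((second * 4) And 60) + ((third \ 64) And 3))
        ret(p + 3) = encMap(third And 63)

        ' Pad with "=" (byte 61) when the last group has fewer than 3 input bytes
        If i + 1 > UBound(src) Then ret(p + 2) = 61
        If i + 2 > UBound(src) Then ret(p + 3) = 61

        i = i + 3
        p = p + 4
    Loop

    Base64EncodeSketch = StrConv(ret, vbUnicode)
End Function
```

A possible call: declare Dim b() As Byte, assign b = StrConv("Man", vbFromUnicode), and then Debug.Print Base64EncodeSketch(b) prints TWFu. I have not timed this variant against the figures quoted in this post.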

'===========================================

To sum up, string concatenation and trimming are the main culprits behind the poor performance of the code above.
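Where a string really does have to be built piece by piece, a common VB6 workaround is to write into a preallocated buffer with the Mid$ statement instead of using &. The sketch below applies that to the Line Input loop shown earlier. AppendTo and ReadAllLines are hypothetical names, and the growth policy is just one reasonable choice.

```vb
' Hypothetical helper: copies piece into buf at position used + 1,
' growing buf in large steps so reallocations stay rare.
Sub AppendTo(buf As String, used As Long, ByVal piece As String)
    Dim need As Long
    need = used + Len(piece)
    If need > Len(buf) Then
        ' Grow to at least twice the old capacity plus what is needed
        buf = buf & Space$(need + Len(buf))
    End If
    ' Mid$ used as a statement overwrites characters in place
    If Len(piece) > 0 Then Mid$(buf, used + 1, Len(piece)) = piece
    used = need
End Sub

' Hypothetical helper: reads every line of a text file into one string
' (line breaks are dropped, as in the original Line Input loop).
Function ReadAllLines(ByVal fileName As String) As String
    Dim fp As Integer, lineText As String
    Dim buf As String, used As Long

    fp = FreeFile
    Open fileName For Input As #fp
        Do While Not EOF(fp)
            Line Input #fp, lineText
            AppendTo buf, used, lineText
        Loop
    Close #fp

    ' Cut off the unused tail of the buffer
    ReadAllLines = Left$(buf, used)
End Function
```

The buffer is only reallocated when it runs out of room, so most appends copy just the new piece rather than the whole accumulated string. I have not measured this against the timings above, so treat it as an illustration of the idea rather than a benchmarked replacement.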

---------- Noted for the record

Reposted from blog.csdn.net/jessezappy/article/details/124916536