A few examples of extremely inefficient text-file reading, string concatenation, and string splitting code

Over the past few days I have been working on improving my Base64 & UUE encoding file-generation tool, and I found that it is very slow on large files. After analyzing it, the culprit turned out to be inefficient string concatenation and splitting. See the following code:

Private Sub Command1_Click()
    Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
    Dim B() As Byte, tmpstr As String, outStr As String
    Dim timx As Single
    
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    enfp = FreeFile
    Open enfn For Binary As #enfp
        fL = LOF(enfp)
        ReDim B(fL - 1)
        Get #enfp, , B
    Close #enfp
    tmpstr = StrConv(B, vbUnicode)
    defp = FreeFile
    Open defn For Output As #defp
        Do While Len(tmpstr) > 60
            outStr = "M" & Mid(tmpstr, 1, 60)
        tmpstr = Mid(tmpstr, 61)   ' this line degrades performance 20220522
            Print #defp, outStr
            DoEvents
        Loop
        Print #defp, tmpstr
    Close #defp
    MsgBox "Processed: " & fL & " bytes in: " & Timer - timx & " s"
End Sub

When splitting the encoded result string into fixed-length lines, this statement:

tmpstr = Mid(tmpstr, 61)   ' this line degrades performance 20220522

Its intent is to keep the remainder of the string after cutting off one line. That is harmless when the string is short, but each assignment copies the entire remaining string, so as the string grows the loop gets slower and slower. So I came up with a new approach:

Private Sub Command2_Click()
    Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
    Dim B() As Byte, tmpstr As String, outStr As String
    Dim E
    Dim timx As Single
    
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    enfp = FreeFile
    Open enfn For Binary As #enfp
        fL = LOF(enfp)
        ReDim B(fL - 1)
        Get #enfp, , B
    Close #enfp
    tmpstr = StrConv(B, vbUnicode)
    defp = FreeFile
    E = 1
    Open defn For Output As #defp
        Do While (fL - E) > 60
            outStr = "M" & Mid(tmpstr, E, 60)
            Print #defp, outStr
            DoEvents
            E = E + 60
        Loop
        outStr = Mid(tmpstr, E, 60)
        Print #defp, outStr
    Close #defp
    MsgBox "Processed: " & fL & " bytes in: " & Format(Timer - timx, "0.000000") & " s"
End Sub

This version only extracts a fixed-length slice at a moving offset and never modifies the original string. Efficiency immediately improves by hundreds of times (the longer the string, the bigger the gain).
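The two splitting strategies above can be sketched in Python (this is an illustration of the same idea, not the author's code; Python strings are immutable just like VB strings, so repeatedly chopping the head off a string is quadratic, while slicing at a moving offset is linear):

```python
def split_by_trimming(s, width=60):
    """Slow pattern, like tmpstr = Mid(tmpstr, 61):
    each iteration copies the entire remaining string."""
    lines = []
    while len(s) > width:
        lines.append("M" + s[:width])
        s = s[width:]          # full copy of the remainder every pass
    lines.append(s)
    return lines

def split_by_offset(s, width=60):
    """Fast pattern, like Mid(tmpstr, E, 60):
    slice at a moving offset; s itself is never reassigned."""
    lines = []
    pos = 0
    while len(s) - pos > width:
        lines.append("M" + s[pos:pos + width])
        pos += width
    lines.append(s[pos:])
    return lines
```

Both functions produce identical output; only the cost differs, because `split_by_offset` never rebuilds the shrinking remainder.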

'================================================================

In addition, the original code read the whole file line by line with Line Input:

    Open defn For Input As #defp
        Do While Not EOF(defp)
            Line Input #defp, tmpstr
            EnStr = EnStr & tmpstr
        Loop
    Close #defp
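The accumulation pattern in that snippet can be sketched in Python for comparison (an illustration, not the author's code; the in-memory file here stands in for the real one): appending with `+=` copies the growing string on every iteration, while a single `read()` fetches everything in one call.

```python
import io

def read_by_concat(f):
    """Quadratic pattern, like EnStr = EnStr & tmpstr in a loop."""
    out = ""
    for line in f:
        out += line.rstrip("\n")   # full copy of the accumulated string each pass
    return out

def read_whole(f):
    """Linear pattern: one read, one split, one join."""
    return "".join(f.read().split("\n"))
```

Both return the same joined text; only the first grows slower and slower as the file gets longer.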

Here, too, the concatenation statement EnStr = EnStr & tmpstr makes reading extremely slow, so I tried using Adodb.Stream to read the entire file at once. Again the effect is negligible for small files, but for files above about 2 MB the efficiency of obj.ReadText is extremely low: it takes as long as 7.32 seconds for an 8.27 MB file.

Private Sub Command3_Click()
    Dim str, stm, enfn, defn
    Dim timx As Single, tmpstr As String
    timx = Timer
    
    enfn = Text1.Text
    defn = Text2.Text
    
    Set stm = CreateObject("Adodb.Stream")
    stm.Type = 2 '1 bin,2 txt
    stm.Mode = 3
    stm.Open
    stm.Charset = "GB2312"
    stm.LoadFromFile enfn
    
        str = stm.ReadText     '------ inefficient: 7.32 s

'        str = stm.Read         '------ efficient: 0.015 s

    stm.Close
    Set stm = Nothing
'    tmpstr = StrConv(str, vbUnicode)
    MsgBox "File read in: " & Timer - timx & " s" '& Chr(str(0))
End Sub

So I switched to obj.Read, and efficiency immediately improved by nearly 500 times.

Private Sub Command3_Click()
    Dim str, stm, enfn, defn
    Dim timx As Single, tmpstr As String
    timx = Timer
    
    enfn = Text1.Text
    defn = Text2.Text
    
    Set stm = CreateObject("Adodb.Stream")
    stm.Type = 1 '1 bin,2 txt
    stm.Mode = 3
    stm.Open
'    stm.Charset = "GB2312"
    stm.LoadFromFile enfn
    
'        str = stm.ReadText     '------ inefficient: 7.32 s

        str = stm.Read          '------ efficient: 0.015 s

    stm.Close
    Set stm = Nothing
    tmpstr = StrConv(str, vbUnicode)
    MsgBox "File read in: " & Timer - timx & " s" '& Chr(str(0))
End Sub
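The switch from ReadText (Type=2, with charset decoding) to Read (Type=1, raw bytes) has a direct Python analogue (an illustration only; the charset and file are placeholders): read the file as raw bytes in one call, then decode explicitly once, instead of letting a text-mode reader decode as it goes.

```python
def read_and_decode(path, encoding="gb2312"):
    """Read raw bytes in one call, then decode once --
    the counterpart of stm.Read followed by an explicit conversion."""
    with open(path, "rb") as f:
        raw = f.read()             # raw bytes, single call
    return raw.decode(encoding)    # one decode at the end
```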

So string concatenation was once again the bottleneck. I also compared Adodb.Stream obj.Read against my old method of reading the complete file directly into a byte array, and obj.Read is still the slower of the two. With the 8.27 MB file from before, the code below measures essentially zero delay, so I switched to a 75.7 MB file: Adodb.Stream obj.Read took 0.109 seconds, while the code below took 0.023 seconds. Reading the entire file with an Open/Get statement is therefore at least 4 times as fast as Adodb.Stream obj.Read.

Private Sub Command4_Click()
    Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
    Dim B() As Byte, tmpstr As String, outStr As String
    Dim timx As Single
    
    timx = Timer
    enfn = Text1.Text
    defn = Text2.Text
    enfp = FreeFile
    Open enfn For Binary As #enfp
        fL = LOF(enfp)
        ReDim B(fL - 1)     '---- more efficient than Adodb.Stream
        Get #enfp, , B
    Close #enfp
'    tmpstr = StrConv(B, vbUnicode)
    MsgBox "File read in: " & Format((Timer - timx), "0.000000") & " s"  '& Chr(B(0))
End Sub

'============================================

Likewise, when assembling the Base64 encoding result, I originally concatenated characters directly (see "A Base64 + UUE encoding program written in VBS, with a customizable encoding table" on jessezappy's CSDN blog): ret = ret & Chr(Base64EncMap((first \ 4) And 63)), returning the whole string once all concatenation was done. Its efficiency also collapses as the amount of data grows. I later changed it to store the encoding result into a single-byte array first:

ReDim Preserve ret(retLength + 4)
ret(retLength + 1) = (Base64EncMap((first \ 4) And 63))
ret(retLength + 2) = (Base64EncMap(((first * 16) And 48) + ((second \ 16) And 15)))
ret(retLength + 3) = (Base64EncMap(((second * 4) And 60) + ((third \ 64) And 3)))
ret(retLength + 4) = (Base64EncMap(third And 63))

Finally, the byte array is converted to a string in a single step with StrConv(ret, vbUnicode). By comparison, efficiency improved by nearly a thousand times (the exact ratio depends on the length of the encoded data).
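The accumulation pattern can be sketched in Python (this is not a full Base64 encoder, just an illustration of the technique; the encode table and masking are simplified to one output byte per input byte): append encoded bytes to a mutable buffer and convert to a string once at the end, the counterpart of filling a byte array and calling StrConv once.

```python
ENC = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_by_concat(data):
    """Slow pattern, like ret = ret & Chr(...) in a loop."""
    ret = ""
    for b in data:
        ret += chr(ENC[b & 63])    # full-string copy every iteration
    return ret

def encode_by_buffer(data):
    """Fast pattern: accumulate bytes, convert exactly once."""
    ret = bytearray()
    for b in data:
        ret.append(ENC[b & 63])    # cheap in-place append
    return ret.decode("ascii")     # single conversion at the end
```

Both produce the same output; the second avoids rebuilding the result string on every appended character.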

'===========================================

In summary, string concatenation and cutting were the culprits behind the poor performance of the code above.

Origin blog.csdn.net/jessezappy/article/details/124916536