Parsing a TXT file by byte length in Java

In day-to-day development you sometimes need to parse data files pushed by a third party. Here the two parties agreed that the file is GBK-encoded, that each line is one record, and that every field occupies a fixed number of bytes. A Chinese character takes two bytes in GBK, but String.substring counts characters, so substring cannot be used to cut out the fields; they have to be extracted by byte position. The implementation is shown in the code that follows.
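To see why character-based substring fails, here is a small sketch built on a hypothetical record: a 10-byte name field followed by a 5-byte code field. The layout, the class name CharVsByteDemo and the sample values are made up for illustration. Two Chinese characters take 4 bytes but only 2 chars, so byte offsets and character indexes drift apart.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class CharVsByteDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Encodes the line as GBK and decodes the bytes in [from, to) back into a String.
    static String byteField(String line, int from, int to) {
        byte[] bytes = line.getBytes(GBK);
        return new String(Arrays.copyOfRange(bytes, from, to), GBK);
    }

    public static void main(String[] args) {
        // Hypothetical record: a 10-byte name field followed by a 5-byte code field.
        // "\u5f20\u4e09" is a two-character Chinese name that takes 4 bytes in GBK,
        // so 6 spaces pad it to 10 bytes.
        String line = "\u5f20\u4e09" + "      " + "12345";
        System.out.println(line.length());             // 13 (characters)
        System.out.println(line.getBytes(GBK).length); // 15 (bytes)
        // Using the byte offsets as character indexes skips the first two digits
        // (substring(10, 15) would even throw, since there are only 13 characters).
        System.out.println(line.substring(10, 13));    // 345
        // Cutting the encoded bytes returns the whole code field.
        System.out.println(byteField(line, 10, 15));   // 12345
    }
}
```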

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

    /**
     * Parses a file pushed by the third party.
     *
     * @param filePath path of the incoming file
     */
    public static void parseFile(String filePath) {
        // try-with-resources closes the reader and the underlying stream
        try (InputStream is = new FileInputStream(filePath);
             BufferedReader br = new BufferedReader(new InputStreamReader(is, Charset.forName("GBK")))) {
            String line;
            while ((line = br.readLine()) != null) {
                // customer name: 20 bytes at positions 6-25
                // (with start > 0, substringByte spans count + 1 bytes)
                String cifName = StringCommonUtil.substringByte(line, 6, 19).trim();
                // ID number: 18 bytes at positions 31-48
                String idNumber = StringCommonUtil.substringByte(line, 31, 17).trim();
                // TODO: other business processing
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
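To try the same flow without a real file, the sketch below feeds GBK bytes through an in-memory stream. It is an alternative to substringByte rather than the author's code: it re-encodes each line and cuts inclusive 1-based byte ranges with Arrays.copyOfRange. The class FixedWidthGbkParser, its helper names, the 5-byte prefix and the sample values are hypothetical; the field positions (6-25 and 31-48) follow the comments in parseFile. Because it slices raw bytes, it assumes no field boundary splits a double-byte character.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FixedWidthGbkParser {
    static final Charset GBK = Charset.forName("GBK");

    // Returns the field stored at 1-based byte positions from..to (inclusive), trimmed.
    // Assumes no field boundary splits a double-byte GBK character.
    static String field(byte[] record, int from, int to) {
        int end = Math.min(to, record.length); // tolerate short lines
        if (from > end) {
            return "";
        }
        return new String(Arrays.copyOfRange(record, from - 1, end), GBK).trim();
    }

    // Right-pads s with spaces until its GBK encoding is at least n bytes long.
    static String padBytes(String s, int n) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.toString().getBytes(GBK).length < n) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Reads GBK-encoded records line by line; each result is {customer name, ID number}.
    static List<String[]> parse(InputStream in) {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, GBK))) {
            String line;
            while ((line = br.readLine()) != null) {
                byte[] record = line.getBytes(GBK);      // back to bytes so offsets are byte offsets
                String cifName = field(record, 6, 25);   // customer name: 20 bytes
                String idNumber = field(record, 31, 48); // ID number: 18 bytes
                records.add(new String[] {cifName, idNumber});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical record: 5-byte prefix, 20-byte name, 5 filler bytes, 18-byte ID.
        String line = "00001" + padBytes("\u5f20\u4e09", 20) + "     " + "123456789012345678";
        byte[] file = (line + "\n").getBytes(GBK);
        for (String[] r : parse(new ByteArrayInputStream(file))) {
            System.out.println(r[0] + " | " + r[1]);
        }
    }
}
```

Re-encoding a decoded line reproduces the original bytes only when the file is valid GBK; malformed sequences are replaced during decoding, which could shift the offsets.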

In parseFile, the line below decodes the bytes of the file input stream as GBK (if a different encoding is agreed, replace "GBK" with that charset name here and in the helper methods):

BufferedReader br = new BufferedReader(new InputStreamReader(is, Charset.forName("GBK")));
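A quick sketch of what happens when the reader's charset does not match the file's: the same GBK bytes decode correctly as GBK but not as UTF-8. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Encodes text as GBK (as the third party would) and decodes it with readAs.
    static String decodeGbkBytesAs(String text, Charset readAs) {
        byte[] fileBytes = text.getBytes(GBK);
        return new String(fileBytes, readAs);
    }

    public static void main(String[] args) {
        String name = "\u5f20\u4e09"; // a two-character Chinese name
        System.out.println(decodeGbkBytesAs(name, GBK).equals(name));                    // true
        System.out.println(decodeGbkBytesAs(name, StandardCharsets.UTF_8).equals(name)); // false
    }
}
```

new String(byte[], Charset) replaces malformed input with the charset's replacement string instead of throwing, so a mismatch shows up as garbled text rather than an exception.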

The byte-based substring helper is as follows:

import java.io.UnsupportedEncodingException;

    /**
     * Cuts a string by GBK byte position without splitting a double-byte character.
     *
     * Parameter semantics:
     * - start == 0: returns the leading characters whose bytes fit within count bytes.
     * - start > 0: start is a 1-based byte position; a character is kept when the running
     *   byte total after it lies in [start, start + count], so the result spans count + 1
     *   bytes. For a 20-byte field at positions 6-25, call substringByte(line, 6, 19).
     *
     * @param original the string to cut
     * @param start    starting byte position (see above); negative values are treated as 0
     * @param count    byte length parameter (see above); must be greater than 0
     * @return the extracted string; the input itself if it is null or empty or if count <= 0;
     *         null if start is at or beyond the byte length of the string
     */
    public static String substringByte(String original, int start, int count) {

        // Nothing to cut: return null or "" unchanged
        if (original == null || original.isEmpty())
            return original;

        // The byte length must be positive
        if (count <= 0)
            return original;

        // The starting byte position must not be negative
        if (start < 0)
            start = 0;

        // Collects the characters that fall inside the requested byte range
        StringBuilder buff = new StringBuilder();

        try {
            // Starting at or beyond the end of the string's bytes: return null
            if (start >= getStringByteLenths(original))
                return null;
            int len = 0; // running total of GBK bytes up to and including the current char
            // Walk the characters, accumulating their GBK byte lengths, and stop once
            // the running total passes the end of the requested range
            for (int i = 0; i < original.length(); i++) {
                char c = original.charAt(i);
                len += String.valueOf(c).getBytes("GBK").length;

                if (start == 0) {
                    // Keep leading characters while they fit within count bytes
                    if (len <= count)
                        buff.append(c);
                    else
                        break;
                } else {
                    // Keep characters whose last byte lies in [start, start + count]
                    if (len >= start && len <= start + count)
                        buff.append(c);
                    if (len > start + count)
                        break;
                }
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        // Return the extracted characters
        return buff.toString();
    }

    /**
     * Returns the number of bytes the string occupies in GBK.
     *
     * @param args the string to measure
     * @return the GBK byte length, or 0 if args is null or empty
     * @throws UnsupportedEncodingException if the JVM does not support GBK
     */
    public static int getStringByteLenths(String args)
            throws UnsupportedEncodingException {
        return args != null && !args.isEmpty() ? args.getBytes("GBK").length : 0;
    }
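substringByte mixes two conventions: with start > 0 it treats start as a 1-based position and spans count + 1 bytes, while with start == 0 it returns count bytes. If a single convention is preferred, a sketch like the one below may be easier to reason about: a 0-based start offset, an exact byte length, and whole characters only, with any double-byte character that straddles either edge dropped. It is an alternative, not a drop-in replacement; GbkByteSlicer and slice are hypothetical names.

```java
import java.nio.charset.Charset;

final class GbkByteSlicer {
    static final Charset GBK = Charset.forName("GBK");

    // Returns the characters whose GBK bytes lie entirely within [start, start + length),
    // where start is a 0-based byte offset. Characters straddling either edge are dropped.
    static String slice(String line, int start, int length) {
        if (line == null || start < 0 || length <= 0) {
            return "";
        }
        int end = start + length; // exclusive byte offset
        int pos = 0;              // byte offset where the current character begins
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length() && pos < end; i++) {
            int width = String.valueOf(line.charAt(i)).getBytes(GBK).length;
            if (pos >= start && pos + width <= end) {
                out.append(line.charAt(i));
            }
            pos += width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Bytes: A=0, B=1, first Chinese char=2-3, second=4-5, C=6, D=7
        String line = "AB\u5f20\u4e09CD";
        System.out.println(slice(line, 0, 3)); // AB (the char at bytes 2-3 straddles the end)
        System.out.println(slice(line, 6, 2)); // CD
        System.out.println(slice(line, 2, 4).equals("\u5f20\u4e09")); // true
    }
}
```

Under this convention the customer-name field at positions 6-25 would be slice(line, 5, 20).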
 


Origin http://43.154.161.224:23101/article/api/json?id=324124586&siteId=291194637