Reading a local txt file with Java (building a level-marker table for administrative division codes)

I was first given the URL http://www.zxinc.org/gb2260.htm . The site later stopped working, probably because of a server problem, but fortunately I had saved a local copy.

I also uploaded it to a cloud drive.

Link: https://pan.baidu.com/s/1Hkf2PtRGK3dLQ50tJ1mk4g
Extraction code: unon

 

 

 

I opened that page, viewed its source and copied it down, which is why every row starts with a <BR>. Each row consists of a division code followed by the division name. There are 6976 lines in total.

My requirement is to turn this text into an Excel sheet with three columns: the division code in the first, the place name in the second, and in the third a level marker derived from the division code: 1 for province level, 2 for city level, 3 for county level.

I did not build an end-to-end pipeline here. I created the Excel sheet by hand and pasted the data straight in. All I need is to filter the wanted data out of the text and print it to the console; as long as the number of rows is correct, each column can be pasted into the Excel sheet. You could of course write a method that writes to Excel directly, but that would take longer and is not necessary.

OK, here is the code. I wrote everything directly in the main method, so I simply copied the main method. Parts that are not needed at a given step are commented out, and I uncomment them when they are needed.
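To make the row layout concrete, here is a minimal sketch that pulls (code, name) pairs out of such text in one pass. It is not the approach used in this post. The exact row shape (`<BR>`, six digits, optional whitespace, then a name without spaces) is an assumption about the saved file, and the class name `RowParser` is made up for illustration. The sample names are written as Unicode escapes so the source compiles under any file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowParser {
    // Assumed row shape: "<BR>" + six-digit code + optional whitespace + name.
    private static final Pattern ROW = Pattern.compile("(\\d{6})\\s*([^<\\s]+)");

    /** Returns {code, name} pairs in the order they appear in the text. */
    public static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            rows.add(new String[] { m.group(1), m.group(2) });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample rows: Beijing Municipality and its municipal districts entry.
        String sample = "<BR>110000 \u5317\u4eac\u5e02<BR>110100 \u5e02\u8f96\u533a";
        for (String[] row : parse(sample)) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```

Because code and name are captured together, the two columns cannot drift out of step, at the cost of depending on the assumed row shape.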

// Imports needed at the top of the enclosing class file
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
        File file = new File("D:\\xzqh.txt");   // source text
//        File file = new File("D:\\code.txt");   // codes + markers text
        BufferedReader br = null;
        StringBuffer sb = null;
        try {
            // wrap the byte stream in an InputStreamReader to decode it as GBK characters
            br = new BufferedReader(new InputStreamReader(new FileInputStream(file.getPath()), "GBK"));
            sb = new StringBuffer();
            String line = null;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // br stays null if the file could not be opened
                if (br != null) {
                    br.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        // print the entire contents of the file
        System.err.println(new String(sb));
 //       
        // set the filter (enable exactly one regex)
//        String regex = "\\d{6}";   // match six-digit codes
//        String regex = "[a-zA-Z]";   // match letters
//        String regex = "[\u4e00-\u9fa5]{1,}";   // match Chinese characters
//        Pattern p = Pattern.compile(regex);
//        Matcher m = p.matcher(new String(sb));
//        while (m.find()) {
            // append
//            String str1 = m.group();
 //            if(str1.indexOf("省") != -1 || str1.indexOf("市") != -1 || str1.indexOf("区") != -1) {
//                System.err.println(m.group());
//            }else if(str1.indexOf("[县]") != -1) {
//                System.err.println(m.group());
//                break;
//            }else if(str1.indexOf("县") != -1 || str1.indexOf("旗") != -1 || str1.indexOf("盟") != -1) {
//                System.err.println(m.group());
//            }else if(str1.indexOf("州") != -1 || str1.indexOf("岛") != -1 || str1.indexOf("直辖行政单位") != -1) {
//                System.err.println(m.group());
//            }else if(str1.indexOf("镇") != -1 || str1.indexOf("委员会") != -1) {
//                System.err.println(m.group());
//            }else {
//                System.out.print(m.group()); 
//            }
          

        // // code handling; this part was added later
        // StringBuffer str = new StringBuffer(m.group());
        // if("A".equals(str.toString())) {
        // System.err.println(str.append("1"));
        // }else if("B".equals(str.toString())) {
        // System.err.println(str.append("2"));
        // }else if("C".equals(str.toString())) {
        // System.err.println(str.append("3"));
        // }else {
        // System.err.println(str);
        // }

            // direct output
//            System.out.println(m.group());
            // additional processing: append a level marker character
//            StringBuffer str = new StringBuffer(m.group());
//            String code0 = str.substring(0, 2); // first two digits (province)
//            String code1 = str.substring(2, 4); // middle two digits (city)
//            String code2 = str.substring(4, 6); // last two digits (county)
//            if (!"00".equals(code2)) { // county level
//                System.out.println(str.append("C"));
//            } else if (!"00".equals(code1) && "00".equals(code2)) { // city level
//                System.out.println(str.append("B"));
//            } else if (!"00".equals(code0) && "00".equals(code1) && "00".equals(code2)) { // province level
//                System.out.println(str.append("A"));
//            } else {
//                System.out.println(str);
//            }
//        }
    }
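As a side note on the reading part, the same GBK decoding can be written with try-with-resources, which removes the explicit finally block. The sketch below is a variant, not the code used in this post: it keeps lines separate instead of joining them into one string, and the class name `GbkLineReader` and the in-memory sample data are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class GbkLineReader {
    /** Reads every line from the stream using the given charset, then closes the stream. */
    public static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the wrapped stream) automatically.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK");
        // With a path argument, read that file; otherwise fall back to sample bytes.
        InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream("110000\n110100\n".getBytes(gbk));
        for (String line : readLines(in, gbk)) {
            System.out.println(line);
        }
    }
}
```

Passing the charset as a parameter keeps the method usable for files saved in other encodings.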

Now for the overall idea. The result we want is an Excel sheet with three columns, so we produce the columns one at a time, starting with the division codes.

The steps are as follows.

1. Run the code above, making sure the file path is correct. If it is, the whole file is printed as one long line without line breaks (the loop appends each line without a newline), but this has little impact on the later steps.

 

 

Comment out the line that prints the whole file, because we only need the division codes, and uncomment the filter code below it (the digit-matching regex, the Pattern and Matcher lines, the while loop and the "direct output" line).

 

 

Run the program again. It prints the first column, the division codes, one per line, 6976 lines in total (the same as the total number of rows). Copy and paste them into Excel.
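As a standalone illustration of this step, the following sketch collects every six-digit run from a piece of text. The class name `CodeExtractor` and the sample text are assumptions, and a name that itself contained six consecutive digits would be picked up as a code too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodeExtractor {
    private static final Pattern CODE = Pattern.compile("\\d{6}");

    /** Returns every run of six digits found in the text, in order. */
    public static List<String> extractCodes(String text) {
        List<String> codes = new ArrayList<>();
        Matcher m = CODE.matcher(text);
        while (m.find()) {
            codes.add(m.group());
        }
        return codes;
    }

    public static void main(String[] args) {
        // Placeholder names stand in for the real division names.
        String sample = "<BR>110000 name1<BR>110100 name2<BR>110101 name3";
        for (String code : extractCodes(sample)) {
            System.out.println(code); // one code per line, ready to paste as a column
        }
    }
}
```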

 

 

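2. Mark the level according to the division code: province level is XX0000, city level is XXX000 or XXXX00, and county level is XXXXXX or XXXXX0. The level marker is a number, but the division code is numeric too, which makes the two hard to tell apart when filtering, so for now I use A, B, C in place of 1, 2, 3.

Comment out the "direct output" line and leave everything else unchanged. The "additional processing" code then appends a marker after each division code.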
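The marking rule can be sketched as a small standalone method. It follows the same order of checks as the commented-out code (last two digits first, then the middle two, then the first two); the class name `LevelMarker` and the method name `markerFor` are made up for this example.

```java
public class LevelMarker {
    /**
     * Returns "C" for county level, "B" for city level, "A" for province level,
     * and an empty string for 000000, which gets no marker.
     */
    public static String markerFor(String code) {
        if (code == null || !code.matches("\\d{6}")) {
            throw new IllegalArgumentException("expected a six-digit code: " + code);
        }
        String province = code.substring(0, 2);
        String city = code.substring(2, 4);
        String county = code.substring(4, 6);
        if (!"00".equals(county)) {
            return "C";
        }
        if (!"00".equals(city)) {
            return "B";
        }
        if (!"00".equals(province)) {
            return "A";
        }
        return "";
    }

    public static void main(String[] args) {
        String[] samples = { "000000", "110000", "110100", "110101" };
        for (String code : samples) {
            System.out.println(code + markerFor(code));
        }
    }
}
```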

 

 

The output looks like this. The code 000000 is the People's Republic of China; since there is only one such entry, I do not handle it.

 

 

 

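Then select everything in the console and put it into a new text file named code.txt. It holds the division codes plus their level markers. (Keep it for now; it is needed later.)

3. Comment out the code under "additional processing" and uncomment the "direct output" line.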

 

 

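Then switch the filter to the regex that matches Chinese, comment out the "direct output" line again, and uncomment the block highlighted in blue (the suffix checks on str1). Why go to the trouble of comparing characters when the regex already matches Chinese? Because some place names are truly odd, and matching names directly produces a row count that does not line up, which means the data is wrong. That is why this extra pass is needed: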
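The row-count problem can be reproduced with a short sketch. It uses the same character range as the filter in the code; the class name `NameExtractor` is an assumption, and the second sample is a made-up entry, not one taken from the data. Names are written as Unicode escapes so the source compiles under any file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameExtractor {
    // Same range as the Chinese filter in the post: runs of CJK characters.
    private static final Pattern NAME = Pattern.compile("[\u4e00-\u9fa5]+");

    /** Returns every run of Chinese characters found in the text, in order. */
    public static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        // Two ordinary rows (Beijing Municipality, Hebei Province): two matches.
        String regular = "110000 \u5317\u4eac\u5e02<BR>130000 \u6cb3\u5317\u7701";
        System.out.println(extractNames(regular).size());

        // Hypothetical single row whose name contains brackets: it splits into two matches.
        String odd = "999999 \u5e02(\u533a)";
        System.out.println(extractNames(odd).size());
    }
}
```

Any non-Chinese character inside a name ends the match early, so one row can yield several matches and the column no longer has one entry per row.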

 

 

Then run it. The console output looks like this. Copy it into Excel to get the second column, the division names.

 

 

 

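4. Read the code.txt created earlier, paying attention to the path, and comment out the code as shown in the figure. Here the filter matches letters, and the filtered result is output directly.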

 

 

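Remember to change the file being read to code.txt, and uncomment the "code handling" block (the one that appends 1, 2 or 3 to A, B or C).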

 

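This prints a series of letter + digit combinations. Copy all of that console output into code.txt, then change the filter again to extract only the digits.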

 

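After filtering, just output directly; no further processing is needed. Before running, remember to change the regex from matching 6 digits to matching 1 digit.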

 

 

Console output:

 

 

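Copy and paste it into Excel. Note that the first marker, 1, belongs to Beijing, not to the People's Republic of China, because 000000 has no marker.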
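For comparison, the letter-to-number conversion of step 4 can be sketched in a single pass over the marked text. This is an alternative to the copy-and-refilter procedure above, not what this post does; the class name `MarkerToLevel` and the sample string are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerToLevel {
    private static final Pattern MARKER = Pattern.compile("[ABC]");

    /** Maps each marker letter in the text to its level: A to 1, B to 2, C to 3. */
    public static List<Integer> toLevels(String markedText) {
        List<Integer> levels = new ArrayList<>();
        Matcher m = MARKER.matcher(markedText);
        while (m.find()) {
            levels.add(m.group().charAt(0) - 'A' + 1);
        }
        return levels;
    }

    public static void main(String[] args) {
        // 000000 carries no marker, so the first level printed belongs to 110000.
        String sample = "000000110000A110100B110101C";
        for (int level : toLevels(sample)) {
            System.out.println(level);
        }
    }
}
```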

 

 

The logic is a bit messy; I will tidy it up properly next time.

Origin www.cnblogs.com/yuan-zhou/p/12111343.html