What is the best/fastest way to load these .csv files?

Nicolás Cárdenas :

I've been trying to load six .csv files. I've tried both the CSVReader library and a plain BufferedReader. The BufferedReader version loads the files faster, but it threw an OutOfMemoryError, so I had to raise the maximum heap size to 1024 MB in Eclipse. With CSVReader I don't get this problem. Is there something wrong with my code, and what is the best way to load the files in terms of both speed and memory? Here is my code (the BufferedReader version):

public void loadMovingViolations(int semestre)
{
    try
    {
        String path = ".\\data\\Moving_Violations_Issued_in_";
        String mth1Path = "";
        String mth2Path = "";
        String mth3Path = "";
        String mth4Path = "";
        String mth5Path = "";
        String mth6Path = "";

        if (semestre == 1)
        {
            mth1Path = path + "January_2018.csv";
            mth2Path = path + "February_2018.csv";
            mth3Path = path + "March_2018.csv";
            mth4Path = path + "April_2018.csv";
            mth5Path = path + "May_2018.csv";
            mth6Path = path + "June_2018.csv";
        }

        else if (semestre == 2)
        {
            mth1Path = path + "July_2018.csv";
            mth2Path = path + "August_2018.csv";
            mth3Path = path + "September_2018.csv";
            mth4Path = path + "October_2018.csv";
            mth5Path = path + "November_2018.csv";
            mth6Path = path + "December_2018.csv";
        }

        String[] mths = {mth1Path, mth2Path, mth3Path, mth4Path, mth5Path, mth6Path};
        String cPath = "";

        int numInfracs = 0;
        int[] infracs = new int[6];
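        // Start the maxima at the lowest possible value so negative coordinates are handled too.
        double xMin = Double.MAX_VALUE, yMin = Double.MAX_VALUE, xMax = -Double.MAX_VALUE, yMax = -Double.MAX_VALUE;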
        BufferedReader br = null;

        int i = 0;
        while (i < mths.length)
        {
            int tempInfrac = 0;
            cPath = mths[i];
            br = new BufferedReader(new FileReader(cPath));
            String row = br.readLine(); // read and discard the header row

            while ( (row = br.readLine()) != null)
            {   
                String[] columns = row.split(",");

                String in1 = columns[0];
                Integer objId = Integer.parseInt(in1);

                String location = columns[2];

                String in2 = columns[3];
                int adressId = 0;
                if (!in2.isEmpty())
                    adressId = Integer.parseInt(in2);

                String in3 = columns[4];
                double streetId = 0;
                if (!in3.isEmpty())
                    streetId = Double.parseDouble(in3);

                String in4 = columns[5];
                Double xCord = Double.parseDouble(in4);

                String in5 = columns[6];
                Double yCord = Double.parseDouble(in5);

                String ticketType = columns[7];

                String in6 = columns[8];
                Integer fineAmt = Integer.parseInt(in6);

                String in7 = columns[9];
                double totalPaid = Double.parseDouble(in7);

                String in8 = columns[10];
                Integer penalty1 =  Integer.parseInt(in8);

                String accident = columns[12];

                String date = columns[13];

                String vioCode = columns[14];

                String vioDesc = columns[15];

                VOMovingViolations vomv = new VOMovingViolations(objId, location, adressId, streetId, xCord, yCord, ticketType, fineAmt, totalPaid, penalty1, accident, date, vioCode, vioDesc);
                movingViolationsQueue.enqueue(vomv);
                tempInfrac++;

                if (xCord > xMax)
                    xMax = xCord;

                if (yCord > yMax)
                    yMax = yCord;

                if (xCord < xMin)
                    xMin = xCord;

                if (yCord < yMin)
                    yMin = yCord;
            }

            numInfracs += tempInfrac;
            infracs[i] = tempInfrac;
            i++;
            br.close();
        }

        System.out.println();
        int j = 0;
        for (int current: infracs)
        {
            String[] sa = mths[j].substring(35).split("_");
            String mth = sa[0];
            System.out.println("En el mes " + mth + " se encontraron " + 
                                current + " infracciones");
            j++;
        }
        System.out.println();
        System.out.println("Se encontraron " + numInfracs + " infracciones en el semestre.");
        System.out.println();
        System.out.println("Minimax: " + "("+xMin+", "+yMin+"), " + "("+xMax+", "+yMax+")");
        System.out.println();
    }

    catch (Exception e)
    {
        e.printStackTrace();
        System.out.println();
        System.out.println("No se pudieron cargar los datos");
        System.out.println();
    }
}

GhostCat salutes Monica C. :

Regarding the "better" way: as usual, it depends.

You are reinventing the wheel, and it is surprisingly hard to write a fully functional CSV parser that works with arbitrary input data. Your parser does a simple split on ",", which means it will fail as soon as one of your columns contains a string with a comma inside. You will also run into trouble if the separator character changes.

Your code is faster because it omits a ton of things a CSV parser does. It works with your table, but if somebody else gives you a different valid CSV file, your parser will throw exceptions at you. A real CSV parser accepts any well-formed input.
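To make the comma problem concrete, here is a minimal sketch. The row content and the helper name splitCsvLine are made up for illustration; this is not what CSVReader does internally, just the smallest quote-aware splitter I could write. It handles double-quoted fields and a configurable separator, but not quoted fields that span several lines, which is one of the many things a real parser also has to cover.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSplitter {

    // Hypothetical helper: splits one line on `delimiter`, treating text
    // inside double quotes as a single field. Multi-line fields are NOT handled.
    static List<String> splitCsvLine(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // a doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // separators inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field, even if empty
        return fields;
    }

    public static void main(String[] args) {
        // Made-up row: the second field contains a comma inside quotes.
        String row = "17,\"3RD ST, NW\",100";
        System.out.println(row.split(",").length);          // 4: the address is cut in two
        System.out.println(splitCsvLine(row, ',').size());  // 3
        System.out.println(splitCsvLine(row, ',').get(1));  // 3RD ST, NW
    }
}
```

Note also that String.split(",") silently drops trailing empty columns, so a row that ends in empty fields yields a shorter array than you expect.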

Thus: if the sole purpose of your code is to read files with that given structure, sure, you can use your faster solution. But if you expect your input data format to change over time, then every change will force you to update your code. And worse, such updates might make your code more complicated over time. Therefore you have to carefully balance development efficiency against runtime performance.
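If you do keep the hand-written reader, one way to soften that trade-off is to make it fail loudly the moment the layout changes, instead of producing an ArrayIndexOutOfBoundsException somewhere in the middle of a file. The following is only a sketch under assumptions: the class and method names are hypothetical, the column positions (5 and 6 for the coordinates, at least 16 columns per row) are taken from the code in the question, and the sample rows in main are invented. It only computes the coordinate bounds, to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class FixedLayoutReader {

    // Assumptions taken from the question's code: columns[0..15] are read,
    // with the X and Y coordinates at positions 5 and 6.
    static final int EXPECTED_COLUMNS = 16;
    static final int X_COL = 5;
    static final int Y_COL = 6;

    // Returns {xMin, yMin, xMax, yMax} for all data rows after the header.
    static double[] boundingBox(Reader source) {
        double xMin = Double.POSITIVE_INFINITY, yMin = Double.POSITIVE_INFINITY;
        double xMax = Double.NEGATIVE_INFINITY, yMax = Double.NEGATIVE_INFINITY;

        // try-with-resources closes the reader even if a row fails to parse
        try (BufferedReader br = new BufferedReader(source)) {
            br.readLine(); // skip the header row
            int lineNo = 1;
            String row;
            while ((row = br.readLine()) != null) {
                lineNo++;
                // limit -1 keeps trailing empty columns, unlike split(",")
                String[] columns = row.split(",", -1);
                if (columns.length < EXPECTED_COLUMNS) {
                    throw new IllegalArgumentException("Line " + lineNo
                            + ": expected at least " + EXPECTED_COLUMNS
                            + " columns but found " + columns.length);
                }
                double x = Double.parseDouble(columns[X_COL]);
                double y = Double.parseDouble(columns[Y_COL]);
                xMin = Math.min(xMin, x);
                yMin = Math.min(yMin, y);
                xMax = Math.max(xMax, x);
                yMax = Math.max(yMax, y);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new double[] {xMin, yMin, xMax, yMax};
    }

    public static void main(String[] args) {
        // Invented sample data with 16 columns per row.
        String data =
                "c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15\n"
              + "1,a,loc,,,10.5,20.0,T,100,0,0,p,No,2018-01-01,T119,SPEED\n"
              + "2,b,loc,,,-3.0,40.0,T,50,0,0,p,No,2018-01-02,T119,SPEED\n";
        double[] box = boundingBox(new StringReader(data));
        System.out.println("(" + box[0] + ", " + box[1] + "), ("
                + box[2] + ", " + box[3] + ")");
    }
}
```

This still breaks on quoted commas, so it stays on the "fast but rigid" side of the trade-off; it just tells you which line no longer matches the layout you coded against.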
