CSVファイルのjavaを読むための最速の方法

サンティアゴファハルド:

I'veはopenCSVを使用して(20メガバイトをarroundの)いくつかのCSVファイルを読み込むしようとしてきたが、これまでのところ、それは遅れています。イムは、私が設計しているヒープ、に - 私のロード4つのCSVファイルを読み込むしようとしています。これははるかに少ない時間で行うことができる他の方法があれば、私は、思っていました。

private Heap<VOMovingViolations> datosHeap; 

public void loadMovingViolations()
{
    Runtime garbage = Runtime.getRuntime(); 
    garbage.gc();
    try 
    {
        FileReader fileReaderMes1 = new FileReader(FECHAS[0]);
        FileReader fileReaderMes2 = new FileReader(FECHAS[1]); 
        FileReader fileReaderMes3 = new FileReader(FECHAS[2]); 
        FileReader fileReaderMes4 = new FileReader(FECHAS[3]); 
        CSVReader enero = new CSVReaderBuilder(fileReaderMes1).withSkipLines(1).build();
        CSVReader febrero = new CSVReaderBuilder(fileReaderMes2).withSkipLines(1).build();
        CSVReader marzo = new CSVReaderBuilder(fileReaderMes3).withSkipLines(1).build();
        CSVReader abril = new CSVReaderBuilder(fileReaderMes4).withSkipLines(1).build();

        String[] row; 


        while((row = enero.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = febrero.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = marzo.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = abril.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

    } 
    catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } 


}

私は実際に任意のヘルプまたは誰かが私に与えてください可能性が任意のアイデアをいただければ幸いです。

バジルボーク:

TL; DR

20メガバイトCSVファイルを読み取り、行ごとにオブジェクトをインスタンス化する、かかる1秒未満の合計経過時間に。

細部

あなたは「遅い」という用語を定義していませんでした。だから私は、実験、カジュアルなベンチマークテストを行いました。

まず、40,000の20メガバイトのファイルを作成しPersonたレコードを。それぞれはPersonフランス語で最初&最後の名前を保持しているUUID、および説明など、いくつかの任意のテキスト。データは4つの列として書かれたCSVでファイルUTF-8 私は使用のApache CommonsのCSVの読み書きするライブラリを。

第二に、この記述されたファイルが読み込まれます。データの各行は、メモリに読み込まれ、次にインスタンス化するために使用して収集されるPersonオブジェクト。

このファイルを読み取り、インスタンスPerson各行のオブジェクトがかかった総経過時間に1秒未満各行は20K程度かかりナノ秒実は、これはファイルの読み込み含み二回私たちが収集したインスタンスの初期容量を設定するには、データの行数をカウントするためにスキャンを行うよう、。また、我々は、128ビットの値に16進文字列の入力を解析しているUUID我々はデータ処理(ちょうど読んでいない)に費やされたいくつかの時間を持っているので、。

ここにあるPersonクラスは。

package work.basil.example;

import java.util.UUID;

public class Person
{
    // Static
   static public  String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    public String givenName, surname, description;
    public UUID id;

    public Person ( String givenName , String surname , UUID id , String description)
    {
        this.givenName = givenName;
        this.surname = surname;
        this.id = id;
        this.description = description ;
    }

    @Override
    public String toString ()
    {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

そして、ここに書き込んでから、20メガバイトのファイルを読み込み、完全なアプリです。私はちょっとの間に、このアップをホイップとして、勉強して批評してください。私は私の仕事をダブルチェックしていませんでした。

あなたは見つけるwrite方法、およびread方法を。mainこの方法は、両方を呼び出し、時間を追跡します。

package work.basil.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed
{
    public List < Person > read ( Path path )
    {
        // TODO: Add a check for valid file existing.

        List < Person > list = List.of();  // Default to empty list.
        try
        {
            // Prepare list.
            int initialCapacity = ( int ) Files.lines( path ).count();
            list = new ArrayList <>( initialCapacity );

            // Read CSV file. For each row, instantiate and collect `DailyProduct`.
            BufferedReader reader = Files.newBufferedReader( path );
            Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
            for ( CSVRecord record : records )
            {
                String givenName = record.get( "givenName" );
                String surname = record.get( "surname" );
                UUID id = UUID.fromString( record.get( "id" ) );
                String description = record.get( "description" );
                // Instantiate `Person` object, and collect it.
                Person person = new Person( givenName , surname , id , description );
                list.add( person );
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
        return list;
    }

    public void write ( final Path path )
    {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "givenName" , "surname" , "id" , "description" ).print( path , StandardCharsets.UTF_8 ) ; )
        {
            int limit = 40_000;  // 40_000 yields about 20 MB of data.
            List < String > givenNames = List.of( "Adrien" , "Aimon" , "Alerion" , "Alexis" , "Alezan" , "Ancil" , "Andre" , "Antoine" , "Archard" , "Aurélien" , "Averill" , "Baptiste" , "Barnard" , "Bartelemy" , "Bastien" , "Baylee" , "Beale" , "Beau" , "Beaumont" , "Beauregard" , "Bellamy" , "Berger" , "Blaize" , "Blondel" , "Boyce" , "Bruce" , "Brunelle" , "Brys" , "Burcet" , "Burnell" , "Burrell" , "Byron" , "Canaan" , "Carden" , "Carolas" , "Cavell" , "Chace" , "Chanler" , "Chante" , "Chappel" , "Charles" , "Chasen" , "Chason" , "Chemin" , "Chene" , "Cher" , "Chevalier" , "Cheyne" , "Clément" , "Clemence" , "Corbin" , "Coty" , "Cygne" , "Damien" , "Dandre" , "Dariel" , "Darl" , "Dauphine" , "Davet" , "Dax" , "Dean" , "Delice" , "Delmon" , "Destin" , "Dominique" , "Donatien" , "Duke" , "Eliott" , "Elroy" , "Enzo" , "Erwan" , "Etalon" , "Ethan" , "Fabron" , "Ferrand" , "Filberte" , "Florent" , "Florian" , "Fontaine" , "Forest" , "Fortune" , "Franchot" , "Francois" , "Fraser" , "Frayne" , "Gaëtan" , "Gabin" , "Gage" , "Gaige" , "Garland" , "Garner" , "Gaston" , "Gauge" , "Gaylord" , "Germain" , "Germaine" , "German" , "Gervaise" , "Giles" , "Gilles" , "Gitan" , "Grosvener" , "Guifford" , "Guion" , "Guy" , "Guzman" , "Henri" , "Holland" , "Hugo" , "Hugues" , "Hyacinthe" , "Jérémy" , "Jacquan" , "Jacques" , "Jacquez" , "Janvier" , "Jardan" , "Jay" , "Jaye" , "Jehan" , "Jemond" , "Jocquez" , "Jonathan" , "Jules" , "Julien" , "Justus" , "Karoly" , "Lado" , "Lafayette" , "Lamond" , "Lancelin" , "Landis" , "Landry" , "Laron" , "Larrimore" , "Laurent" , "LaValle" , "Leandre" , "Leggett" , "Leonce" , "Leron" , "Leverett" , "Lilian" , "Loïc" , "Lorenzo" , "Louis" , "Lowell" , "Luc" , "Lucien" , "Lukas" , "Macaire" , "Mace" , "Mahieu" , "Maison" , "Malleville" , "Manneville" , "Mantel" , "Marc" , "Marcel" , "Marion" , "Marius" , "Markez" , "Markis" , "Marmion" , "Marquis" , "Marquise" , "Marshall" , "Martial" , "Maslin" , "Mason" , "Matheo" , "Mathias" , "Mathys" , "Matthieu" , "Maxence" , "Mayson" , "Mehdi" , "Merle" , "Merville" , "Montague" , "Montaigu" , "Monte" , "Montgomery" , "Montreal" , "Montrel" , "Moore" , "Morel" , "Mortimer" , "Nerville" , "Neuveville" , "Nicolas" , "Noë" , "Noah" , "Noe" , "Norman" , "Norville" , "Nouel" , "Olivier" , "Onfroi" , "Paien" , "Parfait" , "Parnell" , "Pascal" , "Patrice" , "Paul" , "Peppin" , "Percival" , "Percy" , "Pernell" , "Peverell" , "Philipe" , "Pierpont" , "Pierre" , "Pomeroy" , "Prewitt" , "Purvis" , "Quennell" , "Quentin" , "Quincey" , "Quincy" , "Quintin" , "Rémi" , "Rafaelle" , "Ranger" , "Raoul" , "Raphaël" , "Rapier" , "Rawlins" , "Ray" , "Raynard" , "Remi" , "René" , "Renard" , "Rene" , "Reule" , "Reynard" , "Robin" , "Romain" , "Rondel" , "Roy" , "Royal" , "Ruff" , "Rush" , "Russel" , "Rustin" , "Sabastien" , "Sacha" , "Salomon" , "Samuel" , "Satordi" , "Saville" , "Scoville" , "Sebastien" , "Sennett" , "Severin" , "Shant" , "Shantae" , "Sidney" , "Siffre" , "Simeon" , "Simon" , "Sinclair" , "Sofiane" , "Somer" , "Stephane" , "Sully" , "Sydney" , "Sylvain" , "Talbot" , "Talon" , "Telford" , "Tempest" , "Teppo" , "Théo" , "Thayer" , "Thibault" , "Thibaut" , "Thiery" , "Tiennan" , "Tiennot" , "Titouan" , "Toussaint" , "Travaris" , "Tyson" , "Urson" , "Vachel" , "Valentin" , "Valere" , "Vallis" , "Verdun" , "Victoir" , "Victor" , "Waltier" , "William" , "Wyatt" , "Yanis" , "Yann" , "Yves" , "Yvon" , "Zosime" , "Abrial" , "Abrielle" , "Abril" , "Adele" , "Alair" , "Alerion" , "Amee" , "Angelique" , "Annette" , "Antonella" , "Arian" , "Ariane" , "Armandina" , "Aubree" , "Aubrielle" , "Audra" , "Avril" , "Bella" , "Berneta" , "Bette" , "Blaise" , "Blanche" , "Blasa" , "Bonte" , "Brie" , "Brienne" , "Brigit" , "Cachay" , "Calice" , "Camille" , "Camylle" , "Caprice" , "Caressa" , "Caroline" , "Catin" , "Celesta" , "Celeste" , "Cera" , "Cerise" , "Chablis" , "Chalice" , "Chambray" , "Champagne" , "Chandell" , "Chaney" , "Chantal" , "Chante" , "Chanterelle" , "Chantile" , "Chantilly" , "Chantrice" , "Charla" , "Charlotte" , "Charmane" , "Chaton" , "Chemin" , "Chenetta" , "Cher" , "Chere" , "Cheri" , "Cheryl" , "Christine" , "Cidney" , "Cinderella" , "Claire" , "Claudette" , "Colette" , "Cordelle" , "Cydnee" , "Daeja" , "Daija" , "Daja" , "Damzel" , "Darelle" , "Darlene" , "Darselle" , "Dejanelle" , "Deleena" , "Delice" , "Demeri" , "Deni" , "Denise" , "Desgracias" , "Desire" , "Desiree" , "Destanee" , "Destiny" , "Dior" , "Domanique" , "Dominique" , "Elaina" , "Elaine" , "Elayna" , "Elise" , "Eloisa" , "Elyse" , "Emeline" , "Emmaline" , "Emmeline" , "Estella" , "Estrella" , "Etiennette" , "Evette" , "Fabienne" , "Fabrienne" , "Fanchon" , "Fancy" , "Fawna" , "Fayana" , "Fayette" , "Fifi" , "Fleur" , "Fleurette" , "Fontanna" , "Fosette" , "Francine" , "Frederique" , "Gabriel" , "Gabriele" , "Gabrielle" , "Gaby" , "Garcelle" , "Gena" , "Genie" , "Georgette" , "Germaine" , "Gervaise" , "Gitana" , "Harriet" , "Heloisa" , "Holland" , "Honnetta" , "Isabelle" , "Ivette" , "Ivonne" , "Jacqueena" , "Jacquetta" , "Jacquiline" , "Jacyline" , "Jaime" , "Jakqueline" , "Janeen" , "Janelly" , "Janina" , "Janiqua" , "Janique" , "Jannnelle" , "Jaquita" , "Jardena" , "Jeanetta" , "Jermaine" , "Jessamine" , "Jewel" , "Jewell" , "Joli" , "Jolie" , "Josephine" , "Jozephine" , "Julieta" , "Karessa" , "Karmaine" , "Klara" , "Laine" , "Lanelle" , "Laramie" , "Layne" , "Layney" , "Leala" , "Leonette" , "Lissette" , "Lizette" , "Lourdes" , "Lucienne" , "Ly" , "Lyla" , "Lysette" , "Madelaine" , "Malerie" , "Manette" , "Marais" , "Marcelle" , "Marché" , "Mardi" , "Margo" , "Marguerite" , "Marie" , "Marie Claude" , "Marie Frances" , "Marie Joelle" , "Marie Pascale" , "Marie Sophie" , "Marjolaine" , "Marquise" , "Marvella" , "Mathieu" , "Matisse" , "Maurelle" , "Maurissa" , "Mavis" , "Melisande" , "Michelle" , "Miette" , "Mignon" , "Mimi" , "Mirya" , "Monet" , "Moniqua" , "Monteen" , "Musetta" , "Myrlie" , "Nadeen" , "Nadia" , "Nadiyah" , "Naeva" , "Nanon" , "Natalle" , "Naudia" , "Nettie" , "Nicholas" , "Nicki" , "Nicky" , "Nicole" , "Nicolette" , "Nicolina" , "Nicolle" , "Nikolette" , "Ninette" , "Ninon" , "Noelle" , "Nycole" , "Odelette" , "Opaline" , "Orane" , "Orva" , "Page" , "Parisa" , "Parnel" , "Parris" , "Patrice" , "Peridot" , "Pippi" , "Prairie" , "Rachele" , "Rachelle" , "Racquel" , "Raphaelle" , "Raquelle" , "Remi" , "Renée" , "Renea" , "Renelle" , "Renita" , "Risette" , "Rochelle" , "Romy" , "Rosabel" , "Rosiclara" , "Ruba" , "Russhell" , "Saleena" , "Salina" , "Satin" , "Sedona" , "Serene" , "Shandelle" , "Shanta" , "Shante" , "Shariah" , "Sharita" , "Sharleen" , "Sheree" , "Shereen" , "Sherell" , "Sherice" , "Sherry" , "Sidnee" , "Sidney" , "Sidnie" , "Sidonie" , "Sinclaire" , "Solange" , "Solen" , "Sorrel" , "Suzette" , "Sydnee" , "Sydney" , "Tallis" , "Tempest" , "Toinette" , "Turquoise" , "Veronique" , "Vignette" , "Villette" , "Violeta" , "Virginie" , "Voleta" , "Vonny" );
            List < String > surnames = List.of( "Arceneau" , "Aucoin" , "Babin" , "Babineaux" , "Benoit" , "Bergeron" , "Bernard" , "Bertrand" , "Bessette" , "Blanc" , "Blanchard" , "Bonnet" , "Boucher" , "Bourg" , "Bourque" , "Boutin" , "Bouvier" , "Braud" , "Broussard" , "Brun" , "Chevalier" , "David" , "Depaul" , "Desmarais" , "Disney" , "Dubois" , "Dupont" , "Dupuis" , "Durand" , "Fortescue" , "Fournier" , "Garnier" , "Gaudet" , "Gillet" , "Gillette" , "Girard" , "Gravois" , "Grosvenor" , "Lambert" , "Landry" , "Laroche" , "Laurent" , "Lefevre" , "Leroy" , "Leveque" , "Lisle" , "Martin" , "Michel" , "Molyneux" , "Moreau" , "Morel" , "Neville" , "Pelletier" , "Petit" , "Prideux" , "Renard" , "Richard" , "Robert" , "Rousseau" , "Roux" , "Rufus" , "Simon" , "Thomas" );
            for ( int i = 1 ; i <= limit ; i++ )
            {
                String givenName = givenNames.get( random.nextInt( 0 , givenNames.size() ) );
                String surname = surnames.get( random.nextInt( 0 , surnames.size() ) );
                UUID id = UUID.randomUUID();
                String description = Person.LOREM_IPSUM;
                printer.printRecord( givenName , surname , id , description );
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
    }

    public static void main ( final String[] args )
    {
        // Launch the app.
        CsvSpeed app = new CsvSpeed();

        // Write.
        String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
        Path pathOutput = Paths.get( "/Users/basilbourque/persons.csv" );
        app.write( pathOutput );
        System.out.println( "Writing file: " + pathOutput );

        // Read.
        long start = System.nanoTime();
        Path pathInput = Paths.get( "/Users/basilbourque/persons.csv" );
        List < Person > list = app.read( pathInput );
        long stop = System.nanoTime();

        // Time.
        long elapsed = ( stop - start );
        Duration d = Duration.ofNanos( elapsed );
        System.out.println( "Reading elapsed: " + d );
        System.out.println( "Reading took nanos per row: " + ( elapsed / list.size() ) );
        System.out.println( "nanos elapsed: " + elapsed + "  |  list.size: " + list.size() );
    }
}

実行すると:

書き込みファイル:/Users/basilbourque/persons.csv

PT0.857816234S:経過を読みます

読書は、行ごとのnanos値を取った:21445

nanos値が経過:857816234 | list.size:40000

テクノロジー・スタック:

  • Javaの11.0.2 - ズールーアズールSystemsによる(OpenJDKのから構築されました)
  • IntelliJの2019.1内で実行
  • MacOSのモハーベ
  • MacBook Proの(網膜、15インチ、後期2013)
  • プロセッサ:2.3 GHzのインテルCore i7プロセッサー(4個のコア、8ハイパー)
  • 16ギガバイト1600 MHzのDDR3
  • ストレージ:ソリッド・ステートは、内蔵のAppleが

おすすめ

転載: http://43.154.161.224:23101/article/api/json?id=182820&siteId=1