Groovy OpenCSV: read a value containing a backslash (e.g. "domain\user")

sfgroups :

In Groovy I am using opencsv to parse a CSV file. My code does not handle values that contain a backslash.

My input file has this line:

value1,domain\user,value2

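Here is my Groovy code:

    // Assumes opencsv 3.x or later; 2.x releases used the au.com.bytecode.opencsv package.
    import com.opencsv.CSVReader

    def filename = 'C:\\Temp\\list.txt'
    CSVReader csvReader = new CSVReader(new FileReader(filename))
    String[] nextRecord
    // readNext() returns null once the end of the file is reached
    while ((nextRecord = csvReader.readNext()) != null) {
        println nextRecord
    }
    csvReader.close()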

It prints the second field without the backslash:

[value1, domainuser, value2]

How can I keep the backslash in the value when reading with OpenCSV?

Thanks, SR

============= The Apache Commons CSV parser worked:

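import org.apache.commons.csv.CSVFormat
import org.apache.commons.csv.CSVRecord

// filename is the same path used in the opencsv example above
Iterable<CSVRecord> records = CSVFormat.EXCEL.parse(new FileReader(filename));
for (CSVRecord record : records) {
    String f1 = record.get(0);
    String f2 = record.get(1);
    String f3 = record.get(2);
    println f1
    println f2
    println f3
}

Dmitry Khamitov :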

In version 3.9, opencsv introduced a second parser, RFC4180Parser, alongside the existing CSVParser. The parser is the component that CSVReader relies on to split each line into fields. As the official documentation states:

RFC4180 defines a standard for all of the nitty-gritty questions of just precisely how CSV files are to be formatted...

The main difference between the CSVParser and the RFC4180Parser is that the CSVParser uses an escape character to denote "unprintable" characters while the RFC4180 spec takes all characters between the first and last quote as gospel (with the exception of the double quote which is escaped by a double quote).

So try using opencsv 3.9+ and RFC4180Parser. It works for me:

import com.opencsv.CSVReaderBuilder
import com.opencsv.RFC4180ParserBuilder

def parser = new RFC4180ParserBuilder().build()
def reader = new CSVReaderBuilder(new FileReader(filename)).withCSVParser(parser).build()
println reader.readNext()

Output:

[value1, domain\user, value2]

If for some reason you can't use version 3.9 or above, you can set up the old parser so that its escape character is something other than the backslash. In that case, however, there is a risk of breaking other rows in the file if its creator used the backslash as an escape character, as the official documentation describes:

... Sometimes the separator character is included in the data for a field itself, so quotation characters are necessary. Those quotation characters could be included in the data also, so an escape character is necessary...
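
As a rough sketch of that fallback (not part of the original answer), the snippet below configures the old CSVParser with a different escape character so that a backslash is treated as ordinary data. The '~' character is an arbitrary assumption, the inline sample line stands in for the real file, and the @Grab line pinning opencsv 3.8 is only there to make the script self-contained.

```groovy
// Assumption: opencsv 3.8 is fetched via Grape only to make this sketch runnable.
@Grab('com.opencsv:opencsv:3.8')
import com.opencsv.CSVParserBuilder
import com.opencsv.CSVReaderBuilder

// Sample line from the question; in a single-quoted Groovy string '\\' is one backslash.
def input = 'value1,domain\\user,value2'

// '~' is an arbitrary stand-in escape character (assumption); choose one
// that never appears in your data.
def parser = new CSVParserBuilder().withEscapeChar('~' as char).build()
def reader = new CSVReaderBuilder(new StringReader(input)).withCSVParser(parser).build()

String[] fields = reader.readNext()
reader.close()
println fields
```

With the escape character moved away from the backslash, the second field should come back with its backslash intact. On releases that lack the builder classes, the older CSVReader constructor that takes separator, quote, and escape characters should serve the same purpose.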

So my suggestion is to use version 3.9+ and RFC4180Parser.
