How do I use sed () replace all \r\n with a different end of line escape sequence?

Muraguri E :

So, I am trying to load data into MySQL using LOAD DATA LOCAL INFILE. About 2,400,000 rows.

Some of these rows have lines with a line breaks inside the data and are not being inserted.

One thing we are sure about is that the last character before the actual linebreak is a " (double quote) so we can change all linebreaks next to a " to be something else like "***\r\n.

This will enable me to add the statement LINES TERMINATED BY '***\r\n' instead of just \r\n

E.g

One entry in my input file: in.csv (Line terminated by \r\n)

1,223,"{...}","Some title with

line breaks"\r\n

Wanted output after processing file:

1,223,"{...}","Some title with

line breaks"***\r\n
`

An alternative would be to remove all line breaks within the CSV data and only have line breaks at the end of the row. Problem is I'm not sure how to do that fast enough since I am dealing with really large files (2GB+)

Now, I have tried sed unsuccessfully. I think I am missing something. Here is my closest attempt.

sed ':a;N;$!ba;s/"$/***"\r\n/g' in.csv > out.csv

However, this did not work for me.

Thanks.

***UPDATE***

I realized that all records were being inserted by load data query. I thought otherwise because the line-count by using wc-l was different from select count(*).

Then I realized that when wc-l encounters data with embedded linebreaks, it regards it as a separate row instead of regarding it as part of the same row.

I thank you all for your efforts.

Tim Biegeleisen :

Here is a PHP script which should meet your needs:

$fin = fopen("input.txt", "r");
$fout = fopen("output.txt", "w");

while(!feof($fin)) {
    $line = preg_replace("/(?<!\")\r?\n/", "", fgets($fin));
    fwrite($fout, $line);
}

fclose($fin);
fclose($fout);

This solution uses the regex pattern (?<!\")\r?\n to target CR?LF only if it does not immediately follow a double quote.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=11143&siteId=1