d_frEak :
I have a huge CSV file (assume 5 GB) and I want to insert its data into a table, but the import returns an error saying that the length of the data is not the same.
I found that some rows have more columns than expected. For example, the correct data has 8 columns, but some rows have 9 (this could be a human or system error).
I want to keep only the 8-column data, but because the file is so large, I cannot do it manually or by parsing it in Python.
Can anyone recommend a way to do this?
I am using Linux, so any Linux command is also welcome.
In SQL, I am using the COPY ... FROM ... CSV HEADER; command to import the CSV into the table.
Romeo Ninov :
You can use awk for this purpose. Assuming your field delimiter is a comma (,), this code can do the work:
awk -F\, 'NF==8 {print}' input_file >output_file
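
If you also want to see what was dropped, here is a rough sketch of a variant that writes the rows failing the check to a separate file instead of discarding them. The file names (sample.csv, clean.csv, rejected.csv) and the three-line sample are made up for illustration only; point it at your real file instead. Keep in mind that a plain -F, split counts every comma, so a quoted field that itself contains a comma would make a valid row look like it has too many columns.

```shell
# Hypothetical sample data: a header, one good row (8 fields), one bad row (9 fields).
printf '%s\n' \
  'a,b,c,d,e,f,g,h' \
  '1,2,3,4,5,6,7,8' \
  '1,2,3,4,5,6,7,8,9' > sample.csv

# Rows with exactly 8 comma-separated fields go to clean.csv;
# everything else goes to rejected.csv for later review.
awk -F, -v bad=rejected.csv '
  NF == 8 { print; next }   # keep well-formed rows
          { print > bad }   # set aside the rest
' sample.csv > clean.csv

# Quick sanity check: line counts of the input and both outputs.
wc -l sample.csv clean.csv rejected.csv
```

awk reads the file line by line, so memory use should stay small even for a 5 GB input. If the header line also has 8 fields, it is kept in clean.csv, so COPY ... FROM ... CSV HEADER should still work on the filtered file.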