[R] Why do read.table / read.delim read fewer rows than the file contains?

I thought I knew read.table/read.delim well, and still fell into a pit.

I have a data set of more than 30,000 rows that holds the expression values for each sample plus annotation columns.

The file has more than 30,000 lines, but after reading it in only a little over 10,000 rows remain, and read.delim and read.table lose different numbers of rows. If I open the file in Excel, save it back as txt, and read that copy, the row count returns to the expected 30,000+.

MP <- read.delim("combine_test.txt", sep = '\t', header = TRUE)
MP1 <- read.table("combine_test.txt", sep = '\t', header = TRUE)
# new_combine_test.txt is the copy saved back from Excel
MP2 <- read.delim("new_combine_test.txt", sep = '\t', header = TRUE)

So I figured it was not an RStudio problem. I then tested under Linux, and the result was even stranger.

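MP <- read.table("combine_test2.txt", header = TRUE, sep = '\t')
dim(MP)
MP2 <- read.delim("combine_test2.txt", header = TRUE, sep = '\t')
dim(MP2)
# Note: both calls write to out.txt, so the second one overwrites the first
write.table(MP, "out.txt", col.names = TRUE, row.names = FALSE, sep = '\t', quote = FALSE)
write.table(MP2, "out.txt", col.names = TRUE, row.names = FALSE, sep = '\t', quote = FALSE)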

dim reports just over 10,000 rows, yet the written output file has more than 30,000 lines!

That made me realize the problem was in the data format. I tried readr:

library(readr)
MP2 <- as.data.frame(read_delim("combine_test.txt", delim = '\t'))

Back to normal. Does base R just not get along with this file the way the tidyverse does??? After searching online I finally found the reason: it comes down to the quote argument.

MP3 <- read.table("combine_test.txt", sep = '\t', quote = "", header = TRUE)
MP4 <- read.delim("combine_test.txt", sep = '\t', quote = "", header = TRUE)

As for the quote parameter, the answer I found explains it like this:

Explanation: Your data has a single quote on line 59 (pyridoxamine 5'-phosphate oxidase (predicted)). The next single quote, which closes the one on line 59, is on line 137 (5'-hydroxyl-kinase activity...). Everything within quotes is read as a single field of data, and a quoted field can include the newline character as well. That's why you lose the lines in between. quote = "" disables quoting altogether.

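My simple understanding: the data contains single quotation marks ('), and everything between two of them is treated as one field, so I need to set quote = "" up front to switch that behaviour off. I checked, and the KEGG descriptions in my data do contain quotes. This also explains the Linux result: the swallowed lines are not gone, they sit inside a single cell as embedded newlines, so writing the table out without quoting puts them back on separate lines.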
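
To see the effect in isolation, here is a minimal sketch on a made-up file. The ids and the plain descriptions are hypothetical; only the two descriptions containing an apostrophe are borrowed from the answer above. The apostrophes are placed well past the first few lines, as in my real data.

```r
# Hypothetical 10-row file; g6 and g8 each contain one apostrophe.
desc <- rep("plain description", 10)
desc[6] <- "pyridoxamine 5'-phosphate oxidase"
desc[8] <- "5'-hydroxyl-kinase activity"
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tdescription", paste0("g", 1:10, "\t", desc)), tmp)

# Default quote = "\"'": the apostrophe in g6 opens a quoted field that is
# only closed by the apostrophe in g8, so the lines in between are absorbed.
bad <- read.table(tmp, sep = "\t", header = TRUE)

# quote = "": apostrophes are ordinary characters, one row per line.
good <- read.table(tmp, sep = "\t", header = TRUE, quote = "")

nrow(bad)   # fewer rows than the file has data lines
nrow(good)  # one row per data line

# The "missing" lines live inside one cell as embedded newlines ...
has_newline <- grepl("\n", bad$description, fixed = TRUE)

# ... so writing without quoting puts them back on separate lines.
out <- tempfile(fileext = ".txt")
write.table(bad, out, sep = "\t", quote = FALSE, row.names = FALSE)
n_out_lines <- length(readLines(out))
```

Comparing nrow(bad) with n_out_lines reproduces what I saw under Linux: the data frame is short, but the written file has the full number of lines again.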

The same thing can go wrong when a field contains a double quote (") or some other special symbol. To check for this kind of error, use count.fields to count the number of fields on each line; if NA shows up, the data is being misread.

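# Default quoting: lines caught inside an unclosed quote are reported as NA
num.fields <- count.fields("combine_test.txt", sep = "\t")

# Quoting disabled: no line is absorbed into a quoted field
num.fields <- count.fields("combine_test.txt", sep = "\t", quote = "")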
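
The NA values also tell you where to look. A small sketch of how that check might be used, again on a made-up file (all ids and descriptions are hypothetical, and the file is only there to make the snippet self-contained):

```r
# Hypothetical file: header plus 10 data lines, apostrophes in g6 and g8.
desc <- rep("plain description", 10)
desc[6] <- "pyridoxamine 5'-phosphate oxidase"
desc[8] <- "5'-hydroxyl-kinase activity"
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tdescription", paste0("g", 1:10, "\t", desc)), tmp)

# With default quoting, lines that end inside an open quote come back as NA.
n_default <- count.fields(tmp, sep = "\t")
# With quoting disabled, every line is counted on its own.
n_noquote <- count.fields(tmp, sep = "\t", quote = "")

# File line numbers (header = line 1) that ended inside an open quote;
# the first one is where the stray quote character sits.
suspect <- which(is.na(n_default))
readLines(tmp)[suspect[1]]

# A table of the counts is a quick sanity check on a large file.
table(n_noquote, useNA = "ifany")
```

On a real file, inspecting the first suspect line is usually enough to spot the offending quote character.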

read.csv seems to be less affected, since by default it only treats the double quote as a quoting character. read.table really can produce unexpected errors like this one. From now on I should get to know fread and readr better.
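
The defaults can be checked directly from the function definitions. This sketch only inspects the quote and comment.char arguments; the differing defaults are a likely reason why read.table and read.delim lost different numbers of rows on my file, though I have not verified that line by line.

```r
readers <- list(read.table = read.table,
                read.delim = read.delim,
                read.csv   = read.csv)

# Default quoting characters of each reader
quote_defaults <- sapply(readers, function(f) formals(f)$quote)
# Default comment character of each reader
comment_defaults <- sapply(readers, function(f) formals(f)$comment.char)

print(quote_defaults)
print(comment_defaults)
```

read.table treats both ' and " as quoting characters and # as a comment character, while read.delim and read.csv only treat " as a quote and have no comment character.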

Origin www.cnblogs.com/jessepeng/p/11445943.html