This post briefly introduces several common functions for reading files in R.
read.table() reads files in plain-text format.
read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
           numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1, skip = 0,
           check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#", allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)  # splits on "," by default
read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",", fill = TRUE, comment.char = "", ...)
read.delim(file, header = TRUE, sep = "\t", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)  # splits on tabs by default
read.delim2(file, header = TRUE, sep = "\t", quote = "\"", dec = ",", fill = TRUE, comment.char = "", ...)
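The wrappers differ from read.table() only in their defaults. As a rough sketch of that, the snippet below parses the same small table written in three conventions. The inline strings and variable names are invented for this example, and `text =` is used so that no file is needed.

```r
# Same table in three conventions; all data here is invented for illustration.
csv_text   <- "name,score\nann,1.5\nbob,2.25\n"     # "," separator, "." decimal
csv2_text  <- "name;score\nann;1,5\nbob;2,25\n"     # ";" separator, "," decimal
delim_text <- "name\tscore\nann\t1.5\nbob\t2.25\n"  # tab separator

a <- read.csv(text = csv_text)
b <- read.csv2(text = csv2_text)
d <- read.delim(text = delim_text)

# read.table() gives the same values once header and sep are spelled out
e <- read.table(text = csv_text, header = TRUE, sep = ",")

print(a)
print(identical(a$score, b$score))
```

In practice the choice between them mostly comes down to which separator and decimal mark the file uses.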
Arguments
file |
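The name or path of the file to read. Note that path components should be separated with "/" (a single backslash does not work inside an R string). |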
header |
header=TRUE means the first line of the file is used as the column names. |
sep |
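The field separator of the data file. The default for read.table() is sep = "", which means white space (one or more spaces, tabs, newlines or carriage returns); for CSV files the separator is ",". Another common separator is the tab, "\t". |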
quote |
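The set of quoting characters. To disable quoting altogether, use quote = "". See scan for the behaviour on quotes embedded in quotes. Quoting is only considered for columns read as character, which is all of them unless colClasses is specified. |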
dec |
the character used in the file for decimal points. |
numerals |
string indicating how to convert numbers whose conversion to double precision would lose accuracy, see type.convert. |
row.names |
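A vector of row names. This can be a vector giving the actual row names, a single number giving the column of the table that contains the row names, or a character string giving the name of that column. If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise, if row.names is missing, the rows are numbered.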
Using row.names = NULL forces row numbering. Missing or NULL row.names generate row names that are considered to be "automatic" (and are not preserved by as.matrix). |
col.names |
a vector of optional names for the variables. The default is to use "V" followed by the column number. |
as.is |
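controls conversion of character variables (those not converted to logical, numeric or complex) to factors, for columns not otherwise specified by colClasses. Its value is either a vector of logicals (recycled if necessary) or a vector of numeric or character indices specifying which columns should not be converted to factors. Note: to suppress all conversions, including those of numeric columns, set colClasses = "character". Note that as.is is specified per column (not per variable) and so includes the column of row names (if any) and any columns to be skipped. |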
na.strings |
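Controls how missing values are recognised. If you know which strings the data set uses to mark missing values, pass them here and they are read as NA. |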
colClasses |
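character. A vector of classes to be assumed for the columns. If unnamed, recycled as necessary. If named, names are matched with unspecified values being taken to be NA. Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Note that colClasses is specified per column (not per variable) and so includes the column of row names (if any). |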
nrows |
The maximum number of rows to read. Negative values (the default is -1) mean read everything. |
skip |
The number of lines of the file to skip before beginning to read data. |
check.names |
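logical. If TRUE, the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names; if necessary they are adjusted (by make.names), which also ensures there are no duplicates. |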
fill |
logical. If TRUE, then in case the rows have unequal length, blank fields are implicitly added. See "Details" in the R help page. |
strip.white |
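logical. Used only when sep has been specified, and allows the stripping of leading and trailing white space from unquoted character fields (numeric fields are always stripped). |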
blank.lines.skip |
logical: if TRUE, blank lines in the input are ignored. |
comment.char |
character: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether. |
allowEscapes |
logical. Should C-style escapes such as \n be processed or read verbatim (the default)? Note that if not within quotes these could be interpreted as a delimiter (but not as a comment character). For more details see scan. |
flush |
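logical: if TRUE, scan will flush to the end of the line after reading the last of the fields requested. This allows putting comments after the last field. |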
stringsAsFactors |
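With the signature shown above (R versions before 4.0.0), character variables are converted to factors by default. That is not always wanted; setting stringsAsFactors = FALSE keeps them as character vectors. Since R 4.0.0 the default is FALSE. |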
fileEncoding |
character string: if non-empty, declares the encoding used on a file (not a connection) so the character data can be re-encoded. See the "Encoding" section of the help for file. |
encoding |
encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8 (see Encoding); it is not used to re-encode the input. |
text |
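character string: if file is not supplied and this is, then data are read from the value of text via a text connection. A literal string can therefore be used to include (small) data sets within R code. |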
skipNul |
logical: should nuls be skipped? |
... |
Further arguments to be passed to read.table. |
The arguments in bold are the commonly used ones.
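To make a few of these arguments concrete, here is a minimal sketch that combines na.strings, colClasses and stringsAsFactors. The sample text, including the use of "-" and "missing" as missing-value markers, is invented for illustration.

```r
# Invented sample data: "-" and "missing" stand for missing values.
txt <- "id,city,temp\n001,Oslo,3.5\n002,-,missing\n003,Rome,18\n"

df <- read.table(text = txt, header = TRUE, sep = ",",
                 na.strings = c("-", "missing"),    # strings to treat as NA
                 colClasses = c(id = "character"),  # keep the leading zeros in id
                 stringsAsFactors = FALSE)          # keep city as character
str(df)
```

Without the colClasses entry, id would be converted to a number and lose its leading zeros.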
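Let's walk through an example. Beforehand, I wrote R's built-in mtcars data set to a CSV file and a txt file so that there is something to read.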
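The post does not show how the two files were written. One plausible way, assuming write.csv() was used for both (which would match the comma-separated content and the empty first header field seen in the output), is sketched below. The temporary directory is a stand-in chosen here so the sketch does not touch any real folder.

```r
# Assumption: the files were produced roughly like this (not shown in the post).
out_dir  <- tempdir()                           # hypothetical location for the demo
csv_path <- file.path(out_dir, "mtcars.csv")
txt_path <- file.path(out_dir, "mtcars_1.txt")

write.csv(mtcars, csv_path)   # row names (car models) become the first column
write.csv(mtcars, txt_path)   # same comma-separated content, .txt extension

print(readLines(csv_path, n = 3))   # peek at the raw text
```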
> getwd()
[1] "C:/Users/DELL/Documents"
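> setwd('E:/R工作路径')  # set the working directory first; otherwise files are read from and written to the previous directory
# If a file is not in the working directory, its absolute path must be given.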
> getwd()
[1] "E:/R工作路径"
> read.table('mtcars_1.txt',sep=',')  # in my RStudio this call fails unless sep=',' is given; I would welcome an explanation
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 mpg cyl disp hp drat wt qsec vs am gear carb
2 Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4
3 Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
4 Datsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
5 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
6 Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
7 Valiant 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
8 Duster 360 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
9 Merc 240D 24.4 4 146.7 62 3.69 3.19 20 1 0 4 2
10 Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
11 Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4
12 Merc 280C 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4
13 Merc 450SE 16.4 8 275.8 180 3.07 4.07 17.4 0 0 3 3
14 Merc 450SL 17.3 8 275.8 180 3.07 3.73 17.6 0 0 3 3
15 Merc 450SLC 15.2 8 275.8 180 3.07 3.78 18 0 0 3 3
16 Cadillac Fleetwood 10.4 8 472 205 2.93 5.25 17.98 0 0 3 4
17 Lincoln Continental 10.4 8 460 215 3 5.424 17.82 0 0 3 4
18 Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
19 Fiat 128 32.4 4 78.7 66 4.08 2.2 19.47 1 1 4 1
20 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
21 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.9 1 1 4 1
22 Toyota Corona 21.5 4 120.1 97 3.7 2.465 20.01 1 0 3 1
23 Dodge Challenger 15.5 8 318 150 2.76 3.52 16.87 0 0 3 2
24 AMC Javelin 15.2 8 304 150 3.15 3.435 17.3 0 0 3 2
25 Camaro Z28 13.3 8 350 245 3.73 3.84 15.41 0 0 3 4
26 Pontiac Firebird 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2
27 Fiat X1-9 27.3 4 79 66 4.08 1.935 18.9 1 1 4 1
28 Porsche 914-2 26 4 120.3 91 4.43 2.14 16.7 0 1 5 2
29 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
30 Ford Pantera L 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
31 Ferrari Dino 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
32 Maserati Bora 15 8 301 335 3.54 3.57 14.6 0 1 5 8
33 Volvo 142E 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
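Because header=TRUE was not given, R assigned the default column names V1 to V12, and the file's real header line was read as the first data row.
Next, we add the header=TRUE argument: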
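A likely reason the call needs sep=',' is that the default sep = "" splits on white space, while this file is comma-separated and some car names contain spaces. The sketch below uses count.fields() on an invented, unquoted three-line sample to show how the field counts differ. Whether the original file fails in exactly this way depends on how it was written, for example on whether the text fields are quoted.

```r
# Invented three-line sample, unquoted, with a space inside one field.
lines <- c("name,mpg", "Mazda RX4,21", "Valiant,18.1")

con <- textConnection(lines)
n_ws <- count.fields(con, sep = "")      # default: split on white space
close(con)

con <- textConnection(lines)
n_comma <- count.fields(con, sep = ",")  # split on commas
close(con)

print(n_ws)     # unequal counts per line
print(n_comma)  # two fields on every line
```

When lines have unequal field counts and fill is FALSE, read.table() has no consistent table to build, which is the kind of situation that leads to an error.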
> y<-read.table('mtcars_1.txt',header=TRUE,sep=',')
> y
X mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
9 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
10 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
11 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
12 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
13 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
14 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
15 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
16 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
17 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
18 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
19 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
20 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
21 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
22 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
23 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
24 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
25 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
26 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
27 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
28 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
29 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
30 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
31 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
32 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
>
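The first line of the file is now used as the column names. Its first field is empty, so R names that column X.
When a file has many rows, printing all of it floods the console. In that case head() and tail() show just the first six and the last six rows: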
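If the car names should serve as row names instead of sitting in a column called X, row.names = 1 can be passed. A minimal sketch follows, using invented inline text in the same shape as write.csv() output rather than the post's file.

```r
# Invented CSV text shaped like write.csv() output: the first header field is empty.
txt <- '"","mpg","cyl"\n"Mazda RX4",21,6\n"Datsun 710",22.8,4\n'

with_x  <- read.table(text = txt, header = TRUE, sep = ",")                 # first column named X
with_rn <- read.table(text = txt, header = TRUE, sep = ",", row.names = 1)  # first column as row names

print(names(with_x))
print(rownames(with_rn))
```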
> head(y)
X mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
> tail(y)
X mpg cyl disp hp drat wt qsec vs am gear carb
27 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
28 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
29 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
30 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
31 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
32 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
The n argument sets how many rows are shown:
> head(y,n=8)
X mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
>
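A small sketch of the n argument, run on the built-in mtcars data frame directly instead of the file read above. A negative n is also accepted and means "all rows except that many".

```r
first3  <- head(mtcars, n = 3)     # first 3 rows
last2   <- tail(mtcars, n = 2)     # last 2 rows
trimmed <- head(mtcars, n = -30)   # negative n drops the last 30 rows, leaving 2

print(rownames(last2))
print(nrow(trimmed))
```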
Sometimes a file begins with a block of introductory text; the skip argument skips over it.
> y<-read.table('mtcars.csv',header=TRUE,sep=',',skip = 20)
> y
Toyota.Corolla X33.9 X4 X71.1 X65 X4.22 X1.835 X19.9 X1 X1.1 X4.1 X1.2
1 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
2 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
3 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
4 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
5 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
6 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
7 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
8 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
9 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
10 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
11 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
12 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
>
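Reading starts at line 21 of the file. Because header=TRUE, that line (the Toyota Corolla row) is taken as the header, which is why the column names look like X33.9. To skip introductory text only, skip should equal the number of lines before the real header.
Sometimes only part of a file is needed. Combining the nrows and skip arguments reads any portion of the file: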
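For the introductory-text case itself, here is a minimal sketch with invented content: two lines of notes, then the real header, then the data. With skip = 2 the header line is the first line read.

```r
# Invented file content: two lines of notes before the header.
txt <- "Source: made-up example\nCollected in 2020\nname,mpg\nMazda RX4,21\nValiant,18.1\n"

df <- read.table(text = txt, header = TRUE, sep = ",", skip = 2)
print(df)
```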
> y<-read.table('mtcars.csv',header=TRUE,sep=',',skip = 20,nrows = 5)
> y
Toyota.Corolla X33.9 X4 X71.1 X65 X4.22 X1.835 X19.9 X1 X1.1 X4.1 X1.2
1 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
2 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
3 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
4 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
5 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
>
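In the output above the column names are lost, because the line right after the skipped ones is consumed as the header. One way to keep the real names is to read the header separately and pass it through col.names. The helper read_rows() below is hypothetical, written for this sketch, and the demo file is a temporary copy of mtcars, not the post's own file.

```r
# Hypothetical helper: read data rows from .. from + n - 1, keeping the real header.
read_rows <- function(path, from, n, sep = ",") {
  # Read one row with the header, only to obtain the column names
  header <- read.table(path, header = TRUE, sep = sep, nrows = 1)
  read.table(path, header = FALSE, sep = sep,
             skip = from,   # skips the header line plus the (from - 1) rows before `from`
             nrows = n,
             col.names = names(header))
}

path <- file.path(tempdir(), "mtcars_slice_demo.csv")  # temporary file for the demo
write.csv(mtcars, path)

part <- read_rows(path, from = 21, n = 5)
print(part)
```

The trade-off is that the file is opened twice, which is rarely a concern for files of this size.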