R for Beginners: Reading Files (Part 1)

This post gives a brief introduction to several common functions for reading files.

read.table() reads files stored as plain text.

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)
By default, read.csv() splits fields on a comma (",").

read.csv2(file, header = TRUE, sep = ";", quote = "\"",
          dec = ",", fill = TRUE, comment.char = "", ...)

read.delim(file, header = TRUE, sep = "\t", quote = "\"",
           dec = ".", fill = TRUE, comment.char = "", ...)
By default, read.delim() splits fields on the tab character.

read.delim2(file, header = TRUE, sep = "\t", quote = "\"",
            dec = ",", fill = TRUE, comment.char = "", ...)
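
The four wrappers above differ only in their default `sep` and `dec`. Here is a minimal sketch of that, using small made-up strings passed through the `text` argument instead of real files (the data values are just for illustration):

```r
# Made-up inline data; each string uses the conventions its reader expects.
csv_txt  <- "x,y\n1.5,a\n2.5,b"      # comma separator, "." decimal point
csv2_txt <- "x;y\n1,5;a\n2,5;b"      # semicolon separator, "," decimal point
tab_txt  <- "x\ty\n1.5\ta\n2.5\tb"   # tab separator, "." decimal point

a <- read.csv(text = csv_txt)
b <- read.csv2(text = csv2_txt)
d <- read.delim(text = tab_txt)

# All three should hold the same numeric column x.
a$x
b$x
d$x
```

If a file comes from a locale that writes decimals with a comma, read.csv2() or read.delim2() is usually the wrapper to try first.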

Arguments

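`file`	Name or path of the file to read. Note that paths should use "/" as the separator (a backslash would have to be escaped as "\\").
`header`	header=TRUE means the first line of the file is used as the column names.
`sep`	The field separator of the data file. The default for read.table() is sep="", which means any whitespace (spaces, tabs or newlines); the default for read.csv() is a comma ",". Another common separator is the tab character "\t", the default for read.delim().
`quote`	the set of quoting characters. To disable quoting altogether, use quote = "". See `scan` for the behaviour on quotes embedded in quotes. Quoting is only considered for columns read as character, which is all of them unless `colClasses` is specified.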
`dec`	the character used in the file for decimal points.
`numerals`	string indicating how to convert numbers whose conversion to double precision would lose accuracy, see `type.convert`. Can be abbreviated. (Applies also to complex-number inputs.)
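`row.names`	a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or a character string giving the name of the table column containing the row names. If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise, if `row.names` is missing, the rows are numbered. Using row.names = NULL forces row numbering. Missing or NULL `row.names` generate row names that are considered to be "automatic" (and not preserved by as.matrix).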
`col.names`	a vector of optional names for the variables. The default is to use `"V"` followed by the column number.
`as.is`	in versions of R before 4.0.0, the default behavior of `read.table` was to convert character variables (which are not converted to logical, numeric or complex) to factors. The variable `as.is` controls the conversion of columns not otherwise specified by `colClasses`. Its value is either a vector of logicals (values are recycled if necessary), or a vector of numeric or character indices which specify which columns should not be converted to factors. Note: to suppress all conversions including those of numeric columns, set `colClasses = "character"`. Note that `as.is` is specified per column (not per variable) and so includes the column of row names (if any) and any columns to be skipped.
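`na.strings`	This argument handles missing values. If you know which strings the data set uses to mark missing values, pass them here and they will be read as NA.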
`colClasses`	character. A vector of classes to be assumed for the columns. If unnamed, recycled as necessary. If named, names are matched with unspecified values being taken to be `NA`. Possible values are `NA` (the default, when `type.convert` is used), `"NULL"` (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or `"factor"`, `"Date"` or `"POSIXct"`. Otherwise there needs to be an `as` method (from package methods) for conversion from `"character"` to the specified formal class. Note that `colClasses` is specified per column (not per variable) and so includes the column of row names (if any).
`nrows`	The maximum number of rows to read.
`skip`	The number of lines to skip before reading starts.
`check.names`	logical. If `TRUE` then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by `make.names`) so that they are, and also to ensure that there are no duplicates.
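`fill`	logical. If `TRUE`, then in case the rows have unequal length, blank fields are implicitly added. See the "Details" section of the `read.table` help page.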
`strip.white`	logical. Used only when `sep` has been specified, and allows the stripping of leading and trailing white space from unquoted `character` fields (`numeric` fields are always stripped). See `scan` for further details (including the exact meaning of ‘white space’), remembering that the columns may include the row names.
`blank.lines.skip`	logical: if `TRUE` blank lines in the input are ignored.
`comment.char`	character: a character vector of length one containing a single character or an empty string. Use `""` to turn off the interpretation of comments altogether.
`allowEscapes`	logical. Should C-style escapes such as \n be processed or read verbatim (the default)? Note that if not within quotes these could be interpreted as a delimiter (but not as a comment character). For more details see `scan`.
`flush`	logical: if `TRUE`, `scan` will flush to the end of the line after reading the last of the fields requested. This allows putting comments after the last field.
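`stringsAsFactors`	logical. In versions of R before 4.0.0, character columns were converted to factors by default, which is not always what we want; setting `stringsAsFactors=FALSE` disables the conversion. Since R 4.0.0 the default is already `FALSE`.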
`fileEncoding`	character string: if non-empty declares the encoding used on a file (not a connection) so the character data can be re-encoded. See the ‘Encoding’ section of the help for `file`, the ‘R Data Import/Export Manual’ and ‘Note’.
`encoding`	encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8 (see `Encoding`): it is not used to re-encode the input, but allows R to handle encoded strings in their native encoding (if one of those two). See ‘Value’ and ‘Note’.
`text`	character string: if `file` is not supplied and this is, then data are read from the value of `text` via a text connection. Notice that a literal string can be used to include (small) data sets within R code.
`skipNul`	logical: should nuls be skipped?
`...`	Further arguments to be passed to `read.table`.

The arguments in bold are the ones used most often.
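
To make a few of these arguments concrete, here is a small sketch with `na.strings`, `stringsAsFactors` and `colClasses`. The inline data set and its missing-value markers ("-" and "missing") are made up for illustration; they are not from the files used later in this post.

```r
# Hypothetical inline data: "-" and "missing" stand for missing values.
txt <- "name,score,group
Ann,90,a
Bob,-,b
Cid,missing,a"

# Turn both markers into NA and keep text columns as character.
df <- read.table(text = txt, header = TRUE, sep = ",",
                 na.strings = c("-", "missing"),
                 stringsAsFactors = FALSE)
str(df)
is.na(df$score)

# Alternatively, state the class of every column explicitly.
df2 <- read.table(text = txt, header = TRUE, sep = ",",
                  na.strings = c("-", "missing"),
                  colClasses = c("character", "numeric", "factor"))
str(df2)
```

Spelling out `colClasses` takes a bit more typing, but it avoids surprises when R guesses a column type differently from what you expected.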

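Let's walk through an example. Beforehand, I wrote one of R's built-in data sets to a CSV file and a txt file so that we have something to work with.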
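
The post does not show how those files were written. A plausible sketch, assuming they were produced with write.csv() from the built-in mtcars data (the file names match the ones read below; the temporary directory is only a stand-in for the working directory):

```r
# Assumption: the example files were created roughly like this.
out_dir  <- tempdir()                           # stand-in for the working directory
csv_path <- file.path(out_dir, "mtcars.csv")
txt_path <- file.path(out_dir, "mtcars_1.txt")

write.csv(mtcars, csv_path)   # comma-separated, row names in the first column
write.csv(mtcars, txt_path)   # same content, just a .txt extension

# Peek at the raw text: the first header field is empty because
# the row names have no column name of their own.
readLines(csv_path, n = 2)
```

That empty first header field matters later: it is why the car names end up in a column with an automatically generated name.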

> getwd()
[1] "C:/Users/DELL/Documents"
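> setwd('E:/R工作路径') # Set the working directory first; otherwise files are read from and written to the previous one.
# If the file is not in the working directory, you need to give its absolute path.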
> getwd()
[1] "E:/R工作路径"
> read.table('mtcars_1.txt',sep=',') # In my RStudio, leaving out the sep argument raises an error. I'd be grateful if someone could explain why.
                    V1   V2  V3    V4  V5   V6    V7    V8 V9 V10  V11  V12
1                       mpg cyl  disp  hp drat    wt  qsec vs  am gear carb
2            Mazda RX4   21   6   160 110  3.9  2.62 16.46  0   1    4    4
3        Mazda RX4 Wag   21   6   160 110  3.9 2.875 17.02  0   1    4    4
4           Datsun 710 22.8   4   108  93 3.85  2.32 18.61  1   1    4    1
5       Hornet 4 Drive 21.4   6   258 110 3.08 3.215 19.44  1   0    3    1
6    Hornet Sportabout 18.7   8   360 175 3.15  3.44 17.02  0   0    3    2
7              Valiant 18.1   6   225 105 2.76  3.46 20.22  1   0    3    1
8           Duster 360 14.3   8   360 245 3.21  3.57 15.84  0   0    3    4
9            Merc 240D 24.4   4 146.7  62 3.69  3.19    20  1   0    4    2
10            Merc 230 22.8   4 140.8  95 3.92  3.15  22.9  1   0    4    2
11            Merc 280 19.2   6 167.6 123 3.92  3.44  18.3  1   0    4    4
12           Merc 280C 17.8   6 167.6 123 3.92  3.44  18.9  1   0    4    4
13          Merc 450SE 16.4   8 275.8 180 3.07  4.07  17.4  0   0    3    3
14          Merc 450SL 17.3   8 275.8 180 3.07  3.73  17.6  0   0    3    3
15         Merc 450SLC 15.2   8 275.8 180 3.07  3.78    18  0   0    3    3
16  Cadillac Fleetwood 10.4   8   472 205 2.93  5.25 17.98  0   0    3    4
17 Lincoln Continental 10.4   8   460 215    3 5.424 17.82  0   0    3    4
18   Chrysler Imperial 14.7   8   440 230 3.23 5.345 17.42  0   0    3    4
19            Fiat 128 32.4   4  78.7  66 4.08   2.2 19.47  1   1    4    1
20         Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1   1    4    2
21      Toyota Corolla 33.9   4  71.1  65 4.22 1.835  19.9  1   1    4    1
22       Toyota Corona 21.5   4 120.1  97  3.7 2.465 20.01  1   0    3    1
23    Dodge Challenger 15.5   8   318 150 2.76  3.52 16.87  0   0    3    2
24         AMC Javelin 15.2   8   304 150 3.15 3.435  17.3  0   0    3    2
25          Camaro Z28 13.3   8   350 245 3.73  3.84 15.41  0   0    3    4
26    Pontiac Firebird 19.2   8   400 175 3.08 3.845 17.05  0   0    3    2
27           Fiat X1-9 27.3   4    79  66 4.08 1.935  18.9  1   1    4    1
28       Porsche 914-2   26   4 120.3  91 4.43  2.14  16.7  0   1    5    2
29        Lotus Europa 30.4   4  95.1 113 3.77 1.513  16.9  1   1    5    2
30      Ford Pantera L 15.8   8   351 264 4.22  3.17  14.5  0   1    5    4
31        Ferrari Dino 19.7   6   145 175 3.62  2.77  15.5  0   1    5    6
32       Maserati Bora   15   8   301 335 3.54  3.57  14.6  0   1    5    8
33          Volvo 142E 21.4   4   121 109 4.11  2.78  18.6  1   1    4    2

Because I did not pass header=TRUE here, R named the columns V1 to V12 automatically and read the header line as an ordinary data row.
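
On the question in the code comment above about omitting `sep`: a likely explanation is that read.table() defaults to sep="", which splits only on whitespace, so the commas are never treated as separators. In the real file the car names also contain different numbers of spaces, so lines can end up with different field counts, which read.table() generally refuses to read. I don't have the original error message, so the sketch below only shows the basic effect on a tiny made-up string:

```r
# Made-up comma-separated text with no spaces at all.
txt <- "a,b,c\n1,2,3\n4,5,6"

# Default sep = "": commas are not separators, so each line is one field.
one_col <- read.table(text = txt)
dim(one_col)

# Telling read.table() about the comma gives the expected three columns.
three_col <- read.table(text = txt, header = TRUE, sep = ",")
dim(three_col)
names(three_col)
```

So for comma-separated files, either pass sep="," to read.table() or use read.csv(), which sets it for you.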

Next, let's add the header=TRUE argument.

> y<-read.table('mtcars_1.txt',header=TRUE,sep=',')
> y
                     X  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1            Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2        Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3           Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4       Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6              Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
7           Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
8            Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
9             Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
10            Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
11           Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
12          Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
13          Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
14         Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
15  Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
16 Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
17   Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
18            Fiat 128 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
19         Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
20      Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
21       Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
22    Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
23         AMC Javelin 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
24          Camaro Z28 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
25    Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
26           Fiat X1-9 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
27       Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
28        Lotus Europa 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
29      Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
30        Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
31       Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
32          Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
>

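Now the first line of the file is used as the column names. The first header field is empty, so R named that column X.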
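
If you would rather have the car names as row names instead of a column called X, the `row.names` argument described earlier can do that. A small sketch, assuming the file has the layout that write.csv(mtcars, ...) produces (the temporary file stands in for mtcars_1.txt):

```r
path <- tempfile(fileext = ".txt")
write.csv(mtcars, path)        # assumption: same layout as the post's mtcars_1.txt

# Plain read: the car names land in a column named X.
y <- read.table(path, header = TRUE, sep = ",")
names(y)[1]

# row.names = 1 uses the first column as row names instead.
z <- read.table(path, header = TRUE, sep = ",", row.names = 1)
head(rownames(z), 3)
ncol(z)
```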

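When a file has many rows, printing all of them floods the console. In that case we can use head() and tail(), which return the first six and the last six rows by default.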

> head(y)
                  X  mpg cyl disp  hp drat    wt  qsec vs am gear carb
1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> tail(y)
                X  mpg cyl  disp  hp drat    wt qsec vs am gear carb
27  Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
28   Lotus Europa 30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
29 Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
30   Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
31  Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
32     Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

You can also pass the n argument to choose how many rows to show.

> head(y,n=8)
                  X  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1         Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2     Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3        Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4    Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5 Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6           Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
7        Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
8         Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
>
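
A couple more variations on `n`, shown on the built-in mtcars data so the snippet runs without any files. A negative `n` means "everything except that many rows":

```r
first8 <- head(mtcars, n = 8)     # first 8 rows
last3  <- tail(mtcars, n = 3)     # last 3 rows
rownames(last3)

# Negative n: drop rows from the other end.
all_but_last30 <- head(mtcars, n = -30)   # 32 rows minus the last 30
nrow(all_but_last30)
```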

Sometimes a file starts with a block of introductory text. The skip argument lets you jump past it.

> y<-read.table('mtcars.csv',header=TRUE,sep=',',skip = 20)
> y
     Toyota.Corolla X33.9 X4 X71.1 X65 X4.22 X1.835 X19.9 X1 X1.1 X4.1 X1.2
1     Toyota Corona  21.5  4 120.1  97  3.70  2.465 20.01  1    0    3    1
2  Dodge Challenger  15.5  8 318.0 150  2.76  3.520 16.87  0    0    3    2
3       AMC Javelin  15.2  8 304.0 150  3.15  3.435 17.30  0    0    3    2
4        Camaro Z28  13.3  8 350.0 245  3.73  3.840 15.41  0    0    3    4
5  Pontiac Firebird  19.2  8 400.0 175  3.08  3.845 17.05  0    0    3    2
6         Fiat X1-9  27.3  4  79.0  66  4.08  1.935 18.90  1    1    4    1
7     Porsche 914-2  26.0  4 120.3  91  4.43  2.140 16.70  0    1    5    2
8      Lotus Europa  30.4  4  95.1 113  3.77  1.513 16.90  1    1    5    2
9    Ford Pantera L  15.8  8 351.0 264  4.22  3.170 14.50  0    1    5    4
10     Ferrari Dino  19.7  6 145.0 175  3.62  2.770 15.50  0    1    5    6
11    Maserati Bora  15.0  8 301.0 335  3.54  3.570 14.60  0    1    5    8
12       Volvo 142E  21.4  4 121.0 109  4.11  2.780 18.60  1    1    4    2
>

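With skip = 20, the first 20 lines (the header plus 19 data rows) are skipped and reading starts at line 21. Because header=TRUE is still set, line 21 (the Toyota Corolla row) is treated as the header, which is why the column names look like X33.9.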
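
To see `skip` doing the job it was described for, here is a small sketch with a made-up file that really does begin with two lines of introduction. The file content is invented for illustration:

```r
path <- tempfile(fileext = ".csv")
writeLines(c("This is a made-up introduction line",
             "and a second one before the data",
             "name,mpg,cyl",
             "Mazda RX4,21,6",
             "Datsun 710,22.8,4"), path)

# Skip exactly the two intro lines; the next line is the real header.
ok <- read.table(path, header = TRUE, sep = ",", skip = 2)
names(ok)

# Skip one line too many and a data row gets used as the header.
off <- read.table(path, header = TRUE, sep = ",", skip = 3)
names(off)
```

The second call mirrors what happened in the transcript above: the values of a data row were turned into syntactically valid column names.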

Sometimes we only need part of a file. Combining the nrows argument with skip lets us read any block of rows.

> y<-read.table('mtcars.csv',header=TRUE,sep=',',skip = 20,nrows = 5)
> y
    Toyota.Corolla X33.9 X4 X71.1 X65 X4.22 X1.835 X19.9 X1 X1.1 X4.1 X1.2
1    Toyota Corona  21.5  4 120.1  97  3.70  2.465 20.01  1    0    3    1
2 Dodge Challenger  15.5  8 318.0 150  2.76  3.520 16.87  0    0    3    2
3      AMC Javelin  15.2  8 304.0 150  3.15  3.435 17.30  0    0    3    2
4       Camaro Z28  13.3  8 350.0 245  3.73  3.840 15.41  0    0    3    4
5 Pontiac Firebird  19.2  8 400.0 175  3.08  3.845 17.05  0    0    3    2
>
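
One way to read a block from the middle of a file while keeping the real column names is to read the header on its own first and then pass it through `col.names`. This is just a sketch of one approach, again assuming a file written by write.csv(mtcars, ...):

```r
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path)     # assumption: same layout as the post's mtcars.csv

# Step 1: grab the column names from the top of the file.
hdr <- names(read.table(path, header = TRUE, sep = ",", nrows = 1))

# Step 2: skip the header line plus the first 20 data rows,
# then read 5 rows and label them with the saved names.
part <- read.table(path, header = FALSE, sep = ",",
                   skip = 21, nrows = 5, col.names = hdr)
part
```

Here `skip` counts lines in the file, so the header line has to be included in the count when header = FALSE.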
