R for Beginners: Reading Files (Part 1)

This post briefly introduces several common functions for reading files.

read.table() reads files in plain-text format:

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)

read.csv() splits fields on "," by default.

read.csv2(file, header = TRUE, sep = ";", quote = "\"",
          dec = ",", fill = TRUE, comment.char = "", ...)

read.delim(file, header = TRUE, sep = "\t", quote = "\"",
           dec = ".", fill = TRUE, comment.char = "", ...)

read.delim() splits fields on tab characters by default.

read.delim2(file, header = TRUE, sep = "\t", quote = "\"",
            dec = ",", fill = TRUE, comment.char = "", ...)
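As the signatures above suggest, the read.csv() family can be thought of as read.table() with different defaults. The following is a minimal sketch of that idea; the sample strings csv_text and csv2_text are made up for illustration and are passed through the text argument so that no file is needed.

```r
# A small comma-separated sample held in a string (hypothetical data).
csv_text <- "name,score\nAnn,1.5\nBob,2.5"

# read.csv() with its defaults ...
a <- read.csv(text = csv_text)

# ... and read.table() with the same settings spelled out by hand.
b <- read.table(text = csv_text, header = TRUE, sep = ",", quote = "\"",
                dec = ".", fill = TRUE, comment.char = "")

# Both calls should produce the same data frame.
print(identical(a, b))

# read.csv2() expects ";" between fields and "," as the decimal mark.
csv2_text <- "x;y\n1,5;2,5"
c2 <- read.csv2(text = csv2_text)
print(c2)
```

Choosing the wrapper that matches the file's separator and decimal mark saves typing those arguments on every call.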

Arguments

file

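The name or path of the file to read. Note that path separators should be written as "/" (an escaped "\\" also works), because a single backslash starts an escape sequence in an R string.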

header

header = TRUE means that the first line of the file is used as the column names.

sep

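The field separator of the data file. The default for read.table() is sep = "", which means any whitespace (one or more spaces, tabs, newlines or carriage returns); for CSV files read with read.csv() the default is ",". Other separators are possible, for example "\t" (tab), which read.delim() uses.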

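quote

the set of quoting characters. To disable quoting altogether, use quote = "". See scan for the behaviour on quotes embedded in quotes. Quoting is only considered for columns read as character, which is all of them unless colClasses is specified.

dec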

the character used in the file for decimal points.

numerals

string indicating how to convert numbers whose conversion to double precision would lose accuracy, see type.convert. Can be abbreviated. (Applies also to complex-number inputs.)

row.names

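a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or a character string giving the name of the table column containing the row names.

If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are numbered.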

Using row.names = NULL forces row numbering. Missing or NULL row.names generate row names that are considered to be "automatic" (and not preserved by as.matrix).

col.names

a vector of optional names for the variables. The default is to use "V" followed by the column number.

as.is

the default behavior of read.table is to convert character variables (which are not converted to logical, numeric or complex) to factors. The variable as.is controls the conversion of columns not otherwise specified by colClasses. Its value is either a vector of logicals (values are recycled if necessary), or a vector of numeric or character indices which specify which columns should not be converted to factors.

Note: to suppress all conversions including those of numeric columns, set colClasses = "character".

Note that as.is is specified per column (not per variable) and so includes the column of row names (if any) and any columns to be skipped.

na.strings

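This argument controls how missing values are recognized. If you know which strings the data set uses to represent missing values, pass them here and those values will be read as NA.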

colClasses

character. A vector of classes to be assumed for the columns. If unnamed, recycled as necessary. If named, names are matched with unspecified values being taken to be NA.

Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class.

Note that colClasses is specified per column (not per variable) and so includes the column of row names (if any).

nrows

The maximum number of rows to read.

skip

The number of lines to skip at the start of the file before reading begins.

check.names

logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates.

fill

logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See "Details" in the read.table help page.

strip.white

logical. Used only when sep has been specified, and allows the stripping of leading and trailing white space from unquoted character fields (numeric fields are always stripped). See scan for further details (including the exact meaning of ‘white space’), remembering that the columns may include the row names.

blank.lines.skip

logical: if TRUE blank lines in the input are ignored.

comment.char

character: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.

allowEscapes

logical. Should C-style escapes such as \n be processed or read verbatim (the default)? Note that if not within quotes these could be interpreted as a delimiter (but not as a comment character). For more details see scan.

flush

logical: if TRUE, scan will flush to the end of the line after reading the last of the fields requested. This allows putting comments after the last field.

stringsAsFactors

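In R versions before 4.0.0, character variables are converted to factors by default. We do not always want this; setting stringsAsFactors = FALSE disables the conversion. Since R 4.0.0 the default is already FALSE.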

fileEncoding

character string: if non-empty declares the encoding used on a file (not a connection) so the character data can be re-encoded. See the ‘Encoding’ section of the help for file, the ‘R Data Import/Export Manual’ and ‘Note’.

encoding

encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8 (see Encoding): it is not used to re-encode the input, but allows R to handle encoded strings in their native encoding (if one of those two). See ‘Value’ and ‘Note’.

text

character string: if file is not supplied and this is, then data are read from the value of text via a text connection. Notice that a literal string can be used to include (small) data sets within R code.

skipNul

logical: should nuls be skipped?

...

Further arguments to be passed to read.table.

The arguments shown in bold are the commonly used ones.
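To make a few of the arguments described above concrete, here is a small sketch that combines header, sep, na.strings, colClasses and stringsAsFactors. The data in txt and the marker "missing" are invented for illustration; the text argument is used so the example runs without a file.

```r
# Hypothetical data: "missing" marks an unavailable temperature.
txt <- "id,city,temp
1,Oslo,-3.5
2,Cairo,missing
3,Lima,18"

d <- read.table(text = txt,
                header = TRUE,                    # first line holds the column names
                sep = ",",                        # fields are comma-separated
                na.strings = c("NA", "missing"),  # both strings are read as NA
                colClasses = c("integer", "character", "numeric"),
                stringsAsFactors = FALSE)         # keep strings as character

str(d)
```

Spelling out colClasses is optional, but it makes the resulting column types explicit instead of leaving them to automatic type conversion.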

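Let's walk through an example. Beforehand, I wrote one of R's built-in data sets to a CSV file and a txt file so that we have something to work with.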
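The post does not show how those files were created, so the following is only one plausible way to do it. The file names mtcars.csv and mtcars_1.txt come from the examples in this post; the choice of write.csv() and write.table(), and writing into a temporary directory rather than the working directory, are assumptions.

```r
# Assumption: the files are written to a temporary directory here so the
# sketch does not touch your working directory.
out_dir  <- tempdir()
csv_path <- file.path(out_dir, "mtcars.csv")
txt_path <- file.path(out_dir, "mtcars_1.txt")

# write.csv() writes comma-separated values with the row names as the
# first column; that column's header field is left empty.
write.csv(mtcars, csv_path)

# write.table() with sep = "," and col.names = NA produces the same layout.
write.table(mtcars, txt_path, sep = ",", col.names = NA)

# Look at the first two lines of the CSV file.
print(readLines(csv_path, n = 2))
```

Both files contain one header line followed by 32 data lines, one per car.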

> getwd()
[1] "C:/Users/DELL/Documents"
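> setwd('E:/R工作路径')  # Set the working directory first; otherwise files are read from and written to the previous directory.
# If a file is not in the working directory, its full (absolute) path must be given.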
> getwd()
[1] "E:/R工作路径"
> read.table('mtcars_1.txt',sep=',')  # In my RStudio this call fails unless sep = ',' is given; I would welcome an explanation from someone more experienced.
                    V1   V2  V3    V4  V5   V6    V7    V8 V9 V10  V11  V12
1                       mpg cyl  disp  hp drat    wt  qsec vs  am gear carb
2            Mazda RX4   21   6   160 110  3.9  2.62 16.46  0   1    4    4
3        Mazda RX4 Wag   21   6   160 110  3.9 2.875 17.02  0   1    4    4
4           Datsun 710 22.8   4   108  93 3.85  2.32 18.61  1   1    4    1
5       Hornet 4 Drive 21.4   6   258 110 3.08 3.215 19.44  1   0    3    1
6    Hornet Sportabout 18.7   8   360 175 3.15  3.44 17.02  0   0    3    2
7              Valiant 18.1   6   225 105 2.76  3.46 20.22  1   0    3    1
8           Duster 360 14.3   8   360 245 3.21  3.57 15.84  0   0    3    4
9            Merc 240D 24.4   4 146.7  62 3.69  3.19    20  1   0    4    2
10            Merc 230 22.8   4 140.8  95 3.92  3.15  22.9  1   0    4    2
11            Merc 280 19.2   6 167.6 123 3.92  3.44  18.3  1   0    4    4
12           Merc 280C 17.8   6 167.6 123 3.92  3.44  18.9  1   0    4    4
13          Merc 450SE 16.4   8 275.8 180 3.07  4.07  17.4  0   0    3    3
14          Merc 450SL 17.3   8 275.8 180 3.07  3.73  17.6  0   0    3    3
15         Merc 450SLC 15.2   8 275.8 180 3.07  3.78    18  0   0    3    3
16  Cadillac Fleetwood 10.4   8   472 205 2.93  5.25 17.98  0   0    3    4
17 Lincoln Continental 10.4   8   460 215    3 5.424 17.82  0   0    3    4
18   Chrysler Imperial 14.7   8   440 230 3.23 5.345 17.42  0   0    3    4
19            Fiat 128 32.4   4  78.7  66 4.08   2.2 19.47  1   1    4    1
20         Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1   1    4    2
21      Toyota Corolla 33.9   4  71.1  65 4.22 1.835  19.9  1   1    4    1
22       Toyota Corona 21.5   4 120.1  97  3.7 2.465 20.01  1   0    3    1
23    Dodge Challenger 15.5   8   318 150 2.76  3.52 16.87  0   0    3    2
24         AMC Javelin 15.2   8   304 150 3.15 3.435  17.3  0   0    3    2
25          Camaro Z28 13.3   8   350 245 3.73  3.84 15.41  0   0    3    4
26    Pontiac Firebird 19.2   8   400 175 3.08 3.845 17.05  0   0    3    2
27           Fiat X1-9 27.3   4    79  66 4.08 1.935  18.9  1   1    4    1
28       Porsche 914-2   26   4 120.3  91 4.43  2.14  16.7  0   1    5    2
29        Lotus Europa 30.4   4  95.1 113 3.77 1.513  16.9  1   1    5    2
30      Ford Pantera L 15.8   8   351 264 4.22  3.17  14.5  0   1    5    4
31        Ferrari Dino 19.7   6   145 175 3.62  2.77  15.5  0   1    5    6
32       Maserati Bora   15   8   301 335 3.54  3.57  14.6  0   1    5    8
33          Volvo 142E 21.4   4   121 109 4.11  2.78  18.6  1   1    4    2

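Because I did not pass header = TRUE here, R assigned column names automatically (V1, V2, ...), and the file's own header line was read as the first row of data.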
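As for the question in the comment on the first read.table() call: a likely explanation is that the file is comma-separated, while the default sep = "" splits lines on whitespace, so read.table() cannot find consistent fields. The sketch below uses count.fields() to show how many fields each line has under each setting. It assumes the file has the same layout as one written by write.csv(mtcars, ...); I have not confirmed the exact error message the author saw.

```r
# Assumption: a file laid out like the one in this post, recreated in a
# temporary location.
path <- tempfile(fileext = ".txt")
write.csv(mtcars, path)

# Number of fields per line when splitting on commas.
n_comma <- count.fields(path, sep = ",")

# Number of fields per line with the default whitespace splitting.
n_space <- count.fields(path)

# With sep = "," every line has the same number of fields.
print(table(n_comma))

# With whitespace splitting the counts follow the spaces inside the car
# names, so they need not agree from line to line.
print(table(n_space))
```

When the field counts do not line up across lines, read.table() cannot build a rectangular data frame, which is why stating the separator explicitly matters for this file.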

Next, we pass header = TRUE:

> y<-read.table('mtcars_1.txt',header=TRUE,sep=',')
> y
                     X  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1            Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2        Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3           Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4       Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6              Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
7           Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
8            Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
9             Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
10            Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
11           Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
12          Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
13          Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
14         Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
15  Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
16 Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
17   Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
18            Fiat 128 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
19         Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
20      Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
21       Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
22    Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
23         AMC Javelin 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
24          Camaro Z28 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
25    Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
26           Fiat X1-9 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
27       Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
28        Lotus Europa 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
29      Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
30        Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
31       Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
32          Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
> 

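Now the first line of the file is used as the column names. The car names form an ordinary first column, which R names X because its header field is empty.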
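If you would rather have the car names back as row names instead of a column called X, the row.names argument described earlier can point at the first column. This is a small sketch under the assumption that the file looks like one written by write.csv(mtcars, ...).

```r
# Assumption: recreate a comma-separated copy of mtcars in a temporary file.
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path)

# row.names = 1 tells read.table() to use column 1 as the row names.
y <- read.table(path, header = TRUE, sep = ",", row.names = 1)

print(head(rownames(y), 3))
print(names(y))
```

With row.names = 1 the data frame has 11 data columns again, matching the layout of the built-in mtcars.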

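As we can see, when a file has many rows, printing all of them floods the console. In that case we can use the head() and tail() functions to show only the first six and the last six rows.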

> head(y)
                  X  mpg cyl disp  hp drat    wt  qsec vs am gear carb
1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> tail(y)
                X  mpg cyl  disp  hp drat    wt qsec vs am gear carb
27  Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
28   Lotus Europa 30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
29 Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
30   Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
31  Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
32     Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

We can also pass the argument n to choose how many rows are shown:

> head(y,n=8)
                  X  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1         Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2     Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3        Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4    Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5 Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6           Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
7        Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
8         Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
> 
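A short sketch of how n behaves, run directly on the built-in mtcars so no file is needed. A negative n is also accepted: for head() it means "all rows except the last |n|", and for tail() "all rows except the first |n|".

```r
# mtcars has 32 rows.
first3   <- head(mtcars, n = 3)    # first 3 rows
last3    <- tail(mtcars, n = 3)    # last 3 rows
all_but  <- head(mtcars, n = -30)  # everything except the last 30 rows
skip_two <- tail(mtcars, n = -2)   # everything except the first 2 rows

print(nrow(first3))
print(nrow(last3))
print(nrow(all_but))
print(nrow(skip_two))
```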

Sometimes a file begins with some introductory text; the skip argument can be used to skip over it.

> y<-read.table('mtcars.csv',header=TRUE,sep=',',skip = 20)
> y
     Toyota.Corolla X33.9 X4 X71.1 X65 X4.22 X1.835 X19.9 X1 X1.1 X4.1 X1.2
1     Toyota Corona  21.5  4 120.1  97  3.70  2.465 20.01  1    0    3    1
2  Dodge Challenger  15.5  8 318.0 150  2.76  3.520 16.87  0    0    3    2
3       AMC Javelin  15.2  8 304.0 150  3.15  3.435 17.30  0    0    3    2
4        Camaro Z28  13.3  8 350.0 245  3.73  3.840 15.41  0    0    3    4
5  Pontiac Firebird  19.2  8 400.0 175  3.08  3.845 17.05  0    0    3    2
6         Fiat X1-9  27.3  4  79.0  66  4.08  1.935 18.90  1    1    4    1
7     Porsche 914-2  26.0  4 120.3  91  4.43  2.140 16.70  0    1    5    2
8      Lotus Europa  30.4  4  95.1 113  3.77  1.513 16.90  1    1    5    2
9    Ford Pantera L  15.8  8 351.0 264  4.22  3.170 14.50  0    1    5    4
10     Ferrari Dino  19.7  6 145.0 175  3.62  2.770 15.50  0    1    5    6
11    Maserati Bora  15.0  8 301.0 335  3.54  3.570 14.60  0    1    5    8
12       Volvo 142E  21.4  4 121.0 109  4.11  2.780 18.60  1    1    4    2
> 

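Reading starts at line 21 of the file. Because header = TRUE, that line (the Toyota Corolla row) is consumed as the column names, which is why the names look like X33.9, X4 and so on.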
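For the situation skip is meant for, where a few lines of free text come before the real header, a minimal sketch looks like this. The file contents below, including the two introductory lines, are invented for illustration.

```r
# Hypothetical file: two lines of introduction, then a header and data.
path <- tempfile(fileext = ".csv")
writeLines(c("Motor Trend car road tests",
             "Exported for demonstration",
             "model,mpg,cyl",
             "Mazda RX4,21,6",
             "Datsun 710,22.8,4"),
           path)

# Skip exactly the two introductory lines; the next line is the header.
d <- read.table(path, header = TRUE, sep = ",", skip = 2)
print(d)
```

Here skip equals the number of introductory lines, so the header line is still the first line read and the column names survive.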

Sometimes we only need part of a file. Combining the nrows argument with skip lets us read any portion of the file.

> y<-read.table('mtcars.csv',header=TRUE,sep=',',skip = 20,nrows = 5)
> y
    Toyota.Corolla X33.9 X4 X71.1 X65 X4.22 X1.835 X19.9 X1 X1.1 X4.1 X1.2
1    Toyota Corona  21.5  4 120.1  97  3.70  2.465 20.01  1    0    3    1
2 Dodge Challenger  15.5  8 318.0 150  2.76  3.520 16.87  0    0    3    2
3      AMC Javelin  15.2  8 304.0 150  3.15  3.435 17.30  0    0    3    2
4       Camaro Z28  13.3  8 350.0 245  3.73  3.840 15.41  0    0    3    4
5 Pontiac Firebird  19.2  8 400.0 175  3.08  3.845 17.05  0    0    3    2
> 
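When a slice is taken from the middle of a file, the real header is skipped along with the earlier rows. One way to keep the original column names, sketched here under the assumption that the file looks like one written by write.csv(mtcars, ...), is to read the names separately and pass them through col.names. The variable names col_names and part are just illustrative.

```r
# Assumption: recreate a comma-separated copy of mtcars in a temporary file.
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path)

# Step 1: read only the header and one data row to obtain the column names.
col_names <- names(read.table(path, header = TRUE, sep = ",", nrows = 1))

# Step 2: skip the header line plus the first 20 data rows, then read the
# next 5 rows as plain data and attach the names from step 1.
part <- read.table(path, header = FALSE, sep = ",",
                   skip = 21, nrows = 5, col.names = col_names)
print(part)
```

This reads the same five cars as the call above (Toyota Corona to Pontiac Firebird), but with mpg, cyl and the other original names as column headers.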

Reposted from blog.csdn.net/qq_43264642/article/details/88312404