Download and related preparation
Windows operating system
R download: https://www.r-project.org/
RStudio download: https://www.rstudio.com/products/rstudio/download/
Linux operating system (CentOS)
R is already installed. After downloading RStudio with wget, the rpm file cannot be installed without root privileges TAT
Study book recommendation: https://xccds1977.blogspot.sg/2013/02/r.html
Reference: http://staff.ustc.edu.cn/~zwp/teach/Stat-Comp/R4beg_cn_2.0.pdf
Going to the library to borrow "R in Action"...
Objects
While R is running, all variables, data, functions and results are stored in the computer's active memory in the form of objects, each of which is given a name.
Intrinsic properties of an object:
Mode: mode(), one of numeric, character, complex or logical
Length: length(), the number of elements in the object
> x <- 1
> mode(x)
[1] "numeric"
> length(x)
[1] 1
> x <- "helloworld"; y <- TRUE; Z <- i # a bare i is not the imaginary unit
Error: object 'i' not found
> x <- "helloworld"; y <- TRUE; Z <- 1i
> mode(x);mode(y);mode(Z)
[1] "character"
[1] "logical"
[1] "complex"
Object operations:
Assign/modify: <-
Remove: rm()
> rm(Z)
> ls()
[1] "x" "y"
> rm(list=ls())
> ls()
character(0)
Special values:
Inf: +∞
-Inf: -∞
NaN: Not a Number
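A small sketch of where these special values come from and how to test for them. The variable names pos, neg and nan are only illustrative.

```r
# Division by zero yields infinities; 0/0 is undefined and yields NaN.
pos <- 1 / 0       # Inf
neg <- -1 / 0      # -Inf
nan <- 0 / 0       # NaN

is.finite(pos)     # FALSE
is.infinite(neg)   # TRUE
is.nan(nan)        # TRUE
is.na(nan)         # TRUE: is.na() also treats NaN as missing
pos - pos          # NaN: Inf - Inf is undefined
```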
Classification of objects:
Vector: has a length but, unlike an array or matrix, no dim attribute,
Factor (numeric or character), array, matrix (two-dimensional array),
Data frame: Consists of one or several vectors and/or factors, which must all have the same length but can be of different data types,
Time series: contains additional attributes such as frequency, time, etc.
List: Can contain any type of object.
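The object classes listed above could be created as follows. This is only a sketch with made-up values and hypothetical names (v, f, m, a, df, ts1, l), meant to show how length and dim differ between a plain vector and a matrix.

```r
v   <- c(1.5, 2, 3)                                 # vector (numeric)
f   <- factor(c("a", "b", "a"))                     # factor with levels "a" and "b"
m   <- matrix(1:6, nrow = 2)                        # matrix: a 2 x 3 array
a   <- array(1:24, dim = c(2, 3, 4))                # array with three dimensions
df  <- data.frame(id = 1:3, grp = f, val = v)       # equal-length columns, mixed types
ts1 <- ts(1:8, start = c(2020, 1), frequency = 4)   # quarterly time series
l   <- list(v, f, m, "any object")                  # list holding arbitrary objects

length(v); dim(v)    # a plain vector has a length, but dim() is NULL
length(m); dim(m)    # a matrix has both: 6 elements, dimensions 2 and 3
class(df); class(ts1); class(l)
```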
File read and write
Data R can read directly: data stored in text files (ASCII)
Files in other formats (Excel, SAS, SPSS) and access to SQL-type databases: advanced applications
Read:
read.table()
> mydata <- read.table("C:/data/test_data.txt") # Create a data frame mydata
Warning message:
In read.table("C:/data/test_data.txt") :
  incomplete final line found by readTableHeader on 'C:/data/test_data.txt'
> View(mydata) # Each variable in the data frame is named; the defaults are V1, V2, ...
> mydata$V1;mydata["V1"];mydata[,1] # Access a variable individually
[1] 1 2 # vector
  V1
1  1
2  2 # data frame
[1] 1 2 # vector
Defaults:
read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".", row.names, col.names, as.is = FALSE, na.strings = "NA", colClasses = NA, nrows = -1, skip = 0, check.names = TRUE, fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#")
file | A file name (enclosed in "", or held in a character variable) or a URL (http://...) for remote access to the file |
header | A logical value: whether the first line of the file contains the variable names |
sep | Field separator |
quote | The characters used to quote character data |
dec | Character used to represent the decimal point |
row.names | A vector of row names, or the number or name of a variable in the file; by default the rows are numbered 1, 2, 3, ... |
col.names | A character vector specifying the column names (default: V1, V2, V3, ...) |
as.is | Controls whether character variables are converted to factors (FALSE) or kept as characters (TRUE); as.is can be a logical, numeric or character vector specifying which variables are to be kept as characters |
na.strings | Values representing missing data (converted to NA) |
colClasses | A character vector specifying the data type of each column |
nrows | Maximum number of rows to read (negative values are ignored) |
skip | Number of rows to skip before reading data |
check.names | If TRUE, check if the variable name is valid in R |
fill | If TRUE and the rows do not all have the same number of variables, blank fields are added |
strip.white | If TRUE (and sep is specified), removes extra spaces before and after character variables |
blank.lines.skip | If TRUE, ignore blank lines |
comment.char | A character used to write comments in the data file; lines beginning with this character are ignored (to disable this argument, use comment.char = "") |
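To see several of these arguments working together, here is a minimal sketch. The file contents, the "-" marker for missing values and the column names are all made up for illustration, and the file is written to a temporary path so nothing on disk is touched.

```r
# Write a tiny tab-separated file to a temporary location (hypothetical data).
path <- tempfile(fileext = ".txt")
writeLines(c("id\tgroup\tscore",
             "1\tA\t2.5",
             "2\tB\t-",
             "# this comment line is skipped",
             "3\tA\t4.0"), path)

mydata <- read.table(path,
                     header = TRUE,        # first line holds the variable names
                     sep = "\t",           # fields are tab-separated
                     na.strings = "-",     # "-" marks a missing value
                     colClasses = c("integer", "character", "numeric"),
                     comment.char = "#")   # ignore lines starting with "#"

str(mydata)      # 3 observations of 3 variables
mydata$score     # the second value is NA
```

Setting colClasses explicitly keeps the group column as character regardless of the R version's default for factor conversion.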
scan()
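Can be used to create different kinds of objects: vectors, matrices, ...
> mydata <- scan("data.dat", what = list("", 0, 0)) # Reads three variables from the file data.dat: the first is a character variable, the other two are numeric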
scan(file = "", what = double(0), nmax = -1, n = -1, sep = "", quote = if (sep=="\n") "" else "'\"", dec = ".", skip = 0, nlines = 0, na.strings = "NA", flush = FALSE, fill = FALSE, strip.white = FALSE, quiet = FALSE, blank.lines.skip = TRUE, multi.line = TRUE, comment.char = "")
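what | Specifies the type of the data (numeric by default) |
nmax | Maximum number of data values to read; if what is a list, nmax is the number of lines that can be read (by default, scan reads all data up to the end of the file) |
n | Maximum number of data values to read (no limit by default) |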
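A short sketch of what and nmax in action. To stay self-contained it reads from a string through scan's text argument (part of current R, though not shown in the signature above) instead of a file; the data and the component names name, x and y are invented.

```r
# With the default what, scan() returns a numeric vector.
x <- scan(text = "1 2 3 4 5 6", quiet = TRUE)
m <- matrix(x, nrow = 2, byrow = TRUE)   # reshape the vector into a 2 x 3 matrix

# A list in what gives one component per field; nmax limits the records read.
rec <- scan(text = "a 1 10\nb 2 20\nc 3 30",
            what = list(name = "", x = 0, y = 0),
            nmax = 2,
            quiet = TRUE)
rec$name   # only the first two records: "a" "b"
rec$y      # 10 20
```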
read.fwf(): reads data stored in fixed-width format from a file
> mydata <- read.fwf("C:/data/test_data.txt",widths=c(1,5,3,2))
(Figures: the raw data, and the data after processing)
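Since the original data file is not reproduced here, the following sketch uses two invented lines to show how widths = c(1, 5, 3, 2) cuts each line into four fields. The contents are an assumption chosen only so that every line is exactly 11 characters wide.

```r
# Each line is 1 + 5 + 3 + 2 = 11 characters, with no separators.
path <- tempfile(fileext = ".txt")
writeLines(c("A1.5001.210",
             "B2.7503.420"), path)

fw <- read.fwf(path, widths = c(1, 5, 3, 2))
fw         # columns are named V1 to V4 by default
fw$V2      # 1.50 2.75
fw$V4      # 10 20
```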
Write:
write.table()
> d <- data.frame(obs=c(1,2,3),treat=c("A","B","C"),weight=c(2.3,NA,9))
> write.table(d,file="C:/data/test_data.txt")
> View(d)
> write.table(d,file="C:/data/test_data.txt",row.names=FALSE,quote=FALSE,sep="\t")
(Figures: the stored data, and the table view)
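append | If TRUE, the data are added to the end of the target file without deleting any data it already contains |
quote | A logical or numeric vector: if TRUE, character variables and factors are written within double quotes ""; if quote is a numeric vector, it gives the indices of the columns to be written within "". (In both cases the variable names are written within ""; if quote = FALSE, the variable names are not quoted) |
row.names | A logical value deciding whether the row names are written to the file, or a character vector to be written to the file as the row names |
col.names | A logical value deciding whether the column names are written to the file, or a character vector to be written to the file as the column names |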
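The options above can be combined to write a file in two steps. This is a sketch only: it reuses the small data frame d from the earlier example and writes to a temporary file rather than C:/data.

```r
d <- data.frame(obs = c(1, 2, 3),
                treat = c("A", "B", "C"),
                weight = c(2.3, NA, 9))
path <- tempfile(fileext = ".txt")

# First write: header plus rows 1 and 2, tab-separated, no quotes or row names.
write.table(d[1:2, ], file = path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Second write: append row 3; col.names = FALSE avoids a second header line.
write.table(d[3, ], file = path, sep = "\t",
            quote = FALSE, row.names = FALSE,
            col.names = FALSE, append = TRUE)

readLines(path)                                      # one header line, three data lines
back <- read.table(path, header = TRUE, sep = "\t")  # read the file back in
```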
save(): saves objects in R's own file format
> save(d,file="C:/data/test_data.Rdata")
> setwd("C:/data") # Set the working directory
> load("test_data.Rdata") # Load into memory
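A round trip through save() and load() might look like the sketch below. It uses a temporary file instead of C:/data, and the names original and restored are only illustrative.

```r
d <- data.frame(obs = c(1, 2, 3), weight = c(2.3, NA, 9))
path <- tempfile(fileext = ".RData")

save(d, file = path)      # store the object d, under its own name, in the file
original <- d             # keep a copy for comparison
rm(d)                     # remove d from the workspace

restored <- load(path)    # recreates d; load() returns the names it loaded
restored                  # "d"
identical(d, original)    # TRUE
```

Note that load() restores objects under the names they were saved with, so it can silently overwrite an existing object of the same name.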