First, load the built-in WorldPhones dataset and convert it from a matrix to a data frame:
> WorldPhones
N.Amer Europe Asia S.Amer Oceania Africa Mid.Amer
1951 45939 21574 2876 1815 1646 89 555
1956 60423 29990 4708 2568 2366 1411 733
1957 64721 32510 5230 2695 2526 1546 773
1958 68484 35218 6662 2845 2691 1663 836
1959 71799 37598 6856 3000 2868 1769 911
1960 76036 40341 8220 3145 3054 1905 1008
1961 79831 43173 9053 3338 3224 2005 1076
> x<-as.data.frame(WorldPhones)
Compute the row sums and the column means separately:
> rs<-rowSums(x)
> rs
1951 1956 1957 1958 1959 1960 1961
74494 102199 110001 118399 124801 133709 141700
> cm<-colMeans(x)
> cm
N.Amer Europe Asia S.Amer Oceania Africa Mid.Amer
66747.5714 34343.4286 6229.2857 2772.2857 2625.0000 1484.0000 841.7143
Then combine them with the data using cbind() and rbind():
> total<-cbind(x,rs)
> total_1<-rbind(total,cm)
> total_1
N.Amer Europe Asia S.Amer Oceania Africa Mid.Amer rs
1951 45939.00 21574.00 2876.000 1815.000 1646 89 555.0000 74494.00
1956 60423.00 29990.00 4708.000 2568.000 2366 1411 733.0000 102199.00
1957 64721.00 32510.00 5230.000 2695.000 2526 1546 773.0000 110001.00
1958 68484.00 35218.00 6662.000 2845.000 2691 1663 836.0000 118399.00
1959 71799.00 37598.00 6856.000 3000.000 2868 1769 911.0000 124801.00
1960 76036.00 40341.00 8220.000 3145.000 3054 1905 1008.0000 133709.00
1961 79831.00 43173.00 9053.000 3338.000 3224 2005 1076.0000 141700.00
8 66747.57 34343.43 6229.286 2772.286 2625 1484 841.7143 66747.57
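Notice that the last column, rs, did not get a mean. The vector of column means has only 7 elements (one per region) while total has 8 columns, so rbind() recycled the vector and filled the rs cell with its first element (66747.57).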
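One way to avoid the recycling, assuming the goal is a mean for every column including rs, is to compute the column means after rs has been added, so the appended row has exactly one value per column. This is a minimal sketch; the names col_means and total_2 are illustrative and not from the original.

```r
# Convert the built-in WorldPhones matrix to a data frame
x <- as.data.frame(WorldPhones)

# Append the row sums as a column named rs
total <- cbind(x, rs = rowSums(x))

# Take the column means AFTER rs exists: one mean per column (8 values)
col_means <- colMeans(total)

# Append the means as an 8th row; no recycling is needed now
total_2 <- rbind(total, mean = col_means)
print(total_2)
```

The rs cell of the new row now holds the average yearly total instead of a recycled N.Amer value.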
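For this kind of task R provides the apply family of functions. apply() applies a function over the margins (rows or columns) of a matrix or array: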
apply(X, MARGIN, FUN, ...)
Arguments
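X: an array, including a matrix; a data frame such as x is first converted to a matrix with as.matrix()
MARGIN: the dimension to apply FUN over; for a matrix, 1 means rows, 2 means columns, and c(1, 2) means every cell
FUN: the function to apply
...: optional arguments to FUN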
> apply(x,MARGIN = 1,FUN = sum)
1951 1956 1957 1958 1959 1960 1961
74494 102199 110001 118399 124801 133709 141700
> apply(x,MARGIN = 2,FUN=mean)
N.Amer Europe Asia S.Amer Oceania Africa
66747.5714 34343.4286 6229.2857 2772.2857 2625.0000 1484.0000
Mid.Amer
841.7143
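Two further uses of apply() may help, sketched here on the WorldPhones matrix: arguments written after FUN are passed on to FUN through `...`, and FUN can be an anonymous function. The names m, med and share are illustrative. When FUN returns a vector rather than a single value, apply() stores each result as a column, so the result for MARGIN = 1 comes back transposed (regions as rows, years as columns).

```r
m <- WorldPhones  # built-in matrix: years as rows, regions as columns

# probs = 0.5 is forwarded to quantile() via '...': the median of each column
med <- apply(m, MARGIN = 2, FUN = quantile, probs = 0.5)

# Anonymous function: each region's share of that year's total.
# Each call returns a length-7 vector, so the results are stacked as columns:
# rows of 'share' are regions, columns are years.
share <- apply(m, MARGIN = 1, FUN = function(row) row / sum(row))

med
round(share, 3)
```

Wrapping the result in t() restores the years-as-rows layout if that is preferred.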
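lapply() always returns a list; sapply() simplifies the result to a vector or matrix when it can (otherwise it also returns a list). Their signatures, together with the related vapply(), replicate() and simplify2array(), are: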
lapply(X, FUN, ...)
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
replicate(n, expr, simplify = "array")
simplify2array(x, higher = TRUE)
Arguments
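X: a vector (atomic or list) or another object; objects that are not lists are coerced with as.list()
FUN: the function to apply to each element of X
...: optional arguments to FUN
simplify: logical or character string; whether the result should be simplified to a vector, matrix or higher-dimensional array if possible. The default TRUE returns a vector or matrix where appropriate; simplify = "array" may return an array one rank higher than the result of FUN
USE.NAMES: logical; if TRUE and X is a character vector, X is used as the names of the result unless the result already has names
FUN.VALUE: a (generalized) vector used as a template for the return value of FUN; vapply() checks that every result has the same length as this template and a compatible type
n: integer; the number of replications
expr: the expression (a language object, usually a call) to evaluate repeatedly
x: a list, typically one returned by lapply()
higher: logical; if TRUE, simplify2array() returns a higher-rank array when appropriate, whereas FALSE returns only a matrix or vector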
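For example, state.center is a built-in list whose components x and y give the approximate geographic center (longitude and latitude) of each of the 50 US states: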
> state.center
$`x`
[1] -86.7509 -127.2500 -111.6250 -92.2992 -119.7730
[6] -105.5130 -72.3573 -74.9841 -81.6850 -83.3736
[11] -126.2500 -113.9300 -89.3776 -86.0808 -93.3714
[16] -98.1156 -84.7674 -92.2724 -68.9801 -76.6459
[21] -71.5800 -84.6870 -94.6043 -89.8065 -92.5137
[26] -109.3200 -99.5898 -116.8510 -71.3924 -74.2336
[31] -105.9420 -75.1449 -78.4686 -100.0990 -82.5963
[36] -97.1239 -120.0680 -77.4500 -71.1244 -80.5056
[41] -99.7238 -86.4560 -98.7857 -111.3300 -72.5450
[46] -78.2005 -119.7460 -80.6665 -89.9941 -107.2560
$y
[1] 32.5901 49.2500 34.2192 34.7336 36.5341 38.6777 41.5928
[8] 38.6777 27.8744 32.3329 31.7500 43.5648 40.0495 40.0495
[15] 41.9358 38.4204 37.3915 30.6181 45.6226 39.2778 42.3645
[22] 43.1361 46.3943 32.6758 38.3347 46.8230 41.3356 39.1063
[29] 43.3934 39.9637 34.4764 43.1361 35.4195 47.2517 40.2210
[36] 35.5053 43.9078 40.9069 41.5928 33.6190 44.3365 35.6767
[43] 31.3897 39.1063 44.2508 37.5630 47.4231 38.4204 44.5937
[50] 43.0504
> lapply(state.center,FUN=length)
$`x`
[1] 50
$y
[1] 50
> class(lapply(state.center,FUN=length))
[1] "list"
> sapply(state.center,FUN=length)
x y
50 50
> class(sapply(state.center,FUN=length))
[1] "integer"
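vapply(), replicate() and the USE.NAMES behaviour described above are not demonstrated in the original. A minimal sketch on the same state.center list follows; the variable names are illustrative, and the values produced by replicate() depend on the random seed.

```r
# vapply(): like sapply(), but every result must match the template
# FUN.VALUE, here exactly one integer per element
lens <- vapply(state.center, FUN = length, FUN.VALUE = integer(1))

# range() returns two numbers, so the template is numeric(2);
# the results become the columns of a 2 x 2 matrix (columns x and y)
rng <- vapply(state.center, FUN = range, FUN.VALUE = numeric(2))

# USE.NAMES = TRUE (the default): a character X supplies the result's names
chars <- sapply(c("a", "bb", "ccc"), FUN = nchar)

# replicate(): evaluate an expression n times and simplify the results
set.seed(1)
sim_means <- replicate(5, mean(rnorm(10)))

lens; rng; chars; sim_means
```

Unlike sapply(), vapply() raises an error instead of silently returning a list when FUN returns something of the wrong length or type, which makes it the safer choice inside functions.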
tapply() applies a function to groups of values, where the groups are defined by a factor:
tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
Arguments
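X: the data to split into groups, typically a vector
INDEX: a factor (or a list of factors) of the same length as X, used to split X into groups; values that are not factors are converted with as.factor()
FUN: the function to apply to each group
...: optional arguments to FUN
default: (only when the result is simplified to an array) the value the result array is initialized with, and therefore the value shown for groups that contain no data (NA unless changed)
simplify: logical; if TRUE (the default) and FUN always returns a single value, tapply() returns an array of those values; if FALSE, it returns an array of mode "list"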
> state.name
[1] "Alabama" "Alaska" "Arizona" "Arkansas"
[5] "California" "Colorado" "Connecticut" "Delaware"
[9] "Florida" "Georgia" "Hawaii" "Idaho"
[13] "Illinois" "Indiana" "Iowa" "Kansas"
[17] "Kentucky" "Louisiana" "Maine" "Maryland"
[21] "Massachusetts" "Michigan" "Minnesota" "Mississippi"
[25] "Missouri" "Montana" "Nebraska" "Nevada"
[29] "New Hampshire" "New Jersey" "New Mexico" "New York"
[33] "North Carolina" "North Dakota" "Ohio" "Oklahoma"
[37] "Oregon" "Pennsylvania" "Rhode Island" "South Carolina"
[41] "South Dakota" "Tennessee" "Texas" "Utah"
[45] "Vermont" "Virginia" "Washington" "West Virginia"
[49] "Wisconsin" "Wyoming"
> state.division
[1] East South Central Pacific Mountain West South Central
[5] Pacific Mountain New England South Atlantic
[9] South Atlantic South Atlantic Pacific Mountain
[13] East North Central East North Central West North Central West North Central
[17] East South Central West South Central New England South Atlantic
[21] New England East North Central West North Central East South Central
[25] West North Central Mountain West North Central Mountain
[29] New England Middle Atlantic Mountain Middle Atlantic
[33] South Atlantic West North Central East North Central West South Central
[37] Pacific Middle Atlantic New England South Atlantic
[41] West North Central East South Central West South Central Mountain
[45] New England South Atlantic Pacific South Atlantic
[49] East North Central Mountain
9 Levels: New England Middle Atlantic South Atlantic ... Pacific
Using the first vector (state names) and the second factor (divisions), count how many states each US division contains:
> tapply(state.name,state.division,FUN=length)
New England Middle Atlantic South Atlantic East South Central
6 3 8 4
West South Central East North Central West North Central Mountain
4 5 7 8
Pacific
5
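tapply() is more often applied to numeric data. The sketch below uses the built-in state.x77 matrix and state.region factor, which do not appear in the original example: the first call totals the Population column per division, the second confirms that the count above matches table(), and the last two group by two factors at once, where region/division combinations with no states receive the default value.

```r
# Total of the Population column of state.x77 within each division
pop_by_div <- tapply(state.x77[, "Population"], state.division, FUN = sum)

# Counting states per division gives the same numbers as table()
counts <- tapply(state.name, state.division, FUN = length)

# INDEX as a list of two factors gives a region x division matrix.
# Empty combinations get 'default': NA unless it is changed.
cross_na <- tapply(state.name, list(state.region, state.division), FUN = length)
cross_0 <- tapply(state.name, list(state.region, state.division), FUN = length,
                  default = 0)

pop_by_div
cross_0
```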
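Centering and standardization in R
Centering: subtract the column mean from every value, so that each column has mean 0.
Standardization: after centering, divide every value by the column standard deviation, i.e. z = (x - mean) / sd, so that each column also has standard deviation 1.
Both are done with the scale() function. Here h appears to be the first six rows of the built-in state.x77 matrix (e.g. h <- head(state.x77)); the row names and the scaled:center values in the output match this.
> scale(h,center=TRUE,scale=TRUE)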
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
Alabama -0.2200732 -0.9501180 1.09913001 -1.236374542 1.66820651 -1.1946743 -0.7660111 -0.62635214
Alaska -0.6346639 1.5643230 -0.03140371 -1.023017873 0.36563430 0.9548932 1.1417902 1.99851088
Arizona -0.3990488 -0.1035615 0.53386315 -0.005470684 -0.83410325 0.2270869 -0.8382763 -0.30718426
Arkansas -0.4120606 -1.1799777 0.72228544 0.084795599 -0.04570429 -1.3131544 -0.1156243 -0.62005622
California 2.0229261 0.4421218 -0.78509287 0.946428300 0.02285214 0.6079157 -0.7660111 -0.08861363
Colorado -0.3570795 0.2272123 -1.53878202 1.233639200 -1.17688541 0.7179330 1.3441327 -0.35630463
attr(,"scaled:center")
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
5.340167e+03 4.640833e+03 1.516667e+00 7.055667e+01 1.023333e+01 5.541667e+01 7.300000e+01 1.737715e+05
attr(,"scaled:scale")
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
7.839057e+03 1.070218e+03 5.307228e-01 1.218617e+00 2.917305e+00 1.181633e+01 6.918959e+01 1.964765e+05
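To see what scale() computes, the sketch below reproduces it with sweep(): subtract each column's mean, then divide by each column's standard deviation. It assumes h is head(state.x77), which is consistent with the output above but is not stated in the original.

```r
h <- head(state.x77)  # assumption: the six rows scaled above

z <- scale(h, center = TRUE, scale = TRUE)

# Centering: subtract each column's mean
centered <- sweep(h, MARGIN = 2, STATS = colMeans(h), FUN = "-")

# Standardization: divide each centered column by its standard deviation
manual <- sweep(centered, MARGIN = 2, STATS = apply(h, 2, sd), FUN = "/")

# Same numbers; attributes such as "scaled:center" are ignored in the check
all.equal(manual, z, check.attributes = FALSE)  # TRUE
```

The two attributes printed by scale(), scaled:center and scaled:scale, are exactly the colMeans(h) and apply(h, 2, sd) vectors used here.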