Descriptive Statistical Analysis in R

  • Median (the 50% quantile) and other quantiles, via quantile():
quantile(iris$Sepal.Length)
  0%  25%  50%  75% 100% 
 4.3  5.1  5.8  6.4  7.9 


quantile(iris$Sepal.Length,seq(0,1,by=0.1))
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
4.30 4.80 5.00 5.27 5.60 5.80 6.10 6.30 6.52 6.90 7.90 
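By default, R's quantile() uses type = 7. For a probability p, it finds the fractional position h = (n - 1)p + 1 in the sorted data and interpolates linearly between the two neighbouring values. The sketch below re-implements that rule with a hypothetical helper q7() (not part of base R) and checks it against quantile() and median() on the same data.

```r
# q7() is a hypothetical helper that mimics quantile(type = 7) for illustration only
q7 <- function(x, p) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  h <- (n - 1) * p + 1                 # fractional 1-based position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])   # linear interpolation between neighbours
}

probs   <- seq(0, 1, by = 0.1)
manual  <- sapply(probs, function(p) q7(iris$Sepal.Length, p))
builtin <- unname(quantile(iris$Sepal.Length, probs))
all.equal(manual, builtin)                                         # TRUE
all.equal(q7(iris$Sepal.Length, 0.5), median(iris$Sepal.Length))  # TRUE
```

This also shows why the median is the 50% entry of quantile()'s output.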
  • Distribution shape: skewness. A value > 0 means the distribution is right-skewed (it has a longer right tail).
> library(fBasics)
Loading required package: timeDate
Loading required package: timeSeries
> skewness(iris$Sepal.Length)
[1] 0.3086407
attr(,"method")
[1] "moment"
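The "moment" value above can be reproduced by hand. This is a minimal sketch, assuming the fBasics definition: the average cubed deviation from the mean divided by s^3, where s is the sample standard deviation (denominator n - 1). The name moment_skewness() is a hypothetical helper, not part of fBasics.

```r
# moment_skewness() is a hypothetical helper; assumes the fBasics "moment" formula:
# sum((x - mean)^3) / (n * s^3), with s = sd(x) (sample SD, n - 1 denominator)
moment_skewness <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  s <- sd(x)
  sum((x - mean(x))^3) / (n * s^3)
}

moment_skewness(iris$Sepal.Length)   # about 0.3086, as fBasics prints above
moment_skewness(c(1, 2, 3, 10)) > 0  # TRUE: one large value gives a long right tail
```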
  • Grouped statistics: Hmisc::summary() with a formula (summary.formula); the default summary function is fun=mean.
 library(Hmisc)
 mystats <- function(x) c(Median=median(x,na.rm=TRUE),IQR=IQR(x,na.rm=TRUE))
 summary(mpg~cyl+hp,data=mtcars,fun=mystats,method='response')
mpg     N= 32 

+-------+---------+--+------+-----+
|       |         |N |Median|IQR  |
+-------+---------+--+------+-----+
|cyl    |4        |11|26.00 |7.600|
|       |6        | 7|19.70 |2.350|
|       |8        |14|15.20 |1.850|
+-------+---------+--+------+-----+
|hp     |[ 52, 97)| 8|26.65 |6.900|
|       |[ 97,150)| 9|21.00 |2.200|
|       |[150,205)| 8|16.85 |3.400|
|       |[205,335]| 7|14.30 |3.000|
+-------+---------+--+------+-----+
|Overall|         |32|19.20 |7.375|
+-------+---------+--+------+-----+
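The fun argument is applied to mpg separately within each group. For the cyl rows, a rough base-R cross-check with split() and sapply() (an illustration, not how Hmisc computes it internally) gives the same medians and IQRs:

```r
# Same per-group statistics as the cyl rows of the table above
mystats <- function(x) c(Median = median(x, na.rm = TRUE), IQR = IQR(x, na.rm = TRUE))

# split() groups mpg by cyl; sapply() applies mystats to each group; t() puts groups in rows
by_cyl <- t(sapply(split(mtcars$mpg, mtcars$cyl), mystats))
by_cyl
```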

summary(mpg~cyl+hp,data=mtcars,fun=quantile,method='response')
mpg     N= 32 

+-------+---------+--+----+------+-----+------+----+
|       |         |N |0%  |25%   |50%  |75%   |100%|
+-------+---------+--+----+------+-----+------+----+
|cyl    |4        |11|21.4|22.800|26.00|30.400|33.9|
|       |6        | 7|17.8|18.650|19.70|21.000|21.4|
|       |8        |14|10.4|14.400|15.20|16.250|19.2|
+-------+---------+--+----+------+-----+------+----+
|hp     |[ 52, 97)| 8|22.8|24.000|26.65|30.900|33.9|
|       |[ 97,150)| 9|17.8|19.200|21.00|21.400|30.4|
|       |[150,205)| 8|15.2|15.425|16.85|18.825|19.7|
|       |[205,335]| 7|10.4|11.850|14.30|14.850|15.8|
+-------+---------+--+----+------+-----+------+----+
|Overall|         |32|10.4|15.425|19.20|22.800|33.9|
+-------+---------+--+----+------+-----+------+----+
summary(cyl~mpg+hp,data=mtcars,method='reverse')


Descriptive Statistics by cyl

+---+--------------------+--------------------+--------------------+
|   |4                   |6                   |8                   |
|   |(N=11)              |(N=7)               |(N=14)              |
+---+--------------------+--------------------+--------------------+
|mpg|  22.80/26.00/30.40 |  18.65/19.70/21.00 |  14.40/15.20/16.25 |
+---+--------------------+--------------------+--------------------+
|hp | 65.50/ 91.00/ 96.00|110.00/110.00/123.00|176.25/192.50/241.25|
+---+--------------------+--------------------+--------------------+
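With method='reverse', the left-hand variable (cyl) defines the columns, and each continuous right-hand variable is shown as lower quartile/median/upper quartile. A base-R sketch of these cells, using a hypothetical helper q123():

```r
# q123() is a hypothetical helper that formats "Q1/median/Q3" with two decimals
q123 <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE)
  paste(formatC(q, format = "f", digits = 2), collapse = "/")
}

mpg_by_cyl <- sapply(split(mtcars$mpg, mtcars$cyl), q123)
hp_by_cyl  <- sapply(split(mtcars$hp,  mtcars$cyl), q123)
mpg_by_cyl
hp_by_cyl
```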
 summary(mpg~cyl+hp,data=mtcars,method='cross',fun=var)

 var by cyl, hp 

+---+
|N  |
|mpg|
+---+
+---+---------+---------+---------+---------+---------+
|cyl|[ 52, 97)|[ 97,150)|[150,205)|[205,335]|   ALL   |
+---+---------+---------+---------+---------+---------+
|4  | 8       | 3       | 0       | 0       |11       |
|   |18.494286|26.703333|         |         |20.338545|
+---+---------+---------+---------+---------+---------+
|6  | 0       | 6       | 1       | 0       | 7       |
|   |         | 2.535000|         |         | 2.112857|
+---+---------+---------+---------+---------+---------+
|8  | 0       | 0       | 7       | 7       |14       |
|   |         |         | 2.764762| 4.804762| 6.553846|
+---+---------+---------+---------+---------+---------+
|ALL| 8       | 9       | 8       | 7       |32       |
|   |18.494286|13.743611| 3.431429| 4.804762|36.324103|
+---+---------+---------+---------+---------+---------+
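In the 'cross' table, each cell holds the count (N) and fun applied to mpg for one cyl and hp combination; cells with fewer than two cars show no variance. The sketch below rebuilds the inner cells (not the ALL margins) in base R. Assumption: the hp break points are copied from the printed column headers, whereas Hmisc derives its hp groups from quantiles of hp itself.

```r
# Break points copied from the table headers (assumption, see above);
# right = FALSE + include.lowest = TRUE gives [52,97), [97,150), [150,205), [205,335]
hp_grp <- cut(mtcars$hp, breaks = c(52, 97, 150, 205, 335),
              right = FALSE, include.lowest = TRUE)

counts <- table(mtcars$cyl, hp_grp)                          # the N cells
vars   <- tapply(mtcars$mpg, list(mtcars$cyl, hp_grp), var)  # variance cells; NA if n < 2
counts
round(vars, 6)
```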

Getting an overall summary: the describe() function (Hmisc)

describe(mtcars)
mtcars 

 11  Variables      32  Observations
----------------------------------------------------------------------------
mpg 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
      32        0       25    0.999    20.09    6.796    12.00    14.34 
     .25      .50      .75      .90      .95 
   15.43    19.20    22.80    30.09    31.30 

lowest : 10.4 13.3 14.3 14.7 15.0, highest: 26.0 27.3 30.4 32.4 33.9
----------------------------------------------------------------------------
cyl 
       n  missing distinct     Info     Mean      Gmd 
      32        0        3    0.866    6.188    1.948 
                            
Value          4     6     8
Frequency     11     7    14
Proportion 0.344 0.219 0.438
----------------------------------------------------------------------------
disp 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
      32        0       27    0.999    230.7    142.5    77.35    80.61 
     .25      .50      .75      .90      .95 
  120.83   196.30   326.00   396.00   449.00 

lowest :  71.1  75.7  78.7  79.0  95.1, highest: 360.0 400.0 440.0 460.0 472.0
----------------------------------------------------------------------------
hp 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
      32        0       22    0.997    146.7    77.04    63.65    66.00 
     .25      .50      .75      .90      .95 
   96.50   123.00   180.00   243.50   253.55 

lowest :  52  62  65  66  91, highest: 215 230 245 264 335
----------------------------------------------------------------------------
drat 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
      32        0       22    0.997    3.597   0.6099    2.853    3.007 
     .25      .50      .75      .90      .95 
   3.080    3.695    3.920    4.209    4.314 

lowest : 2.76 2.93 3.00 3.07 3.08, highest: 4.08 4.11 4.22 4.43 4.93
----------------------------------------------------------------------------
wt 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
      32        0       29    0.999    3.217    1.089    1.736    1.956 
     .25      .50      .75      .90      .95 
   2.581    3.325    3.610    4.048    5.293 

lowest : 1.513 1.615 1.835 1.935 2.140, highest: 3.845 4.070 5.250 5.345 5.424
----------------------------------------------------------------------------
qsec 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
      32        0       30        1    17.85    2.009    15.05    15.53 
     .25      .50      .75      .90      .95 
   16.89    17.71    18.90    19.99    20.10 

lowest : 14.50 14.60 15.41 15.50 15.84, highest: 19.90 20.00 20.01 20.22 22.90
----------------------------------------------------------------------------
vs 
       n  missing distinct     Info      Sum     Mean      Gmd 
      32        0        2    0.739       14   0.4375   0.5081 

----------------------------------------------------------------------------
am 
       n  missing distinct     Info      Sum     Mean      Gmd 
      32        0        2    0.724       13   0.4062    0.498 

----------------------------------------------------------------------------
gear 
       n  missing distinct     Info     Mean      Gmd 
      32        0        3    0.841    3.688   0.7863 
                            
Value          3     4     5
Frequency     15    12     5
Proportion 0.469 0.375 0.156
----------------------------------------------------------------------------
carb 
       n  missing distinct     Info     Mean      Gmd 
      32        0        6    0.929    2.812    1.718 
                                              
Value          1     2     3     4     6     8
Frequency      7    10     3    10     1     1
Proportion 0.219 0.312 0.094 0.312 0.031 0.031
----------------------------------------------------------------------------
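Two columns in the describe() output are less familiar. Gmd is Gini's mean difference, the mean absolute difference over all pairs of observations. Info measures how much information is lost to tied values (1 means no ties). The helpers gmd() and info() below are hypothetical illustrations, assuming Info = 1 - sum(f^3 - f)/(n^3 - n), where f are the frequencies of the distinct values; with that assumption they reproduce the printed values for am, cyl and mpg.

```r
# gmd(): hypothetical helper, mean |x_i - x_j| over all ordered pairs with i != j
gmd <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

# info(): hypothetical helper, 1 - sum(f^3 - f) / (n^3 - n) over tie frequencies f
info <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  f <- table(x)
  1 - sum(f^3 - f) / (n^3 - n)
}

round(c(Gmd = gmd(mtcars$am), Info = info(mtcars$am)), 3)  # Gmd 0.498, Info 0.724
round(info(mtcars$cyl), 3)                                 # 0.866
```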
  • Data visualization: caret::featurePlot() (the data must not contain missing values)
    library(caret)
    str(iris)
    'data.frame':	150 obs. of  5 variables:
     $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    > featurePlot(iris[,1:4],iris[,5],plot='ellipse')
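Because featurePlot() expects data without missing values, incomplete rows can be dropped first with complete.cases(). A minimal sketch, assuming a copy of iris with two artificially blanked values (hypothetical data). The plot is only drawn when caret and the ellipse package are installed, since plot = 'ellipse' is, as far as we know, built on the ellipse package.

```r
# Hypothetical example data: a copy of iris with two missing values
df <- iris
df$Sepal.Width[c(3, 10)] <- NA

# Keep only rows with no NA in any column
df_complete <- df[complete.cases(df), ]
nrow(df_complete)  # 148

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("ellipse", quietly = TRUE)) {
  p <- caret::featurePlot(x = df_complete[, 1:4], y = df_complete$Species,
                          plot = "ellipse", auto.key = list(columns = 3))
  print(p)  # lattice plots must be printed explicitly inside scripts and functions
}
```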
    

plot(iris)
plot(iris$Sepal.Length)
plot(iris$Species)
with(iris,{plot(Sepal.Length,Sepal.Width,pch=as.numeric(Species))
            legend('topright',legend=levels(iris$Species),pch=1:3,ncol=3,cex=0.8)})

Factor (categorical) data can be plotted with mosaicplot().

table(iris$Species)

    setosa versicolor  virginica 
        50         50         50 
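mosaicplot() is most informative with a two-way table, where each tile's area is proportional to the count in one cell. iris has only one factor, so this sketch cross-tabulates cyl and am from mtcars instead (an illustrative choice):

```r
# Two-way contingency table of two categorical variables
tab <- table(cyl = mtcars$cyl, am = mtcars$am)
tab

# Tile areas are proportional to the cell counts
mosaicplot(tab, main = "mtcars: cyl by am", color = TRUE)
```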


Reposted from blog.csdn.net/baibingbingbing/article/details/81296070