R: The First Thing to Do When You Get a Dataset

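When we start working with a dataset, the first thing to do is look at how it is structured: what type each variable has, which values each variable takes, and how those values are distributed.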

Our data looks like this:

                                V1                               V2
1 7063b3d0c075a4d276c5f06f4327cf4a effb071415be51f11e845884e67c0f8c
2 0db66c0dd3993fd3504bb98c3beb15b3 f87ff481d85d2f95335ab602f38a7655
3 f8c065dc140ec74c6e44144164e618e3 8a27d9a6c59628c991c154e8d93f412e
4 2c6082cf0d68e244f2a10325e8d1b85b ecea5fe33e6817d09c395f2910479728
5 2c6082cf0d68e244f2a10325e8d1b85b 31a3d0420d89c9b121bb55dbdbbeda6b
          V3 V4       V5
1 1426406400  1 20150315
2 1426417200  1 20150315
3 1426406400  2 20150315
4 1426417200  1 20150315
5 1426417200  1 20150315
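
To follow along without the original file, the five rows printed above can be rebuilt by hand. This is only a sketch: the name df is hypothetical (the post calls its data frame c), and stringsAsFactors = TRUE is an assumption chosen so that the text columns become factors.

```r
# Hypothetical reconstruction of the five rows shown above.
# stringsAsFactors = TRUE is an assumption so V1 and V2 become factors.
df <- data.frame(
  V1 = c("7063b3d0c075a4d276c5f06f4327cf4a",
         "0db66c0dd3993fd3504bb98c3beb15b3",
         "f8c065dc140ec74c6e44144164e618e3",
         "2c6082cf0d68e244f2a10325e8d1b85b",
         "2c6082cf0d68e244f2a10325e8d1b85b"),
  V2 = c("effb071415be51f11e845884e67c0f8c",
         "f87ff481d85d2f95335ab602f38a7655",
         "8a27d9a6c59628c991c154e8d93f412e",
         "ecea5fe33e6817d09c395f2910479728",
         "31a3d0420d89c9b121bb55dbdbbeda6b"),
  V3 = c(1426406400L, 1426417200L, 1426406400L, 1426417200L, 1426417200L),
  V4 = c(1L, 1L, 2L, 1L, 1L),
  V5 = rep(20150315L, 5),
  stringsAsFactors = TRUE
)

dim(df)   # number of rows and columns
head(df)  # print the first rows, wrapped across lines if the console is narrow
```

Using a name such as df instead of c also avoids shadowing the base function c().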

Let's go step by step. First, the structural information of the data:

str(c)

This gives:

'data.frame':   5 obs. of  5 variables:
 $ V1: Factor w/ 349946 levels "0000110e00f7c85f550b329dc3d76210",..: 153443 18715 340011 60216 60216
 $ V2: Factor w/ 10278 levels "00088cb1e6d740fcd42bc8ff2673c805",..: 9613 9962 5650 9482 2041
 $ V3: int  1426406400 1426417200 1426406400 1426417200 1426417200
 $ V4: int  1 1 2 1 1
 $ V5: int  20150315 20150315 20150315 20150315 20150315
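
Note that str reports 349946 levels for V1 even though only 5 rows are present. One plausible explanation (an assumption, the post does not say so) is that c is a small subset of a much larger data frame: a factor keeps all of its levels when rows are subset. The sketch below illustrates this with a tiny made-up data frame; the names full, first3, id and n are hypothetical.

```r
# Made-up data: 6 distinct ids, stored as a factor.
full <- data.frame(
  id = factor(c("a", "b", "c", "d", "e", "f")),
  n  = c(1L, 1L, 2L, 1L, 1L, 3L)
)

first3 <- head(full, 3)          # keep only the first 3 rows

sapply(first3, class)            # class of each column
nlevels(first3$id)               # unused levels are still there
nlevels(droplevels(first3)$id)   # droplevels() removes the unused ones
```

If the extra levels get in the way, droplevels() on the subset is usually enough.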

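Next, we want to know how each variable is distributed. We can use the summary function or the describe function (the output below matches the format of describe from the Hmisc package, so that package is assumed here):

library(Hmisc)  # assumption: describe() comes from the Hmisc package
describe(c)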

This gives:

 5  Variables      5  Observations
-----------------------------------------------------------------------------
V1 
      n missing  unique 
      5       0       4 

0db66c0dd3993fd3504bb98c3beb15b3 (1, 20%) 
2c6082cf0d68e244f2a10325e8d1b85b (2, 40%) 
7063b3d0c075a4d276c5f06f4327cf4a (1, 20%) 
f8c065dc140ec74c6e44144164e618e3 (1, 20%) 
-----------------------------------------------------------------------------
V2 
      n missing  unique 
      5       0       5 

31a3d0420d89c9b121bb55dbdbbeda6b (1, 20%) 
8a27d9a6c59628c991c154e8d93f412e (1, 20%) 
ecea5fe33e6817d09c395f2910479728 (1, 20%) 
effb071415be51f11e845884e67c0f8c (1, 20%) 
f87ff481d85d2f95335ab602f38a7655 (1, 20%) 
-----------------------------------------------------------------------------
V3 
        n   missing    unique      Info      Mean 
        5         0         2      0.75 1.426e+09 

1426406400 (2, 40%), 1426417200 (3, 60%) 
-----------------------------------------------------------------------------
V4 
      n missing  unique    Info    Mean 
      5       0       2     0.5     1.2 

1 (4, 80%), 2 (1, 20%) 
-----------------------------------------------------------------------------
V5 
       n  missing   unique     Info     Mean 
       5        0        1        0 20150315 
-----------------------------------------------------------------------------
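
The same counts and shares can be cross-checked with base R alone, which is handy when no extra package is installed. A minimal sketch, using the V3 and V4 values from the five rows above; the variable names v3, v4, counts and shares are hypothetical.

```r
# V3 and V4 values copied from the five rows shown earlier.
v3 <- c(1426406400L, 1426417200L, 1426406400L, 1426417200L, 1426417200L)
v4 <- c(1L, 1L, 2L, 1L, 1L)

summary(v4)                   # min, quartiles, mean, max

counts <- table(v4)           # how often each value occurs
shares <- prop.table(counts)  # the same counts as proportions
counts
shares

length(unique(v3))            # number of distinct values in V3
mean(v4)                      # mean of V4
```

For V4 this reproduces the figures in the describe output: the value 1 appears 4 times (80%), the value 2 once (20%), and the mean is 1.2.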

Reposted from blog.csdn.net/melon0014/article/details/51471737