R functions for grouped statistics: usage and experience with the apply family

Author's note: the apply family is powerful and practical, and it can replace many loops. In R, avoid explicit loops where you can.

Original link:  https://blog.csdn.net/sinat_26917383/article/details/51086663


Function name: features
apply: simple operations over the rows or columns of a matrix or array, such as mean, sum, and other common summaries
tapply = table apply: apply with grouping added; summarizes values by group, in the way a grouped table does
lapply = list apply: works on lists (a data frame is handled as a list of columns) and always returns a list
sapply = simplify apply, roughly unlist(lapply(...)): takes the same input as lapply, but the result may be simplified to a vector or matrix
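
As a rough illustration of the summary above, the sketch below runs each of the four functions on small made-up data so that their return shapes can be compared. The names m, scores, and grp are invented for this example and do not come from the original article.

```r
# Hypothetical toy data, not from the original article
m <- matrix(1:6, nrow = 2)            # 2 x 3 matrix
scores <- c(90, 80, 70, 60)
grp <- c("a", "b", "a", "b")

row_sums <- apply(m, 1, sum)              # one value per row
grp_mean <- tapply(scores, grp, mean)     # one value per group
as_list  <- lapply(1:3, function(i) i^2)  # always a list
as_vec   <- sapply(1:3, function(i) i^2)  # simplified to a vector

print(row_sums)
print(grp_mean)
str(as_list)
print(as_vec)
```

Here unlist(as_list) and as_vec hold the same values, which is the sense in which sapply behaves like unlist(lapply(...)) for results of length one.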

 


apply

Apply Functions Over Array Margins

Applies a function over the rows or columns (margins) of an array

apply(X, MARGIN, FUN, ...)

lapply

Apply a Function over a List or Vector

Applies a function to each element of a list or vector and returns a list

lapply(X, FUN, ...)

sapply

Apply a Function over a List or Vector

Applies a function to each element of a list or vector and simplifies the result where possible

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

vapply

Apply a Function over a List or Vector

Applies a function to each element of a list or vector, with the return type fixed by FUN.VALUE

vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)

tapply

Apply a Function Over a Ragged Array

Applies a function to each group of values in a ragged array

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

eapply

Apply a Function Over Values in an Environment

Applies a function to the values in an environment

eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE)

mapply

Apply a Function to Multiple List or Vector Arguments

Applies a function to multiple list or vector arguments in parallel

mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

rapply

Recursively Apply a Function to a List

Applies a function recursively to the elements of a list

rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...)

 


1. The apply function
Computes over an array or matrix row by row or column by column (sum, mean, etc.).

In apply, MARGIN = 1 means rows and MARGIN = 2 means columns.

> ma <- matrix(c(1:4, 1, 6:8), nrow = 2)
> ma
     [,1] [,2] [,3] [,4]
[1,]    1    3    1    7
[2,]    2    4    6    8
> apply(ma, c(1,2), sum)
     [,1] [,2] [,3] [,4]
[1,]    1    3    1    7
[2,]    2    4    6    8
> apply(ma, 1, sum)
[1] 12 20
> apply(ma, 2, sum)
[1]  3  7  7 15
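
The transcript above only uses sum. As a small sketch of two further uses, assuming the same matrix ma, the code below passes an extra argument through to FUN and applies a custom function to each column. The variable names are made up for illustration.

```r
ma <- matrix(c(1:4, 1, 6:8), nrow = 2)

# Extra arguments after FUN are passed on to FUN (here probs goes to quantile)
row_medians <- apply(ma, 1, quantile, probs = 0.5)

# A custom function applied to each column: spread (max minus min)
col_spread <- apply(ma, 2, function(col) max(col) - min(col))

# For plain sums and means, base R also provides rowSums, colSums, rowMeans, colMeans
same_sums <- all(apply(ma, 1, sum) == rowSums(ma))

print(row_medians)
print(col_spread)
print(same_sums)
```

For sums and means the dedicated functions are usually the simpler choice; apply is more useful when the per-row or per-column computation is something else.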

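> # Preview of tapply (section 2): here fac is a 17-element factor with two unused levels
> fac <- factor(rep(1:3, length = 17), levels = 1:5)
> tapply(1:17, fac, sum, simplify = FALSE)
$`1`
[1] 51

$`2`
[1] 57

$`3`
[1] 45

$`4`
NULL

$`5`
NULL

> tapply(1:17, fac, range)
$`1`
[1]  1 16

$`2`
[1]  2 17

$`3`
[1]  3 15

$`4`
NULL

$`5`
NULL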


2. tapply
(grouped statistics)

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
# Applies FUN to the values of X within each group defined by INDEX
# Example: group x by a factor and aggregate within each group
> fac <- factor(rep(1:3, length = 4), levels = 1:5)
> fac
[1] 1 2 3 1
Levels: 1 2 3 4 5
> tapply(1:4, fac, sum)
 1  2  3  4  5 
 5  2  3 NA NA 

 


# When INDEX is not a factor, as.factor() can be used to convert it into one (tapply also coerces it automatically)
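
A minimal sketch of that point, using made-up data: the grouping vector below is plain character, and the result is the same whether or not it is converted with as.factor() first. The names weight and group are hypothetical.

```r
# Hypothetical data: weights with a plain character grouping vector
weight <- c(10, 12, 20, 22, 30)
group  <- c("x", "x", "y", "y", "z")   # character, not a factor

by_chr <- tapply(weight, group, mean)             # coerced internally
by_fac <- tapply(weight, as.factor(group), mean)  # explicit conversion

print(by_chr)
print(all.equal(by_chr, by_fac))
```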

Additional example: reproducing an Excel PivotTable

# Using tapply to get something similar to a PivotTable in Excel:
> da
   year province sale
1  2007        A    1
2  2007        B    2
3  2007        C    3
4  2007        D    4
5  2008        A    5
6  2008        C    6
7  2008        D    7
8  2009        B    8
9  2009        C    9
10 2009        D   10
> attach(da)
> tapply(sale, list(year, province)) # with FUN omitted, returns the index of the year-by-province cell each sale falls into
 [1]  1  4  7 10  2  8 11  6  9 12
> tapply(sale, list(year, province), mean)
      A  B C  D
2007  1  2 3  4
2008  5 NA 6  7
2009 NA  8 9 10
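
The same pivot can be sketched without attach(), which some users prefer to avoid because it changes the search path. The code below rebuilds the small sales table and uses with() instead; using sum rather than mean, and adding year totals, are choices made for this example only.

```r
# Rebuild the small sales table used above
da <- data.frame(
  year     = c(2007, 2007, 2007, 2007, 2008, 2008, 2008, 2009, 2009, 2009),
  province = c("A", "B", "C", "D", "A", "C", "D", "B", "C", "D"),
  sale     = 1:10
)

# with() evaluates the expression inside da, so no attach() is needed
pivot <- with(da, tapply(sale, list(year, province), sum))
print(pivot)

# Year totals, ignoring the empty (NA) cells
year_totals <- rowSums(pivot, na.rm = TRUE)
print(year_totals)
```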


3. The table function (counts how often each factor level occurs)
Its usage is:
table(..., exclude = if (useNA == "no") c(NA, NaN), useNA = c("no",
"ifany", "always"), dnn = list.names(...), deparse.level = 1)
where exclude specifies which levels are left out of the count.
Sample code:
> d <- factor(rep(c("A", "B", "C"), 10), levels = c("A", "B", "C", "D", "E"))
> d
 [1] A B C A B C A B C A B C A B C A B C A B C A B C A B C A B C
Levels: A B C D E
> table(d, exclude = "B")
d
 A  C  D  E 
10 10  0  0 
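
The useNA argument in the signature above controls whether missing values are counted. A small sketch on a made-up vector (the name answers is hypothetical):

```r
# Hypothetical responses with one missing value
answers <- c("yes", "no", "yes", NA, "yes")

counts_default <- table(answers)                    # NA is dropped
counts_with_na <- table(answers, useNA = "ifany")   # NA gets its own cell

print(counts_default)
print(counts_with_na)
```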

 


4. The lapply and sapply functions
These apply the same function to every column of a data set, for example to find the median of each variable in X, a task that would otherwise need a loop over the variables.

lapply is used as follows:

lapply(X, FUN, ...)

lapply returns a list of the same length as X.

Each element of that list is the result of applying FUN to the corresponding element of X.

Here X is a list object (each element of the list being a vector);

objects of other types are converted to a list automatically by the R function as.list().

sapply is a special case of lapply with some parameter values preset. It is used as follows:

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

sapply(*, simplify = FALSE, USE.NAMES = FALSE) returns the same value as lapply(*).

If simplify = TRUE, sapply returns a vector or matrix instead of a list whenever the results can be simplified;

if simplify = FALSE, sapply still returns a list.

> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
> lapply(x, quantile)
$a
   0%   25%   50%   75%  100% 
 1.00  3.25  5.50  7.75 10.00 

$beta
         0%         25%         50%         75%        100% 
 0.04978707  0.25160736  1.00000000  5.05366896 20.08553692 

$logic
  0%  25%  50%  75% 100% 
 0.0  0.0  0.5  1.0  1.0 

> sapply(x, quantile, simplify = FALSE, USE.NAMES = FALSE)
$a
   0%   25%   50%   75%  100% 
 1.00  3.25  5.50  7.75 10.00 

$beta
         0%         25%         50%         75%        100% 
 0.04978707  0.25160736  1.00000000  5.05366896 20.08553692 

$logic
  0%  25%  50%  75% 100% 
 0.0  0.0  0.5  1.0  1.0 

# With simplify = TRUE (the default)
> sapply(x, quantile)
         a        beta logic
0%    1.00  0.04978707   0.0
25%   3.25  0.25160736   0.0
50%   5.50  1.00000000   0.5
75%   7.75  5.05366896   1.0
100% 10.00 20.08553692   1.0
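
The section opened with the case of computing a median for every variable. A minimal sketch of that case on a made-up data frame (scores and its columns are hypothetical), next to the explicit loop it replaces:

```r
# Hypothetical scores table; each column is one variable
scores <- data.frame(math = c(60, 70, 80), art = c(90, 85, 95))

# A data frame is a list of columns, so FUN runs once per column
med_list <- lapply(scores, median)   # list with elements math and art
med_vec  <- sapply(scores, median)   # named numeric vector

# Equivalent explicit loop, for comparison
med_loop <- numeric(ncol(scores))
for (j in seq_along(scores)) med_loop[j] <- median(scores[[j]])

print(med_vec)
print(all(med_vec == med_loop))
```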

5. The mapply function
mapply is a multivariate version of sapply: it applies FUN to the first elements of each argument, then to the second elements, the third elements, and so on. It is used as follows:

mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

MoreArgs is a list of other arguments to pass to FUN.


> mapply(rep, times = 1:4, x = 4:1)
[[1]]
[1] 4

[[2]]
[1] 3 3

[[3]]
[1] 2 2 2

[[4]]
[1] 1 1 1 1

# Calling rep directly gives:
> rep(1:4, 1:4)
 [1] 1 2 2 3 3 3 4 4 4 4
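
The MoreArgs parameter is not exercised in the transcript above. The sketch below, with made-up inputs, contrasts arguments that vary per call (passed through ...) with an argument that stays fixed for every call (passed through MoreArgs).

```r
# Element-wise: the function is called with ("a", 1), then ("b", 2), then ("c", 3)
labels <- mapply(function(prefix, n) paste0(prefix, n),
                 c("a", "b", "c"), 1:3)

# MoreArgs holds arguments that stay the same for every call
repeated <- mapply(rep, x = 1:3, MoreArgs = list(times = 2))

print(labels)
print(repeated)
```

Because every call in the second example returns a vector of the same length, SIMPLIFY = TRUE turns the result into a matrix with one column per call.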


6. vapply {base}: apply a function with a predefined return type

vapply is similar to sapply, but the type of its return value is declared in advance, so it is safer to use and sometimes faster.

vapply always simplifies its result, and it checks that every value returned by FUN is compatible with FUN.VALUE,

that is, that it has the same length and type. Types may be promoted to a higher type in the order logical < integer < double < complex, but not demoted.

vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)

X: a vector or an expression object; other objects are coerced to a list by as.list.

simplify (sapply only; vapply has no such argument): a logical value or character string saying whether the result should be simplified to a vector, matrix, or higher-dimensional array where possible.

It must be named in full and cannot be abbreviated. The default is TRUE, which returns a vector or matrix when appropriate. With simplify = "array", the result may be an array.

USE.NAMES: logical value; if TRUE and X is a character vector without names, X itself is used to name the result.

FUN.VALUE: a (generalized) vector that serves as a template for the value returned by FUN.


> x <- data.frame(a = rnorm(4, 4, 4), b = rnorm(4, 5, 3), c = rnorm(4, 5, 3))
> vapply(x, mean, c(c = 0))
         a          b          c 
 1.8329043  6.0442858 -0.1437202 
> k <- function(x)
+ {
+   list(mean(x), sd(x))
+ }
> vapply(x, k, c(c = 0))
Error in vapply(x, k, c(c = 0)) : values must be length 1,
 but FUN(X[[1]]) result is length 2
> vapply(x, k, c(c = 0, b = 0))
Error in vapply(x, k, c(c = 0, b = 0)) : values must be type 'double',
 but FUN(X[[1]]) result is type 'list'
> vapply(x, k, c(list(c = 0, b = 0)))
  a          b        c       
c -0.1437202 1.832904 6.044286
b 1.257834   1.940433 3.649194

The difference between sapply and vapply function:

> i39 <- sapply(3:9, seq)
> i39
[[1]]
[1] 1 2 3

[[2]]
[1] 1 2 3 4

[[3]]
[1] 1 2 3 4 5

[[4]]
[1] 1 2 3 4 5 6

[[5]]
[1] 1 2 3 4 5 6 7

[[6]]
[1] 1 2 3 4 5 6 7 8

[[7]]
[1] 1 2 3 4 5 6 7 8 9

> sapply(i39, fivenum)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]  1.0  1.0    1  1.0  1.0  1.0    1
[2,]  1.5  1.5    2  2.0  2.5  2.5    3
[3,]  2.0  2.5    3  3.5  4.0  4.5    5
[4,]  2.5  3.5    4  5.0  5.5  6.5    7
[5,]  3.0  4.0    5  6.0  7.0  8.0    9
> vapply(i39, fivenum,
+        c(Min. = 0, "1st Qu." = 0, Median = 0, "3rd Qu." = 0, Max. = 0))
        [,1] [,2] [,3] [,4] [,5] [,6] [,7]
Min.     1.0  1.0    1  1.0  1.0  1.0    1
1st Qu.  1.5  1.5    2  2.0  2.5  2.5    3
Median   2.0  2.5    3  3.5  4.0  4.5    5
3rd Qu.  2.5  3.5    4  5.0  5.5  6.5    7
Max.     3.0  4.0    5  6.0  7.0  8.0    9
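
Beyond row names, one practical difference between sapply and vapply shows up with empty input: sapply has nothing to simplify and returns a list, while vapply still returns a vector of the declared type. A minimal sketch, where n_chars is a hypothetical helper written for this example:

```r
# Hypothetical helper that should always yield one integer per element
n_chars <- function(s) nchar(s)

empty <- character(0)

res_s <- sapply(empty, n_chars)              # falls back to an empty list
res_v <- vapply(empty, n_chars, integer(1))  # stays an integer vector

print(class(res_s))
print(class(res_v))
```

Code that later does arithmetic on the result is therefore more predictable with vapply.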

 

 

7. eapply {base}

eapply applies FUN to the named values in an environment and returns the results as a list. The user can request that all named objects be used (normally, names that begin with a dot are skipped).

eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE)

env: the environment to be used

all.names: logical value indicating whether to apply the function to all values

USE.NAMES: logical value indicating whether the resulting list should have names


> require(stats)
>
> env <- new.env(hash = FALSE) # so the order is fixed
> env$a <- 1:10
> env$beta <- exp(-3:3)
> env$logic <- c(TRUE, FALSE, FALSE, TRUE)
> # what have we there?
> utils::ls.str(env)
a :  int [1:10] 1 2 3 4 5 6 7 8 9 10
beta :  num [1:7] 0.0498 0.1353 0.3679 1 2.7183 ...
logic :  logi [1:4] TRUE FALSE FALSE TRUE
>
> # compute the mean for each list element
> eapply(env, mean)
$logic
[1] 0.5
 
$beta
[1] 4.535125
 
$a
[1] 5.5
 
> unlist(eapply(env, mean, USE.NAMES = FALSE))
[1] 0.500000 4.535125 5.500000
>
> # median and quartiles for each element (making use of "..." passing):
> eapply(env, quantile, probs = 1:3/4)
$logic
25% 50% 75%
0.0 0.5 1.0
 
$beta
      25%       50%       75%
0.2516074 1.0000000 5.0536690
 
$a
 25%  50%  75%
3.25 5.50 7.75
 
> eapply(env, quantile)
$logic
  0%  25%  50%  75% 100%
 0.0  0.0  0.5  1.0  1.0
 
$beta
         0%         25%         50%         75%        100%
 0.04978707  0.25160736  1.00000000  5.05366896 20.08553692
 
$a
   0%   25%   50%   75%  100%
 1.00  3.25  5.50  7.75 10.00
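
The all.names argument is not shown in the transcript above. A small sketch of its effect, using a made-up environment that contains one dot-prefixed object (the name .hidden is hypothetical):

```r
env <- new.env()
env$a <- 1:10
env$.hidden <- c(2, 4)   # hypothetical dot-prefixed object

visible    <- eapply(env, mean)                    # skips .hidden
everything <- eapply(env, mean, all.names = TRUE)  # includes .hidden

print(names(visible))
print(length(everything))
```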


8. rapply {base}
rapply is a recursive version of lapply.

rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...)

object: a list

classes: a character vector of class names, or "ANY" to match any class

deflt: the default result (not used when how = "replace")

how: a character string partially matching one of the three possible values
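
The article gives no rapply example, so the following is only a sketch on a made-up nested list. It doubles the numeric leaves and shows how two of the how settings treat the leaf that does not match classes.

```r
# Hypothetical nested list mixing numbers and text
nested <- list(a = 1:3, b = list(c = 4.5, d = "text"))

# how = "unlist": apply the function to numeric leaves only, drop the rest, flatten
doubled <- rapply(nested, function(v) v * 2,
                  classes = c("integer", "numeric"), how = "unlist")

# how = "replace": keep the structure, leave non-matching leaves unchanged
replaced <- rapply(nested, function(v) v * 2,
                   classes = c("integer", "numeric"), how = "replace")

print(doubled)
str(replaced)
```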
----------------
Disclaimer: This is an original article by CSDN blogger "Wu-ethylhexyl", released under the CC 4.0 BY-SA license. When reprinting, please include a link to the original source and this statement.
Original link: https://blog.csdn.net/sinat_26917383/article/details/51086663

Origin www.cnblogs.com/triple-y/p/11411526.html