Extract numbers from free-text input, sum them, and add the result to a data frame

Jeroen :

I have a dataframe with a column that contains free text entries on years of education. From the free text entries I want to extract all of the numbers and sum them.

Example: data_en$education[1] gives "6 primary school 10 highschool"

With the following code I can extract both numbers and sum them.

library(stringr)
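# Extract every run of digits from the first entry and convert to numeric
x <- as.numeric(str_extract_all(data_en$education[1], "[0-9]+")[[1]])
# Sum the extracted numbers (here 6 + 10 = 16)
x <- sum(x)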

However, I would ideally like to do this for all free text entries (i.e. each row) and subsequently add the results to the dataframe per row (i.e. in a variable such as data_en$educationNum). I'm a bit stuck on how to proceed.

nurandi :

You can use sapply:

data_en$educationNum <- sapply(str_extract_all(data_en$education, "[0-9]+"), 
       function(i) sum(as.numeric(i)))

data_en
#                        education educationNum
# 1 6 primary school 10 highschool           16
# 2 10 primary school 2 highschool           12
# 3                      no school            0

Data

data_en <- data.frame(education = c("6 primary school 10 highschool",
                      "10 primary school 2 highschool",
                      "no school"))
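
As a possible variant, the same per-row extraction and sum can be sketched in base R without stringr. This is only an illustration, not part of the original answer: it rebuilds the example data above, and the intermediate name digits is made up for the sketch. It uses vapply instead of sapply so that exactly one numeric value per row is enforced.

```r
# Example data, mirroring the "Data" block above.
data_en <- data.frame(
  education = c("6 primary school 10 highschool",
                "10 primary school 2 highschool",
                "no school"),
  stringsAsFactors = FALSE
)

# gregexpr() locates every run of digits in each entry; regmatches()
# returns them as a list with one character vector per row.
# "digits" is a hypothetical helper name used only in this sketch.
digits <- regmatches(data_en$education,
                     gregexpr("[0-9]+", data_en$education))

# vapply() requires each result to be a single numeric value.
# Rows without digits yield an empty vector, and sum() of that is 0.
data_en$educationNum <- vapply(digits,
                               function(d) sum(as.numeric(d)),
                               numeric(1))
```

Note that the pattern "[0-9]+" only matches whole numbers, so an entry such as "2.5 years" would be read as 2 and 5 rather than 2.5.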
