Jeroen :
I have a dataframe with a column that contains free text entries on years of education. From the free text entries I want to extract all of the numbers and sum them.
Example: data_en$education[1] is "6 primary school 10 highschool"
With the following code I can extract both numbers and sum them.
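library(stringr)
# extract every run of digits from the first entry, convert them to numbers, and sum them
x <- as.numeric(str_extract_all(data_en$education[1], "[0-9]+")[[1]])
x <- sum(x)  # 16 for "6 primary school 10 highschool"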
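The single-entry steps above can also be wrapped in a small function that takes one string and returns its total. A minimal sketch in base R, with regmatches() and gregexpr() standing in for str_extract_all(); the name sum_numbers is hypothetical:

```r
# Hypothetical helper: sums every whole number found in one string (base R only)
sum_numbers <- function(txt) {
  # all runs of digits in the string, e.g. c("6", "10")
  digits <- regmatches(txt, gregexpr("[0-9]+", txt))[[1]]
  # an entry without digits gives character(0); as.numeric() of that is numeric(0), and its sum is 0
  sum(as.numeric(digits))
}

sum_numbers("6 primary school 10 highschool")  # 16
sum_numbers("no school")                       # 0
```

Applied over the column, e.g. sapply(data_en$education, sum_numbers, USE.NAMES = FALSE), this gives one total per row.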
However, I would ideally like to do this for all free text entries (i.e. each row) and subsequently add the results to the dataframe per row (i.e. in a variable such as data_en$educationNum). I'm a bit stuck on how to proceed.
nurandi :
You can use sapply() to sum the extracted numbers for every row:
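library(stringr)
# for each entry, extract all digit runs and sum them (entries without digits sum to 0)
data_en$educationNum <- sapply(str_extract_all(data_en$education, "[0-9]+"),
                               function(i) sum(as.numeric(i)))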
data_en
#                        education educationNum
# 1 6 primary school 10 highschool           16
# 2 10 primary school 2 highschool           12
# 3                      no school            0
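A variant using vapply() instead of sapply(): with FUN.VALUE = numeric(1), R checks that every row yields exactly one number, and an empty input gives numeric(0) rather than list(). This sketch also swaps str_extract_all() for base R's regmatches()/gregexpr(), which likewise return one character vector of matches per row; the helper name sum_digits is hypothetical:

```r
data_en <- data.frame(education = c("6 primary school 10 highschool",
                                    "10 primary school 2 highschool",
                                    "no school"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: one total per entry of a character vector
sum_digits <- function(txt) {
  matches <- regmatches(txt, gregexpr("[0-9]+", txt))
  # FUN.VALUE = numeric(1): error unless each entry produces a single number
  vapply(matches, function(i) sum(as.numeric(i)), FUN.VALUE = numeric(1))
}

data_en$educationNum <- sum_digits(data_en$education)
data_en$educationNum  # 16 12 0

# On an empty list, sapply() returns list() while vapply() returns numeric(0)
sapply(list(), function(i) sum(as.numeric(i)))
vapply(list(), function(i) sum(as.numeric(i)), FUN.VALUE = numeric(1))
```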
Data
data_en <- data.frame(education = c("6 primary school 10 highschool",
"10 primary school 2 highschool",
"no school"))
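If some entries might contain decimals (an assumption; the data above has none), "[0-9]+" would read "2.5" as 2 and 5 and add them to 7. A sketch with an optional fractional part in the pattern, using made-up sample strings; the same pattern can also be passed to str_extract_all(), and sum_each is a hypothetical helper:

```r
# Made-up entries; the second one contains a decimal
txt <- c("6 primary school 10 highschool", "2.5 years college", "no school")

# "[0-9]+" alone splits "2.5" into "2" and "5"
whole <- regmatches(txt, gregexpr("[0-9]+", txt))
# the optional "(\\.[0-9]+)?" group keeps the fractional part attached
decimal <- regmatches(txt, gregexpr("[0-9]+(\\.[0-9]+)?", txt))

# Hypothetical helper: one numeric total per list element
sum_each <- function(lst) vapply(lst, function(i) sum(as.numeric(i)), numeric(1))
sum_each(whole)    # 16 7 0
sum_each(decimal)  # 16 2.5 0
```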