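As someone set on breaking into data analysis, I was curious about the current state of data analyst jobs, so I decided to scrape the data analysis postings for Shanghai from Boss直聘 (Boss Zhipin).
Here is the source code: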
library(xml2)
library(rvest)
library(stringr)
library(dplyr)

# Accumulates the postings from every result page
job_inf <- data.frame()

for (i in 1:10) {
  # Result page i for the query "数据分析" (data analysis) in Shanghai
  webpage <- read_html(
    str_c("https://www.zhipin.com/c101020100/h_101020100/?query=%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90&page=", i, "&ka=page-", i),
    encoding = "UTF-8"
  )

  # Job titles
  job_title_html <- html_nodes(webpage, ".job-title")
  job_title <- html_text(job_title_html)

  # Salary ranges
  salary_html <- html_nodes(webpage, ".red")
  salary <- html_text(salary_html)

  # Company details: gsub() coerces the nodes to HTML strings, then the tags are stripped.
  # Each <em class="vline"></em> divider becomes "|" so the fields stay separable.
  company_basic_html <- html_nodes(webpage, ".company-text p")
  company_basic <- gsub("<p>", "", company_basic_html)
  company_basic <- gsub("<em class=\"vline\"></em>", "|", company_basic)
  company_basic <- gsub("</p>", "", company_basic)

  # Job requirements, cleaned the same way, with spaces removed as well
  job_needs_html <- html_nodes(webpage, ".info-primary p")
  job_needs <- gsub("<p>", "", job_needs_html)
  job_needs <- gsub("<em class=\"vline\"></em>", "|", job_needs)
  job_needs <- gsub("</p>", "", job_needs)
  job_needs <- str_replace_all(job_needs, " ", "")

  # One row per posting on this page, appended to the running result
  job <- data.frame(job_title, salary, company_basic, job_needs)
  job_inf <- rbind(job_inf, job)
}

write.csv(job_inf, file = "bossdata.csv")
If anything in the code above is unclear, see the explanations of the code in my previous post.
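The tag-stripping idea used inside the loop can be tried on its own, without hitting the website. The sketch below is only an illustration: the HTML fragment is made up to resemble the `<p>` nodes the scraper selects, `strip_tags` is a hypothetical helper name, and replacing each `<em class="vline"></em>` divider with `|` is an assumption of this example, chosen so the fields can be split apart afterwards.

```r
# Hypothetical fragment, shaped like one of the <p> nodes the scraper selects
sample_html <- "<p>上海 <em class=\"vline\"></em>1-3年<em class=\"vline\"></em>本科</p>"

# Hypothetical helper: swap each divider tag for a separator, drop <p>/</p>, remove spaces
strip_tags <- function(x, sep = "|") {
  x <- gsub("<em class=\"vline\"></em>", sep, x, fixed = TRUE)
  x <- gsub("</?p>", "", x)
  gsub(" ", "", x, fixed = TRUE)
}

cleaned <- strip_tags(sample_html)

# Split the cleaned string into its individual fields
fields <- strsplit(cleaned, "|", fixed = TRUE)[[1]]
print(cleaned)
print(fields)
```

Keeping a separator in place of the divider means the combined column can later be split into separate columns, whether in R or with Excel's text-to-columns feature.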
The raw result looks like this:
After tidying it up in Excel, it looks like this:
I then visualized it in Tableau, as shown here:
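The Excel tidying itself is not described in detail. For readers who would rather stay in R, here is a rough sketch of one typical tidying task: turning a salary range into numeric columns. The sample values and the "15k-25k" format are assumptions made for this example, and the real `salary` column may look different.

```r
# Hypothetical salary strings; the scraped column may use another format
salary <- c("15k-25k", "8k-12k", "20k-40k")

# Drop the "k" suffix, then split each range on "-"
bounds <- strsplit(gsub("k", "", salary, ignore.case = TRUE), "-", fixed = TRUE)
salary_min <- as.numeric(sapply(bounds, `[`, 1))
salary_max <- as.numeric(sapply(bounds, `[`, 2))

# Keep the original text next to the numeric bounds and their midpoint
tidy <- data.frame(
  salary,
  salary_min,
  salary_max,
  salary_mid = (salary_min + salary_max) / 2
)
print(tidy)
```

Numeric bounds like these are easier to chart in Tableau than the raw text.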
That is all for this post. If I have missed anything, I would welcome your comments~
Link: source code, password: 13ye
Link: Tableau file, password: xz0j