Modeling and visualization analysis of big data major-related recruitment information based on recruitment website


In the big data era, the rapid accumulation of data has sharply increased the demand for big data talent, and recruitment platforms now carry a large volume of related job postings. Studying these postings in depth can help stakeholders understand industry dynamics and anticipate future developments. This article analyzes big data job postings on the 51job website and presents the results visually.

This study first uses a Python crawler to collect all big data-related job postings on the 51job website. It then cleans the data in Python, handling duplicates and outliers. Next, the data is analyzed with Python's statistical and sorting tools, and the results are presented with Python data visualization. Finally, machine learning is used to predict the average salary of big data-related jobs.

The visualization results show that the number of big data job postings on 51job continues to grow, which suggests that demand for big data professionals is growing as well. Big data jobs are concentrated in first-tier cities, and most come from private and listed companies, mainly in the computer software, real estate, and Internet industries. Salaries and benefits for these positions are generally generous. These findings give job seekers a reference when choosing a big data position.

With the prediction model, job seekers can estimate in advance the salary of a big data position they are considering; the model's reported accuracy is as high as 99%.

The crawl of big data-related job postings on the 51job website relies mainly on XPath expressions, together with the developer tools in Google Chrome.
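As a rough illustration of XPath-based extraction, the sketch below pulls a job title and company name out of a small, well-formed HTML fragment. It uses the limited XPath subset in Python's standard-library xml.etree.ElementTree. The tag names, class names, sample values, and the parse_jobs helper are all hypothetical and do not reflect 51job's actual page structure. Real pages are rarely well-formed XML, so a more tolerant HTML parser would likely be needed in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for one results page.
SAMPLE_HTML = """
<div class="results">
  <div class="job">
    <span class="title">Big Data Development Engineer</span>
    <span class="company">Example Co.</span>
  </div>
  <div class="job">
    <span class="title">Big Data Analyst</span>
    <span class="company">Sample Ltd.</span>
  </div>
</div>
"""


def parse_jobs(html):
    """Return a list of {job_name, company_name} dicts from the fragment."""
    root = ET.fromstring(html.strip())
    jobs = []
    # ElementTree supports an XPath subset, including attribute predicates.
    for node in root.findall(".//div[@class='job']"):
        jobs.append({
            "job_name": node.findtext("span[@class='title']"),
            "company_name": node.findtext("span[@class='company']"),
        })
    return jobs


if __name__ == "__main__":
    for job in parse_jobs(SAMPLE_HTML):
        print(job)
```

The output keys reuse the job_name and company_name field names from the crawled data so that each parsed record lines up with the stored columns.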

First, log in to the 51job website in Google Chrome and use the developer tools to inspect the cookies, User-Agent, and other identifying headers sent with the page. Then use the site's search and pagination functions to observe how the URL changes.

Figure 3-1 The developer tools page of Google Chrome
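Once the headers and the URL pattern are known, they can be reused when building requests. The following is a minimal sketch of that idea using only the standard library. The base URL, the keyword and page query parameters, the header values, and the build_page_request helper are placeholders chosen for illustration; the real values have to be read from the browser's developer tools and address bar. The sketch only constructs the requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values: copy the real ones from the browser's developer tools.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (placeholder)",
    "Cookie": "name=value",
}

# Hypothetical URL pattern, not the real 51job search URL.
BASE_URL = "https://example.com/search"


def build_page_request(keyword, page):
    """Build (but do not send) a request for one page of search results."""
    query = urlencode({"keyword": keyword, "page": page})
    return Request(f"{BASE_URL}?{query}", headers=HEADERS)


if __name__ == "__main__":
    # Show how the URL changes when turning pages.
    for page in range(1, 4):
        print(build_page_request("big data", page).full_url)
```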

 

After the crawling steps above, a total of 54,950 records were collected, covering twelve fields of job information related to big data majors. The fields and their meanings are listed in the following table.

Figure 3-2 Crawling data running results

Table 3-1 Explanation of the meaning of the fields

Field            Meaning
job_name         Job title
company_name     Company name
low_salary       Minimum salary (10,000 yuan/month)
High_salary      Maximum salary (10,000 yuan/month)
yaoqiu           Requirements
job_place        Work location
company_nature   Company type
job_content      Job description
company_content  Company details
release          Release time
job_class        Job category
fuli             Company benefits

The first step is to check for duplicate records with the Counter class from the collections library, counting how often each posting's URL occurs. The results show that every URL appears exactly once, so each record is unique and no deduplication is needed.
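The duplicate check described above might look roughly like the sketch below. The find_duplicates helper and the sample URLs are made up for illustration; only the use of collections.Counter comes from the text.

```python
from collections import Counter


def find_duplicates(urls):
    """Return {url: count} for every URL that appears more than once."""
    counts = Counter(urls)
    return {url: n for url, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Made-up URLs standing in for the crawled posting URLs.
    urls = [
        "https://example.com/job/1",
        "https://example.com/job/2",
        "https://example.com/job/3",
    ]
    duplicates = find_duplicates(urls)
    # An empty dict means every posting URL is unique.
    print(duplicates or "no duplicate URLs")
```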

Figure 3-3 Statistical results of repeated values

Figure 3-4 Comparison of the highest salary outlier before and after processing

Figure 3-5 Comparison of the minimum salary outlier before and after processing

Figure 3-6 Box plot of maximum salary and minimum salary
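The article does not state which rule was used to treat the salary outliers. Since box plots are used to inspect them, one plausible approach is the interquartile-range (IQR) rule that box plots conventionally draw. The sketch below assumes that rule with the usual factor of 1.5; the helper names and sample values are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


if __name__ == "__main__":
    # Made-up salaries in units of 10,000 per month; 50.0 is the planted outlier.
    salaries = [1.0, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 50.0]
    print(iqr_bounds(salaries))
    print(remove_outliers(salaries))
```

Dropping the flagged rows is only one option; capping them at the fences or replacing them with the median are common alternatives.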

 

Figure 3-7 The running results of the top ten categories by proportion

Figure 3-8 The running results of the top ten categories with the highest average salary

Figure 3-9 The running results of the top ten categories of average minimum wage

The job information shows that positions are spread across different cities, each at a different level of development.

Figure 3-10 Operation results of the top ten cities with major distribution of big data-related jobs

Next, the number of big data-related positions is analyzed by company type. The positions for each company type are first counted in a list, and the ten company types with the most positions are then obtained by sorting.

Figure 3-11 Running results of the top ten companies by nature
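A count-and-sort step of this kind can be sketched as follows. The top_categories helper and the sample records are hypothetical, and the percentage share is added here only because shares are what the charts usually display. Counter.most_common is used in place of a manual sort.

```python
from collections import Counter


def top_categories(records, field, n=10):
    """Return up to n (value, count, share_percent) tuples, largest first."""
    # Skip records where the field is missing or empty.
    counts = Counter(r[field] for r in records if r.get(field))
    total = sum(counts.values())
    return [
        (value, count, round(100 * count / total, 2))
        for value, count in counts.most_common(n)
    ]


if __name__ == "__main__":
    # Made-up records for illustration; not the crawled data.
    sample = [
        {"company_nature": "private"},
        {"company_nature": "private"},
        {"company_nature": "listed"},
        {"company_nature": "state-owned"},
    ]
    for row in top_categories(sample, "company_nature"):
        print(row)
```

The same helper could be pointed at other fields such as job_place or job_class to produce the other top-ten rankings.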

Big data-related jobs also span many different positions.

Figure 3-12 The running results of the top ten positions

Job seekers will naturally want to know what the ten positions in greatest demand pay, so the average maximum salary and the average minimum salary of these ten positions are analyzed.

Figure 3-13 The running results of the highest and lowest salaries corresponding to the top ten positions
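The per-position averages can be computed with a simple grouped accumulation. The sketch below assumes each record is a dict with numeric low_salary and High_salary values, using the field names from Table 3-1; the average_salaries helper and the sample records are hypothetical.

```python
from collections import defaultdict


def average_salaries(records, positions):
    """Return {position: (avg_low, avg_high)} for the requested positions."""
    # Each accumulator holds [sum of low, sum of high, record count].
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    wanted = set(positions)
    for r in records:
        name = r["job_name"]
        if name in wanted:
            acc = sums[name]
            acc[0] += r["low_salary"]
            acc[1] += r["High_salary"]
            acc[2] += 1
    return {name: (low / n, high / n) for name, (low, high, n) in sums.items()}


if __name__ == "__main__":
    # Made-up records; salaries are in units of 10,000 per month.
    sample = [
        {"job_name": "Big Data Development Engineer", "low_salary": 1.0, "High_salary": 2.0},
        {"job_name": "Big Data Development Engineer", "low_salary": 2.0, "High_salary": 4.0},
        {"job_name": "Big Data Analyst", "low_salary": 0.8, "High_salary": 1.2},
    ]
    print(average_salaries(sample, ["Big Data Development Engineer"]))
```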

To verify the actual trend of big data-related jobs and the growth of demand for them, the number of postings released each day is analyzed by counting postings by release date.

Figure 3-14 Running results of the number of daily releases
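Counting postings by date could be done along these lines. The sketch assumes the release field holds ISO-formatted date strings (YYYY-MM-DD), which the article does not confirm; the posts_per_day helper and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def posts_per_day(release_dates):
    """Return a date-sorted list of (date, number_of_posts) pairs."""
    counts = Counter(date.fromisoformat(d) for d in release_dates)
    return sorted(counts.items())


if __name__ == "__main__":
    # Made-up release dates for illustration.
    sample = ["2023-06-02", "2023-06-01", "2023-06-02"]
    for day, n in posts_per_day(sample):
        print(day.isoformat(), n)
```

Parsing the strings into date objects before sorting keeps the series in chronological order, which is what a line chart of daily postings needs.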

 

Based on the counts per job category, a word cloud is drawn for all job categories. It shows that categories such as computer software, Internet, e-commerce, and computer services have relatively high demand for big data-related jobs, so job seekers may want to look at postings in these categories first.

Figure 3-15 Word cloud display of job categories

Figure 3-16 The scatter heat map display of the highest salary and the lowest salary of the job category

Figure 3-17 Box plot display of the average salary of each category

Based on the statistics for the top ten job categories, the categories are presented in a ring (donut) chart. The chart shows that computer software, real estate, and Internet/e-commerce account for relatively large shares. Job seekers who want more opportunities can give priority to these three categories.

Figure 3-18 The circular pie chart display of the top ten job categories

Based on the ten job categories with the highest average minimum salary and the ten with the highest average maximum salary, the categories are presented in bar charts. The charts show that the average minimum salary of the top ten categories is above 14,000 per month, and the average maximum salary of the top ten categories is above 22,000 per month.

Figure 3-19 The bar chart display of the top ten minimum salary job categories

Figure 3-20 The bar chart display of the top ten job categories with the highest salary

Based on the number of big data-related jobs in each city, the geographic distribution is presented as a heat map of China. The map shows that Shanghai has the greatest demand for big data-related jobs, followed by Guangdong; Beijing, Jiangsu, and Zhejiang form the third tier, and Sichuan and Hubei the fourth. Beijing, Shanghai, and Guangzhou, as first-tier cities, therefore show the most pressing demand for big data-related jobs.

Figure 3-21 The heat map of the China map showing the number of big data jobs in each city

Based on the number of big data-related positions for the top ten company types, company type is presented in a bar chart. Private companies account for the largest share, 66.15%, followed by listed companies and state-owned enterprises.

Figure 3-22 The bar chart showing the nature of the top ten companies by the number of jobs

Based on the counts per job title, the number of postings is displayed in a bar chart and a ring chart. Both show that demand for big data development engineers is the strongest, at 48%, followed by big data analysis engineers at 15%.

Figure 3-23 The bar chart display of the top ten posts by number of posts

Figure 3-24 The circular fan chart displaying the proportion of the top ten posts by number of releases

Based on the average maximum and minimum salaries of the top ten positions, salary is presented in a two-line chart. The figure shows that salaries across these positions are fairly even and stable.

Figure 3-25 Two-line chart showing the highest and lowest salaries of the top ten positions

The number of jobs posted each day is presented in a line chart.

Figure 3-26 Line chart display of the number of posts released daily

The benefits field (fuli) in the data is presented as a word cloud.

Figure 3-27 Word cloud display of job benefits
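A word cloud is typically drawn from term frequencies, so the counting step might look like the sketch below. It assumes that each fuli value is a single string with benefits separated by a delimiter, which the article does not specify; the comma separator, the benefit_frequencies helper, and the sample values are hypothetical.

```python
from collections import Counter


def benefit_frequencies(values, sep=","):
    """Count how often each benefit term occurs across all postings."""
    counts = Counter()
    for value in values:
        if not value:
            continue  # skip postings with no benefits listed
        counts.update(term.strip() for term in value.split(sep) if term.strip())
    return counts


if __name__ == "__main__":
    # Made-up benefit strings for illustration.
    sample = [
        "five insurances, annual bonus",
        "annual bonus, paid leave",
        "",
    ]
    print(benefit_frequencies(sample).most_common(3))
```

A frequency mapping like this is the kind of input that the wordcloud package's WordCloud.generate_from_frequencies method accepts, though the article does not say which plotting library was used.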

 

Figure 3-28 Correlation heat map display of features

Figure 3-29 Running Results of Random Forest Model Prediction Accuracy

Figure 3-30 xgboost prediction accuracy running results

Figure 3-31 Average salary prediction results of random forest

After training the model on the data, I used plot_learning_curve() to draw its learning curve. The curve shows the model's accuracy rising steadily over the course of training.

Figure 3-32 Line chart display of model learning curve
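The article does not show where plot_learning_curve() comes from, so the sketch below computes the underlying curve data with scikit-learn's learning_curve function instead of reproducing that plot. It is an illustration under several assumptions: the data are synthetic rather than the crawled salaries, the model is a small RandomForestRegressor, and the score is R², since the article does not define "accuracy" for a regression target. The helper names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve


def make_synthetic_data(n_samples=200, seed=0):
    """Generate toy features and a noisy linear target (not real salary data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 4))
    y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=n_samples)
    return X, y


def compute_learning_curve(X, y, seed=0):
    """Return training sizes with mean training and validation R² scores."""
    model = RandomForestRegressor(n_estimators=20, random_state=seed)
    sizes, train_scores, val_scores = learning_curve(
        model,
        X,
        y,
        cv=3,
        train_sizes=np.linspace(0.2, 1.0, 5),
        scoring="r2",
    )
    # Average the per-fold scores at each training size.
    return sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    sizes, train_mean, val_mean = compute_learning_curve(X, y)
    for n, tr, va in zip(sizes, train_mean, val_mean):
        print(f"{n:4d} samples  train R2={tr:.3f}  validation R2={va:.3f}")
```

Comparing the training and validation curves is also a useful check on a very high reported score: a large gap between the two would point to overfitting rather than genuine predictive power.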


 


Origin blog.csdn.net/weixin_47723732/article/details/131551563