How far does self-taught Python need to go to land a job? 1300+ job postings tell you the answer

With the growth of the mobile Internet and the rise of hot fields such as machine learning, more and more people are discovering and learning Python. Whether or not you come from a computer science background, Python is an excellent first language for entering the programming world. Its syntax is concise and the programs it produces are easy to read, which reflects Python's long-standing philosophy of being "simple and elegant": express your ideas in as little code as possible while keeping that code readable.

So how much Python do we need to learn before we can start looking for a job? As everyone knows, practice is the only criterion for testing truth, so the answer depends on market demand. After all, companies hire you to work, not to study on their payroll.

So today, let's crawl the Python job postings on Lagou.com and see what kind of talent the market actually needs.

1. Web page structure analysis

Open the Lagou.com homepage, enter the keyword "Python", then press F12 to open the browser's developer tools. Switch to the "Network" tab, filter by "XHR", click Search, and watch the network requests carefully.

From these requests, we can roughly guess that the data is fetched from the jobs/positionAjax.json interface.

[Screenshot: network requests showing the positionAjax.json interface]
Let's verify: clear the network request log and try turning the page. When we click to page 2, the following request is recorded.
[Screenshot: the request recorded when loading page 2]
As you can see, the data is fetched via a POST request, and pn in the Form Data is the current page number. With the page analysis done, we can write a crawler to pull the data. A first attempt might look something like this.

import requests

url = 'https://www.lagou.com/jobs/positionAjax.json?px=new&needAddtionalResult=false'
headers = """
accept: application/json, text/javascript, */*; q=0.01
origin: https://www.lagou.com
referer: https://www.lagou.com/jobs/list_python?px=new&city=%E5%85%A8%E5%9B%BD
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
"""

def headers_to_dict(header_text):
    # Turn the raw header block copied from the browser into a dict
    return dict(line.split(': ', 1) for line in header_text.strip().splitlines())

def write_file(text):
    # Append each raw JSON response as one line, for parsing later
    with open('data.txt', 'a') as f:
        f.write(text + '\n')

headers_dict = headers_to_dict(headers)

def get_data_from_cloud(page):
    params = {
        'first': 'false',
        'pn': page,
        'kd': 'python'
    }
    response = requests.post(url, data=params, headers=headers_dict, timeout=3)
    write_file(response.text)

for i in range(76):
    get_data_from_cloud(i + 1)

After writing the program, you hit the run button with excitement and trembling hands, eagerly waiting for the data. But what comes back is likely to look like this.

{"success":true,"msg":null,"code":0,"content":{"showId":"8302f64","hrInfoMap":{"6851017":{"userId":621208...
{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"xxx.yyy.zzz.aaa","state":2402}
...

Don't doubt it — this is exactly what I got. Lagou.com has an anti-crawler mechanism: the msg above means "you are operating too frequently, please visit again later". The fix is to avoid crawling too aggressively — pause for a few seconds between every two requests — and to send cookie information with each request. The complete crawler program is as follows:

import time
import requests

home_url = 'https://www.lagou.com/jobs/list_python?px=new&city=%E5%85%A8%E5%9B%BD'
url = 'https://www.lagou.com/jobs/positionAjax.json?px=new&needAddtionalResult=false'
headers = """
accept: application/json, text/javascript, */*; q=0.01
origin: https://www.lagou.com
referer: https://www.lagou.com/jobs/list_python?px=new&city=%E5%85%A8%E5%9B%BD
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
"""

headers_dict = headers_to_dict(headers)

def get_data_from_cloud(page):
    params = {
        'first': 'false',
        'pn': page,
        'kd': 'python'
    }
    s = requests.Session()  # create a session object
    s.get(home_url, headers=headers_dict, timeout=3)  # request the list page first to obtain cookies
    cookie = s.cookies
    response = requests.post(url, data=params, headers=headers_dict, cookies=cookie, timeout=3)
    write_file(response.text)

def get_data():
    for i in range(76):
        get_data_from_cloud(i + 1)
        time.sleep(5)  # pause between requests to avoid the anti-crawler block

get_data()

This time, not surprisingly, all the data comes through — 1131 postings in total.
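As a quick sanity check on the crawl, we can count how many postings actually landed in the file. This is a small sketch that assumes, as above, that each line of data.txt is one raw JSON response from the positionAjax.json interface:

```python
import json

def count_positions(path='data.txt'):
    # Each line is one raw JSON response; sum the postings across all pages
    total = 0
    with open(path) as f:
        for line in f:
            result = json.loads(line)
            total += len(result['content']['positionResult']['result'])
    return total
```

If the crawl succeeded, this should report the same total as the number of postings analyzed below.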

2. Data cleaning

Above, we stored the raw JSON responses in the data.txt file, which is inconvenient for the analysis that follows. Since we are going to analyze the data with pandas, we first need to reshape it.

The process is not difficult, but a bit cumbersome. The specific process is as follows:

import json
import pandas as pd

def get_data_from_file():
    data = []
    with open('data.txt') as f:
        for line in f:
            result = json.loads(line)
            result_list = result['content']['positionResult']['result']
            for item in result_list:
                record = {
                    'city': item['city'],
                    'industryField': item['industryField'],
                    'education': item['education'],
                    'workYear': item['workYear'],
                    'salary': item['salary'],
                    'firstType': item['firstType'],
                    'secondType': item['secondType'],
                    'thirdType': item['thirdType'],
                    # these two fields are lists, so join them into strings
                    'skillLables': ','.join(item['skillLables']),
                    'companyLabelList': ','.join(item['companyLabelList'])
                }
                data.append(record)
    return data

data = pd.DataFrame(get_data_from_file())
data.head(15)

[Table: first 15 rows of the cleaned DataFrame]

3. Data analysis

Obtaining and cleaning the data are only the means, not the goal. The real goal is to mine recruiters' needs from the postings, and to use those needs as a target for continuously improving our own skill map.

City

Let's see which cities have the greatest recruitment demand. Here we take only the top 15 cities.

from pyecharts import options as opts
from pyecharts.charts import Bar, Pie

top = 15
citys_value_counts = data['city'].value_counts()
citys = list(citys_value_counts.head(top).index)
city_counts = list(citys_value_counts.head(top))

bar = (
    Bar()
    .add_xaxis(citys)
    .add_yaxis("", city_counts)
)
bar.render_notebook()

[Bar chart: job postings by city (top 15)]

pie = (
    Pie()
    .add("", [list(z) for z in zip(citys, city_counts)])
    .set_global_opts(
        title_opts=opts.TitleOpts(title=""),
        legend_opts=opts.LegendOpts(is_show=False),
    )
)
pie.render_notebook()

[Pie chart: share of job postings by city]
As the chart shows, Beijing alone accounts for more than a quarter of the postings, followed by Shanghai, Shenzhen, and Hangzhou. By demand alone, Hangzhou has replaced Guangzhou among the top four cities.

This also partly explains why so many of us go to first-tier cities to develop our careers.
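If you would rather read an exact share than eyeball the pie chart, value_counts(normalize=True) gives the proportion per city directly. A minimal sketch, using a made-up sample in place of the crawled DataFrame:

```python
import pandas as pd

# Toy stand-in for the crawled data: 2 of the 4 postings are in Beijing
data = pd.DataFrame({'city': ['北京', '北京', '上海', '深圳']})

# normalize=True turns raw counts into fractions of the total
city_share = data['city'].value_counts(normalize=True)
print(city_share['北京'])  # → 0.5
```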

Education

education_value_counts = data['education'].value_counts()

education = list(education_value_counts.index)
education_counts = list(education_value_counts)

pie = (
    Pie()
    .add("", [list(z) for z in zip(education, education_counts)])
    .set_global_opts(
        title_opts=opts.TitleOpts(title=""),
        legend_opts=opts.LegendOpts(is_show=False),
    )
)
pie.render_notebook()

[Pie chart: postings by education requirement]
It seems that most companies require at least a bachelor's degree. It has to be said that nowadays a bachelor's degree has essentially become the minimum requirement for job hunting (unless your abilities are exceptionally strong).

Work Experience

work_year_value_counts = data['workYear'].value_counts()
work_year = list(work_year_value_counts.index)
work_year_counts = list(work_year_value_counts)

bar = (
    Bar()
    .add_xaxis(work_year)
    .add_yaxis("", work_year_counts)
)
bar.render_notebook()

[Bar chart: postings by required years of experience]
Mid-level engineers with 3-5 years of experience are in the greatest demand, followed by junior engineers with 1-3 years.

This actually matches market dynamics: senior engineers change jobs far less often than junior and mid-level engineers, and a company needs far fewer of them.

However, very few postings accept less than one year of experience — no wonder so many self-taught learners say it is hard to break into the industry!

Industry

Let's see which industries these recruiters belong to. Because the industry field is not very regular, each record first needs to be split on commas.

industrys = list(data['industryField'])
industry_list = [i for item in industrys for i in item.split(',')]

industry_series = pd.Series(data=industry_list)
industry_value_counts = industry_series.value_counts()

industrys = list(industry_value_counts.head(top).index)
industry_counts = list(industry_value_counts.head(top))

pie = (
    Pie()
    .add("", [list(z) for z in zip(industrys, industry_counts)])
    .set_global_opts(
        title_opts=opts.TitleOpts(title=""),
        legend_opts=opts.LegendOpts(is_show=False),
    )
)
pie.render_notebook()

[Pie chart: postings by industry (top 15)]
The mobile Internet industry accounts for more than a quarter of the demand, which matches the overall environment as we know it.

Skill Requirements

Take a look at the word cloud of skills required by recruiters.

import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud

word_data = data['skillLables'].str.split(',').apply(pd.Series)
word_data = word_data.replace(np.nan, '')
text = word_data.to_string(header=False, index=False)

# font_path must point to a local font that supports Chinese
# (this path is macOS-specific)
wc = WordCloud(font_path='/System/Library/Fonts/PingFang.ttc', background_color='white',
               scale=2.5, contour_color='lightblue').generate(text)

plt.figure(figsize=(16, 9))
plt.imshow(wc)
plt.axis('off')
plt.show()

[Word cloud: required skills]
Apart from Python itself, backend, MySQL, crawler, full stack, and algorithm appear most frequently.
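A word cloud is great for a quick look, but exact counts are sometimes more useful. Since each skillLables cell is a comma-joined string (from the cleaning step above), a Counter can tally the tags directly — a sketch with toy data in place of the real column:

```python
from collections import Counter
import pandas as pd

# Toy stand-in for data['skillLables'] after the cleaning step
skills = pd.Series(['Python,MySQL', 'Python,爬虫', 'Python'])

# Split each row on commas and count every non-empty tag
tags = Counter(tag for row in skills for tag in row.split(',') if tag)
print(tags['Python'])  # → 3
```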

Salary

Next, let's take a look at the salary conditions given by major companies.

salary_value_counts = data['salary'].value_counts()
top = 15
salary = list(salary_value_counts.head(top).index)
salary_counts = list(salary_value_counts.head(top))

bar = (
    Bar()
    .add_xaxis(salary)
    .add_yaxis("", salary_counts)
    .set_global_opts(xaxis_opts=opts.AxisOpts(name='Salary', name_rotate=0,
                                              axislabel_opts={"rotate": 45}))
)
bar.render_notebook()

[Bar chart: postings by salary range (top 15)]
The salaries on offer are quite impressive — mostly between 20K and 35K. As long as your skills are solid, it is not hard to find a job with a satisfying salary.
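The salary field comes back as a range string like '20k-35k', so to compute an average offer we would first have to parse it into numbers. A hedged sketch — the parsing rule here is my own assumption about the format, not something Lagou documents:

```python
import re
import pandas as pd

def salary_midpoint(s):
    # '20k-35k' → [20, 35] → 27.5; returns None if nothing matches
    nums = [int(n) for n in re.findall(r'(\d+)k', s.lower())]
    return sum(nums) / len(nums) if nums else None

salaries = pd.Series(['20k-35k', '15k-30k', '10k-20k'])  # toy sample
midpoints = salaries.map(salary_midpoint)
print(midpoints.mean())  # average midpoint across the sample, in K RMB
```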

Welfare

Finally, let's take a look at the additional benefits offered by the company.

word_data = data['companyLabelList'].str.split(',').apply(pd.Series)
word_data = word_data.replace(np.nan, '')
text = word_data.to_string(header=False, index=False)

wc = WordCloud(font_path='/System/Library/Fonts/PingFang.ttc', background_color='white',
               scale=2.5, contour_color='lightblue').generate(text)

plt.figure(figsize=(16, 9))
plt.imshow(wc)
plt.axis('off')
plt.show()

[Word cloud: company benefits]
Year-end double pay, performance bonuses, flat management — all familiar perks. Flat management in particular is characteristic of Internet companies; in state-owned or traditional enterprises, the sense of hierarchy between superiors and subordinates is much stronger.

4. Summary

Today we crawled 1300+ Python job postings from Lagou.com. After analyzing this batch of data, we reached the following conclusions:

As for education, you had better have at least a bachelor's degree. The market's demand is largest for engineers with 1-5 years of experience. The cities with the most postings are Beijing, Shanghai, Shenzhen, and Hangzhou. The mobile Internet remains the industry with the greatest demand, and most companies offer a good salary package.

For students who want to enter the industry through self-study: don't learn only Python — also study algorithms, databases, and other related knowledge. It is the same with any language: as long as you keep learning, there will always be a chance to get in!

The road you are walking now may be the most difficult road in your life!

Origin blog.csdn.net/Z987421/article/details/131226812