Using Python and Data to Show Which Skills a Programmer Needs



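Programming is a good career: the work is grueling, but the prospects are strong. Which skills does it take to count as a competent programmer? In this chapter we let the data answer.
We use the job postings on 51job (前程无忧) as the data source, searching for the keyword Python with the region set to Guangzhou. Based on these conditions, we write the following crawler: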

import requests
import csv
from bs4 import BeautifulSoup

def get_info(page):
    # Search-results page for the keyword "python"; `page` is the page number as a string
    url = 'https://search.51job.com/list/030200,000000,0000,00,9,07%252C08,python,2,'+page+'.html?lang=c&stype=1&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=#top'
    headers = {
        'Host':'search.51job.com',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36'
    }
    r = requests.get(url, headers=headers)
    # The pages are decoded as GBK; bytes that fail to decode are skipped instead of raising
    soup = BeautifulSoup(r.content.decode('gbk', errors='ignore'),'html5lib')
    find_div = soup.find_all('div',class_='el')
    for i in find_div:
        find_href = i.find('a')
        # Only follow links that point to a job-detail page
        if 'https://jobs.51job.com' in str(find_href):
            job_url = find_href['href']
            r = requests.get(job_url, headers=headers)
            job_soup = BeautifulSoup(r.content.decode('gbk', errors='ignore'),'html5lib')
            find_job = job_soup.find('div', class_='bmsg job_msg inbox')
            if find_job:
                find_job = find_job.find_all('p')
                temp_list = []
                for k in find_job:
                    # Skip empty paragraphs and label lines containing a colon.
                    # The source repeated the same test twice; the second one is
                    # assumed here to have been the full-width colon ':'.
                    if ':' not in str(k) and ':' not in str(k) and k.getText():
                        if '、' in k.getText():
                            # Drop a leading "1、"-style number; split once so later '、' survive
                            text = k.getText().split('、', 1)[1].strip()
                        else:
                            text = k.getText().strip()
                        temp_list.append(text)
                if ''.join(temp_list).strip():
                    # Append one row per job posting
                    with open('text.csv','a',newline='',encoding='utf-8') as f:
                        writer = csv.writer(f)
                        writer.writerow([''.join(temp_list)])

if __name__ == '__main__':
    for i in range(28):
        page = str(i+1)
        get_info(page)
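
The paragraph filtering inside get_info can be tried out without touching the network. The following is a minimal sketch, not the author's code: the function name `clean_paragraphs` and the sample lines are hypothetical, it works on plain strings rather than BeautifulSoup tags, it assumes label lines use either an ASCII or a full-width colon, and it splits only once on '、'.

```python
def clean_paragraphs(paragraphs):
    """Keep requirement-like lines and strip a leading '1、'-style number."""
    kept = []
    for text in paragraphs:
        text = text.strip()
        # Drop empty lines and label lines such as "职能类别:软件工程师"
        if not text or ':' in text or ':' in text:
            continue
        # "1、熟悉Python" -> "熟悉Python"; split once so later '、' are kept
        if '、' in text:
            text = text.split('、', 1)[1].strip()
        kept.append(text)
    # Like the crawler, concatenate everything into one string per posting
    return ''.join(kept)


# Made-up sample paragraphs, for illustration only
sample = ['职能类别:软件工程师', '1、熟悉Python、Django', '  ', '2、了解MySQL']
print(clean_paragraphs(sample))
```

Keeping this logic in its own function makes it easy to adjust the filter when the page layout changes, without re-running the whole crawl.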

Running the code produces the file text.csv, whose contents look like this:
[Figure: screenshot of the text.csv contents]


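With the data in hand, the next step is to analyze it. First use jieba to segment and clean the text, then use the gensim module to compute a list of related words. Write the following code in analysis.py: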

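import csv
import re

import jieba
from gensim import models

# Clean and segment the data.
# Runs of punctuation (ASCII and full-width), digits and spaces collapse into one space.
pattern = re.compile(r'[()()::??“”;;.~/《》【】,,。!!·、\d ]+')
seg_list = []
with open('text.csv', encoding='utf-8') as csv_file:
    for row in csv.reader(csv_file):
        temp_list = jieba.cut(row[0], cut_all=False)
        seg_list.append(pattern.sub(' ', ' '.join(temp_list)))
# Write the segmented words to a file
with open('data.txt', 'w', encoding='utf-8') as f:
    f.write(' '.join(seg_list))

# Compute the list of related words with word2vec
sentences = models.word2vec.LineSentence('data.txt')
# gensim >= 4.0 names the dimension parameter vector_size; older releases call it size
model = models.word2vec.Word2Vec(sentences, vector_size=1000, window=25, min_count=5, workers=4)
# Raises KeyError if 'python' occurs fewer than min_count times in data.txt
sim = model.wv.most_similar('python', topn=50)
for s in sim:
    print("word:%s,similar:%s " %(s[0],s[1]))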
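
The cleaning step in analysis.py can be checked on a small input before running it over the whole file. This is a rough sketch under assumptions: the token list stands in for jieba's output (jieba itself is not called), the helper name `clean_tokens` is hypothetical, and the character class lists both ASCII and full-width punctuation.

```python
import re

# Punctuation (ASCII and full-width), digits and spaces, as in the cleaning step
PUNCT = re.compile(r'[()()::??“”;;.~/《》【】,,。!!·、\d ]+')


def clean_tokens(tokens):
    """Join tokens with spaces, then collapse punctuation/digit runs into one space."""
    return PUNCT.sub(' ', ' '.join(tokens)).strip()


# Hand-written tokens standing in for jieba.cut output (illustration only)
tokens = ['熟悉', 'Python', ',', '3', '年', '以上', 'Django', '开发', '经验', '。']
print(clean_tokens(tokens))
```

Because `\d` is part of the pattern, digits are removed as well, so a token such as "Python3" is reduced to "Python".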

Run analysis.py; the result is shown in the figures:
[Figures: two screenshots of the analysis.py output]
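The results suggest that a competent Python programmer should first master the two major frameworks Django and Scrapy, along with Selenium, which is used for automated testing. On the database side MySQL dominates, and fluency in SQL goes without saying. The memcached caching system, Linux operation, and the TCP protocol are also expected. Finally, it helps to have some exposure to other mainstream development languages such as Java, C, and Node.js.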
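
The ranking behind this conclusion rests on the similarity score that most_similar reports, which in gensim is the cosine similarity between word vectors. A minimal sketch of that measure follows; the three-dimensional vectors are made up for illustration and have nothing to do with the trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, not taken from any real model
vectors = {
    'python': [1.0, 2.0, 0.0],
    'django': [2.0, 4.0, 0.0],   # same direction as 'python'
    'excel': [0.0, 0.0, 1.0],    # orthogonal to 'python'
}

query = vectors['python']
ranked = sorted(
    ((word, cosine_similarity(query, vec))
     for word, vec in vectors.items() if word != 'python'),
    key=lambda pair: pair[1],
    reverse=True,
)
for word, score in ranked:
    print("word:%s,similar:%s " % (word, score))
```

A high score means two words appear in similar contexts in the job postings. It does not by itself show that employers require the skill, so the list is best read as a hint rather than a strict ranking.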


Reposted from blog.csdn.net/HuangZhang_123/article/details/80497951