1, Crawl book data from Douban with a web crawler, connect to a MongoDB database, and store the data in MongoDB. The code is as follows:
import re

import pymongo
import requests
from bs4 import BeautifulSoup

# Connect to the local MongoDB server, then create the database douban_data and the collection test.
# This must happen before crawling, because the loop below writes to datatable.
myclient = pymongo.MongoClient('mongodb://localhost:27017')
db = myclient['douban_data']
datatable = db['test']

# Assumption: Douban tends to reject the default requests User-Agent, so a browser-like one is sent
headers = {'User-Agent': 'Mozilla/5.0'}

# URLs of the first 7 listing pages of the 哲学 (philosophy) tag; each page lists 20 books
urlist = []
for i in range(7):
    urlist.append('https://book.douban.com/tag/哲学?start=' + str(20 * i) + '&type=T')

n = 0
for u in urlist:
    # Fetch the page with requests and parse it with BeautifulSoup
    r = requests.get(url=u, headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')
    print(soup.find('div', id='content').h1.text)  # page heading
    lis = soup.find('ul', class_='subject-list').find_all('li')
    for li in lis:
        dic = {}  # empty dictionary that holds one book's data
        dic['title'] = li.h2.text.replace(' ', '').replace('\n', '')
        dic['other_info'] = li.find('div', class_='pub').text.replace(' ', '').replace('\n', '')
        rating = li.find('span', class_='rating_nums')
        dic['rating'] = rating.text if rating else ''  # some books have no rating yet
        pl_text = li.find('span', class_='pl').text.replace(' ', '').replace('\n', '')
        dic['num_ratings'] = re.search(r'(\d*)人', pl_text).group(1)  # digits before 人 in "(1234人评价)"
        datatable.insert_one(dic)
        n += 1
        print('Record %i stored successfully' % n)  # report each stored record
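The loop above relies on two small pieces of logic: paging through the listing (`start` goes up by 20 per page) and pulling the rating count out of text such as `(1234人评价)`. Both can be written as pure functions and checked without the network or MongoDB. This is a minimal sketch that assumes the listing URL pattern used above. The helper names `page_urls` and `rating_count` are hypothetical.

```python
import re
from urllib.parse import quote

BASE = 'https://book.douban.com/tag/'  # listing URL prefix used by the crawler above


def page_urls(tag, pages, per_page=20):
    """Return one listing URL per page, with start = 0, per_page, 2*per_page, ..."""
    encoded = quote(tag)  # percent-encode a non-ASCII tag such as 哲学 as UTF-8
    return [f'{BASE}{encoded}?start={per_page * i}&type=T' for i in range(pages)]


def rating_count(text):
    """Return the integer N from text like '(1234人评价)', or None when no digits precede 人."""
    cleaned = text.replace(' ', '').replace('\n', '')
    m = re.search(r'(\d+)人', cleaned)
    return int(m.group(1)) if m else None
```

This version uses `\d+` where the crawler uses `\d*`. As a result, an entry such as `(目前无人评价)` ("no ratings yet") yields `None` rather than an empty string. Calling `quote` is optional, because requests normally percent-encodes non-ASCII URLs itself.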
2, MongoDB installation and configuration: https://www.cnblogs.com/zhoulifeng/p/9429597.html#4242074
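After installing MongoDB, it can help to confirm that the server is listening on `localhost:27017` before starting the crawl, since that is the address the crawler connects to. The sketch below uses only the standard library. It checks only that a TCP connection can be opened, not that the listener is actually MongoDB. The function name `is_listening` is hypothetical.

```python
import socket


def is_listening(host='localhost', port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection tries every address the host resolves to (IPv4 and IPv6)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unresolvable
        return False
```

With `mongod` running on its default port, `is_listening()` should return `True`. If it returns `False`, the server is not running or is bound to a different address or port.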
3, Robo 3T installation: https://www.cnblogs.com/tugenhua0707/p/9250673.html