- Preface
- The night is too beautiful, and crawlers are not so dangerous
- Good at using others' UA
- Crawling process
  - Analyze web pages
  - Get the ID value of each hero
  - Analyze the original painting page
- Conclusion
Preface
I have had plenty of time to learn Python lately, yet I keep forgetting to write blog posts, which leaves me feeling rather helpless. As someone who is crazy about code, I am not living up to my nickname!
Seeing so many experts on CSDN who have kept writing for years, even decades, without stopping has inspired me a lot. I want to become that kind of role model too, and I would be proud to.
Besides reading, I also like playing games. I once saw someone crawl the skins of Honor of Kings. I am a veteran Honor of Kings player myself, so I went and crawled League of Legends instead.
Hahaha, you didn't see that coming, did you?
Along the way in this crawler tutorial, I will also share some simple, practical crawling tips.
The night is too beautiful, and crawlers are not so dangerous
When crawling, don't hammer the server with requests. Other people's servers can't take it.
Learn to pause and restrain yourself a little: let the crawler sleep between requests.
Restrictions tend to be loosest while people are asleep, so crawl late at night if you can. You may never have seen Los Angeles at 4 in the morning, but you can still see your crawler at 4 in the morning.
This way your IP address is less likely to be blocked.
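The advice above boils down to spacing out your requests. Here is a minimal sketch of that idea. The helper name polite_fetch and the 1 to 3 second range are assumptions chosen for illustration; the original post gives no concrete numbers.

```python
import random
import time


def polite_fetch(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls.

    polite_fetch, min_delay and max_delay are illustrative names and values,
    not something taken from the original post.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

With requests, fetch could be something like lambda u: requests.get(u, headers=headers). The sleep parameter exists only so the pause can be swapped out when trying the function without waiting.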
Good at using others' UA
If you look at a website's robots.txt, you will see the owner's statement of which content may be crawled and which may not. Don't overlook another part of that statement: which search engine crawlers it is addressed to, as in the following example.
[Image: 0BtYRO.png]
Did you spot it? The part of robots.txt worth noting is User-Agent. When you construct headers in Python, you can set User-Agent to one of the crawlers named in the site's robots.txt, such as Baidu's UA, Google's UA, or Sogou's UA. Visit the site again with that UA and you are treated as a friendly crawler.
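As a rough sketch of reading those rules in code, Python's standard urllib.robotparser can tell you which User-Agent tokens a robots.txt allows. The robots.txt content below is made up for the example, and is_allowed is a hypothetical helper name. The Baiduspider string is the one commonly published for Baidu's crawler; treat it as an assumption and check the current value before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: Baiduspider
Allow: /

User-agent: *
Disallow: /
"""

# Commonly published User-Agent string for Baidu's crawler (assumed current).
BAIDUSPIDER_UA = (
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)


def is_allowed(robots_txt, agent_token, url):
    """Return True if robots_txt lets the crawler named agent_token fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch matches on the short product token (e.g. "Baiduspider"),
    # not on the full User-Agent header value.
    return parser.can_fetch(agent_token, url)


# Headers dict in the shape that requests.get(url, headers=headers) expects.
headers = {"User-Agent": BAIDUSPIDER_UA}
```

For example, is_allowed(SAMPLE_ROBOTS_TXT, "Baiduspider", "https://example.com/page") is true for this sample, while an unlisted token falls through to the catch-all rule and is refused.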
Crawling process
Analyze web pages
Open the developer tools with F12 and you will find the file the arrow points to. If you don't see it, refresh the page and try again.
[Image: 0BtUQe.png]
    import requests

    # Assumed headers: the original does not show how headers was defined.
    headers = {'User-Agent': 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'}

    url0 = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
    try:
        response = requests.get(url0, headers=headers)
        response.raise_for_status()
        response.encoding = response.apparent_encoding  # Set the encoding
        hreolist = response.json()  # Parse the response body as JSON
        print(hreolist)  # Print the list of heroes
        print(len(hreolist['hero']))  # Print the number of heroes: 151
    except Exception as e:
        print(e)
With the code above, I successfully obtained the full hero list and the total number of heroes.
Here is just an excerpt of the printed output:
    {'hero': [{'heroId': '1', 'name': 'Dark Daughter', 'alias': 'Annie', 'title': 'Annie', 'roles': ['mage'],
      'isWeekFree': '0', 'attack': '2', 'defense': '3', 'magic': '10', 'difficulty': '6',
      'selectAudio': 'https://game.gtimg.cn/images/lol/act/img/vo/choose/1.ogg',
      'banAudio': 'https://game.gtimg.cn/images/lol/act/img/vo/ban/1.ogg',
      'isARAMweekfree': '0', 'ispermanentweekfree': '0', 'changeLabel': 'no change',
      'goldPrice': '4800', 'couponPrice': '2000', 'camp': '', 'campId': '',
      'keywords': 'Annie, Lady of Darkness, Lady of Fire, Annie,anni,heianzhinv,huonv,an,hazn,hn'}, ...
From the JSON above, you can see that the list of heroes is stored under the 'hero' key.
Get the ID value of each hero
In the JSON just obtained, every entry has a key named 'heroId'. So what is 'heroId' used for?
I didn't know at first either. Then I opened the skin original artwork pages on the official site, and it immediately became clear.
    https://lol.qq.com/data/info-defail.shtml?id=1    Annie
    https://lol.qq.com/data/info-defail.shtml?id=2    Olaf
    https://lol.qq.com/data/info-defail.shtml?id=876  Lilia
From the three URLs above, you can see that heroId is the value of the id query parameter.
But there is a pitfall here, and you have probably spotted it: there are only 151 heroes, yet an id value of 876 exists. The first 100 or so heroes cause no trouble because their ids are sequential, but beyond that the ids jump by large amounts. To open each hero's original artwork page and crawl the images, the URL has to be built with the correct id, so obtaining every hero's ID value is an essential step.
    url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
    hero_list_json = hreolist  # JSON fetched in the previous step
    hero_lists = hero_list_json['hero']  # Get the hero list
    hero_id = list(map(lambda x: x['heroId'], hero_lists))  # Get each hero's ID
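To make the ID gap concrete, here is a small sketch that uses only the three heroes quoted above. The trimmed sample list and the helper name detail_page_url are assumptions for illustration; the real hero_list.js response carries many more entries and fields.

```python
# Trimmed sample shaped like the 'hero' list in hero_list.js (three entries only).
sample_heroes = [
    {"heroId": "1", "alias": "Annie"},
    {"heroId": "2", "alias": "Olaf"},
    {"heroId": "876", "alias": "Lilia"},
]

DETAIL_PAGE = "https://lol.qq.com/data/info-defail.shtml?id={}"


def detail_page_url(hero):
    """Build the detail-page URL for one hero entry (hypothetical helper)."""
    return DETAIL_PAGE.format(hero["heroId"])


# IDs arrive as strings; convert them to int to compare against positions.
ids = [int(hero["heroId"]) for hero in sample_heroes]
# Guessing IDs by counting 1, 2, 3, ... gets the third hero wrong.
guessed = list(range(1, len(sample_heroes) + 1))
mismatches = [(g, real) for g, real in zip(guessed, ids) if g != real]

for hero in sample_heroes:
    print(hero["alias"], detail_page_url(hero))
print("wrong guesses:", mismatches)
```

Reading heroId from the data, rather than counting upward, is what keeps the URLs correct once the ids stop being sequential.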
Analyze the original painting page
Open the developer tools again and you will find a file:
[Image: 0BttzD.png]
In the picture above, skins has 10 values. Click the first one and you will see loadingImg; the value of this key is the URL of the skin's original artwork.
Veteran players know that Lilia has only two skins, so why does skins hold 10 values? Click through the third to the tenth in turn and you will find that their loadingImg values are all empty.
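Since some entries in skins carry an empty loadingImg, it helps to filter them out before downloading anything. A minimal sketch, assuming a made-up sample list (the names and example.com URLs are placeholders, not real data) and a hypothetical helper named skins_with_art:

```python
# Made-up sample shaped like the 'skins' list; names and URLs are placeholders.
sample_skins = [
    {"name": "Skin A", "loadingImg": "https://example.com/a.jpg"},
    {"name": "Skin B", "loadingImg": "https://example.com/b.jpg"},
    {"name": "Skin B (variant 1)", "loadingImg": ""},
    {"name": "Skin B (variant 2)", "loadingImg": ""},
]


def skins_with_art(skins):
    """Keep only the entries whose loadingImg is a non-empty string."""
    return [skin for skin in skins if skin.get("loadingImg")]


for skin in skins_with_art(sample_skins):
    print(skin["name"], skin["loadingImg"])
```

Using skin.get("loadingImg") also skips entries where the key is missing entirely, which is a defensive choice rather than something the original post describes.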
    url_list = []  # URL of each hero's information file
    for hid in hero_id:  # hero_id is the list of IDs built earlier
        url = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'.format(hid)
        # print(url)
        url_list.append(url)

    url1 = 'https://game.gtimg.cn/images/lol/act/img/js/hero/876.js'
    try:
        response = requests.get(url1, headers=headers)
        response.raise_for_status()
        response.encoding = response.apparent_encoding  # Set the encoding
        hreo_info = response.json()
        skins = hreo_info['skins']  # Get the hero's skin information
        # Traverse the loadingImg and name of each skin
        for skin in skins:
            print(skin['loadingImg'])
            print(skin['name'])
    except Exception as e:
        print(e)
Combining the ideas in the two pieces of code above, you can crawl the original artwork of one hero. Getting the original artwork of every hero takes nothing more than one extra loop.
Once you can crawl the original artwork of the first hero, why worry about the rest?
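That "one extra loop" could look roughly like the sketch below. It is not the author's complete project: collect_skin_urls, its get_json parameter, and the delay value are assumptions made for illustration. Only the two URLs and the 'hero', 'heroId', 'skins', 'name' and 'loadingImg' keys come from the steps above.

```python
import time

HERO_LIST_URL = "https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js"
HERO_URL = "https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js"


def collect_skin_urls(get_json, delay=1.0):
    """Return (skin name, loadingImg URL) pairs for every hero.

    get_json is any callable that takes a URL and returns the parsed JSON;
    it is a hypothetical parameter so the loop does not depend on one HTTP library.
    """
    heroes = get_json(HERO_LIST_URL)["hero"]
    pairs = []
    for hero in heroes:
        info = get_json(HERO_URL.format(hero["heroId"]))
        for skin in info["skins"]:
            if skin["loadingImg"]:  # skip entries without artwork
                pairs.append((skin["name"], skin["loadingImg"]))
        time.sleep(delay)  # pause between heroes to go easy on the server
    return pairs


# With the requests setup used earlier in this post, a call could look like:
#   pairs = collect_skin_urls(lambda url: requests.get(url, headers=headers).json())
```

Each returned URL can then be downloaded and written to disk in a final loop.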
Conclusion
That is the whole idea behind crawling the original artwork of League of Legends heroes, now shared with everyone.
Dear reader, can you now grab all the hero skins of Honor of Kings?
I believe you absolutely can. Go for it!
It really is a very simple little crawler example!
Click here for the complete project code