Tips for Getting Started with Python

"Java from the heart" has been renamed "KEN DO EVERTHING": Ken (Can) Do Everything, capable of anything.

Believe in yourself and you can do everything!

I have only just started with Python myself, so I can offer a few suggestions rather than real guidance (so far I have only worked with Python web crawlers).

Getting started in three steps

1. Learn the syntax. For Python 3, I recommend the Runoob tutorial:

https://www.runoob.com/python3/python3-tutorial.html

If you already know another language, you can get through it quickly, because Python's syntax is very simple. You can even skip the syntax tutorial and read code directly, then go back and look things up when you really do not understand something; that may work even better.

2. Learn from videos. I recommend Teacher Liao's practical Python crawler course.

Send [] to the public account's backend to get the Python crawler course.

Do not just watch. Follow along and practice hands-on while you watch the videos, or you will not gain much!

3. Work on a real project. Pick a few sites you want to crawl, apply what you have learned, and start tinkering!

In my own project, the pages I ran into were generated dynamically by JavaScript, so the data could not be read directly from the response. That is why I used the Selenium framework.

A few things to watch out for

1. Do not integrate Selenium into Scrapy (just my humble opinion; please correct me if I am wrong)

The parse method of a Scrapy Spider is single-threaded: the response queue is handed to parse and processed serially, so with Selenium you cannot open several browsers and crawl in parallel. Selenium is already slow enough, so I would not integrate it into Scrapy. Instead, use Selenium on its own and crawl with multiple threads, which is much faster.

(I later discovered that Scrapy has a library, scrapy-splash, for crawling dynamic pages! However, because Selenium appeared earlier, fewer people use scrapy-splash than Selenium.)
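
A minimal sketch of the "Selenium on its own, with multiple threads" idea, not a definitive implementation. The names fetch_with_selenium and crawl_all are hypothetical, and the sketch assumes the selenium package, Chrome, and a matching driver are installed. Each URL gets its own browser here, which is simple but costly; reusing one browser per thread would be a reasonable refinement.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_with_selenium(url):
    """Open one browser, load the page, and return the rendered HTML."""
    # Imported lazily so crawl_all can also be used with another fetch function.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


def crawl_all(urls, fetch=fetch_with_selenium, max_workers=4):
    """Fetch every URL on a pool of worker threads and return {url: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs
        return dict(zip(urls, pool.map(fetch, urls)))
```

Calling crawl_all(list_of_urls) would then run up to four browsers at a time; max_workers is the knob to tune against your machine's memory.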

2. Do not use time.sleep to wait for a page to load

You do not know how long the page actually takes to load: set the wait too long and you waste time; set it too short and the page may not have finished loading.

For the Selenium framework, I recommend:

WebDriverWait(driver, 10).until(a condition on a specific element): set a timeout and wait for a particular element; if it does not appear in time, a timeout exception is thrown. You can then retry the operation and give up on the page after a certain number of retries. The simplest way to do this is a loop.
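
The wait-then-retry loop could be sketched roughly as follows. The helpers retry and wait_for_element are hypothetical names, and the locator in the usage comment is a made-up example; WebDriverWait, expected_conditions.presence_of_element_located, and TimeoutException are the real Selenium pieces this assumes.

```python
def retry(action, max_retries=3, retry_on=(Exception,)):
    """Call action() until it succeeds or max_retries attempts have failed."""
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except retry_on as exc:
            print(f"attempt {attempt}/{max_retries} failed: {exc!r}")
    return None  # give up on this page


def wait_for_element(driver, url, locator, timeout=10):
    """Load url and wait up to `timeout` seconds for the element to appear."""
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # until() raises selenium's TimeoutException if the element never appears
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )


# Usage sketch (the element id "content" is an assumption):
# from selenium.common.exceptions import TimeoutException
# from selenium.webdriver.common.by import By
# element = retry(
#     lambda: wait_for_element(driver, url, (By.ID, "content")),
#     retry_on=(TimeoutException,),
# )
```

Keeping the retry loop separate from the Selenium call means the same loop can wrap any flaky step, and a None result signals that the page should be skipped.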

3. Anti-crawling

If your program runs into problems and returns no data, consider whether the site is using an anti-crawling strategy. If it is, you need the matching countermeasure.

The main anti-crawling strategies fall into the following categories:

① Checking the User-Agent to see whether the request comes from a browser;

② Counting visits from the same IP within a short time;

③ Requiring the user to log in before resources can be accessed;

④ Treating access from several different IPs by one user within a short time as an abnormal login;

⑤ CAPTCHAs, including slide and click verification;

⑥ Encrypting the data so that it must be decrypted.

Different anti-crawling strategies call for different countermeasures:

① User-Agent: add headers when sending the request to disguise it as a browser;

② Short-term visit limits: use proxies or delay the crawl;

③ Login required: simulate a login, save the cookie, and send the cookie with each request;

④ Abnormal login: prepare a large number of accounts and bind each to a different proxy;

⑤ CAPTCHAs: handle them with a suitable Python library; these can be found on GitHub;

⑥ Data encryption: crack it with the corresponding algorithm, or crawl with Selenium.
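
As an illustration of countermeasures ①, ② and ③, here is a small standard-library sketch. The User-Agent string, the helper names build_request and polite_delay, and the cookie value in the usage note are all assumptions for the example. Note that the sleep here spaces out requests; it is not a way of waiting for a page to load.

```python
import random
import time
import urllib.request

# Assumption: an example desktop browser User-Agent; substitute a current one.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
}


def build_request(url, cookie=None):
    """Build a request with browser-like headers and an optional saved cookie."""
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    if cookie:
        request.add_header("Cookie", cookie)  # e.g. a session cookie saved after login
    return request


def polite_delay(min_seconds=1.0, max_seconds=3.0):
    """Sleep for a random interval so requests from one IP are spaced out."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

A crawl loop would call polite_delay() between pages and pass build_request(url, cookie="sessionid=...") to urllib.request.urlopen.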

4. Your code must account for the stability of the site being crawled

A website may run into the following situations:

1. The site performs poorly and pages respond very slowly

2. The site sometimes goes down outright

3. The site is under maintenance

Your code must anticipate these cases and include the corresponding exception handling; otherwise the crawler will crash or hang.
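
One way to cover the three situations above with the standard library is sketched below. fetch_page is a hypothetical helper, and the opener parameter exists only so the function can be exercised without a network. A timeout guards against slow pages, and each failure is logged and turned into None so the crawl can move on.

```python
import socket
import urllib.error
import urllib.request


def fetch_page(url, timeout=10, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the site is slow, down, or in maintenance."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, e.g. 503 during maintenance.
        print(f"{url}: HTTP {exc.code}")
    except urllib.error.URLError as exc:
        # Connection refused, DNS failure, or a timeout while connecting.
        print(f"{url}: unreachable ({exc.reason})")
    except (socket.timeout, TimeoutError):
        # The connection was made but the response arrived too slowly.
        print(f"{url}: timed out after {timeout}s")
    return None
```

HTTPError is caught before URLError because it is a subclass of it; reversing the order would hide the status code.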

A crawler does not work forever

Whether you are analyzing responses or parsing crawled pages, everything you depend on belongs to someone else. Once the website changes, your program stops working. If you take on a crawler project, be sure to tell the client this in advance.

Well, that is the little I know, and I have told you all of it.

Go start your Python journey!


This article first appeared on the public account KEN DO EVERTHING.
The account focuses on Java-related technologies but is not limited to them: Java, MySQL, Python, interview skills, reflections on life, and more. It shares quality posts, practical technical content, learning resources, and other premium content.
Follow along, and let us learn and grow together!

Origin www.cnblogs.com/KEN-DO-EVERTHING/p/12238657.html