Introduction to Python Basics (12): A First Look at Selenium: Dynamic Web Pages & Static Web Pages

Collecting data with Selenium

Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, etc.

Dynamic Web Pages & Static Web Pages

Static web pages are actual HTML files stored in the server's file system. When the user enters the page's URL in the browser and presses Enter, the browser downloads the corresponding HTML file, renders it and displays it in the window. Early websites were usually built from static pages.
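As a minimal sketch using only the Python standard library, the snippet below treats a static page as nothing more than a file on disk: a throwaway local server hands the file's bytes back unchanged. The temporary directory, the `index.html` file name and the helper `fetch_static_page` are illustrative assumptions, not part of any real site.

```python
import functools
import tempfile
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class QuietHandler(SimpleHTTPRequestHandler):
    """Serves files from a directory and silences the per-request log lines."""

    def log_message(self, format, *args):
        pass


def fetch_static_page(html: str) -> str:
    """Store `html` as a file, serve it over HTTP and download it again (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as root:
        # The "static page" is simply an HTML file in the server's file system.
        Path(root, "index.html").write_bytes(html.encode("utf-8"))
        handler = functools.partial(QuietHandler, directory=root)
        server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0: pick any free port
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            url = f"http://127.0.0.1:{server.server_port}/index.html"
            with urllib.request.urlopen(url) as resp:
                return resp.read().decode("utf-8")
        finally:
            server.shutdown()
            server.server_close()


page = "<html><body><h1>Hello</h1></body></html>"
downloaded = fetch_static_page(page)
```

Every request returns exactly the bytes stored in the file; nothing is computed per request.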

1. Dynamic web pages

Dynamic web pages are defined in contrast to static web pages. When a browser requests a page, the server generates the HTML on the fly, based on factors such as the current time, environment parameters and database queries, and then sends it to the browser (from that point on, processing is the same as for a static page).

Clearly, the "dynamic" in "dynamic web page" means that the page is generated on the server side; "static", by contrast, means that the page is a real, standalone file.
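The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `PRODUCTS` is a hypothetical in-memory stand-in for a database, and `render_dynamic_page` is a hypothetical server-side function, not a real framework API. The static page is a fixed string, while the dynamic page is built per request from a parameter, the current time and "database" data.

```python
from datetime import datetime
from html import escape

# Static page: every request receives exactly these bytes.
STATIC_PAGE = "<html><body><p>Welcome!</p></body></html>"

# Hypothetical in-memory "database" standing in for real database queries.
PRODUCTS = {"books": ["Python Basics", "Web Scraping"], "music": ["Album A"]}


def render_dynamic_page(category: str, now: datetime) -> str:
    """Build an HTML page on the fly from a request parameter, the time and stored data."""
    greeting = "Good morning" if now.hour < 12 else "Hello"
    items = "".join(f"<li>{escape(name)}</li>" for name in PRODUCTS.get(category, []))
    return (
        f"<html><body><p>{greeting}! It is {now:%Y-%m-%d %H:%M}.</p>"
        f"<ul>{items}</ul></body></html>"
    )


morning = render_dynamic_page("books", datetime(2022, 10, 28, 9, 0))
afternoon = render_dynamic_page("music", datetime(2022, 10, 28, 15, 30))
```

The browser receives ordinary HTML in both cases; the difference lies only in how the server produced it.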

Note:

  • Dynamic page technology is the counterpart of static page technology; that is, the URL suffix of a dynamic page is not one of the forms common to static pages, such as .
  • "Dynamic" here has no direct connection to visual "dynamic effects" on a page, such as animations or scrolling text. A dynamic page may contain plain text only, or it may include all kinds of animation; these are simply ways the page's particular content is presented.

1.1 JavaScript

JavaScript is a scripting language for the web and is widely used in web application development. It is often used to add dynamic behavior to web pages and give users a smoother, more polished browsing experience. JavaScript code usually does its work by being embedded in HTML.

You can see it in the <script> tags of a page's source code, for example:

<script type="text/javascript"
src="https://statics.huxiu.com/w/mini/static_2015/js/sea.js?v=201601150944">
</script>

JavaScript can create HTML content dynamically; such content is generated and displayed only after the JavaScript code has run. If you collect page content the traditional way, by downloading the HTML the server returns and parsing it, you get only the content that exists before the JavaScript code runs.
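A small sketch shows the problem. The page below is hypothetical: the server sends an empty `<ul id="news">`, and an inline script fills it in. Parsing that raw HTML with Python's standard `html.parser` (standing in here for the "traditional method", since no JavaScript is executed) finds no visible text at all; the story exists only as script source code.

```python
from html.parser import HTMLParser

# Hypothetical page: the list is empty in the HTML the server sends;
# the inline script fills it in, but only when a browser runs the script.
RAW_HTML = """<html><body>
<ul id="news"></ul>
<script type="text/javascript">
  document.getElementById("news").innerHTML = "<li>Breaking story</li>";
</script>
</body></html>"""


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the source code inside <script> tags."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())


parser = VisibleTextParser()
parser.feed(RAW_HTML)
print(parser.chunks)                  # → []
print("Breaking story" in RAW_HTML)   # → True (but only inside the script)
```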

jQuery
jQuery is a fast, concise JavaScript library. It wraps commonly used JavaScript functionality, offers a simple JavaScript design pattern, and streamlines HTML document manipulation, event handling, animation and Ajax interaction. One sign that a website uses jQuery is that its source code includes a jQuery script, for example:

<script type="text/javascript"
src="https://statics.huxiu.com/w/mini/static_2015/js/jquery-1.11.1.min.js?v=201512181512"></script>

If jQuery appears in a page's source code, be extra careful when collecting data from that website. jQuery can create HTML content dynamically, so that content is generated and displayed only after the JavaScript code runs; collecting the page the traditional way gets you only the content that existed before the JavaScript ran.
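Checking for this sign can be automated. The sketch below is a simple heuristic under an assumption: it treats a page as using jQuery if any external `<script src=...>` URL contains "jquery". Sites that bundle jQuery under another file name will not be detected. `uses_jquery` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Records the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def uses_jquery(html: str) -> bool:
    """Heuristic: True if any external script URL mentions 'jquery'."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    return any("jquery" in src.lower() for src in collector.sources)


page = ('<script type="text/javascript" '
        'src="https://statics.huxiu.com/w/mini/static_2015/js/jquery-1.11.1.min.js?v=201512181512"></script>')
print(uses_jquery(page))                                  # → True
print(uses_jquery('<script src="/js/app.js"></script>'))  # → False
```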

1.2 Ajax

A website that uses Ajax to update its content has one distinctive feature: it can update part of a page without reloading the entire page.

Ajax is not actually a language but a group of techniques for carrying out network tasks (you can think of it as similar to network data collection). An Ajax website can exchange data with the web server without reloading the whole page.
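Because an Ajax page fetches only data, a scraper can often request that data directly, which is also the idea behind the packet-capture method in section 2. The sketch below is self-contained and fully hypothetical: a local server exposes an assumed endpoint `/api/news?page=N` that returns JSON, and the client requests one page of data the way the page's JavaScript would. The `X-Requested-With: XMLHttpRequest` header is the one jQuery adds to same-origin Ajax requests; whether a real site checks it varies from site to site.

```python
import json
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical data behind a hypothetical Ajax endpoint /api/news?page=N.
NEWS = [f"Story {i}" for i in range(1, 7)]
PAGE_SIZE = 3


class AjaxHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urllib.parse.urlsplit(self.path)
        if url.path != "/api/news":
            self.send_error(404)
            return
        page = int(urllib.parse.parse_qs(url.query).get("page", ["1"])[0])
        start = (page - 1) * PAGE_SIZE
        body = json.dumps({"page": page, "items": NEWS[start:start + PAGE_SIZE]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_news_page(base_url: str, page: int) -> dict:
    """Request only the JSON data for one page, not the whole HTML page."""
    query = urllib.parse.urlencode({"page": page})
    request = urllib.request.Request(
        f"{base_url}/api/news?{query}",
        headers={"X-Requested-With": "XMLHttpRequest"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


server = ThreadingHTTPServer(("127.0.0.1", 0), AjaxHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = fetch_news_page(f"http://127.0.0.1:{server.server_port}", 2)
print(result)  # → {'page': 2, 'items': ['Story 4', 'Story 5', 'Story 6']}
server.shutdown()
server.server_close()
```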

1.3 DHTML

DHTML (Dynamic HTML) is not a new technology. It combines the HTML, CSS and JavaScript covered earlier, using JavaScript (JS) to manipulate page elements so that they change dynamically, which lets the page interact with the user.

2. Handling dynamic web pages

For a website that loads its content dynamically, there are several ways to handle it in Python:

  1. Analyze the JavaScript code directly and extract the content to be collected from it.
  2. Capture and analyze network traffic: inspect the captured requests and responses, then forge the request to obtain the response directly. (recommended)
  3. Use a third-party Python library to run the JavaScript and capture the page exactly as it appears in the browser. (recommended)
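Method 1 often comes down to finding data that the page embeds in inline JavaScript and parsing it yourself. The sketch below assumes a hypothetical page that assigns a JSON object literal to `window.__INITIAL_DATA__` on a single line ending in `;`. Real pages vary widely, so the regular expression will usually need adapting per site, and `extract_embedded_json` is a hypothetical helper.

```python
import json
import re

# Hypothetical page source: the data sits in a script as a JS object literal
# and is only turned into visible HTML by JavaScript in the browser.
PAGE_SOURCE = """<html><body><div id="app"></div>
<script>
  window.__INITIAL_DATA__ = {"title": "Python Basics", "views": 1024};
  renderApp(window.__INITIAL_DATA__);
</script>
</body></html>"""


def extract_embedded_json(source: str, variable: str) -> dict:
    """Pull the JSON object assigned to `variable` out of inline JavaScript.

    Assumption: the value is valid JSON on one line, followed by ';'.
    """
    pattern = re.escape(variable) + r"\s*=\s*(\{.*?\});"
    match = re.search(pattern, source)
    if match is None:
        raise ValueError(f"{variable} not found in page source")
    return json.loads(match.group(1))


data = extract_embedded_json(PAGE_SOURCE, "window.__INITIAL_DATA__")
print(data["title"], data["views"])  # → Python Basics 1024
```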

Since the browser can get the data, you can simulate a browser and obtain the data from it; in other words, control a browser with a program in order to collect the data.
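With Selenium, controlling the browser might look like the sketch below. It assumes `pip install selenium` (Selenium 4) and a local Chrome installation; recent Selenium 4 releases can usually locate a matching ChromeDriver on their own. The function name, URL and CSS selector are hypothetical. The explicit wait matters: it gives the page's JavaScript time to create the element before `page_source` reads the rendered DOM.

```python
def fetch_rendered_html(url: str, css_selector: str, timeout: float = 10.0) -> str:
    """Open `url` in headless Chrome, wait for `css_selector`, return the rendered HTML."""
    # Imported lazily so this module can be imported even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until JavaScript has created at least one matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source  # the DOM after the scripts have run
    finally:
        driver.quit()  # always close the browser, even on timeout
```

A call such as `fetch_rendered_html("https://example.com/news", "ul#news li")` (hypothetical URL and selector) would return HTML that already contains the script-generated list items, which plain HTTP downloading cannot see.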


Origin blog.csdn.net/Dangerous_li/article/details/127578497