Python Crawlers: Understanding the Web Front End - HTML

HTML language

  HTML is a plain-text language, and web page files written in HTML are ordinary plain-text files. You can open them in any text editor, such as Windows Notepad, to view the HTML source code. You can also view a page's HTML code while it is open in a browser, using the browser's "View" -> "Source File" command. HTML files need no compilation: the browser interprets them directly. When you open a web page, the browser reads its HTML code, analyzes its syntactic structure, and then displays the page content according to the result.
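For a crawler, this means a page's source is just text. The following is a minimal sketch that assumes only the standard library. The hypothetical helper fetch_source downloads a page with urllib.request and decodes the bytes into a str, which is the same text the browser's view-source command shows. The User-Agent string and the UTF-8 fallback are illustrative assumptions, not requirements.

```python
from urllib.request import Request, urlopen


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML source as plain text.

    fetch_source is a hypothetical helper written for this example.
    """
    # Assumption: some servers reject clients without a browser-like
    # User-Agent header; this value is only a placeholder.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (example crawler)"})
    with urlopen(request, timeout=timeout) as response:
        raw = response.read()  # the page arrives as bytes
        # Prefer the charset declared in the Content-Type header;
        # fall back to UTF-8 when the server does not declare one.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

For example, fetch_source("https://example.com") returns that page's markup as a single string, ready to be parsed.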

Tags, Elements, and Structure Overview

HTML tags

  HTML tags come in two types: standalone tags and paired tags.
Most tags come in pairs, made up of a start tag and an end tag. The start tag has the form <element name>, and the end tag has the form </element name>. The full syntax is:

<element name>content to be controlled</element name>

  A paired tag affects only the part of the file it encloses. For example, the <title> and </title> tags define the scope of the title element, so the text between them is the title of this HTML5 file.

  A standalone tag has the form <element name> and inserts an element at its position. For example, the <br> tag inserts a line break where it appears.
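Python's standard html.parser module reports tags in these same two forms. In this minimal sketch, where the class TagTracer and the function trace_tags are hypothetical names, a paired tag produces a start event and an end event, while the standalone <br> tag produces only a start event.

```python
from html.parser import HTMLParser


class TagTracer(HTMLParser):
    """Record start and end tags in the order the parser meets them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def trace_tags(html):
    tracer = TagTracer()
    tracer.feed(html)
    tracer.close()
    return tracer.events


print(trace_tags("<title>My page</title>line one<br>line two"))
# [('start', 'title'), ('end', 'title'), ('start', 'br')]
```

If a page writes the standalone tag as <br/>, HTMLParser calls handle_startendtag instead. Its default implementation reports both a start event and an end event.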

Note:
  Tag names are case-insensitive, so uppercase and lowercase letters can be mixed. For example, <HTML>, <Html>, and <html> are equivalent.
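Parsers follow the same rule. In this minimal sketch using Python's standard html.parser, where StartTagCollector and start_tags are hypothetical names, tag names and attribute names come back in lowercase however the page capitalizes them. A crawler can therefore compare them without worrying about case. Attribute values keep their original case.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Collect (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser has already lowercased the tag and attribute names.
        self.tags.append((tag, attrs))


def start_tags(html):
    collector = StartTagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


print(start_tags('<HTML><Html><html><A HREF="Index.html">'))
# [('html', []), ('html', []), ('html', []), ('a', [('href', 'Index.html')])]
```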

  A tag can also carry attributes that control the element it creates. Attributes go in the start tag, so the basic syntax of a start tag is:

<element name attribute1="value1" attribute2="value2" ...>

  The end tag is:

</element name>

  Therefore, the complete syntax for defining an element in an HTML5 file is:

<element name attribute1="value1" attribute2="value2" ...>element content</element name>

Note:
  The quotation marks around an attribute value can be omitted, as long as the value is not empty, contains no spaces, and contains none of the characters " ' = < > `.
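This attribute syntax maps directly onto what a crawler sees. In this minimal sketch, first_tag_attrs is a hypothetical helper built on html.parser that returns the attributes of the first start tag as a dictionary. Quoted and unquoted values give the same result, because the parser strips the quotes.

```python
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Remember the attributes of the first start tag encountered."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            # attrs is a list of (name, value) pairs with quotes already removed
            self.attrs = dict(attrs)


def first_tag_attrs(html):
    parser = FirstTagAttrs()
    parser.feed(html)
    parser.close()
    return parser.attrs or {}


quoted = first_tag_attrs('<p class="intro" id="first">Hello</p>')
unquoted = first_tag_attrs("<p class=intro id=first>Hello</p>")
print(quoted == unquoted, quoted)
# True {'class': 'intro', 'id': 'first'}
```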


Elements

  When a piece of text is enclosed by a pair of HTML5 tags, the text and the tags that enclose it together form an element.

  In HTML5 syntax, an element formed from tags and text can itself contain other elements. As a result, an entire HTML5 file is like one large element that contains many smaller elements.

  In every HTML5 file, the <html> tag creates the outermost element. This element has two main child elements, created by the <head> tag and the <body> tag. The <head> element holds the file's header information, such as its title. The <body> element holds the body of the file.
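A crawler can make this nesting visible. The following minimal sketch assumes well-formed markup in which every non-void element is explicitly closed. It tracks the nesting depth with html.parser and produces an indented outline. The class OutlineBuilder, the function outline, and the short VOID_ELEMENTS list are hypothetical. Real-world pages that omit end tags are better handled by a full tree builder such as lxml or html5lib.

```python
from html.parser import HTMLParser

# Assumption: a partial list of void elements, which never have end tags.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class OutlineBuilder(HTMLParser):
    """Build an indented outline of how elements nest."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID_ELEMENTS:
            self.depth += 1  # children of this element sit one level deeper

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth = max(0, self.depth - 1)


def outline(html):
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.lines


page = """<html>
  <head><title>Document title</title></head>
  <body>Document body<br></body>
</html>"""
print("\n".join(outline(page)))
```

For this page the outline lists html at the top level, head and body one level in, and title and br nested inside them.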

HTML file structure

  Before describing the structure of an HTML file in detail, let's look at a simple HTML file and how a browser displays it.

  To write an HTML file, open a text editor such as the Notepad program that comes with Windows, enter the following code, and save it as a file with the .html extension.

<html>
    <head>
        <title>Document title</title>
    </head>
    <body>
        Document body
    </body>
</html>

  When the file is opened in a browser, it is displayed as shown in the figure below:

Figure 1-1

  The code above shows the basic structure of an HTML file, which the figure below illustrates.

Figure 1-2

  In this structure, the part between <head> and </head> is the file header. It describes the file's title and some properties that apply to the whole file. The part between <body> and </body> is the body of the file. Unless otherwise stated, the tags introduced below are nested inside <body> and </body>.
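The head/body split matters to a crawler, because the page's visible text normally lives in the body. The following minimal sketch runs on a small example document. The hypothetical SectionText parser files each run of text under "head" or "body", depending on which section it appears in, and skips the whitespace used for indentation.

```python
from html.parser import HTMLParser


class SectionText(HTMLParser):
    """Sort text into the <head> or <body> section it appears in."""

    def __init__(self):
        super().__init__()
        self.section = None  # "head", "body", or None when outside both
        self.text = {"head": [], "body": []}

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag

    def handle_endtag(self, tag):
        if tag == self.section:
            self.section = None

    def handle_data(self, data):
        # Skip whitespace-only runs such as indentation between tags.
        if self.section and data.strip():
            self.text[self.section].append(data.strip())


def split_sections(html):
    parser = SectionText()
    parser.feed(html)
    parser.close()
    return parser.text


doc = """<html>
    <head>
        <title>Document title</title>
    </head>
    <body>
        Document body
    </body>
</html>"""
print(split_sections(doc))
# {'head': ['Document title'], 'body': ['Document body']}
```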

Basic HTML tags

File start tag <html>

  In any HTML file, the first tag is <html>, which indicates that the file is written in HyperText Markup Language (HTML). The <html> tag comes in a pair. The start tag <html> sits at the very beginning of the file and the end tag </html> at the very end, and all of the file's content and its other HTML tags are contained between them. For example:

<html>
    entire content of the file
</html>

  This tag is usually written without attributes, although it can take global attributes such as lang (for example, <html lang="en">).

  In practice, common web browsers (such as IE) can recognize an HTML file even without the <html> tag, and they do not act on the tag itself. Still, to keep your files compatible with the wide and changing variety of web browsers, you should make a habit of using this tag.

File header tag <head>

  By convention, an HTML file is divided into two parts: the file header and the file body. The body is the content shown in the user area of the web browser window. The header specifies the file's title, which appears in the title bar of the browser window, along with some other properties of the file.

  The <head> tag represents the head of a web page. The element it defines holds no page content. Instead, it holds information about the HTML file, which is not part of the file's main body: for example, the file's title, its character encoding, and its URL. Most of this information is used for indexing, identification, or other applications.

  Text written inside a <title> tag between <head> and </head> is the name of the web page. The browser displays it at the top of the window as the window's name.

Note:
If the HTML file does not need to provide relevant information, the <head> tag can be omitted.


Document title tag <title>

  Every HTML file needs a title. The browser displays the title at the top of the window as the window name, and the title is also used when the page is saved to the browser's favorites. If a visitor finds a page useful and wants to return to it often, they can choose the "Add to Favorites" command from the "Favorites" menu in IE to save it for later. The page title is written between <title> and </title>, and the <title> tag itself must be placed between <head> and </head>.
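Because every page keeps its title in the same place, the title is often the first thing a crawler records. The following is a minimal sketch with the standard html.parser module. TitleParser and get_title are hypothetical names, and the function returns None when the page has no title.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def get_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.parts).strip()
    return title or None


print(get_title("<html><head><title> My Page </title></head><body></body></html>"))
# My Page
```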

  HTML tags can be nested: one pair of tags can contain another pair of child tags, which apply to all or part of the enclosing tag's scope. The main tag nested inside <head> is <title>.

Meta information tag <meta>

  The information in a meta element is not displayed on the page, so users do not see it. It is generally used to describe the page, for example its name, keywords, and author. In HTML, the meta tag needs no end tag: all of its content sits inside a single pair of angle brackets, and a page's head can contain multiple meta elements. The meta element has two main attributes, name and http-equiv, and each is paired with a content attribute that holds the value. The name attribute mainly describes the web page, which helps search engine robots find and classify it.
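A crawler can read these hidden values directly. In this minimal sketch, the hypothetical extract_meta function uses html.parser to collect name/content pairs, http-equiv/content pairs, and the charset declaration. The charset attribute is the HTML5 way of stating the encoding information mentioned for <head>.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Gather information from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.names = {}       # name attribute -> content
        self.http_equiv = {}  # http-equiv attribute -> content
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        content = attrs.get("content")
        # The parser lowercases attribute names; the values of name and
        # http-equiv are lowercased here so "Author" and "author" match.
        if attrs.get("name"):
            self.names[attrs["name"].lower()] = content
        elif attrs.get("http-equiv"):
            self.http_equiv[attrs["http-equiv"].lower()] = content
        if attrs.get("charset"):
            self.charset = attrs["charset"]


def extract_meta(html):
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    return {"names": parser.names, "http_equiv": parser.http_equiv, "charset": parser.charset}


head = ('<head><meta charset="utf-8">'
        '<meta name="keywords" content="python, crawler">'
        '<meta name="Author" content="Alice">'
        '<meta http-equiv="refresh" content="30"></head>')
print(extract_meta(head))
# {'names': {'keywords': 'python, crawler', 'author': 'Alice'}, 'http_equiv': {'refresh': '30'}, 'charset': 'utf-8'}
```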

Page body tag <body>

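  The main part of a web page begins with the <body> tag and ends with the </body> tag. The <body> tag accepts many attributes, listed in the following table. These presentational attributes are obsolete in HTML5, and some of them (bgproperties, topmargin, and leftmargin) were never part of the standard. Modern pages use CSS instead, but older pages still contain these attributes.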

Attribute      Description
text           Sets the color of the page text
bgcolor        Sets the background color of the page
background     Sets the background image of the page
bgproperties   Fixes the background image so that it does not scroll with the page
link           Sets the default color of links on the page
alink          Sets the color of a link while it is being clicked
vlink          Sets the color of visited links
topmargin      Sets the top margin of the page
leftmargin     Sets the left margin of the page
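These attributes sit in the <body> start tag, so a crawler can read them like any other attributes. In this minimal sketch, body_attributes is a hypothetical helper. It returns the attributes of the first <body> tag as a dictionary, or an empty dictionary if there is no <body> tag.

```python
from html.parser import HTMLParser


class BodyAttributes(HTMLParser):
    """Capture the attributes of the first <body> start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "body" and self.attrs is None:
            self.attrs = dict(attrs)


def body_attributes(html):
    parser = BodyAttributes()
    parser.feed(html)
    parser.close()
    return parser.attrs or {}


page = '<html><body bgcolor="#FFFFFF" text="black" link=blue><p>Hi</p></body></html>'
print(body_attributes(page))
# {'bgcolor': '#FFFFFF', 'text': 'black', 'link': 'blue'}
```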


Origin blog.csdn.net/m0_68192925/article/details/125898799