Python Dynamic Video Downloader

This series shares some applications of Python web crawlers, mainly crawlers combined with a simple GUI for downloading videos, music and novels. This first post describes how to implement a dynamic video downloader.

 

Crawling videos from Movie Heaven

We start by using Python to crawl videos (movies, TV dramas, variety shows, etc.) from the Movie Heaven website, mainly with Selenium for the dynamic pages plus some simple crawler techniques.

(1) Movie Heaven home page address: https://www.dytt8.net/

(2) Technology used: Selenium, which drives a real browser.

(3) First install the selenium library and the driver that matches your browser. The installation and configuration steps are omitted here.

(4) Then we open the home page with the following code and print the page source:

from selenium import webdriver


def getSource(url):
    browser = webdriver.Chrome()
    browser.get(url)
    print(browser.page_source)
    browser.close()

(5) Next we locate the page elements for the search box, the type drop-down and the Search Now button.

They are as follows:

 

(6) Then we use the following code to enter the user's input into the browser.

Because the page carries advertising, it may not have finished loading when we act on it, so we need to allow more loading time. Selenium offers explicit waits and implicit waits; here we simply use an implicit wait.

Errors can still occur when an overlay div covers the element. The overlay only disappears after some event, for example once the page has finished loading, so a click made too early lands on the loading overlay instead of the element. One fix is to add a wait before the click: wait until the overlay div has gone and the target element is clickable. Another is to refresh the page so that the div disappears, and then wait until the element is clickable.
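The "wait until the element is clickable" idea is what Selenium's explicit waits (WebDriverWait together with expected_conditions) provide. As a minimal, library-free sketch of the mechanism, the hypothetical helper wait_until below polls a condition until it holds or a timeout expires. The helper name, its parameters and the fake "overlay is gone" check are assumptions made for illustration and are not part of the original code.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for "the overlay div is gone":
# it only becomes true on the third check.
checks = {"count": 0}


def overlay_gone():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(overlay_gone, timeout=2.0, poll_interval=0.01)
print(checks["count"])
```

With Selenium itself, the same role is played by WebDriverWait(driver, timeout).until(...) with a condition such as expected_conditions.element_to_be_clickable, so the click is only attempted once the button can actually receive it.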

Code:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select


def putUserMessger(url, this_name, this_type):
    '''
    :param url: URL to open in the browser
    :param this_name: name of the video to download
    :param this_type: type of the video to download
    '''
    this_browser = webdriver.Chrome()
    this_browser.implicitly_wait(10)
    this_browser.get(url)
    # Enter the video name and type into the search form in the browser.
    # The search input has both a name and a class attribute; here we locate it by name.
    this_browser.find_element(By.NAME, 'keyword').send_keys(this_name)
    time.sleep(2)
    # The type selector is a native HTML <select> drop-down, not an input-based one.
    Select(this_browser.find_element(By.NAME, 'field')).select_by_visible_text(this_type)
    time.sleep(2)
    # Click the Search Now button. It is a submit button, so this is not a plain click:
    # it submits the surrounding form.
    this_browser.find_element(By.NAME, 'Submit').click()
    this_browser.close()


def main():
    name = input('Please input video name: ')
    video_type = input('Select Type: ')
    url = 'https://www.dytt8.net/'
    putUserMessger(url, name, video_type)

However, the following error can still occur:

selenium.common.exceptions.WebDriverException: Message: unknown error: Element <input name="Submit" type="Submit" value="立即搜索"> is not clickable at point (702, 220). Other element would receive the click: <div style="width: 1017px; height: 577px;"></div>
  (Session info: chrome=73.0.3683.86)
  (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.17134 x86_64)

However, we noticed that the address of the page reached after the click follows a pattern, so we switch to another method and build that address directly.

(6) Building the second-level page URL and printing the third-level video addresses

We first analyze the URL.

The address of the second-level (search results) page is:

http://s.ygdy8.com/plus/so.php?typeid=1&keyword=%C4%E3%B5%C4%C3%FB%D7%D6

It consists of http://s.ygdy8.com/plus/so.php? followed by typeid= and the number of the video type, then &keyword= and the video name encoded in GBK. So we need to URL-encode the Chinese characters of the name using GBK.

The following code builds the required URL:

from urllib.parse import quote


def main():
    name = input('Please input video name: ')
    video_type = input('Select Type: ')
    # URL-encode the name as GBK
    ret = quote(name, encoding="gbk")
    type_ids = {'movie': '1', 'drama': '2', 'variety': '99', 'old variety': '89', 'game': '19', 'animation': '16'}
    url = 'http://s.ygdy8.com/plus/so.php?' + 'typeid=' + type_ids[video_type] + '&keyword=' + ret
    return url
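As a small worked example of the encoding step, the sketch below wraps the URL construction in a hypothetical helper, build_search_url, and applies it to the title 你的名字. The helper name and the English type names are assumptions for illustration; the typeid numbers and the GBK encoding come from the text above.

```python
from urllib.parse import quote

# Type name -> typeid, using the numbers listed in the article.
TYPE_IDS = {
    'movie': '1',
    'drama': '2',
    'variety': '99',
    'old variety': '89',
    'game': '19',
    'animation': '16',
}


def build_search_url(name, video_type):
    """Return the search results URL for a video name and a type name."""
    # Percent-encode the GBK bytes of the name (not the UTF-8 bytes).
    keyword = quote(name, encoding='gbk')
    return ('http://s.ygdy8.com/plus/so.php?typeid=' + TYPE_IDS[video_type]
            + '&keyword=' + keyword)


print(build_search_url('你的名字', 'movie'))
# → http://s.ygdy8.com/plus/so.php?typeid=1&keyword=%C4%E3%B5%C4%C3%FB%D7%D6
```

Each Chinese character becomes two GBK bytes, so the four characters produce the eight percent-escaped bytes seen in the example address above. With the default UTF-8 encoding, quote would emit three bytes per character and the keyword would differ.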

Then we analyze the web page:

Print every video title together with the address of its third-level page:

from selenium import webdriver
from selenium.webdriver.common.by import By


def putUserMessger(url):
    '''
    :param url: video URL
    '''
    this_browser = webdriver.Chrome()
    this_browser.get(url)
    # Select the result links with a CSS selector
    input1 = this_browser.find_elements(By.CSS_SELECTOR, '.co_content8 ul td a')
    for i in input1:
        print(i.text)
        print(i.get_attribute('href'))
    this_browser.close()

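(7) Finding the download link on the third-level page

The download link is located here:

Then it can be fetched with requests together with pyquery.

The download link looks like this: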
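To make the "find the download link" step concrete, here is a minimal sketch that pulls the first link out of the element with id="Zoom", which is what the selector '#Zoom a' in the complete code targets. It uses the standard library's html.parser instead of pyquery so that it runs without extra packages. The class name, the helper first_zoom_link and the sample HTML are made up for illustration, and the depth counting assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser


class ZoomLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside the element with id="Zoom"."""

    # Void elements never get a closing tag, so they must not change the depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside #Zoom; 0 means we are outside it
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Not inside #Zoom yet: only look for its opening tag.
            if attrs.get("id") == "Zoom" and tag not in self.VOID_TAGS:
                self.depth = 1
            return
        if tag not in self.VOID_TAGS:
            self.depth += 1
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in self.VOID_TAGS:
            self.depth -= 1


def first_zoom_link(html):
    """Return the first link found inside #Zoom, or None if there is none."""
    parser = ZoomLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links[0] if parser.links else None


# Made-up sample page: one link inside #Zoom, one outside it.
sample = ('<div id="Zoom"><p>text<br>'
          '<a href="ftp://example.invalid/movie.mkv">download</a></p></div>'
          '<a href="/other">elsewhere</a>')
print(first_zoom_link(sample))
```

Only the link inside the Zoom element is returned; the link after the closing div is ignored because the depth counter has dropped back to zero by then.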

(8) Complete code

No database is used here. The code above is combined with a simple GUI; the complete code for now is as follows:

# encoding: utf-8
from selenium import webdriver
from selenium.webdriver.common.by import By
from urllib.parse import quote
import requests
from pyquery import PyQuery as pq
from tkinter import *


def putUserMessger(url):
    '''
    :param url: video URL
    '''
    last_url = {}
    this_browser = webdriver.Chrome()
    this_browser.get(url)
    # Select the result links with a CSS selector
    input1 = this_browser.find_elements(By.CSS_SELECTOR, '.co_content8 ul td a')
    for i in input1:
        # Store each video's name and address in a dictionary
        last_url[i.text] = i.get_attribute('href')
    this_browser.close()
    return last_url


def download(all_url):
    this_download = {}
    for name, url in all_url.items():
        r = requests.get(url)
        r.encoding = r.apparent_encoding
        doc = pq(r.text)
        # The download link sits inside the element with id="Zoom"
        this_url = doc('#Zoom a')
        this_download[name] = this_url.attr('href')
    return this_download



# typeid of the selected radio button (0 until one is chosen)
video_type = 0


def myRadiobutton():
    global video_type
    video_type = v.get()


def my_all():
    name = var.get()
    ret = quote(name, encoding="gbk")
    url = 'http://s.ygdy8.com/plus/so.php?' + 'typeid=' + str(video_type) + '&keyword=' + ret
    all_url = putUserMessger(url)
    result = download(all_url)
    print(result)


# Create the main window that holds the whole GUI
root = Tk()
# Set the title of the main window
root.title("Video Downloader")
L1 = Label(root, text="Select a type:")
L1.pack(side = TOP)
v = IntVar()
Radiobutton(root, text='Movie', variable=v, command=myRadiobutton,value=1).pack(anchor=W)
Radiobutton(root, text='Drama', variable=v, command=myRadiobutton,value=2).pack(anchor=W)
Radiobutton(root, text='Variety', variable=v, command=myRadiobutton,value=99).pack(anchor=W)
Radiobutton(root, text='Old variety', variable=v, command=myRadiobutton,value=89).pack(anchor=W)
Radiobutton(root, text='Game', variable=v, command=myRadiobutton,value=19).pack(anchor=W)
Radiobutton(root, text='Animation', variable=v, command=myRadiobutton,value=16).pack(anchor=W)

var = StringVar()
L2 = Label(root, text="Enter the video name")
L2.pack(side = LEFT)
E1 = Entry(root, bd=5,textvariable=var)
E1.pack(side = RIGHT)

# place() returns None, so create the button first and position it separately
B = Button(root, text="Click me",command=my_all)
B.place(x=120, y=80)
# Show the window and enter the main event loop
root.mainloop()

The result is as follows:

 


Origin www.cnblogs.com/ITXiaoAng/p/11524648.html