Notes on scraping an unusual streaming site with a Python crawler, and getting tripped up all the way

Today I found a movie and wanted to download it.

First, I opened the browser's Network panel to analyze the traffic:

A preliminary analysis showed that the player pulls the video as TS files. This suggests an m3u8 playlist that indexes hundreds of TS segments, which makes the video quick to load and to fast-forward.
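
For a normal HLS stream, the m3u8 file is a plain-text media playlist: a #EXTM3U header, tag lines starting with #, and one segment URI on every other non-empty line. As a rough sketch, assuming a standard, unencrypted media playlist (which is exactly what this site turned out not to serve), the segment list could be read like this; `parse_media_playlist` is a hypothetical helper name:

```python
from urllib.parse import urljoin


def parse_media_playlist(text, base_url=""):
    """Return the segment URIs listed in an HLS media playlist (m3u8)."""
    lines = [line.strip() for line in text.splitlines()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an m3u8 playlist: missing #EXTM3U header")
    segments = []
    for line in lines[1:]:
        # Tags (#EXTINF, #EXT-X-...) and comments start with '#'; skip blanks too
        if not line or line.startswith("#"):
            continue
        # Any other line is a segment URI, possibly relative to the playlist URL
        segments.append(urljoin(base_url, line))
    return segments


sample = "#EXTM3U\n#EXTINF:10.0,\nseg0.ts\n#EXTINF:10.0,\nseg1.ts\n#EXT-X-ENDLIST\n"
print(parse_media_playlist(sample, "https://example.com/hls/index.m3u8"))
# ['https://example.com/hls/seg0.ts', 'https://example.com/hls/seg1.ts']
```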

 

 

However, when I actually inspected the m3u8 file, I found it was not a valid index file. It seems to be only a loading shell, and the real handling happens elsewhere:

 

 

But analyzing the JavaScript was too much trouble. After several attempts, I discovered the pattern: each video segment is named y8TL59oh4680xxx.ts, where xxx is a sequence number. That makes things much easier!
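
The pattern can be captured in two small helpers. This is a sketch under assumptions: the URL prefix is the one used in this article's crawler, the sequence number is always zero-padded to three digits (which matches names such as y8TL59oh4680000.ts), and `segment_url` and `segment_filename` are hypothetical names. Padding the local file name the same way keeps the files in playback order when they are sorted by name.

```python
# Prefix from the article; "4680" plus a 3-digit sequence number follows it
BASE = "https://www.mmicloud.com/20190406/I1RrJf8s/2000kb/hls/y8TL59oh4680"


def segment_url(seq):
    """URL of segment number seq (0-999), zero-padded to three digits."""
    if not 0 <= seq <= 999:
        raise ValueError("sequence number must fit in three digits")
    return f"{BASE}{seq:03d}.ts"


def segment_filename(seq):
    """Local file name with the same padding, so names sort in play order."""
    return f"{seq:03d}.ts"


print(segment_url(5))       # ...y8TL59oh4680005.ts
print(segment_filename(5))  # 005.ts
```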

I adapted a crawler I had written earlier for downloading music files and ended up with this program:

import requests

# Segment URLs look like .../hls/y8TL59oh<number>.ts
BASE_URL = "https://www.mmicloud.com/20190406/I1RrJf8s/2000kb/hls/y8TL59oh"
HEADERS = {"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"}

def downloadSong(SongID, FileName):
    # Fetch one TS segment, sending a browser User-Agent
    r = requests.get(BASE_URL + str(SongID) + ".ts", headers=HEADERS, timeout=30)
    if r.status_code != 200:
        # Missing segment (e.g. past the end of the movie): skip it
        print(SongID, "failed:", r.status_code)
        return
    # Save the segment as <FileName>.ts in the current directory
    with open(FileName + ".ts", "wb") as file:
        file.write(r.content)
    print(SongID)

# Segments y8TL59oh4680000.ts ... y8TL59oh4680899.ts
for i in range(4680000, 4680900):
    downloadSong(i, str(i))

The program loops over file names from y8TL59oh4680000.ts to y8TL59oh4680899.ts, 900 video files in total.

I set the loop's upper bound at 4680900 because I had found more than 860 segments of the movie, so I download a few extra. If a segment doesn't exist, that request simply comes back empty-handed and the loop moves on, so overshooting does no harm.
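
Instead of guessing an upper bound, the loop could also stop by itself once the segments run out. A minimal sketch, assuming a missing segment can be detected (for example by a non-200 status code); `fetch` and `save` are caller-supplied callbacks and `download_until_missing` is a hypothetical name. A few misses in a row are tolerated so that one failed request does not end the run early:

```python
def download_until_missing(fetch, save, start=0, max_misses=5):
    """Download segments start, start+1, ... until max_misses in a row fail.

    fetch(seq) returns the segment bytes, or None if it does not exist;
    save(seq, data) stores one segment. Returns the number saved.
    """
    seq, misses, saved = start, 0, 0
    while misses < max_misses:
        data = fetch(seq)
        if data is None:
            misses += 1          # count consecutive gaps
        else:
            misses = 0           # a hit resets the gap counter
            save(seq, data)
            saved += 1
        seq += 1
    return saved
```

With requests, `fetch` might call `requests.get(url, timeout=30)` and return `r.content` when `r.status_code == 200`, otherwise `None`.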

So I started it running. It seemed to work well and downloaded files smoothly:

 

 

So I put the task aside and took a break. After about half an hour, it had downloaded more than 300 files:

 

 

Reassured that the crawler was working, I went off to write some other code in VS Code. When I next looked at the taskbar, the crawler had disappeared!

I restarted the crawl, and after a while the same thing happened again! Was the variable i overflowing? I tried to debug it by narrowing the range of i:

import requests

# The prefix now ends in "4680"; SongID is only the last three digits
BASE_URL = "https://www.mmicloud.com/20190406/I1RrJf8s/2000kb/hls/y8TL59oh4680"
HEADERS = {"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"}

def downloadSong(SongID, FileName):
    # Fetch one TS segment, sending a browser User-Agent
    r = requests.get(BASE_URL + str(SongID) + ".ts", headers=HEADERS, timeout=30)
    if r.status_code != 200:
        print(SongID, "failed:", r.status_code)
        return
    # Note: this run saves files as 566.ts, 567.ts, ... rather than 4680566.ts
    with open(FileName + ".ts", "wb") as file:
        file.write(r.content)
    print(SongID)

# Only fetch the remaining segments, 566 to 899
for i in range(566, 900):
    downloadSong(i, str(i))

After debugging, I concluded that the program itself was fine. Apparently, because the console window was minimized, the crawler was being reclaimed to free memory, which made the program exit.
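
One other possibility worth ruling out: a console window opened by double-clicking a .py file closes as soon as the script ends, and an unhandled network error (a dropped connection or a timeout) ends it. A sketch that retries such errors and resumes by skipping segments already on disk; `download_all` and the `fetch` callback are hypothetical names, and the NNN.ts naming is an assumption:

```python
import os
import time


def download_all(seqs, fetch, out_dir, retries=3, delay=2.0):
    """Save each segment to out_dir/NNN.ts, surviving errors and restarts.

    fetch(seq) returns the segment bytes or raises on a network error.
    Returns the sequence numbers that still failed after all retries.
    """
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for seq in seqs:
        path = os.path.join(out_dir, f"{seq:03d}.ts")
        if os.path.exists(path):
            continue  # finished on an earlier run
        for attempt in range(retries):
            try:
                data = fetch(seq)
                break
            except OSError as exc:  # requests' exceptions subclass OSError
                print(f"segment {seq}: attempt {attempt + 1} failed: {exc}")
                time.sleep(delay)
        else:  # no break: every attempt failed
            failed.append(seq)
            continue
        # Write to a temporary name first so a crash never leaves half a file
        tmp = path + ".part"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    return failed
```

With requests, `fetch` would call `requests.get(...)`, then `r.raise_for_status()`, and return `r.content`. Because finished files are skipped, re-running the script continues where it stopped, without editing the range of i by hand.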

All that fuss for nothing!

So I switched to running it with Run Module in the bundled IDLE editor, in an ordinary window left open on screen, where it is less likely to be reclaimed:

 

 

After a while, the crawler finally finished downloading. But one look at the folder showed that something had gone wrong:

 

 

The file names were inconsistent!

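Remember how I narrowed the range of i piecemeal while debugging? That's the cause: the first run saved files under their full numbers (4680000.ts, 4680001.ts, ...), while the second run, whose URL prefix already ended in 4680, saved them as 566.ts, 567.ts, and so on.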

Fine: I selected all the files with long names, right-clicked, chose Rename and named them a. Windows then automatically numbers them a (1), a (2), a (3), a (4), a (5), and so on.

Problem... solved?

 

I took these files named a (1), a (2), a (3), a (4), a (5), ..., transcoded and merged them, which took well over an hour all told. After merging, I discovered:

The file order was completely scrambled!!!

Aaaaaaaargh, damn you, Windows!!!!!!!!!!
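
One likely contributor to the scrambled order, if the merge step read the files in plain string order: as text, a (10) sorts before a (2). A small demonstration with a natural-sort key (a common idiom, not something the article used; `natural_key` is a hypothetical name):

```python
import re


def natural_key(name):
    """Split a name into text and number chunks so 'a (10)' sorts after 'a (9)'."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd indices are always digit runs and can be compared as integers
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(re.split(r"(\d+)", name))]


names = ["a (1).ts", "a (2).ts", "a (10).ts", "a (100).ts"]
print(sorted(names))                   # ['a (1).ts', 'a (10).ts', 'a (100).ts', 'a (2).ts']
print(sorted(names, key=natural_key))  # ['a (1).ts', 'a (2).ts', 'a (10).ts', 'a (100).ts']
```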

 

Nothing for it: with nowhere to vent my frustration, I could only keep writing code...

Fortunately, I had not deleted the folder from before the rename, so I wrote a batch-rename program in Python:

import os

# Directory containing this script, and the data/ folder of downloaded segments
PROJECT_DIR_PATH = os.path.dirname(os.path.abspath(__file__))
DIR_PATH = os.path.join(PROJECT_DIR_PATH, 'data')
files = os.listdir(DIR_PATH)
for filename in files:
    name, suffix = os.path.splitext(filename)
    # Keep only the last three digits: "4680123" -> "123"
    # (the extension in suffix is NOT re-attached here)
    new_name = os.path.join(DIR_PATH, name[4:7])
    old_name = os.path.join(DIR_PATH, filename)
    os.rename(old_name, new_name)

Put the files into this data directory next to the script, and the program above can be run on them:

 

I happily ran the program and the renaming worked, but the files had no extension, because the new name was built from name[4:7] without adding the suffix back...

 

 

Oops! Time for a fix-up program:

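import os

# Directory containing this script, and the data/ folder of renamed segments
PROJECT_DIR_PATH = os.path.dirname(os.path.abspath(__file__))
DIR_PATH = os.path.join(PROJECT_DIR_PATH, 'data')
files = os.listdir(DIR_PATH)
for filename in files:
    name, suffix = os.path.splitext(filename)
    if suffix:
        continue  # already has an extension, leave it alone
    # Re-attach the extension lost by the previous script: "123" -> "123.ts"
    new_name = os.path.join(DIR_PATH, filename + ".ts")
    old_name = os.path.join(DIR_PATH, filename)
    os.rename(old_name, new_name)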
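
For reference, the two passes could be folded into one that keeps the extension. A sketch with pathlib, assuming the seven-digit names from the first run (4680000.ts and so on); `shorten_names` is a hypothetical name. It only touches all-digit names longer than three characters, so running it twice is harmless:

```python
from pathlib import Path


def shorten_names(dir_path, keep=3):
    """Rename e.g. 4680123.ts to 123.ts in dir_path, keeping the extension."""
    renamed = []
    # sorted() lists the directory up front, before any file is renamed
    for path in sorted(Path(dir_path).iterdir()):
        stem = path.stem
        if not (path.is_file() and stem.isdigit() and len(stem) > keep):
            continue  # already short, or not a numbered segment
        target = path.with_name(stem[-keep:] + path.suffix)
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        path.rename(target)
        renamed.append(target.name)
    return renamed
```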

I nervously ran it, and the directory finally looked normal:

 

Then I transcoded and merged everything again, which took more than an hour once more. Finally, at long last, I had the fruits of victory:
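
As a possible alternative to a separate merge tool: MPEG-TS segments from one HLS stream can usually be joined by plain byte concatenation, since TS is a stream of fixed-size packets. A sketch, assuming the folder holds numbered, unencrypted segments such as 000.ts ... 899.ts; `merge_segments` is a hypothetical name. Sorting by the integer value of each name sidesteps the ordering problem whether or not the names are zero-padded. If a player stumbles over timestamps in the result, remuxing it with a tool such as ffmpeg is the usual next step.

```python
import shutil
from pathlib import Path


def merge_segments(seg_dir, out_file):
    """Append every numbered .ts file in seg_dir to out_file in numeric order."""
    # Collect and sort first, so out_file is never picked up as an input
    segments = sorted(
        (p for p in Path(seg_dir).glob("*.ts") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )
    with open(out_file, "wb") as out:
        for seg in segments:
            with open(seg, "rb") as f:
                shutil.copyfileobj(f, out)  # stream instead of loading whole files
    return len(segments)
```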

 

 

That was way too hard!

 

Downloading this one movie took me a whole day: the morning and early afternoon went to finding a source, the rest of the afternoon to writing the crawler and crawling the files, and the evening to wrestling with renaming and transcoding. In that time I could have watched six or seven movies. ε = ('ο `*))) Sigh...

Enough said; I'll have to watch the movie tomorrow. Good night, everybody!

 


Source: www.cnblogs.com/lyj00912/p/12630122.html