Getting a movie with a crawler

"Yesterday I crawled for love; today I crawl for movies. This chapter walks you through crawling a movie."

Note: This is for non-profit use only. Please do not misuse it.

I won't provide the address of the movie site; there are plenty of them on the Internet.

The libraries involved in this article, such as m3u8, need to be installed with pip. A plain pip install is not recommended because the download tends to fail (try it yourself); I specified a package source for the download instead.


Tutorial

The usual routine: capture the network traffic.

Only the .m3u8 links matter here. The first .m3u8 request is an advertisement; you can download it yourself if you like.

Go straight to the second one, inspect the request, and copy its address.

To parse the m3u8, send a request to it and look at the response:

import requests

# Address of the second .m3u8 request found in the captured traffic
url = 'https://www.ldxmcloud.com/20230224/McubV3Kc/1100kb/hls/index.m3u8'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.62'
}
# Fetch the playlist and print its text
reslut = requests.get(url=url, headers=headers)
reslut.encoding = 'utf-8'
print(reslut.text)

This returns a list of .ts segment entries. Sometimes the entries are disguised: here, for example, they are .jpg files. That's fine; open one and it loads as a .ts file.

"But note that the suffix is .jpg, so we need to change it later."

Then we use a regular expression to match all the https://....jpg addresses.

If you don't know regular expressions, I won't teach them here. Please learn them on your own.

import re
res = re.findall('(https:.*jpg)', reslut.text)
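As a quick check of what this pattern matches, here is a minimal sketch run against a made-up playlist. The example.com addresses and segment names are placeholders, not the real site's. Because `.` does not match a newline by default, each match stays within one line.

```python
import re

# Hypothetical playlist text; a real playlist will have different hosts and names.
sample_playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
https://example.com/hls/a1b2c3d4.jpg
#EXTINF:4.0,
https://example.com/hls/e5f6a7b8.jpg
#EXT-X-ENDLIST
"""

# Same pattern as in the article: from "https:" up to the last "jpg" on each line.
segment_urls = re.findall('(https:.*jpg)', sample_playlist)

for segment_url in segment_urls:
    print(segment_url)
```

The tag lines starting with `#` are skipped simply because they do not contain `https:`.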

OK, now we have all the .jpg addresses. Next, request each one.

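import os

os.makedirs('report', exist_ok=True)  # open() fails if the directory is missing
for value in res:
    # Download one segment; the address ends in .jpg but the body is .ts data
    video = requests.get(url=value, headers=headers)
    # Name the file after the 8 characters before ".jpg" and save it as .ts
    with open('report/' + value[-12:-4] + '.ts', 'ab+') as w:
        w.write(video.content)
        print("Loaded successfully:", value)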

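The loop requests each .jpg address in turn. Note that the saved file's extension is changed to .ts. The file is opened in ab+ mode; nothing else is special, just write the content as usual. Because ab+ appends, downloading a segment that already exists on disk writes its data a second time.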
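The slice `value[-12:-4]` only works when every segment name is exactly eight characters long and the address ends in ".jpg". A slightly more defensive sketch, assuming the segment name is the last path component of the address, derives the .ts name from the address itself. The helper name `ts_name` and the example.com addresses are illustrative only.

```python
import posixpath
from urllib.parse import urlparse

def ts_name(url):
    """Return the last path component of url with its extension replaced by .ts."""
    path = urlparse(url).path              # drops any ?query or #fragment
    base = posixpath.basename(path)        # e.g. "a1b2c3d4.jpg"
    stem, _ext = posixpath.splitext(base)  # e.g. ("a1b2c3d4", ".jpg")
    return stem + '.ts'

print(ts_name('https://example.com/hls/a1b2c3d4.jpg'))
```

In the loop, this would replace `value[-12:-4] + '.ts'` with `ts_name(value)`.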

Now the .ts files appear one by one in my report directory.


A single thread is a bit slow, and sometimes a request hangs; if that happens, stop the script and run it again.
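One way to address both the speed and the hanging requests is a thread pool with a per-request timeout. The following is a minimal sketch under assumptions, not the author's code: `fetch_bytes` and `download_all` are hypothetical helpers, the worker count and timeout are arbitrary, and file names follow the same `value[-12:-4]` rule as the loop above.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def fetch_bytes(url, headers=None, timeout=10):
    """Download one URL with requests; the timeout stops a stalled request from hanging."""
    import requests  # imported here so the rest of the sketch runs without it
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.content

def download_all(urls, out_dir='report/', fetch=fetch_bytes, max_workers=8):
    """Download every URL in parallel and save each one as <name>.ts in out_dir."""
    os.makedirs(out_dir, exist_ok=True)

    def save_one(url):
        data = fetch(url)
        target = os.path.join(out_dir, url[-12:-4] + '.ts')
        # 'wb' overwrites, so re-running does not append duplicate data.
        with open(target, 'wb') as f:
            f.write(data)
        return target

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order; a worker's exception is
        # re-raised when its result is read.
        return list(pool.map(save_one, urls))
```

With the variables from the article, a call might look like `download_all(res, fetch=lambda u: fetch_bytes(u, headers=headers))`.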

"I didn't finish downloading every segment here; this is just an example. Run it and try it yourself. Multi-threading is recommended."

Finally, the key step is merging the .ts files into a single MP4 file. There are plenty of examples on Baidu; just pick one.

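from tqdm import tqdm
import os

path = 'report/'
# Keep only the .ts segments and sort them; os.listdir() returns names in
# arbitrary order. This assumes the names sort into playback order.
files = sorted(f for f in os.listdir(path) if f.endswith('.ts'))
print(files)
for file in tqdm(files, desc="Converting video format:"):
    if os.path.exists(path + file):
        # Append this segment's bytes to the output file
        with open(path + file, 'rb') as f1:
            with open(path + "movie.mp4", 'ab') as f2:
                f2.write(f1.read())
    else:
        print("Failed")

"tqdm is optional; it's just a progress bar. It is a third-party package, so you need to install it with pip."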

After running it, you will see an MP4 file that plays smoothly.
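The merge concatenates segments in the order they are listed, so that order decides the playback order, and os.listdir() makes no guarantee about it. If segment names contain unpadded numbers, plain string sorting also puts seg10.ts before seg2.ts. The sketch below shows a natural sort key as one possible remedy; `natural_key` and the seg*.ts names are made up for illustration.

```python
import re

def natural_key(name):
    """Split a file name into text and integer chunks so 'seg10' sorts after 'seg2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', name)]

names = ['seg10.ts', 'seg2.ts', 'seg1.ts']
print(sorted(names))                   # ['seg1.ts', 'seg10.ts', 'seg2.ts']
print(sorted(names, key=natural_key))  # ['seg1.ts', 'seg2.ts', 'seg10.ts']
```

In the merge script this would be used as `sorted(os.listdir(path), key=natural_key)`.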


Note: The .m3u8 here is not encrypted. Many movies are actually encrypted and have to be decrypted first. For that, look into the m3u8 library.
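In an HLS playlist, encryption is declared with an #EXT-X-KEY tag, and a METHOD of NONE means the segments are not encrypted. As a rough way to tell which case you are in, the sketch below scans the playlist text for that tag. It only detects encryption and does not decrypt anything; `encryption_methods` and the sample playlists are hypothetical.

```python
import re

def encryption_methods(playlist_text):
    """Return the METHOD value of every #EXT-X-KEY tag found in the playlist text."""
    return re.findall(r'^#EXT-X-KEY:.*?METHOD=([A-Z0-9-]+)',
                      playlist_text, flags=re.MULTILINE)

# Made-up playlists for illustration.
plain = "#EXTM3U\n#EXTINF:4.0,\nhttps://example.com/hls/a1b2c3d4.jpg\n"
encrypted = ('#EXTM3U\n'
             '#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/hls/key.key"\n'
             '#EXTINF:4.0,\nseg0.ts\n')

print(encryption_methods(plain))      # []
print(encryption_methods(encrypted))  # ['AES-128']
```

With the article's variables, `encryption_methods(reslut.text)` returning an empty list (or only 'NONE') would match the unencrypted case described here.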

Finally, the Gitee address. The code is there, single-threaded and packaged: https://gitee.com/qinganan_admin/reptile-case/tree/master/%E7%94%B5%E5%BD%B1


Origin blog.csdn.net/weixin_52040868/article/details/130023787