Python Exercise 0009 - 代码天地

Other 2019-04-16 10:51:51

Copyright notice: if you like this post, give it a like; if you have any questions, leave a comment. https://blog.csdn.net/m0_38015368/article/details/89298260

Problem

Given an HTML file, find the links it contains.

Code

import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Absolute URL, e.g. http://www.baidu.com/path/index.php?q=1 (the scheme is optional)
ABSOLUTE_URL_PATTERN = re.compile(r'((http|https|ftp)://)?(\w+)(\.\w+)+', re.IGNORECASE)
# Relative path, e.g. ./index.php
RELATIVE_URL_PATTERN = re.compile(r'\.?(/\w+)+/?')


def get_html_text(url):
    """Fetch the page at url and return its text, or '' if the request fails."""
    try:
        r = requests.get(url, timeout=10)
        r.encoding = 'utf-8'
        return r.text
    except requests.RequestException:
        return ''


def get_urls(html, base_url):
    """Return the set of links found in the <a> tags of html."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = set()
    # href=True skips <a> tags that have no href attribute
    for a in soup.find_all('a', href=True):
        url = a['href']
        if ABSOLUTE_URL_PATTERN.match(url):
            urls.add(url)
        elif RELATIVE_URL_PATTERN.match(url):
            # Resolve the relative path against the page URL
            urls.add(urljoin(base_url, url))
    return urls


def show(urls):
    for url in urls:
        print(url)

if __name__ == '__main__':
    url = 'http://zzzsdust.com'
    html = get_html_text(url)
    urls = get_urls(html, url)
    show(urls)
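
The code above needs network access and two third-party packages. As a minimal offline sketch of the same idea, the block below collects links from an HTML string using only the standard library. The names `LinkCollector`, `extract_links`, and `SAMPLE_HTML` are hypothetical, and `example.com` is a placeholder domain. Unlike the regex checks above, it keeps every non-empty href and leaves relative-path resolution to `urljoin`, so it is an alternative approach, not the original author's method.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>
            if name == 'href' and value:
                self.links.add(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the set of absolute links found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page; example.com is a placeholder domain.
SAMPLE_HTML = '''
<a href="http://example.com/a">absolute</a>
<a href="./index.php">relative</a>
<a href="/path/page">root-relative</a>
<a name="anchor-without-href">no link</a>
'''

if __name__ == '__main__':
    for link in sorted(extract_links(SAMPLE_HTML, 'http://example.com')):
        print(link)
```

To run it on a file saved to disk, read the file into a string first and pass the URL the page was originally served from as `base_url`, so that relative links resolve correctly.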

Reposted from blog.csdn.net/m0_38015368/article/details/89298260