Writing an image crawler in Swift with ScrapeKit

The following is an image crawler written in Swift using ScrapeKit. It also takes the address of a proxy API from which a proxy could be obtained.

import Foundation
import ScrapeKit

// Crawls image URLs from a page. Note: the HTML-parsing calls below
// (HTML(html:encoding:), css(_:), attr(_:)) follow a Kanna-style API;
// adjust them to whatever parser your ScrapeKit version actually exposes.
class PeopleImageCrawler: NSObject, ScrapeKit.Crawler {
    let url: URL
    let proxyUrl: URL

    init(url: URL, proxyUrl: URL) {
        self.url = url
        self.proxyUrl = proxyUrl
        super.init()
    }

    func crawl() -> [String: Any] {
        var images = [String]()

        // try? yields optionals, so unwrap the page source before
        // handing it to the parser instead of passing a String? through.
        guard let html = try? String(contentsOf: url, encoding: .utf8),
              let doc = try? HTML(html: html, encoding: .utf8) else {
            return ["images": images]
        }

        // Collect the lazy-loading data-src attribute of every <img> tag.
        for imgElem in doc.css("img") {
            if let imgUrl = imgElem.attr("data-src") {
                images.append(imgUrl)
            }
        }

        return ["images": images]
    }
}

let targetUrl = URL(string: "https://www.people.com.cn")!
// Proxy API endpoint; note that crawl() does not route traffic through it yet.
let proxyUrl = URL(string: "https://www.duoip.cn/get_proxy")!

let crawler = PeopleImageCrawler(url: targetUrl, proxyUrl: proxyUrl)
let result = crawler.crawl()

print(result)

This program first imports the ScrapeKit library and then defines a class called PeopleImageCrawler, which conforms to the ScrapeKit.Crawler protocol. An initializer takes the target URL and the proxy URL. In the crawl method, we parse the HTML document and find all <img> tags; for each one found, we append the value of its data-src attribute to the images array. Finally, we return the images array under the "images" key of a dictionary.
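The extraction step can also be illustrated without any HTML-parsing library. The sketch below uses only Foundation's NSRegularExpression to pull data-src values out of a raw HTML string; it is a simplified stand-in for what crawl() does, not production-grade HTML parsing (a regex will miss attributes quoted differently or split across lines):

```swift
import Foundation

// Extract data-src attribute values from <img> tags with a regular
// expression -- a parser-free sketch of the extraction step in crawl().
func imageUrls(in html: String) -> [String] {
    let pattern = "<img[^>]*data-src=\"([^\"]*)\""
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []
    }
    let range = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, range: range).compactMap { match in
        // Capture group 1 holds the attribute value.
        Range(match.range(at: 1), in: html).map { String(html[$0]) }
    }
}

let sample = "<p><img data-src=\"https://example.com/a.jpg\"></p>"
print(imageUrls(in: sample))  // ["https://example.com/a.jpg"]
```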

At the top level of the script, we create a targetUrl and a proxyUrl and instantiate a PeopleImageCrawler with them. We then call crawler.crawl() to start crawling and print the result. Note that the sample stores proxyUrl but never actually routes the request through the proxy.

Origin blog.csdn.net/weixin_73725158/article/details/133981306