Anyone who writes crawlers has probably come across browser automation tools such as Selenium or Puppeteer. Of the two, I use Selenium more. It is a tool for testing web applications, and its tests run directly in the browser, just as a real user would operate it. That means we can use Selenium to drive the browser like a user and crawl data.
I used Python for this before; today let's try Selenium with Go.
Install
The client library I currently use is github.com/tebeka/sele… , which is fairly complete and still maintained.
go get -t -d github.com/tebeka/selenium
In addition, we need to install the WebDriver that matches the browser we want to drive: Google Chrome needs ChromeDriver, and Firefox needs geckodriver.
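Before starting a WebDriver service, it can help to confirm that the right driver executable is actually installed. The following is a minimal sketch using only the Go standard library. The helper name driverBinary is made up for illustration, and it assumes the drivers are installed under their usual executable names (chromedriver and geckodriver) somewhere on PATH.

```go
package main

import (
	"fmt"
	"os/exec"
)

// driverBinary maps a browser name to the WebDriver executable it needs.
// The helper and its name are illustrative, not part of any library.
func driverBinary(browser string) (string, error) {
	switch browser {
	case "chrome":
		return "chromedriver", nil
	case "firefox":
		return "geckodriver", nil
	default:
		return "", fmt.Errorf("no known WebDriver for browser %q", browser)
	}
}

func main() {
	for _, browser := range []string{"chrome", "firefox"} {
		bin, err := driverBinary(browser)
		if err != nil {
			fmt.Println(err)
			continue
		}
		// exec.LookPath searches the directories listed in PATH.
		path, err := exec.LookPath(bin)
		if err != nil {
			fmt.Printf("%s: %s not found on PATH\n", browser, bin)
			continue
		}
		fmt.Printf("%s: using %s\n", browser, path)
	}
}
```

The path printed for Chrome is the kind of value that would go into a constant such as chromeDriverPath.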
Example
Here we use Google Chrome. We first specify the location of ChromeDriver and start a WebDriver server; after that we can start operating the browser.
package main

import (
	"fmt"
	"os"
	"strings"
	"time"

	"github.com/tebeka/selenium"
)

const (
	chromeDriverPath = "/path/to/chromedriver"
	port             = 8080
)

func main() {
	// Start a WebDriver server instance
	opts := []selenium.ServiceOption{
		selenium.Output(os.Stderr), // Output debug information to STDERR.
	}
	selenium.SetDebug(true)
	service, err := selenium.NewChromeDriverService(chromeDriverPath, port, opts...)
	if err != nil {
		panic(err) // panic is used only as an example and is not otherwise recommended.
	}
	defer service.Stop()

	// Connect to the WebDriver instance running locally.
	caps := selenium.Capabilities{"browserName": "chrome"}
	wd, err := selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", port))
	if err != nil {
		panic(err)
	}
	defer wd.Quit()

	// Navigate to the simple playground interface.
	if err := wd.Get("http://play.golang.org/?simple=1"); err != nil {
		panic(err)
	}

	// Get a reference to the text box containing code.
	elem, err := wd.FindElement(selenium.ByCSSSelector, "#code")
	if err != nil {
		panic(err)
	}
	// Remove the boilerplate code already in the text box.
	if err := elem.Clear(); err != nil {
		panic(err)
	}

	// Enter some new code in text box.
	err = elem.SendKeys(`
		package main
		import "fmt"
		func main() {
			fmt.Println("Hello WebDriver!")
		}
	`)
	if err != nil {
		panic(err)
	}

	// Click the run button.
	btn, err := wd.FindElement(selenium.ByCSSSelector, "#run")
	if err != nil {
		panic(err)
	}
	if err := btn.Click(); err != nil {
		panic(err)
	}

	// Wait for the program to finish running and get the output.
	outputDiv, err := wd.FindElement(selenium.ByCSSSelector, "#output")
	if err != nil {
		panic(err)
	}

	var output string
	for {
		output, err = outputDiv.Text()
		if err != nil {
			panic(err)
		}
		if output != "Waiting for remote server..." {
			break
		}
		time.Sleep(time.Millisecond * 100)
	}

	fmt.Printf("%s", strings.Replace(output, "\n\n", "\n", -1))

	// Example Output:
	// Hello WebDriver!
	//
	// Program exited.
}
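One thing to watch in the example is the final loop: it polls the output element every 100 ms with no upper bound, so a crawler would hang forever if the page never leaves the "Waiting for remote server..." state. Below is a rough sketch of the same polling with a deadline, written against the standard library only. pollUntil and errTimeout are hypothetical names, and the read function in main is a stand-in for a call like outputDiv.Text, so the sketch runs without a browser.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the condition is not met before the deadline.
var errTimeout = errors.New("timed out waiting for condition")

// pollUntil calls read every interval until done reports true for the value,
// read returns an error, or timeout has elapsed. Hypothetical helper.
func pollUntil(read func() (string, error), done func(string) bool, interval, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		value, err := read()
		if err != nil {
			return "", err
		}
		if done(value) {
			return value, nil
		}
		if time.Now().After(deadline) {
			return value, errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	// Stand-in for outputDiv.Text: returns the placeholder twice, then a result.
	calls := 0
	read := func() (string, error) {
		calls++
		if calls < 3 {
			return "Waiting for remote server...", nil
		}
		return "Hello WebDriver!\n\nProgram exited.", nil
	}
	done := func(s string) bool { return s != "Waiting for remote server..." }

	out, err := pollUntil(read, done, 10*time.Millisecond, time.Second)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

In the real program, read would wrap outputDiv.Text and the timeout would be chosen to suit how long the target page normally takes to respond.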
Summary
It is not very complicated to use, but Selenium with Go does not seem to be very popular: github.com/tebeka/sele... has only 1k+ stars on GitHub.