Web crawling with Go and Chrome (macOS)

Copyright notice: this is an original article. Reposting is welcome, but please credit the source: https://blog.csdn.net/wljk506/article/details/82907480

Everything in this article was done on macOS.

Introduction to Chrome headless mode

Headless mode was added in Chrome 59. It lets you run the Chrome browser without opening any UI, so pages load and behave the same way they do in regular Chrome.
Official:
https://developers.google.cn/web/updates/2017/04/headless-chrome
Other:
https://linux.cn/article-8850-1.html
https://www.cnblogs.com/fnng/p/7797839.html
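To make the idea concrete, here is a minimal Go sketch that starts Chrome without a UI and reads back the rendered DOM. The --headless, --disable-gpu, and --dump-dom flags are the ones shown in the official headless Chrome article linked above. The binary path is the default macOS install location (an assumption, adjust it if yours differs), and the helper name headlessArgs is made up for this illustration.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// chromePath is the default macOS install location (an assumption; adjust it
// if Chrome lives elsewhere on your machine).
const chromePath = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"

// headlessArgs builds the command-line arguments for a headless run that
// writes the DOM of url to standard output.
func headlessArgs(url string) []string {
	return []string{"--headless", "--disable-gpu", "--dump-dom", url}
}

func main() {
	if _, err := os.Stat(chromePath); err != nil {
		fmt.Println("Chrome not found at", chromePath)
		return
	}

	// Give up after 30 seconds so a stuck browser does not hang the program.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	cmd := exec.CommandContext(ctx, chromePath, headlessArgs("https://www.baidu.com")...)
	out, err := cmd.Output()
	if err != nil {
		fmt.Println("headless run failed:", err)
		return
	}
	fmt.Printf("received %d bytes of HTML\n", len(out))
}
```

This only shells out to the browser. chromedp, installed later in this article, drives Chrome over its debugging protocol instead, which is what you want for anything beyond a one-off dump.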

Prerequisites

1. Go environment and environment variables

https://blog.csdn.net/fenglailea/article/details/70216964

2. Required packages (mainly the ones blocked by the firewall)

mkdir -p $GOPATH/src/golang.org/x
cd $GOPATH/src/golang.org/x
git clone https://github.com/golang/net.git # this is the net package
git clone https://github.com/golang/crypto.git # this is the crypto package
git clone https://github.com/golang/sys.git
git clone https://github.com/golang/mobile.git
git clone https://github.com/golang/text.git
git clone https://github.com/golang/tools.git
git clone https://github.com/golang/image.git
... and so on; clone whichever other packages you need
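The pattern behind these commands is that each golang.org/x/NAME package is cloned from github.com/golang/NAME into $GOPATH/src/golang.org/x/NAME. A small Go sketch of that mapping, in case you want to generate the clone commands for other packages; the function name mirror is hypothetical and exists only for this example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mirror returns the GitHub clone URL and the GOPATH-relative destination for
// a golang.org/x/... import path. ok is false for any other import path.
func mirror(importPath string) (repoURL, dest string, ok bool) {
	const prefix = "golang.org/x/"
	if !strings.HasPrefix(importPath, prefix) {
		return "", "", false
	}
	// Keep only the repository name: golang.org/x/net/html -> net.
	name := strings.SplitN(strings.TrimPrefix(importPath, prefix), "/", 2)[0]
	if name == "" {
		return "", "", false
	}
	return "https://github.com/golang/" + name + ".git",
		path.Join("src", "golang.org", "x", name), true
}

func main() {
	paths := []string{
		"golang.org/x/net/html",
		"golang.org/x/crypto",
		"github.com/chromedp/chromedp",
	}
	for _, p := range paths {
		if repoURL, dest, ok := mirror(p); ok {
			fmt.Printf("git clone %s $GOPATH/%s\n", repoURL, dest)
		} else {
			fmt.Printf("%s: not a golang.org/x package, no mirror needed\n", p)
		}
	}
}
```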

3. Install Chrome and set up shell aliases

Install Chrome

https://www.google.cn/intl/zh-CN/chrome/
If this site is blocked for you, use another download source.

Set up the aliases

Source:
https://developers.google.cn/web/updates/2017/04/headless-chrome

vim ~/.bash_profile

Add the following at the end of the file:

alias chrome="/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome"
alias chrome-canary="/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary"
alias chromium="/Applications/Chromium.app/Contents/MacOS/Chromium"

Apply the changes:

source ~/.bash_profile 
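One caveat: aliases exist only inside the interactive shell, so a Go program that starts a process through os/exec cannot use them and needs the real binary path. A minimal sketch that probes the three locations used by the aliases above and reports the first one present; firstExisting and fileExists are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
)

// candidates are the three binary locations used by the aliases above.
var candidates = []string{
	"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
	"/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary",
	"/Applications/Chromium.app/Contents/MacOS/Chromium",
}

// firstExisting returns the first path for which exists reports true.
// The check is passed in as a function so it can be swapped out in tests.
func firstExisting(paths []string, exists func(string) bool) (string, bool) {
	for _, p := range paths {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists reports whether path names a regular file on disk.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular()
}

func main() {
	if p, ok := firstExisting(candidates, fileExists); ok {
		fmt.Println("using browser binary:", p)
	} else {
		fmt.Println("no Chrome, Chrome Canary, or Chromium binary found")
	}
}
```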

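4. Xcode

Xcode must be installed on macOS; without it you will get an error (see the FAQ).
Search for Xcode in the App Store and install it (note: it is a very large download, so it takes a while).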

Install chromedp

go get -u github.com/chromedp/chromedp

Official examples

https://github.com/chromedp/chromedp

go get -u -d github.com/chromedp/examples

Command format for running an example

go run $GOPATH/src/github.com/chromedp/examples/xxxxxx/main.go

xxxxxx: the name of the example directory

For the available directories, see the official list: https://github.com/chromedp/examples#available-examples
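If you would rather build that path in code than type it, the sketch below does so for a given example name. It assumes the GOPATH layout used in this article; when GOPATH holds several entries it uses the first one, and when GOPATH is unset it falls back to $HOME/go, the Go tool's default. examplePath is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// examplePath returns the main.go path of the named example. gopath may hold
// several entries separated by the OS list separator; the first one is used.
func examplePath(gopath, name string) string {
	root := ""
	if entries := filepath.SplitList(gopath); len(entries) > 0 {
		root = entries[0]
	}
	return filepath.Join(root, "src", "github.com", "chromedp", "examples", name, "main.go")
}

func main() {
	gopath := os.Getenv("GOPATH")
	if gopath == "" {
		// The go tool uses $HOME/go when GOPATH is not set.
		home, err := os.UserHomeDir()
		if err != nil {
			fmt.Println("cannot determine home directory:", err)
			return
		}
		gopath = filepath.Join(home, "go")
	}

	p := examplePath(gopath, "eval")
	if _, err := os.Stat(p); err != nil {
		fmt.Println("example not found:", p)
		return
	}
	fmt.Println("go run", p)
}
```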

Running an example

go run $GOPATH/src/github.com/chromedp/examples/eval/main.go

Output:

2018/10/05 11:29:40 <- {"id":1,"method":"Log.enable","params":{}}
2018/10/05 11:29:40 -> {"id":1,"result":{}}
2018/10/05 11:29:40 <- {"id":2,"method":"Runtime.enable","params":{}}
2018/10/05 11:29:40 -> {"method":"Runtime.executionContextCreated","params":{"context":{"id":1,"origin":"://","name":"","auxData":{"isDefault":true,"frameId":"E1EA547C788F3E5457AB5D1B4FB25D83"}}}}
2018/10/05 11:29:40 -> {"id":2,"result":{}}
2018/10/05 11:29:40 <- {"id":3,"method":"Inspector.enable","params":{}}
2018/10/05 11:29:40 -> {"id":3,"result":{}}
2018/10/05 11:29:40 <- {"id":4,"method":"Page.enable","params":{}}
2018/10/05 11:29:40 -> {"id":4,"result":{}}
2018/10/05 11:29:40 <- {"id":5,"method":"DOM.enable","params":{}}
2018/10/05 11:29:40 -> {"id":5,"result":{}}
2018/10/05 11:29:40 <- {"id":6,"method":"CSS.enable","params":{}}
2018/10/05 11:29:40 -> {"id":6,"result":{}}
2018/10/05 11:29:40 <- {"id":7,"method":"Page.getResourceTree","params":{}}
2018/10/05 11:29:40 -> {"id":7,"result":{"frameTree":{"frame":{"id":"E1EA547C788F3E5457AB5D1B4FB25D83","loaderId":"37D3BC533258DA6D9BE27DDC937D8FAF","url":"about:blank","securityOrigin":"://","mimeType":"text/html"},"resources":[]}}}
2018/10/05 11:29:40 <- {"id":8,"method":"DOM.getDocument","params":{"pierce":true}}
2018/10/05 11:29:40 -> {"id":8,"result":{"root":{"nodeId":1,"backendNodeId":1,"nodeType":9,"nodeName":"#document","localName":"","nodeValue":"","childNodeCount":1,"children":[{"nodeId":2,"parentId":1,"backendNodeId":2,"nodeType":1,"nodeName":"HTML","localName":"html","nodeValue":"","childNodeCount":2,"children":[{"nodeId":3,"parentId":2,"backendNodeId":3,"nodeType":1,"nodeName":"HEAD","localName":"head","nodeValue":"","childNodeCount":0,"attributes":[]},{"nodeId":4,"parentId":2,"backendNodeId":4,"nodeType":1,"nodeName":"BODY","localName":"body","nodeValue":"","childNodeCount":0,"attributes":[]}],"attributes":[],"frameId":"E1EA547C788F3E5457AB5D1B4FB25D83"}],"documentURL":"about:blank","baseURL":"about:blank","xmlVersion":""}}}
2018/10/05 11:29:40 <- {"id":9,"method":"Page.navigate","params":{"url":"https://www.baidu.com"}}
2018/10/05 11:29:41 -> {"id":9,"result":{"frameId":"E1EA547C788F3E5457AB5D1B4FB25D83","loaderId":"68AE558838E29AB64CFD2C5506D46BBE"}}
2018/10/05 11:29:41 <- {"id":10,"method":"Runtime.evaluate","params":{"expression":"Object.keys(window);","returnByValue":true}}
2018/10/05 11:29:41 -> {"method":"Page.frameStartedLoading","params":{"frameId":"E1EA547C788F3E5457AB5D1B4FB25D83"}}
2018/10/05 11:29:41 -> {"method":"Runtime.executionContextDestroyed","params":{"executionContextId":1}}
2018/10/05 11:29:41 -> {"method":"Runtime.executionContextsCleared","params":{}}
2018/10/05 11:29:41 -> {"method":"Page.frameNavigated","params":{"frame":{"id":"E1EA547C788F3E5457AB5D1B4FB25D83","loaderId":"68AE558838E29AB64CFD2C5506D46BBE","url":"https://www.baidu.com/","securityOrigin":"https://www.baidu.com","mimeType":"text/html"}}}
2018/10/05 11:29:41 -> {"method":"Runtime.executionContextCreated","params":{"context":{"id":2,"origin":"https://www.baidu.com","name":"","auxData":{"isDefault":true,"frameId":"E1EA547C788F3E5457AB5D1B4FB25D83"}}}}
2018/10/05 11:29:41 -> {"method":"DOM.documentUpdated","params":{}}
2018/10/05 11:29:41 <- {"id":11,"method":"DOM.getDocument","params":{"pierce":true}}
2018/10/05 11:29:41 -> {"id":10,"result":{"result":{"type":"object","value":["postMessage","blur","focus","close","frames","self","window","parent","opener","top","length","closed","location","document","origin","name","history","locationbar","menubar","personalbar","scrollbars","statusbar","toolbar","status","frameElement","navigator","customElements","external","screen","innerWidth","innerHeight","scrollX","pageXOffset","scrollY","pageYOffset","screenX","screenY","outerWidth","outerHeight","devicePixelRatio","clientInformation","screenLeft","screenTop","defaultStatus","defaultstatus","styleMedia","onanimationend","onanimationiteration","onanimationstart","onsearch","ontransitionend","onwebkitanimationend","onwebkitanimationiteration","onwebkitanimationstart","onwebkittransitionend","isSecureContext","onabort","onblur","oncancel","oncanplay","oncanplaythrough","onchange","onclick","onclose","oncontextmenu","oncuechange","ondblclick","ondrag","ondragend","ondragenter","ondragleave","ondragover","ondragstart","ondrop","ondurationchange","onemptied","onended","onerror","onfocus","oninput","oninvalid","onkeydown","onkeypress","onkeyup","onload","onloadeddata","onloadedmetadata","onloadstart","onmousedown","onmouseenter","onmouseleave","onmousemove","onmouseout","onmouseover","onmouseup","onmousewheel","onpause","onplay","onplaying","onprogress","onratechange","onreset","onresize","onscroll","onseeked","onseeking","onselect","onstalled","onsubmit","onsuspend","ontimeupdate","ontoggle","onvolumechange","onwaiting","onwheel","onauxclick","ongotpointercapture","onlostpointercapture","onpointerdown","onpointermove","onpointerup","onpointercancel","onpointerover","onpointerout","onpointerenter","onpointerleave","onafterprint","onbeforeprint","onbeforeunload","onhashchange","onlanguagechange","onmessage","onmessageerror","onoffline","ononline","onpagehide","onpageshow","onpopstate","onrejectionhandled","onstorage","onunhandledrejection","onunload","performance","stop","open",
"alert","confirm","prompt","print","requestAnimationFrame","cancelAnimationFrame","requestIdleCallback","cancelIdleCallback","captureEvents","releaseEvents","getComputedStyle","matchMedia","moveTo","moveBy","resizeTo","resizeBy","getSelection","find","webkitRequestAnimationFrame","webkitCancelAnimationFrame","fetch","btoa","atob","setTimeout","clearTimeout","setInterval","clearInterval","createImageBitmap","scroll","scrollTo","scrollBy","onappinstalled","onbeforeinstallprompt","crypto","ondevicemotion","ondeviceorientation","ondeviceorientationabsolute","indexedDB","webkitStorageInfo","sessionStorage","localStorage","chrome","visualViewport","speechSynthesis","webkitRequestFileSystem","webkitResolveLocalFileSystemURL","openDatabase","applicationCache","caches","h","_ASYNC_START","_chrome_37_fix","__async_strategy","bds","navigate","al_arr","selfOpen","isIE","E","bdUser","bdQuery","bdUseFavo","bdFavoOn","bdCid","bdSid","bdServerTime","bdQid","bdstoken","login_success"]}}}
2018/10/05 11:29:41 -> {"method":"Inspector.detached","params":{"reason":"Render process gone."}}
2018/10/05 11:29:41 -> { "method": "Inspector.detached", "params": { "reason": "target_closed"} }
2018/10/05 11:29:41 window object keys: [postMessage blur focus close frames self window parent opener top length closed location document origin name history locationbar menubar personalbar scrollbars statusbar toolbar status frameElement navigator customElements external screen innerWidth innerHeight scrollX pageXOffset scrollY pageYOffset screenX screenY outerWidth outerHeight devicePixelRatio clientInformation screenLeft screenTop defaultStatus defaultstatus styleMedia onanimationend onanimationiteration onanimationstart onsearch ontransitionend onwebkitanimationend onwebkitanimationiteration onwebkitanimationstart onwebkittransitionend isSecureContext onabort onblur oncancel oncanplay oncanplaythrough onchange onclick onclose oncontextmenu oncuechange ondblclick ondrag ondragend ondragenter ondragleave ondragover ondragstart ondrop ondurationchange onemptied onended onerror onfocus oninput oninvalid onkeydown onkeypress onkeyup onload onloadeddata onloadedmetadata onloadstart onmousedown onmouseenter onmouseleave onmousemove onmouseout onmouseover onmouseup onmousewheel onpause onplay onplaying onprogress onratechange onreset onresize onscroll onseeked onseeking onselect onstalled onsubmit onsuspend ontimeupdate ontoggle onvolumechange onwaiting onwheel onauxclick ongotpointercapture onlostpointercapture onpointerdown onpointermove onpointerup onpointercancel onpointerover onpointerout onpointerenter onpointerleave onafterprint onbeforeprint onbeforeunload onhashchange onlanguagechange onmessage onmessageerror onoffline ononline onpagehide onpageshow onpopstate onrejectionhandled onstorage onunhandledrejection onunload performance stop open alert confirm prompt print requestAnimationFrame cancelAnimationFrame requestIdleCallback cancelIdleCallback captureEvents releaseEvents getComputedStyle matchMedia moveTo moveBy resizeTo resizeBy getSelection find webkitRequestAnimationFrame webkitCancelAnimationFrame fetch btoa atob setTimeout clearTimeout setInterval 
clearInterval createImageBitmap scroll scrollTo scrollBy onappinstalled onbeforeinstallprompt crypto ondevicemotion ondeviceorientation ondeviceorientationabsolute indexedDB webkitStorageInfo sessionStorage localStorage chrome visualViewport speechSynthesis webkitRequestFileSystem webkitResolveLocalFileSystemURL openDatabase applicationCache caches h _ASYNC_START _chrome_37_fix __async_strategy bds navigate al_arr selfOpen isIE E bdUser bdQuery bdUseFavo bdFavoOn bdCid bdSid bdServerTime bdQid bdstoken login_success]
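A note on reading this output: judging from the log itself, lines marked <- are commands sent to the browser, and lines marked -> are what comes back, either a response carrying the same id or an event with a method but no id. The messages are Chrome DevTools Protocol JSON. The sketch below classifies log lines on that reading, which is an interpretation of the output above rather than documented chromedp behaviour; logEntry and parseLogLine are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry is one decoded line of the output above.
type logEntry struct {
	Kind   string // "command", "response", or "event"
	ID     int    // request id; 0 for events
	Method string // protocol method; empty for responses
}

// parseLogLine decodes a line of the form "<date> <time> <arrow> <json>".
// Lines without an arrow or without valid JSON yield ok == false.
func parseLogLine(line string) (entry logEntry, ok bool) {
	parts := strings.SplitN(line, " ", 4)
	if len(parts) != 4 || (parts[2] != "<-" && parts[2] != "->") {
		return logEntry{}, false
	}
	var msg struct {
		ID     int    `json:"id"`
		Method string `json:"method"`
	}
	if err := json.Unmarshal([]byte(parts[3]), &msg); err != nil {
		return logEntry{}, false
	}
	entry = logEntry{ID: msg.ID, Method: msg.Method}
	switch {
	case parts[2] == "<-":
		entry.Kind = "command" // sent to the browser
	case msg.Method != "":
		entry.Kind = "event" // pushed by the browser, no id
	default:
		entry.Kind = "response" // answers the command with the same id
	}
	return entry, true
}

func main() {
	// Three lines copied from the output above.
	lines := []string{
		`2018/10/05 11:29:40 <- {"id":9,"method":"Page.navigate","params":{"url":"https://www.baidu.com"}}`,
		`2018/10/05 11:29:41 -> {"id":9,"result":{"frameId":"E1EA547C788F3E5457AB5D1B4FB25D83","loaderId":"68AE558838E29AB64CFD2C5506D46BBE"}}`,
		`2018/10/05 11:29:41 -> {"method":"DOM.documentUpdated","params":{}}`,
	}
	for _, l := range lines {
		if e, ok := parseLogLine(l); ok {
			fmt.Printf("%-8s id=%d method=%q\n", e.Kind, e.ID, e.Method)
		}
	}
}
```

Read this way, the run boils down to two commands that matter: Page.navigate (id 9) loads https://www.baidu.com, and Runtime.evaluate (id 10) runs Object.keys(window); whose result is the final "window object keys" line.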

FAQ

invalid active developer path (/Library/Developer/CommandLineTools)

Solution:
Search for Xcode in the App Store and install it (note: it is a very large download, so it takes a while).
Another solution: https://blog.csdn.net/zhufuing/article/details/53068185

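Errors such as "exit 1" (I can no longer reproduce this)

Rebooting the system fixes it.
Cause: the go run command was executed before Xcode was installed, and the Chrome process it started could not exit on its own, so the error kept recurring.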
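When your own Go code starts a browser process, one way to avoid leftover processes is to bound its lifetime with a context deadline. The sketch below illustrates that general technique with os/exec; it is not what the chromedp examples do, and it uses sleep as a stand-in for a process that never exits. runWithTimeout is a hypothetical helper name. Note that only the process started directly is killed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// runWithTimeout runs the command and kills it if it is still running after
// ms milliseconds. timedOut reports whether the deadline stopped it.
func runWithTimeout(ms int, name string, args ...string) (timedOut bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()

	// CommandContext kills the process once ctx is done.
	err = exec.CommandContext(ctx, name, args...).Run()
	return errors.Is(ctx.Err(), context.DeadlineExceeded), err
}

func main() {
	// "sleep 5" stands in for a browser process that does not exit by itself.
	timedOut, err := runWithTimeout(500, "sleep", "5")
	switch {
	case timedOut:
		fmt.Println("process killed after timeout:", err)
	case err != nil:
		fmt.Println("process failed:", err)
	default:
		fmt.Println("process exited normally")
	}
}
```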

Sources

Fetching web page content with Go and Chrome headless
https://www.cnblogs.com/apocelipes/p/9264673.html

https://github.com/chromedp/chromedp

https://blog.csdn.net/qq_28796345/article/details/79698035

http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/
