初识nokogiri

Nokogiri是一个HTML、XML、SAX等解析器。它可以通过CSS3或者XPath来检索文档。具有如下特征:

  • XML/HTML DOM parser which handles broken HTML
  • XML/HTML SAX parser
  • XML/HTML Push parser
  • XPath 1.0 support for document searching
  • CSS3 selector support for document searching
  • XML/HTML builder
  • XSLT transformer

安装命令:

gem install nokogiri

这里截取官网上的一段代码示例:

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(open('http://www.nokogiri.org/tutorials/installing_nokogiri.html'))

puts "### Search for nodes by css"
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

puts "### Search for nodes by xpath"
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

puts "### Or mix and match."
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end

运行结果如下:

发布了30 篇原创文章 · 获赞 10 · 访问量 5262

猜你喜欢

转载自blog.csdn.net/wufeng_no1/article/details/86703902