This third article covers converting structured data into an RSS feed that can be subscribed to.
Preface
In the first two articles, "RSS Can: Using Golang to Achieve Better RSS Hub Service (1)" and "RSS Can: Using V8 to Make Golang Applications Dynamic (2)", we used dynamic configuration to organize the information on a website into structured data.
In this article, let's briefly talk about how to turn that structured data into subscribable RSS feeds, so that the website's data can be "connected" to our RSS reader.
RSS format standard
Before getting to the code, whether you are a developer or an RSS product user, it is worth understanding the RSS format standards.
There are three well-known families of "RSS" format standards on the Internet: Atom, RSS, and JSON Feed. The third appeared when RSS was already in decline, so it has few applications and little mindshare. As a result, the formats supported by major web applications focus on the first two: RSS and Atom.
TL;DR: if you are a content provider and want your content to reach as many people as possible across RSS clients, choosing the widely supported RSS 2.0 gives very good compatibility. If you are a reader who cares about tracking article updates and a better reading experience, and the website offers several feed formats at once, prefer the feed in Atom format.
In this article, of course, we will use an open source library to output the data organized in the previous two articles in all three formats (it costs nothing extra anyway).
Key advantages of the Atom format over RSS 2.0
If you only want to use RSS and do not plan to do detailed development around it, you can skip this section.
- Ability to mark whether the HTML content in the field has been escaped or encoded, which is convenient for developers to use the data when rendering.
- It is no longer necessary to mix the "body" and the "abstract" of the content in the description field; a new summary field is provided to distinguish the "abstract" from the "body", and non-text content can be added to the body.
- "RSS" exists in several variants, while Atom is more stable and consistent.
- Provides a namespace that conforms to XML standards, can use XML built-in tags to support the description of relative addresses, can use XML built-in tags to tell subscribers the content language, and supports XML Schema, which RSS 2.0 does not have.
- Each information item has a unique ID, and subscribers can track the update of specific content.
- There is a unified and clear time expression specification, which is convenient for the program to process.
- application/atom+xml is a MIME media type registered with IANA, making it a standard specification; application/rss+xml, the type used by RSS, has not yet been standardized.
Convert data to RSS feed format using Go
There are many packages in the Go ecosystem that can generate RSS feeds. I chose gorilla/feeds, which has a ten-year maintenance history, even though on the 9th of this month the maintainers announced that all repositories in the organization would be archived and no longer maintained.
For our needs, however, RSS is an "old and stable" protocol and gorilla/feeds has been proven over a long time, so it is still a reasonable choice. In addition, for projects that are inactive or no longer maintained, Go's package management can help us manage the code and carry our own maintenance changes, which we will cover in a subsequent article.
General Use of Gorilla Feeds
Let's first look at how to use Gorilla Feeds to generate a feed. First, import the package:
import (
"fmt"
"log"
"time"
"github.com/gorilla/feeds"
)
The time package is imported here as well because I don't want to hand-write timestamps for the sample data. Different RSS formats have different requirements for time values, so time handling is better discussed in a follow-up article.
Let's take previously published articles as an example and write some mock data to test feed generation:
now := time.Now()
feed := &feeds.Feed{
Title: "苏洋博客",
Link: &feeds.Link{
Href: "https://soulteary.com/"},
Description: "醉里不知天在水,满船清梦压星河。",
Author: &feeds.Author{
Name: "soulteary", Email: "soulteary@gmail.com"},
Created: now,
}
feed.Items = []*feeds.Item{
{
Title: "RSS Can:借助 V8 让 Golang 应用具备动态化能力(二)",
Link: &feeds.Link{
Href: "https://soulteary.com/2022/12/13/rsscan-make-golang-applications-with-v8-part-2.html"},
Description: "继续聊聊之前做过的一个小东西的踩坑历程,如果你也想高效获取信息,或许这个系列的内容会对你有用。",
Author: &feeds.Author{
Name: "soulteary", Email: "soulteary@qq.com"},
Created: now,
},
{
Title: "RSS Can:使用 Golang 实现更好的 RSS Hub 服务(一)",
Link: &feeds.Link{
Href: "https://soulteary.com/2022/12/12/rsscan-better-rsshub-service-build-with-golang-part-1.html"},
Description: "聊聊之前做过的一个小东西的踩坑历程,如果你也想高效获取信息,或许这个系列的内容会对你有用。这个事情涉及的东西比较多,所以我考虑拆成一个系列来聊,每篇的内容不要太长,整理负担和阅读负担都轻一些。本篇是系列第一篇内容。",
Author: &feeds.Author{
Name: "soulteary", Email: "[email protected]"},
Created: now,
},
{
Title: "在搭载 M1 及 M2 芯片 MacBook设备上玩 Stable Diffusion 模型",
Link: &feeds.Link{
Href: "https://soulteary.com/2022/12/10/play-the-stable-diffusion-model-on-macbook-devices-with-m1-and-m2-chips.html"},
Description: "本篇文章,我们聊了如何使用搭载了 Apple Silicon 芯片(M1 和 M2 CPU)的 MacBook 设备上运行 Stable Diffusion 模型。",
Created: now,
},
{
Title: "使用 Docker 来快速上手中文 Stable Diffusion 模型:太乙",
Link: &feeds.Link{
Href: "https://soulteary.com/2022/12/09/use-docker-to-quickly-get-started-with-the-chinese-stable-diffusion-model-taiyi.html"},
Description: "本篇文章,我们聊聊如何使用 Docker 快速运行中文 Stable Diffusion 模型:太乙。 ",
Created: now,
},
}
Then, write a simple call statement, and the data can be "converted" into the result we need:
atom, err := feed.ToAtom()
if err != nil {
log.Fatal(err)
}
rss, err := feed.ToRss()
if err != nil {
log.Fatal(err)
}
json, err := feed.ToJSON()
if err != nil {
log.Fatal(err)
}
fmt.Println(atom, "\n", rss, "\n", json)
Put the above code into a function that can be called for testing (for example, main). After the program runs, we will see results similar to the following:
<?xml version="1.0" encoding="UTF-8"?><feed xmlns="http://www.w3.org/2005/Atom">
<title>苏洋博客</title>
<id>https://soulteary.com/</id>
<updated>2022-12-14T12:29:55+08:00</updated>
<subtitle>醉里不知天在水,满船清梦压星河。</subtitle>
<link href="https://soulteary.com/"></link>
<author>
<name>soulteary</name>
<email>soulteary@gmail.com</email>
</author>
<entry>
<title>RSS Can:借助 V8 让 Golang 应用具备动态化能力(二)</title>
<updated>2022-12-14T12:29:55+08:00</updated>
<id>tag:soulteary.com,2022-12-14:/2022/12/13/rsscan-make-golang-applications-with-v8-part-2.html</id>
<link href="https://soulteary.com/2022/12/13/rsscan-make-golang-applications-with-v8-part-2.html" rel="alternate"></link>
<summary type="html">继续聊聊之前做过的一个小东西的踩坑历程,如果你也想高效获取信息,或许这个系列的内容会对你有用。</summary>
<author>
<name>soulteary</name>
<email>soulteary@qq.com</email>
</author>
</entry>
...
...
</feed>
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>苏洋博客</title>
<link>https://soulteary.com/</link>
<description>醉里不知天在水,满船清梦压星河。</description>
<managingEditor>soulteary@gmail.com (soulteary)</managingEditor>
<pubDate>Wed, 14 Dec 2022 12:29:55 +0800</pubDate>
<item>
<title>RSS Can:借助 V8 让 Golang 应用具备动态化能力(二)</title>
<link>https://soulteary.com/2022/12/13/rsscan-make-golang-applications-with-v8-part-2.html</link>
<description>继续聊聊之前做过的一个小东西的踩坑历程,如果你也想高效获取信息,或许这个系列的内容会对你有用。</description>
<author>soulteary</author>
<pubDate>Wed, 14 Dec 2022 12:29:55 +0800</pubDate>
</item>
<item>
<title>RSS Can:使用 Golang 实现更好的 RSS Hub 服务(一)</title>
<link>https://soulteary.com/2022/12/12/rsscan-better-rsshub-service-build-with-golang-part-1.html</link>
<description>聊聊之前做过的一个小东西的踩坑历程,如果你也想高效获取信息,或许这个系列的内容会对你有用。这个事情涉及的东西比较多,所以我考虑拆成一个系列来聊,每篇的内容不要太长,整理负担和阅读负担都轻一些。本篇是系列第一篇内容。</description>
<author>soulteary</author>
<pubDate>Wed, 14 Dec 2022 12:29:55 +0800</pubDate>
</item>
...
...
</channel>
</rss>
{
"version": "https://jsonfeed.org/version/1",
"title": "苏洋博客",
"home_page_url": "https://soulteary.com/",
"description": "醉里不知天在水,满船清梦压星河。",
"author": {
"name": "soulteary"
},
"items": [
{
"id": "",
"url": "https://soulteary.com/2022/12/13/rsscan-make-golang-applications-with-v8-part-2.html",
"title": "RSS Can:借助 V8 让 Golang 应用具备动态化能力(二)",
"summary": "继续聊聊之前做过的一个小东西的踩坑历程,如果你也想高效获取信息,或许这个系列的内容会对你有用。",
"date_published": "2022-12-14T12:29:55.50867+08:00",
"author": {
"name": "soulteary"
}
},
...
...
]
}
The output above includes all three formats mentioned earlier, which covers the needs of most RSS clients.
Connecting the information from the website
In the previous article, we parsed the target website through dynamic configuration and converted its information into a Go data structure. Now that we understand how Gorilla Feeds outputs RSS formats, we only need to "connect" the two to get the information feed in RSS format.
First, make some adjustments to the function of "parsing website information according to configuration" mentioned above:
func getWebsiteDataWithConfig(config define.JavaScriptConfig) (result define.BodyParsed) {
doc := network.GetRemoteDocument("https://36kr.com/", "utf-8")
if doc.Body == "" {
return result
}
return parser.ParsePageByGoQuery(doc, func(document *goquery.Document) []define.InfoItem {
var items []define.InfoItem
document.Find(config.ListContainer).Each(func(i int, s *goquery.Selection) {
var item define.InfoItem
title := strings.TrimSpace(s.Find(config.Title).Text())
author := strings.TrimSpace(s.Find(config.Author).Text())
time := strings.TrimSpace(s.Find(config.DateTime).Text())
category := strings.TrimSpace(s.Find(config.Category).Text())
description := strings.TrimSpace(s.Find(config.Description).Text())
href, _ := s.Find(config.Link).Attr("href")
link := strings.TrimSpace(href)
item.Title = title
item.Author = author
item.Date = time
item.Category = category
item.Description = description
item.Link = link
items = append(items, item)
})
return items
})
}
When the above function runs normally, you can get an array containing structured data.
Next, write a simple function that calls Gorilla Feeds to generate the RSS feed we need:
func generateFeeds(data define.BodyParsed) {
now := time.Now()
rssFeed := &feeds.Feed{
Title: "36Kr",
Link: &feeds.Link{
Href: "https://36kr.com/"},
Created: now,
}
for _, data := range data.Body {
feedItem := feeds.Item{
Title: data.Title,
Author: &feeds.Author{
Name: data.Author},
Description: data.Description,
Link: &feeds.Link{
Href: data.Link},
// Time handling is tricky; it will be covered in a later article
Created: now,
}
rssFeed.Items = append(rssFeed.Items, &feedItem)
}
atom, err := rssFeed.ToAtom()
if err != nil {
log.Fatal(err)
}
rss, err := rssFeed.ToRss()
if err != nil {
log.Fatal(err)
}
json, err := rssFeed.ToJSON()
if err != nil {
log.Fatal(err)
}
fmt.Println(atom, "\n", rss, "\n", json)
}
Finally, adjust the calling function of the program so that we can test and print the RSS generation result to the terminal log:
func main() {
jsApp, _ := os.ReadFile("./config/config.js")
inject := string(jsApp)
jsConfig, err := javascript.RunCode(inject, "JSON.stringify(getConfig());")
if err != nil {
fmt.Println(err)
return
}
config, err := parser.ParseConfigFromJSON(jsConfig)
if err != nil {
fmt.Println(err)
return
}
data := getWebsiteDataWithConfig(config)
generateFeeds(data)
}
Executing the program with go run main.go, we get the expected result:
<?xml version="1.0" encoding="UTF-8"?><feed xmlns="http://www.w3.org/2005/Atom">
<title>36Kr</title>
<id>https://36kr.com/</id>
<updated>2022-12-14T13:41:37+08:00</updated>
<link href="https://36kr.com/"></link>
<entry>
<title>iOS 16.2来了,这7个新功能值得关注</title>
<updated>2022-12-14T13:41:37+08:00</updated>
<id>tag:,2022-12-14:/p/2043412066405640</id>
<link href="/p/2043412066405640" rel="alternate"></link>
<summary type="html">Apple 画的饼终于来了。</summary>
<author>
<name>少数派</name>
</author>
</entry>
<entry>
<title>如何更好地思考:人只能获得自己认知内的成就</title>
<updated>2022-12-14T13:41:37+08:00</updated>
<id>tag:,2022-12-14:/p/2018320727015942</id>
<link href="/p/2018320727015942" rel="alternate"></link>
<summary type="html">5个原则,让你成为一个更好的思考者。</summary>
<author>
<name>神译局</name>
</author>
</entry>
...
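Note that the entry links in the output above are site-relative (for example /p/2043412066405640), and the generated ids start with "tag:," because there is no host to build them from. One way to handle this, sketched here with a hypothetical helper named absoluteLink that is not part of the original code, is to resolve each link against the site's base URL before building the feed item:

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// absoluteLink resolves href against base. Hrefs that are already
// absolute are returned unchanged.
func absoluteLink(base, href string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(ref).String(), nil
}

func main() {
	link, err := absoluteLink("https://36kr.com/", "/p/2043412066405640")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(link) // https://36kr.com/p/2043412066405640
}
```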
Now that the data format RSS clients can use is settled, let's tackle the last step of "RSS subscription": start a simple web service and expose the above data at an accessible address.
Use Gin to handle RSS web services
Gin is an excellent HTTP web framework. It is not necessarily the fastest framework in the Go ecosystem, but it is among the best in terms of community activity and ease of use.
Start a simple web service with Gin
Gin wraps the capabilities of net/http and provides a simple API for starting a web service, as in the following code of fewer than 20 lines:
package main
import (
"net/http"
"github.com/gin-gonic/gin"
)
func main() {
r := gin.Default()
r.GET("/ping", func(c *gin.Context) {
c.JSON(http.StatusOK, gin.H{
"message": "pong",
})
})
r.Run()
}
After the above code runs, a web service starts; the default address is http://localhost:8080. When we visit /ping in the browser, the server responds with a JSON message containing pong.
Build the RSS subscription endpoints
As mentioned above, since there is no cost to generate RSS in different formats, we can support them all and respond to requests from various RSS clients.
When actually providing services, we need to output different data according to the RSS format type requested by the client. Therefore, we need to adjust the function we used to generate the RSS feed above so that it supports generating content according to the type in the request parameter:
func generateFeeds(data define.BodyParsed, rssType string) string {
now := time.Now()
rssFeed := &feeds.Feed{
Title: "36Kr",
Link: &feeds.Link{
Href: "https://36kr.com/"},
Created: now,
}
for _, data := range data.Body {
feedItem := feeds.Item{
Title: data.Title,
Author: &feeds.Author{
Name: data.Author},
Description: data.Description,
Link: &feeds.Link{
Href: data.Link},
// Time handling is tricky; it will be covered in a later article
Created: now,
}
rssFeed.Items = append(rssFeed.Items, &feedItem)
}
var rss string
var err error
switch rssType {
case "RSS":
rss, err = rssFeed.ToRss()
case "ATOM":
rss, err = rssFeed.ToAtom()
case "JSON":
rss, err = rssFeed.ToJSON()
default:
rss = ""
}
if err != nil {
fmt.Println(err)
return ""
}
return rss
}
After adjusting the generation function, let's implement a simple handler that calls it to output RSS feeds in different formats according to the API request path:
route := gin.Default()
route.GET("/:type/", func(c *gin.Context) {
var rssType RSSType
if err := c.ShouldBindUri(&rssType); err != nil {
c.JSON(http.StatusNotFound, gin.H{
// Send the error text; a raw error value does not serialize to useful JSON
"msg": err.Error()})
return
}
var response string
var mimetype string
switch strings.ToUpper(rssType.Type) {
case "RSS":
mimetype = "application/rss+xml"
response = generateFeeds(data, "RSS")
case "ATOM":
mimetype = "application/atom+xml"
response = generateFeeds(data, "ATOM")
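case "JSON":
mimetype = "application/feed+json"
response = generateFeeds(data, "JSON")
default:
// Unknown feed type: answer 404 instead of an empty 200 response
c.String(http.StatusNotFound, "unsupported feed type")
return
}
c.Data(http.StatusOK, mimetype, []byte(response))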
})
route.Run(":8080")
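The handler above relies on an RSSType struct that is not shown. The sketch below is a guess at its shape, based on how Gin's ShouldBindUri reads fields tagged with uri, together with a hypothetical feedMIME helper that keeps the type-to-MIME mapping in one place and reports unknown types so the caller can answer 404.

```go
package main

import (
	"fmt"
	"strings"
)

// RSSType is an assumed definition of the URI-binding struct:
// the uri tag matches the ":type" segment of the route.
type RSSType struct {
	Type string `uri:"type" binding:"required"`
}

// feedMIME maps a requested feed type to its canonical name and MIME
// type. ok is false for unknown types.
func feedMIME(kind string) (name, mimetype string, ok bool) {
	switch strings.ToUpper(kind) {
	case "RSS":
		return "RSS", "application/rss+xml", true
	case "ATOM":
		return "ATOM", "application/atom+xml", true
	case "JSON":
		return "JSON", "application/feed+json", true
	}
	return "", "", false
}

func main() {
	for _, kind := range []string{"rss", "Atom", "json", "yaml"} {
		name, mimetype, ok := feedMIME(kind)
		fmt.Printf("%-5s name=%q mime=%q ok=%v\n", kind, name, mimetype, ok)
	}
}
```

Inside the handler, the returned name could be passed straight to generateFeeds and the MIME type to c.Data.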
Start the service and visit any of http://localhost:8080/rss, http://localhost:8080/atom, or http://localhost:8080/json, and you can see the RSS feed data in the browser.
Many RSS subscription tools, such as Reeder, can automatically discover RSS feeds from tags in a web page.
To make testing in Reeder easier, we can write the above feed addresses into an HTML page and "bind" it to the root path /:
const hello = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>RSS Feed Discovery.</title>
<link rel="alternate" type="application/rss+xml" title="RSS 2.0 Feed" href="http://localhost:8080/rss">
<link rel="alternate" type="application/atom+xml" title="RSS Atom Feed" href="http://localhost:8080/atom">
<link rel="alternate" type="application/feed+json" title="RSS JSON Feed" href="http://localhost:8080/json">
</head>
<body>
RSS Feed Discovery.
</body>
</html>`
route.GET("/", func(c *gin.Context) {
c.Data(http.StatusOK, "text/html", []byte(hello))
})
Re-run the program. When we enter http://127.0.0.1:8080, Reeder will tell us that three feeds have been found. Because the three feeds carry the same data, you can choose any of them (Atom is recommended).
Click the "Subscribe" button, and the information from the website will appear in Reeder's information list.
So far, we have taken a first step toward solving the problem raised in the first article: information sources that RSS subscription tools cannot subscribe to. As for the "keyword filtering" and "NLP content summary aggregation" mentioned in the previous two articles, we will expand on them in subsequent articles.
Other: a hidden memory leak risk
In the previous article, in order to safely run external JavaScript code that may have an "infinite loop", we used the following code to solve the problem:
duration := time.Since(start)
select {
case val := <-vals:
fmt.Fprintf(os.Stderr, "cost time: %v\n", duration)
return val, nil
case err := <-errs:
return nil, err
case <-time.After(JS_EXECUTE_TIMEOUT):
vm := ctx.Isolate()
vm.TerminateExecution()
err := <-errs
fmt.Fprintf(os.Stderr, "execution timeout: %v\n", duration)
time.Sleep(JS_EXECUTE_THORTTLING)
return nil, err
}
@Etran from our tinkering group pointed out today that this code hides a memory leak risk: the timer created by time.After() may fire later than the data arrives on vals, so the timer is not released promptly.
So, how do we solve this problem? Correcting the code is simple:
duration := time.Since(start)
timeout := time.NewTimer(define.JS_EXECUTE_TIMEOUT)
select {
case val := <-vals:
if !timeout.Stop() {
<-timeout.C
}
fmt.Fprintf(os.Stderr, "cost time: %v\n", duration)
return val, nil
case err := <-errs:
// Release the timer on the error path as well
timeout.Stop()
return nil, err
case <-timeout.C:
timeout.Stop()
vm := ctx.Isolate()
vm.TerminateExecution()
err := <-errs
fmt.Fprintf(os.Stderr, "execution timeout: %v\n", duration)
time.Sleep(define.JS_EXECUTE_THORTTLING)
return nil, err
}
Finally
While writing this article, I reviewed the history of RSS and the career of Dave Winer, its central figure, hoping to briefly describe from my own perspective the wonderful moments in the long history of RSS.
When the article was about to be published, I changed my mind. Perhaps the story about RSS should be published at the end of this series of articles.
–EOF
This article is licensed under "Attribution 4.0 International (CC BY 4.0)". You are welcome to reprint or reuse it, but you must indicate the source.
Author of this article: Su Yang
Created time: December 14, 2022
Word count: 11361 words
Reading time: 23 minutes
Link to this article: https://soulteary.com/2022/12/14/rsscan-convert-website-information-stream-to-rss-feed-part-3.html