Moments and Weibo feed streams: push and pull in practice

The previous article, "Feed stream pull and read diffusion: what exactly is it?", covered only the "pull" half of the pull-versus-push question for feed streams. This article introduces the other half, "push" (write diffusion). So that the two schemes can be compared without switching between articles, the background and the "pull" (read diffusion) scheme are briefly recapped first.

Feed business features: users are linked by a "follow/fan" relationship, and each user's home page is composed of feed messages posted by others.

Typical actions of a feed business: follow, unfollow, publish a feed, and pull your own home page feed stream.

Core data of a feed business: relationship data and feed data.

1. Introduction to the "read diffusion" scheme in pull mode

For example, suppose a feed system has four users, A, B, C, and D, where:

  • A follows B and C; D follows B

Relationship storage holds both the following relationship and the fan relationship. "A follows B and C; D follows B" implies that B has two fans, A and D, and C has one fan, A.

  • B has published four feeds: msg1, msg3, msg5, msg10
  • C has published two feeds: msg2, msg8

Each user has a feed queue that records every feed that user has published.
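The example data above can be written down as a minimal in-memory sketch. The names `following`, `fans`, `feed_queue`, and `follow` are hypothetical, chosen only for illustration; a real system would keep this data in a database or cache rather than in dictionaries.

```python
from collections import defaultdict

# Hypothetical in-memory stores standing in for real storage.
following = defaultdict(set)    # user -> users they follow
fans = defaultdict(set)         # user -> users who follow them
feed_queue = defaultdict(list)  # user -> msgids the user has published


def follow(user, target):
    """Record the relationship in both directions."""
    following[user].add(target)
    fans[target].add(user)


# A follows B and C; D follows B.
follow("A", "B")
follow("A", "C")
follow("D", "B")

# B and C publish their feeds.
feed_queue["B"].extend([1, 3, 5, 10])
feed_queue["C"].extend([2, 8])
```

Storing the relationship twice, once per direction, is what lets both "who do I follow" and "who are my fans" be answered with a single lookup.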

In pull mode, publishing a feed is very simple. For example, when C publishes a new feed, msg12, you only need to append it to C's feed queue.

In pull mode, unfollowing is also very simple. For example, when A unfollows C, you only need to delete C from A's following list and delete A from C's fan list.
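A minimal sketch of these two pull-mode write paths, assuming plain dictionaries as storage. The function and variable names are hypothetical, not taken from any real system.

```python
# Hypothetical in-memory state matching the running example.
following = {"A": {"B", "C"}, "D": {"B"}}
fans = {"B": {"A", "D"}, "C": {"A"}}
feed_queue = {"B": [1, 3, 5, 10], "C": [2, 8]}


def publish(author, msgid):
    """Pull mode: publishing touches only the author's own queue."""
    feed_queue.setdefault(author, []).append(msgid)


def unfollow(user, target):
    """Pull mode: unfollowing touches only the two relationship lists."""
    following.get(user, set()).discard(target)
    fans.get(target, set()).discard(user)


publish("C", 12)    # C publishes msg12
unfollow("A", "C")  # A unfollows C
```

Neither operation depends on how many fans a user has, which is why these paths stay cheap in pull mode.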

In pull mode, the process by which user A obtains "the home page composed of feeds published by others" is considerably more complicated. It requires:

  • Getting A's following list;
  • Getting the feeds published by every user in that list;
  • Sorting the messages by rank (assume publication time here) and extracting the requested page.
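The three steps above can be sketched as follows. This is an illustration under stated assumptions: storage is a pair of dictionaries, msgids increase with publication time, and the names `pull_home_page` and `page_size` are hypothetical.

```python
# Hypothetical in-memory state matching the running example.
following = {"A": {"B", "C"}, "D": {"B"}}
feed_queue = {"B": [1, 3, 5, 10], "C": [2, 8]}


def pull_home_page(user, page, page_size=3):
    """Read diffusion: gather, sort, and page at read time."""
    candidates = []
    # Steps 1 and 2: walk the following list and collect each followee's feeds.
    for followee in following.get(user, ()):
        candidates.extend(feed_queue.get(followee, []))
    # Step 3: rank newest first (msgid is assumed to grow with time), then page.
    candidates.sort(reverse=True)
    start = (page - 1) * page_size
    return candidates[start:start + page_size]


first_page = pull_home_page("A", page=1)   # [10, 8, 5]
second_page = pull_home_page("A", page=2)  # [3, 2, 1]
```

Every read visits the queue of every followed user and sorts the combined result, so the cost grows with the size of the following list.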

The advantages of the pull mode ("read diffusion") of the feed stream are:

  • The storage structure is simple and the storage footprint is small: only one copy of the relationship data and the feed data is kept;
  • The business processes for following, unfollowing, and publishing feeds are very simple;
  • The storage structure and business processes are relatively easy to understand, which suits rapid implementation early in a project, when the number of users, the data volume, and the concurrency are all small.

The disadvantages are also obvious:

  • The business process for pulling the Moments feed list is very complicated;
  • Each pull involves multiple data accesses plus a large amount of in-memory computation and network transmission, so performance is low.

2. Introduction to the "write diffusion" scheme in push mode

In push mode (write diffusion), relationship data is stored exactly as in pull mode (read diffusion).

As for feed data, each user still stores the feeds they have published.

In the example:

  • B has published 1, 3, 5, 10
  • C has published 2, 8
  • (Assume the msgids here are ordered by the publication time of the feeds.)

Feed data storage differs from pull mode (read diffusion) in that each user also stores the feed stream they receive.

In the example:

  • A follows B and C, so A's receive queue is 1, 2, 3, 5, 8, 10
  • D follows B, so D's receive queue is 1, 3, 5, 10

In push mode (write diffusion), obtaining "the home page composed of feeds published by others" becomes very simple. Suppose each page holds 3 feeds. If A wants to read the second page of the Moments feed, the system can return 1, 2, 3 directly from A's receive queue. (The first page holds the latest feeds, that is, 5, 8, 10.)
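A minimal sketch of this read path, assuming each receive queue is a list kept in ascending msgid order. The names `receive_queue` and `read_home_page` are hypothetical.

```python
# Hypothetical receive queues, already populated at write time.
receive_queue = {"A": [1, 2, 3, 5, 8, 10], "D": [1, 3, 5, 10]}


def read_home_page(user, page, page_size=3):
    """Write diffusion: reading is a slice of the user's own queue."""
    newest_first = receive_queue.get(user, [])[::-1]
    start = (page - 1) * page_size
    return newest_first[start:start + page_size]


first_page = read_home_page("A", page=1)   # [10, 8, 5]
second_page = read_home_page("A", page=2)  # [3, 2, 1]
```

The sketch returns each page newest first, while the article lists the same messages in ascending order. Either way, the read touches only one user's data.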

In push mode (write diffusion), publishing a feed is somewhat more complicated.

For example, when B publishes a new feed, msg12:

  • Add msg12 to B's published feed storage
  • Query all of B's fans, A and D
  • Add msg12 to the received feed storage of fans A and D
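The three publishing steps above can be sketched as a fan-out on write. Dictionaries stand in for real storage, and the names `published`, `receive_queue`, and `publish` are hypothetical.

```python
# Hypothetical in-memory state matching the running example.
fans = {"B": {"A", "D"}, "C": {"A"}}
published = {"B": [1, 3, 5, 10], "C": [2, 8]}
receive_queue = {"A": [1, 2, 3, 5, 8, 10], "D": [1, 3, 5, 10]}


def publish(author, msgid):
    """Write diffusion: one publish becomes one write per fan."""
    # Step 1: store the feed in the author's own published storage.
    published.setdefault(author, []).append(msgid)
    # Steps 2 and 3: look up the fans and push the feed to each receive queue.
    for fan in fans.get(author, ()):
        receive_queue.setdefault(fan, []).append(msgid)


publish("B", 12)
```

The number of writes equals the author's fan count plus one, which is where the storage and write cost of push mode comes from.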

This scheme is called push mode (write diffusion) because, when a user publishes a feed:

  • The feed is pushed directly into each fan's receive list, hence "push mode"
  • The write goes not only to the publisher's feed storage but also to the received feed storage of many fans, hence "write diffusion"

In push mode (write diffusion), following a user also becomes more complicated.

For example, when D follows C:

  • Add C to D's following storage
  • Add D to C's fan storage
  • Add the feeds published by C to D's received feed storage
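A minimal sketch of the follow steps above, assuming dictionary storage and receive queues kept in ascending msgid order. All names are hypothetical.

```python
# Hypothetical in-memory state matching the running example.
following = {"A": {"B", "C"}, "D": {"B"}}
fans = {"B": {"A", "D"}, "C": {"A"}}
published = {"B": [1, 3, 5, 10], "C": [2, 8]}
receive_queue = {"A": [1, 2, 3, 5, 8, 10], "D": [1, 3, 5, 10]}


def follow(user, target):
    """Push mode: following also backfills the target's existing feeds."""
    following.setdefault(user, set()).add(target)
    fans.setdefault(target, set()).add(user)
    # Merge the target's feeds into the follower's receive queue.
    # The set union keeps the operation safe to repeat.
    merged = set(receive_queue.get(user, [])) | set(published.get(target, []))
    receive_queue[user] = sorted(merged)


follow("D", "C")
```

After the call, D's receive queue matches A's, because both now follow B and C.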

In push mode (write diffusion), unfollowing also becomes more complicated.

For example, when A unfollows C:

  • Delete C from A's following storage
  • Delete A from C's fan storage
  • Delete the feeds published by C from A's received feed storage
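The unfollow steps above can be sketched in the same style. The sketch assumes the target's published list is available to identify which messages to remove. All names are hypothetical.

```python
# Hypothetical in-memory state matching the running example.
following = {"A": {"B", "C"}, "D": {"B"}}
fans = {"B": {"A", "D"}, "C": {"A"}}
published = {"B": [1, 3, 5, 10], "C": [2, 8]}
receive_queue = {"A": [1, 2, 3, 5, 8, 10], "D": [1, 3, 5, 10]}


def unfollow(user, target):
    """Push mode: unfollowing also removes the target's feeds from the queue."""
    following.get(user, set()).discard(target)
    fans.get(target, set()).discard(user)
    # Drop every message the target published from the user's receive queue.
    to_remove = set(published.get(target, []))
    receive_queue[user] = [
        msgid for msgid in receive_queue.get(user, []) if msgid not in to_remove
    ]


unfollow("A", "C")
```

Compared with pull mode, the extra cleanup step scales with the number of feeds the unfollowed user has published.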

The advantages of the push mode (write diffusion) of the feed stream are:

  • The IO hotspot of pull mode (read diffusion) is eliminated: each user reads their own data, so there is less lock contention under high concurrency;

(In pull mode (read diffusion), the storage for user-published feeds easily becomes an IO bottleneck.)

  • The business process for pulling the Moments feed list is extremely simple and fast;
  • Pulling the Moments feed list requires little in-memory computation or network transmission, so performance is high;

(The feed business is a typical read-heavy, write-light scenario. The read-to-write ratio can even exceed 100:1, that is, each published message is read at least 100 times on average.)

Its disadvantages are:

  • It consumes a great deal of storage, because feed data is stored in many copies. For example, Yang Mi has 50 million (5KW) fans, so every post she publishes is stored redundantly 50 million times;

(A reader suggested storing a single copy of the message entity and making only the msgid redundant. In that case the entity must be fetched in a second step when the feed list is pulled, which lengthens network latency, so many companies choose to store redundant message entities directly. This is, of course, a trade-off between user experience and storage capacity.)

  • The business flows for following, unfollowing, and publishing feeds become more complicated.
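The storage cost of the two redundancy options can be made concrete with some rough arithmetic. The fan count of 50 million is the article's 5KW figure. The message size of 1,000 bytes and the msgid size of 8 bytes are assumptions made purely for illustration.

```python
FANS = 50_000_000     # the article's 5KW figure, read as 50 million fans
ENTITY_BYTES = 1_000  # assumed size of one message entity
MSGID_BYTES = 8       # assumed size of one msgid

# Option 1: copy the whole message entity into every fan's receive queue.
full_copy_bytes = FANS * ENTITY_BYTES

# Option 2: store the entity once and copy only the msgid to every fan.
id_only_bytes = FANS * MSGID_BYTES + ENTITY_BYTES

print(f"full entity per fan: {full_copy_bytes / 1e9:.1f} GB per post")
print(f"msgid per fan:       {id_only_bytes / 1e6:.1f} MB per post")
```

Under these assumed sizes the msgid-only layout is over a hundred times smaller, which is the storage side of the trade-off. The latency side is the extra fetch of the entity on every read.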

3. Summary

Summary of the push and pull modes for the feed stream business:

  • Pull mode, read diffusion: one copy of each feed is stored, storage is small, user access to data is concentrated, and performance is poor;

  • Push mode, write diffusion: multiple copies of each feed are stored, redundant storage is traded for fewer lock conflicts, and performance is high.

--------------------- 
Reposted from: WeChat public account "The Road to Architects"

Origin blog.csdn.net/my8688/article/details/88377397