In-depth analysis of the design and implementation of a news feed system

This article was first published on the official account More AI (power_ai).

In this chapter, you are asked to design a news feed system. What is a news feed? According to Facebook's help page, "News Feed is the constantly updating list of stories in the middle of your home page. News Feed includes status updates, photos, videos, links, app activity, and likes from people, pages, and groups that you follow on Facebook" [1]. This is a popular interview question. Similar questions that are commonly asked include: design the Facebook news feed, the Instagram feed, the Twitter timeline, etc.

[Figure 11-1]

Step 1 - Understand the problem and determine the scope of the design

The first set of clarifying questions helps you understand what the interviewer has in mind when asking you to design a news feed system. At the very least, you should figure out which features to support. Here is an example of a candidate interacting with an interviewer:

Candidate : Is this a mobile app? Or a web application? Or both?
Interviewer : Both.

Candidate : What are the important features?
Interviewer : A user can make a post and see her friends' posts on the newsfeed page.

Candidate : Is the news feed sorted in reverse chronological order or in some particular order, such as by topic score? For example, posts from your close friends could score higher.
Interviewer : For simplicity, let's assume the feed is sorted in reverse chronological order.

Candidate : How many friends can a user have?
Interviewer : 5000

Candidate : How much traffic is there?
Interviewer : 10 million daily active users

Candidate : Can the feed contain images and videos, or just text?
Interviewer : It can contain media files, including pictures and videos.

Now that you have gathered the requirements, we focus on designing the system.

Step 2 - Propose a high-level design and get approval

The design is divided into two flows: feed publishing and news feed building.

  • Feed publishing: When a user publishes a post, the corresponding data is written to the cache and the database, and the post is pushed to her friends' news feeds.
  • News feed building: For simplicity, let's assume that the news feed is built by aggregating friends' posts in reverse chronological order.

News Feed API

The news feed APIs are the primary way clients communicate with the servers. These APIs are HTTP-based and allow clients to perform operations such as posting a status, retrieving the news feed, and adding friends. We discuss the two most important APIs: the feed publishing API and the news feed retrieval API.

Feed publishing API

To publish a post, an HTTP POST request is sent to the server. The API is as follows:

POST /v1/me/feed
parameters:

  • content: content is the text of the post.
  • auth_token: It is used to authenticate API requests.

News Feed Retrieval API

The API for retrieving news feeds looks like this:

GET /v1/me/feed
parameters:

  • auth_token: It is used to authenticate API requests.
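
As a rough illustration of the two endpoints above, the following Python sketch builds (but does not send) both requests using only the standard library. The host `api.example.com` and the helper names are hypothetical, and sending the publish parameters as a form-encoded body is an assumption; the article only specifies the paths and the parameter names.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host used only for illustration; it is not part of the article.
BASE_URL = "https://api.example.com"


def build_publish_request(content: str, auth_token: str) -> Request:
    """Build the POST /v1/me/feed request that publishes a post."""
    # Assumption: parameters travel as a form-encoded request body.
    body = urlencode({"content": content, "auth_token": auth_token}).encode("utf-8")
    return Request(f"{BASE_URL}/v1/me/feed", data=body, method="POST")


def build_feed_request(auth_token: str) -> Request:
    """Build the GET /v1/me/feed request that retrieves the news feed."""
    query = urlencode({"auth_token": auth_token})
    return Request(f"{BASE_URL}/v1/me/feed?{query}", method="GET")
```

Passing either object to `urllib.request.urlopen` would send it. In practice the token would more likely be carried in a header, but the sketch keeps the parameter names listed above.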

Feed publishing

Figure 11-2 shows the high-level design of the feed publishing flow.

[Figure 11-2]

  • User: A user can view the news feed in a browser or a mobile app. The user publishes a post with the content "Hello" through the API:
    /v1/me/feed?content=Hello&auth_token={auth_token}
  • Load balancer: Distributes traffic to web servers.
  • Web servers: Route traffic to different internal services.
  • Post service: Persists posts in the database and cache.
  • Fanout service: Pushes new content to friends' news feeds. News feed data is stored in the cache for fast retrieval.
  • Notification service: Informs friends that new content is available and sends out push notifications.

News feed building

In this section, we'll discuss how news feeds are built under the hood. Figure 11-3 shows the high-level design:

[Figure 11-3]

  • User: A user sends a request to retrieve her news feed. The request looks like this: /v1/me/feed.
  • Load balancer: Distributes traffic to web servers.
  • Web servers: Route requests to the news feed service.
  • News feed service: Fetches the news feed from the cache.
  • News feed cache: Stores the news feed IDs needed to render the news feed.

Step 3 - In-depth design

The high-level design briefly covered two flows: feed publishing and news feed building. Here, we discuss these topics in more depth.

Feed publishing deep dive

Figure 11-4 shows the detailed design of feed publishing. We have already discussed most of the components in the high-level design, so here we focus on two of them: the web servers and the fanout service.

[Figure 11-4]

web server

In addition to communicating with clients, web servers also perform authentication and rate limiting.

Only users logged in with a valid auth_token can make posts. The system limits the number of posts a user can make within a certain period of time, which is essential to prevent spam and malicious content.
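
The two checks described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the article's implementation: the sliding-window policy, the `PostRateLimiter` class, the `can_publish` helper, and the `sessions` mapping from auth_token to user ID are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class PostRateLimiter:
    """Sliding window: at most max_posts per user within window_seconds."""

    def __init__(self, max_posts: int, window_seconds: float) -> None:
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        # user_id -> timestamps of that user's recently accepted posts
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self._recent[user_id]
        # Forget posts that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_posts:
            return False
        recent.append(now)
        return True


def can_publish(auth_token: str, sessions: Dict[str, str],
                limiter: PostRateLimiter, now: Optional[float] = None) -> bool:
    """Authenticate first, then apply the per-user rate limit."""
    user_id = sessions.get(auth_token)  # hypothetical token -> user lookup
    if user_id is None:
        return False  # unknown or expired auth_token
    return limiter.allow(user_id, now)
```

A production system would keep this state in a shared store so that every web server sees the same counts; the in-process dictionary here only demonstrates the logic.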

Fanout service

Fanout is the process of delivering a post to all of the author's friends. There are two fanout models: fanout on write (also called the push model) and fanout on read (also called the pull model). Both models have pros and cons. We explain their workflows and explore the best way to support our system.

Fanout on write. With this approach, the news feed is precomputed at write time. A new post is delivered to friends' caches as soon as it is published.

Advantages:

  • News feeds are generated in real time and can be pushed to friends instantly.
  • Because newsfeeds are precomputed when they are written, fetching newsfeeds is very fast.

Disadvantages:

  • If a user has many friends, fetching the list of friends and generating news feeds for all of them is slow and time consuming. This is the so-called hotkey problem.
  • For inactive users or users who rarely log in, precomputing news feeds wastes computing resources.
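
To make the push model concrete, here is a minimal Python sketch under simplifying assumptions: plain dictionaries stand in for the friend graph and the news feed cache, and the function names are hypothetical. It shows why reads are cheap (a single lookup) while the write cost grows with the number of friends.

```python
from typing import Dict, List


def publish_fanout_on_write(author_id: str, post_id: str,
                            friends_of: Dict[str, List[str]],
                            feed_cache: Dict[str, List[str]]) -> None:
    """Push the new post ID into every friend's precomputed feed."""
    # One cache write per friend: this loop is the expensive part
    # for users with very many friends.
    for friend_id in friends_of.get(author_id, []):
        feed_cache.setdefault(friend_id, []).insert(0, post_id)  # newest first


def read_feed(user_id: str, feed_cache: Dict[str, List[str]],
              limit: int = 20) -> List[str]:
    """Reading is a single lookup because the feed already exists."""
    return feed_cache.get(user_id, [])[:limit]
```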

Fanout on read. The news feed is generated at read time. This is an on-demand model: recent posts are pulled when a user loads her home page.

Advantages:

  • Fanout on read works better for inactive users or users who rarely log in, because it does not waste computing resources on them.
  • Data is not pushed to friends, so there is no hotkey problem.

Disadvantages:

  • Fetching the news feed is slow because the news feed is not precomputed.
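
For comparison, the pull model can be sketched as follows. Again this is only an illustration: the dictionaries, the `(timestamp, post_id)` tuples, and the function name are assumptions. Nothing happens at publish time beyond storing the post, and the aggregation work moves to the read path.

```python
from typing import Dict, List, Tuple

Post = Tuple[int, str]  # (timestamp, post_id), hypothetical layout


def build_feed_on_read(user_id: str,
                       friends_of: Dict[str, List[str]],
                       posts_by_author: Dict[str, List[Post]],
                       limit: int = 20) -> List[str]:
    """Aggregate friends' posts at read time, newest first."""
    candidates: List[Post] = []
    # One lookup per friend on every read: this is why fetching is slow.
    for friend_id in friends_of.get(user_id, []):
        candidates.extend(posts_by_author.get(friend_id, []))
    candidates.sort(key=lambda post: post[0], reverse=True)
    return [post_id for _, post_id in candidates[:limit]]
```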

We adopt a hybrid approach to get the benefits of both models and avoid their pitfalls. Since fetching the news feed quickly is critical, we use the push model for the majority of users. For celebrities or users with many friends or followers, we let followers pull news content on demand to avoid overloading the system. Consistent hashing is a useful technique for distributing requests and data more evenly, which helps mitigate the hotkey problem.
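
One possible way to combine the two models is sketched below in Python. The follower-count threshold, the data layout, and the function names are assumptions for illustration; the article does not say how a "celebrity" is identified. Ordinary authors are pushed at write time, while posts from high-follower authors are merged in at read time.

```python
from typing import Dict, List, Tuple

Post = Tuple[int, str]       # (timestamp, post_id), hypothetical layout
CELEBRITY_THRESHOLD = 3      # hypothetical cutoff; tune per system


def publish(author_id: str, timestamp: int, post_id: str,
            followers_of: Dict[str, List[str]],
            feed_cache: Dict[str, List[Post]],
            posts_by_author: Dict[str, List[Post]],
            threshold: int = CELEBRITY_THRESHOLD) -> None:
    posts_by_author.setdefault(author_id, []).append((timestamp, post_id))
    followers = followers_of.get(author_id, [])
    if len(followers) > threshold:
        return  # celebrity: skip the push, followers pull on read
    for follower_id in followers:
        feed_cache.setdefault(follower_id, []).append((timestamp, post_id))


def read_feed(user_id: str,
              following_of: Dict[str, List[str]],
              followers_of: Dict[str, List[str]],
              feed_cache: Dict[str, List[Post]],
              posts_by_author: Dict[str, List[Post]],
              limit: int = 20,
              threshold: int = CELEBRITY_THRESHOLD) -> List[str]:
    entries = list(feed_cache.get(user_id, []))  # pushed entries
    for author_id in following_of.get(user_id, []):
        if len(followers_of.get(author_id, [])) > threshold:
            entries.extend(posts_by_author.get(author_id, []))  # pulled entries
    entries.sort(reverse=True)  # newest first
    return [post_id for _, post_id in entries[:limit]]
```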

Let's take a closer look at the fanout service shown in Figure 11-5.

[Figure 11-5]

The fanout service works as follows:

  1. Fetch friend IDs from the graph database. Graph databases are well suited to managing friend relationships and friend recommendations. Readers who want to learn more about this concept can refer to reference [2].
  2. Get friend information from user cache. The system then filters friends based on user settings. For example, if you block someone, her posts won't appear in your newsfeed even if you're still friends. Another reason posts may not appear is that users may selectively share information with specific friends, or hide information from others.
  3. Send the friend list and the new post ID to the message queue.
  4. Fanout workers fetch data from the message queue and store news feed data in the news feed cache. You can think of the news feed cache as a <post_id, user_id> mapping table. Whenever a new post is published, it is appended to the news feed table shown in Figure 11-6. If we stored entire user and post objects in the cache, memory consumption could become very large, so only IDs are stored. To keep the memory footprint small, we set a configurable limit. The chance of a user scrolling through thousands of posts in the news feed is slim. Most users are only interested in the latest content, so the cache miss rate is low.
  5. Store <post_id, user_id> in the news feed cache. Figure 11-6 shows what the news feed looks like in the cache.
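
The five steps above can be sketched as a small producer and worker pair. Everything here is a stand-in chosen for illustration: dictionaries replace the graph database and user cache, `queue.Queue` replaces the message queue, the `blocked` setting and all function names are hypothetical, and the cache is laid out as one bounded list of post IDs per user rather than a literal table.

```python
import queue
from collections import deque
from typing import Deque, Dict, List

MAX_FEED_LENGTH = 3  # the configurable limit; kept tiny for the example


def enqueue_fanout(author_id: str, post_id: str,
                   friend_graph: Dict[str, List[str]],
                   user_settings: Dict[str, dict],
                   message_queue: "queue.Queue") -> None:
    # Step 1: fetch friend IDs (a dict stands in for the graph database).
    friend_ids = friend_graph.get(author_id, [])
    # Step 2: filter friends using their settings, e.g. a block list.
    recipients = [
        friend_id for friend_id in friend_ids
        if author_id not in user_settings.get(friend_id, {}).get("blocked", set())
    ]
    # Step 3: send the friend list and the new post ID to the queue.
    message_queue.put((post_id, recipients))


def run_fanout_worker(message_queue: "queue.Queue",
                      feed_cache: Dict[str, Deque[str]]) -> None:
    # Steps 4 and 5: drain the queue and store only IDs in the feed cache.
    while not message_queue.empty():
        post_id, recipients = message_queue.get()
        for user_id in recipients:
            feed = feed_cache.setdefault(user_id, deque(maxlen=MAX_FEED_LENGTH))
            feed.appendleft(post_id)  # newest first; oldest entry drops off when full
```

Using `deque(maxlen=...)` is one simple way to enforce the size limit: once a feed is full, adding a new post ID at the front silently discards the oldest one at the back.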

[Figure 11-6]

News feed retrieval deep dive

Figure 11-7 depicts the detailed design of news feed fetching.

[Figure 11-7]

As shown in Figure 11-7, media content (pictures, videos, etc.) is stored in a CDN for fast retrieval. Let's see how the client gets the news feed.

  1. A user sends a request to get her news feed. The request is as follows: /v1/me/feed
  2. A load balancer redistributes requests to web servers.
  3. The web server calls the news feed service to fetch the news feed.
  4. The news feed service gets the list of post IDs from the news feed cache.
  5. A user's news feed is more than just a list of feed IDs. It contains the username, profile picture, post content, post image, and so on. The news feed service therefore fetches the complete user and post objects from the caches (user cache and post cache) to build the fully hydrated news feed.
  6. A fully populated newsfeed is returned to the client in JSON format for rendering.
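
Steps 4 to 6 amount to looking up IDs and then filling in the details. The Python sketch below illustrates that idea with dictionaries standing in for the three caches; the field names (`author_id`, `content`, `name`, `avatar`) and the function name are assumptions, not something the article defines.

```python
import json
from typing import Dict, List


def get_news_feed(user_id: str,
                  feed_cache: Dict[str, List[str]],
                  post_cache: Dict[str, dict],
                  user_cache: Dict[str, dict],
                  limit: int = 20) -> str:
    """Turn cached post IDs into a fully populated feed, returned as JSON."""
    post_ids = feed_cache.get(user_id, [])[:limit]      # step 4
    feed = []
    for post_id in post_ids:                            # step 5
        post = post_cache.get(post_id)
        if post is None:
            continue  # cache miss; a real system would fall back to the database
        author = user_cache.get(post["author_id"], {})
        feed.append({
            "post_id": post_id,
            "content": post["content"],
            "author_name": author.get("name"),
            "author_avatar": author.get("avatar"),
        })
    return json.dumps(feed)                             # step 6
```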

Cache architecture

Caching is critical for a news feed system. We divide the cache tier into five layers, as shown in Figure 11-8.

[Figure 11-8]

  • News feed: Stores news feed IDs.
  • Content: stores data for each post. Popular content is stored in the hot cache.
  • Social Graph: Stores user relationship data.
  • Action: Stores information about whether a user likes a post, replies to a post, or takes other actions on a post.
  • Counters: store counters for likes, replies, followers, following, etc.
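
As a small illustration of how two of these layers might cooperate, the sketch below records a "like" in an action cache and bumps the matching counter. The key layout, the `like_post` helper, and the use of in-process dictionaries are assumptions for the example only.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Set, Tuple

# Action layer: (user_id, post_id) -> set of actions the user took on the post.
ActionCache = DefaultDict[Tuple[str, str], Set[str]]
# Counter layer: post_id -> Counter of likes, replies, and so on.
CounterCache = DefaultDict[str, Counter]


def like_post(user_id: str, post_id: str,
              action_cache: ActionCache, counter_cache: CounterCache) -> bool:
    """Record a like once per user and keep the like counter in sync."""
    actions = action_cache[(user_id, post_id)]
    if "like" in actions:
        return False  # already liked; do not double count
    actions.add("like")
    counter_cache[post_id]["likes"] += 1
    return True


action_cache: ActionCache = defaultdict(set)
counter_cache: CounterCache = defaultdict(Counter)
```

Keeping counters separate from per-user actions means a post's like count can be read without scanning every user's action record.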

Step 4 - Summary

In this chapter, we designed a news feed system. Our design contains two flows: feed publishing and news feed retrieval.

Like any system design interview question, there is no perfect way to design a system. Every company has its unique constraints, and you must design a system that conforms to those constraints. Understanding the tradeoffs of your design and technology choices is important. If you have a few minutes left, you can talk about scalability issues. To avoid repetitive discussion, only high-level discussion points are listed below.

Scaling the database:

  • Vertical Scaling vs Horizontal Scaling
  • SQL vs NoSQL
  • Master-slave replication
  • Read replicas
  • Consistency models
  • Database sharding

Additional discussion points:

  • Keep the web tier stateless
  • Cache data as much as possible
  • Support for multiple data centers
  • Use message queues to reduce component coupling
  • Monitor key metrics. For example, it is interesting to monitor QPS during peak hours and latency when users refresh their news feed.

Congratulations on getting this far! Now give yourself a pat on the back. Well done!

References

[1] How News Feed works:

https://www.facebook.com/help/327131014036297/

[2] Friend-of-friend recommendations with Neo4j and SQL Server:

http://geekswithblogs.net/brendonpage/archive/2015/10/26/friend-of-friend-recommendations-with-neo4j.aspx



Origin blog.csdn.net/smarter_AI/article/details/131798061