Social product back-end architecture design

This article walks through the key points of architectural design that allow a social application to become a true next-generation social product. The following properties shape the design of the architecture:

a) Availability 

b) Scalability 

c) Performance and flexibility at scale

Goals

a) Ensure that the user's content data can be easily discovered and acquired by other users.

b) Ensure that content pushes are relevant, not only semantically, but also from the perspective of the user's device.

c) Ensure real-time updates are generated, pushed and analyzed.

d) Save the user's resources as much as possible.

e) User experience should remain the same regardless of server load changes.

f) Ensure that the application as a whole is secure.

All in all, we face a considerable challenge: an ever-expanding mass of user-generated content, an ever-growing number of users, and a product that keeps iterating, all while keeping performance good enough. To meet these challenges, we must settle a few key elements of the architecture that will shape the design of the system. Below are some key decisions and the analysis behind them.

Data storage

The storage of data and the data model are among the key decisions in a good architecture. A social product has to handle multiple types of data, so we must first analyze and understand the data thoroughly, and only then design the data model and the storage.

As a first step, we need to determine which data is hot data that is queried frequently, and which data is needed only rarely (such as archived data kept for analysis). Frequently accessed data must always be available, fast to read and write, and horizontally scalable. We currently use MySQL for all our business scenarios, even though our use cases do not necessarily require a relational database system. As our data grows, reads and writes will become a performance bottleneck for the application. We should be prepared for billions of queries per second.

Let's classify our data:

a) Primary data or data in static form, such as user profiles

b) Semantic data

c) User-generated content (UGC) data

d) Session data

It is hard to find a single storage system that handles all of these data types efficiently. Therefore, we will choose a specific data store for each data type.

Static data: For static data, it is best to choose a document-based store in which both keys and values are queryable. We can choose a document database such as MongoDB. The biggest advantage of choosing MongoDB is that it provides ACID guarantees at the document level.

MongoDB can scale across multiple distributed data centers. It will allow us to use replica sets to maintain redundancy and thus solve our availability problems.

Data sharding is an important consideration: it lets the dataset keep growing while preserving query speed. Fortunately, MongoDB supports sharding transparently.
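
As a concrete illustration, here is a minimal sketch of how profile documents could be read and written through the MongoDB Java driver (mongodb-driver-sync). The connection string, database, and collection names are assumptions for illustration; against a sharded cluster the driver routes these same calls transparently.

```java
// Sketch: storing and querying a static user profile document in MongoDB.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

public class UserProfileStore {
    public static void main(String[] args) {
        // A sharded/replica-set URI would be used in production; localhost is illustrative.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> users =
                    client.getDatabase("social").getCollection("user_profiles");

            // Upsert-style write of a static profile document.
            Document profile = new Document("_id", "user:42")
                    .append("displayName", "Alice")
                    .append("country", "IN");
            users.replaceOne(Filters.eq("_id", "user:42"), profile,
                    new ReplaceOptions().upsert(true));

            // Both keys and values are queryable.
            Document found = users.find(Filters.eq("country", "IN")).first();
            System.out.println(found);
        }
    }
}
```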

Associative or relational data (core data): Most of our data is relational in nature, for example, A is B's friend, and C is a friend of both A and B. Such highly semantic data is best suited to a graph model, so we should store it in a graph database such as Neo4j. The advantage is obvious: we store the nodes together with their relationships, saving the extra step of computing the connections between the data. A graph data model also helps us capture the relationships between attributes, and rich attribute relationships are absolutely key when exploring linked data. Graph databases support ACID semantics and automatic indexing.

Again, our requirements are availability and scalability. We may have hundreds or thousands of concurrent transactions writing to the database at the same time, and just as many concurrent query requests. The database should be able to handle a very large dataset and over a billion reads per second.

We're going to need a system that helps us scale reads and writes automatically. Another factor to consider is data sharding, which is the key to system scalability.

Neo4j is designed to scale horizontally and provides data redundancy to ensure availability. But, so far, it does not support data sharding, so we may need more analysis before making a decision. Alternative graph databases are FlockDB, AllegroGraph, and InfiniteGraph.
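
To make the graph model concrete, below is a small sketch of a friend-of-friend lookup over Bolt using the official Neo4j Java driver. The URI, credentials, and the Person/FRIEND label and relationship names are illustrative assumptions, not part of any agreed schema.

```java
// Sketch: friend-of-friend suggestions via a Cypher query against Neo4j.
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class FriendGraph {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // Friends-of-friends of A that A is not already connected to.
            Result result = session.run(
                "MATCH (a:Person {id: $id})-[:FRIEND]->(:Person)-[:FRIEND]->(fof:Person) " +
                "WHERE NOT (a)-[:FRIEND]->(fof) AND fof <> a " +
                "RETURN DISTINCT fof.id AS suggestion",
                Values.parameters("id", "user:42"));

            result.forEachRemaining(r -> System.out.println(r.get("suggestion").asString()));
        }
    }
}
```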

Binary data (UGC): We also have to deal with a large amount of user-generated binary data, which is not easy given its size. As discussed above, we need a system that sustains fairly high performance even during spikes; scalability and availability are the most critical factors when deciding where to store this data. We cannot rely on a local disk filesystem to store our binary data: it complicates availability and scalability, and the filesystem cache can consume a lot of CPU. Instead, we should rely on an existing, proven system such as Amazon S3, a very popular object store offering availability and elastic storage.

We could also consider Google Cloud Storage or Rackspace's Cloud Files, etc., but S3 seems to be the clear winner, offering a more premium service.

S3 already supports data partitioning: it can scale horizontally, split hot and cold data, and partition by key. But it is not enough to just store the data; the metadata associated with that content must also be searchable, scalable, and fast to query. We can also try some new things, such as automatically detecting image dimensions or auto-tagging based on content. This is a potential area of intellectual property. We will discuss indexing requirements in the indexing section of this article. For now, note that we will store the content under an identifier and index it somewhere. Amazon S3 seems best suited for this situation.
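
A minimal sketch of that flow, assuming the AWS SDK for Java (v1) and a hypothetical bucket and key scheme: the binary object goes to S3, and only the returned key is kept in our own metadata store and index.

```java
// Sketch: store a user-generated binary object in S3 and keep only an identifier locally.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

public class MediaStore {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    private static final String BUCKET = "social-ugc"; // illustrative bucket name

    /** Uploads the file and returns the key that we index elsewhere as metadata. */
    public String store(String userId, File media) {
        // Key prefix spreads objects across partitions (hot/cold split by prefix).
        String key = "ugc/" + userId + "/" + System.currentTimeMillis() + "-" + media.getName();
        s3.putObject(BUCKET, key, media);
        return key; // persisted alongside searchable metadata (see the indexing section)
    }
}
```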

Session data

A correct understanding of session data is very important. Session data helps us maintain the user's state, and it must be handled in a server-independent way so that our servers can be deployed and scaled freely. This keeps our design flexible and ensures that sessions are not tied to a specific node or server.

We need a reliable way to keep the user's session state up to date, so that if a session is terminated we can still restore it and let the user continue from where they left off.

This is especially important in our scenario, where connections are unreliable and packet loss is normal. The data must be accessible across nodes, so availability and scalability are required. We can start by using MongoDB itself to store session data, and later move to a pure key-value store such as Redis.
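
As a sketch of what a server-independent session store could look like on Redis (using the Jedis client), with key names and the TTL chosen purely for illustration:

```java
// Sketch: sessions live in Redis, so any application server can resume them.
import redis.clients.jedis.Jedis;

public class SessionStore {
    private static final int SESSION_TTL_SECONDS = 30 * 60; // sliding 30-minute window (assumption)

    public void save(Jedis jedis, String sessionId, String stateJson) {
        // Any app server can read/write this key, so sessions are not node-bound.
        jedis.setex("session:" + sessionId, SESSION_TTL_SECONDS, stateJson);
    }

    public String load(Jedis jedis, String sessionId) {
        String state = jedis.get("session:" + sessionId);
        if (state != null) {
            // Refresh the TTL so an active user keeps the session alive.
            jedis.expire("session:" + sessionId, SESSION_TTL_SECONDS);
        }
        return state; // null means the session expired; the client re-authenticates
    }
}
```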

Note: all recommendation and offline jobs should run only on non-serving nodes.

Indexing

Indexes are key to our system. Users can search for anything, and this is one of our main use cases, so search performance has to be taken very seriously. There are two things to consider here: first, the creation of the index, and second, the indexing system that serves queries.

To build a meaningful search system, the index must be kept up to date with incoming data in near real time. We can start by writing a very simple system that maintains an inverted index over the generated content. Later, as the volume of input data grows, we can replace it with a real-time data processing engine such as Apache Storm, a distributed, fault-tolerant and highly scalable system that can take over the index-generation logic.
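
A deliberately naive version of that first step might look like the sketch below: a thread-safe, in-memory inverted index in plain Java. The tokenization and identifiers are illustrative only; this is the piece that would later be replaced by a Storm-backed pipeline.

```java
// Sketch: a simple in-memory inverted index over incoming content.
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SimpleInvertedIndex {
    // term -> set of content ids containing that term
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>();

    public void add(String contentId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            index.computeIfAbsent(token, t -> ConcurrentHashMap.newKeySet()).add(contentId);
        }
    }

    public Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        SimpleInvertedIndex idx = new SimpleInvertedIndex();
        idx.add("post:1", "Sunset photo from the beach");
        idx.add("post:2", "Beach volleyball highlights");
        System.out.println(idx.search("beach")); // [post:1, post:2]
    }
}
```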

Indexing system: Lucene is the obvious choice due to its popularity and its unmatched performance. We can use SolrCloud, which already transparently supports sharding, replication, and fault tolerance for reads and writes.
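
For example, pushing a content document into Solr through SolrJ could look like the sketch below; the core URL and field names are assumptions, and a SolrCloud deployment would normally use CloudSolrClient with the ZooKeeper ensemble rather than a single HTTP endpoint.

```java
// Sketch: indexing a content document with SolrJ.
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ContentIndexer {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/content").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "post:1");
            doc.addField("author_id", "user:42");
            doc.addField("text", "Sunset photo from the beach");

            solr.add(doc);
            solr.commit(); // in production, rely on autoCommit / soft commits instead
        }
    }
}
```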

Queue & message push

Every time an event is triggered in our app, we need to push a message to the user's followers/friends. It is important that our system never loses any of these messages and, more importantly, can recover them in the event of a failure. To meet these requirements we need a queuing solution. We can use ActiveMQ, a highly reliable queuing system that supports highly available clusters and distributed queues.
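
A minimal sketch of enqueuing such a fan-out event on ActiveMQ through the standard JMS API, with the broker URL and queue name as illustrative assumptions:

```java
// Sketch: publish a persistent fan-out event to an ActiveMQ queue via JMS.
import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class EventPublisher {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("failover:(tcp://localhost:61616)");
        Connection connection = factory.createConnection();
        connection.start();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("feed.events");
            MessageProducer producer = session.createProducer(queue);

            // Persistent delivery so the event survives a broker restart.
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);
            producer.send(session.createTextMessage("{\"type\":\"new_post\",\"postId\":\"post:1\"}"));
        } finally {
            connection.close();
        }
    }
}
```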

Push messaging is another area: this is how notifications are sent to our users. Here we need to estimate the scale; we should be ready to support notifications on the order of hundreds of millions. There are many options, but pyapns, CommandIQ, and APP Booster are perhaps the most popular.

We need to manage some things ourselves, especially to guarantee reliable delivery even when the user's device is offline. I suggest we implement a two-way system that tracks notification state and persists it to disk in the background. Every time a notification fails, it is marked with a status code and added to a retry queue; once the notification is finally delivered, it is removed from the retry queue. A sketch of such a retry worker follows.
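
The following sketch shows one possible shape of that retry worker in plain Java. The status handling and the persistence hook are assumptions; a production version would add back-off between attempts and durable storage.

```java
// Sketch: failed push notifications are re-queued until delivery succeeds.
import java.util.concurrent.LinkedBlockingQueue;

public class NotificationRetryWorker implements Runnable {
    static class PendingNotification {
        final String userId;
        final String payload;
        int attempts;
        PendingNotification(String userId, String payload) {
            this.userId = userId;
            this.payload = payload;
        }
    }

    private final LinkedBlockingQueue<PendingNotification> retryQueue = new LinkedBlockingQueue<>();

    public void submit(String userId, String payload) {
        retryQueue.add(new PendingNotification(userId, payload));
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                PendingNotification n = retryQueue.take();
                if (!trySend(n)) {
                    n.attempts++;
                    // persistToDisk(n);  // hypothetical hook: survive process restarts
                    retryQueue.add(n);    // re-queue; a real worker would back off here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private boolean trySend(PendingNotification n) {
        // Would call the push provider gateway here; stubbed out for the sketch.
        return false;
    }
}
```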

Caching strategy

In a system like ours, the goal is to support a billion RPS, so a good caching strategy is extremely important. Our business logic will sit behind multiple layers of cache, with intelligent invalidation of stale entries. Let's look at the layers from the top.

Application Layer Caching (Content Caching): To minimize cache misses and ensure that the cache is always up to date, we need a cache that never expires and holds the data at all times. This basically means that, under normal usage, we never have to query our database, saving a lot of resources. We should also ensure that cached data is stored in a form that requires no additional processing and is ready to render. This effectively converts online work into offline work and cuts latency. To achieve this, every time something enters the system we must do two things:

a) The original content is stored in the cache in denormalized form. To be on the safe side, we will always set an expiry period.

b) The original content is also written in our datastore.

We use Redis for this cache: an in-memory store with good failover. It is highly scalable, newer versions transparently support data sharding, and it supports master-slave configurations. Best of all, we can store data in any shape we like, which makes incremental writes easy; this is crucial for supporting content feeds.

It is also worth pointing out that we need to support a large number of read-modify-write operations and a smaller number of reads of large content objects, a workload for which Redis is known to perform best.
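
Putting the two rules above together, a write-through sketch using Jedis might look like this; the key naming, safety TTL, and feed length cap are illustrative assumptions:

```java
// Sketch: on ingest, write render-ready JSON to Redis and append to follower feeds.
import redis.clients.jedis.Jedis;
import java.util.List;

public class ContentCache {
    private static final int SAFETY_TTL_SECONDS = 7 * 24 * 3600; // "never expires" in practice

    public void onNewPost(Jedis jedis, String postId, String renderReadyJson, List<String> followerIds) {
        // a) denormalized content, ready to render with no further processing
        jedis.setex("post:" + postId, SAFETY_TTL_SECONDS, renderReadyJson);

        // incremental write into each follower's feed (read-modify-write friendly)
        for (String followerId : followerIds) {
            jedis.lpush("feed:" + followerId, postId);
            jedis.ltrim("feed:" + followerId, 0, 999); // cap feed length
        }
        // b) the original content is also written to the primary datastore elsewhere
    }
}
```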

Caching Proxy: Caching at the reverse proxy layer is also critical. It reduces the load of requests hitting our servers directly and therefore reduces latency. For proxy caching to be effective, the HTTP response headers need to be set correctly. There are many kinds of proxy servers, but the most popular are nginx and ATS (Apache Traffic Server).

L2 cache (code-level cache): This is local storage of entity data used to improve application performance by avoiding expensive database calls and keeping entity data close to the code. EhCache is a popular choice.

Client-Side Cache: This is the device or browser cache. All static assets should be cached as aggressively as possible. If the HTTP cache headers on API responses are set properly, much of the related resource content gets cached; we should verify that this works as expected. Beyond that, we should cache whatever else we can, either in the device's own memory or in SQLite. All expensive objects should be cached; for example, NSDateFormatter and NSCalendar are slow to initialize and should be reused as much as possible. A lot more can be tuned on iOS, but that is beyond the scope of this article.

Data compression

Considering that our users mainly deal with large numbers of images and videos and need to download a lot of data, optimizing download size is very important. It saves the user's data allowance and improves the perceived performance of the application.

Other aspects to consider: our users are mostly on non-LTE networks (2.5G or 3G), bandwidth is limited, connections are often unreliable, and data usage is expensive. In such conditions, intelligent compression is a key requirement.

In practice, image and video compression is not as straightforward as it may seem and often requires in-depth analysis. The images and videos we process can be compressed lossily or losslessly, depending on the quality of the user's device, so I recommend using multiple compression techniques to handle this situation. Here we can try both intra-frame and inter-frame compression techniques.

In general we can use zpaq and fp8 for generic compression needs, and we can also try WebP, which is very well suited to our business scenario. Our API responses are always gzipped.

Data transcoding

Given that we need to deal with multiple devices, operating systems and screen resolutions, our content should be stored and processed in a device-agnostic way, while the service layer should understand the user's device and adjust the response content accordingly. Transcoding of images and videos is therefore essential.

Our application needs to collect device configuration such as memory, supported encodings and screen resolution as context for the API. The API should use this context to select or modify the content version. Based on the device contexts we observe, we can pre-generate the most frequently requested versions of the content.

We can transcode using FFmpeg, the most reliable and widely used transcoding framework, and adapt it to our needs. Transcoding is performed when data is ingested.
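
For instance, producing a 720p-class H.264 rendition at ingestion time could be as simple as shelling out to FFmpeg, as in the sketch below; the exact flag set is a common baseline rather than a tuned production profile.

```java
// Sketch: invoke FFmpeg to produce a device-friendly rendition at ingest time.
import java.io.File;

public class Transcoder {
    public void transcodeTo720p(File input, File output) throws Exception {
        Process p = new ProcessBuilder(
                "ffmpeg", "-y",
                "-i", input.getAbsolutePath(),
                "-vf", "scale=1280:-2",      // 720p-class width, height kept divisible by 2
                "-c:v", "libx264", "-crf", "23",
                "-c:a", "aac",
                output.getAbsolutePath())
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("ffmpeg exited with " + p.exitValue());
        }
    }
}
```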

Transfer Protocol

Given our network scenarios (non-LTE, unreliable connections, and so on), the key is to conserve resources as much as possible and keep communication lightweight. I recommend using the OkHttp client for all our HTTP requests; OkHttp supports the SPDY protocol, handles connection failures gracefully, and recovers transparently.
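
A minimal OkHttp call might look like the following sketch (using the okhttp3 API); the endpoint URL is an illustrative assumption. OkHttp negotiates the best available protocol and transparently decompresses gzipped response bodies.

```java
// Sketch: a simple feed fetch through OkHttp.
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class ApiClient {
    private final OkHttpClient client = new OkHttpClient();

    public String fetchFeed(String userId) throws java.io.IOException {
        Request request = new Request.Builder()
                .url("https://api.example.com/v1/feed/" + userId) // illustrative endpoint
                .build();
        try (Response response = client.newCall(request).execute()) {
            return response.body().string();
        }
    }
}
```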

All of our communication needs should switch to MQTT, a lightweight machine-to-machine connection protocol.
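
As an illustration, publishing a notification over MQTT with the Eclipse Paho Java client could look like the sketch below; the broker URL, topic layout, and QoS level are assumptions for the example.

```java
// Sketch: lightweight publish over MQTT via the Eclipse Paho client.
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class MqttPublisher {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient("tcp://broker.example.com:1883",
                MqttClient.generateClientId());
        client.connect();

        MqttMessage message = new MqttMessage("{\"event\":\"new_message\"}".getBytes());
        message.setQos(1); // at-least-once delivery, cheap enough for mobile links

        client.publish("users/user42/notifications", message);
        client.disconnect();
    }
}
```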

Security

Keeping our application secure is very important, and the overall architecture must take security into account. Here I only discuss the architectural changes needed to meet security requirements, not implementation-level processes.

Here are some measures that must be built into the architecture:

1. All our user data must be encrypted. MongoDB and Neo4j already support storage encryption. On this basis, we can decide which key user information to encrypt. All database-related transport calls must have encryption enabled.

2. Secure Sockets Layer: All access through the proxy servers should use SSL, and the proxy server can act as the SSL termination point.

3. All our API endpoints should run on non-default ports and must implement OAuth.

4. All DB reads should go through the REST endpoints.

5. Password configuration must be handled with special care. Passwords must be hashed (a minimal hashing sketch follows this list), and configuration files should be readable only at application startup. This lets us control the application's identity through file-system permissions: the application user can read but not write them, and other users cannot read them at all. All similar configuration is packaged with keydb and protected by a password.
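
As a minimal illustration of the password-hashing requirement, the sketch below uses jBCrypt; the library choice and work factor are assumptions, not a mandated part of the design.

```java
// Sketch: salted, adaptive password hashing with jBCrypt (org.mindrot:jbcrypt).
import org.mindrot.jbcrypt.BCrypt;

public class PasswordHasher {
    public static String hash(String plaintext) {
        return BCrypt.hashpw(plaintext, BCrypt.gensalt(12)); // work factor 12 is an illustrative choice
    }

    public static boolean verify(String plaintext, String storedHash) {
        return BCrypt.checkpw(plaintext, storedHash);
    }

    public static void main(String[] args) {
        String stored = hash("s3cret");
        System.out.println(verify("s3cret", stored)); // true
        System.out.println(verify("wrong", stored));  // false
    }
}
```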

Components

Here are the components used in our architecture:

1. Load Balancer: This layer is used to forward all requests to the proxy server, based on a custom policy. This layer will also help us ensure availability through capacity-based redirection.

2. Proxy server: All incoming calls must use this as the entry point. This is also the termination point for our SSL. It caches HTTP requests based on policy definitions.

3. FE layer: This layer runs the Node.js servers.

4. Data input engine: This component ingests all content and performs a series of tasks: denormalizing the model, transcoding, caching, and so on. In the future, all content processing can be done here if possible.

5. REST service: This layer is responsible for interacting with all the databases and returning data. Its access is protected by OAuth. It can be implemented with a Tomcat container and an edge cache.

6. Event processing: This layer handles all events and is mainly responsible for fan-out. It reads from ActiveMQ and uses the notification engine to generate notifications.

7. Recommendation engine: This component drives recommendations by analyzing all the collected user activity. Depending on the activity actually collected, we can deploy various affinity-based algorithms. We can use the algorithm interfaces provided by Apache Mahout.

Logical view of the system:

Epilogue

This article is a high-level analysis of the key components. If implementation recommendations are needed, a phased approach can be taken, but if we need scalability and support for real use cases, the specifications proposed here must be followed. I have not touched on domain design; this is only the design phase, and it requires deeper analysis and an understanding of the current state of the system.

Discussion on Hacker News: https://news.ycombinator.com/item?id=9930752

Link to the original text: Architecting Backend For A Social Product
