[System Design Series] Asynchronous and Network Communication

Why I started this system design series


System Design Primer (English documentation): GitHub - donnemartin/system-design-primer: Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Chinese version: https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md

My main goal is to learn system design. The Chinese version reads like a machine translation, so I take brief notes by hand. For the parts that are hard to understand, I compare against the English version with the help of AI, then translate and expand on them based on my own understanding.
 

Asynchronism

Source: Introduction to Scalable System Architecture

Synchronous and asynchronous

Synchronous and asynchronous are two common modes of transmitting and processing data in computer communication and programming. They differ in how tightly the two sides are coupled in time.

Synchronous transmission means that the sender and receiver must stay synchronized in time for the whole exchange. Data is organized into fixed-size blocks or characters, and each block or character has a well-defined start and end time. The main features of synchronous transmission are:

a. Transmission occurs at specific, agreed times;

b. The link stays synchronized even when no payload data is being sent;

c. When there is no data to send, empty (idle) packets are sent instead.

Asynchronous transmission means that the sender and receiver do not need to stay strictly synchronized in time. It is usually used where data is produced at an irregular rate, such as online chat. Data is organized into blocks or characters of irregular timing, and the start and end time of each block or character is not fixed in advance. The main characteristics of asynchronous transmission are:

a. Data transmission does not require a fixed time interval;

b. Data can be sent at any time;

c. The receiver needs to parse the data to determine the beginning and end of the data block.
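Point c can be illustrated with start/stop framing, the scheme used on asynchronous serial links. The sketch below is only an illustration: the 8N1 layout (one start bit, eight data bits, one stop bit) and the names `frame_byte` and `decode_stream` are assumptions, not something the source specifies.

```python
def frame_byte(value: int) -> list[int]:
    """Wrap one byte in a start bit (0) and a stop bit (1), data bits LSB first."""
    data_bits = [(value >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def decode_stream(bits: list[int]) -> bytes:
    """Scan an idle-high bit stream and recover the framed bytes."""
    out = bytearray()
    i = 0
    while i < len(bits):
        if bits[i] == 1:            # idle line, no frame in progress
            i += 1
            continue
        frame = bits[i:i + 10]      # start bit found: take one full frame
        if len(frame) < 10 or frame[9] != 1:
            break                   # truncated frame or missing stop bit
        out.append(sum(bit << n for n, bit in enumerate(frame[1:9])))
        i += 10
    return bytes(out)


# Two characters sent at arbitrary times, separated by idle periods of any length.
line = [1, 1] + frame_byte(ord("h")) + [1, 1, 1, 1, 1] + frame_byte(ord("i")) + [1]
decoded = decode_stream(line)
```

The receiver needs no shared clock schedule with the sender: it simply waits for a start bit, which is why the gaps between characters can vary.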

Asynchronous workflows reduce request times for expensive operations that would otherwise be performed inline as part of the request. They can also help by doing time-consuming work in advance, such as aggregating data periodically.

Implementation approaches

Message queues

Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:

  • The application publishes a job to the queue, then notifies the user of the job status
  • A worker picks up the job from the queue, processes it, then signals that the job is complete

The user is not blocked, and the job is processed in the background. During this time, the client may do a small amount of processing to make it appear that the task has already completed. For example, when you post a tweet, it may appear on your timeline right away, but it may take some time before it is actually delivered to all of your followers.
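The publish/worker workflow can be sketched in a single process with the standard library. This is a minimal illustration, not a real broker: `queue.Queue` stands in for the message queue, and the names `publish`, `worker`, `status`, and `results` are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()     # stands in for the message broker
status = {}              # job id -> "queued" or "done"
results = {}             # job id -> processed output


def publish(job_id: str, payload: str) -> str:
    """Application side: enqueue the job and return to the user immediately."""
    status[job_id] = "queued"
    jobs.put((job_id, payload))
    return "queued"


def worker() -> None:
    """Worker side: pick up jobs, process them, and mark them complete."""
    while True:
        item = jobs.get()
        if item is None:                      # sentinel: no more work
            break
        job_id, payload = item
        results[job_id] = payload.upper()     # placeholder for the slow work
        status[job_id] = "done"


first_status = publish("tweet-1", "hello followers")   # user sees "queued" at once

thread = threading.Thread(target=worker)
thread.start()
jobs.put(None)           # ask the worker to stop once the queue is drained
thread.join()
```

With a real broker the application and the worker would be separate processes, often on separate machines, and the status would live in a shared store rather than in a dictionary.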

Redis is useful as a simple message broker, but messages can be lost.

RabbitMQ is popular but requires you to adapt to the "AMQP" protocol and manage your own nodes.

Amazon SQS is a hosted service, but it can have high latency and messages may be delivered twice.

Task queues

Task queues receive tasks and their associated data, run them, and then deliver their results. They can support scheduling and can be used to run computationally intensive jobs in the background.

Celery supports scheduling and primarily targets Python.
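The "receive a task with its data, run it, deliver the result" cycle can be shown with an in-process stand-in. The sketch below uses `concurrent.futures.ThreadPoolExecutor` from the standard library rather than Celery, and `resize_image` is a hypothetical task invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def resize_image(width: int, height: int, scale: float) -> tuple[int, int]:
    """Hypothetical background task; here it only computes the new size."""
    return int(width * scale), int(height * scale)


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() hands over the task and its data and returns a handle at once.
    future = pool.submit(resize_image, 1920, 1080, 0.5)
    # result() blocks until the worker delivers the result (or the timeout hits).
    new_size = future.result(timeout=5)
```

A thread pool only illustrates the interface. For genuinely compute-heavy jobs, a real task queue runs workers in separate processes or on separate machines.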

Backpressure

If a queue starts to grow significantly, its size can exceed available memory, resulting in cache misses, disk reads, and even slower performance. Backpressure helps by limiting the queue size, which maintains a high throughput rate and good response times for the jobs already in the queue. Once the queue fills up, clients get a "server busy" or HTTP 503 status code telling them to try again later. Clients can retry the request at a later time, perhaps with exponential backoff.
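A bounded queue plus a retry schedule is enough to sketch this behavior. The limit of two pending jobs, the 202 status for accepted work, and the names `handle_request` and `backoff_delays` are assumptions chosen for the example.

```python
import queue

MAX_PENDING = 2                              # assumed queue limit
pending = queue.Queue(maxsize=MAX_PENDING)


def handle_request(job: str) -> int:
    """Return an HTTP-style status: 202 if queued, 503 if the queue is full."""
    try:
        pending.put_nowait(job)              # never blocks; raises when full
        return 202
    except queue.Full:
        return 503                           # backpressure: reject instead of growing


def backoff_delays(base: float = 0.5, factor: float = 2.0, attempts: int = 4) -> list[float]:
    """Seconds a client would wait before each retry (no jitter, for clarity)."""
    return [base * factor ** n for n in range(attempts)]


codes = [handle_request(f"job-{n}") for n in range(3)]   # third request is rejected
delays = backoff_delays()
```

Production clients usually add random jitter to these delays so that many rejected clients do not all retry at the same moment.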

Disadvantages of asynchronism:

  • Use cases such as simple calculations and real-time workflows may be better suited for synchronous operations, as introducing queues may increase latency and complexity.

Communication

Source: OSI 7-layer model

Hypertext Transfer Protocol (HTTP)

HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests, and servers issue responses containing the relevant content and completion status information about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.

A basic HTTP request consists of a verb (method) and a resource (endpoint). The following are common HTTP verbs:

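| Verb   | Description                                         | Idempotent* | Safe | Cacheable                                       |
|--------|-----------------------------------------------------|-------------|------|-------------------------------------------------|
| GET    | Reads a resource                                    | Yes         | Yes  | Yes                                             |
| POST   | Creates a resource or triggers a process on data    | No          | No   | Yes, if the response contains freshness info    |
| PUT    | Creates or replaces a resource                      | Yes         | No   | No                                              |
| PATCH  | Partially updates a resource                        | No          | No   | Yes, if the response contains freshness info    |
| DELETE | Deletes a resource                                  | Yes         | No   | No                                              |

*Idempotent: the verb can be called many times without producing different outcomes.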
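The difference between an idempotent PUT and a non-idempotent POST can be seen on a tiny in-memory store. This is a sketch under stated assumptions: the `store` dictionary and the `put` and `post` helpers are hypothetical and only mimic what a server would do.

```python
import itertools

store: dict[str, dict] = {}
_next_id = itertools.count(1)


def put(resource_id: str, body: dict) -> None:
    """PUT: create or replace the resource at a client-chosen id (idempotent)."""
    store[resource_id] = body


def post(body: dict) -> str:
    """POST: create a new resource under a server-chosen id (not idempotent)."""
    resource_id = str(next(_next_id))
    store[resource_id] = body
    return resource_id


put("a", {"name": "x"})
put("a", {"name": "x"})              # repeating the PUT changes nothing
count_after_puts = len(store)

post({"name": "y"})
post({"name": "y"})                  # repeating the POST creates a second resource
count_after_posts = len(store)
```

This is why a client can safely retry a timed-out PUT or DELETE, while retrying a POST may create duplicates.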

HTTP is an application layer protocol that relies on lower-level protocols such as TCP and UDP.

Transmission Control Protocol (TCP)

Source: How to Make a Multiplayer Game

TCP is a connection-oriented protocol over IP networks. Connections are established and terminated using a handshake. All packets sent are guaranteed to reach the destination in the original order and without corruption, through:

  • Sequence numbers and checksum fields for each packet
  • Acknowledgement packets and automatic retransmission

If the sender does not receive a correct response, it resends the packets. After multiple timeouts, the connection is dropped. TCP also implements flow control and congestion control. These guarantees cause delays and generally result in less efficient transmission than UDP.
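The retransmission idea can be sketched as a toy simulation. This is deliberately not how TCP is implemented: it uses stop-and-wait instead of sliding windows, CRC32 instead of the Internet checksum, and a hypothetical `lossy_channel` that only drops data segments. It shows sequence numbers, checksums, and resending until acknowledged.

```python
import random
import zlib


def make_segment(seq: int, payload: bytes) -> tuple[int, bytes, int]:
    """Attach a sequence number and a CRC32 checksum to the payload."""
    return seq, payload, zlib.crc32(payload)


def lossy_channel(segment, rng: random.Random, loss_rate: float):
    """Hypothetical network: drop the segment with probability loss_rate."""
    return None if rng.random() < loss_rate else segment


def send_reliably(chunks: list[bytes], loss_rate: float = 0.3, seed: int = 1,
                  max_tries: int = 20) -> tuple[bytes, int]:
    """Stop-and-wait: resend each segment until the receiver accepts it."""
    rng = random.Random(seed)
    received = []
    transmissions = 0
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            transmissions += 1
            arrived = lossy_channel(make_segment(seq, chunk), rng, loss_rate)
            if arrived is None:
                continue                      # no ACK before the timeout: retransmit
            got_seq, payload, checksum = arrived
            if got_seq == seq and zlib.crc32(payload) == checksum:
                received.append(payload)      # receiver accepts and acknowledges
                break
        else:
            raise ConnectionError("too many timeouts, dropping the connection")
    return b"".join(received), transmissions


data, sent = send_reliably([b"he", b"ll", b"o"])
```

Every retransmission costs at least one extra round trip, which is the latency price of the delivery guarantee.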

To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and, say, a memcached server. Connection pooling can help, in addition to switching to UDP where applicable.
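Connection pooling can be sketched in a few lines. The `ConnectionPool` class and `fake_connect` factory below are hypothetical; a real pool would wrap actual sockets and also handle broken connections.

```python
import queue


class ConnectionPool:
    """Hand out a fixed set of reusable connections instead of opening new ones."""

    def __init__(self, factory, size: int):
        self._idle = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())         # open all connections up front

    def acquire(self, timeout: float = 1.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn) -> None:
        self._idle.put(conn)


opened = []


def fake_connect():
    """Stand-in for an expensive TCP connect; records each connection opened."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(10):                           # ten requests reuse the same two connections
    conn = pool.acquire()
    pool.release(conn)
```

The pool size caps both the memory used by open connections and the load placed on the downstream server.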

TCP is useful for applications that require high reliability but are less time-critical. Examples include web servers, database traffic, SMTP, FTP, and SSH.

Use TCP instead of UDP in the following situations:

  • You need all of the data to arrive intact.
  • You want to automatically get the best estimate of network throughput.

User Datagram Protocol (UDP)

Source: How to Make a Multiplayer Game

UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level: a datagram that arrives is intact, but datagrams may arrive out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP provides, UDP is generally more efficient.

UDP can broadcast, sending datagrams to all devices on a subnet. This is useful for DHCP because the client has not yet been assigned an IP address, and TCP cannot open a connection without one.

UDP is less reliable but suitable for Internet telephony, video chat, streaming and real-time multiplayer games.

Use UDP instead of TCP in the following situations:

  • You need low latency
  • Late data is worse than lost data
  • You want to implement your own error correction method
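When late data is worse than lost data, a common application-level pattern over UDP is to tag each update with a sequence number and keep only the newest one. The sketch below is an assumption-laden illustration: `LatestStateReceiver` is a hypothetical class, and the datagrams are simulated as tuples rather than read from a socket.

```python
class LatestStateReceiver:
    """Keep only the newest update; drop datagrams that arrive late."""

    def __init__(self) -> None:
        self.last_seq = -1
        self.state = None
        self.dropped = 0

    def on_datagram(self, seq: int, state) -> None:
        if seq <= self.last_seq:
            self.dropped += 1       # stale or duplicate: late data is useless
            return
        self.last_seq = seq         # gaps are tolerated: lost updates are skipped
        self.state = state


rx = LatestStateReceiver()
# Datagram 1 is lost, and datagram 2 arrives after datagram 3.
for seq, position in [(0, (0, 0)), (3, (3, 1)), (2, (2, 1)), (4, (4, 2))]:
    rx.on_datagram(seq, position)
```

A real-time game can use this for player positions: waiting for a retransmitted old position would only delay the current one.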

Remote Procedure Call (RPC)

Source: Crack the system design interview

In RPC, the client calls a procedure in another address space (usually on a remote server). The calling code looks as if it were calling a local procedure, and the details of how the client and server interact are abstracted away. Remote calls are generally slower and less reliable than local calls, so it helps to distinguish between the two. Popular RPC frameworks include Protobuf, Thrift, and Avro.

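RPC is a request-response protocol:

  • Client program: calls the client stub procedure, passing parameters as in a local procedure call.
  • Client stub procedure: marshals (packs) the procedure id and arguments into a request message.
  • Client communication module: the OS sends the message from the client to the server.
  • Server communication module: the OS passes the incoming packets to the server stub procedure.
  • Server stub procedure: unmarshals the request, calls the server procedure matching the procedure id, and passes it the given arguments.

The server response repeats these steps in reverse order.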
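The roles of the client stub, the communication modules, and the server stub can be sketched in one process. This is a minimal illustration under stated assumptions: JSON stands in for the wire format, `transport` replaces the network with a direct call, and `add`, `PROCEDURES`, and `remote_add` are hypothetical names.

```python
import json


# --- server side -------------------------------------------------------------
def add(a: int, b: int) -> int:
    """The actual server procedure."""
    return a + b


PROCEDURES = {"add": add}            # procedure id -> server procedure


def server_stub(request: bytes) -> bytes:
    """Unmarshal the request, call the matching procedure, marshal the result."""
    message = json.loads(request)
    result = PROCEDURES[message["proc"]](*message["args"])
    return json.dumps({"result": result}).encode()


# --- client side -------------------------------------------------------------
def transport(request: bytes) -> bytes:
    """Stand-in for both communication modules; a real one crosses the network."""
    return server_stub(request)


def remote_add(a: int, b: int) -> int:
    """Client stub: looks like a local function, but marshals and sends a message."""
    request = json.dumps({"proc": "add", "args": [a, b]}).encode()
    response = transport(request)
    return json.loads(response)["result"]


total = remote_add(2, 3)             # reads like a local call
```

Because `remote_add` hides the network hop, callers can easily forget that it may be slow or fail, which is why it helps to keep remote calls visibly distinct from local ones.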

RPC call example:

    GET /someoperation?data=anId

    POST /anotheroperation
    {
      "data": "anId",
      "anotherdata": "another value"
    }

RPC focuses on exposing behaviors. RPC is often used for performance reasons in internal communication, because you can hand-craft native calls to better fit your use cases.

Choose a native library (that is, an SDK) when:

  • You know your target platform.
  • You want to control how your "logic" is accessed.
  • You want to have control over errors that occur in your library.
  • Performance and end-user experience are your top concerns.

HTTP APIs that follow REST tend to be used more often for public APIs.

Disadvantages: RPC

  • RPC clients become tightly coupled to the service implementation.
  • A new API must be defined for every operation or use case.
  • RPC is difficult to debug.
  • You may not be able to use existing technologies out of the box. For example, it may take extra effort to ensure that RPC calls are cached correctly on caching servers such as Squid.

Representational State Transfer (REST)

REST is an architectural style that enforces a client/server model in which the client acts on a set of resources managed by the server. The server provides a representation of the resources and the actions that can modify them or fetch a new representation. All communication must be stateless and cacheable.

RESTful interfaces have four properties:

  • Identify resources (URI in HTTP)
  • Change resources through representations (verbs in HTTP)
  • Self-descriptive error messages (status codes in HTTP)
  • HATEOAS (HTML interface for HTTP)

Example of REST request:

    GET /someresources/anId

    PUT /someresources/anId
    {"anotherdata": "another value"}

REST focuses on exposing data. It reduces the coupling between client and server and is often used for public HTTP APIs. REST uses a more general and uniform way of exposing resources: they are identified by URIs, represented through headers, and acted on through the verbs GET, POST, PUT, DELETE, and PATCH. Because it is stateless, REST is well suited to horizontal scaling and partitioning.
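The mapping from verb and URI to an operation on a resource can be sketched without a web framework. Everything here is an assumption for illustration: the `handle` function, the in-memory `resources` dictionary, and the choice of status codes (201 on create, 204 on delete) are not taken from the source.

```python
from typing import Optional

resources: dict[str, dict] = {}


def handle(method: str, path: str, body: Optional[dict] = None) -> tuple[int, Optional[dict]]:
    """Map (verb, URI) to an operation on a resource and return (status, body)."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "someresources":
        return 404, None                      # unknown URI
    resource_id = parts[1]
    if method == "GET":
        if resource_id not in resources:
            return 404, None
        return 200, resources[resource_id]
    if method == "PUT":
        created = resource_id not in resources
        resources[resource_id] = body or {}
        return (201 if created else 200), resources[resource_id]
    if method == "DELETE":
        if resource_id not in resources:
            return 404, None
        del resources[resource_id]
        return 204, None
    return 405, None                          # verb not supported on this resource


first = handle("PUT", "/someresources/anId", {"anotherdata": "another value"})
second = handle("GET", "/someresources/anId")
```

The handler keeps no per-client state between calls, so any server instance with access to the same store could answer any request.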

Disadvantages: REST

  • Because REST focuses on exposing data, it may not be a good fit when resources are not naturally organized or accessed as a simple hierarchy. For example, returning all records updated in the past hour that match a particular set of events is not easily expressed as a path. With REST, this is likely to be implemented with a combination of URI path, query parameters, and possibly a request body.
  • REST typically relies on a few verbs (GET, POST, PUT, DELETE, and PATCH), which sometimes do not fit your use case. For example, moving expired documents to an archive folder does not map cleanly onto these verbs.
  • Fetching complex resources with nested hierarchies requires multiple round trips between client and server to render a single view, for example fetching a blog post and its comments. For mobile applications operating in variable network conditions, these multiple round trips are highly undesirable.
  • Over time, more fields may be added to an API response. Older clients receive all the new fields, even those they do not need, which increases the payload size and leads to higher latency.

RPC vs REST comparison

| Operation                           | RPC                                                                  | REST                                        |
|-------------------------------------|----------------------------------------------------------------------|---------------------------------------------|
| Sign up                             | POST /signup                                                         | POST /persons                               |
| Resign (delete account)             | POST /resign {"personid": "1234"}                                    | DELETE /persons/1234                        |
| Read a person                       | GET /readPerson?personid=1234                                        | GET /persons/1234                           |
| Read a person's items list          | GET /readUsersItemsList?personid=1234                                | GET /persons/1234/items                     |
| Add an item to a person's items     | POST /addItemToUsersItemsList {"personid": "1234", "itemid": "456"}  | POST /persons/1234/items {"itemid": "456"}  |
| Update an item                      | POST /modifyItem {"itemid": "456", "key": "value"}                   | PUT /items/456 {"key": "value"}             |
| Delete an item                      | POST /removeItem {"itemid": "456"}                                   | DELETE /items/456                           |


Origin blog.csdn.net/u013379032/article/details/132834739