Quick Rough Estimation: System Capacity and Performance Requirements

This article was first published on the official account More AI (power_ai).

The original title is "Back-of-the-envelope estimation". The phrase describes a rough, quick calculation, typically done on the back of an envelope or whatever scrap of paper is at hand. It is a way of producing approximate figures without detailed analysis or complex calculations.
The purpose of a back-of-the-envelope estimate is to get a rough figure, or a sense of the scale or feasibility of a concept, without investing much time or many resources in precise calculation. It can be used to assess the viability of an idea, gauge the potential impact of a decision, or quickly compare options.
Although back-of-the-envelope estimates lack the precision of detailed analysis, they are valuable for making quick decisions, prompting further investigation, and conveying ideas concisely.

In a system design interview, you are sometimes asked to estimate system capacity or performance requirements using a back-of-the-envelope estimation. According to Jeff Dean, Google Senior Fellow, "back-of-the-envelope calculations are estimates you create using a combination of thought experiments and common performance numbers to get a good feel for which designs will meet your requirements" [1].

You need a good sense of scalability basics to carry out back-of-the-envelope estimation effectively. The following concepts should be well understood: powers of two [2], latency numbers every programmer should know, and availability numbers.

Power of Two

Although data volume can become enormous when dealing with distributed systems, the calculations all boil down to the basics. To get correct results, it is critical to know the data volume units expressed as powers of two. A byte is a sequence of 8 bits. An ASCII character uses one byte of memory (8 bits). The table below (Table 2-1) explains the data volume units.

[Image: Table 2-1, data volume units expressed as powers of two]
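As a rough illustration of the data volume units, the sketch below lists the usual binary units as powers of two. The names `UNITS` and `to_bytes` are hypothetical helpers introduced here, not part of the original article, and the "about 1 thousand / million / ..." comments are the standard approximations.

```python
# Data volume units as powers of two (binary interpretation of KB, MB, ...).
# UNITS and to_bytes are illustrative names, not from the original article.
UNITS = {
    "KB": 2**10,  # about 1 thousand bytes
    "MB": 2**20,  # about 1 million bytes
    "GB": 2**30,  # about 1 billion bytes
    "TB": 2**40,  # about 1 trillion bytes
    "PB": 2**50,  # about 1 quadrillion bytes
}


def to_bytes(value, unit):
    """Convert a value expressed in the given unit to bytes."""
    return value * UNITS[unit]


for name, size in UNITS.items():
    # bit_length() - 1 recovers the exponent of an exact power of two.
    print(f"1 {name} = 2^{size.bit_length() - 1} bytes = {size:,} bytes")
```

For quick mental math it is usually enough to treat each step (KB to MB to GB and so on) as a factor of roughly 1,000.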

Latency Numbers Every Programmer Should Know

Dr. Dean of Google revealed the duration of typical computer operations in 2010 [1]. As computers get faster and more powerful, some of the numbers have become outdated. However, they should still give us a general idea of how fast or slow different computer operations are.

[Image: table of latency numbers for typical computer operations]

Note:

ns = nanosecond, μs = microsecond, ms = millisecond
1 ns = 10^-9 seconds
1 μs = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 μs = 1,000,000 ns

A Google software engineer built a tool to visualize Dr. Dean's numbers. The tool also accounts for how the numbers change over the years. Figure 2-1 shows the visualized latency numbers as of 2020 (source: reference [3]).

[Image: Figure 2-1, visualized latency numbers as of 2020]

By analyzing the numbers in Figure 2-1, we reach the following conclusions:

  • Memory is fast, but disk is slow.
  • Avoid disk seeks where possible.
  • Simple compression algorithms are fast.
  • If possible, compress data before sending it over the Internet.
  • Data centers are usually located in different regions, and it takes time to send data between them.
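To make these conclusions concrete, here is a small sketch that compares a few latency figures. The values are commonly cited approximations from the 2010 list and are assumed here for illustration (the original table is only available as an image); `LATENCY_NS` and `ratio` are hypothetical names.

```python
# Approximate, commonly cited latency figures, expressed in nanoseconds.
# These values are assumptions for illustration; real hardware will differ.
NS = 1
US = 1_000 * NS
MS = 1_000 * US

LATENCY_NS = {
    "main memory reference": 100 * NS,
    "round trip within same datacenter": 500 * US,
    "disk seek": 10 * MS,
    "packet CA -> Netherlands -> CA": 150 * MS,
}


def ratio(slow, fast):
    """How many times slower the first operation is than the second."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


print(ratio("disk seek", "main memory reference"))
print(ratio("packet CA -> Netherlands -> CA",
            "round trip within same datacenter"))
```

Under these assumed figures, a disk seek costs about as much as a hundred thousand memory references, which is why avoiding seeks matters so much.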

Availability Numbers

High availability is the ability of a system to remain continuously operational for a desirably long period of time. It is measured as a percentage, where 100% means a service with zero downtime. Most services fall between 99% and 100%.

A service level agreement (SLA) is a term commonly used by service providers. It is an agreement between you (the service provider) and your customer that formally defines the level of uptime your service will deliver. Cloud providers Amazon [4], Google [5], and Microsoft [6] set their SLAs at 99.9% or above. Uptime is traditionally measured in nines: the more nines, the better. As shown in Table 2-3, the number of nines correlates with the expected system downtime.

[Image: Table 2-3, availability percentages and the corresponding expected downtime]
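The relationship between nines and downtime is simple arithmetic, sketched below. The function name `downtime_seconds` and the 365-day year are assumptions made for this illustration.

```python
SECONDS_PER_DAY = 24 * 3600


def downtime_seconds(availability_pct, period_days):
    """Allowed downtime in seconds for a given availability over a period."""
    return period_days * SECONDS_PER_DAY * (100 - availability_pct) / 100


for pct in (99, 99.9, 99.99, 99.999):
    per_day = downtime_seconds(pct, 1)
    per_year = downtime_seconds(pct, 365)  # assumes a 365-day year
    print(f"{pct}%: {per_day:.2f} s/day, {per_year / 3600:.2f} h/year")
```

For example, 99.9% availability allows roughly 8.76 hours of downtime per year, while each additional nine cuts the allowance by a factor of ten.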

Example: Estimating QPS and Storage Requirements for Twitter

Note that the numbers below are for this exercise only; they are not real numbers from Twitter.

Assumptions:

  • There are 300 million monthly active users.
  • 50% of users use Twitter every day.
  • Users post an average of 2 tweets per day.
  • 10% of tweets contain media content.
  • Data is stored for 5 years.

Estimate:

Queries per second (QPS) estimates:

  • Daily Active Users (DAU) = 300 million * 50% = 150 million
  • Tweet QPS = 150 million * 2 tweets / 24 hours / 3600 seconds = ~3500
  • Peak QPS = 2 * QPS = ~7000
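The QPS arithmetic above can be reproduced with a few lines of Python. The constant names are made up for this sketch, and the peak factor of 2 is the assumption stated in the estimate.

```python
# Assumptions taken from the exercise; the variable names are illustrative.
MONTHLY_ACTIVE_USERS = 300_000_000
DAILY_ACTIVE_RATIO = 0.5          # 50% of users are active every day
TWEETS_PER_USER_PER_DAY = 2
SECONDS_PER_DAY = 24 * 3600
PEAK_FACTOR = 2                   # peak traffic assumed to be 2x average

dau = MONTHLY_ACTIVE_USERS * DAILY_ACTIVE_RATIO
tweet_qps = dau * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY
peak_qps = PEAK_FACTOR * tweet_qps

print(f"DAU: {dau:,.0f}")
print(f"Tweet QPS: {tweet_qps:,.0f}")  # rounds to ~3500 in the estimate
print(f"Peak QPS: {peak_qps:,.0f}")    # rounds to ~7000 in the estimate
```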

We only estimate media storage here.

  • Average tweet size:
    • tweet_id 64 bytes
    • Text 140 bytes
    • Media 1 MB
  • Media storage: 150 million * 2 * 10% * 1 MB = 30 TB per day
  • 5 years of media storage: 30 TB * 365 * 5 = ~55 PB
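The storage figures can be checked the same way. This sketch uses decimal units (1 TB = 1,000,000 MB, 1 PB = 1,000 TB) to match the rounding in the estimate; the constant names are hypothetical.

```python
# Assumptions taken from the exercise; decimal units are used for simplicity.
DAILY_ACTIVE_USERS = 150_000_000
TWEETS_PER_USER_PER_DAY = 2
MEDIA_PERCENT = 10        # 10% of tweets contain media
MEDIA_SIZE_MB = 1
MB_PER_TB = 1_000_000
TB_PER_PB = 1_000
RETENTION_YEARS = 5

# Integer arithmetic keeps the intermediate result exact.
daily_media_mb = (DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY
                  * MEDIA_PERCENT // 100 * MEDIA_SIZE_MB)
daily_media_tb = daily_media_mb / MB_PER_TB
five_year_pb = daily_media_tb * 365 * RETENTION_YEARS / TB_PER_PB

print(f"Media per day: {daily_media_tb:.0f} TB")
print(f"Media over {RETENTION_YEARS} years: {five_year_pb:.2f} PB")
```

The five-year total comes out at 54.75 PB, which the estimate rounds to about 55 PB.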

Tips

Back-of-the-envelope estimation is all about the process. Solving the problem matters more than the final number, and the interviewer may be testing your problem-solving skills. Here are a few tips to follow:

  • Round and approximate. Doing complicated math during an interview is difficult. For example, what is the result of "99987 / 9.1"? There is no need to spend valuable time on it, because precision is not expected. Use round numbers and approximation instead: this division can be simplified to "100,000 / 10".
  • Write down your assumptions. It is a good idea to record your assumptions so you can refer back to them later.
  • Label your units. When you write "5", does it mean 5 KB or 5 MB? You might confuse yourself. Write down the units, because "5 MB" removes the ambiguity.
  • Practice the commonly asked estimates: QPS, peak QPS, storage, cache, number of servers, and so on. You can practice these calculations while preparing for an interview. Practice makes perfect.
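As a quick sanity check on the rounding tip, the sketch below compares the exact division with the simplified one. The variable names are illustrative only.

```python
exact = 99987 / 9.1
approx = 100_000 / 10

# Relative error of the rounded calculation against the exact one.
relative_error = abs(approx - exact) / exact

print(f"exact  = {exact:.1f}")
print(f"approx = {approx:.1f}")
print(f"relative error = {relative_error:.1%}")
```

The rounded answer is off by under 10%, which is well within the tolerance of an order-of-magnitude estimate.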

Congratulations on getting this far! Now give yourself a pat on the back. Good job!

References

[1] J. Dean. Google Pro Tip: Use Back-Of-The-Envelope-Calculations To Choose The Best Design: http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html

[2] System design primer: https://github.com/donnemartin/system-design-primer

[3] Latency numbers every programmer should know: https://colin-scott.github.io/personal_website/research/interactive_latency.html

[4] Amazon Compute Service Level Agreement: https://aws.amazon.com/compute/sla/

[5] Compute Engine Service Level Agreement (SLA): https://cloud.google.com/compute/sla

[6] SLA summary for Azure services: https://azure.microsoft.com/en-us/support/legal/sla/summary/

Origin blog.csdn.net/smarter_AI/article/details/131818989