Scalability Best Practices: Lessons from eBay

At eBay, scalability is one of the architectural pressures we contend with every day. It shapes and drives every architectural and design decision we make. With hundreds of millions of users all over the world, more than 1 billion page views per day, and the data in our systems measured in petabytes (10^15 bytes, or 2^50), scalability is not a choice - it is a matter of life and death.

In a scalable architecture, resource consumption should increase linearly (or better) with load, which can be measured by user traffic, data volume, etc. Whereas performance measures the resource consumption required per unit of work, scalability measures the change in resource consumption as the number or size of units of work increases. In other words, scalability is the shape of the entire price-performance curve, not the value of a point on the curve.

Scalability has many facets: transactional, operational, and developmental. We have learned a great deal while improving the transaction throughput of a web system, and this article summarizes some of the key best practices. Many of them may already be familiar to you; others may be new. They are the collective experience of the people who build and operate the eBay site.


Best Practice #1: Split by Function
Keep related functionality together and separate unrelated functionality - whether you call it SOA, functional decomposition, or simply good engineering. Moreover, the looser the coupling between unrelated functions, the more flexibility you have to scale each of them independently.

At the coding level, we apply this principle all the time. JAR files, packages, bundles, etc., are all mechanisms used to isolate and abstract functionality.

At the application level, eBay divides different functions into separate application pools. The selling functionality runs on one set of application servers, bidding on another, and search on yet another. In total, we divide roughly 16,000 application servers into 220 pools. This allows each pool to be scaled independently, according to the resource consumption of its function. It also lets us further isolate and rationalize resource dependencies; the selling pool, for example, only needs to access a relatively small subset of backend resources.

At the database level, we do the same thing. eBay has no all-encompassing single database; instead there is a set of database hosts for user data, another for product data, another for purchase data... about 1000 logical databases in all, spread across 400 physical hosts. Again, this approach lets us scale the database infrastructure for each type of data independently.
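To make the idea concrete, here is a minimal sketch in Java of what a functional split at the database level might look like. The functional areas, JDBC URLs, and class names are invented for illustration; they are not eBay's actual configuration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.EnumMap;
import java.util.Map;

/**
 * Sketch of "split by function": each functional area gets its own database
 * endpoint, so each can be provisioned and scaled independently.
 * Area names and URLs are illustrative only.
 */
public class FunctionalDatabases {

    enum Area { USER, PRODUCT, PURCHASE }

    private final Map<Area, String> jdbcUrlByArea = new EnumMap<>(Area.class);

    public FunctionalDatabases() {
        // Hypothetical endpoints; in practice these would come from configuration.
        jdbcUrlByArea.put(Area.USER, "jdbc:postgresql://user-db-host/userdb");
        jdbcUrlByArea.put(Area.PRODUCT, "jdbc:postgresql://product-db-host/productdb");
        jdbcUrlByArea.put(Area.PURCHASE, "jdbc:postgresql://purchase-db-host/purchasedb");
    }

    /** Code that works with user data only ever sees the user database. */
    public Connection connect(Area area) throws SQLException {
        return DriverManager.getConnection(jdbcUrlByArea.get(area));
    }
}
```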

Best Practice #2: Split Horizontally

Splitting by function takes us a long way, but by itself it is not enough for a fully scalable architecture. Even after decoupling one function from another, the resource requirements of a single function may, over time, exceed what any single system can supply. We often remind ourselves that "if you can't split it, you can't scale it." Within a single function, we need to be able to break the workload down into small, manageable units, such that each unit maintains a good price/performance ratio. This is where horizontal splitting comes in.

At the application tier, where eBay has designed all interactions to be stateless, splitting horizontally is trivial. Standard load balancers route incoming traffic. All application servers are equal, and no server holds transactional state, so the load balancer can pick any of them. If more processing power is needed, we simply add more application servers.

The database tier is more challenging, because data is stateful by nature. Here we shard the data horizontally along its primary access path. User data, for example, is currently divided across 20 hosts, each holding 1/20 of the users. As the number of users grows, and as the data per user grows, we add more hosts and spread the users further. Product data, purchase data, account data, and so on are all treated the same way. Depending on the use case we use different schemes for dividing the data: some take the primary key modulo the number of hosts (IDs ending in 1 go to the first host, IDs ending in 2 to the next, and so on); some divide by ID range (1-1M, 1-2M, etc.); some use a lookup table; and some combine these strategies. Whatever the particular partitioning scheme, though, the general point is that an infrastructure which supports partitioning and repartitioning of data is far more scalable than one which does not.
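As an illustration, here is a small Java sketch of the two simplest routing strategies mentioned above - modulo on the key and range on the key. The shard counts and boundaries are made-up example values; eBay's real routing lives in its own data-access layer.

```java
/**
 * Sketch of two common horizontal-partitioning strategies for picking
 * a database shard from a record key. Shard counts and range boundaries
 * are illustrative only.
 */
public class ShardRouter {

    /** Modulo routing: keys are spread round-robin across a fixed number of shards. */
    public static int shardByModulo(long key, int shardCount) {
        return (int) (key % shardCount);
    }

    /** Range routing: keys 0..999,999 go to shard 0, 1,000,000..1,999,999 to shard 1, and so on. */
    public static int shardByRange(long key, long keysPerShard) {
        return (int) (key / keysPerShard);
    }

    public static void main(String[] args) {
        long userId = 31_415_926L;
        System.out.println("modulo shard: " + shardByModulo(userId, 20));      // one of 20 user hosts
        System.out.println("range shard:  " + shardByRange(userId, 1_000_000L));
    }
}
```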

Best Practice #3: Avoid Distributed Transactions

At this point you may well be wondering how the practice of splitting data functionally and horizontally can satisfy transactional requirements. After all, almost any meaningful operation updates more than one entity - users and items come to mind immediately. The orthodox, well-known answer is: open a distributed transaction across the resources and use two-phase commit to guarantee that either all resources are updated or none are. Unfortunately, the cost of this pessimistic approach is considerable. Scaling, performance, and response latency all suffer from the cost of coordination, which grows rapidly as the number of dependent resources and clients increases. Availability suffers too, because every dependent resource must be available. The pragmatic answer is to relax the transactional guarantees across unrelated systems.

It turns out that you cannot have everything. Guaranteed immediate consistency across multiple systems or partitions is usually neither necessary nor practical. The CAP theorem, articulated by Inktomi's Eric Brewer almost ten years ago, states that of the three important properties of a distributed system - consistency, availability, and partition-tolerance - only two can hold at any one time. For a high-traffic site, we must choose partition-tolerance, since it is fundamental to scaling. For a site that runs 24x7, choosing availability is equally natural. So we have no choice but to give up immediate consistency.

At eBay we allow absolutely no client-side or distributed transactions of any kind, so two-phase commit is never required. In certain well-defined situations we combine several statements against a single database into one transactional operation; for most operations, individual statements are auto-committed. While we deliberately relax orthodox ACID properties, and thus cannot guarantee immediate consistency everywhere, the real-world result is that most systems are available the vast majority of the time. We also employ a number of techniques to help the system reach eventual consistency: careful ordering of database operations, asynchronous recovery events, and reconciliation or settlement batches. Which technique to choose depends on the consistency needs of the particular use case.
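The sketch below illustrates the flavor of these techniques, not eBay's actual code: a single auto-committed write to the record of truth, followed by an asynchronous recovery event that brings a dependent system up to date later, instead of a distributed transaction spanning both. All names and interfaces are invented for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Sketch of avoiding a distributed transaction: write the primary record
 * first (single statement, auto-committed), then hand a recovery event to
 * an asynchronous worker that drives the dependent system toward
 * eventual consistency. A reconciliation batch would be the backstop.
 */
public class EventuallyConsistentWrite {

    record BidPlaced(long itemId, long userId, long amountCents) {}

    private final BlockingQueue<BidPlaced> pendingEvents = new LinkedBlockingQueue<>();

    /** Synchronous part: one local, auto-committed write. No two-phase commit anywhere. */
    public void placeBid(BidPlaced bid) {
        writeBidToItemShard(bid);     // single-statement write against one database
        pendingEvents.offer(bid);     // everything else happens later, asynchronously
    }

    /** Asynchronous part: updates the dependent system, retrying on failure. */
    public void recoveryLoop() throws InterruptedException {
        while (true) {
            BidPlaced bid = pendingEvents.take();
            try {
                updateUserActivitySummary(bid);   // may lag the primary write
            } catch (RuntimeException e) {
                pendingEvents.offer(bid);         // in practice, back off before retrying
            }
        }
    }

    private void writeBidToItemShard(BidPlaced bid) { /* INSERT into the item shard */ }
    private void updateUserActivitySummary(BidPlaced bid) { /* refresh the user-facing view */ }
}
```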

The key for architects and system designers is to understand that consistency is not an all-or-nothing proposition. Most real-world use cases simply do not require immediate consistency. Just as we routinely trade off availability against cost and other pressures, consistency can likewise be tailored to provide the appropriate level for the needs of a particular operation.

Best Practice #4: Decouple Functions Asynchronously

Another key to improving scalability is to adopt asynchrony aggressively. If component A calls component B synchronously, A and B are tightly coupled, and the defining trait of a tightly coupled system is that its parts must scale together: to scale A, you must also scale B. Synchronously coupled components face the same problem with availability. Going back to elementary logic, if A implies B, then not-B implies not-A; in other words, if B is unavailable, then A is unavailable as well. If instead A and B are connected asynchronously - through a queue, multicast messaging, batching, or whatever - they can be scaled separately. Moreover, their availability characteristics become independent: even if B is struggling or down, A can keep moving forward.

This principle should be applied up and down the infrastructure. Asynchrony can be achieved even inside a single component through techniques such as SEDA (Staged Event-Driven Architecture), while still presenting an easy-to-understand programming model. Between components the principle is the same: avoid the coupling that synchrony brings wherever possible. More often than not, the two components have no business talking directly to each other in any event. At every level, decomposing processing into stages or phases and connecting them asynchronously is the key to scaling.
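A minimal sketch of queue-based decoupling between two components (A produces, B consumes). The queue here is an in-process BlockingQueue purely for illustration; in a real deployment it would be a message broker or multicast channel, and the event names are invented.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Sketch of asynchronous decoupling: component A never calls component B
 * directly; it only appends events to a queue. A stays responsive even if
 * B is slow or down, and the two can be scaled independently.
 */
public class AsyncDecoupling {

    private static final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

    // Component A: enqueue and move on; no synchronous dependency on B.
    static void componentA(String event) {
        queue.offer(event);
    }

    // Component B: consumes at its own pace (on its own hardware, in real life).
    static void componentB() throws InterruptedException {
        while (true) {
            String event = queue.take();
            System.out.println("B processed: " + event);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread consumer = new Thread(() -> {
            try { componentB(); } catch (InterruptedException ignored) { }
        });
        consumer.setDaemon(true);
        consumer.start();

        componentA("item-listed:123");   // A returns immediately, regardless of B's state
        componentA("bid-placed:456");

        Thread.sleep(200);               // give the demo consumer a moment to drain the queue
    }
}
```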

Best Practice #5: Turn Processes into Asynchronous Streams

Decouple functions asynchronously, and make processing flows as asynchronous as possible. For systems that must respond quickly, this can radically reduce the latency experienced by the requester. For a website or trading system, it is worth trading data or execution latency (how long it takes for all the work to complete) for user latency (how long the user waits for a response). Activity tracking, billing, settlement, and reporting are obvious examples of processing that belongs in the background, but there are often also steps within the primary use-case flow that can be broken out and run asynchronously. Anything that can be done later should be done later.

There is an equally important, less widely appreciated benefit: asynchrony can fundamentally reduce the cost of the infrastructure. Performing operations synchronously forces you to provision the infrastructure for peak load; it must be able to finish the work immediately, even during the heaviest second of the heaviest day. Turning expensive processing into asynchronous flows means the infrastructure only needs to be provisioned for the average load rather than the peak. And because not every request has to be processed right away, an asynchronous queue can spread the work out over a longer period, shaving off the peaks. The more variable and spiky your system's load, the more you gain from asynchronous processing.
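As a rough sketch of the peak-shaving effect: requests land in a queue instantly during a burst, while a worker drains the queue at a steady rate sized for the average load rather than the peak. The rates and job names below are made-up numbers for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of peak shaving: a burst of work is accepted immediately into a
 * backlog, while a scheduled worker processes a fixed batch per tick.
 * The infrastructure only needs to be sized for the steady drain rate.
 */
public class PeakShaving {

    private static final BlockingQueue<Runnable> backlog = new LinkedBlockingQueue<>();
    private static final int JOBS_PER_TICK = 50;   // capacity planned for average load

    public static void submit(Runnable job) {
        backlog.offer(job);   // accepting a job is cheap, even during a spike
    }

    public static void main(String[] args) {
        ScheduledExecutorService drainer = Executors.newSingleThreadScheduledExecutor();
        drainer.scheduleAtFixedRate(() -> {
            for (int i = 0; i < JOBS_PER_TICK; i++) {
                Runnable job = backlog.poll();
                if (job == null) break;
                job.run();
            }
        }, 0, 1, TimeUnit.SECONDS);

        // Simulate a burst: 500 jobs arrive at once but are worked off over ~10 seconds.
        for (int i = 0; i < 500; i++) {
            int n = i;
            submit(() -> System.out.println("billing job " + n));
        }
        // (Demo only: a real system would manage the executor's lifecycle and shut it down.)
    }
}
```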

Best Practice #6: Virtualize All Layers

Virtualization and abstraction are everywhere; as the old computer-science saying goes, every problem can be solved by adding another level of indirection. The operating system is an abstraction of the hardware, and the virtual machine behind many modern languages is an abstraction of the operating system. Object-relational mapping layers abstract the database. Load balancers and virtual IPs abstract network endpoints. As we scale our infrastructure by splitting data and functionality, adding a layer of virtualization over those splits becomes a priority.

At eBay, we virtualize the database. Applications talk to a logical database, which is then mapped, by configuration, to a particular physical machine and database instance. Applications are also abstracted from the routing logic that performs the data splitting and assigns a particular record (e.g., user XYZ) to a particular partition. Both abstractions are implemented in our home-grown O/R layer. With this virtualization, our operations team can redistribute logical databases across the physical host farm as needed - splitting, merging, and moving them - without touching application code at all.
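A bare-bones sketch of the idea follows. The mapping table, host names, and partition scheme are invented for the example; eBay's version lives inside its own O/R layer and is driven by configuration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of database virtualization: application code names only a logical
 * database; a configuration-driven map resolves it to a physical endpoint.
 * Operations can repoint a logical database at a different physical host
 * (split, merge, move) without any application change.
 */
public class LogicalDatabaseRouter {

    // logical name -> physical JDBC endpoint (illustrative values)
    private final Map<String, String> logicalToPhysical = new ConcurrentHashMap<>();

    public LogicalDatabaseRouter() {
        // In reality, every logical database would be present in configuration.
        logicalToPhysical.put("user_07", "jdbc:postgresql://dbhost-041/user_07");
        logicalToPhysical.put("item_12", "jdbc:postgresql://dbhost-102/item_12");
    }

    /** Application code only ever asks for a logical database by name. */
    public String resolve(String logicalDb) {
        return logicalToPhysical.get(logicalDb);
    }

    /** Routing logic: assign a record (e.g. a user) to its logical partition first. */
    public String resolveForUser(long userId) {
        String logicalDb = String.format("user_%02d", userId % 20);
        return resolve(logicalDb);
    }

    /** Operational move: repoint a logical database with no application change. */
    public void remap(String logicalDb, String newPhysicalUrl) {
        logicalToPhysical.put(logicalDb, newPhysicalUrl);
    }
}
```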

Search engines are also virtualized. To get search results, an aggregator component executes parallel queries on multiple partitions, but this highly partitioned search grid appears to the client as a single logical index.
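In outline, the aggregator pattern looks something like the following generic fan-out sketch (not eBay's search code; the partition count and query logic are invented).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

/**
 * Sketch of a search aggregator: the query fans out to every index partition
 * in parallel, and the partial results are merged so the caller sees a single
 * logical index.
 */
public class SearchAggregator {

    private static final int PARTITIONS = 8;
    private final ExecutorService pool = Executors.newFixedThreadPool(PARTITIONS);

    public List<String> search(String query) {
        List<CompletableFuture<List<String>>> partials =
                IntStream.range(0, PARTITIONS)
                        .mapToObj(p -> CompletableFuture.supplyAsync(() -> queryPartition(p, query), pool))
                        .toList();

        // Merge the partial results into one logical result set.
        return partials.stream()
                .flatMap(f -> f.join().stream())
                .toList();
    }

    /** Stand-in for querying one index partition. */
    private List<String> queryPartition(int partition, String query) {
        return List.of("partition-" + partition + " hit for '" + query + "'");
    }
}
```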

These measures are not just for the convenience of programmers; they are about operational flexibility. Hardware and software fail, and requests must be rerouted. Components, machines, and partitions are added, removed, and moved all the time. Used intelligently, virtualization keeps the higher levels of the infrastructure blissfully unaware of these changes and leaves you room to maneuver. Virtualization makes scaling the infrastructure possible because it makes scaling manageable.

Best Practice #7: Cache Appropriately

Finally, use caching appropriately. The advice here is deliberately less universal, because whether caching pays off depends heavily on the details of the use case. At bottom, the goal of an efficient caching system is to maximize the cache hit rate within your storage constraints, availability requirements, and tolerance for stale data. Experience shows that balancing these factors is remarkably hard, and even once you hit the target, the situation is likely to change over time.

The data best suited for caching is read-heavy data that rarely changes: metadata, configuration, and static data. At eBay we cache this type of data aggressively, and use a combination of push and pull approaches to keep the system reasonably in sync with updates. Reducing repeated requests for the same data can make a very significant difference. Data that changes frequently and is both read and written is far harder to cache usefully. At eBay we mostly and deliberately sidestep that problem. We do not cache transient session data between requests, nor do we cache shared business objects, such as product and user data, at the application tier. We intentionally sacrifice the potential benefit of caching this data in exchange for availability and correctness. It should be noted that other sites take different approaches, make different trade-offs, and are equally successful.
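A toy sketch of the kind of cache that fits this read-mostly data: a time-bounded in-memory map, with the TTL standing in for the "pull" half of keeping the cache acceptably fresh. The TTL value, loader interface, and class name are illustrative assumptions, not eBay's caching layer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Sketch of caching slowly-changing, read-mostly data (metadata, configuration)
 * with a simple time-to-live. A real deployment would also bound memory and
 * support explicit invalidation ("push") when the source data changes.
 */
public class TtlCache<K, V> {

    private record Entry<V>(V value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<K, V> loader;   // fetches from the primary store on a miss

    public TtlCache(long ttlMillis, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
    }

    public V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null || e.expiresAtMillis < System.currentTimeMillis()) {
            V value = loader.apply(key);   // cache miss: go to the primary store
            entries.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
            return value;
        }
        return e.value;                    // cache hit: no backend work at all
    }
}
```

A caller might wrap a configuration lookup as `new TtlCache<>(300_000, configStore::load)`, so repeated reads within five minutes never touch the primary store (the names here are likewise hypothetical).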

Like all good things, caching can be overdone. The more memory you allocate to caches, the less is available to service individual requests; application tiers are often memory-constrained, so this is a very real trade-off. More importantly, once you start relying on the cache, the primary systems only need to handle the cache-miss load, and it becomes tempting to scale them down accordingly. But once you do that, the system can no longer live without the cache. The primary systems can no longer handle the full traffic directly, so the site's availability now depends on the cache working flawlessly, 100% of the time - a latent crisis. Even routine operations, such as reconfiguring cache resources, moving caches to other machines, or cold-starting a cache server, can cause serious trouble.

Done well, a caching system bends the scalability curve below linear: subsequent requests served from cache are cheaper than fetching the data from primary storage. Done poorly, caching adds considerable overhead and becomes a drag on availability. I have yet to see a system that has no opportunity to benefit from caching somewhere; the key is to find the caching strategy appropriate to your particular situation.

Summary

Scalability is sometimes called a "non-functional requirement," implying that it is unrelated to functionality and therefore less important. Nothing could be further from the truth. My view is that scalability is a prerequisite for functionality: a priority-0 requirement that outranks all others.

Hopefully the above best practices will be useful to you, and hopefully help you see your system from a new perspective, no matter its size.


Source: http://www.infoq.com/cn/articles/ebay-scalability-best-practices

