Things about Weibo Database: The Design Ideas Behind the Three Changes

Editor's note: This article is a representative piece on high-availability architecture, shared by Xiao Peng in the High Availability Architecture group. When reposting, please credit the High Availability Architecture public account "ArchNotes".

Xiao Peng is a technical manager at the Weibo R&D Center, mainly responsible for business assurance, performance optimization, architecture design, and the surrounding automation systems for the Weibo databases (MySQL/Redis/HBase/Memcached). He has been through every stage of the Weibo database architecture's evolution, including service-assurance and SLA system construction, Weibo's multi-datacenter deployment, and the Weibo platform transformation. He has 10 years of experience in Internet database architecture and administration, focusing on database high performance and high availability.

Reflections on growing as a database expert

"My work with MySQL began mainly out of interest. My first job was at a small company, and with so few people I had to touch work of every kind. Of everything I tried, databases interested me most, so I have stayed in database-related technical work ever since. As the years accumulated, so did my database experience, and I increasingly feel that the database administrator (DBA) is a hands-on kind of role: many things that hold in theory change in all sorts of ways in practice, "anti-normalization" design being one example. So if you want to become a database expert, I strongly recommend choosing a good environment. Large platforms produce many challenging problems as quantitative change turns into qualitative change, and those problems are the only road to becoming a technical expert." - Xiao Peng

Changes experienced by the Weibo database

First, I will share with you the important stages that the Weibo database has experienced.

Initial stage

In the initial stage, Weibo was an internal innovation product with relatively simple features. The database adopted a standard 1M/2S/1MB structure (one master, two slaves, and one backup master), designed around read/write separation: the master handles writes and the slaves handle reads, and if read pressure grows too high, we can scale out by adding more slave libraries.
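The read/write split above can be sketched as a tiny router: writes go to the master, reads round-robin across the slaves. This is an illustrative toy (the host names, verb list, and the `RWRouter` class are all invented here, not Weibo's actual code):

```python
import itertools

class RWRouter:
    """Toy read/write splitter for a 1M/2S layout; host names are placeholders."""
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)   # round-robin the read replicas

    def route(self, sql: str) -> str:
        verb = sql.lstrip().split(None, 1)[0].upper()
        return self.master if verb in self.WRITE_VERBS else next(self._slaves)
```

Adding another slave to the list is all it takes to scale reads, which is exactly the horizontal-scaling property described above.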
[Figure: initial Weibo database architecture]
In the figure above, red represents writes, green represents reads, and black maps to the internal structure. As the figure shows, the business was only split vertically, that is, separated by business module (users, content, relationships, and so on), with each module using its own database. For the initial stage this was actually a very good architecture: it laid the groundwork for decoupling by functional module, made problems easy to locate, and from the start allowed degrading service per functional module.

Personally, I think this architecture was enough to carry the business growth of the early stage. There was no need to over-design; making things too complicated at the start can cost you agility.

Explosive growth stage

As user activity rose after Weibo launched, so did the pressure on the database. We first scaled up single-machine performance by buying high-end hardware to keep up with the business's rapid growth. That hardware also bought us the time to split Weibo's overall business vertically, storing functional modules such as users, relationships, blog posts, reposts, and comments separately; on top of the vertical split, modules expected to produce massive data volumes were split again.

A few more words on hardware. Weibo hit a very steep user-growth peak right at the beginning, when we had not yet accumulated much technical depth and, above all, had no time for an architecture overhaul, so many core services were carried by purchased PCIe flash devices. I still remember clearly how heavily the early feed system depended on MySQL: on the night of the 2012 Spring Festival Gala, MySQL write QPS reached 35,000. The memory is still fresh.

High-performance hardware looks far more expensive than commodity hardware, but the time it buys is the most precious thing. Early in a product's life, a performance problem can easily turn into a product failure, which loses users directly and costs more than it saves. So in my view, throwing money at the problem during the explosive-growth stage is actually the most cost-effective option.

Back to database splitting, taking blog posts as the example. Posts are the main content Weibo users generate, and it was foreseeable that their volume would grow over time and eventually become enormous. How to meet the business's performance requirements while storing them as cheaply as possible was a rather challenging problem. We approached it as follows.

  • First, we split the index from the content. The index needs little storage space while the content needs a lot, their usage patterns differ, and their access frequencies differ, so they deserve separate treatment.
  • Then the index and the content are each hashed first and then split horizontally along the time dimension, keeping the size of every table within a controllable range so that query performance stays on target.
  • Finally, the business first looks up the ids of the content it needs through the index, then fetches the actual content from the content library, with memcached deployed in front to accelerate the whole path. It looks like more steps, but in practice the result fully meets the business's needs.
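As a rough illustration of the hash-then-time routing in the steps above, here is a toy table-name router. The 1024-table count matches the modulo scheme described later in the Q&A, but the table-naming convention is an assumption of this sketch:

```python
def index_table(post_id: int, year: int, month: int, n_tables: int = 1024) -> str:
    """Route a post id to its index table: hash (modulo) first, then the time dimension."""
    bucket = post_id % n_tables
    return f"index_{year}{month:02d}_{bucket:04d}"

def content_table(post_id: int, year: int, month: int, n_tables: int = 1024) -> str:
    """Content lives in its own table family, split on the same two dimensions."""
    bucket = post_id % n_tables
    return f"content_{year}{month:02d}_{bucket:04d}"
```

Because the time dimension is part of the name, archiving a whole month (or year) later is just moving a known set of tables, with no rehashing.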

[Figure: database architecture of the blog post module]
At first glance the picture above looks the same as the previous one, but it is actually the database architecture diagram of just the blog post module. You can see that index and content are each split across many ports (instances), each port holds many databases, and the tables under each database are hashed first and then split along the time dimension. That way, when we hit a capacity or performance bottleneck later, we can choose either to archive or to adjust the deployment structure, and either choice is convenient. Moreover, once data is archived, different hardware can serve different workloads, raising hardware utilization and cutting cost.

At this stage we split and reworked many Weibo functions, such as users, relationships, blog posts, reposts, comments, and likes. Essentially all the core functions had their data split, ensuring that when a bottleneck appears we can adjust according to plan.

Consolidation stage

The splitting and rework of the previous stage made the database scale grow exponentially, and once the business's rapid growth leveled off, things began to stabilize. At this stage we focused on automation, using tools to capture the experience accumulated during the rapid expansion and turning it into standardized, streamlined platform services. We built or rebuilt, in turn, the backup system, monitoring system, AutoDDL system, MHA system, inspection system, slow-query system, and the Maya middleware system. And to make the platform more efficient for business teams and to cut communication overhead, alongside the internal management system we developed the iDB system for users of the database platform. Through iDB, users can easily see the running state of their business databases and submit database DDL change requests directly; the DBA only has to click approve and the change is handed to a robot for online execution. This improves work efficiency while also improving safety and standardization.

There are many automation systems involved, so I will not describe them one by one. My understanding is that operations naturally moves into an automation stage once a product has developed far enough: early on there is little work and enough hands to run changes and operations manually, and many special cases genuinely need human intervention, human judgment in particular.

Let me stress the importance of standards here. Take a MySQL development standard: if you agree on it in advance and enforce the constraints, developers will feel restricted while using it, but it prevents entirely avoidable failures in production; some classes of problems simply never happen because the standard exists.

For example, slow MySQL queries are the chief culprit behind slow production performance, and in many cases the cause is not a missing index but incorrectly written code that triggers problems such as implicit type conversion. For this case we generally require that all values in WHERE conditions be enclosed in quotation marks, which directly rules out implicit conversion, and developers no longer have to think about whether a column is a character type or an int while writing code.
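A minimal sketch of the quoting rule, assuming a MySQL VARCHAR column: comparing it against a bare number forces MySQL to cast the column value on every row and skip the index, while a quoted literal keeps the index usable. `quote_value` and `build_where` are illustrative helpers, not Weibo's tooling:

```python
def quote_value(v) -> str:
    """Always emit a quoted literal so a VARCHAR column is never compared
    against a bare number (which would force a cast on every row)."""
    return "'" + str(v).replace("'", "''") + "'"

def build_where(column: str, value) -> str:
    # `WHERE uid = 123` on a VARCHAR uid column triggers implicit conversion
    # and a full scan; `WHERE uid = '123'` keeps the index usable.
    return f"WHERE {column} = {quote_value(value)}"
```

In real application code, parameterized queries (e.g. `cursor.execute("... WHERE uid = %s", (str(uid),))` with a driver such as PyMySQL) give the same guarantee more safely than hand-built SQL strings.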

Back to automation. After the initial stage, scale expands, the work multiplies, and the people do not; that pressure pushes everyone to look for automated solutions, so the automation transformation happens naturally. Of course, having development time once the business stabilizes is actually the more important precondition.

Personally I divide automation into two stages. In the first, machines replace manual labor: most mechanical work is handed to programs, solving batch operations and repetitive tasks. In the second, machines replace human judgment: the machine makes its own choice after a degree of reasoning, freeing people entirely. The second stage is the ideal state we keep pursuing; so far we have completed only some very simple small functions, such as dynamically adjusting max mem.

Optimization and design of Weibo database

Next, I will introduce some recent improvements and optimizations of the Weibo database platform.

The database platform is not only MySQL, but also database services such as Redis, Memcached, and HBase. With the trend that cache is king, Weibo focused its research and development efforts on Redis in 2015.

Weibo adopted Redis early and at large volume from the start, so we ran into many practical problems in real use. Our internal branch version is optimized for those problems; several of its more distinctive features follow.

  • Pos-based synchronization. In version 2.4, once Redis replication was interrupted, the master would resend all of its data to the slave, causing an instantaneous network bandwidth spike, and for businesses with large data volumes the slave recovered slowly. So, together with colleagues in the architecture group, we borrowed MySQL's master-slave replication mechanism: we modified Redis's AOF to record a position (pos), and let the slave record the position it had synced up to. After a network fluctuation, only the data past that position is retransmitted, and the business is unaffected.
  • Online hot upgrade. Early on, the Redis version was upgraded continuously as new features were added, and to avoid affecting the business every upgrade required a master switch, which was a heavy operational burden. We therefore built a hot-upgrade mechanism that dynamically loads libredis.so to change versions without switching the master, greatly improving operational efficiency and reducing change risk.
  • Custom modifications. Later on, driven by Weibo's heavy counting requirements, we built rediscounter, compatible with the Redis protocol, specifically for counting data; replacing the hash table with an array cut memory usage dramatically. After that we built phantom, based on Bloom filters, for existence-check scenarios.
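The pos-based resynchronization in the first bullet can be sketched with an append-only backlog: the replica remembers the offset it has already applied, so after an interruption the master resends only the missing suffix instead of everything. (Stock Redis later gained a similar mechanism in PSYNC partial resynchronization.) The classes below are a conceptual toy, not the actual AOF modification:

```python
class Master:
    """Append-only log; can serve a partial resync from any offset."""
    def __init__(self):
        self.backlog = b""
    def write(self, data: bytes):
        self.backlog += data
    def sync_from(self, pos: int) -> bytes:
        return self.backlog[pos:]      # only the suffix the replica is missing

class Replica:
    """Tracks its own applied position (the 'pos bit' in the text)."""
    def __init__(self):
        self.pos = 0
        self.data = b""
    def catch_up(self, master: Master):
        delta = master.sync_from(self.pos)
        self.data += delta
        self.pos += len(delta)
```

The bandwidth saving is exactly the point: after a blip, the retransmission is proportional to the gap, not to the full dataset.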

Redis middleware

In 2015 we finished developing and launched our self-developed Redis middleware, the tribe system. Tribe uses a proxy architecture with a central node: cluster nodes are managed through a config server, and data placement borrows the slot-sharding design of official Redis Cluster. It implements routing, sharding, automatic migration, and failover, and exposes API interfaces for operation and monitoring so it can plug into our other automated operations systems.
[Figure: architecture of the tribe Redis middleware]

Our main motivation for building tribe was automatic migration. Redis memory usage fluctuates: 10% one day can become 80% the next, and manual migration simply cannot respond to such business swings. If you happen to hit a physical memory bottleneck at the same time, it gets even worse, because the data rehashing involved in a business-side rework can itself cause failures.

Slot-based dynamic migration is, first of all, transparent to the business; second, you no longer have to free up an entire server. You just find any server with spare memory and migrate some slots to it, which directly solves expansion and migration, greatly improves server utilization, and lowers cost.
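Slot-based placement can be sketched as follows: keys hash into a fixed slot space, and migration reassigns slots (not servers) to nodes. The slot count and CRC16 hash mimic official Redis Cluster; the `SlotRouter` class is an invention of this sketch, not tribe's real code:

```python
import binascii

N_SLOTS = 16384   # official Redis Cluster uses CRC16 mod 16384; we mimic that

def slot_of(key: str) -> int:
    # crc_hqx is the CRC16-CCITT variant Redis Cluster also uses
    return binascii.crc_hqx(key.encode(), 0) % N_SLOTS

class SlotRouter:
    def __init__(self, first_node: str):
        # initially every slot lives on the first node
        self.owner = {s: first_node for s in range(N_SLOTS)}
    def node_for(self, key: str) -> str:
        return self.owner[slot_of(key)]
    def migrate(self, slots, dst: str):
        for s in slots:
            self.owner[s] = dst    # clients simply re-route; business unaware
```

Moving load to a machine with spare memory is then just `migrate(some_slots, new_node)`, with no change visible to the application.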

The routing layer also lowers the development barrier: resource placement no longer has to be written into code or front-end configuration files, and changes no longer require a deployment. This greatly improves development efficiency and avoids the failure risk of online changes; after all, 90% of failures are caused by active changes.

One more thing on reinventing the wheel: I think every company has its own scenarios. Open source software gives us good solutions, but it can never fit an application scenario 100%, so rebuilding is not unacceptable; some things you simply have to trade off.

Databus

Because we had MySQL first and Redis and HBase later, there is a scenario where data already written to MySQL must be synchronized to other databases. For that we built Databus, which can synchronize data to heterogeneous databases based on the MySQL binlog and supports custom business logic. Today the MySQL-to-Redis and MySQL-to-HBase flows are in place; the next step is a Redis-to-MySQL flow.
[Figure: Databus synchronizing MySQL binlog data to heterogeneous databases]

Our original motivation for Databus was writing to Redis, because some data has to be written to both MySQL and Redis. Double-writing at the front end would work, but it complicates the code; implementing a data link at the back end keeps the code cleaner and still guarantees eventual consistency of the data. In later practice, Databus gradually took on the general role of moving data between systems.
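The Databus flow can be sketched as a replay loop plus a pluggable handler. The real system loads business logic as a compiled .so; the event shape and `make_kv_handler` below are assumptions of this toy, with a plain dict standing in for Redis:

```python
def make_kv_handler(store: dict):
    """Business plugin: mirror MySQL row events into a KV store (Redis stand-in)."""
    def handle(event):
        key = f"{event['table']}:{event['row']['id']}"
        if event['op'] in ('insert', 'update'):
            store[key] = event['row']
        elif event['op'] == 'delete':
            store.pop(key, None)
    return handle

def replay(binlog_events, handler):
    """Databus core loop: read row-format binlog events, hand each to the plugin."""
    for ev in binlog_events:
        handler(ev)
```

Because every event flows through one back-end link instead of front-end double writes, the application code stays simple and the copies converge to the same state (eventual consistency).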

Now a word about the database design habits we have accumulated at Weibo. Broadly, we adopt some "anti-normalization" design ideas. Anti-normalization brings convenience, but it also brings problems, especially as the data scale grows. We deal with them in the following ways.

  • Pre-split. When a requirement comes in, estimate the capacity in advance and split vertically first, then horizontally. Where a time-dimension design is possible, bring the data into the archiving mechanism. Splitting databases and tables solves the capacity problem.
  • Introduce a message queue. Use a queue's one-write-multiple-reads property (or multiple queues) to satisfy the multiple writes that redundant data requires; note this only guarantees eventual consistency, and data may lag in between.
  • Introduce an interface layer. Aggregate data through the interfaces of the different business modules before returning it to the application layer, which reduces coding complexity for application-layer developers.
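The queue-based redundant write in the second bullet might look like this in miniature: the producer writes once, and each consumer applies the message to its own redundant store. The `deque` and the consumer lambdas are stand-ins for a real message queue and real databases:

```python
from collections import deque

def fan_out(queue: deque, consumers):
    """Drain the queue once; each consumer applies every message to its own store.
    Until the queue drains, the stores may disagree: eventual consistency only."""
    while queue:
        msg = queue.popleft()
        for apply in consumers:
            apply(msg)
```

The write path stays single-write for the application; the redundancy (and its delay) lives entirely behind the queue.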

One more point: if a database's volume is expected to be large, we apply the blog post design ideas from the very beginning, separating index from content and designing the hash-plus-time-dimension sharding up front, to minimize the problems and pain of splitting later.

Future plans for Weibo database platform

Finally, I want to share some thoughts on where the Weibo database platform is heading, in the hope of giving you some ideas; of course, I also hope you will give me suggestions and opinions so I can avoid detours and pitfalls.

As the business develops we will run into more and more scenarios, and we hope to introduce the database best suited to each of them, such as PostgreSQL or SSDB. At the same time, we will keep using new MySQL features, such as MySQL 5.7's parallel replication, GTID, and dynamic buffer pool resizing, to continuously improve the performance and stability of existing services.

We also plan to turn the existing NoSQL offerings into services: organize the storage nodes behind proxies and expose them as a service. Externally this reduces development complexity and the granularity at which developers must request resources; internally it raises single-machine utilization and removes the bottleneck on horizontal resource scaling.

At the same time, we will try to use public cloud resources to dynamically grow and shrink the cache layer, making full use of the cloud's elasticity to absorb fluctuations in business traffic.
Q & A

  1. Is the separation of data and index done at the business layer or by middleware? It feels like the middleware would be doing a lot of work; can you expand on this part?
    Middleware was not on the table when we did the splitting and rework, so the separation of index and content is implemented in business logic today. In my experience, even with a middleware solution you should still separate index and content at the business-logic layer.

The core function of middleware is to isolate the program from back-end resources: however many resources sit behind it, the program sees a single entry point. So middleware solves horizontal splitting, whereas separating index from content belongs to vertical splitting, which I personally believe middleware should not be asked to solve.

  2. Can you recall the most memorable database service failure and give a few points to watch for?
    The most memorable one was indeed a database service failure. A colleague accidentally ran a drop table command, and anyone who knows databases knows how destructive that is. Thanks to the time our architecture bought us, we performed an emergency single-table recovery; service was degraded for a while, but users were not affected overall.

The point I want to make about this is, again, standards. Since then we have revised the process for all table-deletion requests and enforce it strictly: no matter how urgent a deletion is, it must cool down for 24 hours, as follows.
    1. Run a RENAME TABLE operation to rename the table to something like table_will_drop.
    2. Wait 24 hours before running the actual DROP.
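The cool-down procedure can be wrapped in a small helper, shown here as a sketch: `safe_drop` performs only the rename and returns when the real DROP becomes permissible; the table name, tombstone suffix, and statement text are illustrative:

```python
import time

def safe_drop(execute, table: str, cool_down_hours: int = 24):
    """Step 1 of the cool-down: rename only. Returns the tombstone name and the
    earliest time the real DROP may run; a scheduler performs the DROP later.
    Until then, a rename back is an instant rollback."""
    tomb = f"{table}_will_drop"
    execute(f"RENAME TABLE {table} TO {tomb}")
    return tomb, time.time() + cool_down_hours * 3600
```

Because a rename is metadata-only and instantly reversible, anything that still depended on the table surfaces within the 24 hours, while the data is never at risk.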

  3. Regarding table splitting during the explosive-growth stage, you said "hash the index and content, then split by the time dimension"; can you expand on the hashing part?
    It is not that complicated. First we estimate the approximate volume for a year and work out how many tables the split needs, trying to keep every table under about 30 million rows (that is the hope, anyway; reality proves plans cannot keep up with change). For example, using modulo 1024, all generated blog posts are hashed by post id into 1024 tables (that post id comes from our uuid global issuer, which I will not expand on here).

Since most content Weibo users generate is tied to time, the time dimension is a strong attribute for us and is almost always present. Supposing tables are created monthly, that means 1024 tables per month; if database capacity becomes a bottleneck, we can resolve it along the time dimension, for example by migrating all the 2010 tables to other databases.

  4. LinkedIn launched a Databus-like project years ago, and some open source projects support syncing data from MySQL to HBase, ES, and so on. Can you expand on Weibo's approach?
    Synchronizing heterogeneous data like this happens at many companies; as far as I know, Alibaba's DRC does the same thing. Our implementation relies mainly on the MySQL binlog: as everyone knows, with the binlog set to row format, every affected row is recorded in the log, which exposes all data changes.

We parse the row-format MySQL binlog to read the data changes into Databus, then load the required business logic as a .so file into Databus; Databus reprocesses the changes according to that business logic and outputs them to the downstream resource.

We have open sourced the databus project. You can search "MyBus" directly on GitHub.

  5. What is the index mentioned in question 1, and what is the algorithm for finding the corresponding content?
    The index does not refer to a content algorithm. For example, to store a blog post you inevitably store a unique id to distinguish it, plus some state from when it was published, such as who posted it and when. We treat that state plus the id as the index, and the post body as the actual content. Because the content is large, storing it together with the index would degrade MySQL's performance, and many queries only need to end up with the post id, which is as good as having the actual post.

We filter on the index to get the list of ids that will ultimately be returned to the user, then fetch the actual content from the content library using that list. This follows the rule that the smaller MySQL's result set, the better the performance, achieving the optimization we want.

  6. What role does NoSQL play?
    NoSQL plays an increasingly important role at Weibo. Take rediscounter, our self-developed counting service. Counts originally lived in MySQL, but every count is an update like set count = count + 1: highly concurrent writes to a single row cause MySQL row-lock contention, and the higher the concurrency, the fiercer the contention.
    As the figure below shows, once concurrent operations on a single row exceed roughly 500, TPS drops from tens of thousands to hundreds, so no amount of MySQL tuning can support this business scenario well, while Redis can.
[Figure: MySQL TPS versus single-row write concurrency; throughput collapses past about 500 concurrent writers]
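The contrast can be sketched in miniature. The `Counter` class below is an in-memory stand-in for the rediscounter idea (one atomic map update per increment, with no per-row lock for writers to queue on); with redis-py against a real server, the equivalent call is simply `r.incr(key)`:

```python
class Counter:
    """In-memory stand-in for a Redis-style counter: atomic per-key increments."""
    def __init__(self):
        self._data = {}
    def incr(self, key: str, by: int = 1) -> int:
        self._data[key] = self._data.get(key, 0) + by
        return self._data[key]

# With redis-py against a reachable server, the shape is (shown for illustration):
#   import redis
#   r = redis.Redis()
#   r.incr(f"cnt:{post_id}")
```

A single-threaded store like Redis serializes increments without lock contention, which is why the hot-row scenario that collapses MySQL stays fast there.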

Personally I think of NoSQL as a Swiss army knife: it is the best solution where it fits best. I believe that is also NoSQL's future direction; each store has its optimal scenario.

  7. Our company will have many smart devices; this year perhaps more than 20,000, possibly doubling within a year. The collected data is all JSON messages, with fields in the JSON identifying the message type, so storing them as varchar or longtext strings seems wrong. Can we store just one column using MySQL 5.7's native JSON type instead of an unexpandable "wide column" layout, or would PostgreSQL or another database be more convenient? MySQL 5.7 was not yet GA when we started, so we used MariaDB multi-master to spread the write load of all those devices. What guidance do you have for write scenarios where a huge number of devices keep transmitting data?
    We are in fact evaluating MySQL 5.7's new features, and JSON among them will have a big impact on database design. From your description you indeed should not use varchar or text fields; but personally, for a smart-device scenario, I recommend going straight to HBase if you can, because the data volume will sooner or later outgrow what MySQL can support, and MySQL has no inherent advantage there.

  8. What does "separate data and index" mean? Are the data files and index files stored on different machines?
    It is better understood as content and index; think of it as a vertical split at the business layer. Because it is a vertical split, the two necessarily live in different database instances, which can sit on one physical machine or on different ones. By habit we place them on different physical machines to avoid mutual interference.

  9. What data is stored in MySQL, and what in NoSQL? Which NoSQL do you use?
    This is really a tiered-storage question. Our current mainstream stack is MySQL + Redis + mc: mc and Redis absorb hotspots and peaks, while MySQL is where data lands so the original data can always be found in the end. Most requests are answered at the mc or Redis layer, and less than 1% of traffic reaches MySQL.

There are exceptions, of course. The counting service mentioned earlier, for example, used rediscounter for storage from the start, and that data was no longer stored in MySQL afterwards.

Finally, I personally think NoSQL's biggest advantage is development convenience. NoSQL is easier for developers to use than MySQL: it is a KV structure, every query is a primary-key lookup, and there is no index tuning or table design to think about. Given the pace of the Internet today, that matters enormously.

  10. What are the advantages and disadvantages of a globally unique issuer versus letting the business use database auto-generated ids?
    There are many ways to implement a globally unique issuer; we use a modified Redis speaking the mc protocol, chiefly for performance. I have also seen MySQL auto-increment ids used as an issuer, but MySQL's locking is too heavy, and problems kept appearing once the business ramped up, so that approach was abandoned.

One more benefit: a globally unique issuer is not hard to build, and you can pack the attributes you want into the uuid. For example, the id format can be "timestamp + business flag + auto-increment sequence"; then the id alone tells you the time and the owning service. Compared with a meaningless plain MySQL auto-increment number, you can do much more with it.
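A minimal issuer of the "timestamp + business flag + sequence" shape might look like this. The bit widths and helper names are assumptions of this sketch, not Weibo's actual uuid layout:

```python
import itertools
import time

_seq = itertools.count()

FLAG_BITS, SEQ_BITS = 4, 12   # illustrative widths, not Weibo's real format

def make_id(biz_flag: int) -> int:
    """Pack 'timestamp + business flag + sequence' into one integer id."""
    ts = int(time.time())
    seq = next(_seq) & ((1 << SEQ_BITS) - 1)   # wraps within one second
    return (ts << (FLAG_BITS + SEQ_BITS)) | (biz_flag << SEQ_BITS) | seq

def id_flag(uid: int) -> int:
    """Recover the business flag without any lookup."""
    return (uid >> SEQ_BITS) & ((1 << FLAG_BITS) - 1)

def id_timestamp(uid: int) -> int:
    """Recover the creation time without any lookup."""
    return uid >> (FLAG_BITS + SEQ_BITS)
```

Because the id itself carries time and service, routing and time-dimension sharding can be done from the id alone, which a bare auto-increment number cannot offer.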

Other articles shared by the Weibo technical team

  • Upsync: Weibo open source dynamic traffic management solution based on Nginx container
  • A lightweight RPC framework that supports hundreds of billions of calls on Weibo: Motan
  • Weibo Docker-based hybrid cloud platform design and practice
  • Discussion on the deployment experience of Weibo "multiple lives in different places"
  • Weibo's Docker container-based hybrid cloud migration combat
  • MySQL optimization and operation and maintenance of 6 billion records in a single table
  • Troubleshooting methods for Weibo in large-scale and high-load systems

The Weibo technical team is recruiting for a range of technical roles, including MySQL/NoSQL DBAs, big data engineers, C/C++, Java, and operations positions. Engineers are equipped with a MacBook Pro and a large DELL display, and have access to a wealth of development and training documentation; it is an excellent place for engineers to demonstrate technical value and grow. Scan the QR code for position details.
[QR code: recruiting details]

This article was planned by Li Qingfeng, edited by Liu Yun, broadcast by Ye Qing, and reviewed by Tim Yang. For more articles on database design and optimization, follow this public account. When reposting, please credit the High Availability Architecture WeChat public account "ArchNotes" and include the QR code below.
[QR code of the "ArchNotes" public account]

Origin: blog.51cto.com/14977574/2547932