Baidu Technical Salon (first session) - 2. Douban data storage practices (reprint)

 

Source: http://www.infoq.com/cn/presentations/liuhongqing-data-store

 

Speakers and topics

Speaker profile: Huang Fangrong
Senior web development engineer. Graduated from university in 1998 and began working in web development in 2000. Now at Baidu, working on the technical architecture of large-scale web (front-end) projects. Aims to solve the various problems of web development with the most suitable solution.

Topic: The art of web data interaction
Content: data interaction formats; several kinds of data interaction that are hard to implement (cross-page, cross-domain, real-time, and so on), some of whose solutions are already running in existing product lines while others are still being implemented; the significance of data interaction; and more. Web development originates in user needs, and user needs are fulfilled through data interaction, so data interaction determines what a web application becomes. This talk approaches data interaction from the technical side, aiming for solutions that are like works of art: simple and beautiful.

Guest profile: Liu Hongqing
Systems programmer. Graduated from the Department of Electrical Engineering at Tsinghua University in 2007. Now at Douban, working on architecture and platform, and author of BeansDB, working to improve the experience of Douban's more than thirty million users. Passionate about technology, with a particular interest in high-capacity servers, distributed systems, high performance, high availability, and related technologies. Pursues simplicity above all: solving complex problems with simple techniques.

Topic: Applying the distributed database BeansDB at Douban
Douban's more than thirty million users upload photos, diaries, and comments, and the volume of blog content, group topics, and subscriptions is very large. Traditional relational databases and network file storage struggle to keep up with continuously growing data and the need for round-the-clock availability. The speaker will share the practical experience Douban gained in developing and using BeansDB to solve these problems.

BeansDB is a high-capacity, highly available distributed key-value database, designed to address problems many Web 2.0 sites now face: storing massive amounts of user data and keeping it available 24 hours a day. The speaker starts from the problems Douban originally faced, walks step by step through the search for solutions, and explains how BeansDB evolved into what it is today. Finally, the talk compares BeansDB with other key-value databases, covering their strengths, weaknesses, and suitable application scenarios, and discusses BeansDB's open problems and next steps.

About QClub: QClub is an offline technical exchange event organized regularly by InfoQ China. Its purpose is to give senior technical people a relatively free platform for exchanging ideas and making friends. Each QClub focuses on one theme and invites technical experts from China and abroad to share their experience in that area. QClub places its main emphasis on discussion, on the view that truth emerges through debate and that participants spark ideas in one another.

 

===================================================================================

Liu Hongqing is a systems programmer at Douban and the author of BeansDB, Douban's open-source data storage system. In this talk he explains how BeansDB is used to meet the growing data needs of Douban's more than thirty million users. Douban now has more than 38 million users, 150,000 groups, 4.3 million entries, and 3 million comments and similar items. On the back end this amounts to 200 GB of structured data, 800 GB of text data, 10 TB of pictures, 6 TB of music, and more. Keeping that data stable, secure, and available 24 hours a day is not an easy task. Douban's approach is to classify the data: user information, friendships, and the like are structured data; text, images, and the like are small files; logs and backups are large files. Each class is then handled with a different technology: MySQL for structured data, BeansDB for small files, and MooseFS for large files.
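The classification described above can be sketched as a simple lookup. This is only an illustration of the idea, not Douban's actual code: the names DataClass, KIND_TO_CLASS, CLASS_TO_BACKEND, and backend_for are hypothetical, and the backends are represented by plain strings rather than real client connections.

```python
from enum import Enum


class DataClass(Enum):
    """The three data classes named in the talk summary."""
    STRUCTURED = "structured"
    SMALL_FILE = "small_file"
    LARGE_FILE = "large_file"


# Hypothetical mapping from kinds of data to a class, following the
# examples given in the text (user info, friendships, text, images, ...).
KIND_TO_CLASS = {
    "user_info": DataClass.STRUCTURED,
    "friendship": DataClass.STRUCTURED,
    "text": DataClass.SMALL_FILE,
    "image": DataClass.SMALL_FILE,
    "log": DataClass.LARGE_FILE,
    "backup": DataClass.LARGE_FILE,
}

# Storage technology per class, as stated in the text.
CLASS_TO_BACKEND = {
    DataClass.STRUCTURED: "MySQL",
    DataClass.SMALL_FILE: "BeansDB",
    DataClass.LARGE_FILE: "MooseFS",
}


def backend_for(kind):
    """Return the name of the storage system that would hold this kind of data."""
    return CLASS_TO_BACKEND[KIND_TO_CLASS[kind]]
```

For example, backend_for("image") selects BeansDB and backend_for("backup") selects MooseFS. Routing by class rather than by individual data type keeps the number of storage systems small even as new kinds of data appear.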

In the question and answer session, a reader was very interested in how Douban's broadcast feature is implemented. Liu Hongqing said:

Douban's broadcast is implemented differently from the inbox approach used by Twitter and other microblogs. Each broadcast is stored only once, and the broadcasts are merged in real time when a user views them. By relying on carefully designed caching and data flow, this complex merge can be completed within a time the user finds acceptable. This approach greatly reduces the amount of data and the pressure on the database, and it also fits some characteristics of our products quite well. The premise that makes it feasible is that the number of people a user follows is limited, usually around a hundred; studies of social networks have reached similar conclusions.
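A minimal sketch of this merge-on-read idea, under stated assumptions: each author's broadcasts are kept once, in a per-author list already sorted newest first, and a viewer's timeline is produced by merging the lists of the people they follow. The function name merged_timeline, the (timestamp, author, text) tuple layout, and the in-memory dict standing in for storage and cache are all hypothetical; the talk summary does not describe Douban's actual data structures.

```python
import heapq
from itertools import islice


def merged_timeline(outboxes, following, limit=20):
    """Merge the broadcasts of everyone in `following` at read time.

    outboxes:  dict mapping author -> list of (timestamp, author, text),
               each list sorted newest first (assumed layout).
    following: iterable of author names the viewer follows.
    limit:     maximum number of broadcasts to return.
    """
    streams = (outboxes.get(author, []) for author in following)
    # heapq.merge lazily performs a k-way merge of already-sorted inputs;
    # reverse=True matches the newest-first ordering of each outbox.
    merged = heapq.merge(*streams, key=lambda item: item[0], reverse=True)
    # Only the first `limit` items are ever pulled from the merge.
    return list(islice(merged, limit))
```

Because the merge is lazy, the work per page view grows with the number of followed authors and the page size, not with the total number of stored broadcasts, which is why the bound of roughly a hundred followed users matters.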

Asked how dual-Master MySQL is implemented and how to avoid problems such as possible data conflicts on auto-increment IDs, Liu Hongqing described Douban's practice:

Douban currently uses dual Masters mainly to make switching convenient. Reads and writes actually follow a Master-Slave structure: operations staff control the setup, for example by modifying user access privileges, so that only one Master is writable. As a result there is no data conflict problem.

 

 

 

Reproduced from: https://www.cnblogs.com/licheng/archive/2010/09/09/1822052.html

Origin blog.csdn.net/weixin_33940102/article/details/92626914