Vision China's road to NoSQL: from MySQL to MongoDB

Our site is the largest community website for creative professionals. By 2009, like many companies, we had built our community and CMS products on PHP + Nginx + MySQL: MySQL ran in a master + master deployment; the front end was developed on our own PHP framework; Memcached served as the cache; Nginx handled web serving and load balancing; and Gearman handled asynchronous tasks. For traditional, mostly static content (articles, news, posts), this system worked well: with hierarchical caching, the actual load on the database side was very light.

In early 2009 we began developing new products, and we ran into the following problems.

- Surging data volume: a single MySQL information table was growing by tens of millions of rows a month. Data we used to ignore now had to be tracked, which inflated the volume further.
- Stronger real-time requirements: users demanded faster responses and more frequent updates of their information, so simply relying on the cache was no longer a panacea.
- Higher scale-out requirements: some innovative products grew at a staggering rate, so capacity upgrades had to be painless; any shutdown produced alarming churn.
- A large number of files to back up: as a site oriented toward creative people, we generate image-heavy content and must effectively manage backups of these pictures and their thumbnails in various sizes. Our previous Linux inotify + rsync incremental backup scheme performed poorly.
- Frequently changing requirements: we needed more agile development with lower development and maintenance costs, able to evolve new features quickly and put them online in the shortest possible cycle.
Initially, we tried to solve these problems by fully optimizing the existing technical infrastructure:

- classifying data by timeliness into further cache tiers and reducing cache granularity;
- improving the cache update mechanism (real-time online updates plus asynchronous offline updates) to raise the hit rate;
- attempting horizontal and vertical table partitioning according to the data's traffic characteristics;
- using MogileFS for distributed storage;
- further tuning MySQL performance while adding more MySQL nodes.

But we soon found that even with all of the above, the problems were hard to solve completely:

- Over-reliance on Memcached made maintaining data consistency exceedingly complex; application developers had to be very careful, and a large-scale Memcached failure would instantly put enormous pressure on the backend database.
- Different kinds of data have very different characteristics and wildly different volumes, and the mechanics of table partitioning made the efficiency trade-offs hard to balance.
- MogileFS was, for us, a big shoe on a small foot: its maintenance cost far outweighed the actual benefit.
- Every additional MySQL node increased our maintenance burden, and effectively monitoring and managing those nodes became a new problem in itself. Virtualization solved part of it, but was still unsatisfying.

Beyond MySQL, could we find a simpler, lighter Swiss Army knife? We set our sights on NoSQL solutions.

Candidates

For the initial NoSQL shortlist, I screened candidates by attention and familiarity, applying a few principles suited to our situation at the time: whether it conserves system resources such as CPU; client/API support, which directly affects application development efficiency; whether the documentation is complete and the community active; whether deployment is simple; and future scalability.
After a period of testing against the above criteria, we shortlisted Redis, MongoDB and Flare.

Redis is very attractive: its rich operations on data types easily solve certain scenarios, and its read/write performance is quite high. Its only drawback was that storage capacity was tied to memory, so storing a large amount of data consumed too much RAM (the latest versions no longer have this problem).

Flare's cluster management capability is impressive: it supports dynamic deployment of nodes, weight-based load balancing between nodes, and data partitioning. It allows large values to be stored and, unlike Memcached, does not limit key length. All of this is transparent to the client, which simply speaks the Memcached protocol to a Flare proxy node. Thanks to clustering, Flare supports fail-over: when a data node goes down, the proxy automatically forwards accesses for that node to the corresponding backup node, and after recovery the node re-synchronizes automatically. Flare's drawbacks are its scarcity of practical deployments and its sparse documentation; at present it is used only by geeks.

All of the above were intended merely as optimizations; I had never thought of abandoning MySQL entirely. But after building a product prototype with MongoDB, I was completely won over and decided to migrate from MySQL to MongoDB.

Why can MongoDB replace MySQL?

MongoDB is a document-oriented database, currently developed and maintained by 10gen. Its functionality is rich and complete enough to replace MySQL. While building the prototype with MongoDB, we summarized some of its highlights.

JSON-style syntax, easy to grasp and understand: MongoDB uses BSON, a variant of JSON, as its internal storage format, and JSON-style syntax throughout. Data submitted to or received from MongoDB is expressed in JSON style, and clients present it in JSON form. Compared with SQL, this is more intuitive and easier to understand and master.
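As a small illustration of that JSON-style syntax (the collection and field names here are invented for the example, and the matcher below is only a toy model, not the real server's query engine), the same lookup can be written in SQL and as a MongoDB-style query document:

```python
# The same lookup expressed in SQL and in MongoDB's JSON-style syntax.
# SQL:      SELECT * FROM posts WHERE author = 'alice' AND views > 100
# MongoDB:  db.posts.find({author: "alice", views: {$gt: 100}})
# In a client language, the query document is just the native hash/dict form:
query = {"author": "alice", "views": {"$gt": 100}}

def matches(doc, query):
    """Tiny evaluator for the equality and $gt cases above (illustration only)."""
    for field, cond in query.items():
        if isinstance(cond, dict) and "$gt" in cond:
            if not (field in doc and doc[field] > cond["$gt"]):
                return False
        elif doc.get(field) != cond:
            return False
    return True
```

In a client language such as PHP, the same query is simply a nested associative array handed to the driver.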
Schema-less, with support for embedded documents: MongoDB is a schema-free document database. A database can have multiple collections, and each collection is a set of documents. Collections and documents are not equivalent to the tables and rows of a traditional database: a collection requires no prior definition and can be created at any time, and the documents within one collection may have different schemas. One record may be a document with three attributes while the next has ten, and an attribute's value can be a basic type (number, string, date, etc.), an array or hash, or even a sub-document (embedded document). This lets us denormalize the data model and improve query speed.

Figure 1: MongoDB is a schema-free document database

Figure 2 shows an example: artworks and their comments can be designed as a single collection, with each comment embedded as a sub-document in the artwork's comments attribute and each reply embedded as a sub-document in the comment's replies attribute. With this design, a single retrieval by artwork id fetches all the related information. MongoDB does not insist that data be normalized; on many occasions it even suggests de-normalizing. Developers can throw off the various constraints of the traditional relational paradigms: not every entity needs to be mapped to its own collection; simply define the top-level classes. MongoDB's document model lets us easily map our objects onto collections for storage.

Figure 2: MongoDB supports embedded sub-documents

User-friendly queries: querying in MongoDB feels very comfortable. There is no SQL syntax to memorize; queries are written directly in JSON and are quite intuitive. In any development language, you can express a query in its native array or hash format.
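A sketch of the collection design just described (all field names here — title, comments, replies, and so on — are hypothetical, not our actual schema), showing how one denormalized document carries the artwork, its comments, and their replies, so a single lookup by id returns everything:

```python
# A sketch of the denormalized "artwork" document described above.
artwork = {
    "_id": 1001,                      # primary key; MongoDB indexes _id by default
    "title": "Sunrise over the Bund",
    "author": "alice",
    "tags": ["photography", "city"],  # an array attribute -- no join table needed
    "comments": [                     # comments embedded as sub-documents
        {
            "user": "bob",
            "text": "Great composition!",
            "replies": [              # replies embedded inside the comment
                {"user": "alice", "text": "Thanks!"},
            ],
        },
        {"user": "carol", "text": "Love the colors."},  # different schema: no replies
    ],
}

def fetch_artwork_with_comments(store, artwork_id):
    """One retrieval by id returns the artwork *and* all comments and replies --
    the single-lookup property the denormalized design gives us."""
    return store.get(artwork_id)

store = {artwork["_id"]: artwork}     # stand-in for the collection
doc = fetch_artwork_with_comments(store, 1001)
```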
With additional operators, MongoDB supports range queries, regular expressions, and queries on sub-document attributes, so it can replace the original SQL queries for most tasks.

Simpler CRUD, with in-place updates: just build an array and pass it to MongoDB's insert/update methods for automatic insertion or update. For updates, MongoDB supports an upsert option: "if the record exists, update it; otherwise insert it." The update method also supports modifiers, which apply updates in place on the server and eliminate message round trips between client and server. These modifiers give MongoDB features similar to KV stores such as Redis and Memcached; compared with MySQL, MongoDB is simpler and faster here. Modifiers can also serve as a container for user-behavior tracking: in practice we use them to quickly record user interactions in MongoDB for later statistical analysis and personalization.

Indexes on attributes of every type, even arrays: this makes certain tasks very easy to implement. In MongoDB the _id attribute is the primary key, and MongoDB creates a unique index on _id by default.

Server-side scripting and Map/Reduce: MongoDB allows JavaScript to be executed on the server side. A function can be run directly on the server, or its definition can be stored there and called directly next time. MongoDB does not support transactions; for operations that need to be "atomic," you can write custom server-side scripts, during whose execution the whole MongoDB instance is locked. Map/Reduce is another of MongoDB's more attractive features: it can perform statistics, classification, and merging over large data sets, fulfilling the role of SQL's GROUP BY and other aggregate functions. Both the mapper and the reducer are defined as server-side scripts in JavaScript.
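To make the Map/Reduce idea concrete, here is a pure-Python emulation of the mechanics (in MongoDB itself the mapper and reducer would be JavaScript functions executed on the server; this toy merely stands in for that):

```python
from collections import defaultdict

def map_reduce(docs, mapper, reducer):
    """Toy Map/Reduce: the mapper emits (key, value) pairs per document,
    and the reducer folds together all values emitted for the same key."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Equivalent of: SELECT author, COUNT(*) FROM comments GROUP BY author
comments = [
    {"author": "bob", "text": "Nice!"},
    {"author": "carol", "text": "Great."},
    {"author": "bob", "text": "+1"},
]
counts = map_reduce(
    comments,
    mapper=lambda doc: [(doc["author"], 1)],
    reducer=lambda key, values: sum(values),
)
# counts -> {"bob": 2, "carol": 1}
```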
Performance, efficiency, speed: MongoDB is written in C++/Boost. In most cases its query speed is much faster than MySQL's, and its CPU usage is very low. Deployment is very simple: on most systems, just download the binary package, extract it, and run it directly, with almost zero configuration.

Multiple replication modes: MongoDB supports replication between different servers, with two fault-tolerance schemes. Master-Slave is the most common, and data backup can be implemented with it; in our practice we use Master-Slave mode with the slave used only for backup, while actual reads and writes are all executed on the master node. Replica Pairs / Replica Sets allow two nodes to monitor each other for active-active fault tolerance. MongoDB supports only a limited dual-master (Master-Master) mode whose practical availability is weak; it can be ignored.

Built-in GridFS, supporting high-capacity storage: this feature caught my eye the most, and it is one reason I passed over the other NoSQL options. GridFS's implementation is very simple: a file is split into blocks and stored in two collections, fs.files and fs.chunks, and the mainstream drivers each encapsulate the GridFS operations. Since GridFS itself is built on collections, you can define and manage a file's attributes directly; these attributes let you quickly find the files you need and easily manage massive numbers of files, without having to fuss over directory hashing to avoid file-system retrieval performance problems. Combined with the auto-sharding described below, GridFS's capacity scaling is more than enough for us. In practice, we use MongoDB's GridFS to store pictures and thumbnails of various sizes.

Figure 3: The structure of MongoDB's auto-sharding

Built-in auto-sharding with a range-based mechanism: a collection's records can be divided by ranges of the shard key into several chunks, which are assigned to different shards.
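That range-based mechanism can be sketched as follows — a simplified model, not MongoDB's real implementation, in which chunk ranges are maintained by config servers and split and migrate automatically:

```python
# Toy model of range-based auto-sharding: each chunk covers a half-open
# range [lo, hi) of the shard key and is assigned to one shard.
CHUNKS = [
    (0,     10000, "shard-a"),
    (10000, 20000, "shard-b"),
    (20000, None,  "shard-c"),   # open-ended top chunk
]

def route(shard_key):
    """Pick the shard whose chunk range contains the key -- roughly what the
    routing layer does transparently for the client's queries and updates."""
    for lo, hi, shard in CHUNKS:
        if shard_key >= lo and (hi is None or shard_key < hi):
            return shard
    raise KeyError(shard_key)
```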
Shards can be replicated; combined with replica sets, this achieves sharding + fail-over, and load can be balanced between different shards. It all remains transparent to the client: the queries, statistics, MapReduce, and other operations the client executes are automatically routed by MongoDB to the backend data nodes. This lets us focus on our own business and upgrade painlessly when the time comes. MongoDB's sharding is designed for a maximum capacity of about 20 petabytes, sufficient to support general applications.

Rich third-party support: the MongoDB community is very active, and many development frameworks have quickly added MongoDB support. Many well-known large companies and websites also use MongoDB in production, and more and more innovative companies are turning to MongoDB paired with stacks such as Django and RoR.

Results

The MongoDB implementation process was enjoyable. We modified our own PHP development framework to accommodate MongoDB. In PHP, MongoDB queries and updates all revolve around arrays, so the implementation code became very simple. With no tables to build, the time needed to run unit tests was greatly shortened, and the efficiency of TDD-style agile development also rose. Of course, since MongoDB's document model is very different from a relational database's, practice brought plenty of confusion; fortunately, the MongoDB open source community gave us great help. In the end, we took two weeks to complete the code migration from MySQL to MongoDB, shorter than the expected development time. Our test results were also striking: with about 20 million records in a 300GB database, at 2,000 reads and writes per second the system's CPU and other resource consumption was quite low (our data volume is still small; some companies have already published classic cases of storing more than 5 billion records, over 1.5TB, in MongoDB).
At present we deploy MongoDB together with other services, with a significant reduction in resources.

Some tips

- Grasp MongoDB's document model well: proceed from reality, throw away relational-paradigm thinking, and redesign your classes.
- Avoid time-consuming operations such as walking through records in server-side JavaScript; use Map/Reduce to do that kind of whole-table data processing instead.
- Attribute types must match between insertion and query: if you insert the string "1", querying with the number 1 will not match it.
- MongoDB performance optimization can start from disk speed and memory.
- MongoDB limits each document to a maximum of 4MB; within that limit, make heavy use of embedded documents and avoid DatabaseReference.
- Cache internally to avoid the N+1 query problem (MongoDB does not support joins).
- For applications that need high-speed writes, use a Capped Collection, for example for real-time logs.
- With large data volumes, increase oplogSize before synchronizing a new node, and pre-generate the data files yourself to avoid client timeouts.
- By default, the total number of collections plus indexes cannot exceed 24,000.
- In current versions (< v1.6), the space of deleted data cannot be reclaimed; if you delete data frequently, run repairDatabase periodically to release that space.

Conclusion

MongoDB's milestone version 1.6 is expected to be released in July this year; it will be the first release in which MongoDB's sharding is fit for use in a production environment. As beneficiaries of MongoDB, we are also actively involved in MongoDB community activities, improving the Perl/PHP technical stack for MongoDB. Around the 1.6 release, we will also launch MongoDB-based open source projects during the year.
For a company just starting out, or one developing innovative Internet applications, MongoDB's speed, flexibility, light weight, and powerful scalability make it well suited to developing products quickly, iterating rapidly, and adapting to users' ever-changing needs. All in all, MongoDB is one of the most fully featured NoSQL replacements for MySQL, and combinations such as MongoDB + Perl/PHP/Django/RoR may well become the best stacks for Web 2.0 and 3.0 product development, just as MySQL once replaced Oracle/DB2/Informix. History is always surprisingly similar, so we'll see!

Reproduced from: https://my.oschina.net/wzlee/blog/262194


Origin: blog.csdn.net/weixin_34387468/article/details/91716711