An Overview of Several NoSQL Databases

One, relational databases

Relational databases offer powerful SQL functionality and the ACID properties. Their main advantages are the following:

① Data consistency can be maintained through transactions and database locks, as in a bank transfer.

② Multiple tables can be combined with join operations.

③ They have more than 20 years of development history and are mature.

④ They can be applied to a wide variety of systems.

But there are also several drawbacks:

① A relational database stores row records and cannot store data structures directly. Take following on a microblog as an example: the people a user follows form a list of user IDs. With a relational database, you have to query the user ID table, assemble the data, and only then display it. Likewise, a report may need information from different tables or different databases, which then has to be stitched together. A relational database can only store a list by splitting it into multiple rows and reassembling it after the query; it cannot store the list directly.

② The schema of a relational database is inconvenient to extend. The table schema is a strong constraint: operating on a column that does not exist raises an error, and adding columns when the business changes is troublesome, because you need to execute DDL (data definition language, such as CREATE, ALTER, DROP) statements to modify the table, and the table may be locked for a long time while it is modified (for example, MySQL may lock a table for an hour).

③ A relational database has high I/O in scenarios with large amounts of data

If statistics are computed over tables with large amounts of data, the I/O of a relational database will be high, because even if the operation touches only one column, the database reads the entire row from the storage device into memory. Flash-sale events such as the Taobao and Tmall Double Eleven promotion cannot be handled with a relational database alone.

④ The full-text search capability of a relational database is relatively weak

Full-text search in a relational database can only use LIKE matching with a full table scan, so performance is very low; in complex scenarios such as Internet search it cannot meet business requirements.

Two, NoSQL (non-relational) databases

In response to these problems, different NoSQL solutions were born. Compared with relational databases, these solutions perform better in certain scenarios. But there is no free lunch: the advantages of a NoSQL solution essentially come from sacrificing one or more of the ACID properties. So we should not blindly believe that NoSQL is a silver bullet, but should treat NoSQL as a strong complement to SQL. NoSQL != No SQL; rather, NoSQL = Not Only SQL.

Common NoSQL solutions fall into four categories.

① K-V storage: solves the problem that relational databases cannot store data structures; represented by Redis.

② Document database: solves the problem of the strong schema constraint of relational databases; represented by MongoDB.

③ Columnar database: solves the I/O problem of relational databases in big-data scenarios; represented by HBase.

④ Full-text search engine: solves the weak full-text search capability of relational databases; represented by Elasticsearch.

 

KV storage

KV storage stands for Key-Value storage. The Key is the identifier of the data, with the same meaning as a primary key in a relational database, and the Value is the actual data.

Redis is a typical representative of KV storage. It is an open source (originally under the BSD license) high-performance KV caching and storage system. A Redis Value is a specific data structure, including string, hash, list, set, sorted set, bitmap and hyperloglog, so Redis is often called a data structure server.

Taking the List data structure as an example, Redis provides these typical operations (see http://redis.cn/commands.html#list):

LPOP key: removes and returns an element from the left (head) of the list.

LINDEX key index: gets an element from the list by its index.

LLEN key: gets the length of the list.

RPOP key: removes and returns an element from the right (tail) of the list.
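
The behaviour of these four commands can be sketched with an in-memory Python `collections.deque`. This only illustrates the semantics and is not a Redis client: the class `ListStore`, its method names, and the key `following:1` are hypothetical, and returning `None` for a missing element is an assumption modelled on Redis returning nil.

```python
from collections import deque


class ListStore:
    """Hypothetical in-memory stand-in that mimics four Redis list commands."""

    def __init__(self):
        self._lists = {}  # key -> deque of values

    def rpush(self, key, *values):
        # Append values at the right (tail) end, creating the list if needed.
        self._lists.setdefault(key, deque()).extend(values)
        return len(self._lists[key])

    def lpop(self, key):
        # Remove and return the leftmost element, or None if there is none.
        items = self._lists.get(key)
        return items.popleft() if items else None

    def rpop(self, key):
        # Remove and return the rightmost element, or None if there is none.
        items = self._lists.get(key)
        return items.pop() if items else None

    def lindex(self, key, index):
        # Return the element at index (negative counts from the tail), or None.
        items = self._lists.get(key, deque())
        try:
            return items[index]
        except IndexError:
            return None

    def llen(self, key):
        # Length of the list; a missing key counts as an empty list.
        return len(self._lists.get(key, ()))


store = ListStore()
store.rpush("following:1", "u7", "u8", "u9")
first = store.lpop("following:1")      # takes "u7" from the head
last = store.rpop("following:1")       # takes "u9" from the tail
remaining = store.llen("following:1")  # only "u8" is left
```

Each call is a single operation on one key, which is the point of a data structure server: the list lives on the server side and the client never has to reassemble it.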

 

Redis storage limits:

String type: a String value can store at most 512 MB.

List type: a list can hold up to 2^32 - 1 elements, that is 4,294,967,295.

Set type: a set can hold up to 2^32 - 1 elements, that is 4,294,967,295.

Hash type: a hash can hold up to 2^32 - 1 key-value pairs, that is 4,294,967,295.

Sorted set type: similar to the Set type.

 

Implementing these features in a relational database would be very complicated. For example, the LPOP operation removes and returns the first element of the list stored at the key. To achieve the same thing with a relational database, the following is required:

Besides its own number (for example, a row ID), each record also needs a position number; otherwise there is no way to determine which record is first. Note that the row ID cannot serve as the position number, because data may be inserted at the head of the list.

a. Query the first record.

b. Delete the first record.

c. Update the position numbers of all records from the second one onward.

As you can see, a relational database makes this cumbersome: it requires multiple SQL operations, and performance is very low.
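
The three steps can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumptions: the table `list_items` and its columns `id`, `pos`, and `value` are hypothetical names chosen for the example, and SQLite stands in for whatever relational database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list_items (id INTEGER PRIMARY KEY, pos INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO list_items (pos, value) VALUES (?, ?)",
    [(0, "u7"), (1, "u8"), (2, "u9")],
)


def lpop(conn):
    """Emulate LPOP with three SQL statements inside one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        # a. Query the first record.
        row = conn.execute(
            "SELECT id, value FROM list_items ORDER BY pos LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # b. Delete the first record.
        conn.execute("DELETE FROM list_items WHERE id = ?", (row[0],))
        # c. Renumber the positions of all remaining records.
        conn.execute("UPDATE list_items SET pos = pos - 1")
        return row[1]


popped = lpop(conn)
remaining = conn.execute(
    "SELECT pos, value FROM list_items ORDER BY pos"
).fetchall()
```

Step c touches every remaining row, so the cost of one pop grows with the length of the list, whereas Redis handles LPOP as a single command.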

The main shortcoming of Redis is that it does not support full ACID transactions. Redis does provide transactions, but they differ considerably from relational database transactions: a Redis transaction can only guarantee isolation and consistency (I and C), and cannot guarantee atomicity and durability (A and D).

Although Redis does not strictly follow the ACID principles, most businesses do not in fact need to follow them strictly. Take the microblog follow operation above: even if the system fails to add A to B's follower list, the business impact is very small. So when designing a solution, we should decide whether Redis can be used based on the characteristics and requirements of the business, rather than rejecting it outright because it does not follow ACID.

Document database

To solve the problems caused by the relational database schema, document databases came into being. Their greatest feature is being schema-free (no-schema): arbitrary data can be stored and read. Currently most document databases store data in JSON (or BSON) format, because JSON data is self-describing and needs no predefined fields; reading a field that does not exist in a JSON document does not cause the kind of error that SQL raises for a missing column.

The no-schema property of document databases brings several significant advantages to business development.

1. Adding fields is simple

When the business adds a new field, there is no need to first execute a DDL statement to modify the table structure as with a relational database; the program code can read and write the field directly.

2. Historical data does not cause errors

For historical data, even if it lacks the new field, no error occurs; a null value is returned, and the code only needs to handle that case.
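
How code stays compatible with historical data can be sketched in a few lines of Python. The documents and the `nickname` field here are hypothetical, and plain dictionaries parsed from JSON stand in for documents returned by a document database.

```python
import json

# An old document written before the hypothetical "nickname" field existed,
# and a newer one that includes it.
old_doc = json.loads('{"id": 1, "name": "James"}')
new_doc = json.loads('{"id": 2, "name": "Ann", "nickname": "annie"}')


def display_name(doc):
    # dict.get returns None for a missing field instead of raising an error,
    # so the code can fall back to the field that every document has.
    return doc.get("nickname") or doc["name"]


names = [display_name(doc) for doc in (old_doc, new_doc)]
```

No migration of the old document is needed; the fallback in `display_name` is the "code-compatible" handling mentioned above.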

3. Complex data is easy to store

JSON is a powerful description language that can describe complex data structures. Suppose we design a user management system in which user information includes ID, name, gender, hobbies, email, address, and education. Hobbies is a list (a user can have several hobbies); the address is a structure with province, city, district, and detailed address; education includes school, major, and enrollment and graduation dates. With a relational database we would need to design several tables, including basic information (columns: ID, name, gender, email), hobbies (columns: ID, hobby), address (columns: province, city, district, detailed address), and education (columns: enrollment date, graduation date, school name, major). With a document database, a single JSON document can describe all of it.

{
    "id": 10000,
    "name": "James",
    "sex": "male",
    "hobbies": [
        "football",
        "playing",
        "singing"
    ],
    "email": "[email protected]",
    "address": {
        "province": "GuangDong",
        "city": "GuangZhou",
        "district": "Tianhe",
        "detail": "PingYun Road 163"
    },
    "education": [
        {
            "begin": "2000-09-01",
            "end": "2004-07-01",
            "school": "UESTC",
            "major": "Computer Science & Technology"
        },
        {
            "begin": "2004-09-01",
            "end": "2007-07-01",
            "school": "SCUT",
            "major": "Computer Science & Technology"
        }
    ]
}

As this sample shows, describing the data with JSON is much more convenient than describing it with relational tables, and it is also easier to understand.

This feature of document databases is especially suitable for business scenarios such as e-commerce and games. Take e-commerce as an example: the attributes of different products differ greatly. For example, the attributes of a refrigerator and those of a laptop are very different, as shown in the figure.



Even products of the same category have different attributes. For example, LCD and LED displays have different parameters and specifications. Storing the data for such a business scenario in a relational database is very troublesome, while with a document database it is much simpler, and adding new attributes is also easier.

The no-schema feature of document databases brings these advantages at a price, the most important being weak transaction support. For example, when MongoDB is used to store product inventory, creating an order requires first deducting inventory and then creating the order. This is a transactional operation that is simple to implement in a relational database, but MongoDB historically could not do it transactionally (multi-document transactions were only added in MongoDB 4.0). Without a transaction, an abnormal situation may leave the inventory deducted but the order not created. So business scenarios with strict transactional requirements are generally a poor fit for a document database.

Another drawback is that a document database cannot perform the join operations of a relational database. For example, suppose we have a user information table and an orders table, and the orders table contains the buyer's user ID. To query "female users who bought an Apple laptop", a relational database needs only a simple join, while a document database cannot run a join query and needs two queries: first query the orders table for the users who bought an Apple laptop, then query which of those users are female.
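
The two-query approach can be sketched in plain Python, with lists of dictionaries standing in for two document collections. The collections `users` and `orders`, their field names, and the sample records are all hypothetical.

```python
users = [
    {"id": 1, "name": "Ann", "sex": "female"},
    {"id": 2, "name": "Bob", "sex": "male"},
    {"id": 3, "name": "Cindy", "sex": "female"},
]
orders = [
    {"order_id": 100, "buyer_id": 1, "product": "Apple laptop"},
    {"order_id": 101, "buyer_id": 2, "product": "Apple laptop"},
    {"order_id": 102, "buyer_id": 3, "product": "refrigerator"},
]

# Query 1: collect the IDs of users who bought an Apple laptop.
buyer_ids = {o["buyer_id"] for o in orders if o["product"] == "Apple laptop"}

# Query 2: among those users, keep only the female ones.
result = [
    u["name"] for u in users if u["id"] in buyer_ids and u["sex"] == "female"
]
```

The join happens in application code: the intermediate set `buyer_ids` has to travel from the first query to the second, which is the extra work a relational join would hide.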

 

Columnar database

As the name suggests, a columnar database stores data by column. Correspondingly, a traditional relational database is called a "row database", because it stores data by row.

Storing data by row, as relational databases do, has the following main advantages:

Reading multiple columns of a row at the same time is efficient: because the columns are stored together in a row, a single disk operation can read all the columns of that row into memory.

Multiple columns of a row can be written in a single operation, which guarantees the atomicity and consistency of writes to that row. With column storage, by contrast, a write might succeed for some columns and fail for others, leaving the data inconsistent.

We can see that the advantages of row storage are realized only in particular business scenarios. Without such scenarios the advantages disappear and may even become disadvantages. The typical case is statistics over massive data. For example, to compute the number of overweight people in a city, we only need to read each person's weight column, yet row storage reads every full row even though only one column is used. If a single user record is 1 KB and the weight takes only 4 bytes, row storage still reads the whole 1 KB row into memory, which is an obvious waste. With column storage, only 4 bytes need to be read per user, so I/O is greatly reduced.
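
The arithmetic behind this example can be made explicit with a back-of-the-envelope sketch. The 1 KB row and 4-byte weight come from the example above; the number of users is a hypothetical figure, and real systems read whole disk pages, so this is an idealised lower bound rather than a measurement.

```python
ROW_BYTES = 1024    # size of one user row (1 KB, from the example above)
WEIGHT_BYTES = 4    # size of the weight value in that row


def bytes_read(num_users: int, columnar: bool) -> int:
    # Row storage must read the whole row; column storage reads only the
    # weight column for each user.
    per_user = WEIGHT_BYTES if columnar else ROW_BYTES
    return num_users * per_user


num_users = 10_000_000  # hypothetical city population
row_io = bytes_read(num_users, columnar=False)
col_io = bytes_read(num_users, columnar=True)
ratio = row_io // col_io  # how many times more data row storage reads
```

Under these assumptions row storage reads 256 times as many bytes as column storage for the same statistic, regardless of how many users there are.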

Besides saving I/O, column storage also achieves a higher compression ratio and so saves more storage space. A row database typically has a compression ratio of about 3:1 to 5:1, while a columnar database typically reaches about 8:1 to 30:1, because the values within a single column are more similar to one another than the values within a row and therefore compress better.

Likewise, in a different scenario the strengths of column storage turn into weaknesses. The typical case is frequent updates to multiple columns. Because column storage places different columns in non-contiguous disk space, updating multiple columns causes random disk writes, whereas row storage keeps the columns of a row in contiguous space so that one disk write completes the update. The random-write efficiency of column storage is therefore much lower than the write efficiency of row storage. In addition, the high compression ratio of column storage becomes a disadvantage in update scenarios, because an update requires decompressing the stored data, applying the change, compressing again, and finally writing to disk.

Given these advantages and disadvantages, column storage is generally used for offline big-data analysis and statistics, because such workloads mainly operate on a single column or a few columns, and once the data is written it no longer needs to be updated or deleted.

Full-text search engine

A traditional relational database uses indexes to achieve fast queries, but in full-text search scenarios indexes are of little help, mainly because:

Full-text search conditions can be combined in arbitrary permutations; satisfying them all with indexes would require a very large number of indexes.

Full-text search uses fuzzy matching, which indexes cannot satisfy; only LIKE queries can be used, and a LIKE query is a full table scan with very low efficiency.

Let me give a concrete example of why relational databases cannot meet the requirements of full-text search. Suppose we build a dating website whose main purpose is to help programmers find partners, but unlike traditional dating sites the model is "programmers publish their own information, and users search for programmers". The programmer information table is designed as follows:

| ID | Name | Sex | Location | Unit | Hobbies | Language | Self-introduction |
|----|------|-----|----------|------|---------|----------|-------------------|
| 1 | Dolon | Male | Beijing | Cat Factory | write code, travel, marathon | Java, C++, PHP | technical expert, simple, warm-hearted |
| 2 | Flower | Female | Shanghai | Goose Factory | travel, food, singing | PHP, Java | flowery beauty, absolute beauty, beautiful flower |

 

Let's look at some simple search scenarios for this business:

Beauty 1: I heard that PHP is the best language in the world, so PHP programmers must earn the most, but my mother insists that I find someone in Shanghai.

Beauty 1's search condition is "sex + PHP + Shanghai", where "PHP" requires a fuzzy match on the "Language" column and "Shanghai" queries the "Location" column; to support this with an index, an index on "Location" is needed.

Beauty 2: I really admire these tech guys. It would be even better to find a Goose Factory guy to travel with me.

Beauty 2's search condition is "sex + travel + Goose Factory", where "travel" requires a fuzzy match on the "Hobbies" column and "Goose Factory" queries the "Unit" column; to support this with an index, an index on "Unit" is needed.

Beauty 3: I am a female programmer myself, and I would like to find a Java technical expert at the Cat Factory in Beijing.

Beauty 3's search condition is "sex + Cat Factory + Beijing + Java + technical expert", where "Cat Factory + Beijing" can be queried through indexes, but "Java" and "technical expert" can only be queried by fuzzy matching.

Handsome 4: Are there any beautiful programmer girls? Let me search and see.

Handsome 4's search condition is "sex + beautiful + beauty", which can only be searched by fuzzy matching on the "Self-introduction" column.

These are only simple examples; real search conditions cannot be listed exhaustively because there are so many permutations and combinations. Even from these simple examples we can see how weak relational databases are at supporting full-text search.
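
The difference between an indexed lookup and a fuzzy match can be observed with Python's built-in `sqlite3` module by asking the database for its query plan. This is a sketch under assumptions: SQLite stands in for a generic relational database, the table and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE programmers (id INTEGER PRIMARY KEY, name TEXT, sex TEXT, "
    "location TEXT, unit TEXT, hobbies TEXT, language TEXT, intro TEXT)"
)
# Indexes on both columns, to show that an index does not help the fuzzy match.
conn.execute("CREATE INDEX idx_location ON programmers (location)")
conn.execute("CREATE INDEX idx_language ON programmers (language)")


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Exact match on an indexed column: the planner can search the index.
exact_plan = plan("SELECT * FROM programmers WHERE location = 'Shanghai'")

# Fuzzy match with a leading wildcard: the planner falls back to a scan.
fuzzy_plan = plan("SELECT * FROM programmers WHERE language LIKE '%PHP%'")
```

Printing the two plans typically shows a SEARCH using `idx_location` for the exact match and a SCAN of the whole table for the LIKE query, even though the `language` column has its own index.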

1. The basic principle of full-text search

The technical principle behind full-text search engines is the "inverted index", also called a reverse index or inverted file. It is an indexing method whose basic principle is to build an index from words to documents. It is called "inverted" in contrast to the "forward index", which indexes from documents to words. We illustrate the difference between the two with a simple example.

Suppose we have a technical article website that collects a variety of technical articles, and users can browse or search for articles on the site.

Forward index example:

| Article ID | Article name | Article content |
|------------|--------------|-----------------|
| 1 | Agile architecture design principles | Details omitted; the content includes words such as: architecture, design, architects |
| 2 | Java programming must-knows | Details omitted; the content includes words such as: Java, programming, object-oriented, class, architecture, design |
| 3 | What is the object-oriented holy canon | Details omitted; the content includes words such as: design, pattern, objects, classes, Java |

(Note: the table is for demonstration only; the article content actually stored runs to thousands of words.)

A forward index is suited to looking up a document's content by its name. For example, when a user clicks "What is the object-oriented holy canon" on the site, the website looks up the content by the article's title and displays it to the user.

Inverted index example:

| Word | Document ID list |
|------|------------------|
| architecture | 1, 2 |
| design | 1, 2, 3 |
| Java | 2, 3 |

(Note: the table is only an example, not a complete inverted index; a real inverted index has thousands of rows, because every word is an index entry.)

An inverted index is suited to looking up documents by keyword. For example, if a user only wants to see articles related to "design", the website needs to find all articles whose content contains the word "design" and present them to the user.
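
Building an inverted index from a forward index can be sketched in a few lines of Python. The word lists are taken from the three example articles; the function name `build_inverted_index` is hypothetical, words are lowercased for simplicity, and a real search engine would add tokenization, stemming, and relevance scoring on top of this.

```python
from collections import defaultdict

# Forward index: article ID -> words contained in the article.
forward_index = {
    1: ["architecture", "design", "architects"],
    2: ["java", "programming", "object-oriented", "class", "architecture", "design"],
    3: ["design", "pattern", "objects", "classes", "java"],
}


def build_inverted_index(forward):
    """Invert document -> words into word -> sorted list of document IDs."""
    inverted = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            inverted[word].add(doc_id)
    return {word: sorted(ids) for word, ids in inverted.items()}


inverted_index = build_inverted_index(forward_index)
design_articles = inverted_index["design"]  # IDs of articles containing "design"
```

A keyword search then becomes a single dictionary lookup instead of a scan over every article's content.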

2. Using full-text search

A full-text search engine indexes words and documents, whereas a relational database indexes keys and rows. The two differ greatly and cannot simply be equated. So for a full-text search engine to support full-text search over relational data, a conversion is needed: the relational data must be converted into document data.

The most common approach is to convert the relational data into JSON documents, object by object, and then feed the JSON documents into the full-text search engine for indexing. I again use the programmer information table as an example of the conversion.

Converting the earlier programmer samples into JSON documents yields three documents of programmer information; taking programmer 1 as an example:

{
  "id": 1,
  "Name": "Dolon",
  "Sex": "Male",
  "Location": "Beijing",
  "Unit": "Cat Factory",
  "Hobby": "write code, travel, marathon",
  "Language": "Java, C++, PHP",
  "Self-introduction": "technical expert, simple, warm-hearted"
}
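
The conversion from a relational row to a JSON document can be sketched with Python's standard `json` module. The column list and row tuple below are written by hand to stand in for the result of a relational query, and the function name `row_to_document` is hypothetical.

```python
import json

# Column names and one row, as they might come back from a relational query.
columns = [
    "id", "Name", "Sex", "Location", "Unit",
    "Hobby", "Language", "Self-introduction",
]
row = (
    1, "Dolon", "Male", "Beijing", "Cat Factory",
    "write code, travel, marathon", "Java, C++, PHP",
    "technical expert, simple, warm-hearted",
)


def row_to_document(columns, row):
    # Pair each column name with its value to form one JSON document per row.
    return json.dumps(dict(zip(columns, row)), ensure_ascii=False)


document = row_to_document(columns, row)
```

Each row becomes one self-contained document, which is the unit a full-text search engine indexes.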

A full-text search engine can build a full-text index from JSON documents and then perform fast full-text searches. Taking Elasticsearch as an example, its basic indexing principle is as follows:

Elasticsearch is a distributed document store. It can store and retrieve complex data structures, serialized as JSON documents, in real time.

In Elasticsearch, all data in every field is indexed by default. That is, every field has a dedicated inverted index for fast retrieval. And, unlike most other databases, it can use all of those inverted indexes in the same query and return results at breathtaking speed.

Excerpt from: https://www.elastic.co/guide/cn/elasticsearch/guide/current/data-in-data-out.html

Advantages and disadvantages of Redis and MongoDB

https://blog.csdn.net/weixin_43160039/article/details/83544228

Origin www.cnblogs.com/niwa/p/11265835.html