Getting Started and Overview of NoSQL

1. Big opportunities in the Internet era: why use NoSQL

1) The beautiful era of stand-alone MySQL
In the 1990s, a website's traffic was generally small, and a single database could easily handle it. Most pages were static, and dynamic, interactive websites were rare. Under this single-database architecture, where are the bottlenecks in data storage?
1) The total data volume no longer fits on one machine
2) The data index (B+ tree) no longer fits in one machine's memory
3) The access volume (mixed reads and writes) is more than one instance can bear
When 1 or 3 of the above conditions is met, the architecture has to evolve...

2) Memcached (cache) + MySQL + vertical split
Later, as traffic grew, almost every website built on MySQL began to hit database performance problems. Web applications no longer focused only on features but also pursued performance. Programmers began to use caching heavily to relieve pressure on the database, and to optimize the database's structure and indexes. At first, file caches were the popular way to relieve database pressure, but as traffic kept growing, multiple web machines could not share a file cache, and large numbers of small cache files also brought high I/O pressure. At this point, Memcached naturally became a very fashionable technology.
As an independent distributed cache server, Memcached provides a shared, high-performance cache service for multiple web servers. Scaling out across multiple Memcached servers was first done with a hash algorithm; consistent hashing then appeared to address the drawback that adding or removing a cache server triggers a rehash and invalidates a large share of the cache.
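To make the difference concrete, here is a minimal Python sketch that compares plain modulo hashing with a consistent hash ring. It is not Memcached's or any client library's actual implementation: the class `ConsistentHashRing`, the helpers `modulo_node` and `moved_keys`, the server names, and the choice of MD5 with 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key):
    # MD5 is used only to spread keys evenly, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical hash ring with virtual nodes (illustration only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per server (assumed value)
        self._ring = []           # sorted list of (hash, server)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise ValueError("ring is empty")
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


def modulo_node(key, nodes):
    """Plain hash-modulo placement, for comparison."""
    return nodes[_hash(key) % len(nodes)]


def moved_keys(before, after):
    """Return the keys whose assigned server changed."""
    return [k for k in before if before[k] != after[k]]


keys = [f"user:{i}" for i in range(10_000)]
old_nodes = ["cache-a", "cache-b", "cache-c"]
new_nodes = old_nodes + ["cache-d"]

ring = ConsistentHashRing(old_nodes)
ring_before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-d")
ring_after = {k: ring.get_node(k) for k in keys}

mod_before = {k: modulo_node(k, old_nodes) for k in keys}
mod_after = {k: modulo_node(k, new_nodes) for k in keys}

print("moved with consistent hashing:", len(moved_keys(ring_before, ring_after)))
print("moved with modulo hashing:   ", len(moved_keys(mod_before, mod_after)))
```

With the ring, the only keys that move when a server is added are the ones the new server takes over; with modulo placement, most keys change servers, which is the mass cache invalidation described above.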

3) MySQL master-slave read-write separation
Memcached can only relieve the read pressure on the database, so as write pressure increased, concentrating reads and writes on one database overwhelmed it. Most websites began to use master-slave replication to separate reads from writes, improving read-write performance and the scalability of the read databases. MySQL's master-slave mode became standard for websites at this time.
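The routing idea behind read-write separation can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class `ReadWriteRouter` and the host names are hypothetical, the "connections" are plain strings, and a statement is treated as a read only when it starts with SELECT.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over replicas; fall back to the master if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # INSERT / UPDATE / DELETE / DDL and anything unrecognized go to the master.
        return self.master


router = ReadWriteRouter("master:3306", ["replica1:3306", "replica2:3306"])
print(router.route("SELECT * FROM posts WHERE id = 1"))
print(router.route("UPDATE posts SET title = 'x' WHERE id = 1"))
```

One consequence worth keeping in mind: MySQL replication is asynchronous by default, so a read routed to a replica immediately after a write may not see that write yet.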

4) Table and database sharding + horizontal split + MySQL cluster
On top of Memcached caching, MySQL master-slave replication, and read-write separation, the write pressure on the MySQL master began to become a bottleneck while the data volume kept soaring. Because MyISAM uses table locks, it suffers serious lock contention under high concurrency, and many high-concurrency MySQL applications switched from MyISAM to the InnoDB engine. At the same time, splitting tables and databases (sharding) became a popular way to relieve write pressure and data growth; it became a hot technology, a popular interview question, and a hot topic in the industry. It was also around this time that MySQL introduced table partitioning, which was not yet stable but brought hope to companies with average technical strength. Although MySQL also released MySQL Cluster, its performance could not meet Internet requirements well; what it mainly offers is a strong guarantee of high reliability.
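As a rough illustration of splitting tables and databases, the following Python sketch maps a user id to one of several databases and tables by modulo. The function `shard_for`, the naming scheme `user_db_N` / `user_N`, and the counts of 4 databases with 8 tables each are all assumptions for the example, not a scheme taken from the text.

```python
def shard_for(user_id, db_count=4, tables_per_db=8):
    """Map a user id to a (database, table) pair by modulo hashing."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    slot = user_id % (db_count * tables_per_db)
    db_index, table_index = divmod(slot, tables_per_db)
    return f"user_db_{db_index}", f"user_{table_index}"


for uid in (0, 13, 35):
    print(uid, shard_for(uid))
```

The application (or a middleware layer) has to compute this mapping before every query, which is part of why sharding counts as one of the "complex technologies" needed to scale MySQL.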

5) MySQL scalability bottleneck
MySQL databases often store large text fields, which makes tables very large, makes database recovery very slow, and makes a quick restore difficult. For example, 10 million 4 KB texts come to nearly 40 GB. If this data could be moved out of MySQL, the MySQL database would become very small. Relational databases are powerful, but they do not suit every application scenario. MySQL's poor scalability (scaling requires complex techniques), its high I/O pressure under big data, and the difficulty of changing table structures are exactly the problems facing developers who use MySQL today.
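The size estimate above is easy to check. The short Python calculation below assumes that 4 KB means 4 × 1024 bytes and ignores row and index overhead; the function name `total_size_gib` is just an illustrative helper.

```python
def total_size_gib(rows, bytes_per_row):
    """Raw payload size in GiB, ignoring storage overhead."""
    return rows * bytes_per_row / 1024**3


size = total_size_gib(10_000_000, 4 * 1024)
print(f"{size:.1f} GiB")  # prints "38.1 GiB"
```

That is about 38 GiB, or roughly 41 GB in decimal units, which matches the "nearly 40 GB" figure.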

6) Why use NoSQL?
Today we can easily access and collect data through third-party platforms (e.g. Google, Facebook). Users' personal information, social networks, geographic locations, user-generated data, and user action logs have grown exponentially. If we want to mine this user data, SQL databases are no longer a good fit for such applications, whereas NoSQL databases have developed to handle this large-scale data well.

2. What is NoSQL?

NoSQL (NoSQL = Not Only SQL) means "not only SQL" and generally refers to non-relational databases. With the rise of Web 2.0 websites, traditional relational databases struggled to cope, especially with ultra-large-scale, high-concurrency, purely dynamic SNS (Social Networking Services) websites, and exposed many problems that were hard to overcome. Non-relational databases, by contrast, developed very rapidly thanks to their own characteristics. NoSQL databases emerged to meet the challenges posed by large-scale data sets and multiple data types, especially big data applications, including the storage of ultra-large-scale data (e.g. Google or Facebook collect terabits of data per day for their users). These types of data stores do not require a fixed schema and can scale horizontally without extra operations.

3. Features of NoSQL

1) Easy to scale
There are many kinds of NoSQL databases, but a common feature is that they drop the relational features of relational databases. Because there are no relationships between data items, scaling out is very easy, which in turn brings scalability at the architectural level.
2) High performance
NoSQL databases offer very high read and write performance, and they continue to perform well at large data volumes. This comes from their non-relational nature and the simple structure of the database. MySQL's Query Cache, by comparison, is invalidated every time a table is updated; it is a coarse-grained cache, so in Web 2.0 applications with frequent interaction it is not very effective. A NoSQL cache is record-level, a fine-grained cache, so NoSQL performs much better at this level.
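The difference in cache granularity can be illustrated with two toy classes in Python. Neither is MySQL's or any NoSQL product's actual code; `TableLevelQueryCache` and `RecordLevelCache` are hypothetical names, and the sketch only models what gets thrown away when one row changes.

```python
class TableLevelQueryCache:
    """Coarse-grained: any write to the table discards every cached result."""

    def __init__(self):
        self._results = {}

    def get(self, sql):
        return self._results.get(sql)

    def put(self, sql, rows):
        self._results[sql] = rows

    def invalidate_table(self):
        self._results.clear()


class RecordLevelCache:
    """Fine-grained: a write discards only the record that changed."""

    def __init__(self):
        self._records = {}

    def get(self, key):
        return self._records.get(key)

    def put(self, key, value):
        self._records[key] = value

    def invalidate(self, key):
        self._records.pop(key, None)


query_cache = TableLevelQueryCache()
query_cache.put("SELECT * FROM users WHERE id = 1", [("alice",)])
query_cache.put("SELECT * FROM users WHERE id = 2", [("bob",)])
query_cache.invalidate_table()  # one row of users was updated

record_cache = RecordLevelCache()
record_cache.put("user:1", "alice")
record_cache.put("user:2", "bob")
record_cache.invalidate("user:1")  # only user 1 was updated

print(query_cache.get("SELECT * FROM users WHERE id = 2"))  # None: lost as collateral
print(record_cache.get("user:2"))                           # bob: still cached
```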
3) Diverse and flexible data model
NoSQL does not require fields to be defined in advance for the data to be stored, and custom data formats can be stored at any time. In a relational database, adding and removing fields is very troublesome; on a table with a very large amount of data, adding a field is a nightmare.
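A schema-free model can be pictured with plain Python dictionaries. The list `collection` and the helper `find` below are stand-ins invented for this sketch and do not correspond to any particular database's API; the point is only that two records in the same collection may carry different fields without any prior table definition or migration.

```python
collection = []  # stand-in for a document collection

# The first document has an email; the second has tags and a nested address.
collection.append({"name": "alice", "email": "alice@example.com"})
collection.append({"name": "bob", "tags": ["admin"], "address": {"city": "Beijing"}})


def find(docs, **criteria):
    """Return the documents whose fields equal all the given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


print(find(collection, name="bob"))
```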
4) Comparison of traditional RDBMS and NoSQL

No. | RDBMS | NoSQL
1 | Highly organized, structured data | Stands for "not only SQL"
2 | Structured Query Language (SQL) | No declarative query language
3 | Data and relationships are both stored in separate tables | No predefined schema
4 | Data Manipulation Language, Data Definition Language | Key-value store, column store, document store, graph database
5 | Strict consistency | Eventual consistency rather than ACID properties
6 | Basic transactions | Unstructured and unpredictable data
7 | | CAP theorem
8 | | High performance, high availability, and scalability

3V + 3 Highs
3 Vs of the big data era: Volume, Variety, Velocity (real-time)
3 Highs of Internet requirements: high concurrency, high scalability, high performance

4. Four major categories of NoSQL databases

Aggregate data model: data that is frequently accessed together is stored together (aggregated).
The benefit is obvious: for a query, all the data can be fetched in a single interaction with the database. Storing data this way inevitably introduces duplication, but that duplication is accepted in exchange for fewer interactions.
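As an illustration of the aggregate idea, the Python sketch below stores an order together with its customer snapshot and line items under a single key. The dictionary `store`, the key format `order:<id>`, the helper names, and the sample data are all assumptions made for the example.

```python
store = {}  # stand-in for a key-value or document store


def save_order(db, order):
    db[f"order:{order['order_id']}"] = order


def load_order(db, order_id):
    # One lookup returns the order, its customer snapshot, and all its items.
    return db[f"order:{order_id}"]


def order_total(order):
    return sum(item["price"] * item["qty"] for item in order["items"])


save_order(store, {
    "order_id": 1001,
    # The customer name is duplicated here instead of being joined in.
    "customer": {"id": 7, "name": "Alice"},
    "items": [
        {"sku": "A1", "name": "Keyboard", "price": 50, "qty": 1},
        {"sku": "B2", "name": "Mouse", "price": 20, "qty": 2},
    ],
})

order = load_order(store, 1001)
print(order["customer"]["name"], order_total(order))  # Alice 90
```

In a normalized relational design the same read would touch an orders table, a customers table, and an order-items table; here the cost is that the customer name lives in every order that references it.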

Four major categories of NoSQL databases

Type | Description | Databases
KV (key-value) | Data is stored as key-value pairs. | Redis
Document | Mostly BSON format. BSON is a JSON-like storage format in binary form, short for Binary JSON. Like JSON, it supports embedded document objects and array objects. | CouchDB, MongoDB
Column family | As the name suggests, data is stored by column. Its biggest strengths are that it conveniently stores structured and semi-structured data, compresses well, and has a very large I/O advantage for queries against one column or a few columns. | Cassandra, HBase
Graph | This is not about graphics but about relationships, such as Moments, social networks, and advertising recommendation systems. It focuses on building relationship graphs. | Neo4J, InfoGrid

5. CAP principle and BASE in distributed databases (CAP + BASE)

What is ACID in a traditional relational database?

Abbreviation | Name | Description
A | Atomicity | Atomicity is easy to understand: the operations in a transaction are either all completed or none are performed. A transaction succeeds only if every operation in it succeeds; if any one operation fails, the whole transaction fails and must be rolled back. For example, a bank transfer of 100 yuan from account A to account B has two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. The two steps either complete together or not at all. If only the first step completed and the second failed, 100 yuan would inexplicably vanish.
C | Consistency | Consistency is also fairly easy to understand: the database must always be in a consistent state, and running a transaction does not change the database's original consistency constraints.
I | Isolation | Isolation means that concurrent transactions do not affect each other. If the data one transaction needs to access is being modified by another transaction, then as long as that other transaction has not committed, the data the first transaction sees is unaffected by the uncommitted changes. For example, while a transaction transferring 100 yuan from account A to account B is still in progress, B will not see the newly added 100 yuan when querying his own account.
D | Durability | Durability means that once a transaction is committed, its changes are stored permanently in the database and are not lost even if the system goes down.
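The bank transfer example can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not a production pattern: the table `accounts`, the helper names `transfer` and `balances`, and the starting balances are assumptions for the demonstration. It relies on the fact that using a `sqlite3` connection as a context manager commits when the block succeeds and rolls back when it raises.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return False if the transfer was rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Step 1 already ran; raising here undoes it as well.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 150), ("B", 0)])
conn.commit()

print(transfer(conn, "A", "B", 100), balances(conn))  # True {'A': 50, 'B': 100}
print(transfer(conn, "A", "B", 100), balances(conn))  # False {'A': 50, 'B': 100}
```

The second transfer fails after the withdrawal step has already executed, yet account A still holds 50 yuan afterwards: the partial work was rolled back, which is exactly the atomicity guarantee described in the table.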

What is CAP in distributed databases?

Abbreviation | Name
C | Consistency (strong consistency)
A | Availability
P | Partition tolerance

CAP: choose two out of three
CAP theory says that a distributed storage system can achieve at most two of the three properties above. Because current network hardware will inevitably suffer problems such as latency and packet loss, partition tolerance is something we must achieve, so the trade-off can only be made between consistency and availability. No NoSQL system can guarantee all three at the same time. The core of CAP theory is that a distributed system cannot satisfy consistency, availability, and partition tolerance simultaneously; at most two can be satisfied well at once. According to the CAP principle, NoSQL databases therefore fall into three categories: those that satisfy CA, those that satisfy CP, and those that satisfy AP:

Group | Characteristics | Representative
CA | Single-site clusters; systems that satisfy consistency and availability, usually not very strong in scalability. | Traditional Oracle database
AP | Systems that satisfy availability and partition tolerance, usually with lower requirements for consistency. | The choice of most website architectures
CP | Systems that satisfy consistency and partition tolerance, usually without particularly high performance. | Redis, MongoDB

Note: a distributed architecture forces trade-offs, and the balance to strike is between consistency and availability. Most web applications do not actually need strong consistency. Because partition tolerance (P) has to be kept, they give up strong consistency (C) in exchange for availability, which is the direction of current distributed database products.

The choice between consistency and availability

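For Web 2.0 websites, many of the main features of relational databases often have no use:

Name | Description
Database transaction consistency | Many real-time web systems do not require strict database transactions. Their requirements for read consistency are very low, and in some cases the requirements for write consistency are not high either. Eventual consistency is acceptable.
Real-time writes and reads | In a relational database, a row queried immediately after it is inserted is guaranteed to be readable. Many web applications do not need this level of real-time behavior: for example, after I post a message, it is perfectly acceptable for my subscribers to see the update a few seconds, or even ten-odd seconds, later.
Complex SQL queries, especially multi-table joins | Any web system with a large data volume avoids joins across several large tables and complex, analytics-style report queries. SNS-type websites in particular rule these out at the requirements and product design stage. More often there are only single-table primary-key lookups and simple single-table conditional paginated queries, so the role of SQL is greatly diminished.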

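What is BASE?

BASE is a solution proposed to address the reduced availability caused by the strong consistency of relational databases.

BASE is an acronym of the following three terms:
Basically Available
Soft state
Eventually consistent

Its idea is to relax the requirement for data consistency at any given moment in exchange for improvements in the system's overall scalability and performance. The reason is that large systems, because of their geographic distribution and extremely high performance requirements, cannot use distributed transactions to meet these targets; to achieve them we have to take a different approach, and BASE is the way to solve this problem.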
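A toy Python model can show what relaxing consistency at a given moment looks like in practice. Everything here is an assumption for illustration: the class `EventuallyConsistentStore`, its method names, and the single primary plus single replica layout. Real systems replicate over a network and in the background; this sketch replaces that with an explicit `replicate()` call.

```python
from collections import deque


class EventuallyConsistentStore:
    """One primary, one replica, and a queue of changes not yet replicated."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_replica(self, key):
        # May return stale data (or None) until replication catches up.
        return self.replica.get(key)

    def replicate(self):
        # Apply queued changes in order; afterwards the replica matches the primary.
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


db = EventuallyConsistentStore()
db.write("post:1", "hello")
print(db.read_from_replica("post:1"))  # None: the replica has not caught up yet
db.replicate()
print(db.read_from_replica("post:1"))  # hello: the replicas have converged
```

Between the write and the replication step the system stays available for reads but may serve stale data (the soft state); once replication runs, all copies agree (eventual consistency).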

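Introduction to distributed systems and clusters

Distributed system
A distributed system consists of multiple computers and communicating software components connected through a computer network (a local network or a wide area network). It is a software system built on top of a network. Precisely because of these software properties, a distributed system is highly cohesive and transparent. The difference between a network and a distributed system therefore lies more in the high-level software (especially the operating system) than in the hardware. Distributed systems can be deployed on different platforms, such as PCs, workstations, local area networks, and wide area networks.

In short:

No. | Name | Description
1 | Distributed | Different service modules (projects) are deployed on multiple different servers; they communicate and call each other through RPC/RMI, serving external clients and cooperating within the group.
2 | Cluster | The same service module is deployed on multiple different servers, which are scheduled uniformly by distributed scheduling software and provide service and access externally.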
