1. Introduction and overview of NoSQL
1.1 Overview
What is NoSQL
NoSQL (Not only SQL) generally refers to non-relational databases. These data stores do not require a fixed schema and can scale out horizontally without extra operational work.
Advantages of NoSQL
Easy to expand
Large amount of data and high performance
Diverse and flexible data types
Related terms: "3V + 3 Highs"
3Vs in the era of big data
Massive Volume, Diverse Variety, and Real-time Velocity
3 Highs of Internet Needs
High concurrency, high scalability, high performance
Related terms: "de-IOE" (moving away from IOE)
Refers to phasing out IBM minicomputers, Oracle databases, and EMC storage devices
1.2 Introduction to NoSQL data model
Relational and non-relational databases
aggregate model
(1) KV key-value pair
(2) BSON
BSON is a JSON-like binary storage format, short for Binary JSON.
Like JSON, it supports embedded documents and arrays.
(3) column family
(4) graph
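To make the embedded-document idea behind BSON concrete, here is a minimal sketch using Python's standard json module. It only shows the JSON-like shape of an aggregate, not the binary BSON encoding itself, and the field names (name, address, tags) are invented for illustration.

```python
import json

# A hypothetical user document: one aggregate holding an embedded
# document ("address") and an array ("tags"), as JSON and BSON allow.
user = {
    "name": "alice",
    "address": {"city": "Beijing", "zip": "100000"},
    "tags": ["admin", "dev"],
}

# Serialize to text and back; the nested structure survives the round trip.
encoded = json.dumps(user)
decoded = json.loads(encoded)
city = decoded["address"]["city"]
first_tag = decoded["tags"][0]
```

The whole user is read and written as one unit, which is what "aggregate" means here.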
1.3 Four major classifications of NoSQL databases
(1) KV (key-value)
Typical applications
Sina: BerkeleyDB+redis
Meituan: redis+tair
Alibaba, Baidu: memcache+redis
(2) Document database
CouchDB
MongoDB
MongoDB is a database based on distributed file storage, written in C++, designed to provide a scalable, high-performance data storage solution for web applications.
(3) Column storage database
Cassandra, HBase
Distributed file system
(4) Graph database
Neo4j, InfoGrid
1.4 CAP principle in distributed databases: CAP + BASE
Traditional ACID
A: Atomicity
C: Consistency
I: Isolation
D: Durability
CAP
C: Consistency (strong consistency)
A: Availability
P: Partition tolerance (partition fault tolerance)
CAP: choose 2 out of 3
The CAP theorem says that a distributed storage system can satisfy at most two of the three properties above (choose two out of three).
Because current network hardware inevitably suffers from problems such as latency and packet loss, partition tolerance is something we must provide, which leaves a trade-off between consistency and availability.
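The trade-off can be illustrated with a toy model. The sketch below is not a real replication protocol; the Replica class, the write function, and the "CP"/"AP" mode names are all assumptions made up for this example. It only shows what each choice gives up when a partition makes one replica unreachable.

```python
class Replica:
    """One node holding a single value."""

    def __init__(self):
        self.value = None


def write(replicas, reachable, value, mode):
    """Write `value` while only the replicas in `reachable` can be contacted.

    mode "CP": refuse the write unless every replica is reachable
               (stay consistent, give up availability).
    mode "AP": accept the write on the reachable replicas only
               (stay available, let replicas diverge).
    Returns True if the write was accepted.
    """
    if mode == "CP" and len(reachable) < len(replicas):
        return False
    for replica in reachable:
        replica.value = value
    return True


a, b = Replica(), Replica()
write([a, b], [a, b], "v1", "CP")        # no partition: both replicas updated
cp_ok = write([a, b], [a], "v2", "CP")   # partition: the write is rejected
ap_ok = write([a, b], [a], "v2", "AP")   # partition: a and b now differ
```

After the last call, a holds "v2" while b still holds "v1": the AP choice stayed available at the cost of consistency.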
BASE
BASE is a solution proposed to solve the problem of reduced availability caused by the strong consistency of relational databases.
BASE is actually the abbreviation of the following three terms:
Basically Available
Soft state
Eventually consistent
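A small sketch of "soft state" and "eventually consistent" follows. It assumes a simple version-number scheme in which the higher version wins. That scheme, the Node class, and the merge_from method are illustrative assumptions, not how any particular database does it.

```python
class Node:
    """A replica that stores key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def put(self, key, value, version):
        self.data[key] = (version, value)

    def merge_from(self, other):
        """Sync step: keep the higher-versioned entry for each key."""
        for key, (version, value) in other.data.items():
            if key not in self.data or self.data[key][0] < version:
                self.data[key] = (version, value)


n1, n2 = Node(), Node()
n1.put("stock", 10, version=1)
n2.put("stock", 9, version=2)    # a later write lands on the other node

stale = n1.data["stock"][1]      # soft state: n1 still serves the old value

# Once the nodes exchange state, both converge on the newest write.
n1.merge_from(n2)
n2.merge_from(n1)
```

Between the write and the merge, readers of n1 see stale data. After the merge, both nodes agree.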
Introduction to Distributed + Cluster
Distributed: Different service modules (projects) are deployed on different servers; they communicate and call one another via RPC/RMI to provide services externally and to collaborate within the group.
Cluster: The same service module is deployed on several servers, and distributed scheduling software schedules them in a unified way to provide services and access externally.
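As a rough illustration of the cluster case, the sketch below routes requests across several copies of the same service in round-robin order. The node addresses and the route_request function are hypothetical, and real scheduling software uses far more elaborate strategies.

```python
from itertools import cycle

# Hypothetical addresses of three servers running the SAME service module.
order_service_nodes = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

# A very simple scheduler: hand out nodes in round-robin order.
scheduler = cycle(order_service_nodes)


def route_request():
    """Pick the node that should serve the next request."""
    return next(scheduler)


picked = [route_request() for _ in range(4)]
```

The fourth request wraps around to the first node again.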
2. Introduction to Redis
2.1 What it is
Redis: REmote DIctionary Server (remote dictionary server).
Redis is fully open source and free to use, written in C, and originally released under the BSD license. It is a high-performance, distributed, in-memory key-value database: a NoSQL database that runs in memory and supports persistence. It is also known as a data structure server.
Compared with other KV cache products, Redis has the following three characteristics:
(1) Redis supports data persistence: it can save in-memory data to disk and load it again on restart.
(2) Redis supports not only simple key-value data but also data structures such as list, set, zset, and hash.
(3) Redis supports data backup through master-slave replication.
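The persistence idea in point (1) can be sketched without a Redis server: dump an in-memory dict to a file, lose the memory, and reload it. This only loosely resembles a snapshot-style dump. The JSON file format, the file name, and both function names are assumptions for illustration.

```python
import json
import os
import tempfile


def save_snapshot(store, path):
    """Write the whole in-memory dict to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f)


def load_snapshot(path):
    """Rebuild the in-memory dict from the snapshot file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


store = {"k1": "v1", "visits": "42"}
path = os.path.join(tempfile.mkdtemp(), "dump.json")  # hypothetical file name
save_snapshot(store, path)

store = {}                       # simulate a restart: memory is lost
restored = load_snapshot(path)   # the data comes back from disk
```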
2.2 What can you do
memory storage and persistence
Fetching the latest N items
Simulating HttpSession-like features that need an expiration time
Publish and subscribe message system
timer, counter
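Two of the use cases above, "latest N items" and "values with an expiration time", can be sketched in plain Python. This is a model of the behavior only, not Redis itself. The helper names and the explicit `now` parameter (used so the example does not depend on the real clock) are assumptions.

```python
import time
from collections import deque

# Latest N items: a bounded deque drops the oldest entry automatically,
# similar in spirit to LPUSH followed by LTRIM on a Redis list.
latest = deque(maxlen=3)
for article_id in [101, 102, 103, 104]:
    latest.appendleft(article_id)

# Expiring values: store (value, deadline) and treat stale entries as missing.
sessions = {}


def set_with_ttl(key, value, ttl_seconds, now=None):
    now = time.time() if now is None else now
    sessions[key] = (value, now + ttl_seconds)


def get(key, now=None):
    now = time.time() if now is None else now
    entry = sessions.get(key)
    if entry is None or entry[1] <= now:
        sessions.pop(key, None)   # lazily drop the expired entry
        return None
    return entry[0]


set_with_ttl("session:1", "alice", ttl_seconds=30, now=1000)
before = get("session:1", now=1010)   # still valid
after = get("session:1", now=1031)    # expired
```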
3. Miscellaneous basics after starting Redis
single process
Redis handles client requests with a single-process model; it responds to events such as reads and writes by wrapping the epoll function. The actual processing speed of Redis depends entirely on the execution efficiency of the main process.
epoll is an improved poll in the Linux kernel for handling large numbers of file descriptors. It is an enhanced version of the select/poll I/O multiplexing interfaces on Linux, and it can significantly improve system CPU utilization when only a few of many concurrent connections are active.
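The event-loop idea can be tried out with Python's standard selectors module, whose DefaultSelector uses epoll on Linux. The sketch below is not how Redis is implemented. It only shows one thread waiting on a registered socket and handling it once it becomes readable, with a socket pair standing in for a client connection.

```python
import selectors
import socket

# DefaultSelector picks the best mechanism available (epoll on Linux).
sel = selectors.DefaultSelector()

# A connected socket pair stands in for a client connection.
client, server_side = socket.socketpair()
server_side.setblocking(False)
sel.register(server_side, selectors.EVENT_READ)

client.sendall(b"PING")

# One iteration of the event loop: wait for ready sockets, then handle
# each one in turn on the same thread.
received = []
for key, _mask in sel.select(timeout=1):
    received.append(key.fileobj.recv(1024))

sel.unregister(server_side)
sel.close()
client.close()
server_side.close()
```

With many connections registered, select() returns only the ready ones, so the loop never scans idle sockets.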
There are 16 databases by default. The database index works like an array subscript: it starts from zero, and database 0 is used initially.
The select command switches databases
Usage: select <index>; for example, select 1 switches to database 1
select 6 # switch to database 6
Dbsize checks the number of keys in the current database
Flushdb: clears the current database
Flushall: clears all databases
Unified password management: all 16 databases share the same password, so either every database can be accessed or none can.
Redis indexes all start from 0
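The database commands above can be modeled with a list of 16 dicts. TinyRedis is a made-up toy class for illustration, not a Redis client; it only mirrors what select, dbsize, flushdb, and flushall do to the numbered databases.

```python
class TinyRedis:
    """Toy model of Redis's numbered databases; not the real server."""

    def __init__(self, databases=16):
        self.dbs = [{} for _ in range(databases)]
        self.current = 0                  # database 0 is the default

    def select(self, index):
        if not 0 <= index < len(self.dbs):
            raise IndexError("DB index is out of range")
        self.current = index

    def set(self, key, value):
        self.dbs[self.current][key] = value

    def dbsize(self):
        return len(self.dbs[self.current])

    def flushdb(self):
        self.dbs[self.current].clear()

    def flushall(self):
        for db in self.dbs:
            db.clear()


r = TinyRedis()
r.set("k1", "v1")             # goes into database 0
r.select(6)                   # switch to database 6
r.set("k2", "v2")
size_db6 = r.dbsize()
r.flushdb()                   # clears database 6 only
size_after_flushdb = r.dbsize()
r.select(0)
size_db0 = r.dbsize()         # database 0 was untouched by flushdb
r.flushall()                  # clears every database
size_after_flushall = r.dbsize()
```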
Why is the default port 6379
4. Redis data type
4.1 Five major data types of Redis
String
String is the most basic Redis type and works just like the memcached value type: one key corresponds to one value.
The string type is binary safe, which means a Redis string can contain any data, such as JPG images or serialized objects.
A single string value in Redis can be at most 512 MB.
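"Binary safe" means the value is treated as raw bytes, so nothing is cut off at a NUL byte or rejected for not being valid text. The sketch below models that with Python bytes in a dict. The key name and the fits helper are hypothetical, and the size check simply encodes the 512 MB limit stated above.

```python
store = {}

# Arbitrary bytes, including a NUL byte and bytes that are not valid UTF-8.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

store[b"image:1"] = payload
fetched = store[b"image:1"]

MAX_STRING_BYTES = 512 * 1024 * 1024   # the 512 MB limit on one string value


def fits(value):
    """Check a value against the 512 MB size limit."""
    return len(value) <= MAX_STRING_BYTES
```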
Hash (similar to Map in Java and dict in Python)
Redis hash is a collection of key-value pairs.
Redis hash is a mapping table of string type field and value, and hash is especially suitable for storing objects.
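Storing an object as a hash can be modeled as a dict of dicts: the outer key names the object and the inner mapping holds its fields. The functions below borrow the names of the Redis commands HSET, HGET, and HGETALL, but they are plain Python stand-ins written for this sketch.

```python
# Outer dict: Redis key -> hash. Inner dict: field -> value.
hashes = {}


def hset(key, field, value):
    hashes.setdefault(key, {})[field] = value


def hget(key, field):
    return hashes.get(key, {}).get(field)


def hgetall(key):
    return dict(hashes.get(key, {}))


# Store a hypothetical user object field by field.
hset("user:1", "name", "alice")
hset("user:1", "age", "30")
name = hget("user:1", "name")
user = hgetall("user:1")
missing = hget("user:1", "email")   # an unknown field yields None
```

Each field can be read or updated on its own, without rewriting the whole object.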
List
Redis list is the simplest list of strings, sorted in insertion order, and can add an element to the head (left) or tail (right) of the list.
Under the hood it is essentially a linked list.
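Python's collections.deque supports the same head and tail operations, so it makes a reasonable stand-in for a quick sketch. The comments map each call to the Redis list command it resembles; the deque itself is only an analogy.

```python
from collections import deque

items = deque()
items.append("b")         # like RPUSH: add at the tail (right)
items.append("c")
items.appendleft("a")     # like LPUSH: add at the head (left)

head = items.popleft()    # like LPOP
tail = items.pop()        # like RPOP
remaining = list(items)
```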
Set
A Redis set is an unordered collection of strings, implemented with a hash table.
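Python's built-in set is also hash-based and unordered, so it can illustrate the behavior. The comments name the Redis commands each operation resembles; the member values are made up.

```python
tags_a = set()
for member in ["redis", "nosql", "redis"]:   # the duplicate is ignored
    tags_a.add(member)                       # like SADD

tags_b = {"nosql", "mongodb"}

is_member = "redis" in tags_a    # like SISMEMBER
common = tags_a & tags_b         # like SINTER
combined = tags_a | tags_b       # like SUNION
only_a = tags_a - tags_b         # like SDIFF
```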
Zset (sorted set: ordered set)
Like set, Zset is also a collection of string type elements, and duplicate members are not allowed.
The difference is that each element will be associated with a score of type double.
Redis uses the scores to sort the members of the set in ascending order. The members of a zset are unique, but scores can repeat.
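The rules above (unique members, repeatable scores, ascending order) can be sketched with a dict from member to score. This models the observable behavior only and says nothing about how Redis stores sorted sets internally. The zadd and zrange_all helpers are stand-ins named after Redis commands, and ties are broken by member name here.

```python
scores = {}   # member -> score; dict keys keep members unique


def zadd(member, score):
    scores[member] = float(score)   # re-adding a member updates its score


def zrange_all():
    """Members ordered by ascending score; ties are broken by member name."""
    return sorted(scores, key=lambda m: (scores[m], m))


zadd("alice", 90)
zadd("bob", 75)
zadd("carol", 90)     # the same score as alice is allowed
zadd("bob", 80)       # bob is not duplicated, only rescored
ranking = zrange_all()
```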