The three normal forms of relational databases
What is a normal form? A normal form is a rule for data modeling.
- First normal form: ensure that every column is atomic.
All fields in a database table hold indivisible, atomic values.
- Second normal form: ensure that every column in a row is related to the primary key.
A database table should hold only one kind of data; different kinds of data should not be stored in the same table. For example, order-related information is designed as three tables: an orders table, an order line items table, and a products table.
- Third normal form: ensure that every column is directly related to the primary key, not indirectly related.
For example, an orders table only needs to store userId; it does not need to store the entire user record.
Following the three normal forms simplifies write operations, but read performance is not high (join operations are expensive) and scalability is poor. A denormalized design stores redundant data in the document itself, so no join is needed and read performance is very good, but it is not suitable for scenarios where the data changes frequently.
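The trade-off between a normalized (normal-form) and a denormalized design can be sketched in a few lines of Python. This is only an illustration with in-memory dictionaries standing in for tables and documents; the sample values and function names are made up for the example.

```python
# Normalized: the order stores only user_id; user details live in their own table.
users = {1: {"name": "Jack", "city": "Berlin"}}          # made-up sample data
orders = [{"order_id": 100, "user_id": 1, "total": 25.0}]

def read_order_normalized(order_id):
    """Reading needs a join-like lookup across two tables."""
    order = next(o for o in orders if o["order_id"] == order_id)
    return {**order, "user": users[order["user_id"]]}

def rename_user_normalized(user_id, new_name):
    """A change is a single write in a single place."""
    users[user_id]["name"] = new_name

# Denormalized: the user's details are copied into every order document.
order_docs = [
    {"order_id": 100, "total": 25.0, "user": {"name": "Jack", "city": "Berlin"}},
]

def read_order_denormalized(order_id):
    """Reading is one lookup, with no join."""
    return next(d for d in order_docs if d["order_id"] == order_id)

def rename_user_denormalized(old_name, new_name):
    """Every document that holds a copy has to be rewritten."""
    touched = 0
    for doc in order_docs:
        if doc["user"]["name"] == old_name:
            doc["user"]["name"] = new_name
            touched += 1
    return touched
```

With many orders per user, the denormalized rename touches every one of those documents, which is why this design fits read-heavy data that rarely changes.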
Handling related data in Elasticsearch
Elasticsearch is a non-relational storage engine, which means it uses a denormalized design. So how does Elasticsearch handle data that has relationships? There are three approaches, corresponding to three data types.
- Object type (Object)
- Nested type (Nested)
- Join type (Join)
Object type (Object)
Use the Object data type to store a movie and its actors in a single document.
(1) Define the mapping
PUT /my_movies
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"actors": {
"properties": {
"first_name": {
"type": "keyword"
},
"last_name": {
"type": "keyword"
}
}
}
}
}
}
(2) Adding data
PUT /my_movies/_doc/1
{
"title": "Speed",
"actors": [
{
"first_name": "Keanu",
"last_name": "Reeves"
},
{
"first_name": "Dennis",
"last_name": "Hopper"
}
]
}
(3) Search
GET /my_movies/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"actors.first_name": "Keanu"
}
},
{
"match": {
"actors.last_name": "Hopper"
}
}
]
}
}
}
result:
"hits" : [
{
"_index" : "my_movies",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.723315,
"_source" : {
"title" : "Speed",
"actors" : [
{
"first_name" : "Keanu",
"last_name" : "Reeves"
},
{
"first_name" : "Dennis",
"last_name" : "Hopper"
}
]
}
}
]
We expect this search to return nothing, because no single actor is named Keanu Hopper, yet Elasticsearch returns a hit. Why? Because an array of objects is flattened into a key-value structure when it is indexed:
"title":"Speed"
"actors.first_name":["Keanu","Dennis"]
"actors.last_name":["Reeves","Hopper"]
The link between each first_name and its last_name is lost, so the search does not return the result we want. In other words, the Object type is not suited to handling relationships between objects.
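The flattening described above can be reproduced with a small Python sketch. It is a simplified model, not Elasticsearch's actual indexing code: `flatten` and `matches_all` are hypothetical helpers, and matching is treated as exact equality, as it would be for keyword fields.

```python
def flatten(doc, prefix=""):
    """Flatten a document so that arrays of objects become multi-valued fields."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Merge the sub-object's fields into the shared value lists.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat

def matches_all(flat, conditions):
    """Mimic a bool/must of exact matches against the flattened fields."""
    return all(value in flat.get(field, []) for field, value in conditions.items())

movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

flat = flatten(movie)
hit = matches_all(flat, {"actors.first_name": "Keanu", "actors.last_name": "Hopper"})
print(flat)
print(hit)
```

Each condition is checked against the whole list of values for its field, so "Keanu" and "Hopper" both match even though they belong to different actors.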
Nested type (Nested)
As the example above shows, the objects in an array are not kept independent when the inverted index is built, which leads to inaccurate results. With the Nested data type, each object in the array is indexed independently, and a nested query returns the result we want.
(1) Define the mapping
If the my_movies index from the Object example still exists, delete it first (DELETE /my_movies), because an index that already exists cannot be created again.
PUT /my_movies
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"actors": {
"type": "nested",
"properties": {
"first_name": {
"type": "keyword"
},
"last_name": {
"type": "keyword"
}
}
}
}
}
}
(2) Adding data
PUT /my_movies/_doc/1
{
"title": "Speed",
"actors": [
{
"first_name": "Keanu",
"last_name": "Reeves"
},
{
"first_name": "Dennis",
"last_name": "Hopper"
}
]
}
(3) Search
GET /my_movies/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "actors",
"query": {
"bool": {
"must": [
{
"match": {
"actors.first_name": "Keanu"
}
},
{
"match": {
"actors.last_name": "Hopper"
}
}
]
}
}
}
}
]
}
}
}
result:
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
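Why the nested query returns nothing can be sketched in Python as well. This is a simplified model of the matching rule only: `nested_match` is a hypothetical helper, and equality stands in for the keyword match.

```python
def nested_match(doc, path, conditions):
    """Return True only if ONE object under `path` satisfies every condition."""
    for obj in doc.get(path, []):
        if all(obj.get(field) == value for field, value in conditions.items()):
            return True
    return False

movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

# Conditions that span two different actors no longer match.
keanu_hopper = nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Hopper"})
# Conditions that describe one real actor still match.
keanu_reeves = nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Reeves"})
print(keanu_hopper, keanu_reeves)
```

Because every condition is evaluated against the same object, the boundaries between actors are preserved.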
Join type (Join)
The Nested type has a limitation: every update requires reindexing the entire document (the root object together with all of its nested objects).
Elasticsearch provides something similar to a join in a relational database: the Join data type. The Join data type defines a parent-child relationship between documents, which keeps the two kinds of objects separate.
- Parent documents and child documents are separate documents.
- Updating a parent document does not require reindexing its child documents.
- Adding, updating, or deleting a child document does not affect the parent document or the other child documents.
Let's look at an example with blogs and comments.
(1) Define the mapping
PUT /my_blogs
{
"settings": {
"number_of_shards": 2
},
"mappings": {
"properties": {
"title": {
"type": "keyword"
},
"content": {
"type": "text"
},
"comment": {
"type": "text"
},
"username": {
"type": "keyword"
},
"blog_comments_relation" : {
"type": "join",
"relations": {
"blog": "comment"
}
}
}
}
}
Note that the number of primary shards is set to 2, and that blog and comment form a parent-child relationship: blog is the parent and comment is the child.
(2) Adding data
a. Adding blog data
PUT /my_blogs/_doc/blog1
{
"title": "Learning Elasticsearch",
"content": "learning ELK @ tyshawn",
"blog_comments_relation": {
"name": "blog"
}
}
PUT /my_blogs/_doc/blog2
{
"title": "Learning Hadoop",
"content": "learning Hadoop @ tyshawn",
"blog_comments_relation": {
"name": "blog"
}
}
blog1 and blog2 are the _id values; note that an _id does not have to be a number.
b. Add comment data
PUT /my_blogs/_doc/comment1?routing=blog1
{
"comment": "I am learning ELK",
"username": "Jack",
"blog_comments_relation": {
"name": "comment",
"parent": "blog1"
}
}
PUT /my_blogs/_doc/comment2?routing=blog2
{
"comment": "I like Hadoop!!!!!",
"username": "Jack",
"blog_comments_relation": {
"name": "comment",
"parent": "blog2"
}
}
When adding a comment you must specify routing so that the parent and child documents are indexed on the same shard. Parent and child have to be on the same shard for join queries to work, and this also keeps those queries efficient.
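The effect of the routing parameter can be illustrated with a rough Python sketch. It assumes the shard is chosen by hashing the routing value (falling back to the document ID) modulo the number of primary shards. Elasticsearch uses its own hash function and formula, so CRC32 here is only a stand-in and the actual shard numbers will differ.

```python
import zlib

NUM_SHARDS = 2  # matches number_of_shards in the my_blogs settings

def shard_for(doc_id, routing=None):
    """Pick a shard from the routing value, or from the ID when no routing is given.

    CRC32 is a stand-in hash for illustration only.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

# A blog is routed by its own ID; a comment is routed by its parent's ID.
blog1_shard = shard_for("blog1")
comment1_shard = shard_for("comment1", routing="blog1")
print(blog1_shard == comment1_shard)
```

Because the comment is hashed on "blog1" rather than on "comment1", it always lands on the same shard as its parent, whatever the hash function is.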
(3) Query
The Join type has its own query types:
- parent_id
Queries by parent document ID and returns all child documents of that parent.
- has_child
Queries child documents and returns the parent documents that have matching children. Parent and child documents are on the same shard, so the join is efficient.
- has_parent
Queries parent documents and returns all child documents of the matching parents.
a. parent_id
GET /my_blogs/_search
{
"query": {
"parent_id": {
"type": "comment",
"id": "blog2"
}
}
}
result:
"hits" : [
{
"_index" : "my_blogs",
"_type" : "_doc",
"_id" : "comment2",
"_score" : 0.6931472,
"_routing" : "blog2",
"_source" : {
"comment" : "I like Hadoop!!!!!",
"username" : "Jack",
"blog_comments_relation" : {
"name" : "comment",
"parent" : "blog2"
}
}
}
]
b. has_child
GET /my_blogs/_search
{
"query": {
"has_child": {
"type": "comment",
"query": {
"match": {
"username": "Jack"
}
}
}
}
}
result:
"hits" : [
{
"_index" : "my_blogs",
"_type" : "_doc",
"_id" : "blog1",
"_score" : 1.0,
"_source" : {
"title" : "Learning Elasticsearch",
"content" : "learning ELK @ tyshawn",
"blog_comments_relation" : {
"name" : "blog"
}
}
},
{
"_index" : "my_blogs",
"_type" : "_doc",
"_id" : "blog2",
"_score" : 1.0,
"_source" : {
"title" : "Learning Hadoop",
"content" : "learning Hadoop @ tyshawn",
"blog_comments_relation" : {
"name" : "blog"
}
}
}
]
c. has_parent
GET /my_blogs/_search
{
"query": {
"has_parent": {
"parent_type": "blog",
"query": {
"match": {
"title": "Learning Hadoop"
}
}
}
}
}
result:
"hits" : [
{
"_index" : "my_blogs",
"_type" : "_doc",
"_id" : "comment2",
"_score" : 1.0,
"_routing" : "blog2",
"_source" : {
"comment" : "I like Hadoop!!!!!",
"username" : "Jack",
"blog_comments_relation" : {
"name" : "comment",
"parent" : "blog2"
}
}
}
]
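The semantics of the three queries above can be mimicked in plain Python over an in-memory copy of the four documents. This is only a model of what each query returns, not of how Elasticsearch executes it; the function names mirror the query names, the join field is shortened to "relation", and the predicates are hypothetical stand-ins for the inner match queries.

```python
docs = {
    "blog1": {"title": "Learning Elasticsearch", "relation": {"name": "blog"}},
    "blog2": {"title": "Learning Hadoop", "relation": {"name": "blog"}},
    "comment1": {"username": "Jack", "relation": {"name": "comment", "parent": "blog1"}},
    "comment2": {"username": "Jack", "relation": {"name": "comment", "parent": "blog2"}},
}

def parent_id(child_type, parent):
    """Children of the given type whose parent has the given ID."""
    return sorted(
        doc_id for doc_id, d in docs.items()
        if d["relation"]["name"] == child_type and d["relation"].get("parent") == parent
    )

def has_child(child_type, predicate):
    """Parents that have at least one child of the given type matching the predicate."""
    return sorted({
        d["relation"]["parent"] for d in docs.values()
        if d["relation"]["name"] == child_type and predicate(d)
    })

def has_parent(parent_type, predicate):
    """Children whose parent is of the given type and matches the predicate."""
    parents = {
        doc_id for doc_id, d in docs.items()
        if d["relation"]["name"] == parent_type and predicate(d)
    }
    return sorted(
        doc_id for doc_id, d in docs.items()
        if d["relation"].get("parent") in parents
    )

print(parent_id("comment", "blog2"))
print(has_child("comment", lambda d: d["username"] == "Jack"))
print(has_parent("blog", lambda d: d["title"] == "Learning Hadoop"))
```

The three calls return the same document IDs as the three search results shown above.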
(4) Update a child document
Updating a child document does not affect the parent document.
POST /my_blogs/_update/comment2?routing=blog2
{
"doc": {
"comment": "Hello Hadoop??"
}
}
Fetch the document by ID and routing:
GET /my_blogs/_doc/comment2?routing=blog2
result:
{
"_index" : "my_blogs",
"_type" : "_doc",
"_id" : "comment2",
"_version" : 2,
"_seq_no" : 4,
"_primary_term" : 1,
"_routing" : "blog2",
"found" : true,
"_source" : {
"comment" : "Hello Hadoop??",
"username" : "Jack",
"blog_comments_relation" : {
"name" : "comment",
"parent" : "blog2"
}
}
}
Nested type vs. Join type
The Object data type is not suited to handling related data, so which scenarios are the Nested and Join types suited to? Let's compare the two.
Comparison | Nested | Join |
---|---|---|
Advantage | The objects are stored in a single document, so read performance is high | Parent and child documents can be updated independently |
Disadvantage | Updating a nested child object requires updating the entire document | Extra memory is needed to maintain the relationship, and read performance is relatively poor |
Suitable scenario | Child documents are updated only occasionally and the workload is mostly queries | Child documents are updated frequently |
Other approaches
In real-world development we can also handle related data without using the Nested or Join types at all. One option is to map database tables one-to-one to ES indexes, query the ES data, and resolve the relationships on the application side. Another is to join the related database tables first and write the combined result into a single ES index. This approach is the simplest.
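Resolving the relationship on the application side can look roughly like the following Python sketch. The two in-memory structures stand in for two separate ES indexes, and `search_comments` and `fetch_blogs` stand in for whatever search and lookup calls the application makes; those names and the blog_id field are assumptions made for the example.

```python
# Stand-ins for two ES indexes that mirror two database tables one-to-one.
blogs_index = {
    "blog1": {"title": "Learning Elasticsearch"},
    "blog2": {"title": "Learning Hadoop"},
}
comments_index = [
    {"id": "comment1", "blog_id": "blog1", "comment": "I am learning ELK", "username": "Jack"},
    {"id": "comment2", "blog_id": "blog2", "comment": "I like Hadoop!!!!!", "username": "Jack"},
]

def search_comments(username):
    """Stand-in for a search request against the comments index."""
    return [c for c in comments_index if c["username"] == username]

def fetch_blogs(blog_ids):
    """Stand-in for a lookup by ID against the blogs index."""
    return {blog_id: blogs_index[blog_id] for blog_id in blog_ids if blog_id in blogs_index}

def comments_with_blog(username):
    """Join comments to their blogs in application code."""
    comments = search_comments(username)
    blogs = fetch_blogs({c["blog_id"] for c in comments})
    return [{**c, "blog_title": blogs[c["blog_id"]]["title"]} for c in comments]

for row in comments_with_blog("Jack"):
    print(row["id"], "->", row["blog_title"])
```

The cost is a second round trip for the lookup, in exchange for plain indexes with no special mapping.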