Principles of Computer Organization in Plain Language: Designing a Large DMP System (Part 2) - SSDs Saved All the DBAs (Lecture 53)

1. Review

In the last lecture, we worked through the DMP system's different scenarios at the level of principle and chose AeroSpike as the KV database, Kafka as the data pipeline, and Hadoop/Hive as the data warehouse.

But surely some engineers will ask: why is a document database like MongoDB, or even a traditional relational database like MySQL, not suitable here? Why can't tuning techniques such as optimizing the SQL or adding a cache solve the problem?

In today's second half on the DMP, we will look at the reasons behind those choices from the level of how the databases are actually implemented. If you understand today's deeper and more detailed principles of which database fits which scenario, you will be much more confident, instead of only knowing after running a large number of performance tests. The next time you do database selection, you will be able to reason your way to the answer.

2. Relational databases: random reads and writes you cannot avoid

2.1 How do you store the data on the hard disk?

Let's think about it: if we asked you to write a simple relational database, how would you store the data on the hard disk? The simplest and most intuitive idea is this:

1. Use a file in CSV format: one file is one data table.

2. Each line of the file is one record in that table.

3. To modify one of the records in the database, you first find the line for that record,

4. and then directly modify the data in that line. Reading data works the same way.

Of course, the dumbest way to find a given piece of data is to read line by line, that is, to traverse the whole CSV file. But then reading any single record is equivalent to scanning the entire table.
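As a minimal sketch of this naive design (Python; the table file and the choice of the first column as primary key are illustrative assumptions):

```python
import csv

def find_record(table_path, key):
    """Naive lookup: traverse the whole CSV file line by line.

    Even a single-record read ends up scanning the entire table,
    which is exactly the problem described above."""
    with open(table_path, newline="") as f:
        for row in csv.reader(f):
            if row and row[0] == key:  # assume column 0 is the primary key
                return row
    return None
```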

2.2 Wasted hard-disk throughput: what should we do?

We can try adding an index to this CSV file, for example an index on the row number. If you have studied data structures and algorithms, or database theory, you should know that such an index is mostly built as a B+ tree.

1. The index does not contain an entire row of data, only a mapping that takes a row number to the position on the hard disk where that row can be read.

2. The index is therefore much smaller than the data, and we can load it into memory.

3. Even if part of it is not in memory, quickly traversing the index during a lookup does not require reading much data.

After adding the index, reading a specific piece of data no longer requires scanning the whole table file; we can read the row we want directly from its position on the hard disk. And we can index not only row numbers but any field, creating several different, independent indexes; the conditions in a SQL query's WHERE clause can then use these indexes, as the sketch below shows.
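Here is a minimal sketch of that idea, assuming the same hypothetical CSV table as above. In place of a real B+ tree, a plain dictionary maps each key to the byte offset of its row, so a lookup becomes a single seek plus one line read, i.e. one random read:

```python
import csv

def build_index(table_path, key_column=0):
    """Scan the table once, mapping each key to the byte offset of its
    row. The index holds no row data, only the mapping, so it is much
    smaller than the table and can usually stay in memory."""
    index = {}
    with open(table_path, "rb") as f:
        offset = f.tell()
        line = f.readline()
        while line:
            row = next(csv.reader([line.decode()]))
            index[row[key_column]] = offset
            offset = f.tell()
            line = f.readline()
    return index

def read_record(table_path, index, key):
    """One random read: seek straight to the recorded offset."""
    with open(table_path, "rb") as f:
        f.seek(index[key])
        return next(csv.reader([f.readline().decode()]))
```

Each additional indexed field means another such mapping, which is exactly where the write cost discussed next comes from.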

However, in this case writing data takes more work. We must not only write the data into the data table itself, but also update every index. A single write of one record thus triggers several random writes.

Under such a data model, queries are very flexible: no matter which field we query by, as long as it is indexed, we can read the corresponding data quickly through one random read. But this flexibility also brings a big problem.

2.3 Indexes make queries flexible, but what about the mass of random reads and writes?

That is, whatever we do, there will be a large number of random read and write requests. If those requests ultimately have to land on the hard disk, especially an HDD, we can hardly achieve high concurrency, since an HDD manages only about 100 QPS.

Besides, the flexibility of adding an index at any time and querying by any field is something our DMP system does not even need. The main scenario for the DMP's KV database is random queries by primary key; it does not need to filter by other fields. The data pipeline only needs append-style sequential reads and writes. And data analysis in the data warehouse usually does not filter on individual fields either, but scans the full data set for analysis and aggregation.

The latter two scenarios are easy to handle: at worst, we let the program do a full-table scan or an append-only write. But for the KV database requirement, the simplest relational-database design just described faces a severe challenge of random writes and random reads.

So in practical large-scale systems, we use a dedicated distributed KV database to meet this need. Next, let's look at how Cassandra, the data store open-sourced by Facebook, performs reads and writes, and how these designs solve the problem of highly concurrent random reads and writes.

3. Cassandra: sequential writes and random reads

3.1 Cassandra's data model

As a distributed KV database, Cassandra's key is generally called the Row Key; it is in fact a string of 16 to 36 bytes. The value corresponding to each Row Key is itself a hash table, into which you can put as many key-value pairs of data as you need.

Cassandra itself does not have a strict schema the way a relational database does, with every column (Column) defined when the database is first created. Instead it has a design concept called the column family (Column Family): fields that are frequently used together go into the same column family. For example, in the DMP, the demographic-attribute information can be one column family and the user-interest information another. This keeps the flexibility of not requiring a strict schema while preserving the spatial locality of data that is often accessed together.

Reading and writing data in Cassandra is then particularly simple, like writing into one giant distributed hash table: we specify a Row Key and then insert or update that Row Key's data.
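Conceptually, one Row Key's value looks like a hash table of hash tables. The column-family and field names below are made up purely for illustration:

```python
# One record in the giant distributed hash table, addressed by Row Key.
row_key = "1a2b3c4d5e6f7a8b"  # a 16- to 36-byte string identifying a user

row_value = {
    # Column family: demographic attributes, often read together.
    "demographics": {
        "gender": "female",
        "age_band": "25-34",
        "city": "hangzhou",
    },
    # Column family: interest tags, another group of co-accessed fields.
    "interests": {
        "sports": "0.8",
        "travel": "0.3",
    },
}
# Reads and writes simply address row_key, then insert or update
# key-value pairs inside row_value.
```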

3.2 Cassandra's write operations

Cassandra's solution to random data writes, put simply, is: "no random writes, only sequential writes." A write operation in Cassandra typically consists of two actions. The first is to write a commit log (Commit Log) to the hard disk. The second is to update the data directly in an in-memory data structure; this in-memory update happens only after the commit log has been written successfully. Each machine has a reliable hard disk on which we write the commit log, and commit-log writes are sequential writes (Sequential Write) rather than random writes (Random Write), which maximizes write throughput.

If you don't understand why, you can go back to Lecture 47 on evaluating hard-disk performance: whether on an HDD or an SSD, sequential writes are much faster than random writes.

Memory space is limited, so once the volume or the number of entries of the in-memory data exceeds a certain limit, Cassandra dumps the in-memory data structure to the hard disk. This dump is also a sequential write rather than a random write, so its performance is not a problem. Besides the file dumped from the data structure, Cassandra also generates an index file based on the Row Keys, so that subsequent lookups can be fast.
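Below is a minimal sketch of this write path (Python; the one-JSON-record-per-line file format, the fsync-per-write policy, and the entry-count threshold are illustrative assumptions, not Cassandra's actual formats):

```python
import json
import os

class WritePathSketch:
    """Append to the commit log first, then update the memtable;
    dump the memtable as a new immutable file when it grows too big.
    Every hard-disk write here is a sequential append."""

    MEMTABLE_LIMIT = 1000  # illustrative threshold, not Cassandra's

    def __init__(self, data_dir):
        os.makedirs(data_dir, exist_ok=True)
        self.data_dir = data_dir
        self.commit_log = open(os.path.join(data_dir, "commit.log"), "ab")
        self.memtable = {}  # row_key -> columns
        self.dump_seq = 0

    def put(self, row_key, columns):
        record = json.dumps({"key": row_key, "cols": columns}).encode()
        self.commit_log.write(record + b"\n")  # 1. sequential append
        self.commit_log.flush()
        os.fsync(self.commit_log.fileno())     #    durable before step 2
        self.memtable[row_key] = columns       # 2. in-memory update
        if len(self.memtable) >= self.MEMTABLE_LIMIT:
            self._dump()

    def _dump(self):
        """Write the whole memtable out, sorted by Row Key, as one new
        immutable file: again a pure sequential write, never an
        in-place modification of existing data."""
        path = os.path.join(self.data_dir, "dump-%04d.jsonl" % self.dump_seq)
        with open(path, "w") as f:
            for key in sorted(self.memtable):
                f.write(json.dumps({"key": key, "cols": self.memtable[key]}) + "\n")
        self.dump_seq += 1
        self.memtable.clear()
```

An update to an existing Row Key simply appends a newer commit-log record and overwrites the memtable entry; nothing on disk is modified in place.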

As more and more dump files accumulate on the hard disk, Cassandra merges these files in the background. Many other KV database systems have a similar merging action, for example AeroSpike and Google's BigTable; these operations are generally called compaction. The merge likewise reads multiple files sequentially, completes the merge in memory, and then dumps out a new file. At the hard-disk level, the whole operation still consists of sequential reads and writes.
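A sketch of such a merge, over dump files in the illustrative format written above, listed newest first:

```python
import heapq
import json

def compact(dump_paths_newest_first, out_path):
    """Sketch of compaction: merge several sorted dump files into one.

    Every input file is read sequentially and the output is written
    sequentially; for a Row Key that appears in several files, only
    the newest record is kept."""
    def records(path, age):
        # Tag each record with the file's age so that, for equal keys,
        # the newer file (smaller age) sorts first in the merge.
        with open(path) as f:
            for line in f:
                rec = json.loads(line)
                yield rec["key"], age, rec

    streams = [records(p, age) for age, p in enumerate(dump_paths_newest_first)]
    last_key = None
    with open(out_path, "w") as out:
        for key, _age, rec in heapq.merge(*streams):
            if key != last_key:  # first occurrence = newest version
                out.write(json.dumps(rec) + "\n")
                last_key = key
```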

3.3 Cassandra's read operations

When we read data from Cassandra, it first looks for the data in memory, then reads it from the hard disk, and finally merges the two parts into the final result. The files on the hard disk have a corresponding cache in memory; only when the data cannot be found in the cache do we go and request the data from the hard disk.

When the hard disk does have to be accessed, there may be many dump files on it, snapshots of the in-memory data taken at different points in time. So when looking for data, we also search from the newest file to the oldest.

This brings yet another problem: we may have to query quite a few dump files before finding the data we want. Cassandra therefore makes another optimization here: for each dump file, it generates a BloomFilter over all the Row Keys in that file, and keeps this BloomFilter in memory. If the Row Key you are querying does not exist in a given data file, then in more than 99% of cases the BloomFilter filters it out, with no need to access the hard disk.

This way, a read request actually goes to the hard disk only when the data is not in memory and really does exist in some particular file on the disk.
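Here is a sketch of this read path, including a deliberately tiny Bloom filter (a real one sizes its bit array and hash count from the expected number of keys and the target false-positive rate; the `fetch` callback standing in for the actual disk read is an illustrative assumption):

```python
import hashlib

class TinyBloomFilter:
    """A Bloom filter answers 'definitely not here' or 'maybe here'.
    A negative answer lets us skip a dump file without touching disk."""

    def __init__(self, num_bits=8192, num_hashes=3):
        self.num_bits, self.num_hashes, self.bits = num_bits, num_hashes, 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(key.encode(), salt=bytes([i])).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def may_contain(self, key):
        return all(self.bits >> pos & 1 for pos in self._positions(key))


def read_row(row_key, memtable, dump_files):
    """Memtable first; then dump files from newest to oldest, asking each
    file's in-memory Bloom filter before paying for a hard-disk read.

    `dump_files` is a newest-first list of (bloom_filter, fetch) pairs,
    where fetch(row_key) performs the actual (random) disk read."""
    if row_key in memtable:
        return memtable[row_key]
    for bloom, fetch in dump_files:
        if bloom.may_contain(row_key):  # absent keys almost always stop here
            record = fetch(row_key)
            if record is not None:      # rare false positive: key not in file
                return record
    return None
```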

4. SSD: the DBAs' savior

Cassandra was open-sourced by Facebook in 2008, when SSDs were not yet widespread. You can see that its read and write designs take full account of the characteristics of the hardware itself. When persisting written data, Cassandra issues no random write requests: both the commit log and the dumps are written sequentially.

4.1 Cassandra's optimizations for data read requests

On the read side, the most recently written data is in memory, and reads are served from memory first, which amounts to an LRU caching mechanism. Only as a last resort does a random read request go to the hard disk. Even then, Cassandra adds a layer of BloomFilters in front of the files, turning what the dump files would otherwise make into many hard-disk reads into many memory reads plus one hard-disk read.

These designs give Cassandra good performance even on HDDs, because all writes are either sequential writes or writes into memory, allowing very high write concurrency. An HDD's sequential throughput is still very good: it can write more than 100MB of data per second, and if each record is 1KB, that is 100,000 WPS (writes per second), quite enough to support the write pressure we expect on the DMP.

Reading data is more challenging. If the read requests had strong locality, memory alone could handle the access volume the DMP needs.

4.2 The problem lies precisely in this locality

However, the problem lies precisely in this locality: the distribution of the DMP's data accesses in fact lacks locality. Think about the DMP's usage scenario and you will see why. The Row Keys in the DMP are unique identifiers of users, and how could ordinary users' time online exhibit locality? Each person spends only so much time online and visits only so many pages. Even the heaviest users have at most 24 hours in a day, and most users are online 2 to 3 hours a day. There is no way to say "keep these users' data in memory and leave those users' out."

Might there at least be temporal locality? For a global social network like Facebook, there may be some, since different countries sit in different time zones: during India's daytime we could load Indian users' data into memory and keep American users' data on the hard disk, and in India's evening swap the American users' data into memory instead. But if your main business is domestic, even this temporal locality disappears. Everyone's peak Internet hours are the same, on the way to work in the morning, during the midday break, and in the evening after work, with nothing to tell them apart.

Faced with this situation, suppose your CEO or CTO asks you whether the problem can be solved by optimizing the program. If you have not thought the question through carefully at the level of data distribution and principles, and you promise right away, you are in for a headache afterwards, because the problem most likely cannot be solved that way.

Because temporal locality is lacking, the in-memory cache can play only a small role; most requests ultimately end up as random reads falling on the HDD. But HDD random-read performance is poor: as we saw in Lecture 45, it is about 100 QPS. And handling everything in memory is far too expensive, more than 100 times the cost of HDD storage.

4.3 From 2010, large-scale commercial SSDs solved the locality problem

Fortunately, starting in 2010, the large-scale commercial adoption of SSDs solved this problem for us. An SSD's price is about 10 times an HDD's, but its random-read capability is more than 100 times an HDD's. In other words, with SSDs we can obtain the same QPS as with HDDs at one-tenth of the cost. And at the same price, an SSD has 10 times the capacity of memory, which also meets our needs, letting us store the information of the whole Internet's users at a comparatively low cost.

It is no exaggeration to say that half the credit for the "big data", "high concurrency", and per-user personalization of the past few years should go to the drive industry that kept raising SSD capacity and lowering SSD prices. Looking back at Cassandra's read-write design, you will find that Cassandra's write mechanism is a perfect match for the strengths and weaknesses of SSDs that we discussed in Lectures 46 and 47.

At the data-write level, Cassandra's commit-log writes are sequential: they keep appending content on the hard disk rather than modifying the contents of existing files. Once the in-memory data exceeds a certain threshold, Cassandra dumps a complete new file onto the file system, which is likewise an append-style write.

Comparing and compacting the data (Compaction) also just reads several existing files and writes out one new file. This append-only, never-modify write pattern naturally fits the SSD's constraint that data can only be written after erasing a whole block. Under this write pattern, the SSDs Cassandra uses do not need frequent background erase-and-rewrite work, which maximizes the SSDs' service life. This is why Cassandra gained further, rapid development once SSDs became popular.

5. Summary and extension

Well, that is about it for the DMP and storage. I hope today's lecture lets you thoroughly understand, from the detailed level of Cassandra's database implementation, how to make good use of the performance characteristics and principles of storage devices.

A traditional relational database stores the data in one place and the indexes in another. Such a layout makes single random reads and random writes very easy for us, and the stored data can also be very compact. But the problem lies right there too: most SQL requests bring a large number of random reads and writes with them, which makes the traditional relational database not really suitable for high-concurrency scenarios.

The access pattern our DMP needs has no use for complex indexes, but it does come with fairly high concurrency. I walked you through the read-write design of Cassandra, the distributed KV database open-sourced by Facebook. By appending to the commit log and updating data in memory, Cassandra avoids the problem of random writes; the dumps of in-memory data and the background consolidation of data likewise avoid random writes, giving Cassandra extremely high concurrent write performance.

At the data-read level, through the in-memory cache and the BloomFilters, Cassandra reduces as far as possible the cases that require random reads of data from the hard disk. The remaining challenge is that the DMP system's locality is weak, so the final random-read requests still have to go to the hard disk. Fortunately, SSD prices kept falling through those years of massive data growth, so in the end the SSD solved this problem for us. And the SSD's own erase-then-write mechanism happens to suit Cassandra's read-write pattern very well, which ultimately let Cassandra develop much further once SSDs became widespread.
