Comparison of Hadoop and relational database systems

Hadoop provides a reliable shared storage and analysis system. HDFS provides the storage and MapReduce provides the analysis. Each query processes the entire dataset, or at least a large portion of it.

Why not use a database with many more disks for large-scale batch analysis? Why is MapReduce still needed?

1. Disk seek time is improving far more slowly than transfer rate. Seeking is the process of moving the disk head to a particular position on the disk to read or write data. It characterizes the latency of a disk operation, whereas the transfer rate corresponds to the disk's bandwidth. If the data access pattern is dominated by seeks, reading or writing a large portion of the dataset takes longer than streaming through it, which runs at the transfer rate.

2. For updating a small proportion of the records, a traditional B-tree (the data structure used in relational databases) works well, although it is limited by the rate at which it can perform seeks. For updating the majority of the records, a B-tree is less efficient than MapReduce, which uses sort/merge to rebuild the database.
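The two points above can be made concrete with a small sketch. The first part is a rough cost model comparing seek-bound in-place updates with one sequential pass over the data. The second part shows what a sort/merge rebuild might look like in miniature. Every figure (10 ms seek, 100 MB/s transfer, 100-byte records, 1 TB dataset) and every function name here is an assumption chosen for illustration, not a measurement or part of any Hadoop API.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# --- Part 1: rough cost model (all figures are assumed, for illustration) ---
SEEK_MS = 10                # assumed average seek time per random access
TRANSFER_MB_PER_S = 100     # assumed sequential transfer rate
RECORD_BYTES = 100          # assumed record size
TOTAL_BYTES = 10**12        # assumed dataset size: 1 TB


def seek_dominated_seconds(num_records, seek_ms=SEEK_MS):
    """Time to update num_records in place when each update pays one seek."""
    return num_records * seek_ms / 1000


def streaming_seconds(total_bytes, transfer_mb_per_s=TRANSFER_MB_PER_S):
    """Time to stream total_bytes sequentially at the transfer rate."""
    return total_bytes / (transfer_mb_per_s * 1_000_000)


num_records = TOTAL_BYTES // RECORD_BYTES
stream_all = streaming_seconds(TOTAL_BYTES)                     # one full pass
update_1_percent = seek_dominated_seconds(num_records // 100)   # 1% via seeks


# --- Part 2: a miniature sort/merge rebuild (hypothetical helper) ---
def rebuild_by_sort_merge(base, updates):
    """Merge a key-sorted base with a batch of updates in one sequential pass.

    base    -- list of (key, value) pairs, already sorted by key, unique keys
    updates -- list of (key, value) pairs in any order
    When a key appears in both, the update replaces the base value.
    """
    # Tag records by source so that, for equal keys, base (0) precedes update (1).
    tagged_base = ((k, 0, v) for k, v in base)
    tagged_updates = sorted(((k, 1, v) for k, v in updates), key=itemgetter(0, 1))
    merged = heapq.merge(tagged_base, tagged_updates, key=itemgetter(0, 1))
    # Within each key group the last record wins.
    return [(key, list(group)[-1][2])
            for key, group in groupby(merged, key=itemgetter(0))]


base = [(1, "a"), (2, "b"), (4, "d")]
updates = [(4, "D"), (3, "c"), (2, "B")]
rebuilt = rebuild_by_sort_merge(base, updates)
```

Under these assumed figures, streaming the whole dataset once takes 10,000 seconds, while updating just 1% of the records through individual seeks takes 1,000,000 seconds. A real rebuild also has to write the data back, so the streaming cost would be somewhat higher, but the gap remains large.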

In many cases, MapReduce can be seen as a complement to an RDBMS. MapReduce is well suited to problems that need to analyze the whole dataset in batch mode, particularly ad hoc analysis. An RDBMS is well suited to point queries and updates, where the dataset has been indexed to deliver low-latency retrieval and update times for a relatively small amount of data. MapReduce suits applications where the data is written once and read many times, whereas an RDBMS suits datasets that are continually updated.
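The contrast between the two access patterns can be sketched in a few lines of plain Python. This is only an in-memory imitation of the idea, not Hadoop code: the sample records, the function names, and the "maximum temperature per year" question are all hypothetical. The point query touches one indexed entry, while the map/shuffle/reduce pipeline reads every record to answer a question that no index was built for.

```python
from collections import defaultdict

# Hypothetical sample records: (station_id, year, temperature)
RECORDS = [
    ("s1", 1949, 111),
    ("s2", 1949, 78),
    ("s1", 1950, 0),
    ("s2", 1950, 22),
    ("s3", 1950, -11),
]


def build_index(records):
    """RDBMS-style: index the data by (station, year) ahead of time."""
    return {(station, year): temp for station, year, temp in records}


def point_query(index, station, year):
    """Low-latency lookup of a single record through the index."""
    return index.get((station, year))


def map_phase(records):
    """MapReduce-style map: emit a (year, temperature) pair per record."""
    for _station, year, temp in records:
        yield year, temp


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values to a single result (here, the max)."""
    return {year: max(temps) for year, temps in groups.items()}


index = build_index(RECORDS)
one_reading = point_query(index, "s2", 1950)
max_by_year = reduce_phase(shuffle(map_phase(RECORDS)))
```

Changing the question (say, the minimum per station) only means swapping the map and reduce functions and scanning again, which is why this style fits ad hoc analysis over write-once data.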

Relational Database vs MapReduce

             Traditional RDBMS            MapReduce
Data size    Gigabytes                    Petabytes
Access       Interactive and batch        Batch
Updates      Read and written many times  Written once, read many times
Structure    Static schema                Dynamic schema
Integrity    High                         Low
Scaling      Nonlinear                    Linear

 
