Comparison of MongoDB and HBase

1. MongoDB is a BSON document database; all of its data is stored on disk. HBase is a column-family database; in a cluster deployment, each column family is stored in its own set of files on HDFS.

2. MongoDB's primary key is "_id", on which a unique index is built automatically, and records are generally stored in insertion order. HBase's primary key is the row key, which can be any string (up to 64KB long, though in practice 10-100 bytes is typical); internally, HBase stores the row key as a byte array. Data is stored sorted by the lexicographic (byte) order of the row key, so when designing keys, take full advantage of this property and place rows that are frequently read together next to each other.

Lexicographic sorting of integers yields 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, …, 9, 91, 92, 93, 94, 95, 96, 97, 98, 99. To preserve the natural ordering of integers, row keys must be left-padded with 0s.
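For illustration, a minimal padding helper (the width 10 here is an assumption; it must cover the largest expected value):

```java
// Left-pad an integer so its lexicographic order matches its numeric order.
static String toRowKey(int n) {
    return String.format("%010d", n);
}
// toRowKey(2)  -> "0000000002"
// toRowKey(10) -> "0000000010"  (now sorts after 2, as expected)
```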

3. MongoDB supports secondary indexes, while HBase itself does not.
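As a sketch with the MongoDB Java driver (the database, collection, and field names are made up for illustration):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

MongoClient client = MongoClients.create("mongodb://localhost:27017");
MongoCollection<Document> users = client.getDatabase("shop").getCollection("users");
// A secondary index on a non-_id field; HBase has no native equivalent.
users.createIndex(Indexes.ascending("email"));
```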

4. MongoDB supports queries on collections, regular-expression queries, range queries, skip and limit, and so on; it is the NoSQL database most similar to MySQL. HBase supports only three kinds of access: by a single row key, by a range of row keys, and full-table scan.
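A sketch of HBase's three access patterns, assuming an HBase 2.x Java client (the table name and row-key layout are made up for illustration):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

static void threeAccessPatterns() throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("orders"))) {
        // 1. Access through a single row key
        Result one = table.get(new Get(Bytes.toBytes("user42#20240101")));
        // 2. Access through a range of row keys (start inclusive, stop exclusive)
        Scan range = new Scan().withStartRow(Bytes.toBytes("user42#"))
                               .withStopRow(Bytes.toBytes("user42$"));
        // 3. Whole-table scan: a Scan with no bounds
        Scan full = new Scan();
        try (ResultScanner rs = table.getScanner(range)) {
            for (Result r : rs) { /* process each row */ }
        }
    }
}
```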

5. MongoDB updates are update-in-place, that is, records are updated where they sit, unless the updated record no longer fits in its original location. In HBase, modification and insertion are the same command, put: if the row key passed to put already exists, the existing record is updated. By default, HBase keeps 3 historical versions.
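A sketch of how put covers both insert and update, assuming an HBase 2.x client and the Table handle from the previous sketch:

```java
// Same command for insert and update: if "row1" exists, this simply writes
// a new version of the cell (3 versions are kept by default).
Put put = new Put(Bytes.toBytes("row1"));
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("price"), Bytes.toBytes("9.99"));
table.put(put);

// Older versions stay readable until they age out, but must be requested:
Result r = table.get(new Get(Bytes.toBytes("row1")).readVersions(3));
```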

6. MongoDB's delete marks a row as deleted: rather than actually removing the record from memory or the data file, it blanks out the record's data (writing zeros or a special marker) and puts the record's address on a free list (the "release list"). The benefit is that on a later insert, MongoDB first tries to return the address of a suitably sized deleted record from the release list, which improves performance by avoiding a malloc. MongoDB also keeps a bucket-size array defining multiple release lists of different sizes, so that each deleted record is placed on the list matching its size. HBase's delete first writes new tombstone markers; reads then merge the data with the tombstone markers, and deleted records are only physically removed when a major compaction occurs.

7. Both MongoDB and HBase support MapReduce, but MongoDB's MapReduce support is not very strong: unless MongoDB sharding is used, map-reduce does not actually execute in parallel.
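A sketch of MongoDB map-reduce through the Java driver (the collection and field names are assumed); without sharding, this runs on a single mongod rather than in parallel:

```java
import com.mongodb.client.*;
import org.bson.Document;

MongoClient client = MongoClients.create("mongodb://localhost:27017");
MongoCollection<Document> orders = client.getDatabase("shop").getCollection("orders");

// Count orders per userId; map and reduce are JavaScript run server-side.
MapReduceIterable<Document> out = orders.mapReduce(
    "function() { emit(this.userId, 1); }",
    "function(key, values) { return Array.sum(values); }");
for (Document d : out) { System.out.println(d.toJson()); }
```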

8. MongoDB supports shard partitioning, and HBase automatically balances load by row key. In both cases, the shard key or row key should avoid monotonically increasing fields and should be as evenly distributed as possible. Sharding assigns each range of keys to a storage server, so an ever-increasing key funnels all new writes to one server and easily creates a hot spot; and since splitting is done automatically by key range, an unevenly distributed key leaves some ranges unable to split at all, causing load imbalance.
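As a sketch, one common way to avoid the hot-server problem is a hashed shard key (names here are made up; the commands are issued against a mongos):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

MongoClient client = MongoClients.create("mongodb://mongos-host:27017");
MongoDatabase admin = client.getDatabase("admin");
admin.runCommand(new Document("enableSharding", "shop"));
// A hashed shard key spreads even monotonically increasing values
// evenly across shards, avoiding a single hot server.
admin.runCommand(new Document("shardCollection", "shop.orders")
        .append("key", new Document("userId", "hashed")));
```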

9. MongoDB's read efficiency is higher than its write efficiency, while HBase by default suits write-heavy, read-light workloads. This can be tuned via hfile.block.cache.size, the percentage of the heap given to the StoreFile read cache (0.2 means 20%); the value directly affects read performance. If writes are far fewer than reads, raising it to 0.4-0.5 is fine; if reads and writes are roughly balanced, use about 0.3; if writes outnumber reads, keep the default 0.2. When setting it, also consider hbase.regionserver.global.memstore.upperLimit, the maximum percentage of the heap that the memstore may occupy. One of the two parameters affects reads and the other affects writes; if the two together exceed 80-90% of the heap, there is a risk of OOM, so set them carefully.
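For illustration, the two settings could be adjusted like this (they normally live in hbase-site.xml; the values assume a read-heavy workload):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// Read side: StoreFile block cache gets 40% of the heap.
conf.setFloat("hfile.block.cache.size", 0.4f);
// Write side: memstores may use at most 39% of the heap.
conf.setFloat("hbase.regionserver.global.memstore.upperLimit", 0.39f);
// 0.4 + 0.39 = 0.79, staying under the 80% OOM danger zone noted above.
```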

10. HBase adopts the LSM-tree idea (Log-Structured Merge-Tree): changes are held in memory and, once a specified threshold is reached, merged and written to disk in a batch, turning single writes into batch writes and greatly improving write speed. The cost falls on reads, which must merge the data on disk with the modified data in memory, clearly reducing read performance. MongoDB adopts a memory-mapped-file-plus-journal approach: if a record is not in memory, it is loaded first; the change is then made in memory and logged to the journal, and the data files are written in batches at intervals. This demands a lot of memory: it must at least hold the hot data and the indexes.
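A toy sketch of the LSM write/read trade-off described above (not HBase's actual implementation):

```java
import java.util.*;

class ToyLsmStore {
    private final NavigableMap<String, String> memtable = new TreeMap<>();
    private final List<NavigableMap<String, String>> flushedSegments = new ArrayList<>();
    private final int threshold;

    ToyLsmStore(int threshold) { this.threshold = threshold; }

    void put(String key, String value) {
        memtable.put(key, value);              // single writes stay in memory
        if (memtable.size() >= threshold) {    // batch "write to disk" at threshold
            flushedSegments.add(new TreeMap<>(memtable)); // stands in for a disk flush
            memtable.clear();
        }
    }

    String get(String key) {
        String v = memtable.get(key);          // reads must merge memory...
        if (v != null) return v;
        for (int i = flushedSegments.size() - 1; i >= 0; i--) { // ...with segments, newest first
            v = flushedSegments.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```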

 

 

1. What are the benefits of using Redis?

(1) Fast: the data is stored in memory, similar to a HashMap, whose advantage is that lookups and updates are O(1).

(2) Rich data types: strings, lists, sets, sorted sets, and hashes are supported.

(3) Transactions are supported, and operations are atomic: atomicity means that either all changes to the data are executed or none are.

(4) Rich features: Redis can be used as a cache or for messaging, and expiration times can be set per key, after which the key is deleted automatically.
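A short sketch of points (2)-(4) with the Jedis client (host, port, and key names are assumptions):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

try (Jedis jedis = new Jedis("localhost", 6379)) {
    // Rich data types
    jedis.set("user:1:name", "alice");          // string
    jedis.lpush("queue", "job1", "job2");       // list, usable for messaging
    jedis.sadd("tags", "nosql", "cache");       // set
    jedis.zadd("scores", 99.5, "alice");        // sorted set
    jedis.hset("user:1", "city", "hangzhou");   // hash

    // Transactions: commands between multi() and exec() run atomically
    Transaction t = jedis.multi();
    t.incr("counter");
    t.lpush("queue", "job3");
    t.exec();

    // Expiration: the key is deleted automatically after 60 seconds
    jedis.expire("user:1:name", 60);
}
```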

 

2. What are the advantages of Redis over memcached?

(1) All memcached values are plain strings; Redis, as its replacement, supports richer data types.

(2) Redis is much faster than memcached.

(3) Redis can persist its data.

 

3. Common Redis performance problems and their solutions:

(1) It is best for the Master not to do any persistence work, such as RDB memory snapshots or AOF log files.

(2) If the data is important, have one Slave enable AOF backups, with the policy set to sync once per second.

(3) For replication speed and connection stability, the Master and Slaves should be on the same LAN.

(4) Try to avoid adding Slaves to a Master that is already under heavy load.

(5) Do not use a graph structure for master-slave replication; a singly linked chain is more stable, i.e., Master <- Slave1 <- Slave2 <- Slave3...

This structure makes it easy to handle a single point of failure by having a Slave take over for the Master: if the Master goes down, Slave1 can be promoted to Master immediately and everything else stays the same.
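A sketch of wiring such a chain at runtime with Jedis (hostnames are made up; in practice this configuration usually lives in redis.conf):

```java
import redis.clients.jedis.Jedis;

// Master <- Slave1 <- Slave2: each node replicates from the one before it.
try (Jedis slave1 = new Jedis("slave1-host", 6379);
     Jedis slave2 = new Jedis("slave2-host", 6379)) {
    slave1.slaveof("master-host", 6379);
    slave2.slaveof("slave1-host", 6379);
    // Per point (2): have one slave persist with AOF, syncing once per second.
    slave1.configSet("appendonly", "yes");
    slave1.configSet("appendfsync", "everysec");
}
```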

 

 ==========

The second-round interviewer worked in Java development.

At first we again discussed competitions and distributed systems. Then he asked me a large-scale text-processing question: "find the Top 3 strings in a large text." I had never studied this area, and my answer was poor.
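In hindsight, one common approach is hash counting plus a size-3 min-heap, sketched below for a text that fits in memory (a truly large file would first be hash-partitioned into smaller chunks):

```java
import java.util.*;

static List<String> top3(Iterable<String> words) {
    Map<String, Long> counts = new HashMap<>();
    for (String w : words) counts.merge(w, 1L, Long::sum); // O(n) counting

    // Min-heap of size 3 keeps the 3 most frequent strings seen so far.
    PriorityQueue<Map.Entry<String, Long>> heap =
        new PriorityQueue<>((a, b) -> Long.compare(a.getValue(), b.getValue()));
    for (Map.Entry<String, Long> e : counts.entrySet()) {
        heap.offer(e);
        if (heap.size() > 3) heap.poll();  // evict the least frequent of the 4
    }
    List<String> result = new ArrayList<>();
    while (!heap.isEmpty()) result.add(heap.poll().getKey());
    Collections.reverse(result);           // most frequent first
    return result;
}
```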

1. Java question: "Describe the structure of the JVM."

2. Networking question: "TCP/IP's three-way handshake and four-way teardown."

3. "淘宝用户的数据(购物车……)存在那里?怎么满足高并发?"

4. "输入两个整型数组,返回一个数组:两个数组中的公共值。"

To be honest, by this point my thinking was a bit muddled and I was somewhat nervous. I started by writing out quicksort, and the method I used was not the best either. The interviewer was not very satisfied.

Later, over lunch, I realized it could be done with an O(n log n) sort followed by an O(m+n) pass. While discussing it with others, I also realized that for certain special cases it can be done with hashing, with complexity O(K), where K is the largest value in the arrays.
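The sort-then-merge idea, sketched in Java (skipping duplicates in the output is an assumption about the intended semantics):

```java
import java.util.*;

static int[] intersect(int[] a, int[] b) {
    Arrays.sort(a);                            // O(n log n)
    Arrays.sort(b);                            // O(m log m)
    List<Integer> common = new ArrayList<>();
    int i = 0, j = 0;
    while (i < a.length && j < b.length) {     // O(m + n) merge walk
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else {
            if (common.isEmpty() || common.get(common.size() - 1) != a[i])
                common.add(a[i]);              // skip duplicate common values
            i++; j++;
        }
    }
    return common.stream().mapToInt(Integer::intValue).toArray();
}
```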

Interviewer 3: "Let me test your grasp of recursion. Write a function that takes an int and returns the digits reversed as a string, e.g., given 123, return "321". It must use recursion, must not use global variables, must take a single parameter, and must return a string."

At the time, I only managed to output (print) the digits in reverse; I did not manage to return the reversed string.

After lunch, while discussing it with others, it suddenly occurred to me that this uses some of the same ideas as recursively computing a binary tree's depth or leaf count: on each return, add to the previous call's return value.
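A sketch of that idea in Java, meeting the stated constraints (single int parameter, no globals, returns a String; negative input is left out for simplicity):

```java
// reverse(123) -> "321": peel off the last digit, then recurse on the rest,
// concatenating as the recursion unwinds -- the same "add to the previous
// return value" idea as recursive tree depth / leaf counting.
static String reverse(int n) {
    if (n < 10) return String.valueOf(n);      // base case: single digit
    return (n % 10) + reverse(n / 10);         // last digit + reversed prefix
}
```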

By this point the interviewer was not very satisfied, and it was just about lunchtime. Interviewer 3 then said: "How about this: I'll find you someone from data development, and you can interview with him. Let him make the evaluation." I was reluctant but agreed; after all, I had performed too poorly this time.

 
