A comprehensive understanding of HBase: a NoSQL database worth having (1)

Foreword: For people who have had only passing contact with HBase, it may be just one ordinary database among hundreds. It is probably like my own knowledge of Redis: a cache! But I do have real feelings for HBase. Today an interesting idea suddenly came to me: I want to set aside the technical perspective and write about this old friend from an emotional one, as if I were writing a novel. This may be a little funny, but it feels very relaxing. "A comprehensive understanding of HBase: a NoSQL database worth having": from today on, let's treat this as the title of a novel! Haha ~

In fact, one thing I particularly want to do is to let more people know about and use HBase, an out-of-the-box technology in the big data stack. I have no other motive. The main reason is that HBase really is good. If I did not think so myself, how could I recommend it to you? After all, HBase is not paying me a penny in advertising fees ~

First of all, I want to share something I did not care to look at when I first met my old friend HBase. What is it? The rather dull and esoteric three questions of the soul that everyone seems obliged to ask: Who am I? Where do I come from? Where am I going?

Why write about this? It really is dull ~ Of course, it is not that I am too bored. To be honest, it is because I really do have feelings for HBase that I want to introduce its whole past and present to you. Perhaps this reverence is just a fear that those who rush past will forget who it is.

Where do I come from?

We know that HBase emerged in the context of big data, so to answer this question we have to mention the three famous Google papers that laid the foundations of big data, also known as Google's troika: Google FS [2003], MapReduce [2004], and BigTable [2006]. Here is a link to the Chinese versions of the three papers. Take a look when you have some free time.

Link: https://pan.baidu.com/s/1EIhGR6gADm2BnEh5hW4KUA
Extraction code: c1wb

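Why are these three papers so well known all over the world? With the arrival of the big data era, we face three core questions that big data brings:

1. How do we store massive amounts of data?
2. How do we compute over massive amounts of data?
3. How do we efficiently read and write massive amounts of structured data?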

The three papers that Google published between 2003 and 2006 provide ideas for solving these three problems.

"We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications.
It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance
to a large number of clients.
...
The file system has successfully met our storage needs."

The advanced design ideas of the Google GFS file system offered a solution for storing massive data in the big data era and gave valuable guidance for the design of later distributed systems. The MapReduce framework solved the problem of how to compute over massive data. Spark may be very hot right now, but when you drink the water, you must not forget those who dug the well.
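For readers who have never seen the MapReduce idea in action, here is a tiny single-machine sketch of its three steps (map, shuffle, reduce) applied to word counting. This is only an illustration of the programming model, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example, and a real framework would run these steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    docs = ["HBase runs on HDFS", "HDFS stores data", "HBase stores data"]
    print(word_count(docs))
```

The point of the split is that the map and reduce functions never need to know about each other or about where the data lives, which is what lets a framework spread the work over a cluster.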

In 2006, Google released the third important paper. Bigtable is a distributed storage system for structured data, designed to reliably handle massive amounts of data: typically petabytes spread across thousands of ordinary servers. Inside Google, it solves the problem of storing massive structured data and reading and writing it efficiently.

It is precisely because of the publication of these three papers that we have HDFS, MapReduce and HBase, and that we had the "first year of big data" in 2015. Let's take a closer look at the chronicle of the Hadoop family. From it you can also get a rough idea of where HBase stands in that family.

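*   October 2002: Doug Cutting and Mike Cafarella created Nutch, an open-source web crawler project.

*   October 2003: Google published the Google File System paper.

*   July 2004: Doug Cutting and Mike Cafarella implemented GFS-like functionality in Nutch, the predecessor of what later became HDFS.

*   October 2004: Google published the MapReduce paper.

*   February 2005: Mike Cafarella implemented the first version of MapReduce in Nutch.

*   January 2006: Doug Cutting joined Yahoo!, which provided a dedicated team and resources to develop Hadoop into a system that could run at web scale.

*   February 2006: The Apache Hadoop project was officially launched to support the independent development of MapReduce and HDFS.

*   March 2006: Yahoo! built its first Hadoop cluster for development.

*   April 2006: The first Apache Hadoop release was published.

*   November 2006: Google published the Bigtable paper, which eventually inspired the creation of HBase.

*   October 2007: The first usable HBase release was published.

*   January 2008: Hadoop became an Apache top-level project.

*   January 2008: HBase became a subproject of Hadoop.

*   June 2008: Hive, the first SQL framework for Hadoop, became a subproject of Hadoop.

*   July 2009: MapReduce and HDFS became independent subprojects of the Hadoop project.

*   July 2009: Avro and Chukwa became new subprojects of Hadoop.

*   October 2009: The first Hadoop World conference was held in New York.

*   May 2010: HBase left the Hadoop project and became an Apache top-level project.

*   September 2010: Hive left Hadoop and became an Apache top-level project.

*   September 2010: Pig left Hadoop and became an Apache top-level project.

*   January 2011: ZooKeeper left Hadoop and became an Apache top-level project.

*   August 2012: YARN became a Hadoop subproject.

*   October 2012: Impala, the first Hadoop-native MPP query engine, joined the Hadoop ecosystem.

*   February 2014: Spark gradually replaced MapReduce as the default execution engine for Hadoop and became an Apache top-level project.

*   October 2015: Cloudera announced Kudu, the first Hadoop-native storage alternative since HBase.

*   December 2015: The Cloudera-initiated Impala and Kudu projects joined the Apache Incubator.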

Okay, let me say good night to everyone with a picture. It's late and it's time to sleep ~ In the next chapter, we will ask the soul question "Who am I?"

[Image: "Where do I come from?"]

Reference article

https://blog.csdn.net/lfq1532632051/article/details/53219558


Origin www.cnblogs.com/zpb2016/p/12723939.html