Ecosystem
Introduction to HBase
- A highly reliable, high-performance, column-oriented, scalable distributed database with real-time reads and writes
- Uses HDFS as its file storage system; MapReduce programs can read its data
- Stores unstructured and semi-structured data
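RowKey: the unique identifier of a row; rows are stored sorted in lexicographic (dictionary) order of their row keys
Column Family: a collection of multiple columns; a table should generally have no more than 3 column families
TimeStamp: multiple versions of the same cell can be stored simultaneously, each distinguished by its timestamp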
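The three concepts above can be illustrated with a small in-memory sketch. This is a toy model only, not the HBase client API: the class `ToyTable`, its methods, and the `max_versions` parameter are hypothetical names chosen for illustration. It shows that row keys are compared byte by byte (so `row10` sorts before `row2`), that writes must target a declared column family, and that a cell can hold several timestamped versions.

```python
import bisect


class ToyTable:
    """Toy in-memory model of an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # column families are fixed when the table is created
        self.max_versions = max_versions  # assumed limit on versions kept per cell
        self._row_keys = []               # row keys kept in lexicographic (byte) order
        self._cells = {}                  # (row, family, qualifier) -> [(timestamp, value)], newest first

    def put(self, row_key, family, qualifier, value, timestamp):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Insert the row key at its sorted position if it is new.
        i = bisect.bisect_left(self._row_keys, row_key)
        if i == len(self._row_keys) or self._row_keys[i] != row_key:
            self._row_keys.insert(i, row_key)
        versions = self._cells.setdefault((row_key, family, qualifier), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest version first
        del versions[self.max_versions:]                   # drop the oldest versions

    def get(self, row_key, family, qualifier, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        return self._cells.get((row_key, family, qualifier), [])[:versions]

    def scan(self):
        """Return all row keys in stored (sorted) order."""
        return list(self._row_keys)


table = ToyTable(families=["info"])
for key in (b"row2", b"row10", b"row1"):
    table.put(key, "info", "name", b"v1", timestamp=1)
table.put(b"row1", "info", "name", b"v2", timestamp=2)

print(table.scan())                                    # [b'row1', b'row10', b'row2']
print(table.get(b"row1", "info", "name", versions=2))  # [(2, b'v2'), (1, b'v1')]
```

Because ordering is lexicographic rather than numeric, numeric identifiers used as row keys are commonly zero-padded (for example `row02`, `row10`) when numeric scan order matters.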
Spark
- An in-memory parallel computing framework for big data
- An alternative to MapReduce, compatible with HDFS, Hive and other data sources
- Provides an abstraction for distributed in-memory data: the Resilient Distributed Dataset (RDD)
- Event-driven; improves performance by reusing threads from a thread pool
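The RDD and thread-pool points can be sketched in plain Python. This is a toy illustration under stated assumptions, not Spark's API or implementation: `ToyRDD` and its methods are hypothetical names. The sketch splits data into partitions, records transformations lazily as a lineage, and runs one task per partition on a shared thread pool that is reused across jobs. Since the source partitions and the lineage are retained, any partition's result can be recomputed by calling `_compute` on it again, which is the idea behind the "resilient" in RDD.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not Spark's API)."""

    def __init__(self, partitions, lineage=()):
        self._partitions = partitions   # source data, split into partitions
        self._lineage = tuple(lineage)  # recorded transformations, applied lazily

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        data = list(data)
        # Distribute elements across partitions round-robin.
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        # Transformations only extend the lineage; nothing is computed yet.
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def _compute(self, partition):
        # Replay the lineage on one source partition.
        items = partition
        for kind, fn in self._lineage:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def collect(self, pool):
        # Action: run one task per partition on the shared thread pool.
        results = pool.map(self._compute, self._partitions)
        return [x for part in results for x in part]


# One pool serves several jobs, so its threads are reused rather than recreated.
with ThreadPoolExecutor(max_workers=2) as pool:
    rdd = ToyRDD.parallelize(range(10), num_partitions=3)
    evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    print(sorted(evens_squared.collect(pool)))  # [0, 4, 16, 36, 64]
    print(sorted(rdd.collect(pool)))            # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that `map` and `filter` return immediately; work happens only when `collect` is called, which mirrors the distinction between transformations and actions.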