Big Data Series (8): Introduction to the Hadoop Ecosystem

Ecosystem

Introduction to HBase

  • A highly reliable, high-performance, column-oriented, scalable distributed database with real-time reads and writes
  • Uses HDFS as its underlying file storage system; MapReduce (MR) programs can read its data
  • Stores unstructured and semi-structured data

RowKey : unique identifier of a row; rows are sorted in lexicographic (dictionary) order of their row keys
Column Family : a collection of multiple columns; a table should generally have no more than 3
TimeStamp : version marker of a cell; multiple versions of the same data can be stored simultaneously
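The three concepts above can be sketched with a small in-memory model. This is only a toy illustration in plain Python, not the HBase client API: the class `ToyTable`, its methods, and the `info:name` column are hypothetical names chosen for this example, and the limit of 3 kept versions is an assumption.

```python
import bisect
import time
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of an HBase-style table (NOT the HBase API)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # column families are declared up front
        self.max_versions = max_versions  # assumed number of versions kept per cell
        self.row_keys = []                # row keys, kept in lexicographic order
        # row key -> "family:qualifier" -> [(timestamp, value), ...], newest first
        self.cells = defaultdict(dict)

    def put(self, row_key, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self.cells:
            bisect.insort(self.row_keys, row_key)
        versions = self.cells[row_key].setdefault(column, [])
        versions.append((ts if ts is not None else time.time_ns(), value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]  # drop the oldest versions

    def get(self, row_key, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        return self.cells.get(row_key, {}).get(column, [])[:versions]

    def scan(self, start, stop):
        """Return row keys in the half-open range [start, stop)."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return self.row_keys[lo:hi]


table = ToyTable(families=["info"])
table.put("user#10", "info:name", "Ann", ts=1)
table.put("user#2", "info:name", "Bob", ts=1)
table.put("user#10", "info:name", "Anna", ts=2)

print(table.row_keys)                                 # ['user#10', 'user#2']
print(table.get("user#10", "info:name", versions=2))  # [(2, 'Anna'), (1, 'Ann')]
```

Note that "user#10" sorts before "user#2" because row keys are compared as strings, character by character, not as numbers. This is why row key design matters for range scans.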

Spark

  • An in-memory parallel computing framework for big data
  • An alternative to MapReduce, compatible with HDFS, Hive and other data sources
  • Its core abstraction is the RDD (Resilient Distributed Dataset), a distributed in-memory data structure
  • Event-driven; improves performance by reusing threads from a thread pool
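The RDD idea and the thread-pool reuse mentioned above can be illustrated with a small sketch. This is a toy model in plain Python, not Spark or PySpark: `ToyRDD` and its methods are hypothetical names that only mimic the shape of the real API, and the partitioning scheme is an assumption. Transformations (`map`, `filter`) are only recorded, and nothing runs until the `collect` action submits one task per partition to a shared thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Toy stand-in for an RDD: partitioned data plus a lazy chain of transformations."""

    def __init__(self, partitions, transforms=()):
        self.partitions = partitions         # list of lists, one per partition
        self.transforms = tuple(transforms)  # recorded steps, not yet executed

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        data = list(data)
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        # Transformation: returns a new ToyRDD, computes nothing.
        return ToyRDD(self.partitions, self.transforms + (lambda it: map(fn, it),))

    def filter(self, pred):
        # Transformation: returns a new ToyRDD, computes nothing.
        return ToyRDD(self.partitions, self.transforms + (lambda it: filter(pred, it),))

    def _compute(self, partition):
        it = iter(partition)
        for transform in self.transforms:
            it = transform(it)
        return list(it)

    def collect(self, pool):
        # Action: runs one task per partition on threads reused from the pool.
        results = pool.map(self._compute, self.partitions)
        return [x for part in results for x in part]


with ThreadPoolExecutor(max_workers=2) as pool:
    rdd = ToyRDD.parallelize(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(sorted(rdd.collect(pool)))  # [0, 4, 16, 36, 64]
```

Because a `ToyRDD` keeps its source partitions and the recorded transformations, any partition can be recomputed by replaying those steps. This is roughly the idea behind the "resilient" in RDD, although real Spark tracks lineage across a cluster rather than in one process.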

Origin blog.csdn.net/qq_43430261/article/details/105545115