What is big data? Common big data frameworks

1. What is big data?

At the current stage of Internet development, a great deal of everyday life and work has been digitized, and the volume of data people generate has grown explosively compared with the past. Traditional data processing technology can no longer cope. This demand gave rise to new technology: a set of software tools for processing massive data. This is big data!

 

 

2. Big data processing

Core technologies for processing massive data:

Massive data storage: distributed storage

Massive data computing: distributed computing
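To make the two ideas above concrete, here is a minimal single-machine sketch in Python. It only imitates a cluster: threads stand in for nodes, and the function names, node names, and round-robin placement are assumptions made for illustration, not how any particular framework works.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Cut a dataset into fixed-size blocks, the unit a distributed store spreads out."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_blocks(blocks, node_names):
    """Assign blocks to nodes round-robin (a hypothetical placement policy)."""
    placement = {name: [] for name in node_names}
    for index, block in enumerate(blocks):
        placement[node_names[index % len(node_names)]].append(block)
    return placement


def local_sum(blocks_on_node):
    """Work done by one node: it only touches the blocks it stores."""
    return sum(sum(block) for block in blocks_on_node)


def distributed_sum(placement):
    """Run local_sum for every node in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(placement)) as pool:
        partials = list(pool.map(local_sum, placement.values()))
    return sum(partials)


records = list(range(1, 101))
placement = place_blocks(split_into_blocks(records, 10), ["node-a", "node-b", "node-c"])
total = distributed_sum(placement)
print(total)
```

The point of the sketch is the shape of the work: data is split and spread out first, each node computes on its own share, and only small partial results are combined at the end.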

 

 

3. Common big data frameworks

Users do not need to implement these core technologies from scratch; there is no need to reinvent the wheel.

For both storage and computing, many mature frameworks are already available.

 

Storage frameworks:

HDFS - distributed file storage system (the storage framework in Hadoop)

HBase - distributed database system

Kafka - distributed message cache system (widely used in real-time streaming data processing scenarios)

 

Computing frameworks (the core problem they solve is helping users run their processing logic in parallel on many machines):

MapReduce - offline batch computing framework (the computing framework in Hadoop)

Spark - offline batch computing / real-time streaming computing

Storm - real-time streaming computing
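The processing model that MapReduce gives its name to can be sketched in plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using word counting as the example; it does not use the Hadoop API, and all function names here are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster, each line (or block of lines) could be mapped on a different machine.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each call to `map_phase` and each call to `reduce_phase` depends only on its own input, a framework is free to run them on many machines at once, which is the problem this class of framework solves for the user.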

 

Auxiliary tools (they free big data engineers from some of the tedious work):

Hive - data warehouse tool: accepts SQL and translates it into MapReduce or Spark jobs to run

Flume - data collection

Sqoop - data migration

Elasticsearch - distributed search engine

.......

 

 

From another perspective, big data is:

  1. There are massive amounts of data
  2. There is a demand for mining that massive data
  3. There are software tools for mining massive data (Hadoop, Spark, Storm, Flink, Tez, Impala ......)

 

 

4. Specific applications of big data in real life

The most typical application of big data processing: operational analysis of products

 

E-commerce recommendation system: algorithm models run large amounts of computation over massive browsing and purchasing behavior data and produce recommendation results, which are pushed to the e-commerce site's pages to recommend products to users
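As a rough illustration of how behavior data can turn into recommendations, the sketch below counts which products appear together in users' histories and suggests the most frequent companions. Co-occurrence counting is just one simple technique chosen for this example; the data, function names, and scoring are all hypothetical, and production systems use far more elaborate models.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of products appears in the same user's history."""
    co = defaultdict(Counter)
    for items in baskets.values():
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(user, baskets, co, top_n=2):
    """Score unseen products by how often they co-occur with what the user already has."""
    seen = set(baskets[user])
    scores = Counter()
    for item in seen:
        for other, count in co[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


baskets = {
    "u1": ["phone", "case", "charger"],
    "u2": ["phone", "case"],
    "u3": ["phone", "charger", "earbuds"],
    "u4": ["phone"],
}
co = build_cooccurrence(baskets)
suggestions = recommend("u4", baskets, co)
print(suggestions)
```

With real traffic, the counting step is exactly the kind of job that would be distributed across a cluster, since every user's history can be processed independently.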

Precision ad targeting system: statistical analysis over massive data of all kinds about Internet users produces user portraits (the set of attribute labels describing each user), which advertisers can then use to deliver precisely targeted advertising
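A user portrait can be pictured as a set of labels derived from raw events. The following sketch assumes a made-up event format and made-up thresholds purely for illustration; it shows labels being computed per user and then used to select an audience for an ad.

```python
def build_profile(events):
    """Derive attribute labels (tags) from one user's raw behavior events."""
    tags = set()
    categories = [event["category"] for event in events]
    for category in set(categories):
        if categories.count(category) >= 2:  # hypothetical threshold for "interested"
            tags.add(f"interested_in:{category}")
    if sum(event.get("amount", 0) for event in events) >= 500:  # hypothetical threshold
        tags.add("high_spender")
    return tags


def target_audience(profiles, required_tags):
    """Select the users whose portrait contains every tag the advertiser asks for."""
    return sorted(user for user, tags in profiles.items() if required_tags <= tags)


events_by_user = {
    "u1": [{"category": "sports", "amount": 300}, {"category": "sports", "amount": 250}],
    "u2": [{"category": "sports"}, {"category": "books"}],
    "u3": [{"category": "books", "amount": 20}, {"category": "books", "amount": 15}],
}
profiles = {user: build_profile(events) for user, events in events_by_user.items()}
audience = target_audience(profiles, {"interested_in:sports"})
print(audience)
```

Each user's portrait is computed independently of the others, so this step also parallelizes naturally across machines.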

 


Origin blog.csdn.net/itcats_cn/article/details/88817534