Large-scale data processing in practice

Jesse Anderson studied this question: in an artificial intelligence team with a reasonable organizational structure, about 4/5 of the engineers need to be data processing engineers. Unfortunately, many teams do not realize this.

Why fall in love with data processing

Learn from MapReduce how to design a data processing framework from the top down

Why Apache Spark and Apache Beam are designed the way they are

Design at the Google T6 level


Decouple the front end from the back end. The front end expresses batch and stream jobs as a directed graph and passes it through a compiler; the back end optimizes the graph, allocates resources automatically, and provides automatic monitoring and error tracking.

Finding the top K sales on a cluster

First, forget every framework. The business logic we actually want to design is just a count() and a topK().
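To make the count() and topK() idea concrete, here is a minimal single-machine sketch in Python. The function names `count_sales` and `top_k` and the sample data are hypothetical and chosen only for illustration; a real cluster job would distribute the same two steps across many workers.

```python
import heapq
from collections import Counter

def count_sales(product_ids):
    """count(): tally how many times each product id was sold."""
    return Counter(product_ids)

def top_k(counts, k):
    """topK(): return the k (product_id, count) pairs with the highest counts."""
    # Sort by descending count, then by product id so ties are deterministic.
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sales log: one product id per sale.
sales = ["a", "b", "a", "c", "a", "b"]
counts = count_sales(sales)
top2 = top_k(counts, 2)
print(top2)
```

Everything a framework adds (sharding, shuffling, fault tolerance) exists to run these two steps when the data no longer fits on one machine.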

Google's processing framework: a Google-level platform

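The metric is simple: the SLA.  
Consistency models in engineering: strong consistency, weak consistency, eventual consistency.  
Cloud Spanner is a strongly consistent, business-grade data engine.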

Workflow design patterns

```
Copy, filter, split, join
```
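The four workflow patterns (copy, filter, split, join) can be sketched with plain Python lists. This is only an illustration under stated assumptions: the stage function names and the sample `orders` data are hypothetical, and a real workflow system would apply the same shapes to distributed datasets rather than in-memory lists.

```python
def copy_stage(records, n):
    """Copy: hand the same records to n independent downstream consumers."""
    return [list(records) for _ in range(n)]

def filter_stage(records, predicate):
    """Filter: keep only the records that satisfy the predicate."""
    return [r for r in records if predicate(r)]

def split_stage(records, key):
    """Split: route each record into a group chosen by key(record)."""
    groups = {}
    for r in records:
        groups.setdefault(key(r), []).append(r)
    return groups

def join_stage(*streams):
    """Join: merge several record collections back into one."""
    return [r for s in streams for r in s]

# Hypothetical order amounts.
orders = [5, 12, 7, 30, 15]
for_audit, for_stats = copy_stage(orders, 2)
large = filter_stage(for_stats, lambda x: x >= 10)
by_parity = split_stage(large, lambda x: "even" if x % 2 == 0 else "odd")
merged = join_stage(by_parity.get("even", []), by_parity.get("odd", []))
print(merged)
```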

Handling non-real-time data processing

Publish/subscribe can be used to decouple components and to smooth out traffic peaks.
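A rough in-process sketch of how publish/subscribe decouples a producer from a consumer and absorbs a burst: the publisher writes to a queue and returns immediately, while the subscriber drains it at its own pace. The `topic` queue here is a hypothetical stand-in for a real message broker, and the `None` sentinel is an assumption made for this example.

```python
import queue
import threading

topic = queue.Queue()  # hypothetical in-process stand-in for a message topic
processed = []

def subscriber():
    """Drain the topic at the consumer's own pace until a None sentinel arrives."""
    while True:
        message = topic.get()
        if message is None:
            break
        processed.append(message * 2)

worker = threading.Thread(target=subscriber)
worker.start()

# The publisher emits a burst of messages without waiting for the subscriber.
for i in range(100):
    topic.put(i)
topic.put(None)  # signal end of stream
worker.join()
print(len(processed))
```

The publisher never calls the subscriber directly, which is the decoupling; the queue holding the backlog during the burst is the peak smoothing.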

CAP

C: consistency (linearizability). Operating on the distributed system behaves the same as operating on a single machine.
A: availability. As long as not every node is down, a request must receive a response.
P: partition tolerance. The system keeps working through a network partition, so the data cannot live on only one node.
CP: storage systems with this architecture include Google BigTable, HBase, and MongoDB.
AP: Amazon's Dynamo data system.
CA: Kafka is classified as a CA system.

The Lambda architecture for big data

Batch layer, speed layer, serving layer 
![](https://img2018.cnblogs.com/blog/1337375/201909/1337375-20190921094559411-1082918256.png)
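The three Lambda layers can be sketched as three small functions: the batch layer recomputes a view from the full history, the speed layer covers events that arrived since the last batch run, and the serving layer merges the two at query time. The function names and the page-view data are hypothetical, and this is a simplified illustration rather than a production design.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute counts from the full, immutable history."""
    return Counter(master_dataset)

def speed_view(recent_events):
    """Speed layer: count only events not yet covered by the batch view."""
    return Counter(recent_events)

def serve(key, batch, speed):
    """Serving layer: answer a query by merging both views."""
    return batch[key] + speed[key]

# Hypothetical page-view events.
history = ["page_a", "page_b", "page_a"]
recent = ["page_a", "page_c"]
batch = batch_view(history)
speed = speed_view(recent)
print(serve("page_a", batch, speed))
```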

The Kappa architecture for big data

Spark

Spark does not have to run on Hadoop; it can also run on Apache Mesos, on Kubernetes, or in standalone mode. 
![](https://img2018.cnblogs.com/blog/1337375/201909/1337375-20190921100517698-944731212.png)

Peer-level tools: Spark, Storm, Presto, Impala  
Flink's core data structure is the stream; it processes data record by record.

Pain points this technology addresses

Up-to-date knowledge, front-end/back-end separation at the framework level, and a unified way of thinking about batch and stream processing.

Origin www.cnblogs.com/corx/p/11523546.html