Jesse Anderson has done a study, a team of artificial intelligence and reasonable organizational structure, the data processing needs of engineers 4/5. Unfortunately, a lot of teams do not realize this. Why fall in love with the data processing
MR learn how to use design data processing from the top frame
Why so Apache Spark and Apache Beam Design
Google T6 level design
Separate front and rear ends decoupling process, the batch before digraph + compiler, there is a rear end to optimize the resource allocation in FIG. + Auto + automatic monitoring / tracking error
Looking cluster tok (K) sales task
首先我们忘掉所有的框架,我们想做的业务设计其实是就是一个count() 一个topK()
google processing framework - Google Level Platform
衡量指标很简单是sla
工程一致性模型,强一致性,弱一致性,最终一致性
Cloud Spanner 就是强一致性,业务级的数据引擎
workflow design patterns
'' '
Replication was isolated by filtration combined
' ''
In response to non-real time data processing
可以使用发布订阅,进行解耦 削峰
cap
c linear consistency of distributed systems operate as stand-alone as
a Availability as long as not all nodes are linked, the data must return a response
p partitions fault tolerance, data can not be that there is only one node
cp storage system architecture uses the Google BigTable, Hbase, MongoDB
Ap amazon dynamo system data system
kafka system belonging ca
The lamdba architecture big data architecture
批处理层 速度处理层 服务层
![](https://img2018.cnblogs.com/blog/1337375/201909/1337375-20190921094559411-1082918256.png)
The kappa architecture big data
spark
spark 不只能依赖于hadoop 才能使用,还可以运行在apache mesos ,kubernetes ,standalone
![](https://img2018.cnblogs.com/blog/1337375/201909/1337375-20190921100517698-944731212.png)
平行等级设备 spark storm presto impala
lot
flink 数据结构是 stream ,基于条数据进行使用的数据
The technology to break those pain points