Why I Wrote the Book "Big Data Architecture Explained"



 

It took almost a year and a half, every weekend, and a great deal of effort, but I have finally finished the first book of my life, "Big Data Architecture Explained: From Data Acquisition to Deep Learning". The process was quite painful, and I often wanted to give up, but fortunately I persevered.

 

Looking back on those 500 days, I often ask myself two questions:

 

1) Why did I choose to write a book on big data technology, and what kept me going?

The main reason is that, after practicing big data architecture and technology for many years, I have built up a certain understanding of the technology and have a lot I want to say, so I needed a place to express it in full.

 

2) Do big data practitioners, students, and other readers interested in big data need such a book?

People often mistake big data for a single-point technology. In fact, it is a family of technologies, and readers need a book that introduces that family as a whole.

 

With these two questions answered, the idea and theme of the book took shape: I wanted to write a book that covers the field of big data processing end to end, from the three dimensions of architecture, business, and technology.

 

The book consists of three parts. The first part introduces the origin, development, key technical points, and future trends of big data technology end to end, from data generation and collection through computation, storage, and consumption; it draws on the latest industry products and on the latest research directions and results in academia to make esoteric technology easy to understand. The second part presents practical cases from the perspectives of business and technology, so that readers can understand how big data is used and what the technology is really about. The third part shows that big data technology does not stand alone and explains how it combines with cutting-edge cloud technology, deep learning, machine learning, and so on.

 

(Finally, a bit of advertising: if you are interested in this book, you can order it from JD.com, Taobao, Dangdang, Amazon, Interactive Publishing House, and other stores. JD.com has it in stock: https://item.jd.com/10826699444.html . Thank you for your support!)

 

The table of contents is as follows; see whether any of it interests you:

 

Part I The Nature of Big Data

Chapter 1 What is Big Data 2

1.1 Introduction to Big Data 2

1.1.1 A brief history of big data 2

1.1.2 Status Quo of Big Data 3

1.1.3 Big Data and BI 3

1.2 Enterprise Data Assets 4

1.3 Big Data Challenge 5

1.3.1 Cost challenges 6

1.3.2 Real-time challenges 6

1.3.3 Security challenges 6

1.4 Summary 6

Chapter 2 Telecom Operator Big Data Architecture 7

2.1 Architecture-Driven Factors 7

2.2 Big Data Platform Architecture 7

2.3 Platform Development Trend 8

2.4 Summary 8

Chapter 3 Telecom Operator Big Data Services 9

3.1 Common Big Data Services of Telecom Operators 9

3.1.1 SQM (Operation and Maintenance Quality Management) 9

3.1.2 CSE (Customer Experience Enhancement) 9

3.1.3 MSS (Market Operation and Maintenance Support) 10

3.1.4 DMP (Data Management Platform) 10

3.2 Summary 11

Part II Big Data Technology

Chapter 4 Data Acquisition 14

4.1 Data Classification 14

4.2 Data Acquisition Components 14

4.3 Probe 15

4.3.1 Probe principle 15

4.3.2 Key Capabilities of Probes 16

4.4 Web Scraping 26

4.4.1 Web Crawler 26

4.4.2 Simple crawler Python code example 32

4.5 Log Collection 33

4.5.1 Flume 33

4.5.2 Other log collection components 47

4.6 Data Distribution Middleware 47

4.6.1 The role of data distribution middleware 47

4.6.2 Kafka Architecture and Principles 47

4.7 Summary 82

Chapter 5 Stream Processing 83

5.1 Arithmetic 83

5.2 The concept of streams 83

5.3 Application Scenarios of Streams 84

5.3.1 Financial Sector 84

5.3.2 Telecommunications 85

5.4 Two Typical Streaming Engines in the Industry 85

5.4.1 Storm 85

5.4.2 Spark Streaming 89

5.4.3 Fusion Framework 102

5.5 CEP 108

5.5.1 What is CEP 108

5.5.2 Architecture of CEP 109

5.5.3 Esper 110

5.6 Combining Machine Learning in Real Time 110

5.6.1 Features of Eagle 111

5.6.2 Eagle Overview 111

5.7 Summary 116

Chapter 6 Interactive Analysis 117

6.1 The concept of interactive analysis 117

6.2 MPP DB Technology 118

6.2.1 The concept of MPP 118

6.2.2 A typical MPP database 121

6.2.3 MPP DB Tuning Practice 131

6.2.4 MPP DB applicable scenarios 162

6.3 SQL on Hadoop 163

6.3.1 Hive 163

6.3.2 Phoenix 165

6.3.3 Impala 166

6.4 Big Data Warehouses 167

6.4.1 The concept of a data warehouse 167

6.4.2 OLTP/OLAP comparison 168

6.4.3 Similarities and Differences in Big Data Scenarios 168

6.4.4 The query engine 169

6.4.5 Storage Engines 170

6.5 Summary 171

Chapter 7 Batch Processing Techniques 172

7.1 Concepts of Batch Technology 172

7.2 MPP DB technology 172

7.3 The MapReduce Programming Framework 173

7.3.1 The origin of MapReduce 173

7.3.2 Principles of MapReduce 173

7.3.3 Shuffle 174

7.3.4 The main reasons for poor performance 177

7.4 Spark Architecture and Principles 177

7.4.1 The origin and characteristics of Spark 177

7.4.2 Core Concepts of Spark 178

7.5 The BSP Framework 217

7.5.1 What is the BSP Model 217

7.5.2 Introduction to the Parallel Model 218

7.5.3 Fundamentals of the BSP Model 220

7.5.4 Features of the BSP Model 222

7.5.5 Evaluation of the BSP Model 222

7.5.6 Comparison of BSP and MapReduce 222

7.5.7 Implementation of the BSP Model 223

7.5.8 Introduction to Apache Hama 223

7.6 Key Technologies of Batch Processing 227

7.6.1 CodeGen 227

7.6.2 CPU Affinity Technology 228

7.7 Summary 229

Chapter 8 Machine Learning and Data Mining 230

8.1 Connections and Differences Between Machine Learning and Data Mining 230

8.2 Typical Data Mining and Machine Learning Processes 231

8.3 Overview of Machine Learning 232

8.3.1 Learning styles 232

8.3.2 Algorithmic Similarity 233

8.4 Machine Learning & Data Mining Application Cases 235

8.4.1 The story of diapers and beer 235

8.4.2 Decision tree for rapid fault location in telecommunication field 236

8.4.3 The field of image recognition 236

8.4.4 Natural Language Recognition 238

8.5 Interactive Analysis 239

8.6 Deep Learning 240

8.6.1 Overview of Deep Learning 240

8.6.2 The context of machine learning 241

8.6.3 The Human Brain Vision Mechanism 242

8.6.4 About Features 244

8.6.5 How Many Features Are Needed 245

8.6.6 The basic idea of deep learning 246

8.6.7 Shallow and deep learning 246

8.6.8 Deep Learning and Neural Networks 247

8.6.9 The training process of deep learning 248

8.6.10 A framework for deep learning 248

8.6.11 Deep Learning and GPUs 255

8.6.12 Deep Learning Summary and Outlook 256

8.7 Summary 257

Chapter 9 Resource Management 258

9.1 Basic Concepts of Resource Management 258

9.1.1 Goals and Values of Resource Scheduling 258

9.1.2 Use Restrictions and Difficulties of Resource Scheduling 258

9.2 Resource Scheduling Frameworks in the Hadoop Domain 259

9.2.1 YARN 259

9.2.2 Borg 260

9.2.3 Omega 262

9.2.4 Summary of this section 263

9.3 Resource Allocation Algorithms 263

9.3.1 The role of algorithms 263

9.3.2 Analysis of Several Scheduling Algorithms 263

9.4 Data Center Unified Resource Scheduling 271

9.4.1 Mesos+Marathon Architecture and Principles 271

9.4.2 Mesos+Marathon Summary 283

9.5 Multitenancy Technologies 284

9.5.1 The concept of multi-tenancy 284

9.5.2 Multi-tenancy scenarios 284

9.6 Intelligent Scheduling Based on Application Description 287

9.7 Apache Mesos Architecture and Principles 288

9.7.1 Apache Mesos background 288

9.7.2 Overall Architecture of Apache Mesos 288

9.7.3 How Apache Mesos Works 290

9.7.4 Key Technologies of Apache Mesos 295

9.7.5 Comparison of Mesos and YARN 304

9.8 Summary 305

Chapter 10 Storage Is the Foundation 306

10.1 What Is Long Divided Must Unite; What Is Long United Must Divide 306

10.2 Development of Storage Hardware 306

10.2.1 The working principle of mechanical hard disk 306

10.2.2 The principle of SSD 307

10.2.3 3D XPoint 309

10.2.4 Summary of Hardware Development 309

10.3 Key Storage Metrics 309

10.4 RAID Technology 309

10.5 Storage Interface 310

10.5.1 The file interface 311

10.5.2 Raw Devices 311

10.5.3 Object Interfaces 312

10.5.4 Block Interface 316

10.5.5 Convergence is a trend 328

10.6 Storage Acceleration Technology 328

10.6.1 Data Organization Techniques 328

10.6.2 Caching techniques 335

10.7 Summary 336

Chapter 11 Cloudification of Big Data 337

11.1 Cloud Computing Definition 337

11.2 Cloud application 337

11.2.1 Cloud Native Concepts 338

11.2.2 Microservice Architecture 338

11.2.3 Docker with Microservice Architecture 342

11.2.4 Summary of Cloud Application 348

11.3 Migrating Big Data to the Cloud 348

11.3.1 Two Models of Big Data Cloud Services 348

11.3.2 Cluster Mode AWS EMR 349

11.3.3 Service Mode Azure Data Lake Analytics 352

11.4 Summary 354

Part III Big Data Culture

Chapter 12 The Big Data Technology Development Culture 356

12.1 Open Source Culture 356

12.2 DevOps Philosophy 356

12.2.1 Combinations of Development and Operations 357

12.2.2 Impact on Application Release 357

12.2.3 Problems encountered 358

12.2.4 Coordinator 358

12.2.5 The key to success 359

12.3 Speed is more important than you think 359

12.4 Summary 361 

 


 

 
 
