It took almost a year and a half, every weekend, and a great deal of effort, but I have finally finished the first book of my life, "Detailed Explanation of Big Data Architecture: From Data Acquisition to Deep Learning". The process was quite painful and I often wanted to give up, but fortunately I persevered.
Looking back on those 500 days, I often ask myself two questions:
1) Why did I choose to write a book on big data technology, and what kept me going?
The main reason is that, after practicing big data architecture and technology for so many years, I have built up a certain understanding of the technology and have a great deal I want to say, so I needed a place to express it in full.
2) Do big data practitioners, students, and other readers interested in big data need such a book?
People often mistake big data for a single-point technology. In fact, it is a family of technologies, and readers need a book that introduces that family as a whole.
With these two questions answered, the concept and theme of the book fell into place. I wanted to write a book that covers big data processing end to end from three dimensions: architecture, business, and technology.
The book consists of three parts. The first part introduces the origin, development, key technical points, and future trends of big data technology, following data end to end through generation, collection, computation, storage, and consumption; it draws on the latest industry products and the latest research directions and results from academia to make esoteric technology easy to understand. The second part presents practical cases from both the business and the technology perspective, so that readers can understand how big data is used and what the technology really is. The third part shows that big data technology does not stand alone and explains how it combines with cutting-edge technologies such as cloud computing, deep learning, and machine learning.
(Finally, a bit of advertising: if you are interested in this book, you can order it from JD.com, Taobao, Dangdang, Amazon, Interactive Publishing House, and other stores. JD.com has it in stock: https://item.jd.com/10826699444.html . Thank you for your support!)
The table of contents is as follows; see whether anything in it interests you:
Part 1 The nature of big data
Chapter 1 What is Big Data
1.1 Introduction to Big Data
1.1.1 A brief history of big data
1.1.2 Status Quo of Big Data
1.1.3 Big Data and BI
1.2 Enterprise Data Assets
1.3 Big Data Challenges
1.3.1 Cost challenges
1.3.2 Real-time challenges
1.3.3 Security challenges
1.4 Summary
Chapter 2 Telecom Operator Big Data Architecture
2.1 Architecture-Driven Factors
2.2 Big Data Platform Architecture
2.3 Platform Development Trend
2.4 Summary
Chapter 3 Telecom Operator Big Data Services
3.1 Common Big Data Services of Operators
3.1.1 SQM (Operation and Maintenance Quality Management)
3.1.2 CSE (Customer Experience Enhancement)
3.1.3 MSS (Market Operation and Maintenance Support)
3.1.4 DMP (Data Management Platform)
3.2 Summary
Part 2 Big Data Technology
Chapter 4 Data Acquisition
4.1 Data Classification
4.2 Data Acquisition Components
4.3 Probe
4.3.1 Probe principle
4.3.2 Key Capabilities of Probes
4.4 Web Scraping
4.4.1 Web Crawler
4.4.2 Simple crawler Python code example
4.5 Log Collection
4.5.1 Flume
4.5.2 Other log collection components
4.6 Data Distribution Middleware
4.6.1 The role of data distribution middleware
4.6.2 Kafka Architecture and Principles
4.7 Summary
Chapter 5 Stream Processing
5.1 Operators
5.2 The concept of streams
5.3 Application Scenarios of Streams
5.3.1 Financial Sector
5.3.2 Telecommunications
5.4 Two Typical Streaming Engines in the Industry
5.4.1 Storm
5.4.2 Spark Streaming
5.4.3 Fusion Framework
5.5 CEP
5.5.1 What is CEP
5.5.2 Architecture of CEP
5.5.3 Esper
5.6 Combining Machine Learning in Real Time
5.6.1 Features of Eagle
5.6.2 Eagle Overview
5.7 Summary
Chapter 6 Interactive Analysis
6.1 The concept of interactive analysis
6.2 MPP DB Technology
6.2.1 The concept of MPP
6.2.2 A typical MPP database
6.2.3 MPP DB Tuning Practice
6.2.4 MPP DB applicable scenarios
6.3 SQL on Hadoop
6.3.1 Hive
6.3.2 Phoenix
6.3.3 Impala
6.4 Big Data Warehouses
6.4.1 The concept of a data warehouse
6.4.2 OLTP/OLAP comparison
6.4.3 Similarities and Differences in Big Data Scenarios
6.4.4 The query engine
6.4.5 Storage Engines
6.5 Summary
Chapter 7 Batch Processing Techniques
7.1 Concepts of Batch Technology
7.2 MPP DB technology
7.3 The MapReduce Programming Framework
7.3.1 The origin of MapReduce
7.3.2 Principles of MapReduce
7.3.3 Shuffle
7.3.4 The main reasons for poor performance
7.4 Spark Architecture and Principles
7.4.1 The origin and characteristics of Spark
7.4.2 Core Concepts of Spark
7.5 The BSP Framework
7.5.1 What is the BSP Model
7.5.2 Introduction to the Parallel Model
7.5.3 Fundamentals of the BSP Model
7.5.4 Features of the BSP Model
7.5.5 Evaluation of the BSP Model
7.5.6 Comparison of BSP and MapReduce
7.5.7 Implementation of the BSP Model
7.5.8 Introduction to Apache Hama
7.6 Key Technologies of Batch Processing
7.6.1 CodeGen
7.6.2 CPU Affinity Technology
7.7 Summary
Chapter 8 Machine Learning and Data Mining
8.1 Connections and Differences Between Machine Learning and Data Mining
8.2 Typical Data Mining and Machine Learning Processes
8.3 Overview of Machine Learning
8.3.1 Learning styles
8.3.2 Algorithmic Similarity
8.4 Machine Learning & Data Mining Application Cases
8.4.1 The story of diapers and beer
8.4.2 Decision tree for rapid fault location in telecommunication field
8.4.3 The field of image recognition
8.4.4 Natural Language Recognition
8.5 Interactive Analysis
8.6 Deep Learning
8.6.1 Overview of Deep Learning
8.6.2 The context of machine learning
8.6.3 The Human Brain Vision Mechanism
8.6.4 About Features
8.6.5 How Many Features Are Needed
8.6.6 The basic idea of deep learning
8.6.7 Shallow and deep learning
8.6.8 Deep Learning and Neural Networks
8.6.9 The training process of deep learning
8.6.10 Deep Learning Frameworks
8.6.11 Deep Learning and GPUs
8.6.12 Deep Learning Summary and Outlook
8.7 Summary
Chapter 9 Resource Management
9.1 Basic Concepts of Resource Management
9.1.1 Goals and Values of Resource Scheduling
9.1.2 Use Restrictions and Difficulties of Resource Scheduling
9.2 Resource Scheduling Frameworks in the Hadoop Domain
9.2.1 YARN
9.2.2 Borg
9.2.3 Omega
9.2.4 Summary of this section
9.3 Resource Allocation Algorithms
9.3.1 The role of algorithms
9.3.2 Analysis of Several Scheduling Algorithms
9.4 Data Center Unified Resource Scheduling
9.4.1 Mesos+Marathon Architecture and Principles
9.4.2 Mesos+Marathon Summary
9.5 Multitenancy Technologies
9.5.1 The concept of multi-tenancy
9.5.2 Multi-tenancy scenarios
9.6 Intelligent Scheduling Based on Application Description
9.7 Apache Mesos Architecture and Principles
9.7.1 Apache Mesos background
9.7.2 Overall Architecture of Apache Mesos
9.7.3 How Apache Mesos Works
9.7.4 Key Technologies of Apache Mesos
9.7.5 Comparison of Mesos and YARN
9.8 Summary
Chapter 10 Storage Is the Foundation
10.1 What is long divided must unite; what is long united must divide
10.2 Development of Storage Hardware
10.2.1 The working principle of mechanical hard disk
10.2.2 The principle of SSD
10.2.3 3D XPoint
10.2.4 Summary of Hardware Development
10.3 Key Storage Metrics
10.4 RAID Technology
10.5 Storage Interface
10.5.1 The file interface
10.5.2 Raw Devices
10.5.3 Object Interfaces
10.5.4 Block Interface
10.5.5 Convergence is a trend
10.6 Storage Acceleration Technology
10.6.1 Data Organization Techniques
10.6.2 Caching techniques
10.7 Summary
Chapter 11 Cloudification of Big Data
11.1 Cloud Computing Definition
11.2 Cloudification of Applications
11.2.1 Cloud Native Concepts
11.2.2 Microservice Architecture
11.2.3 Docker with Microservice Architecture
11.2.4 Summary of Application Cloudification
11.3 Migrating Big Data to the Cloud
11.3.1 Two Models of Big Data Cloud Services
11.3.2 Cluster Mode: AWS EMR
11.3.3 Service Mode: Azure Data Lake Analytics
11.4 Summary
Part 3 Big Data Culture
Chapter 12 The Big Data Technology Development Culture
12.1 Open Source Culture
12.2 DevOps Philosophy
12.2.1 Combinations of Development and Operations
12.2.2 Impact on Application Release
12.2.3 Problems encountered
12.2.4 Coordinator
12.2.5 The key to success
12.3 Speed is more important than you think
12.4 Summary