Big data has gained momentum in recent years, and people often ask why it matters. We are living in an era of data explosion: smartphones, tablets, wearables, and IoT devices are generating new data every moment. A big data solution usually consists of several major components, from the hardware layer (storage, compute, and networking), to the data processing engine, and up to the analytics layer, which applies statistical and computational algorithms along with data visualization to extract business insights. In the middle sits the data processing engine, which plays a pivotal role. It is no exaggeration to say that the data processing engine is to big data what the CPU is to a computer, or the brain to a human.
About the book
Written jointly by Spark's developers and core contributors, this book explains Spark, a tool for efficient and fast data analysis and processing. It guides the reader to quickly master methods for collecting, computing, simplifying, and storing massive data with Spark; to learn interactive, iterative, and incremental analysis; and to solve problems of partitioning, data locality, and custom serialization. This book is suitable for anyone who needs to analyze data in the big data era.
Features of this book
The structure of this book is clear, and the chapters are organized to be read in order from front to back. At the beginning of each chapter, we indicate which sections are more important for data scientists and which are more useful for engineers. That said, we hope all of the content in the book will be helpful to both types of readers.
The first two chapters get you started: you will set up a basic Spark installation on your own computer and gain a basic idea of what Spark can do. After covering Spark's goals and installation, we introduce the Spark shell, a very useful tool for prototyping Spark applications. Subsequent chapters introduce the Spark API in detail, explain how to run Spark applications on a cluster, and cover the higher-level libraries that Spark provides, such as Spark SQL (for structured data and SQL queries) and MLlib (the machine learning library).
Table of contents (excerpt)
Advanced Spark Programming
Running Spark on a Cluster
Spark Streaming
Machine Learning with MLlib