"Spark fast big data analysis" finally got it, the correct way to open the source code reading.

Big data has been heating up in recent years, and people often ask why it matters. We live in an era of data explosion: the flood of new smartphones, tablets, wearables, and IoT devices is generating fresh data every moment.

A big data solution usually consists of several important components, from the hardware layer (storage, compute, and networking), to the data processing engine, and up to the analytics layer, which applies statistical and computational algorithms and data visualization to extract business insight. In the middle of all this, the data processing engine plays a crucial role. It is no exaggeration to say that the processing engine is to big data what the CPU is to a computer, or the brain to a human.


Executive summary

This book, written jointly by Spark developers and core project members, introduces Spark, a tool for analyzing and processing data efficiently and quickly. It leads the reader to rapidly master how to collect, compute, simplify, and save massive amounts of data with Spark; to learn interactive, iterative, and incremental analysis; and to solve problems such as partitioning, data locality, and custom serialization.

The book is suitable for anyone who needs to analyze data in the big data era.

Features of this book

The book has a clear structure, and the chapters are meant to be read in order from front to back. At the beginning of each chapter, the authors point out which sections are more important for data scientists and which are more useful for engineers. That said, they hope that all of the material will be helpful to both kinds of readers.

The first two chapters get you started: they show you how to set up a basic Spark installation on your own machine and give you a basic sense of what Spark can do. After covering Spark's goals and installation, the book introduces the Spark shell, a very useful tool for prototyping Spark applications. The following chapters then cover the Spark API in detail, how to run Spark applications on a cluster, and the higher-level libraries Spark provides, such as Spark SQL (database support) and MLlib (the machine learning library).
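Since the book leans on the Spark shell for prototyping, here is a minimal sketch of the kind of experiment the shell makes easy: an RDD word count typed interactively. This example is not taken from the book; the input path README.md is just a placeholder, and sc is the SparkContext that the shell creates for you automatically.

```scala
// Inside the Spark shell (Scala), a word count can be prototyped in a few lines.
val lines = sc.textFile("README.md")      // read a text file as an RDD of lines
val counts = lines
  .flatMap(line => line.split(" "))       // split each line into words
  .map(word => (word, 1))                 // pair each word with an initial count of 1
  .reduceByKey(_ + _)                      // sum the counts for each distinct word
counts.take(5).foreach(println)           // print a small sample of (word, count) pairs
```

Because the shell evaluates each line immediately, you can inspect intermediate results and adjust the pipeline on the fly before turning it into a standalone application.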

Table of contents (excerpt)

Advanced Spark programming

Running Spark on a cluster

Spark Streaming

Machine learning with MLlib