1. Installation
The setup is a Linux environment, with pyspark and Jupyter Notebook as the interactive tools.
For installation details, see Spark Getting Started.
2. The first program
Estimate pi by random sampling:
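import random

# Number of random points to sample
num_samples = 100000000

def inside(p):
    # p (the element index) is unused; each call draws a fresh point in the unit square
    x, y = random.random(), random.random()
    # True when the point falls inside the quarter circle of radius 1
    return x*x + y*y < 1

# sc is assumed to be the SparkContext already created by the pyspark session
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
# The quarter circle covers pi/4 of the unit square
pi = 4 * count / num_samples
print(pi)
sc.stop()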
Result (the exact value varies slightly from run to run because the samples are random):
3.1417056
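The program above works because a uniformly random point in the unit square lands inside the quarter circle of radius 1 with probability pi/4, so four times the observed fraction approximates pi. As a minimal sketch for checking that logic without a Spark cluster, the same estimate can be written in plain Python. The function name estimate_pi and its seed parameter are illustrative assumptions, not part of the original program.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling points in the unit square (hypothetical helper)."""
    rng = random.Random(seed)  # a seeded generator makes a run repeatable
    count = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Same test as inside() in the Spark version
        if x * x + y * y < 1:
            count += 1
    # The quarter circle covers pi/4 of the unit square
    return 4 * count / num_samples


if __name__ == "__main__":
    print(estimate_pi(100_000, seed=42))
```

The Spark version distributes the same loop across workers: parallelize splits the range into partitions, filter applies the test to each element, and count adds up the hits. The error of the estimate shrinks roughly with the square root of the sample count, which is why the original uses a large num_samples.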