Machine Learning Framework Ray -- 2.2 Computing π with Ray Core

Use Ray Core to set up parallel computation and estimate π with the Monte Carlo method

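This tutorial shows how to estimate the value of π with the Monte Carlo method. The method samples random points inside a 2x2 square. The fraction of those points that fall inside the unit circle centered at the origin estimates the ratio of the circle's area to the square's area. Since the true ratio is π/4, multiplying the estimated ratio by 4 approximates π. The more points we sample, the closer the approximation tends to be to the true value of π.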
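As a baseline before bringing in Ray, the estimate described above can be sketched in plain, sequential Python. The function name `estimate_pi` and the `seed` parameter are assumptions made here for illustration and reproducibility; they are not part of the tutorial's code.

```python
import math
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)  # local seeded generator (assumption, for reproducibility)
    num_inside = 0
    for _ in range(num_samples):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:  # the point lies inside the unit circle
            num_inside += 1
    # num_inside / num_samples approximates pi / 4, so multiply by 4.
    return 4 * num_inside / num_samples


estimate = estimate_pi(100_000)
print(f"estimate: {estimate}, error: {abs(estimate - math.pi):.5f}")
```

The rest of the tutorial keeps the same sampling logic and spreads the loop across Ray tasks.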

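This example runs in an Anaconda environment named RayAIR. For how the environment was created, see Section 2 of the linked article: "Machine Learning Framework Ray -- 2.1 Basic Use of Ray Clusters and Ray AIR".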

Implementation

1. Create a local Ray instance

import ray
import math
import time
import random

ray.init()

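Ray prints a startup message when it initializes. Resource usage can also be viewed in the Ray dashboard at the address it reports (for example, http://127.0.0.1:8265/#/new/jobs/01000000 in the original screenshot).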

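2. Define the progress actor

Next, we define a Ray actor that the sampling tasks can call to update progress. A Ray actor is essentially a stateful service: anyone who holds a handle to the actor instance can call its methods.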

# @ray.remote is a decorator that turns a Python class into a Ray actor class.
# A class marked this way can be instantiated on any node of the Ray cluster
# and accessed through its actor handle.
@ray.remote
class ProgressActor:
    def __init__(self, total_num_samples: int):
        self.total_num_samples = total_num_samples
        self.num_samples_completed_per_task = {}

    def report_progress(self, task_id: int, num_samples_completed: int) -> None:
        self.num_samples_completed_per_task[task_id] = num_samples_completed

    def get_progress(self) -> float:
        return (
            sum(self.num_samples_completed_per_task.values()) / self.total_num_samples
        )
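# report_progress() updates the progress of a single sampling task.
# get_progress() returns the overall progress.
# Both methods are called on the actor instance running in the cluster;
# they update the actor's state and return the updated result.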
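To make the actor's bookkeeping easier to follow, here is a Ray-free sketch of the same state logic. `ProgressTracker` is a hypothetical stand-in name; it only mirrors the dictionary handling and does not reproduce how Ray runs an actor.

```python
class ProgressTracker:
    """Plain-Python stand-in for the actor's state (hypothetical, no Ray involved)."""

    def __init__(self, total_num_samples: int) -> None:
        self.total_num_samples = total_num_samples
        self.num_samples_completed_per_task = {}

    def report_progress(self, task_id: int, num_samples_completed: int) -> None:
        # Overwrite rather than add: each report carries the task's cumulative count.
        self.num_samples_completed_per_task[task_id] = num_samples_completed

    def get_progress(self) -> float:
        return sum(self.num_samples_completed_per_task.values()) / self.total_num_samples


tracker = ProgressTracker(total_num_samples=200)
tracker.report_progress(task_id=0, num_samples_completed=50)
tracker.report_progress(task_id=1, num_samples_completed=100)
halfway = tracker.get_progress()
tracker.report_progress(task_id=0, num_samples_completed=100)  # replaces the earlier 50
print(halfway, tracker.get_progress())
```

Because each report overwrites the previous value for its task, a repeated or late report does not double count samples.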

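3. Define the sampling task

With the actor defined, we now define a Ray task that draws num_samples samples and returns the number that fall inside the circle. Ray tasks are stateless functions. They execute asynchronously and can run in parallel.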

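# To turn an ordinary Python function into a Ray task, decorate it with ray.remote.
# The sampling task takes the progress actor handle as input and reports progress to it.
# The code below shows an example of calling actor methods from a task.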
@ray.remote
def sampling_task(num_samples: int, task_id: int,
                  progress_actor: ray.actor.ActorHandle) -> int:
    num_inside = 0
    for i in range(num_samples):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if math.hypot(x, y) <= 1:
            num_inside += 1

        # Report progress every 1 million samples.
        if (i + 1) % 1_000_000 == 0:
            # This is async.
            progress_actor.report_progress.remote(task_id, i + 1)

    # Report the final progress.
    progress_actor.report_progress.remote(task_id, num_samples)
    return num_inside
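The reporting cadence in the task above can be checked without Ray by swapping the actor call for an ordinary callback. In this sketch, `sample_with_reports`, `report_every`, `report`, and `seed` are hypothetical names introduced for illustration only.

```python
import random
from typing import Callable, List


def sample_with_reports(num_samples: int, report_every: int,
                        report: Callable[[int], None], seed: int = 0) -> int:
    """Same loop shape as the sampling task, with a callback instead of an actor call."""
    rng = random.Random(seed)
    num_inside = 0
    for i in range(num_samples):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            num_inside += 1
        # Report the cumulative count at fixed intervals.
        if (i + 1) % report_every == 0:
            report(i + 1)
    # Report the final count, which covers a trailing partial interval.
    report(num_samples)
    return num_inside


reports: List[int] = []
num_inside = sample_with_reports(2_500, report_every=1_000, report=reports.append)
print(reports)  # [1000, 2000, 2500]
```

When num_samples is an exact multiple of the interval, the final report repeats the last periodic one. That is harmless here because the progress actor overwrites the stored value per task.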

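4. Create the progress actor

Once the actor class is defined, we can create an instance of it. To create the progress actor, call ActorClass.remote() with the constructor arguments. This creates and runs the actor in a remote worker process. The return value of ActorClass.remote(...) is an actor handle, which can be used to call the actor's methods.

NUM_SAMPLING_TASKS = 20 is the number of sampling tasks. Each Ray task requests one CPU by default, so up to 20 tasks run in parallel when at least 20 CPUs are available.
NUM_SAMPLES_PER_TASK is the number of Monte Carlo samples drawn by each task.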

# Change this to match your cluster scale.
NUM_SAMPLING_TASKS = 20
NUM_SAMPLES_PER_TASK = 10_000_000 * 10
TOTAL_NUM_SAMPLES = NUM_SAMPLING_TASKS * NUM_SAMPLES_PER_TASK

# Create the progress actor.
progress_actor = ProgressActor.remote(TOTAL_NUM_SAMPLES)

While the sampling tasks run, the polling loop in step 6 prints the progress, for example:

Progress: 0%
Progress: 0%
Progress: 1%
Progress: 2%
...
Progress: 99%
Progress: 99%
Progress: 100%

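5. Execute the sampling tasks

Now that the task is defined, we can execute it asynchronously by calling remote() with its arguments. The call immediately returns an ObjectRef, which acts as a future, and the function then runs asynchronously in a remote worker process.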

# Create and execute all sampling tasks in parallel.
results = [
    sampling_task.remote(NUM_SAMPLES_PER_TASK, i, progress_actor)
    for i in range(NUM_SAMPLING_TASKS)
]
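For readers new to futures, the submit-now, fetch-later pattern used above can be illustrated with Python's standard library. This is only an analogy: it runs threads inside one process, whereas Ray runs tasks in separate worker processes that may live on other machines. `slow_square` is a hypothetical stand-in for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x: int) -> int:
    time.sleep(0.1)  # stand-in for real work
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns a Future immediately, much like .remote() returns an ObjectRef.
    futures = [pool.submit(slow_square, i) for i in range(4)]
    # result() blocks until the value is ready, much like ray.get().
    values = [f.result() for f in futures]

print(values)  # [0, 1, 4, 9]
```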

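6. Call the progress actor

While the sampling tasks are running, we can periodically call the actor's get_progress() method to query progress. To call an actor method, use actor_handle.method.remote(). The call immediately returns an ObjectRef as a future, and the method runs asynchronously in the remote actor process. To fetch the actual return value behind an ObjectRef, we use the blocking ray.get() call.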

# Query progress periodically.
while True:
    progress = ray.get(progress_actor.get_progress.remote())
    print(f"Progress: {int(progress * 100)}%")

    if progress == 1:
        break

    time.sleep(1)

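7. Compute π

Finally, we fetch the number of in-circle samples from the remote sampling tasks and compute π. Besides a single ObjectRef, ray.get() also accepts a list of ObjectRefs and returns a list of results.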

# Get all the sampling tasks results.
total_num_inside = sum(ray.get(results))
pi = (total_num_inside * 4) / TOTAL_NUM_SAMPLES
print(f"Estimated value of π is: {pi}")

An example result:

Estimated value of π is: 3.14154815

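The estimate differs from the true value of π (3.14159265...) by about 4.5e-5, which is accurate enough for the purpose of this example.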
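As a rough check on the quoted result, the sketch below compares its error with the standard error expected from a binomial model of the hit count. The sample count comes from the tutorial's constants (20 tasks of 100,000,000 samples each); the binomial error formula is standard statistics added here and is not part of the original tutorial.

```python
import math

total_num_samples = 20 * 100_000_000  # the tutorial's TOTAL_NUM_SAMPLES
pi_estimate = 3.14154815              # the sample output quoted in the tutorial

p = math.pi / 4                       # probability that a point lands inside the circle
# Standard error of 4 * (hits / N) when hits follows a binomial distribution.
std_error = 4 * math.sqrt(p * (1 - p) / total_num_samples)
abs_error = abs(pi_estimate - math.pi)

print(f"expected standard error: {std_error:.2e}")
print(f"observed error:          {abs_error:.2e}")
print(f"observed / expected:     {abs_error / std_error:.2f}")
```

The observed error is of the same order as the expected standard error, so the result looks statistically plausible. Because the error shrinks in proportion to 1/sqrt(N), gaining one more decimal digit takes roughly 100 times as many samples.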