Machine learning inference with GitHub Actions: a high degree of automation from testing to deployment

GitHub Actions is an automation tool for building, testing, and deploying software. A quick example to understand what it can do: every time a Pull Request is created (with a certain tag), it can trigger a build of the application and then send a message to the senior developers, letting them review the code quickly.

Project repository:

https://github.com/gaborvecsei/Machine-Learning-Inference-With-GitHub-Actions

What will we create?

We will create a custom action and an automated workflow in the repository. The workflow uses the trained model and is triggered whenever a new comment is made on an issue. You will also find the model training and inference code in the repository. Nothing too hardcore: I chose the Iris dataset and a random forest classifier. This tree-ensemble model is trained to recognize flowers based on the length and width of their sepals and petals.

The model is trained in a Jupyter Notebook, whose code trains and serializes the model we will use for prediction. When an issue receives a comment, the GitHub Actions workflow is triggered. If the comment contains the prefix /predict, we parse the comment, make a prediction, and construct a response. In the last step, the message is sent back to the user by a bot, under the same issue. To make things more interesting, the whole custom action runs in a Docker container.
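The parsing step described above can be sketched in Python. The exact comment format is my assumption (the real parser lives in main.py in the repository); here I assume /predict is followed by four comma- or space-separated numbers:

```python
def parse_predict_comment(comment_body: str):
    """Parse a '/predict sl, sw, pl, pw' style comment into four floats.

    The separator convention is an assumption for illustration; the
    repository's actual parse_comment_input may differ.
    """
    if not comment_body.startswith("/predict"):
        raise ValueError("Comment is not a prediction request")
    payload = comment_body[len("/predict"):].strip()
    values = [float(v) for v in payload.replace(",", " ").split()]
    if len(values) != 4:
        raise ValueError("Expected 4 numbers: sepal/petal length and width")
    return tuple(values)

# Example: a user comments "/predict 5.1, 3.5, 1.4, 0.2"
print(parse_predict_comment("/predict 5.1, 3.5, 1.4, 0.2"))
```

Only comments that start with the prefix are treated as requests; everything else raises an error, which the workflow can turn into a friendly reply.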


We can identify the steps in the workflow and create separate actions for some of them. A workflow can contain multiple actions, but in this project we will use a single one.

Creating the action

As a first step, we create a file named action.yaml in the repository's root folder. In it, we describe the action's inputs, outputs, and runtime environment.

name: 'Prediction GitHub Action Test'
description: 'This is a sample with which you can run inference on a ML model with a toy dataset'
inputs:
  issue_comment_body:
    required: true
    description: 'This is the Github issue comment message'
  issue_number:
    required: true
    description: 'Number of the Github issue'
  issue_user:
    required: true
    description: 'The user who sent the comment'
outputs:
  issue_comment_reply:
    description: 'Reply to the request'
runs:
  using: 'docker'
  image: 'Dockerfile'
  args:
    - ${{ inputs.issue_comment_body }}
    - ${{ inputs.issue_number }}
    - ${{ inputs.issue_user }}

From top to bottom, you can see the 3 inputs and 1 output being defined. Finally, the runs key describes the environment in which our code will run: a Docker container, whose inputs are passed in as arguments. The container's entrypoint therefore has to accept these 3 arguments in the defined order.
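The actual entrypoint in the repository is entrypoint.sh, but the argument-order contract is easiest to see in a short Python sketch (the function name here is illustrative):

```python
import sys

def read_action_inputs(argv):
    """The action passes its inputs to the container as positional
    arguments, so they must be read in the order declared in
    action.yaml: comment body, issue number, then user."""
    issue_comment_body, issue_number, issue_user = argv[1:4]
    return issue_comment_body, issue_number, issue_user

if len(sys.argv) >= 4:
    body, number, user = read_action_inputs(sys.argv)
    print(f"user={user} issue=#{number} comment={body!r}")
```

If the order of the args list in action.yaml changes, this unpacking must change with it, which is why keeping the declaration and the entrypoint in sync matters.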

Container

If we take a closer look at the Dockerfile, we can see how the runtime environment is built. First, we install everything Python needs. Then we copy entrypoint.sh and make it executable so it can run inside the container. Finally, the serialized sklearn model file is copied into the container so we can use it for prediction. (In a real scenario you should not store the model file in the repository; this is only for a quick demo.)

FROM python:3.6

# Install python requirements
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt

# Setup Docker entrypoint script
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

# Copy the trained model
COPY random_forest_model.pkl /random_forest_model.pkl

ENTRYPOINT ["/entrypoint.sh"]
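The notebook that produces random_forest_model.pkl is in the repository; a minimal sketch of what that training-and-serialization step looks like (hyperparameters here are illustrative, not the notebook's):

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train the tree-ensemble model on the Iris sepal/petal measurements
iris = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(iris.data, iris.target)

# Serialize it to the file the Dockerfile copies into the image
with open("random_forest_model.pkl", "wb") as f:
    pickle.dump(model, f)
```

Pickling ties the artifact to the sklearn version used for training, which is another reason the requirements.txt installed in the container should pin the same versions as the notebook.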

Defining the workflow


An action cannot be used without a workflow. The workflow defines the different steps you want to take in your pipeline.

name: Demo
on: [issue_comment]

jobs:
   my_first_job:
       runs-on: ubuntu-latest
       name: Just a simple demo job
       steps:
           - name: Checkout
             uses: actions/checkout@master
           - name: See full payload (for debugging)
             env:
                 PAYLOAD: ${{ toJSON(github.event) }}
             run: echo "FULL PAYLOAD:\n${PAYLOAD}\n"
           - name: Run the container and make a prediction
             if: startsWith(github.event.comment.body, '/predict')
             uses: ./
             id: make_prediction
             with:
                 issue_comment_body: ${{ github.event.comment.body }}
                 issue_number: ${{ github.event.issue.number }}
                 issue_user: ${{ github.event.comment.user.login }}
           - name: Print the output from the container(for debugging)
             run: echo "The reply message is ${{steps.make_prediction.outputs.issue_comment_reply}}"
           - name: Send reply to issue for user
             env:
               GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
              run: bash issue_comment.sh "${{ steps.make_prediction.outputs.issue_comment_reply }}" "${{ github.event.issue.number }}"

First, on: [issue_comment] defines that I want this workflow to be triggered when a comment is received on an issue (any comment, on any issue, from anyone). Then runs-on: ubuntu-latest defines the type of VM the job runs on (it can be self-hosted or provided by GitHub). Next comes the interesting part: the steps I mentioned earlier.

  • Checkout step: here we check out the desired branch of the repository (this is also a GitHub action).
  • See full payload: I use this for debugging. After a comment is received under an issue, it shows the whole payload, including the comment body, the issue number, the user who left the comment, and so on.
  • Make a prediction: this is our custom action. The line if: startsWith(github.event.comment.body, '/predict') ensures the step only runs when there is a valid prediction request (one containing the /predict prefix). You can see that the inputs are defined under the with keyword, and their values are taken from the payload by their keys (such as github.event.comment.body).
  • Print the reply: the constructed reply is echoed to the log. It uses the output defined in the previous step: steps.make_prediction.outputs.issue_comment_reply.
  • Send the reply: the reply, which contains the prediction, is sent back to the user with the issue_comment.sh script.
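The repository's issue_comment.sh does the sending via shell; the REST call it has to make can be sketched in Python (the "owner/repo" value is a placeholder, and GITHUB_TOKEN is the secret the workflow injects):

```python
import json
import os
import urllib.request

def build_reply_request(repo: str, issue_number: int, message: str):
    """Build the POST request that creates an issue comment through the
    GitHub REST API: POST /repos/{repo}/issues/{number}/comments."""
    token = os.environ.get("GITHUB_TOKEN", "dummy-token")
    url = f"https://api.github.com/repos/{repo}/issues/{issue_number}/comments"
    data = json.dumps({"body": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github.v3+json",
        },
        method="POST",
    )

# Build (but do not send) a reply for issue #1 of a placeholder repo
req = build_reply_request("owner/repo", 1, "The prediction: **setosa**")
print(req.full_url)
```

Sending the request is then a single urllib.request.urlopen(req) call; building and sending are separated here so the payload can be inspected without network access.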

Each step runs on the specified ubuntu-latest runner, but our action runs inside the container we created. The container is built every time the workflow is triggered. (I could have cached the image so that a previously built one is reused on each run, but I was too lazy to add that to this example.)

Making predictions

There is one thing I haven't talked about yet: how is the prediction actually made? You can easily answer this by looking at the main.py script.

model = load_model("/random_forest_model.pkl")

try:
   sepal_length, sepal_width, petal_length, petal_width = parse_comment_input(args.issue_comment_body)
   predicted_class_id = make_prediction(model, sepal_length, sepal_width, petal_length, petal_width)
   predicted_class_name = map_class_id_to_name(predicted_class_id)
   reply_message = f"Hey @{args.issue_user}!<br>This was your input: {args.issue_comment_body}.<br>The prediction: **{predicted_class_name}**"
except Exception as e:
   reply_message = f"Hey @{args.issue_user}! There was a problem with your input. The error: {e}"

print(f"::set-output name=issue_comment_reply::{reply_message}")
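The helpers make_prediction and map_class_id_to_name are not shown in the excerpt above; a minimal sketch of what they might look like, assuming a pickled sklearn-style model:

```python
# Iris target ids as defined by sklearn's load_iris
IRIS_CLASS_NAMES = {0: "setosa", 1: "versicolor", 2: "virginica"}

def make_prediction(model, sepal_length, sepal_width, petal_length, petal_width):
    # sklearn expects a 2D array of samples; take the single predicted class id
    features = [[sepal_length, sepal_width, petal_length, petal_width]]
    return int(model.predict(features)[0])

def map_class_id_to_name(class_id):
    return IRIS_CLASS_NAMES[class_id]
```

Note that ::set-output in the last line of the excerpt is how the script hands issue_comment_reply back to the workflow so later steps can read it as steps.make_prediction.outputs.issue_comment_reply.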

Seeing the above, you might think it's all too simple, but every part can be swapped out: the input, the dataset, the model, the model storage, how the request is processed, and so on. For image input, for example, you could decode a base64 string and run it through a deep learning model stored in Git LFS. So, let's get started.



Origin blog.51cto.com/15060462/2675558