Fixing the startup failure of Kubeflow's bundled pipeline [Demo] XGBoost - Iterative model training

Preface

This tutorial is only for readers whose domestic servers cannot reach gcr.io, Google's container registry. If your server environment can access gcr.io directly, you can skip it.
The goal is not only to solve this one problem but also to show a way of thinking, so that when you hit a similar problem you know where to start troubleshooting. Of course, some experts can work everything out by reading the source code; I am just muddling through.

I installed Kubeflow from the manifests, version 1.6.1, the latest version at the time of writing.

Problem Description

I assume that even without access to gcr.io, you have managed to install the official version of Kubeflow. As you know, a large share of the container images on this platform come from gcr.io. How you get around the image problem is up to you; I assume everyone has their own tricks for obtaining images from gcr.io.

After a great deal of struggle, I finally saw all the pods in the Running state. Excited, I logged in to the web UI right away.
Oh, a demo is provided too? Let's try it.
After creating a run and starting it, you find: huh, why is it stuck?
Click the stuck step to take a look:

This step is in the Pending state with this message: ImagePullBackOff: Back-off pulling image "gcr.io/google-containers/busybox"
Fine, gcr.io is to blame again.
I have experience with this. I have installed all of Kubeflow, and it still needs a busybox image? The routine: get the image, load it onto the server, delete the pod, wait for it to initialize... and after a while... what? Why is the error still there? Something strange is going on.
Look at the pod's details. Anyone who works with K8s will understand: the image pull policy is Always, so the pod ignores the local image entirely and always goes to the registry to pull the latest one. Then I'll go to the server and change the policy to IfNotPresent with kubectl edit pod. Sorry! This pod is stubborn and does not allow that change. So I cannot reach the image registry, I cannot change the pull policy, and I cannot change where the image comes from. What now?
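The pull policy described above can also be read from the command line instead of the UI. The following is a minimal sketch, not the method used in the original post: the function name `show_pull_policy` is hypothetical, and the pod name and namespace are values you supply yourself.

```shell
# Print "<image><TAB><imagePullPolicy>" for every container in a pod.
# $1: pod name, $2: namespace (both are values you supply).
show_pull_policy() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.image}{"\t"}{.imagePullPolicy}{"\n"}{end}'
}
```

Call it as `show_pull_policy <pod-name> <namespace>`. A line ending in Always means the local copy of the image will not be used.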

Solution

Since this is a pipeline, it must be created from something: a configuration file? A pipeline file? Whatever is configured must be written down somewhere. First, I looked through the source code of this pipeline; its address is in the demo description. After reading it over and over, I found no mention of gcr.io/google-containers/busybox in the source code or its related files. That means the image is part of Kubeflow's own configuration, not the pipeline's. So I searched the Kubeflow source for it, and sure enough it is in a Kubeflow file. It is a ConfigMap resource file named pipeline-install-config.
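If you have a local checkout of the Kubeflow manifests, a plain text search is one way to locate where an image is configured. This is only a sketch: the function name `find_image_refs` is hypothetical, and the checkout path is whatever directory you cloned into.

```shell
# Recursively list every file and line that mentions an image reference.
# $1: image reference to search for, $2: root directory of the checkout.
find_image_refs() {
  grep -rnF -- "$1" "$2"
}
```

For example, `find_image_refs gcr.io/google-containers/busybox <path-to-manifests>` prints matching file names with line numbers.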

Then go to the server and look at this ConfigMap:

kubectl edit configmaps pipeline-install-config -n kubeflow
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  ConMaxLifeTime: 120s
  appName: pipeline
  appVersion: 2.0.0-alpha.5
  autoUpdatePipelineDefaultVersion: "true"
  bucketName: mlpipeline
  cacheDb: cachedb
  cacheImage: gcr.io/google-containers/busybox
  cacheNodeRestrictions: "false"
  cronScheduleTimezone: UTC
  dbHost: mysql
  dbPort: "3306"
  defaultPipelineRoot: ""
  mlmdDb: metadb
  pipelineDb: mlpipeline
  warning: |
    1. Do not use kubectl to edit this configmap, because some values are used
    during kustomize build. Instead, change the configmap and apply the entire
    kustomize manifests again.
    2. After updating the configmap, some deployments may need to be restarted
    until the changes take effect. A quick way to restart all deployments in a
    namespace: `kubectl rollout restart deployment -n <your-namespace>`.
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"ConMaxLifeTime":"120s","appName":"pipeline","appVersion":"2.0.0-alpha.5","autoUpdatePipelineDefaultVersion":"true","bucketName":"mlpipeline","cacheDb":"cachedb","cacheImage":"gcr.io/google-containers/busybox","cacheNodeRestrictions":"false","cronScheduleTimezone":"UTC","dbHost":"mysql","dbPort":"3306","defaultPipelineRoot":"","mlmdDb":"metadb","pipelineDb":"mlpipeline","warning":"1. Do not use kubectl to edit this configmap, because some values are used\nduring kustomize build. Instead, change the configmap and apply the entire\nkustomize manifests again.\n2. After updating the configmap, some deployments may need to be restarted\nuntil the changes take effect. A quick way to restart all deployments in a\nnamespace: `kubectl rollout restart deployment -n \u003cyour-namespace\u003e`.\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"ml-pipeline","app.kubernetes.io/name":"kubeflow-pipelines","application-crd-id":"kubeflow-pipelines"},"name":"pipeline-install-config","namespace":"kubeflow"}}
  creationTimestamp: "2022-10-25T08:51:08Z"
  labels:
    app.kubernetes.io/component: ml-pipeline
    app.kubernetes.io/name: kubeflow-pipelines
    application-crd-id: kubeflow-pipelines
  name: pipeline-install-config
  namespace: kubeflow
  resourceVersion: "16139565"
  uid: 9ce56da0-48f1-497b-897d-876ffc974892
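The key that matters in the output above is cacheImage. To read just that value without opening the editor, something like the following should work; the function name `get_cache_image` is hypothetical, while the ConfigMap name, namespace, and key are taken from the output above.

```shell
# Print only the cacheImage value of the pipeline-install-config ConfigMap.
get_cache_image() {
  kubectl get configmap pipeline-install-config -n kubeflow \
    -o jsonpath='{.data.cacheImage}'
}
```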

Now that we have found this, the rest is relatively simple. We already have the original image, gcr.io/google-containers/busybox, so we only need to tag it, push it to our own image registry, and then change the image in the ConfigMap to our own.
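The tag, push, and ConfigMap change can be sketched as two small shell functions. Assumptions: Docker is the image tool on your server, the function names `mirror_image` and `set_cache_image` are hypothetical, and the target registry (for example registry.example.com/mirror/busybox:latest) is a placeholder for your own. Note that the ConfigMap's own warning recommends changing the manifests and re-applying them with kustomize; the patch here is just a scripted equivalent of a manual kubectl edit.

```shell
# Tag a locally loaded image for your own registry and push it.
# $1: source image, $2: target reference in your registry (placeholder).
mirror_image() {
  docker tag "$1" "$2" && docker push "$2"
}

# Point cacheImage in pipeline-install-config at the mirrored image.
# $1: new image reference.
set_cache_image() {
  kubectl patch configmap pipeline-install-config -n kubeflow \
    --type merge -p "{\"data\":{\"cacheImage\":\"$1\"}}"
}
```

Run `mirror_image gcr.io/google-containers/busybox <your-image>` first, then `set_cache_image <your-image>`.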
Do you think the pipeline will run once that is done? No. You also need to check which deployments load this ConfigMap. After sorting through them, these are the ones:
ml-pipeline-scheduledworkflow
ml-pipeline
metadata-grpc-deployment
kubeflow-pipelines-profile-controller
cache-server
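One way to produce a list like the one above is to dump each deployment's YAML and look for a reference to the ConfigMap by name. This is a rough sketch with a hypothetical function name, `deployments_using_configmap`; it is a plain text match, so it may also report a deployment that references a ConfigMap whose name merely starts with the same string.

```shell
# List deployments whose spec mentions a ConfigMap by name.
# $1: ConfigMap name, $2: namespace.
deployments_using_configmap() {
  for d in $(kubectl get deployments -n "$2" -o name); do
    # Text match on "name: <configmap>" in the rendered YAML.
    if kubectl get "$d" -n "$2" -o yaml | grep -qF -- "name: $1"; then
      echo "$d"
    fi
  done
}
```

For this post the call would be `deployments_using_configmap pipeline-install-config kubeflow`.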
Then open one of them and take a look. It loads the ConfigMap as environment variables.
If you are familiar with ConfigMaps, you know that when their contents are loaded as environment variables, new values do not take effect until the pod is restarted.
Restart these pods one by one.
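Restarting can be scripted with kubectl rollout restart, the same command the ConfigMap's warning text mentions. The sketch below assumes all five deployments listed above live in the kubeflow namespace; the function name `restart_kfp_deployments` is hypothetical.

```shell
# Restart the deployments that read pipeline-install-config,
# so their pods pick up the new environment values.
restart_kfp_deployments() {
  for d in ml-pipeline-scheduledworkflow ml-pipeline metadata-grpc-deployment \
           kubeflow-pipelines-profile-controller cache-server; do
    kubectl rollout restart deployment "$d" -n kubeflow
  done
}
```

Restarting only these five is less disruptive than restarting every deployment in the namespace.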
Okay, let's run the pipeline from the web UI again and take a look.

Origin blog.csdn.net/Mrheiiow/article/details/127964982