Quick test guide for Hudi tables on k8s

Install the NFS service on Ubuntu

sudo apt-get install nfs-kernel-server

sudo vim /etc/exports

/data1/nfs/rootfs *(rw,sync,no_root_squash,no_subtree_check)

Explanation:

/data1/nfs/rootfs - the directory on the NFS server that is shared with NFS clients.

* - allow access from any network segment; a specific IP or subnet can be specified instead.

rw - clients that mount this directory get read and write access to the shared directory.

sync - data is written to memory and disk synchronously; the server replies only after the write is committed.

no_root_squash - the client's root user keeps full root privileges on the shared directory (it is not remapped to an anonymous user).

no_subtree_check - disables subtree checking; the server does not verify that each request stays inside the exported subtree.

Start the NFS service

Restart the rpcbind and nfs services. NFS is an RPC-based program, so its ports must be registered with rpcbind before it can be used.

sudo /etc/init.d/rpcbind restart 
sudo /etc/init.d/nfs-kernel-server restart
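
After restarting, the export can be re-applied and verified; a minimal check, using only the export configured above:

sudo exportfs -ra           # re-read /etc/exports
sudo exportfs -v            # list active exports with their options
showmount -e localhost      # confirm the share is visible to clients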

Deploy the StorageClass backed by the NFS service on k8s: nfs-client

helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/

helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
    --set nfs.server=192.168.49.1 \
    --set nfs.path=/data1/nfs/rootfs
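
A quick sanity check that the provisioner is up and the StorageClass exists (a sketch; nfs-client is the chart's default StorageClass name):

kubectl get pods | grep nfs-subdir-external-provisioner
kubectl get storageclass nfs-client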

Deploy MinIO on top of NFS

minio-pvc.yaml

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: minio-pvc
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 102400Mi

minio-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: minio
  name: minio
  namespace: default # Change this value to match the namespace metadata.name
spec:
  containers:
  - name: minio
    image: quay.io/minio/minio:latest
    command:
    - /bin/bash
    - -c
    args: 
    - minio server /data --console-address :9090
    volumeMounts:
    - mountPath: /data
      name: nfsvolume # Corresponds to the `spec.volumes` Persistent Volume
  volumes:
  - name: nfsvolume
    persistentVolumeClaim:
      claimName: minio-pvc

minio-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    expose: "true"
    app: minio
  name: minio
  namespace: default
spec:
  type: NodePort
  ports:
  - name: http1
    port: 9000
    protocol: TCP
    nodePort: 30012
  - name: http2
    port: 9090
    protocol: TCP
    nodePort: 30013
  selector:
    app: minio
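
Apply the three manifests above and confirm the pod is running (a minimal sketch; the file names follow the headings above):

kubectl apply -f minio-pvc.yaml
kubectl apply -f minio-pod.yaml
kubectl apply -f minio-service.yaml
kubectl get pod minio
# S3 API: NodePort 30012, web console: NodePort 30013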

Deploy JuiceFS on top of MinIO

Deploying JuiceFS also requires deploying MySQL (used here as the JuiceFS metadata engine).

mysql-pvc.yaml

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mysql-pvc
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5G

mysql-deployment.yaml

apiVersion: v1
kind: ReplicationController
metadata:
  name: mysql-rc
  namespace: default
  labels:
    name: mysql-rc
spec:
  replicas: 1
  selector:
    name: mysql-rc
  template:
    metadata:
      labels:
        name: mysql-rc
    spec:
      containers:
        - name: mysql
          image: mysql:5.7.39
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "root"
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql          # MySQL stores its data under this directory, so it must be persisted
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pvc                 # Name of the PVC to bind

mysql-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    expose: "true"
    name: mysql-rc
  name: mysql
spec:
  type: NodePort
  ports:
    - name: http
      port: 3306
      protocol: TCP
      nodePort: 30006
  selector:
    name: mysql-rc
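
Apply the MySQL manifests in the same way and wait for the pod to come up (sketch; file names follow the headings above):

kubectl apply -f mysql-pvc.yaml
kubectl apply -f mysql-deployment.yaml
kubectl apply -f mysql-service.yaml
kubectl get pods -l name=mysql-rc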

Initialize JuiceFS against MinIO

The default MinIO credentials are minioadmin/minioadmin. Create a bucket named juicefs in MinIO.

The default MySQL credentials are root/root. Create a database named juicefs in MySQL.

juicefs format --storage=minio --bucket=http://192.168.1.2:9000/juicefs --access-key=minioadmin --secret-key=minioadmin "mysql://root:root@(192.168.1.2:3306)/juicefs" juicefsminio
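
To confirm the file system was formatted, the metadata engine can be queried (a minimal sketch; the addresses match the example above):

juicefs status "mysql://root:root@(192.168.1.2:3306)/juicefs"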

Deploy the Flink job

Create a ConfigMap named core-site in the default namespace from the following core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <property>
        <name>fs.s3a.endpoint</name>
        <value>http://192.168.1.2:9000</value>
        <description>AWS S3 endpoint to connect to. An up-to-date list is
            provided in the AWS Documentation: regions and endpoints. Without this
            property, the standard region (s3.amazonaws.com) is assumed.
        </description>
    </property>

    <property>
        <name>fs.s3a.access.key</name>
        <value>PSBZMLL1NXZYCX55QMBI</value>
    </property>

    <property>
        <name>fs.s3a.secret.key</name>
        <value>CNACTHv4+fPHvYT7gwaKCyWR7K96zHXNU+f9yccJ</value>
    </property>

    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
        <description>Enable S3 path style access ie disabling the default virtual hosting behaviour.
            Useful for S3A-compliant storage providers as it removes the need to set up DNS for virtual hosting.
        </description>
    </property>

    <property>
        <name>fs.s3a.aws.credentials.provider</name>
        <value>
            org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
        </value>
        <description>
            Comma-separated class names of credential provider classes which implement
            com.amazonaws.auth.AWSCredentialsProvider.

            When S3A delegation tokens are not enabled, this list will be used
            to directly authenticate with S3 and DynamoDB services.
            When S3A Delegation tokens are enabled, depending upon the delegation
            token binding it may be used
            to communicate with the STS endpoint to request session/role
            credentials.

            These are loaded and queried in sequence for a valid set of credentials.
            Each listed class must implement one of the following means of
            construction, which are attempted in order:
            * a public constructor accepting java.net.URI and
            org.apache.hadoop.conf.Configuration,
            * a public constructor accepting org.apache.hadoop.conf.Configuration,
            * a public static method named getInstance that accepts no
            arguments and returns an instance of
            com.amazonaws.auth.AWSCredentialsProvider, or
            * a public default constructor.

            Specifying org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider allows
            anonymous access to a publicly accessible S3 bucket without any credentials.
            Please note that allowing anonymous access to an S3 bucket compromises
            security and therefore is unsuitable for most use cases. It can be useful
            for accessing public data sets without requiring AWS credentials.

            If unspecified, then the default list of credential provider classes,
            queried in sequence, is:
            * org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider: looks
            for session login secrets in the Hadoop configuration.
            * org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider:
            Uses the values of fs.s3a.access.key and fs.s3a.secret.key.
            * com.amazonaws.auth.EnvironmentVariableCredentialsProvider: supports
            configuration of AWS access key ID and secret access key in
            environment variables named AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
            and AWS_SESSION_TOKEN as documented in the AWS SDK.
            * org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider: picks up
            IAM credentials of any EC2 VM or AWS container in which the process is running.
        </description>
    </property>

    <property>
        <name>fs.defaultFS</name>
        <value>jfs://juicefsminio/hudi-dir</value>
        <description>Optional, you can also specify full path "jfs://myjfs/path-to-dir" with location to use JuiceFS</description>
    </property>
    <property>
        <name>fs.jfs.impl</name>
        <value>io.juicefs.JuiceFileSystem</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.jfs.impl</name>
        <value>io.juicefs.JuiceFS</value>
    </property>
    <property>
        <name>juicefs.meta</name>
        <value>mysql://root:root@(192.168.1.2:3306)/juicefs</value>
    </property>
    <property>
        <name>juicefs.cache-dir</name>
        <value>/tmp/juicefs-cache-dir</value>
    </property>
    <property>
        <name>juicefs.cache-size</name>
        <value>1024</value>
    </property>
    <property>
        <name>juicefs.access-log</name>
        <value>/tmp/juicefs.access.log</value>
    </property>


</configuration>

Create the Flink job based on the core-site ConfigMap and flink-kubernetes-operator

If the job name is basic-example, a ConfigMap named hadoop-config-basic-example must also be created from the core-site.xml above, as shown in the commands below.
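
Assuming the XML above is saved locally as core-site.xml, both ConfigMaps can be created like this (a minimal sketch):

kubectl create configmap core-site --from-file=core-site.xml -n default
kubectl create configmap hadoop-config-basic-example --from-file=core-site.xml -n default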

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: xiaozhch5/flink-sql-submit:hudi-0.12-juicefs
  flinkVersion: v1_15
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    s3.endpoint: "http://192.168.1.2:9000"
    s3.path.style.access: "true"
    s3.access.key: "PSBZMLL1NXZYCX55QMBI"
    s3.secret.key: "CNACTHv4+fPHvYT7gwaKCyWR7K96zHXNU+f9yccJ"
    state.backend.incremental: "true"
    execution.checkpointing.interval: "300000ms"
    state.savepoints.dir: "s3://flink-data/savepoints"
    state.checkpoints.dir: "s3://flink-data/checkpoints"
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 2
  job:
    jarURI: local:///opt/flink/lib/flink-sql-submit-1.0.jar
    args: ["-f", "s3://flink-tasks/k8s-flink-sql-test.sql", "-m", "streaming", "-e", "http://192.168.1.2:9000", "-a", "PSBZMLL1NXZYCX55QMBI", "-s", "CNACTHv4+fPHvYT7gwaKCyWR7K96zHXNU+f9yccJ"]
    parallelism: 2
    upgradeMode: stateless
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /opt/hadoop/etc/hadoop/
              name: core-site
      volumes:
        - name: core-site
          configMap:
            name: core-site
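
Once the SQL file referenced in the job args has been uploaded (see the upload sketch after the SQL below), the deployment can be submitted and checked (assuming the manifest above is saved as basic-example.yaml):

kubectl apply -f basic-example.yaml
kubectl get flinkdeployment basic-example
kubectl get pods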

The Flink SQL job is as follows:

CREATE TABLE Orders (
    order_number BIGINT,
    price        DECIMAL(32,2),
    order_time   TIMESTAMP(3),
    PRIMARY KEY (order_number) NOT ENFORCED
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10'
);

CREATE TABLE Orders_hudi (
    order_number BIGINT,
    price        DECIMAL(32,2),
    order_time   TIMESTAMP(3),
    PRIMARY KEY (order_number) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'jfs://juicefsminio/orders_hudi_2',
  'table.type' = 'MERGE_ON_READ'
);

insert into Orders_hudi select * from Orders;
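
The job expects this SQL at s3://flink-tasks/k8s-flink-sql-test.sql and writes checkpoints to the flink-data bucket. One way to create the buckets and upload the file is with the MinIO client mc (a sketch; the myminio alias and the default credentials are assumptions):

mc alias set myminio http://192.168.1.2:9000 minioadmin minioadmin
mc mb myminio/flink-tasks
mc mb myminio/flink-data
mc cp k8s-flink-sql-test.sql myminio/flink-tasks/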


Reposted from: blog.csdn.net/weixin_39636364/article/details/128343059