Install the NFS service on Ubuntu
sudo apt-get install nfs-kernel-server
sudo vim /etc/exports
Add the following line to /etc/exports:
/data1/nfs/rootfs *(rw,sync,no_root_squash,no_subtree_check)
Explanation of the entry:
/data1/nfs/rootfs - Directory on the NFS server that is shared with NFS clients
* - Allows access from any host; a specific IP address or subnet can be used instead
rw - Clients that mount this directory can read and write to it
sync - The server replies to a request only after the change has been committed to stable storage
no_root_squash - Requests from the client's root user are not mapped to the anonymous user, so root on the client keeps root privileges on the shared directory
no_subtree_check - Disables subtree checking, so the server does not verify that each accessed file lies inside the exported directory tree
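Exporting to * lets any host that can reach the server mount the share. As a hedged sketch, the helper below prints an export entry limited to a single client spec. make_export_line is a hypothetical helper, and the subnet 192.168.49.0/24 is an assumption derived from the NFS server address used later in this guide, so adjust it to your own network.

```shell
# Print an /etc/exports entry for one directory and one client spec.
# make_export_line is a hypothetical helper, not part of nfs-kernel-server.
make_export_line() {
  dir="$1"      # directory to export
  clients="$2"  # host, IP address, or subnet allowed to mount it
  printf '%s %s(rw,sync,no_root_squash,no_subtree_check)\n' "$dir" "$clients"
}

# Assumed subnet; replace it with the network your Kubernetes nodes live on.
make_export_line /data1/nfs/rootfs 192.168.49.0/24
```

The printed line can replace the * entry in /etc/exports once the subnet has been checked.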
Start the NFS service
Restart the rpcbind and NFS services. NFS is an RPC program, so its ports must be registered with rpcbind before clients can use it.
sudo /etc/init.d/rpcbind restart
sudo /etc/init.d/nfs-kernel-server restart
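To confirm that the export is visible after the restart, the output of showmount -e can be checked for the shared path. This is a minimal sketch, assuming showmount (shipped in the nfs-common package) is installed. export_listed is a hypothetical helper that reads that output on stdin.

```shell
# Succeeds if the given path appears in `showmount -e` output read from stdin.
# export_listed is a hypothetical helper for illustration.
export_listed() {
  # Line 1 is the "Export list for <host>:" header; later lines start with a path.
  awk -v p="$1" 'NR > 1 && $1 == p { found = 1 } END { exit !found }'
}

# Live check on the NFS server (requires showmount):
#   showmount -e localhost | export_listed /data1/nfs/rootfs && echo "export is visible"

# Offline demonstration with sample output:
printf 'Export list for localhost:\n/data1/nfs/rootfs *\n' \
  | export_listed /data1/nfs/rootfs && echo "export is visible"
```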
Deploy the StorageClass for the NFS service on Kubernetes: nfs-client
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--set nfs.server=192.168.49.1 \
--set nfs.path=/data1/nfs/rootfs
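Before creating any claims, it may be worth checking that the nfs-client StorageClass used by the manifests in this guide actually exists. This is a minimal sketch, assuming kubectl is configured for the cluster. has_storageclass is a hypothetical helper that reads the output of kubectl get storageclass -o name on stdin.

```shell
# Succeeds if the named StorageClass appears in
# `kubectl get storageclass -o name` output read from stdin.
# has_storageclass is a hypothetical helper for illustration.
has_storageclass() {
  grep -qxF "storageclass.storage.k8s.io/$1"
}

# Live check (requires kubectl and cluster access):
#   kubectl get storageclass -o name | has_storageclass nfs-client && echo "nfs-client is available"

# Offline demonstration with sample output:
printf 'storageclass.storage.k8s.io/nfs-client\nstorageclass.storage.k8s.io/standard\n' \
  | has_storageclass nfs-client && echo "nfs-client is available"
```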
Deploy MinIO on NFS
crowd-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: minio-pvc
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 102400Mi
minio-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: minio
  name: minio
  namespace: default # Change this value to match the namespace metadata.name
spec:
  containers:
    - name: minio
      image: quay.io/minio/minio:latest
      command:
        - /bin/bash
        - -c
      args:
        - minio server /data --console-address :9090
      volumeMounts:
        - mountPath: /data
          name: nfsvolume # Corresponds to the `spec.volumes` Persistent Volume
  volumes:
    - name: nfsvolume
      persistentVolumeClaim:
        claimName: minio-pvc
minio-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    expose: "true"
    app: minio
  name: minio
  namespace: default
spec:
  type: NodePort
  ports:
    - name: http1
      port: 9000
      protocol: TCP
      nodePort: 30012
    - name: http2
      port: 9090
      protocol: TCP
      nodePort: 30013
  selector:
    app: minio
Deploy JuiceFS on MinIO
JuiceFS also needs a metadata engine. MySQL is used here, so deploy MySQL first.
mysql-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mysql-pvc
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5G
mysql-deployment.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: mysql-rc
  namespace: default
  labels:
    name: mysql-rc
spec:
  replicas: 1
  selector:
    name: mysql-rc
  template:
    metadata:
      labels:
        name: mysql-rc
    spec:
      containers:
        - name: mysql
          image: mysql:5.7.39
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "root"
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql # MySQL stores all of its data in this directory, so it must be persisted
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pvc # Name of the PVC to use
mysql-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    expose: "true"
    name: mysql-rc
  name: mysql
spec:
  type: NodePort
  ports:
    - name: http
      port: 3306
      protocol: TCP
      nodePort: 30006
  selector:
    name: mysql-rc
Initialize JuiceFS on MinIO
The default MinIO credentials are minioadmin/minioadmin. Create a bucket named juicefs in MinIO.
The MySQL credentials configured above are root/root. Create a database named juicefs in MySQL.
juicefs format --storage=minio --bucket=http://192.168.1.2:9000/juicefs --access-key=minioadmin --secret-key=minioadmin "mysql://root:root@(192.168.1.2:3306)/juicefs" juicefsminio
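Hard-coding the address and keys makes the format command awkward to reuse. The following hedged sketch assembles the same command from variables. juicefs_format_cmd is a hypothetical helper that only prints the command. It assumes, as in this guide, that MinIO and MySQL are reachable on the same address. The metadata URL is quoted because the shell would otherwise try to interpret the parentheses.

```shell
# Print (not run) the juicefs format command for a MinIO bucket and a MySQL
# metadata database. juicefs_format_cmd is a hypothetical helper.
juicefs_format_cmd() {
  host="$1"    # address serving both MinIO (9000) and MySQL (3306); an assumption
  access="$2"  # MinIO access key
  secret="$3"  # MinIO secret key
  printf 'juicefs format --storage=minio --bucket=http://%s:9000/juicefs --access-key=%s --secret-key=%s "mysql://root:root@(%s:3306)/juicefs" juicefsminio\n' \
    "$host" "$access" "$secret" "$host"
}

# Values taken from this guide; review the printed command before running it.
juicefs_format_cmd 192.168.1.2 minioadmin minioadmin
```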
Deploy Flink jobs
Create a ConfigMap named core-site in the default namespace from the following core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.s3a.endpoint</name>
<value>http://192.168.1.2:9000</value>
<description>AWS S3 endpoint to connect to. An up-to-date list is
provided in the AWS Documentation: regions and endpoints. Without this
property, the standard region (s3.amazonaws.com) is assumed.
</description>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>PSBZMLL1NXZYCX55QMBI</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>CNACTHv4+fPHvYT7gwaKCyWR7K96zHXNU+f9yccJ</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
<description>Enable S3 path style access ie disabling the default virtual hosting behaviour.
Useful for S3A-compliant storage providers as it removes the need to set up DNS for virtual hosting.
</description>
</property>
<property>
<name>fs.s3a.aws.credentials.provider</name>
<value>
org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
</value>
<description>
Comma-separated class names of credential provider classes which implement
com.amazonaws.auth.AWSCredentialsProvider.
When S3A delegation tokens are not enabled, this list will be used
to directly authenticate with S3 and DynamoDB services.
When S3A Delegation tokens are enabled, depending upon the delegation
token binding it may be used
to communicate with the STS endpoint to request session/role
credentials.
These are loaded and queried in sequence for a valid set of credentials.
Each listed class must implement one of the following means of
construction, which are attempted in order:
* a public constructor accepting java.net.URI and
org.apache.hadoop.conf.Configuration,
* a public constructor accepting org.apache.hadoop.conf.Configuration,
* a public static method named getInstance that accepts no
arguments and returns an instance of
com.amazonaws.auth.AWSCredentialsProvider, or
* a public default constructor.
Specifying org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider allows
anonymous access to a publicly accessible S3 bucket without any credentials.
Please note that allowing anonymous access to an S3 bucket compromises
security and therefore is unsuitable for most use cases. It can be useful
for accessing public data sets without requiring AWS credentials.
If unspecified, then the default list of credential provider classes,
queried in sequence, is:
* org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider: looks
for session login secrets in the Hadoop configuration.
* org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider:
Uses the values of fs.s3a.access.key and fs.s3a.secret.key.
* com.amazonaws.auth.EnvironmentVariableCredentialsProvider: supports
configuration of AWS access key ID and secret access key in
environment variables named AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
and AWS_SESSION_TOKEN as documented in the AWS SDK.
* org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider: picks up
IAM credentials of any EC2 VM or AWS container in which the process is running.
</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>jfs://juicefsminio/hudi-dir</value>
<description>Optional, you can also specify full path "jfs://myjfs/path-to-dir" with location to use JuiceFS</description>
</property>
<property>
<name>fs.jfs.impl</name>
<value>io.juicefs.JuiceFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.jfs.impl</name>
<value>io.juicefs.JuiceFS</value>
</property>
<property>
<name>juicefs.meta</name>
<value>mysql://root:root@(192.168.1.2:3306)/juicefs</value>
</property>
<property>
<name>juicefs.cache-dir</name>
<value>/tmp/juicefs-cache-dir</value>
</property>
<property>
<name>juicefs.cache-size</name>
<value>1024</value>
</property>
<property>
<name>juicefs.access-log</name>
<value>/tmp/juicefs.access.log</value>
</property>
</configuration>
Create the Flink job using the core-site ConfigMap and flink-kubernetes-operator
If the job is named basic-example, you also need to create a ConfigMap named hadoop-config-basic-example from the core-site.xml above.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: xiaozhch5/flink-sql-submit:hudi-0.12-juicefs
  flinkVersion: v1_15
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    s3.endpoint: "http://192.168.1.2:9000"
    s3.path.style.access: "true"
    s3.access.key: "PSBZMLL1NXZYCX55QMBI"
    s3.secret.key: "CNACTHv4+fPHvYT7gwaKCyWR7K96zHXNU+f9yccJ"
    state.backend.incremental: "true"
    execution.checkpointing.interval: "300000ms"
    state.savepoints.dir: "s3://flink-data/savepoints"
    state.checkpoints.dir: "s3://flink-data/checkpoints"
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 2
  job:
    jarURI: local:///opt/flink/lib/flink-sql-submit-1.0.jar
    args: ["-f", "s3://flink-tasks/k8s-flink-sql-test.sql", "-m", "streaming", "-e", "http://192.168.1.2:9000", "-a", "PSBZMLL1NXZYCX55QMBI", "-s", "CNACTHv4+fPHvYT7gwaKCyWR7K96zHXNU+f9yccJ"]
    parallelism: 2
    upgradeMode: stateless
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /opt/hadoop/etc/hadoop/
              name: core-site
      volumes:
        - name: core-site
          configMap:
            name: core-site
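Both ConfigMaps mentioned above can be created from the same file: core-site, which the pod template mounts, and hadoop-config-<deployment name>. This is a minimal sketch, assuming the XML above is saved as core-site.xml in the current directory. configmap_cmds is a hypothetical helper that only prints the kubectl commands.

```shell
# Print the kubectl commands that create both ConfigMaps for a FlinkDeployment.
# configmap_cmds is a hypothetical helper; core-site.xml in the current
# directory is an assumption.
configmap_cmds() {
  name="$1"  # metadata.name of the FlinkDeployment
  for cm in core-site "hadoop-config-$name"; do
    printf 'kubectl create configmap %s --from-file=core-site.xml -n default\n' "$cm"
  done
}

configmap_cmds basic-example
```

Once the printed commands look right, they can be run as they are or piped to sh.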
The Flink SQL job is:
CREATE TABLE Orders (
order_number BIGINT,
price DECIMAL(32,2),
order_time TIMESTAMP(3),
PRIMARY KEY (order_number) NOT ENFORCED
) WITH (
'connector' = 'datagen',
'rows-per-second' = '10'
);
CREATE TABLE Orders_hudi (
order_number BIGINT,
price DECIMAL(32,2),
order_time TIMESTAMP(3),
PRIMARY KEY (order_number) NOT ENFORCED
) WITH (
'connector' = 'hudi',
'path' = 'jfs://juicefsminio/orders_hudi_2',
'table.type' = 'MERGE_ON_READ'
);
insert into Orders_hudi select * from Orders;