I. Overview
Before Hadoop 2.0.0, an HDFS cluster had only one NameNode, which made it a single point of failure. If the NameNode machine went down, the entire cluster became unusable until the NameNode was restarted. Planned maintenance also required taking the whole cluster offline first, so 24/7 availability was impossible. Hadoop 2.0 and later versions added a NameNode high-availability (HA) mechanism. This article focuses on deploying Hadoop HA in a Kubernetes (k8s) environment.
For deploying Hadoop on k8s without HA, see my article: [Cloud Native] Hadoop on k8s environment deployment
For Hadoop HA outside k8s, see my article: Big Data Hadoop - Hadoop 3.3.4 HA (High Availability) Principle and Implementation (QJM)
[Figures: HDFS HA, YARN HA]
II. Start deployment
This deployment adapts the non-HA chart from the article referenced above. If you are not familiar with that chart, read that article first.
1) Add the JournalNode orchestration
1. Controller: StatefulSet
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ include "hadoop.fullname" . }}-hdfs-jn
  annotations:
    checksum/config: {{ include (print $.Template.BasePath "/hadoop-configmap.yaml") . | sha256sum }}
  labels:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    helm.sh/chart: {{ include "hadoop.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-jn
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "hadoop.name" . }}
      app.kubernetes.io/instance: {{ .Release.Name }}
      app.kubernetes.io/component: hdfs-jn
  # Governing Service: gives every JournalNode pod a stable DNS name
  serviceName: {{ include "hadoop.fullname" . }}-hdfs-jn
  replicas: {{ .Values.hdfs.jounralNode.replicas }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "hadoop.name" . }}
        app.kubernetes.io/instance: {{ .Release.Name }}
        app.kubernetes.io/component: hdfs-jn
    spec:
      affinity:
        # Spread JournalNodes across hosts: "hard" requires it, "soft" only prefers it
        podAntiAffinity:
        {{- if eq .Values.antiAffinity "hard" }}
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: {{ include "hadoop.name" . }}
                app.kubernetes.io/instance: {{ .Release.Name }}
                app.kubernetes.io/component: hdfs-jn
        {{- else if eq .Values.antiAffinity "soft" }}
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 5
            podAffinityTerm:
              topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: {{ include "hadoop.name" . }}
                  app.kubernetes.io/instance: {{ .Release.Name }}
                  app.kubernetes.io/component: hdfs-jn
        {{- end }}
      terminationGracePeriodSeconds: 0
      containers:
      - name: hdfs-jn
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
        command:
        - "/bin/bash"
        - "/opt/apache/tmp/hadoop-config/bootstrap.sh"
        - "-d"
        resources:
{{ toYaml .Values.hdfs.jounralNode.resources | indent 10 }}
        # 8485 is the JournalNode RPC port (dfs.journalnode.rpc-address)
        readinessProbe:
          tcpSocket:
            port: 8485
          initialDelaySeconds: 10
          timeoutSeconds: 2
        livenessProbe:
          tcpSocket:
            port: 8485
          initialDelaySeconds: 10
          timeoutSeconds: 2
        volumeMounts:
        - name: hadoop-config
          mountPath: /opt/apache/tmp/hadoop-config
        {{- range .Values.persistence.journalNode.volumes }}
        - name: {{ .name }}
          mountPath: {{ .mountPath }}
        {{- end }}
        securityContext:
          runAsUser: {{ .Values.securityContext.runAsUser }}
          privileged: {{ .Values.securityContext.privileged }}
      volumes:
      - name: hadoop-config
        configMap:
          name: {{ include "hadoop.fullname" . }}
  {{- if .Values.persistence.journalNode.enabled }}
  volumeClaimTemplates:
  {{- range .Values.persistence.journalNode.volumes }}
  - metadata:
      name: {{ .name }}
      labels:
        app.kubernetes.io/name: {{ include "hadoop.name" $ }}
        helm.sh/chart: {{ include "hadoop.chart" $ }}
        app.kubernetes.io/instance: {{ $.Release.Name }}
        app.kubernetes.io/component: hdfs-jn
    spec:
      accessModes:
      - {{ $.Values.persistence.journalNode.accessMode | quote }}
      resources:
        requests:
          storage: {{ $.Values.persistence.journalNode.size | quote }}
    {{- if $.Values.persistence.journalNode.storageClass }}
    {{- if (eq "-" $.Values.persistence.journalNode.storageClass) }}
      storageClassName: ""
    {{- else }}
      storageClassName: "{{ $.Values.persistence.journalNode.storageClass }}"
    {{- end }}
    {{- end }}
  {{- end }}
  {{- else }}
      # Persistence disabled: back each mounted volume with an emptyDir
      {{- range .Values.persistence.journalNode.volumes }}
      - name: {{ .name }}
        emptyDir: {}
      {{- end }}
  {{- end }}
```
2. Service
```yaml
# A headless service to create DNS records
apiVersion: v1
kind: Service
metadata:
  name: {{ include "hadoop.fullname" . }}-hdfs-jn
  labels:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    helm.sh/chart: {{ include "hadoop.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-jn
spec:
  ports:
  - name: jn
    port: {{ .Values.service.journalNode.ports.jn }}
    protocol: TCP
    {{- if and (eq .Values.service.journalNode.type "NodePort") .Values.service.journalNode.nodePorts.jn }}
    nodePort: {{ .Values.service.journalNode.nodePorts.jn }}
    {{- end }}
  type: {{ .Values.service.journalNode.type }}
  {{- if eq .Values.service.journalNode.type "ClusterIP" }}
  # clusterIP: None makes the Service headless, so each StatefulSet pod gets its own DNS record
  clusterIP: None
  {{- end }}
  selector:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-jn
```
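The NameNodes reach the JournalNodes through the StatefulSet's stable pod DNS names. `dfs.namenode.shared.edits.dir` joins those names into a URI of the form `qjournal://host1:port1;host2:port2;host3:port3/journalId`. The shell sketch below builds that URI from the StatefulSet name, namespace and replica count.

It rests on several assumptions. The governing Service is headless, so per-pod records exist. The cluster DNS domain is `cluster.local`. The release name `hadoop-ha` produces the `hadoop-ha-hadoop` prefix seen in the pod names of this article. `myhdfs` is a hypothetical nameservice (journal) ID. Compare the result with the real value in your `hadoop-configmap.yaml`.

```shell
#!/bin/sh
# Sketch: build dfs.namenode.shared.edits.dir from the JournalNode StatefulSet.
# Assumes a headless governing Service and the cluster DNS domain "cluster.local".

# jn_qjournal_uri <statefulset> <namespace> <replicas> <port> <journal-id>
jn_qjournal_uri() {
  local sts="$1" ns="$2" replicas="$3" port="$4" journal_id="$5"
  local hosts="" host i=0
  while [ "$i" -lt "$replicas" ]; do
    # Stable pod DNS name: <pod>.<service>.<namespace>.svc.cluster.local
    # (in this chart the Service has the same name as the StatefulSet)
    host="${sts}-${i}.${sts}.${ns}.svc.cluster.local:${port}"
    # JournalNode addresses are separated by ';'
    if [ -z "$hosts" ]; then hosts="$host"; else hosts="${hosts};${host}"; fi
    i=$((i + 1))
  done
  echo "qjournal://${hosts}/${journal_id}"
}

# Values from this article; "myhdfs" is a hypothetical nameservice ID
jn_qjournal_uri hadoop-ha-hadoop-hdfs-jn hadoop-ha 3 8485 myhdfs
```

With the values in this article, the first entry of the URI would be `hadoop-ha-hadoop-hdfs-jn-0.hadoop-ha-hadoop-hdfs-jn.hadoop-ha.svc.cluster.local:8485`.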
2) Modify the configuration
1. Modify values.yaml
```yaml
image:
  repository: myharbor.com/bigdata/hadoop
  tag: 3.3.2
  pullPolicy: IfNotPresent

# The version of the hadoop libraries being used in the image.
hadoopVersion: 3.3.2
logLevel: INFO

# Select antiAffinity as either hard or soft, default is soft
antiAffinity: "soft"

hdfs:
  nameNode:
    # Two NameNodes: one active, one standby
    replicas: 2
    pdbMinAvailable: 1
    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

  dataNode:
    # Will be used as dfs.datanode.hostname
    # You still need to set up services + ingress for every DN
    # Datanodes will expect to
    externalHostname: example.com
    externalDataPortRangeStart: 9866
    externalHTTPPortRangeStart: 9864
    replicas: 3
    pdbMinAvailable: 1
    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

  webhdfs:
    enabled: true

  # Key is spelled "jounralNode" to match .Values.hdfs.jounralNode in the templates
  jounralNode:
    # Use an odd number (at least 3): edits need a majority of JournalNodes
    replicas: 3
    pdbMinAvailable: 1
    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

yarn:
  resourceManager:
    pdbMinAvailable: 1
    # Two ResourceManagers for YARN HA
    replicas: 2
    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "2000m"
  nodeManager:
    pdbMinAvailable: 1
    # The number of YARN NodeManager instances.
    replicas: 1
    # Create statefulsets in parallel (K8S 1.7+)
    parallelCreate: false
    # CPU and memory resources allocated to each node manager pod.
    # This should be tuned to fit your workload.
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

persistence:
  nameNode:
    enabled: true
    storageClass: "hadoop-ha-nn-local-storage"
    accessMode: ReadWriteOnce
    size: 1Gi
    local:
    - name: hadoop-ha-nn-0
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/nn/data/data1"
    - name: hadoop-ha-nn-1
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/nn/data/data1"
  dataNode:
    enabled: true
    enabledStorageClass: false
    storageClass: "hadoop-ha-dn-local-storage"
    accessMode: ReadWriteOnce
    size: 1Gi
    local:
    - name: hadoop-ha-dn-0
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data1"
    - name: hadoop-ha-dn-1
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data2"
    - name: hadoop-ha-dn-2
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data3"
    - name: hadoop-ha-dn-3
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data1"
    - name: hadoop-ha-dn-4
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data2"
    - name: hadoop-ha-dn-5
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data3"
    - name: hadoop-ha-dn-6
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data1"
    - name: hadoop-ha-dn-7
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data2"
    - name: hadoop-ha-dn-8
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data3"
    volumes:
    - name: dfs1
      mountPath: /opt/apache/hdfs/datanode1
      hostPath: /opt/bigdata/servers/hadoop-ha/dn/data/data1
    - name: dfs2
      mountPath: /opt/apache/hdfs/datanode2
      hostPath: /opt/bigdata/servers/hadoop-ha/dn/data/data2
    - name: dfs3
      mountPath: /opt/apache/hdfs/datanode3
      hostPath: /opt/bigdata/servers/hadoop-ha/dn/data/data3
  journalNode:
    enabled: true
    storageClass: "hadoop-ha-jn-local-storage"
    accessMode: ReadWriteOnce
    size: 1Gi
    local:
    - name: hadoop-ha-jn-0
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/jn/data/data1"
    - name: hadoop-ha-jn-1
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/jn/data/data1"
    - name: hadoop-ha-jn-2
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/jn/data/data1"
    volumes:
    - name: jn
      mountPath: /opt/apache/hdfs/journalnode

service:
  nameNode:
    type: NodePort
    ports:
      dfs: 9000
      webhdfs: 9870
    nodePorts:
      dfs: 30900
      webhdfs: 30870
  dataNode:
    type: NodePort
    ports:
      webhdfs: 9864
    nodePorts:
      webhdfs: 30864
  resourceManager:
    type: NodePort
    ports:
      web: 8088
    nodePorts:
      web: 30088
  journalNode:
    type: ClusterIP
    ports:
      jn: 8485
    nodePorts:
      jn: ""

securityContext:
  runAsUser: 9999
  privileged: true
```
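The JournalNode replica count follows the QJM quorum rule. Edits are committed only when a majority of JournalNodes acknowledge them, so N JournalNodes tolerate (N - 1) / 2 failures. This is why an odd number of at least three is used.

The shell sketch below does that arithmetic. It also checks whether a PodDisruptionBudget `minAvailable` still keeps a majority. The PDB check assumes the chart renders a PDB from `hdfs.jounralNode.pdbMinAvailable`, which is not shown in this article.

```shell
#!/bin/sh
# Sketch: QJM quorum arithmetic for JournalNodes (simple majority quorum).

# jn_quorum <replicas>: prints "<majority> <tolerated-failures>"
jn_quorum() {
  local n="$1"
  if [ "$n" -lt 3 ] || [ $((n % 2)) -eq 0 ]; then
    echo "warning: $n JournalNodes; an odd number of at least 3 is recommended" >&2
  fi
  echo "$((n / 2 + 1)) $(( (n - 1) / 2 ))"
}

# pdb_keeps_quorum <replicas> <minAvailable>: succeeds only if voluntary
# evictions allowed by the PDB can never drop below a majority
pdb_keeps_quorum() {
  [ "$2" -ge $(($1 / 2 + 1)) ]
}

# values.yaml above: hdfs.jounralNode.replicas=3, pdbMinAvailable=1
echo "majority and tolerated failures for 3 JournalNodes: $(jn_quorum 3)"
if pdb_keeps_quorum 3 1; then
  echo "pdbMinAvailable=1 preserves quorum"
else
  echo "pdbMinAvailable=1 can allow evictions below quorum"
fi
```

With three JournalNodes, `minAvailable: 1` would permit two voluntary evictions at once, leaving no majority. A value of 2 is the smallest that keeps one.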
2. Modify hadoop/templates/hadoop-configmap.yaml
The changes to this file are extensive, so they are not reproduced here. The Git repository link is given at the end of the article.
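Because the modified ConfigMap is not reproduced, the sketch below prints the core `hdfs-site.xml` properties that NameNode HA with QJM generally needs. The property names are standard Hadoop HA keys. The following are hypothetical and may differ from the author's ConfigMap:

- the nameservice `myhdfs`
- the NameNode IDs `nn0`/`nn1`
- the headless-Service DNS names
- the sample qjournal URI

Ports 9000 and 9870 come from `service.nameNode.ports` in values.yaml. The JournalNode edits directory matches `persistence.journalNode.volumes`.

```shell
#!/bin/sh
# Sketch: print the core NameNode-HA properties for hdfs-site.xml.
# Hypothetical: nameservice ID, NameNode IDs nn0/nn1, and headless-Service DNS names.

# xml_prop <name> <value>: print one Hadoop <property> element
xml_prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# hdfs_ha_props <nameservice> <nn-statefulset> <namespace> <rpc-port> <http-port> <qjournal-uri>
hdfs_ha_props() {
  local ns_id="$1" sts="$2" k8s_ns="$3" rpc="$4" http="$5" edits="$6" i host
  xml_prop dfs.nameservices "$ns_id"
  xml_prop "dfs.ha.namenodes.${ns_id}" "nn0,nn1"
  for i in 0 1; do
    # StatefulSet pod i is mapped to NameNode ID nn<i>
    host="${sts}-${i}.${sts}.${k8s_ns}.svc.cluster.local"
    xml_prop "dfs.namenode.rpc-address.${ns_id}.nn${i}" "${host}:${rpc}"
    xml_prop "dfs.namenode.http-address.${ns_id}.nn${i}" "${host}:${http}"
  done
  # Shared edit log kept on the JournalNode quorum
  xml_prop dfs.namenode.shared.edits.dir "$edits"
  # Where each JournalNode stores edits (mountPath in persistence.journalNode.volumes)
  xml_prop dfs.journalnode.edits.dir /opt/apache/hdfs/journalnode
  # Lets HDFS clients try both NameNodes and follow the active one
  xml_prop "dfs.client.failover.proxy.provider.${ns_id}" \
    org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
  # QJM only lets one NameNode write edits at a time; a no-op fencer is common
  xml_prop dfs.ha.fencing.methods 'shell(/bin/true)'
}

hdfs_ha_props myhdfs hadoop-ha-hadoop-hdfs-nn hadoop-ha 9000 9870 \
  'qjournal://jn-0:8485;jn-1:8485;jn-2:8485/myhdfs'
```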
3) Start the installation
```shell
# Create the storage directories (on every node that hosts a local volume in values.yaml)
mkdir -p /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}/data/data{1..3}
chmod -R 777 /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}
helm install hadoop-ha ./hadoop -n hadoop-ha --create-namespace
```
Check the pods and services:
```shell
kubectl get pods,svc -n hadoop-ha -owide
```
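In the output, every pod should be `Running` with all of its containers ready. The sketch below filters `kubectl get pods --no-headers` output down to the pods that are not. It reads only the NAME, READY and STATUS columns, so it also works with `-o wide`. The sample pod lines are made up for illustration.

```shell
#!/bin/sh
# not_ready_pods: read `kubectl get pods --no-headers` output on stdin and print
# each pod whose READY column is incomplete (e.g. 0/1) or whose STATUS is not Running.
not_ready_pods() {
  awk '{
    split($2, ready, "/")   # READY is "<ready>/<total>"
    if (ready[1] != ready[2] || $3 != "Running")
      print $1, $2, $3
  }'
}

# Example with hypothetical sample lines; prints: hadoop-ha-hadoop-hdfs-nn-1 0/1 Running
printf '%s\n' \
  'hadoop-ha-hadoop-hdfs-jn-0   1/1   Running   0   5m' \
  'hadoop-ha-hadoop-hdfs-nn-1   0/1   Running   0   5m' | not_ready_pods
```

Against the live cluster, pipe `kubectl get pods -n hadoop-ha --no-headers` into `not_ready_pods`. Empty output means every pod is up.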
HDFS WEB-nn1: http://192.168.182.110:31870/dfshealth.html#tab-overview
HDFS WEB-nn2: http://192.168.182.110:31871/dfshealth.html#tab-overview
YARN WEB-rm1: http://192.168.182.110:31088/cluster/cluster
YARN WEB-rm2: http://192.168.182.110:31089/cluster/cluster
4) Test and verify
Open a shell in the first NameNode pod:
```shell
kubectl exec -it hadoop-ha-hadoop-hdfs-nn-0 -n hadoop-ha -- bash
```
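Inside the NameNode pod, `hdfs haadmin -getServiceState <serviceId>` reports `active` or `standby` for one NameNode. A healthy HA pair has exactly one active NameNode. The sketch below checks that condition from `<id> <state>` lines. The IDs `nn0`/`nn1` are hypothetical and must match `dfs.ha.namenodes.<nameservice>` in your ConfigMap.

```shell
#!/bin/sh
# check_one_active: read "<namenode-id> <state>" lines on stdin, echo them,
# and succeed only if exactly one NameNode reports "active".
check_one_active() {
  local id state active=0
  while read -r id state; do
    echo "${id}: ${state}"
    if [ "$state" = "active" ]; then
      active=$((active + 1))
    fi
  done
  [ "$active" -eq 1 ]
}

# Example with hypothetical states
printf '%s\n' 'nn0 active' 'nn1 standby' | check_one_active && echo "HA pair OK"
```

In the pod, feed it real states with `for nn in nn0 nn1; do echo "$nn $(hdfs haadmin -getServiceState $nn)"; done | check_one_active`.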
5) Uninstall
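```shell
helm uninstall hadoop-ha -n hadoop-ha
# Force-delete any pods left behind in the namespace
kubectl delete pod -n hadoop-ha $(kubectl get pod -n hadoop-ha | awk 'NR>1{print $1}') --force
# Clear the namespace's metadata finalizers in case it hangs in Terminating
kubectl patch ns hadoop-ha -p '{"metadata":{"finalizers":null}}'
kubectl delete ns hadoop-ha --force
# Delete all NameNode, DataNode and JournalNode data (irreversible; run on every node)
rm -fr /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}/data/data{1..3}/*
```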
Git repository: gitee.com/hadoop-bigd…
That's it for deploying Hadoop HA on k8s. The explanations here are brief, so if you have any questions, please leave me a message. Some parts may not be perfect yet. I will keep improving this setup and add other services on top of it. More articles on [big data + cloud native] are coming, so please stay tuned~