What Is Spark Operator?


Spark Operator is a custom controller for deploying and managing Apache Spark applications on Kubernetes.

Components

  1. Custom Resource Definitions (CRDs)
    • SparkApplication: defines a single Spark application
    • ScheduledSparkApplication: defines a Spark application that runs on a recurring schedule
  2. Operator
    • Watches the custom resources and creates, updates, and deletes the K8s objects they require
    • Manages the Spark driver and executor Pods
  3. Admission Webhook
    • Intercepts requests at the K8s API server before the object is stored and injects the settings the resource needs
    • When a Spark application Pod is created, injects settings such as environment variables and volume mounts
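
One rough way to confirm that the webhook component is registered is to list the cluster's mutating webhook configurations and filter them by name. This is only a sketch: the function name list_spark_webhooks is made up for illustration, and it assumes the configuration's name contains "spark", which depends on the Helm release name.

```shell
# Sketch: list mutating webhook configurations whose name mentions "spark".
# Assumes kubectl is already configured for the target cluster.
list_spark_webhooks() {
  kubectl get mutatingwebhookconfigurations -o name | grep -i 'spark'
}
```

If nothing is printed, either the webhook is disabled or it is registered under a different name.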

How Spark Operator Works

  1. The user creates a SparkApplication
  2. The Operator detects it and creates the required K8s objects
  3. The Spark driver Pod is created
  4. The Spark driver requests and manages the executor Pods
  5. When the Spark job finishes, the Operator cleans up the resources
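
The steps above can be followed from the command line. The sketch below prints the state the Operator records on the custom resource and lists the Pods that belong to it. The function name spark_app_status is hypothetical, and the status path and the sparkoperator.k8s.io/app-name label are assumed to match the v1beta2 API of the installed operator version.

```shell
# Sketch: show the operator-reported state of a SparkApplication and its Pods.
# Usage: spark_app_status <app-name> <namespace>
spark_app_status() {
  local app="$1" ns="$2"
  # State recorded by the Operator on the custom resource (e.g. RUNNING, COMPLETED).
  kubectl get sparkapplication "$app" -n "$ns" \
    -o jsonpath='{.status.applicationState.state}{"\n"}'
  # Driver and executor Pods, selected by the application-name label.
  kubectl get pods -n "$ns" -l "sparkoperator.k8s.io/app-name=$app"
}
```

Running it repeatedly while a job is in progress should show the driver Pod appear first and the executor Pods after it.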

Installing Spark Operator


# Create the namespace
kubectl create namespace spark-operator

# Add the Helm repo and update it
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update

# Install
# Note: value names differ between chart versions; check them with
# `helm show values spark-operator/spark-operator` before installing.
helm install spark-operator spark-operator/spark-operator \
  -n spark-operator \
  --set enableWebhook=true \
  --set image.tag=v1beta2-1.1.27-3.5.0 \
  --set serviceAccounts.spark.create=true \
  --set serviceAccounts.spark.name=spark-sa

# Verify the installation
kubectl get pods -n spark-operator
kubectl get crds | grep sparkoperator

The Helm chart makes Spark Operator easy to install, and enabling the webhook lets it automatically inject the settings a Pod needs (labels, annotations, environment variables).
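
The value names passed to --set are not the same in every chart version, so it can help to look at what the chart actually exposes before installing. The sketch below prints the webhook-related part of the chart's default values. The function name chart_webhook_values is made up for illustration, and it assumes the spark-operator repo has already been added.

```shell
# Sketch: print webhook-related keys from the chart's default values,
# with a few lines of context after each match.
chart_webhook_values() {
  helm show values spark-operator/spark-operator | grep -i -A 3 'webhook'
}
```

The same approach works for any other key passed to --set, such as the image tag or the service account names.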

Running a Spark Job


Using the SparkApplication CRD

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-minimal
  namespace: data-jobs
spec:
  type: Scala
  mode: cluster
  image: gcr.io/spark-operator/spark:v3.5.0
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
  sparkVersion: 3.5.0
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-sa  # must exist in the data-jobs namespace
  executor:
    instances: 2
    cores: 1
    memory: 512m
  restartPolicy:
    type: Never
  timeToLiveSeconds: 600  # delete the SparkApplication (and the driver Pod it owns) 10 minutes after it finishes

When a SparkApplication resource is created from a YAML file, the Operator detects it and creates the driver Pod. When a Spark job is run through a SparkApplication, a TTL can be set so that Pods left in the Completed state are removed automatically.
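
As a minimal sketch of that flow, the function below applies a manifest, reads back the TTL stored on the resource, and prints the tail of the driver log. The function name submit_spark_app and the manifest file name in the usage note are hypothetical, and the <app-name>-driver Pod name is assumed to be the operator's default.

```shell
# Sketch: submit a SparkApplication manifest, then show its TTL and driver log tail.
# Usage: submit_spark_app <manifest.yaml> <app-name> <namespace>
submit_spark_app() {
  local manifest="$1" app="$2" ns="$3"
  # Stop early if the manifest is rejected.
  kubectl apply -f "$manifest" || return 1
  # Read back the TTL stored on the resource (empty if none was set).
  kubectl get sparkapplication "$app" -n "$ns" \
    -o jsonpath='{.spec.timeToLiveSeconds}{"\n"}'
  # Assumption: the driver Pod is named <app-name>-driver.
  # The Pod may not exist yet right after apply; rerun if it is not found.
  kubectl logs "${app}-driver" -n "$ns" --tail=20
}
```

For the manifest above, saved as spark-pi.yaml, the call would be submit_spark_app spark-pi.yaml spark-pi-minimal data-jobs.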