Running Apache Spark on Kubernetes

Dear readers, good afternoon. Today we will talk a little about Apache Spark and its development prospects.


In today's Big Data world, Apache Spark is the de facto standard for developing batch data processing jobs. It is also used to build streaming applications based on the micro-batch concept, processing and shipping data in small portions (Spark Structured Streaming). Traditionally it has been part of the overall Hadoop stack, using YARN (or, in some cases, Apache Mesos) as the resource manager. By 2020, its use in this traditional form is in question for most companies due to the lack of decent Hadoop distributions: the development of HDP and CDH has stalled, CDH is insufficiently mature and carries a high cost, and the remaining Hadoop vendors have either ceased to exist or face an uncertain future. Therefore, launching Apache Spark on Kubernetes is of growing interest to the community and to large companies: having become the standard for container orchestration and resource management in private and public clouds, Kubernetes solves the problem of awkward resource scheduling of Spark jobs on YARN and provides a steadily developing platform with many commercial and open distributions for companies of all sizes and stripes. In addition, on the wave of its popularity, most teams have already acquired a couple of installations of their own and built up expertise in using it, which simplifies the move.

Starting with version 2.3.0, Apache Spark gained official support for running jobs on a Kubernetes cluster, and today we will talk about the current maturity of this approach, the various options for using it, and the pitfalls you will encounter during implementation.

First, let's look at the process of developing jobs and applications based on Apache Spark and highlight the typical cases in which you need to run a job on a Kubernetes cluster. In preparing this post, OpenShift is used as the distribution, and the commands relevant to its command-line utility (oc) will be given. For other Kubernetes distributions, the corresponding commands of the standard Kubernetes command-line utility (kubectl) or their analogues (for example, for oc adm policy) can be used.

Use case 1: spark-submit

While developing jobs and applications, a developer needs to run jobs to debug data transformations. In theory, stubs could be used for this purpose, but development involving real (albeit test) instances of the target systems has proven faster and better in this class of tasks. When debugging against real instances of the target systems, two scenarios are possible:

  • the developer runs the Spark job locally in standalone mode;


  • the developer runs the Spark job on a Kubernetes cluster in a test loop.


The first option has a right to exist, but comes with a number of drawbacks:

  • each developer must be given access from the workplace to all the instances of the target systems they need;
  • a sufficient amount of resources is required on the working machine to run the job being developed.

The second option lacks these drawbacks, since using a Kubernetes cluster lets you allocate the necessary resource pool for running jobs and grant it the required access to the target-system instances, flexibly providing access to that pool via the Kubernetes role model for all members of the development team. Let's highlight it as the first use case: launching Spark jobs from a local development machine on a Kubernetes cluster in a test loop.

Let's talk in more detail about the process of setting Spark up to run locally. To start using Spark, you need to install it:

mkdir /opt/spark
cd /opt/spark
wget http://mirror.linux-ia64.org/apache/spark/spark-2.4.5/spark-2.4.5.tgz
tar zxvf spark-2.4.5.tgz
rm -f spark-2.4.5.tgz

We build in the packages needed for working with Kubernetes:

cd spark-2.4.5/
./build/mvn -Pkubernetes -DskipTests clean package

A full build takes a lot of time, and to create Docker images and run them on a Kubernetes cluster you really only need the jar files from the "assembly/" directory, so you can build just that subproject:

./build/mvn -f ./assembly/pom.xml -Pkubernetes -DskipTests clean package

To run Spark jobs on Kubernetes, you need to create a Docker image to use as the base image. Two approaches are possible here:

  • the generated Docker image includes the executable Spark job code;
  • the image is built with only Spark and the necessary dependencies, and the executable code is hosted remotely (for example, in HDFS).

First, let's build a Docker image containing a test example of a Spark job. For creating Docker images, Spark ships with a utility called "docker-image-tool". Let's read its help:

./bin/docker-image-tool.sh --help

It can be used to create Docker images and upload them to remote registries, but by default it has a number of drawbacks:

  • it always creates 3 Docker images at once: for Spark, PySpark and R;
  • it does not allow specifying an image name.

Therefore, we will use a modified version of this utility, given below:

vi bin/docker-image-tool-upd.sh

#!/usr/bin/env bash

function error {
  echo "$@" 1>&2
  exit 1
}

if [ -z "${SPARK_HOME}" ]; then
  SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
. "${SPARK_HOME}/bin/load-spark-env.sh"

function image_ref {
  local image="$1"
  local add_repo="${2:-1}"
  if [ $add_repo = 1 ] && [ -n "$REPO" ]; then
    image="$REPO/$image"
  fi
  if [ -n "$TAG" ]; then
    image="$image:$TAG"
  fi
  echo "$image"
}

function build {
  local BUILD_ARGS
  local IMG_PATH

  if [ ! -f "$SPARK_HOME/RELEASE" ]; then
    IMG_PATH=$BASEDOCKERFILE
    BUILD_ARGS=(
      ${BUILD_PARAMS}
      --build-arg
      img_path=$IMG_PATH
      --build-arg
      datagram_jars=datagram/runtimelibs
      --build-arg
      spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
    )
  else
    IMG_PATH="kubernetes/dockerfiles"
    BUILD_ARGS=(${BUILD_PARAMS})
  fi

  if [ -z "$IMG_PATH" ]; then
    error "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark."
  fi

  if [ -z "$IMAGE_REF" ]; then
    error "Cannot find docker image reference. Please add -i arg."
  fi

  local BINDING_BUILD_ARGS=(
    ${BUILD_PARAMS}
    --build-arg
    base_img=$(image_ref $IMAGE_REF)
  )
  local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/docker/Dockerfile"}

  docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
    -t $(image_ref $IMAGE_REF) \
    -f "$BASEDOCKERFILE" .
}

function push {
  docker push "$(image_ref $IMAGE_REF)"
}

function usage {
  cat <<EOF
Usage: $0 [options] [command]
Builds or pushes the built-in Spark Docker image.

Commands:
  build       Build image. Requires a repository address to be provided if the image will be
              pushed to a different registry.
  push        Push a pre-built image to a registry. Requires a repository address to be provided.

Options:
  -f file               Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
  -p file               Dockerfile to build for PySpark Jobs. Builds Python dependencies and ships with Spark.
  -R file               Dockerfile to build for SparkR Jobs. Builds R dependencies and ships with Spark.
  -r repo               Repository address.
  -i name               Image name to apply to the built image, or to identify the image to be pushed.  
  -t tag                Tag to apply to the built image, or to identify the image to be pushed.
  -m                    Use minikube's Docker daemon.
  -n                    Build docker image with --no-cache
  -b arg      Build arg to build or push the image. For multiple build args, this option needs to
              be used separately for each build arg.

Using minikube when building images will do so directly into minikube's Docker daemon.
There is no need to push the images into minikube in that case, they'll be automatically
available when running applications inside the minikube cluster.

Check the following documentation for more information on using the minikube Docker daemon:

  https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon

Examples:
  - Build image in minikube with tag "testing"
    $0 -m -t testing build

  - Build and push image with tag "v2.3.0" to docker.io/myrepo
    $0 -r docker.io/myrepo -t v2.3.0 build
    $0 -r docker.io/myrepo -t v2.3.0 push
EOF
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  usage
  exit 0
fi

REPO=
TAG=
BASEDOCKERFILE=
NOCACHEARG=
BUILD_PARAMS=
IMAGE_REF=
while getopts f:mr:t:nb:i: option
do
 case "${option}"
 in
 f) BASEDOCKERFILE=${OPTARG};;
 r) REPO=${OPTARG};;
 t) TAG=${OPTARG};;
 n) NOCACHEARG="--no-cache";;
 i) IMAGE_REF=${OPTARG};;
 b) BUILD_PARAMS=${BUILD_PARAMS}" --build-arg "${OPTARG};;
 esac
done

case "${@: -1}" in
  build)
    build
    ;;
  push)
    if [ -z "$REPO" ]; then
      usage
      exit 1
    fi
    push
    ;;
  *)
    usage
    exit 1
    ;;
esac

With its help, we build a base Spark image containing a test job for calculating Pi using Spark (here {docker-registry-url} is the URL of your Docker image registry, {repo} is the name of the repository inside the registry, which corresponds to the project in OpenShift, {image-name} is the image name (if three-level separation of images is used, for example, as in the integrated Red Hat OpenShift image registry), and {tag} is the tag of this version of the image):

./bin/docker-image-tool-upd.sh -f resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile -r {docker-registry-url}/{repo} -i {image-name} -t {tag} build

Log in to the OKD cluster using the console utility (here {OKD-API-URL} is the OKD cluster API URL):

oc login {OKD-API-URL}

Let's get the current user's token for authorization in the Docker registry:

oc whoami -t

Log in to the internal Docker registry of the OKD cluster (we use the token obtained with the previous command as the password):

docker login {docker-registry-url}

Let's upload the built Docker image to the OKD Docker registry:

./bin/docker-image-tool-upd.sh -r {docker-registry-url}/{repo} -i {image-name} -t {tag} push

Let's verify that the built image is available in OKD. To do this, open the URL with the list of images of the corresponding project in a browser (here {project} is the project name inside the OpenShift cluster, {OKD-WEBUI-URL} is the URL of the OpenShift web console): https://{OKD-WEBUI-URL}/console/project/{project}/browse/images/{image-name}.

To run jobs, a service account must be created with privileges to run pods as root (we will discuss this point later):

oc create sa spark -n {project}
oc adm policy add-scc-to-user anyuid -z spark -n {project}

Let's run spark-submit to publish a Spark job to the OKD cluster, specifying the created service account and the Docker image:

 /opt/spark/bin/spark-submit --name spark-test --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=3 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.namespace={project} --conf spark.submit.deployMode=cluster --conf spark.kubernetes.container.image={docker-registry-url}/{repo}/{image-name}:{tag} --conf spark.master=k8s://https://{OKD-API-URL}  local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar

Here:

--name — the job name, which is used to form the names of the Kubernetes pods;

--class — the class of the executable file that is invoked when the job starts;

--conf — Spark configuration parameters;

spark.executor.instances — the number of Spark executors to launch;

spark.kubernetes.authenticate.driver.serviceAccountName — the name of the Kubernetes service account used when launching pods (to define the security context and capabilities when interacting with the Kubernetes API);

spark.kubernetes.namespace — the Kubernetes namespace in which the driver and executor pods will be launched;

spark.submit.deployMode — the Spark launch mode ("cluster" is used for the standard spark-submit, "client" for the Spark Operator and later versions of Spark);

spark.kubernetes.container.image — the Docker image used to launch the pods;

spark.master — the Kubernetes API URL (the external one is specified so that the submission happens from the local machine);

local:// — the path to the Spark executable inside the Docker image.

We go to the corresponding OKD project and examine the created pods: https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods.

To simplify the development process, another option can be used, in which a common base Spark image is created and used by all jobs, while snapshots of the executable files are published to external storage (for example, Hadoop) and specified in the spark-submit call as a link. In this case, you can run different versions of Spark jobs without rebuilding Docker images, using, for example, WebHDFS to publish them. We send a request to create a file (here {host} is the host of the WebHDFS service, {port} is the port of the WebHDFS service, {path-to-file-on-hdfs} is the desired path to the file on HDFS):

curl -i -X PUT "http://{host}:{port}/webhdfs/v1/{path-to-file-on-hdfs}?op=CREATE"

You will receive a response like this (here {location} is the URL that must be used to upload the file):

HTTP/1.1 307 TEMPORARY_REDIRECT
Location: {location}
Content-Length: 0

Upload the Spark executable file to HDFS (here {path-to-local-file} is the path to the Spark executable file on the current host):

curl -i -X PUT -T {path-to-local-file} "{location}"

After that, we can run spark-submit using the Spark file uploaded to HDFS (here {class-name} is the name of the class that must be launched to complete the job):

/opt/spark/bin/spark-submit --name spark-test --class {class-name} --conf spark.executor.instances=3 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.namespace={project} --conf spark.submit.deployMode=cluster --conf spark.kubernetes.container.image={docker-registry-url}/{repo}/{image-name}:{tag} --conf spark.master=k8s://https://{OKD-API-URL}  hdfs://{host}:{port}/{path-to-file-on-hdfs}

It should be noted that in order to access HDFS and make the job work, you may need to change the Dockerfile and the entrypoint.sh script: add a directive to the Dockerfile to copy the dependent libraries to the /opt/spark/jars directory, and include the HDFS configuration file in SPARK_CLASSPATH in entrypoint.sh.
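As an illustration, a minimal sketch of such a change, assuming the Hadoop configuration files sit next to the Dockerfile in a hadoop-conf/ directory and the extra client jars in hadoop-deps/ (both directory names are illustrative, not taken from the build above):

```dockerfile
# Sketch only: additions to the Spark Dockerfile for HDFS access.
# hadoop-deps/ and hadoop-conf/ are assumed local directories in the build context.
COPY hadoop-deps/*.jar /opt/spark/jars/
COPY hadoop-conf/ /opt/spark/hadoop-conf/

# In entrypoint.sh, before launching the driver/executor, the configuration
# directory would be appended to the classpath, for example:
#   export SPARK_CLASSPATH="$SPARK_CLASSPATH:/opt/spark/hadoop-conf"
```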

Use case 2: Apache Livy

Further, when a job has been developed and the result needs to be tested, the question arises of launching it as part of a CI/CD process and tracking its execution status. Of course, you could run it with a local spark-submit call, but this complicates the CI/CD infrastructure, since it requires installing and configuring Spark on the CI server agents/runners and setting up access to the Kubernetes API. For this case, the target implementation chose Apache Livy as a REST API for running Spark jobs, hosted inside the Kubernetes cluster. With it, you can run Spark jobs on a Kubernetes cluster using regular cURL requests, which is easy to implement on top of any CI solution, and its placement inside the Kubernetes cluster solves the question of authentication when interacting with the Kubernetes API.


Let's highlight it as the second use case: running Spark jobs as part of a CI/CD process on a Kubernetes cluster in a test loop.

A little about Apache Livy: it works as an HTTP server providing a web interface and a RESTful API that lets you launch spark-submit remotely by passing the necessary parameters. Traditionally it has shipped as part of the HDP distribution, but it can also be deployed to OKD or any other Kubernetes installation using the appropriate manifest and a set of Docker images, such as this one: github.com/ttauveron/k8s-big-data-experiments/tree/master/livy-spark-2.3. For our case, a similar Docker image was built, including Spark version 2.4.5, from the following Dockerfile:

FROM java:8-alpine

ENV SPARK_HOME=/opt/spark
ENV LIVY_HOME=/opt/livy
ENV HADOOP_CONF_DIR=/etc/hadoop/conf
ENV SPARK_USER=spark

WORKDIR /opt

RUN apk add --update openssl wget bash && \
    wget -P /opt https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz && \
    tar xvzf spark-2.4.5-bin-hadoop2.7.tgz && \
    rm spark-2.4.5-bin-hadoop2.7.tgz && \
    ln -s /opt/spark-2.4.5-bin-hadoop2.7 /opt/spark

RUN wget http://mirror.its.dal.ca/apache/incubator/livy/0.7.0-incubating/apache-livy-0.7.0-incubating-bin.zip && \
    unzip apache-livy-0.7.0-incubating-bin.zip && \
    rm apache-livy-0.7.0-incubating-bin.zip && \
    ln -s /opt/apache-livy-0.7.0-incubating-bin /opt/livy && \
    mkdir /var/log/livy && \
    ln -s /var/log/livy /opt/livy/logs && \
    cp /opt/livy/conf/log4j.properties.template /opt/livy/conf/log4j.properties

ADD livy.conf /opt/livy/conf
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ADD entrypoint.sh /entrypoint.sh

ENV PATH="/opt/livy/bin:${PATH}"

EXPOSE 8998

ENTRYPOINT ["/entrypoint.sh"]
CMD ["livy-server"]

The resulting image can be built and uploaded to your existing Docker repository, such as the internal OKD repository. To deploy it, use the following manifest ({registry-url} is the Docker image registry URL, {image-name} is the Docker image name, {tag} is the Docker image tag, {livy-url} is the desired URL where the Livy server will be accessible; the "Route" manifest is used if Red Hat OpenShift is used as the Kubernetes distribution, otherwise a corresponding Ingress or a Service manifest of type NodePort is used):

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: livy
  name: livy
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      component: livy
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        component: livy
    spec:
      containers:
        - command:
            - livy-server
          env:
            - name: K8S_API_HOST
              value: localhost
            - name: SPARK_KUBERNETES_IMAGE
              value: 'gnut3ll4/spark:v1.0.14'
          image: '{registry-url}/{image-name}:{tag}'
          imagePullPolicy: Always
          name: livy
          ports:
            - containerPort: 8998
              name: livy-rest
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /var/log/livy
              name: livy-log
            - mountPath: /opt/.livy-sessions/
              name: livy-sessions
            - mountPath: /opt/livy/conf/livy.conf
              name: livy-config
              subPath: livy.conf
            - mountPath: /opt/spark/conf/spark-defaults.conf
              name: spark-config
              subPath: spark-defaults.conf
        - command:
            - /usr/local/bin/kubectl
            - proxy
            - '--port'
            - '8443'
          image: 'gnut3ll4/kubectl-sidecar:latest'
          imagePullPolicy: Always
          name: kubectl
          ports:
            - containerPort: 8443
              name: k8s-api
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: spark
      serviceAccountName: spark
      terminationGracePeriodSeconds: 30
      volumes:
        - emptyDir: {}
          name: livy-log
        - emptyDir: {}
          name: livy-sessions
        - configMap:
            defaultMode: 420
            items:
              - key: livy.conf
                path: livy.conf
            name: livy-config
          name: livy-config
        - configMap:
            defaultMode: 420
            items:
              - key: spark-defaults.conf
                path: spark-defaults.conf
            name: livy-config
          name: spark-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: livy-config
data:
  livy.conf: |-
    livy.spark.deploy-mode=cluster
    livy.file.local-dir-whitelist=/opt/.livy-sessions/
    livy.spark.master=k8s://http://localhost:8443
    livy.server.session.state-retain.sec = 8h
  spark-defaults.conf: 'spark.kubernetes.container.image        "gnut3ll4/spark:v1.0.14"'
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: livy
  name: livy
spec:
  ports:
    - name: livy-rest
      port: 8998
      protocol: TCP
      targetPort: 8998
  selector:
    component: livy
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: livy
  name: livy
spec:
  host: {livy-url}
  port:
    targetPort: livy-rest
  to:
    kind: Service
    name: livy
    weight: 100
  wildcardPolicy: None

After applying it and successfully launching the pod, the Livy graphical interface is available at: http://{livy-url}/ui. With Livy, we can publish our Spark job using a REST request from, for example, Postman. An example collection of requests is presented below (configuration arguments with variables needed for the launched job are passed in the "args" array):

{
    "info": {
        "_postman_id": "be135198-d2ff-47b6-a33e-0d27b9dba4c8",
        "name": "Spark Livy",
        "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
    },
    "item": [
        {
            "name": "1 Submit job with jar",
            "request": {
                "method": "POST",
                "header": [
                    {
                        "key": "Content-Type",
                        "value": "application/json"
                    }
                ],
                "body": {
                    "mode": "raw",
                    "raw": "{\n\t\"file\": \"local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar\", \n\t\"className\": \"org.apache.spark.examples.SparkPi\",\n\t\"numExecutors\":1,\n\t\"name\": \"spark-test-1\",\n\t\"conf\": {\n\t\t\"spark.jars.ivy\": \"/tmp/.ivy\",\n\t\t\"spark.kubernetes.authenticate.driver.serviceAccountName\": \"spark\",\n\t\t\"spark.kubernetes.namespace\": \"{project}\",\n\t\t\"spark.kubernetes.container.image\": \"{docker-registry-url}/{repo}/{image-name}:{tag}\"\n\t}\n}"
                },
                "url": {
                    "raw": "http://{livy-url}/batches",
                    "protocol": "http",
                    "host": [
                        "{livy-url}"
                    ],
                    "path": [
                        "batches"
                    ]
                }
            },
            "response": []
        },
        {
            "name": "2 Submit job without jar",
            "request": {
                "method": "POST",
                "header": [
                    {
                        "key": "Content-Type",
                        "value": "application/json"
                    }
                ],
                "body": {
                    "mode": "raw",
                    "raw": "{\n\t\"file\": \"hdfs://{host}:{port}/{path-to-file-on-hdfs}\", \n\t\"className\": \"{class-name}\",\n\t\"numExecutors\":1,\n\t\"name\": \"spark-test-2\",\n\t\"proxyUser\": \"0\",\n\t\"conf\": {\n\t\t\"spark.jars.ivy\": \"/tmp/.ivy\",\n\t\t\"spark.kubernetes.authenticate.driver.serviceAccountName\": \"spark\",\n\t\t\"spark.kubernetes.namespace\": \"{project}\",\n\t\t\"spark.kubernetes.container.image\": \"{docker-registry-url}/{repo}/{image-name}:{tag}\"\n\t},\n\t\"args\": [\n\t\t\"HADOOP_CONF_DIR=/opt/spark/hadoop-conf\",\n\t\t\"MASTER=k8s://https://kubernetes.default.svc:8443\"\n\t]\n}"
                },
                "url": {
                    "raw": "http://{livy-url}/batches",
                    "protocol": "http",
                    "host": [
                        "{livy-url}"
                    ],
                    "path": [
                        "batches"
                    ]
                }
            },
            "response": []
        }
    ],
    "event": [
        {
            "listen": "prerequest",
            "script": {
                "id": "41bea1d0-278c-40c9-ad42-bf2e6268897d",
                "type": "text/javascript",
                "exec": [
                    ""
                ]
            }
        },
        {
            "listen": "test",
            "script": {
                "id": "3cdd7736-a885-4a2d-9668-bd75798f4560",
                "type": "text/javascript",
                "exec": [
                    ""
                ]
            }
        }
    ],
    "protocolProfileBehavior": {}
}
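Outside of Postman, the same submission is just an HTTP POST to Livy's /batches endpoint. A minimal Python sketch of building the first request's body is shown below (the curly-brace placeholders are the same as above; the actual POST is left as a comment since it requires a reachable Livy server):

```python
import json

def livy_batch_payload(project, image):
    """Build the JSON body for POST http://{livy-url}/batches,
    mirroring request 1 of the Postman collection above."""
    return {
        "file": "local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "numExecutors": 1,
        "name": "spark-test-1",
        "conf": {
            "spark.jars.ivy": "/tmp/.ivy",
            "spark.kubernetes.authenticate.driver.serviceAccountName": "spark",
            "spark.kubernetes.namespace": project,
            "spark.kubernetes.container.image": image,
        },
    }

# Serialize the body; placeholders would be replaced with real values in CI.
body = json.dumps(livy_batch_payload("{project}", "{docker-registry-url}/{repo}/{image-name}:{tag}"))

# To submit for real (requires a live Livy server and the requests library):
# requests.post("http://{livy-url}/batches", data=body,
#               headers={"Content-Type": "application/json"})
```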

Let's run the first request from the collection, go to the OKD interface and verify that the job launched successfully: https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods. At the same time, a session will appear in the Livy interface (http://{livy-url}/ui), within which, using the Livy API or the graphical interface, you can track the job's progress and study the session logs.

Now let's show how Livy works. To do this, let's examine the logs of the Livy container inside the pod with the Livy server: https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods/{livy-pod-name}?tab=logs. From them we can see that calling the Livy REST API causes a spark-submit to be executed in the container named "livy", similar to the one we used above (here {livy-pod-name} is the name of the created pod with the Livy server). The collection also provides a second request that lets you run jobs whose Spark executable is hosted remotely, using a Livy server.

Use case 3: Spark Operator

Now that the job has been tested, the question of running it regularly arises. The native way to run jobs regularly in a Kubernetes cluster is the CronJob object, and it can be used, but at the moment the use of operators to manage applications in Kubernetes is very popular, and for Spark there is a fairly mature operator, which is also used in enterprise-level solutions (for example, the Lightbend FastData Platform). We recommend using it: the current stable version of Spark (2.4.5) has rather limited configuration options for running Spark jobs in Kubernetes, while the next major version (3.0.0) declares full Kubernetes support, but its release date remains unknown. The Spark Operator compensates for this shortcoming by adding important configuration options (for example, mounting a ConfigMap with the Hadoop access configuration into the Spark pods) and the ability to run jobs on a regular schedule.

Let's highlight it as the third use case: regularly running Spark jobs on a Kubernetes cluster in a production loop.

The Spark Operator is open source and developed within the Google Cloud Platform: github.com/GoogleCloudPlatform/spark-on-k8s-operator. It can be installed in three ways:

  1. As part of the Lightbend FastData Platform / Cloudflow installation;
  2. Using Helm:
    helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
    helm install incubator/sparkoperator --namespace spark-operator

  3. Using the manifests from the official repository (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/manifest). The following is worth noting: Cloudflow includes an operator with API version v1beta1. If this type of installation is used, the Spark application manifest descriptions should be based on the example tags in Git with the appropriate API version, for example, "v1beta1-0.9.0-2.4.0". The operator version can be found in the description of the CRD included with the operator, in the "versions" dictionary:
    oc get crd sparkapplications.sparkoperator.k8s.io -o yaml

If the operator is installed correctly, an active pod with the Spark operator will appear in the corresponding project (for example, cloudflow-fdp-sparkoperator in the Cloudflow space for the Cloudflow installation), along with a corresponding Kubernetes resource type named "sparkapplications". You can explore the available Spark applications with the following command:

oc get sparkapplications -n {project}

To run jobs using the Spark Operator you need to do three things:

  • create a Docker image that includes all the necessary libraries, as well as the configuration and executable files. In the target picture, this is an image created at the CI/CD stage and tested on a test cluster;
  • publish the Docker image to a registry accessible from the Kubernetes cluster;
  • generate a manifest of type "SparkApplication" describing the job to be launched. Example manifests are available in the official repository (e.g. github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/v1beta1-0.9.0-2.4.0/examples/spark-pi.yaml). Important points to note about the manifest:
    1. the "apiVersion" dictionary must indicate the API version corresponding to the operator version;
    2. the "metadata.namespace" dictionary must indicate the namespace in which the application will be launched;
    3. the "spec.image" dictionary must contain the address of the created Docker image in an accessible registry;
    4. the "spec.mainClass" dictionary must contain the Spark job class to run when the process starts;
    5. the "spec.mainApplicationFile" dictionary must specify the path to the executable jar file;
    6. the "spec.sparkVersion" dictionary must indicate the version of Spark in use;
    7. the "spec.driver.serviceAccount" dictionary must specify the service account within the corresponding Kubernetes namespace that will be used to run the application;
    8. the "spec.executor" dictionary must indicate the amount of resources allocated to the application;
    9. the "spec.volumeMounts" dictionary must specify the local directory in which the local Spark job files will be created.

An example of generating a manifest (here {spark-service-account} is a service account inside the Kubernetes cluster for running Spark jobs):

apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: {project}
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
  sparkVersion: "2.4.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 0.1
    coreLimit: "200m"
    memory: "512m"
    labels:
      version: 2.4.0
    serviceAccount: {spark-service-account}
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

This manifest specifies a service account for which, before publishing the manifest, you must create the necessary role bindings granting the access rights the Spark application needs to interact with the Kubernetes API (if necessary). In our case, the application needs the rights to create Pods. Let's create the necessary role binding:

oc adm policy add-role-to-user edit system:serviceaccount:{project}:{spark-service-account} -n {project}

It is also worth noting that this manifest specification may include a "hadoopConfigMap" parameter, which lets you specify a ConfigMap with the Hadoop configuration without first having to place the corresponding file in the Docker image. It is also suitable for running jobs regularly: using the "schedule" parameter, a schedule for running a given job can be specified.
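As an illustration of scheduled runs, a sketch of a ScheduledSparkApplication manifest is given below. This is an assumption-laden example: check that your operator version actually ships the ScheduledSparkApplication CRD, and note that the name, namespace placeholder, and cron expression here are illustrative:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled
  namespace: {project}
spec:
  schedule: "0 * * * *"        # illustrative: run at the top of every hour
  concurrencyPolicy: Forbid    # do not start a run while the previous one is active
  template:                    # same fields as the SparkApplication spec above
    type: Scala
    mode: cluster
    image: "gcr.io/spark-operator/spark:v2.4.0"
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
    sparkVersion: "2.4.0"
    restartPolicy:
      type: Never
```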

After that, we save our manifest to the file spark-pi.yaml and apply it to our Kubernetes cluster:

oc apply -f spark-pi.yaml

This will create an object of type "sparkapplications":

oc get sparkapplications -n {project}
> NAME       AGE
> spark-pi   22h

In this case, a pod with the application will be created, whose status will be displayed in the created "sparkapplications" object. You can view it with the following command:

oc get sparkapplications spark-pi -o yaml -n {project}

After the job completes, the pod will move to the "Completed" status, which will also be updated in "sparkapplications". Application logs can be viewed in the browser or with the following command (here {sparkapplications-pod-name} is the name of the job's pod):

oc logs {sparkapplications-pod-name} -n {project}

Spark jobs can also be managed using the specialized sparkctl utility. To install it, clone the repository with its source code, install Go and build the utility:

git clone https://github.com/GoogleCloudPlatform/spark-on-k8s-operator.git
cd spark-on-k8s-operator/
wget https://dl.google.com/go/go1.13.3.linux-amd64.tar.gz
tar -xzf go1.13.3.linux-amd64.tar.gz
sudo mv go /usr/local
mkdir $HOME/Projects
export GOROOT=/usr/local/go
export GOPATH=$HOME/Projects
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH
go version
cd sparkctl
go build -o sparkctl
sudo mv sparkctl /usr/local/bin

Let's check the list of running Spark tasks:

sparkctl list -n {project}

Let's create a description for a Spark task:

vi spark-app.yaml

apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: {project}
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
  sparkVersion: "2.4.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1000m"
    memory: "512m"
    labels:
      version: 2.4.0
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

Let's launch the described task using sparkctl:

sparkctl create spark-app.yaml -n {project}

Let's check the list of running Spark tasks:

sparkctl list -n {project}

Let's check the list of events for the launched Spark task:

sparkctl event spark-pi -n {project} -f

Let's check the status of the running Spark task:

sparkctl status spark-pi -n {project}

In conclusion, I would like to go over the drawbacks we discovered in using the current stable version of Spark (2.4.5) on Kubernetes:

  1. The first and perhaps main drawback is the lack of Data Locality. For all the shortcomings of YARN, there were also advantages to using it, for example the principle of delivering code to data (rather than data to code). Thanks to it, Spark tasks were executed on the nodes where the data involved in the computations resided, and the time spent delivering data over the network was noticeably reduced. When using Kubernetes, we face the need to move the data involved in a task across the network. If the data is large enough, task execution time can increase significantly, and a large amount of disk space is also required on the Spark task instances for its temporary storage. This drawback can be mitigated by using specialized software that provides data locality in Kubernetes (for example, Alluxio), but this means having to store a complete copy of the data on the nodes of the Kubernetes cluster.
  2. The second important drawback is security. By default, security-related features for running Spark tasks are disabled, the use of Kerberos is not covered in the documentation (although the corresponding options were introduced in version 3.0.0, which will require additional work), and the security documentation for Spark (https://spark.apache.org/docs/2.4.5/security.html) mentions only YARN, Mesos, and Standalone Cluster as key stores. At the same time, the user under which Spark tasks are launched cannot be specified directly: we only specify the service account under which the task will run, and the user is selected according to the configured security policies. As a result, either the root user is used, which is unsafe in a production environment, or a user with a random UID, which is inconvenient when distributing data access rights (this can be solved by creating PodSecurityPolicies and binding them to the corresponding service accounts). At present, the solution is either to place all the necessary files directly into the Docker image, or to modify the Spark launch script to use the mechanism for storing and retrieving secrets adopted in your organization.
  3. Running Spark tasks on Kubernetes is still officially in experimental mode, and the artifacts used (configuration files, base Docker images, and launch scripts) may change significantly in the future. Indeed, while preparing this material we tested versions 2.3.0 and 2.4.5, and their behavior differed substantially.
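The PodSecurityPolicy approach mentioned in point 2 can be sketched roughly as follows (PodSecurityPolicy was the mechanism available in Kubernetes of that period; the policy name and UID range here are illustrative):

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: spark-psp               # illustrative name
spec:
  privileged: false
  runAsUser:
    rule: MustRunAs
    ranges:
      - min: 1000               # pin Spark pods to a known UID range
        max: 1000               # so data access rights can be granted to a fixed UID
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
    - "*"
```

The policy is then granted to the Spark service account via a Role that permits the `use` verb on this PodSecurityPolicy.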

Let's wait for updates: a new version of Spark (3.0.0) was released recently, bringing notable changes to how Spark runs on Kubernetes, but support for this resource manager retains its experimental status. Perhaps the next updates will make it possible to fully recommend abandoning YARN and running Spark tasks on Kubernetes without fearing for the security of your system and without having to modify functional components on your own.

The end.

Source: www.habr.com
