Running Apache Spark in Kubernetes

Dear readers, good day. Today we will talk a little about Apache Spark and its development prospects.

In today's Big Data world, Apache Spark is the de facto standard for developing batch data processing tasks. In addition, it is also used to create streaming applications that work on the micro-batch concept, processing and shipping data in small portions (Spark Structured Streaming). Traditionally, it has been part of the overall Hadoop stack, using YARN (or, in some cases, Apache Mesos) as the resource manager. By 2020, its use in this traditional form is in question for most companies due to the lack of decent Hadoop distributions - the development of HDP has stopped, CDH is under-developed and expensive, and the remaining Hadoop vendors have either ceased to exist or face a murky future. Therefore, launching Apache Spark on Kubernetes is of growing interest to the community and to large companies - having become the standard in container orchestration and resource management in private and public clouds, Kubernetes solves the problem of inconvenient resource scheduling of Spark tasks on YARN and provides a steadily developing platform with many commercial and open distributions for companies of all sizes and stripes. Besides, on the wave of its popularity, most companies have already managed to acquire a couple of installations of their own and build up expertise in using it, which simplifies the move.

Starting with version 2.3.0, Apache Spark acquired official support for running tasks in a Kubernetes cluster, and today we will talk about the current maturity of this approach, the various options for its use, and the pitfalls that will be encountered during implementation.

First of all, let's look at the process of developing tasks and applications based on Apache Spark and highlight the typical cases in which you need to run a task on a Kubernetes cluster. In preparing this post, OpenShift is used as the distribution, and commands relevant to its command line utility (oc) will be given. For other Kubernetes distributions, the corresponding commands of the standard Kubernetes command line utility (kubectl) or their analogues (for example, for oc adm policy) can be used.
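
The mapping between the two utilities is usually one-to-one; a minimal illustration (here {project} is a placeholder namespace):

# OpenShift CLI
oc get pods -n {project}
# The same query with the standard Kubernetes CLI
kubectl get pods -n {project}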

First use case - spark-submit

In the course of developing tasks and applications, the developer needs to run tasks to debug data transformations. In theory, stubs can be used for these purposes, but development involving real (albeit test) instances of the end systems has proven faster and better in this class of tasks. When we debug on real instances of end systems, two scenarios are possible:

  • the developer runs the Spark task locally in standalone mode;

  • the developer runs the Spark task on a Kubernetes cluster in a test loop.

The first option has the right to exist, but entails a number of disadvantages:

  • each developer must be given access from their workplace to all the instances of the end systems they need;
  • a sufficient amount of resources is required on the working machine to run the task being developed.

The second option does not have these disadvantages, since using a Kubernetes cluster allows you to allocate the necessary pool of resources for running tasks and give it the required access to the end-system instances, flexibly granting access to that pool to all members of the development team using the Kubernetes role model. Let's highlight it as the first use case - launching Spark tasks from a local developer machine on a Kubernetes cluster in a test loop.
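
For example, a team member can be given access to the test project using the standard role model; a minimal sketch (here {user} is a placeholder for the developer's account name):

# Grant the developer the built-in "edit" role in the test project,
# enough to submit Spark tasks and inspect the resulting pods
oc adm policy add-role-to-user edit {user} -n {project}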

Let's talk in more detail about setting up Spark to run locally. To start using Spark, you need to install it:

mkdir /opt/spark
cd /opt/spark
wget http://mirror.linux-ia64.org/apache/spark/spark-2.4.5/spark-2.4.5.tgz
tar zxvf spark-2.4.5.tgz
rm -f spark-2.4.5.tgz

We build the packages needed to work with Kubernetes:

cd spark-2.4.5/
./build/mvn -Pkubernetes -DskipTests clean package

A full build takes a lot of time, but to create Docker images and run them on a Kubernetes cluster you really only need the jar files from the "assembly/" directory, so you can build only this subproject:

./build/mvn -f ./assembly/pom.xml -Pkubernetes -DskipTests clean package

To run Spark jobs in Kubernetes, you need to create a Docker image to use as a base image. There are 2 possible approaches here:

  • the generated Docker image includes the executable Spark task code;
  • the created image includes only Spark and the necessary dependencies, while the executable code is hosted remotely (for example, in HDFS).

First, let's build a Docker image containing a test example of a Spark task. To create Docker images, Spark ships a utility called "docker-image-tool". Let's study its help:

./bin/docker-image-tool.sh --help

With its help, you can create Docker images and upload them to remote registries, but by default it has a number of disadvantages:

  • it always creates 3 Docker images at once - for Spark, PySpark and R;
  • it does not allow you to specify an image name.

Therefore, we will use a modified version of this utility, given below:

vi bin/docker-image-tool-upd.sh

#!/usr/bin/env bash

function error {
  echo "$@" 1>&2
  exit 1
}

if [ -z "${SPARK_HOME}" ]; then
  SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
. "${SPARK_HOME}/bin/load-spark-env.sh"

function image_ref {
  local image="$1"
  local add_repo="${2:-1}"
  if [ $add_repo = 1 ] && [ -n "$REPO" ]; then
    image="$REPO/$image"
  fi
  if [ -n "$TAG" ]; then
    image="$image:$TAG"
  fi
  echo "$image"
}

function build {
  local BUILD_ARGS
  local IMG_PATH

  if [ ! -f "$SPARK_HOME/RELEASE" ]; then
    IMG_PATH=$BASEDOCKERFILE
    BUILD_ARGS=(
      ${BUILD_PARAMS}
      --build-arg
      img_path=$IMG_PATH
      --build-arg
      datagram_jars=datagram/runtimelibs
      --build-arg
      spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
    )
  else
    IMG_PATH="kubernetes/dockerfiles"
    BUILD_ARGS=(${BUILD_PARAMS})
  fi

  if [ -z "$IMG_PATH" ]; then
    error "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark."
  fi

  if [ -z "$IMAGE_REF" ]; then
    error "Cannot find docker image reference. Please add -i arg."
  fi

  local BINDING_BUILD_ARGS=(
    ${BUILD_PARAMS}
    --build-arg
    base_img=$(image_ref $IMAGE_REF)
  )
  local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/docker/Dockerfile"}

  docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
    -t $(image_ref $IMAGE_REF) \
    -f "$BASEDOCKERFILE" .
}

function push {
  docker push "$(image_ref $IMAGE_REF)"
}

function usage {
  cat <<EOF
Usage: $0 [options] [command]
Builds or pushes the built-in Spark Docker image.

Commands:
  build       Build image. Requires a repository address to be provided if the image will be
              pushed to a different registry.
  push        Push a pre-built image to a registry. Requires a repository address to be provided.

Options:
  -f file               Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
  -p file               Dockerfile to build for PySpark Jobs. Builds Python dependencies and ships with Spark.
  -R file               Dockerfile to build for SparkR Jobs. Builds R dependencies and ships with Spark.
  -r repo               Repository address.
  -i name               Image name to apply to the built image, or to identify the image to be pushed.  
  -t tag                Tag to apply to the built image, or to identify the image to be pushed.
  -m                    Use minikube's Docker daemon.
  -n                    Build docker image with --no-cache
  -b arg      Build arg to build or push the image. For multiple build args, this option needs to
              be used separately for each build arg.

Using minikube when building images will do so directly into minikube's Docker daemon.
There is no need to push the images into minikube in that case, they'll be automatically
available when running applications inside the minikube cluster.

Check the following documentation for more information on using the minikube Docker daemon:

  https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon

Examples:
  - Build image in minikube with tag "testing"
    $0 -m -t testing build

  - Build and push image with tag "v2.3.0" to docker.io/myrepo
    $0 -r docker.io/myrepo -t v2.3.0 build
    $0 -r docker.io/myrepo -t v2.3.0 push
EOF
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  usage
  exit 0
fi

REPO=
TAG=
BASEDOCKERFILE=
NOCACHEARG=
BUILD_PARAMS=
IMAGE_REF=
while getopts f:mr:t:nb:i: option
do
 case "${option}"
 in
 f) BASEDOCKERFILE=${OPTARG};;
 r) REPO=${OPTARG};;
 t) TAG=${OPTARG};;
 n) NOCACHEARG="--no-cache";;
 i) IMAGE_REF=${OPTARG};;
 b) BUILD_PARAMS=${BUILD_PARAMS}" --build-arg "${OPTARG};;
 esac
done

case "${@: -1}" in
  build)
    build
    ;;
  push)
    if [ -z "$REPO" ]; then
      usage
      exit 1
    fi
    push
    ;;
  *)
    usage
    exit 1
    ;;
esac

Using it, we build a base Spark image containing a test task for computing Pi with Spark (here {docker-registry-url} is the URL of your Docker image registry, {repo} is the name of the repository inside the registry, matching the project in OpenShift, {image-name} is the name of the image (if three-level separation of images is used, as, for example, in the integrated registry of Red Hat OpenShift images), and {tag} is the tag of this version of the image):

./bin/docker-image-tool-upd.sh -f resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile -r {docker-registry-url}/{repo} -i {image-name} -t {tag} build

Log in to the OKD cluster using the console utility (here {OKD-API-URL} is the OKD cluster API URL):

oc login {OKD-API-URL}

Let's get the current user's token for authorization in the Docker Registry:

oc whoami -t

Log in to the internal Docker Registry of the OKD cluster (we use the token obtained with the previous command as the password):

docker login {docker-registry-url}
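
The two previous commands can also be combined into one (a sketch, assuming an active oc session):

# Pass the OpenShift user and token directly to docker login
docker login -u $(oc whoami) -p $(oc whoami -t) {docker-registry-url}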

Let's upload the built Docker image to the OKD Docker Registry:

./bin/docker-image-tool-upd.sh -r {docker-registry-url}/{repo} -i {image-name} -t {tag} push

Let's check that the built image is available in OKD. To do this, open the URL with the list of images of the corresponding project in the browser (here {project} is the name of the project inside the OpenShift cluster, {OKD-WEBUI-URL} is the URL of the OpenShift web console) - https://{OKD-WEBUI-URL}/console/project/{project}/browse/images/{image-name}.
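
The same check can be done from the command line (a sketch; on OpenShift the uploaded image appears as an image stream):

oc get imagestream {image-name} -n {project}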

To run tasks, a service account must be created with the privileges to run pods as root (we will discuss this point later):

oc create sa spark -n {project}
oc adm policy add-scc-to-user anyuid -z spark -n {project}

Let's run spark-submit to publish a Spark task to the OKD cluster, specifying the created service account and the Docker image:

/opt/spark/bin/spark-submit \
  --name spark-test \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace={project} \
  --conf spark.submit.deployMode=cluster \
  --conf spark.kubernetes.container.image={docker-registry-url}/{repo}/{image-name}:{tag} \
  --conf spark.master=k8s://https://{OKD-API-URL} \
  local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar

here:

--name is the task name that will participate in forming the names of the Kubernetes pods;

--class is the class of the executable file that is called when the task starts;

--conf are Spark configuration parameters;

spark.executor.instances is the number of Spark executors to launch;

spark.kubernetes.authenticate.driver.serviceAccountName is the Kubernetes service account used when launching the pods (it defines the security context and the capabilities available when interacting with the Kubernetes API);

spark.kubernetes.namespace is the Kubernetes namespace in which the driver and executor pods will be launched;

spark.submit.deployMode is the Spark launch method ("cluster" is used for standard spark-submit, "client" for Spark Operator and later versions of Spark);

spark.kubernetes.container.image is the Docker image used to launch the pods;

spark.master is the Kubernetes API URL (the external one is specified, since the launch happens from a local machine);

local:// is the path to the Spark executable inside the Docker image.

Let's go to the corresponding OKD project and study the created pods - https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods.

To simplify the development process, another option can be used, in which a common base Spark image is created and used by all tasks to run, while snapshots of the executable files are published to external storage (for example, Hadoop) and specified in the spark-submit call as a link. In this case, you can run different versions of Spark tasks without rebuilding the Docker images, using, for example, WebHDFS to publish the executables. We send a request to create a file (here {host} is the host of the WebHDFS service, {port} is the port of the WebHDFS service, {path-to-file-on-hdfs} is the desired path to the file on HDFS):

curl -i -X PUT "http://{host}:{port}/webhdfs/v1/{path-to-file-on-hdfs}?op=CREATE"

You will receive a response like this (here {location} is the URL that needs to be used to upload the file):

HTTP/1.1 307 TEMPORARY_REDIRECT
Location: {location}
Content-Length: 0

Upload the Spark executable file to HDFS (here {path-to-local-file} is the path to the Spark executable file on the current host):

curl -i -X PUT -T {path-to-local-file} "{location}"

After that, we can do spark-submit using the file uploaded to HDFS (here {class-name} is the name of the class that needs to be launched to complete the task):

/opt/spark/bin/spark-submit \
  --name spark-test \
  --class {class-name} \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace={project} \
  --conf spark.submit.deployMode=cluster \
  --conf spark.kubernetes.container.image={docker-registry-url}/{repo}/{image-name}:{tag} \
  --conf spark.master=k8s://https://{OKD-API-URL} \
  hdfs://{host}:{port}/{path-to-file-on-hdfs}

It should be noted that, in order to access HDFS and make the task work, you may need to change the Dockerfile and the entrypoint.sh script - add a directive to the Dockerfile to copy the dependent libraries to the /opt/spark/jars directory and include the HDFS configuration file in SPARK_CLASSPATH in the entrypoint, as sketched below.
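
A minimal sketch of such changes (the hadoop-libs and hadoop-conf paths here are assumptions for illustration, not part of the standard image):

# Dockerfile: copy the Hadoop client libraries and the HDFS configuration
COPY hadoop-libs/*.jar /opt/spark/jars/
COPY hadoop-conf/ /opt/spark/hadoop-conf/

And in entrypoint.sh, before the Spark process is started:

# entrypoint.sh: make the HDFS configuration visible to Spark
export SPARK_CLASSPATH="$SPARK_CLASSPATH:/opt/spark/hadoop-conf"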

Second use case - Apache Livy

Further, when a task is developed and the result needs to be tested, the question arises of launching it as part of the CI/CD process and tracking the status of its execution. Of course, you can run it with a local spark-submit call, but this complicates the CI/CD infrastructure, since it requires installing and configuring Spark on the CI server agents/runners and setting up access to the Kubernetes API. For this case, the target implementation has chosen Apache Livy as a REST API for running Spark tasks, hosted inside the Kubernetes cluster. With its help, you can submit Spark tasks to the Kubernetes cluster using regular cURL requests, which is easy to implement on top of any CI solution, and its placement inside the Kubernetes cluster solves the question of authentication when interacting with the Kubernetes API.
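
For example, the Pi test task from the first part could be submitted from a CI runner with a single request; a sketch, assuming the Livy server is already published at {livy-url} (as described below):

curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"file": "local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar", "className": "org.apache.spark.examples.SparkPi"}' \
  http://{livy-url}/batches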

Let's highlight it as the second use case - running Spark tasks as part of a CI/CD process on a Kubernetes cluster in a test loop.

A few words about Apache Livy - it works as an HTTP server providing a web interface and a REST API that allows you to run spark-submit remotely by passing the necessary parameters. Traditionally it has been shipped as part of the HDP distribution, but it can also be deployed to OKD or any other Kubernetes installation using the appropriate manifests and a set of Docker images, such as this one - github.com/ttauveron/k8s-big-data-experiments/tree/master/livy-spark-2.3. For our case, a similar Docker image was built, including Spark version 2.4.5, from the following Dockerfile:

FROM java:8-alpine

ENV SPARK_HOME=/opt/spark
ENV LIVY_HOME=/opt/livy
ENV HADOOP_CONF_DIR=/etc/hadoop/conf
ENV SPARK_USER=spark

WORKDIR /opt

RUN apk add --update openssl wget bash && \
    wget -P /opt https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz && \
    tar xvzf spark-2.4.5-bin-hadoop2.7.tgz && \
    rm spark-2.4.5-bin-hadoop2.7.tgz && \
    ln -s /opt/spark-2.4.5-bin-hadoop2.7 /opt/spark

RUN wget http://mirror.its.dal.ca/apache/incubator/livy/0.7.0-incubating/apache-livy-0.7.0-incubating-bin.zip && \
    unzip apache-livy-0.7.0-incubating-bin.zip && \
    rm apache-livy-0.7.0-incubating-bin.zip && \
    ln -s /opt/apache-livy-0.7.0-incubating-bin /opt/livy && \
    mkdir /var/log/livy && \
    ln -s /var/log/livy /opt/livy/logs && \
    cp /opt/livy/conf/log4j.properties.template /opt/livy/conf/log4j.properties

ADD livy.conf /opt/livy/conf
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ADD entrypoint.sh /entrypoint.sh

ENV PATH="/opt/livy/bin:${PATH}"

EXPOSE 8998

ENTRYPOINT ["/entrypoint.sh"]
CMD ["livy-server"]

The resulting image can be built and uploaded to your existing Docker repository, for example, the internal OKD repository. To deploy it, use the following manifest ({registry-url} is the URL of the Docker image registry, {image-name} is the name of the Docker image, {tag} is the tag of the Docker image, {livy-url} is the desired URL where the Livy server will be accessible; the "Route" manifest is used if Red Hat OpenShift is used as the Kubernetes distribution, otherwise a corresponding Ingress or Service manifest of type NodePort is used):

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: livy
  name: livy
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      component: livy
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        component: livy
    spec:
      containers:
        - command:
            - livy-server
          env:
            - name: K8S_API_HOST
              value: localhost
            - name: SPARK_KUBERNETES_IMAGE
              value: 'gnut3ll4/spark:v1.0.14'
          image: '{registry-url}/{image-name}:{tag}'
          imagePullPolicy: Always
          name: livy
          ports:
            - containerPort: 8998
              name: livy-rest
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /var/log/livy
              name: livy-log
            - mountPath: /opt/.livy-sessions/
              name: livy-sessions
            - mountPath: /opt/livy/conf/livy.conf
              name: livy-config
              subPath: livy.conf
            - mountPath: /opt/spark/conf/spark-defaults.conf
              name: spark-config
              subPath: spark-defaults.conf
        - command:
            - /usr/local/bin/kubectl
            - proxy
            - '--port'
            - '8443'
          image: 'gnut3ll4/kubectl-sidecar:latest'
          imagePullPolicy: Always
          name: kubectl
          ports:
            - containerPort: 8443
              name: k8s-api
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: spark
      serviceAccountName: spark
      terminationGracePeriodSeconds: 30
      volumes:
        - emptyDir: {}
          name: livy-log
        - emptyDir: {}
          name: livy-sessions
        - configMap:
            defaultMode: 420
            items:
              - key: livy.conf
                path: livy.conf
            name: livy-config
          name: livy-config
        - configMap:
            defaultMode: 420
            items:
              - key: spark-defaults.conf
                path: spark-defaults.conf
            name: livy-config
          name: spark-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: livy-config
data:
  livy.conf: |-
    livy.spark.deploy-mode=cluster
    livy.file.local-dir-whitelist=/opt/.livy-sessions/
    livy.spark.master=k8s://http://localhost:8443
    livy.server.session.state-retain.sec = 8h
  spark-defaults.conf: 'spark.kubernetes.container.image        "gnut3ll4/spark:v1.0.14"'
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: livy
  name: livy
spec:
  ports:
    - name: livy-rest
      port: 8998
      protocol: TCP
      targetPort: 8998
  selector:
    component: livy
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: livy
  name: livy
spec:
  host: {livy-url}
  port:
    targetPort: livy-rest
  to:
    kind: Service
    name: livy
    weight: 100
  wildcardPolicy: None

After applying it and successfully launching the pod, the Livy graphical interface is available at http://{livy-url}/ui. With Livy, we can publish our Spark task using a REST request, for example from Postman. An example of a collection with requests is presented below (configuration arguments with the variables needed for the launched task to work can be passed in the "args" array):

{
    "info": {
        "_postman_id": "be135198-d2ff-47b6-a33e-0d27b9dba4c8",
        "name": "Spark Livy",
        "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
    },
    "item": [
        {
            "name": "1 Submit job with jar",
            "request": {
                "method": "POST",
                "header": [
                    {
                        "key": "Content-Type",
                        "value": "application/json"
                    }
                ],
                "body": {
                    "mode": "raw",
                    "raw": "{nt"file": "local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar", nt"className": "org.apache.spark.examples.SparkPi",nt"numExecutors":1,nt"name": "spark-test-1",nt"conf": {ntt"spark.jars.ivy": "/tmp/.ivy",ntt"spark.kubernetes.authenticate.driver.serviceAccountName": "spark",ntt"spark.kubernetes.namespace": "{project}",ntt"spark.kubernetes.container.image": "{docker-registry-url}/{repo}/{image-name}:{tag}"nt}n}"
                },
                "url": {
                    "raw": "http://{livy-url}/batches",
                    "protocol": "http",
                    "host": [
                        "{livy-url}"
                    ],
                    "path": [
                        "batches"
                    ]
                }
            },
            "response": []
        },
        {
            "name": "2 Submit job without jar",
            "request": {
                "method": "POST",
                "header": [
                    {
                        "key": "Content-Type",
                        "value": "application/json"
                    }
                ],
                "body": {
                    "mode": "raw",
                    "raw": "{nt"file": "hdfs://{host}:{port}/{path-to-file-on-hdfs}", nt"className": "{class-name}",nt"numExecutors":1,nt"name": "spark-test-2",nt"proxyUser": "0",nt"conf": {ntt"spark.jars.ivy": "/tmp/.ivy",ntt"spark.kubernetes.authenticate.driver.serviceAccountName": "spark",ntt"spark.kubernetes.namespace": "{project}",ntt"spark.kubernetes.container.image": "{docker-registry-url}/{repo}/{image-name}:{tag}"nt},nt"args": [ntt"HADOOP_CONF_DIR=/opt/spark/hadoop-conf",ntt"MASTER=k8s://https://kubernetes.default.svc:8443"nt]n}"
                },
                "url": {
                    "raw": "http://{livy-url}/batches",
                    "protocol": "http",
                    "host": [
                        "{livy-url}"
                    ],
                    "path": [
                        "batches"
                    ]
                }
            },
            "response": []
        }
    ],
    "event": [
        {
            "listen": "prerequest",
            "script": {
                "id": "41bea1d0-278c-40c9-ad42-bf2e6268897d",
                "type": "text/javascript",
                "exec": [
                    ""
                ]
            }
        },
        {
            "listen": "test",
            "script": {
                "id": "3cdd7736-a885-4a2d-9668-bd75798f4560",
                "type": "text/javascript",
                "exec": [
                    ""
                ]
            }
        }
    ],
    "protocolProfileBehavior": {}
}

Let's execute the first request from the collection, go to the OKD interface and check that the task has been launched successfully - https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods. At the same time, a session will appear in the Livy interface (http://{livy-url}/ui), within which, using the Livy API or the graphical interface, you can track the progress of the task and study the session logs.
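
The same status can also be tracked from a CI pipeline without the graphical interface; a sketch (Livy assigns an integer id to each submitted batch, the 0 below is just an example):

# Poll the state of the submitted batch
curl -s http://{livy-url}/batches/0/state
# the response has the shape: {"id":0,"state":"success"}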

Now let's show how Livy works. To do this, let's examine the logs of the Livy container inside the pod with the Livy server - https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods/{livy-pod-name}?tab=logs. From them you can see that calling the Livy REST API in the container named "livy" triggers a spark-submit similar to the one we used above (here {livy-pod-name} is the name of the created pod with the Livy server). The collection also provides a second request that allows you to run tasks with the Spark executable file hosted remotely, using the Livy server.

Third use case - Spark Operator

Now that the task has been tested, the question arises of running it on a regular basis. The native way to run tasks regularly in a Kubernetes cluster is the CronJob entity, and you can use it, but at the moment the use of operators to manage applications in Kubernetes is very popular, and for Spark there is a fairly mature operator, which is also used in Enterprise-level solutions (for example, Lightbend FastData Platform). We recommend using it - the current stable version of Spark (2.4.5) has rather limited options for configuring the launch of Spark tasks in Kubernetes, while the next major version (3.0.0) declares full support for Kubernetes, but its release date remains unknown. Spark Operator compensates for this shortcoming by adding important configuration options (for example, mounting a ConfigMap with the Hadoop configuration to the Spark pods) and the ability to run tasks on a regular schedule.

Let's highlight it as the third use case - regularly running Spark tasks on a Kubernetes cluster in a production loop.

Spark Operator is open source and is developed within Google Cloud Platform - github.com/GoogleCloudPlatform/spark-on-k8s-operator. It can be installed in 3 ways:

  1. As part of the Lightbend FastData Platform / Cloudflow installation;
  2. Using Helm:
    helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
    helm install incubator/sparkoperator --namespace spark-operator

  3. Using the manifests from the official repository (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/manifest). The following is worth noting here - Cloudflow includes an operator with API version v1beta1. If this type of installation is used, the Spark application manifest descriptions should be based on the example tags in Git with the appropriate API version, for example, "v1beta1-0.9.0-2.4.0". The operator version can be found in the description of the CRD included in the operator, in the "versions" dictionary:
    oc get crd sparkapplications.sparkoperator.k8s.io -o yaml

If the operator is installed correctly, an active pod with the Spark operator will appear in the corresponding project (for example, cloudflow-fdp-sparkoperator in the Cloudflow space for the Cloudflow installation), and a corresponding Kubernetes resource type named "sparkapplications" will appear. You can explore the available Spark applications with the following command:

oc get sparkapplications -n {project}

To run tasks using Spark Operator, you need to do 3 things:

  • create a Docker image that includes all the necessary libraries, as well as the configuration and executable files. In the target picture, this is an image created at the CI/CD stage and tested on a test cluster;
  • publish the Docker image to a registry accessible from the Kubernetes cluster;
  • generate a manifest with the "SparkApplication" type and a description of the task to be launched. Example manifests are available in the official repository (e.g. github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/v1beta1-0.9.0-2.4.0/examples/spark-pi.yaml). There are important points to note about the manifest:
    1. the "apiVersion" dictionary must indicate the API version corresponding to the operator version;
    2. the "metadata.namespace" dictionary must indicate the namespace in which the application will be launched;
    3. the "spec.image" dictionary must contain the address of the created Docker image in an accessible registry;
    4. the "spec.mainClass" dictionary must contain the Spark task class that needs to be run when the process starts;
    5. the "spec.mainApplicationFile" dictionary must contain the path to the executable file;
    6. the "spec.sparkVersion" dictionary must indicate the version of Spark being used;
    7. the "spec.driver.serviceAccount" dictionary must indicate the service account within the corresponding Kubernetes namespace that will be used to run the application;
    8. the "spec.executor" dictionary must indicate the amount of resources allocated to the application;
    9. the "spec.volumeMounts" dictionary must specify the local directory in which the local Spark task files will be created.

An example of generating a manifest (here {spark-service-account} is a service account inside the Kubernetes cluster for running Spark tasks):

apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: {project}
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
  sparkVersion: "2.4.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 0.1
    coreLimit: "200m"
    memory: "512m"
    labels:
      version: 2.4.0
    serviceAccount: {spark-service-account}
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

This manifest specifies a service account for which, before publishing the manifest, you need to create the necessary role bindings that grant the application the access rights it needs to interact with the Kubernetes API (if necessary). In our case, the application needs the rights to create Pods. Let's create the necessary role binding:

oc adm policy add-role-to-user edit system:serviceaccount:{project}:{spark-service-account} -n {project}

It is also worth noting that this manifest specification can include a "hadoopConfigMap" parameter, which allows you to specify a ConfigMap with the Hadoop configuration without having to first place the corresponding file in the Docker image. It is also suitable for running tasks on a regular basis - using the "schedule" parameter, a schedule for running the given task can be specified, as sketched below.
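
A sketch of such a scheduled launch, assuming the v1beta1 API of the operator (the schedule below runs the task every night at 02:00):

apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled
  namespace: {project}
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  template:
    type: Scala
    mode: cluster
    image: "gcr.io/spark-operator/spark:v2.4.0"
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
    sparkVersion: "2.4.0"
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: "512m"
      serviceAccount: {spark-service-account}
    executor:
      cores: 1
      instances: 1
      memory: "512m"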

After that, we save our manifest to the spark-pi.yaml file and apply it to our Kubernetes cluster:

oc apply -f spark-pi.yaml

This will create an object of the "sparkapplications" type:

oc get sparkapplications -n {project}
> NAME       AGE
> spark-pi   22h

In this case, a pod with the application will be created, and its status will be displayed in the created "sparkapplications". You can view it with the following command:

oc get sparkapplications spark-pi -o yaml -n {project}
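
For scripted checks, you can extract just the state field (a sketch, assuming the v1beta1 status layout of the operator):

oc get sparkapplications spark-pi -n {project} -o jsonpath='{.status.applicationState.state}'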

Upon completion of the task, the pod will move to the "Completed" status, which will also be updated in "sparkapplications". The application logs can be viewed in the browser or using the following command (here {sparkapplications-pod-name} is the name of the pod of the running task):

oc logs {sparkapplications-pod-name} -n {project}

Spark tasks can also be managed using the specialized sparkctl utility. To install it, clone the repository with its source code, install Go, and build the utility:

git clone https://github.com/GoogleCloudPlatform/spark-on-k8s-operator.git
cd spark-on-k8s-operator/
wget https://dl.google.com/go/go1.13.3.linux-amd64.tar.gz
tar -xzf go1.13.3.linux-amd64.tar.gz
sudo mv go /usr/local
mkdir $HOME/Projects
export GOROOT=/usr/local/go
export GOPATH=$HOME/Projects
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH
go version
cd sparkctl
go build -o sparkctl
sudo mv sparkctl /usr/local/bin

Let's examine the list of running Spark tasks:

sparkctl list -n {project}

Let's create a description for a Spark task:

vi spark-app.yaml

apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: {project}
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
  sparkVersion: "2.4.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1000m"
    memory: "512m"
    labels:
      version: 2.4.0
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

Let's run the described task using sparkctl:

sparkctl create spark-app.yaml -n {project}

Let's examine the list of running Spark tasks:

sparkctl list -n {project}

Let's examine the list of events of the launched Spark task:

sparkctl event spark-pi -n {project} -f

Let's examine the status of the running Spark task:

sparkctl status spark-pi -n {project}

In conclusion, I would like to review the disadvantages discovered while using the current stable version of Spark (2.4.5) in Kubernetes:

  1. The first, and perhaps the main, disadvantage is the lack of data locality. For all the shortcomings of YARN, there were also advantages to using it, for example, the principle of delivering code to the data (rather than data to the code). Thanks to it, Spark tasks were executed on the nodes where the data involved in the calculations was located, and the time it took to deliver data over the network was noticeably reduced. When using Kubernetes, we are faced with the need to move the data involved in a task across the network. If the data is large enough, the task execution time can increase significantly, and a fairly large amount of disk space is required on the Spark task instances for its temporary storage. This disadvantage can be mitigated by using specialized software that ensures data locality in Kubernetes (for example, Alluxio), but this actually means having to store a full copy of the data on the nodes of the Kubernetes cluster.
  2. The second important disadvantage is security. By default, security-related features for running Spark tasks are disabled, the use of Kerberos is not covered in the official documentation (although the corresponding options were introduced in version 3.0.0, which will require additional work), and the security documentation for Spark (https://spark.apache.org/docs/2.4.5/security.html) mentions only YARN, Mesos and Standalone Cluster as key stores. At the same time, the user under which Spark tasks are launched cannot be specified directly - we only specify the service account under which the task will work, and the user is selected based on the configured security policies. In this regard, either the root user is used, which is not safe in a productive environment, or a user with a random UID, which is inconvenient when distributing access rights to data (this can be solved by creating PodSecurityPolicies and binding them to the corresponding service accounts; see the sketch after this list). Currently, the solution is either to place all the necessary files directly in the Docker image, or to modify the Spark launch script so that it uses the mechanism for storing and retrieving secrets adopted in your organization.
  3. Running Spark tasks on Kubernetes is officially still in experimental mode, and there may be significant changes in the artifacts used (configuration files, base Docker images, and launch scripts) in the future. Indeed, while preparing this material, versions 2.3.0 and 2.4.5 were tested, and their behavior differed significantly.
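
A sketch of the PodSecurityPolicy approach mentioned in point 2 (the names and the UID range are illustrative assumptions; the policy is bound to the spark service account through a Role with the "use" verb):

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: spark-psp
spec:
  privileged: false
  runAsUser:
    rule: MustRunAs       # forbid root, pin the UID range
    ranges:
      - min: 1000
        max: 1000
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes: ["configMap", "emptyDir", "secret"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-psp-user
  namespace: {project}
rules:
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["spark-psp"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-psp-user
  namespace: {project}
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: {project}
roleRef:
  kind: Role
  name: spark-psp-user
  apiGroup: rbac.authorization.k8s.io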

Let's wait for updates - a new version of Spark (3.0.0) was recently released, which brought notable changes to the work of Spark in Kubernetes, but retained the experimental status of support for this resource manager. Perhaps the next updates will really make it possible to fully recommend abandoning YARN and running Spark tasks on Kubernetes without fearing for the security of your system and without the need to independently modify functional components.

The end.

Source: www.habr.com