Running Apache Spark on Kubernetes

ืงื•ืจืื™ื ื™ืงืจื™ื, ืฆื”ืจื™ื™ื ื˜ื•ื‘ื™ื. ื”ื™ื•ื ื ื“ื‘ืจ ืงืฆืช ืขืœ Apache Spark ื•ืขืœ ืกื™ื›ื•ื™ื™ ื”ืคื™ืชื•ื— ืฉืœื•.

In the modern Big Data world, Apache Spark is the de facto standard for developing batch data processing jobs. It is also used to build streaming applications that work on the micro-batch concept, processing and shipping data in small portions (Spark Structured Streaming). Traditionally, it has been part of the overall Hadoop stack, using YARN (or, in some cases, Apache Mesos) as the resource manager. By 2020, its use in this traditional form had become questionable for most companies due to the lack of decent Hadoop distributions: the development of HDP has stopped, CDH is barely evolving and carries a high price tag, and the remaining Hadoop vendors have either ceased to exist or face a murky future. That is why running Apache Spark on Kubernetes is attracting growing interest from the community and large companies: having become the standard for container orchestration and resource management in private and public clouds, Kubernetes solves the awkward resource scheduling of Spark jobs on YARN and provides a steadily evolving platform with many commercial and open distributions for companies of all sizes and stripes. Besides, on the wave of its popularity, most companies have already acquired a few installations of their own and built up expertise in using them, which simplifies the move.

Starting with version 2.3.0, Apache Spark has official support for running jobs on a Kubernetes cluster, and today we will talk about the current maturity of this approach, the options for using it, and the pitfalls you will encounter when adopting it.

ืงื•ื“ื ื›ืœ, ื‘ื•ืื• ื ืกืชื›ืœ ืขืœ ืชื”ืœื™ืš ืคื™ืชื•ื— ืžืฉื™ืžื•ืช ื•ื™ื™ืฉื•ืžื™ื ื”ืžื‘ื•ืกืกื™ื ืขืœ Apache Spark ื•ื ื“ื’ื™ืฉ ืžืงืจื™ื ื˜ื™ืคื•ืกื™ื™ื ื‘ื”ื ืฆืจื™ืš ืœื”ืจื™ืฅ ืžืฉื™ืžื” ืขืœ ืืฉื›ื•ืœ Kubernetes. ื‘ื”ื›ื ืช ืคื•ืกื˜ ื–ื”, OpenShift ืžืฉืžืฉ ื›ื”ืคืฆื” ื•ื™ื™ื ืชื ื• ืคืงื•ื“ื•ืช ืจืœื•ื•ื ื˜ื™ื•ืช ืœื›ืœื™ ื”ืฉื™ืจื•ืช ืฉืœ ืฉื•ืจืช ื”ืคืงื•ื“ื” (oc). ืขื‘ื•ืจ ื”ืคืฆื•ืช ืื—ืจื•ืช ืฉืœ Kubernetes, ื ื™ืชืŸ ืœื”ืฉืชืžืฉ ื‘ืคืงื•ื“ื•ืช ื”ืžืชืื™ืžื•ืช ืžืฉื™ืจื•ืช ืฉื•ืจืช ื”ืคืงื•ื“ื” ื”ืกื˜ื ื“ืจื˜ื™ ืฉืœ Kubernetes (kubectl) ืื• ื‘ืื ืœื•ื’ื™ื ืฉืœื”ืŸ (ืœื“ื•ื’ืžื”, ืขื‘ื•ืจ ืžื“ื™ื ื™ื•ืช oc adm).

ืžืงืจื” ืฉื™ืžื•ืฉ ืจืืฉื•ืŸ - spark-submit

In the course of developing jobs and applications, the developer needs to run jobs to debug data transformations. In theory, stubs could serve this purpose, but development involving real (albeit test) instances of the end systems has proven faster and better for this class of tasks. When we debug against real instances of end systems, two scenarios are possible:

  • The developer runs a Spark job locally in standalone mode;

  • The developer runs a Spark job on a Kubernetes cluster in a test loop.

ืœืืคืฉืจื•ืช ื”ืจืืฉื•ื ื” ื™ืฉ ื–ื›ื•ืช ืงื™ื•ื, ืืš ื˜ื•ืžื ืช ื‘ื—ื•ื‘ื” ืžืกืคืจ ื—ืกืจื•ื ื•ืช:

  • Each developer must be given access from their workstation to all the end-system instances they need;
  • The working machine must have enough resources to run the job being developed.

ืœืืคืฉืจื•ืช ื”ืฉื ื™ื™ื” ืื™ืŸ ืืช ื”ื—ืกืจื•ื ื•ืช ื”ืœืœื•, ืฉื›ืŸ ื”ืฉื™ืžื•ืฉ ื‘ืืฉื›ื•ืœ Kubernetes ืžืืคืฉืจ ืœืš ืœื”ืงืฆื•ืช ืืช ืžืื’ืจ ื”ืžืฉืื‘ื™ื ื”ื“ืจื•ืฉ ืœื”ืคืขืœืช ืžืฉื™ืžื•ืช ื•ืœืกืคืง ืœื• ืืช ื”ื’ื™ืฉื” ื”ื“ืจื•ืฉื” ืœืžื•ืคืขื™ ืงืฆื” ืฉืœ ืžืขืจื›ืช, ืชื•ืš ืžืชืŸ ื’ื™ืฉื” ืืœื™ื• ื‘ืฆื•ืจื” ื’ืžื™ืฉื” ื‘ืืžืฆืขื•ืช ืžื•ื“ืœ ืœื—ื™ืงื•ื™ ืฉืœ Kubernetes ืขื‘ื•ืจ ื›ืœ ื—ื‘ืจื™ ืฆื•ื•ืช ื”ืคื™ืชื•ื—. ื‘ื•ืื• ื ื“ื’ื™ืฉ ืืช ื–ื” ื›ืžืงืจื” ื”ืฉื™ืžื•ืฉ ื”ืจืืฉื•ืŸ - ื”ืฉืงืช ืžืฉื™ืžื•ืช Spark ืžืžื—ืฉื‘ ืžืคืชื— ืžืงื•ืžื™ ืขืœ ืืฉื›ื•ืœ Kubernetes ื‘ืžืขื’ืœ ื‘ื“ื™ืงื”.

Let's talk in more detail about setting up Spark for local launches. To start using Spark, you need to install it:

mkdir /opt/spark
cd /opt/spark
wget http://mirror.linux-ia64.org/apache/spark/spark-2.4.5/spark-2.4.5.tgz
tar zxvf spark-2.4.5.tgz
rm -f spark-2.4.5.tgz

We build the packages needed for working with Kubernetes:

cd spark-2.4.5/
./build/mvn -Pkubernetes -DskipTests clean package

ื‘ื ื™ื™ื” ืžืœืื” ืœื•ืงื—ืช ื”ืจื‘ื” ื–ืžืŸ, ื•ื›ื“ื™ ืœื™ืฆื•ืจ ืชืžื•ื ื•ืช Docker ื•ืœื”ืจื™ืฅ ืื•ืชืŸ ื‘ืืฉื›ื•ืœ Kubernetes, ืืชื” ื‘ืืžืช ืฆืจื™ืš ืจืง ืงื‘ืฆื™ jar ืžืกืคืจื™ื™ืช "assembly/", ืื– ืืชื” ื™ื›ื•ืœ ืœื‘ื ื•ืช ืจืง ืชืช-ืคืจื•ื™ืงื˜ ื–ื”:

./build/mvn -f ./assembly/pom.xml -Pkubernetes -DskipTests clean package

To run Spark jobs on Kubernetes, you need to create a Docker image to use as the base image. Two approaches are possible here:

  • The resulting Docker image includes the executable Spark job code;
  • The resulting image includes only Spark and the required dependencies, while the executable code is hosted remotely (for example, in HDFS).

ืจืืฉื™ืช, ื‘ื•ืื• ื ื‘ื ื” ืชืžื•ื ืช Docker ื”ืžื›ื™ืœื” ื“ื•ื’ืžื” ืœื‘ื“ื™ืงื” ืฉืœ ืžืฉื™ืžืช Spark. ื›ื“ื™ ืœื™ืฆื•ืจ ืชืžื•ื ื•ืช Docker, ืœ-Spark ื™ืฉ ื›ืœื™ ืขื–ืจ ืฉื ืงืจื "docker-image-tool". ื‘ื•ืื• ืœืœืžื•ื“ ืืช ื”ืขื–ืจื” ืขืœ ื–ื”:

./bin/docker-image-tool.sh --help

It can build Docker images and upload them to remote registries, but by default it has a number of drawbacks:

  • It unfailingly creates 3 Docker images at once: for Spark, PySpark, and R;
  • It does not let you specify an image name.

ืœื›ืŸ, ืื ื• ื ืฉืชืžืฉ ื‘ื’ืจืกื” ืฉื•ื ื” ืฉืœ ื›ืœื™ ื”ืฉื™ืจื•ืช ื”ืžื•ืคื™ืข ืœื”ืœืŸ:

vi bin/docker-image-tool-upd.sh

#!/usr/bin/env bash

function error {
  echo "$@" 1>&2
  exit 1
}

if [ -z "${SPARK_HOME}" ]; then
  SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
. "${SPARK_HOME}/bin/load-spark-env.sh"

function image_ref {
  local image="$1"
  local add_repo="${2:-1}"
  if [ $add_repo = 1 ] && [ -n "$REPO" ]; then
    image="$REPO/$image"
  fi
  if [ -n "$TAG" ]; then
    image="$image:$TAG"
  fi
  echo "$image"
}

function build {
  local BUILD_ARGS
  local IMG_PATH

  if [ ! -f "$SPARK_HOME/RELEASE" ]; then
    IMG_PATH=$BASEDOCKERFILE
    BUILD_ARGS=(
      ${BUILD_PARAMS}
      --build-arg
      img_path=$IMG_PATH
      --build-arg
      datagram_jars=datagram/runtimelibs
      --build-arg
      spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
    )
  else
    IMG_PATH="kubernetes/dockerfiles"
    BUILD_ARGS=(${BUILD_PARAMS})
  fi

  if [ -z "$IMG_PATH" ]; then
    error "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark."
  fi

  if [ -z "$IMAGE_REF" ]; then
    error "Cannot find docker image reference. Please add -i arg."
  fi

  local BINDING_BUILD_ARGS=(
    ${BUILD_PARAMS}
    --build-arg
    base_img=$(image_ref $IMAGE_REF)
  )
  local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/docker/Dockerfile"}

  docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
    -t $(image_ref $IMAGE_REF) \
    -f "$BASEDOCKERFILE" .
}

function push {
  docker push "$(image_ref $IMAGE_REF)"
}

function usage {
  cat <<EOF
Usage: $0 [options] [command]
Builds or pushes the built-in Spark Docker image.

Commands:
  build       Build image. Requires a repository address to be provided if the image will be
              pushed to a different registry.
  push        Push a pre-built image to a registry. Requires a repository address to be provided.

Options:
  -f file               Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
  -p file               Dockerfile to build for PySpark Jobs. Builds Python dependencies and ships with Spark.
  -R file               Dockerfile to build for SparkR Jobs. Builds R dependencies and ships with Spark.
  -r repo               Repository address.
  -i name               Image name to apply to the built image, or to identify the image to be pushed.  
  -t tag                Tag to apply to the built image, or to identify the image to be pushed.
  -m                    Use minikube's Docker daemon.
  -n                    Build docker image with --no-cache
  -b arg      Build arg to build or push the image. For multiple build args, this option needs to
              be used separately for each build arg.

Using minikube when building images will do so directly into minikube's Docker daemon.
There is no need to push the images into minikube in that case, they'll be automatically
available when running applications inside the minikube cluster.

Check the following documentation for more information on using the minikube Docker daemon:

  https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon

Examples:
  - Build image in minikube with tag "testing"
    $0 -m -t testing build

  - Build and push image with tag "v2.3.0" to docker.io/myrepo
    $0 -r docker.io/myrepo -t v2.3.0 build
    $0 -r docker.io/myrepo -t v2.3.0 push
EOF
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  usage
  exit 0
fi

REPO=
TAG=
BASEDOCKERFILE=
NOCACHEARG=
BUILD_PARAMS=
IMAGE_REF=
while getopts f:mr:t:nb:i: option
do
 case "${option}"
 in
 f) BASEDOCKERFILE=${OPTARG};;
 r) REPO=${OPTARG};;
 t) TAG=${OPTARG};;
 n) NOCACHEARG="--no-cache";;
 i) IMAGE_REF=${OPTARG};;
 b) BUILD_PARAMS=${BUILD_PARAMS}" --build-arg "${OPTARG};;
 esac
done

case "${@: -1}" in
  build)
    build
    ;;
  push)
    if [ -z "$REPO" ]; then
      usage
      exit 1
    fi
    push
    ;;
  *)
    usage
    exit 1
    ;;
esac

ื‘ืขื–ืจืชื•, ืื ื• ืžืจื›ื™ื‘ื™ื ืชืžื•ื ืช Spark ื‘ืกื™ืกื™ืช ื”ืžื›ื™ืœื” ืžืฉื™ืžืช ื‘ื“ื™ืงื” ืœื—ื™ืฉื•ื‘ Pi ื‘ืืžืฆืขื•ืช Spark (ื›ืืŸ {docker-registry-url} ื”ื™ื ื›ืชื•ื‘ืช ื”ืืชืจ ืฉืœ ืจื™ืฉื•ื ื”ืชืžื•ื ื•ืช ืฉืœ Docker, {repo} ื”ื•ื ืฉื ื”ืžืื’ืจ ื‘ืชื•ืš ื”ืจื™ืฉื•ื, ืฉืžืชืื™ื ืœืคืจื•ื™ื™ืงื˜ ื‘-OpenShift , {image-name} - ืฉื ื”ืชืžื•ื ื” (ืื ื ืขืฉื” ืฉื™ืžื•ืฉ ื‘ื”ืคืจื“ื” ืชืœืช-ืžืคืœืกื™ืช ืฉืœ ืชืžื•ื ื•ืช, ืœืžืฉืœ, ื›ืžื• ื‘ืจื™ืฉื•ื ื”ืžืฉื•ืœื‘ ืฉืœ Red Hat OpenShift ืชืžื•ื ื•ืช), {tag} - ืชื’ ืฉืœ ื–ื” ื’ืจืกื” ืฉืœ ื”ืชืžื•ื ื”):

./bin/docker-image-tool-upd.sh -f resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile -r {docker-registry-url}/{repo} -i {image-name} -t {tag} build

ื”ื™ื›ื ืก ืœืืฉื›ื•ืœ OKD ื‘ืืžืฆืขื•ืช ื›ืœื™ ื”ืฉื™ืจื•ืช ื”ืžืกื•ืฃ (ื›ืืŸ {OKD-API-URL} ื”ื™ื ื›ืชื•ื‘ืช ื”-URL ืฉืœ OKD ืืฉื›ื•ืœ API):

oc login {OKD-API-URL}

Let's obtain the current user's token for authorization in the Docker Registry:

oc whoami -t

Log in to the internal Docker Registry of the OKD cluster (we use the token obtained with the previous command as the password):

docker login {docker-registry-url}

Let's upload the built Docker image to the OKD Docker Registry:

./bin/docker-image-tool-upd.sh -r {docker-registry-url}/{repo} -i {image-name} -t {tag} push

Let's check that the built image is available in OKD. To do so, open in a browser the URL with the list of images of the corresponding project (here {project} is the project name inside the OpenShift cluster, {OKD-WEBUI-URL} is the URL of the OpenShift web console): https://{OKD-WEBUI-URL}/console/project/{project}/browse/images/{image-name}.
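
The same check can be made from the command line; a minimal sketch, assuming OpenShift exposes the pushed image as an image stream with the same name:

# List the image stream and confirm the pushed tag is present
oc get imagestream {image-name} -n {project}
oc describe imagestream {image-name} -n {project} | grep {tag}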

To run jobs, you need to create a service account with privileges to run pods as root (we will discuss this point later):

oc create sa spark -n {project}
oc adm policy add-scc-to-user anyuid -z spark -n {project}

Let's run spark-submit to publish a Spark job to the OKD cluster, specifying the created service account and the Docker image:

/opt/spark/bin/spark-submit \
  --name spark-test \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace={project} \
  --conf spark.submit.deployMode=cluster \
  --conf spark.kubernetes.container.image={docker-registry-url}/{repo}/{image-name}:{tag} \
  --conf spark.master=k8s://https://{OKD-API-URL} \
  local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar

ะ—ะดะตััŒ:

โ€”ืฉื โ€” ืฉื ื”ืžืฉื™ืžื” ืฉืชืฉืชืชืฃ ื‘ื”ื™ื•ื•ืฆืจื•ืช ื”ืฉื ืฉืœ ืชืจืžื™ืœื™ Kubernetes;

โ€”class โ€” ืžื—ืœืงื” ืฉืœ ืงื•ื‘ืฅ ื”ื”ืคืขืœื”, ื”ื ืงืจืืช ื›ืืฉืจ ื”ืžืฉื™ืžื” ืžืชื—ื™ืœื”;

โ€”conf โ€” ืคืจืžื˜ืจื™ ืชืฆื•ืจืช Spark;

spark.executor.instances - ืžืกืคืจ ื”ืžื•ืฆื™ืื™ื ืœืคื•ืขืœ ืฉืœ Spark ืœื”ืคืขืœื”;

spark.kubernetes.authenticate.driver.serviceAccountName - ืฉื ื—ืฉื‘ื•ืŸ ื”ืฉื™ืจื•ืช Kubernetes ื”ืžืฉืžืฉ ื‘ืขืช ื”ืคืขืœืช ืคื•ื“ื™ื (ื›ื“ื™ ืœื”ื’ื“ื™ืจ ืืช ื”ืงืฉืจ ื”ืื‘ื˜ื—ื” ื•ื”ื™ื›ื•ืœื•ืช ื‘ืขืช ืื™ื ื˜ืจืืงืฆื™ื” ืขื ื”-API ืฉืœ Kubernetes);

spark.kubernetes.namespace - ืžืจื—ื‘ ื”ืฉืžื•ืช ืฉืœ Kubernetes ืฉื‘ื• ื™ื•ืคืขืœื• ืชืจืžื™ืœื™ื ืฉืœ ืžื ื”ืœื™ ื”ืชืงื ื™ื ื•ืžื‘ืฆืขื™ื;

spark.submit.deployMode - ืฉื™ื˜ืช ื”ืฉืงืช Spark (ืขื‘ื•ืจ ื”-Spark-submit ืกื˜ื ื“ืจื˜ื™ ื ืขืฉื” ืฉื™ืžื•ืฉ ื‘-"cluster", ืขื‘ื•ืจ Spark Operator ื•ื’ืจืกืื•ืช ืžืื•ื—ืจื•ืช ื™ื•ืชืจ ืฉืœ Spark "client");

spark.kubernetes.container.image - ืชืžื•ื ืช Docker ื”ืžืฉืžืฉืช ืœื”ืคืขืœืช ืคื•ื“ื™ื;

spark.master - URL ืฉืœ Kubernetes API (ื—ื™ืฆื•ื ื™ ืžืฆื•ื™ืŸ ื›ืš ืฉื”ื’ื™ืฉื” ืžืชืจื—ืฉืช ืžื”ืžื—ืฉื‘ ื”ืžืงื•ืžื™);

local:// ื”ื•ื ื”ื ืชื™ื‘ ืœืงื•ื‘ืฅ ื”ื”ืคืขืœื” ืฉืœ Spark ื‘ืชื•ืš ืชืžื•ื ืช Docker.

We go to the corresponding OKD project and study the created pods: https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods.
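
The same can be done from the command line; a minimal sketch, relying on the spark-role label that Spark on Kubernetes puts on driver and executor pods (assuming default labels):

# Find the driver pod and follow its logs
oc get pods -n {project} -l spark-role=driver
oc logs -f -n {project} $(oc get pods -n {project} -l spark-role=driver -o name | head -n 1)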

To simplify the development process, one more option can be used: a shared base Spark image is built and used by all jobs, and snapshots of the executable files are published to external storage (for example, Hadoop) and referenced as a link when calling spark-submit. In this case, you can run different versions of Spark jobs without rebuilding the Docker images, using, for example, WebHDFS to publish the executables. We send a request to create a file (here {host} is the host of the WebHDFS service, {port} is the port of the WebHDFS service, {path-to-file-on-hdfs} is the desired path to the file in HDFS):

curl -i -X PUT "http://{host}:{port}/webhdfs/v1/{path-to-file-on-hdfs}?op=CREATE"

You will receive a response like this (here {location} is the URL to use for uploading the file):

HTTP/1.1 307 TEMPORARY_REDIRECT
Location: {location}
Content-Length: 0

ื˜ืขืŸ ืืช ืงื•ื‘ืฅ ื”ื”ืคืขืœื” ืฉืœ Spark ืœืชื•ืš HDFS (ื›ืืŸ {path-to-local-file} ื”ื•ื ื”ื ืชื™ื‘ ืœืงื•ื‘ืฅ ื”ื”ืคืขืœื” ืฉืœ Spark ื‘ืžืืจื— ื”ื ื•ื›ื—ื™):

curl -i -X PUT -T {path-to-local-file} "{location}"

ืœืื—ืจ ืžื›ืŸ, ื ื•ื›ืœ ืœื‘ืฆืข spark-submit ื‘ืืžืฆืขื•ืช ืงื•ื‘ืฅ ื”-Spark ืฉื”ื•ืขืœื” ืœ-HDFS (ื›ืืŸ {class-name} ื”ื•ื ืฉื ื”ืžื—ืœืงื” ืฉืฆืจื™ืš ืœื”ืคืขื™ืœ ื›ื“ื™ ืœื”ืฉืœื™ื ืืช ื”ืžืฉื™ืžื”):

/opt/spark/bin/spark-submit \
  --name spark-test \
  --class {class-name} \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace={project} \
  --conf spark.submit.deployMode=cluster \
  --conf spark.kubernetes.container.image={docker-registry-url}/{repo}/{image-name}:{tag} \
  --conf spark.master=k8s://https://{OKD-API-URL} \
  hdfs://{host}:{port}/{path-to-file-on-hdfs}

It should be noted that to access HDFS and make the job work, you may need to change the Dockerfile and the entrypoint.sh script: add a directive to the Dockerfile for copying dependent libraries into the /opt/spark/jars directory, and include the HDFS configuration file in SPARK_CLASSPATH in entrypoint.sh.
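
A minimal sketch of the entrypoint.sh addition, assuming the Hadoop client configuration was copied into the image under /opt/spark/hadoop-conf (the path is illustrative):

# entrypoint.sh (fragment): put the HDFS client configuration on the classpath
# so that hdfs:// paths passed to spark-submit can be resolved
export SPARK_CLASSPATH="${SPARK_CLASSPATH}:/opt/spark/hadoop-conf"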

Use case two - Apache Livy

Further on, when a job has been developed and its result needs to be tested, the question arises of launching it as part of the CI/CD process and tracking its execution status. Of course, you could run it with a local spark-submit call, but that complicates the CI/CD infrastructure, since it requires installing and configuring Spark on the CI server agents/runners and setting up access to the Kubernetes API. For this case, the target implementation chose Apache Livy as a REST API for running Spark jobs, hosted inside the Kubernetes cluster. With it, you can run Spark jobs on the Kubernetes cluster with regular cURL requests, which are easy to implement on any CI solution, and its placement inside the Kubernetes cluster solves the authentication problem when interacting with the Kubernetes API.

Let's highlight it as the second use case: running Spark jobs as part of a CI/CD process on a Kubernetes cluster in a test loop.

A few words about Apache Livy: it works as an HTTP server providing a web interface and a RESTful API that lets you launch spark-submit remotely by passing the necessary parameters. Traditionally it has shipped as part of the HDP distribution, but it can also be deployed to OKD or any other Kubernetes installation using the appropriate manifest and a set of Docker images, such as this one: github.com/ttauveron/k8s-big-data-experiments/tree/master/livy-spark-2.3. In our case, a similar Docker image, including Spark version 2.4.5, was built from the following Dockerfile:

FROM java:8-alpine

ENV SPARK_HOME=/opt/spark
ENV LIVY_HOME=/opt/livy
ENV HADOOP_CONF_DIR=/etc/hadoop/conf
ENV SPARK_USER=spark

WORKDIR /opt

RUN apk add --update openssl wget bash unzip && \
    wget -P /opt https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz && \
    tar xvzf spark-2.4.5-bin-hadoop2.7.tgz && \
    rm spark-2.4.5-bin-hadoop2.7.tgz && \
    ln -s /opt/spark-2.4.5-bin-hadoop2.7 /opt/spark

RUN wget http://mirror.its.dal.ca/apache/incubator/livy/0.7.0-incubating/apache-livy-0.7.0-incubating-bin.zip && \
    unzip apache-livy-0.7.0-incubating-bin.zip && \
    rm apache-livy-0.7.0-incubating-bin.zip && \
    ln -s /opt/apache-livy-0.7.0-incubating-bin /opt/livy && \
    mkdir /var/log/livy && \
    ln -s /var/log/livy /opt/livy/logs && \
    cp /opt/livy/conf/log4j.properties.template /opt/livy/conf/log4j.properties

ADD livy.conf /opt/livy/conf
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ADD entrypoint.sh /entrypoint.sh

ENV PATH="/opt/livy/bin:${PATH}"

EXPOSE 8998

ENTRYPOINT ["/entrypoint.sh"]
CMD ["livy-server"]

You can build the resulting image and upload it to your existing Docker repository, such as the internal OKD registry. To deploy it, use the following manifest (here {registry-url} is the Docker image registry URL, {image-name} is the Docker image name, {tag} is the Docker image tag, {livy-url} is the desired URL at which the Livy server will be reachable; the "Route" manifest is used if Red Hat OpenShift is the Kubernetes distribution, otherwise the corresponding Ingress or Service manifest of type NodePort is used):

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: livy
  name: livy
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      component: livy
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        component: livy
    spec:
      containers:
        - command:
            - livy-server
          env:
            - name: K8S_API_HOST
              value: localhost
            - name: SPARK_KUBERNETES_IMAGE
              value: 'gnut3ll4/spark:v1.0.14'
          image: '{registry-url}/{image-name}:{tag}'
          imagePullPolicy: Always
          name: livy
          ports:
            - containerPort: 8998
              name: livy-rest
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /var/log/livy
              name: livy-log
            - mountPath: /opt/.livy-sessions/
              name: livy-sessions
            - mountPath: /opt/livy/conf/livy.conf
              name: livy-config
              subPath: livy.conf
            - mountPath: /opt/spark/conf/spark-defaults.conf
              name: spark-config
              subPath: spark-defaults.conf
        - command:
            - /usr/local/bin/kubectl
            - proxy
            - '--port'
            - '8443'
          image: 'gnut3ll4/kubectl-sidecar:latest'
          imagePullPolicy: Always
          name: kubectl
          ports:
            - containerPort: 8443
              name: k8s-api
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: spark
      serviceAccountName: spark
      terminationGracePeriodSeconds: 30
      volumes:
        - emptyDir: {}
          name: livy-log
        - emptyDir: {}
          name: livy-sessions
        - configMap:
            defaultMode: 420
            items:
              - key: livy.conf
                path: livy.conf
            name: livy-config
          name: livy-config
        - configMap:
            defaultMode: 420
            items:
              - key: spark-defaults.conf
                path: spark-defaults.conf
            name: livy-config
          name: spark-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: livy-config
data:
  livy.conf: |-
    livy.spark.deploy-mode=cluster
    livy.file.local-dir-whitelist=/opt/.livy-sessions/
    livy.spark.master=k8s://http://localhost:8443
    livy.server.session.state-retain.sec = 8h
  spark-defaults.conf: 'spark.kubernetes.container.image        "gnut3ll4/spark:v1.0.14"'
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: livy
  name: livy
spec:
  ports:
    - name: livy-rest
      port: 8998
      protocol: TCP
      targetPort: 8998
  selector:
    component: livy
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: livy
  name: livy
spec:
  host: {livy-url}
  port:
    targetPort: livy-rest
  to:
    kind: Service
    name: livy
    weight: 100
  wildcardPolicy: None

ืœืื—ืจ ื”ื—ืœืชื• ื•ื”ืฉืงืช ื”ืคื•ื“ ื‘ื”ืฆืœื—ื”, ื”ืžืžืฉืง ื”ื’ืจืคื™ ืฉืœ Livy ื–ืžื™ืŸ ื‘ืงื™ืฉื•ืจ: http://{livy-url}/ui. ืขื Livy, ื ื•ื›ืœ ืœืคืจืกื ืืช ืžืฉื™ืžืช ื”-Spark ืฉืœื ื• ื‘ืืžืฆืขื•ืช ื‘ืงืฉืช REST ืฉืœ, ืœืžืฉืœ, Postman. ื“ื•ื’ืžื” ืœืื•ืกืฃ ืขื ื‘ืงืฉื•ืช ืžื•ืฆื’ืช ืœื”ืœืŸ (ืืคืฉืจ ืœื”ืขื‘ื™ืจ ืืจื’ื•ืžื ื˜ื™ื ืฉืœ ืชืฆื•ืจื” ืขื ืžืฉืชื ื™ื ื”ื ื—ื•ืฆื™ื ืœืคืขื•ืœืช ื”ืžืฉื™ืžื” ืฉื”ื•ืฉืงื” ื‘ืžืขืจืš "args"):

{
    "info": {
        "_postman_id": "be135198-d2ff-47b6-a33e-0d27b9dba4c8",
        "name": "Spark Livy",
        "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
    },
    "item": [
        {
            "name": "1 Submit job with jar",
            "request": {
                "method": "POST",
                "header": [
                    {
                        "key": "Content-Type",
                        "value": "application/json"
                    }
                ],
                "body": {
                    "mode": "raw",
                    "raw": "{nt"file": "local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar", nt"className": "org.apache.spark.examples.SparkPi",nt"numExecutors":1,nt"name": "spark-test-1",nt"conf": {ntt"spark.jars.ivy": "/tmp/.ivy",ntt"spark.kubernetes.authenticate.driver.serviceAccountName": "spark",ntt"spark.kubernetes.namespace": "{project}",ntt"spark.kubernetes.container.image": "{docker-registry-url}/{repo}/{image-name}:{tag}"nt}n}"
                },
                "url": {
                    "raw": "http://{livy-url}/batches",
                    "protocol": "http",
                    "host": [
                        "{livy-url}"
                    ],
                    "path": [
                        "batches"
                    ]
                }
            },
            "response": []
        },
        {
            "name": "2 Submit job without jar",
            "request": {
                "method": "POST",
                "header": [
                    {
                        "key": "Content-Type",
                        "value": "application/json"
                    }
                ],
                "body": {
                    "mode": "raw",
                    "raw": "{nt"file": "hdfs://{host}:{port}/{path-to-file-on-hdfs}", nt"className": "{class-name}",nt"numExecutors":1,nt"name": "spark-test-2",nt"proxyUser": "0",nt"conf": {ntt"spark.jars.ivy": "/tmp/.ivy",ntt"spark.kubernetes.authenticate.driver.serviceAccountName": "spark",ntt"spark.kubernetes.namespace": "{project}",ntt"spark.kubernetes.container.image": "{docker-registry-url}/{repo}/{image-name}:{tag}"nt},nt"args": [ntt"HADOOP_CONF_DIR=/opt/spark/hadoop-conf",ntt"MASTER=k8s://https://kubernetes.default.svc:8443"nt]n}"
                },
                "url": {
                    "raw": "http://{livy-url}/batches",
                    "protocol": "http",
                    "host": [
                        "{livy-url}"
                    ],
                    "path": [
                        "batches"
                    ]
                }
            },
            "response": []
        }
    ],
    "event": [
        {
            "listen": "prerequest",
            "script": {
                "id": "41bea1d0-278c-40c9-ad42-bf2e6268897d",
                "type": "text/javascript",
                "exec": [
                    ""
                ]
            }
        },
        {
            "listen": "test",
            "script": {
                "id": "3cdd7736-a885-4a2d-9668-bd75798f4560",
                "type": "text/javascript",
                "exec": [
                    ""
                ]
            }
        }
    ],
    "protocolProfileBehavior": {}
}
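
The same submission can be made without Postman; a minimal cURL sketch equivalent to the first request in the collection (placeholders as above):

curl -s -X POST "http://{livy-url}/batches" \
  -H "Content-Type: application/json" \
  -d '{
        "file": "local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.5.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "numExecutors": 1,
        "name": "spark-test-1",
        "conf": {
          "spark.jars.ivy": "/tmp/.ivy",
          "spark.kubernetes.authenticate.driver.serviceAccountName": "spark",
          "spark.kubernetes.namespace": "{project}",
          "spark.kubernetes.container.image": "{docker-registry-url}/{repo}/{image-name}:{tag}"
        }
      }'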

Let's execute the first request from the collection, go to the OKD interface, and check that the job has launched successfully: https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods. At the same time, a session will appear in the Livy interface (http://{livy-url}/ui), within which, using the Livy API or the graphical interface, you can track the progress of the job and study the session logs.
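
The progress can also be polled from a script via Livy's batches endpoints (here {batch-id} is the numeric id field from the JSON returned by the submission request):

# Current state of the batch, and its accumulated log lines
curl -s "http://{livy-url}/batches/{batch-id}/state"
curl -s "http://{livy-url}/batches/{batch-id}/log"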

ืขื›ืฉื™ื• ื‘ื•ืื• ื ืจืื” ืื™ืš ืœื™ื‘ื™ ืขื•ื‘ื“. ืœืฉื ื›ืš, ื‘ื•ืื• ื ื‘ื—ืŸ ืืช ื”ื™ื•ืžื ื™ื ืฉืœ ืžื™ื›ืœ Livy ื‘ืชื•ืš ื”ืคื•ื“ ืขื ืฉืจืช Livy - https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods/{livy-pod-name }?tab=logs. ืžื”ื ืื ื• ื™ื›ื•ืœื™ื ืœืจืื•ืช ืฉื›ืืฉืจ ืงื•ืจืื™ื ืœ-Livy REST API ื‘ืžื™ื›ืœ ื‘ืฉื "livy", ืžืชื‘ืฆืขืช spark-submit, ื‘ื“ื•ืžื” ืœื–ื” ืฉื”ืฉืชืžืฉื ื• ืœืžืขืœื” (ื›ืืŸ {livy-pod-name} ื”ื•ื ื”ืฉื ืฉืœ ื”ืคื•ื“ ืฉื ื•ืฆืจ ืขื ืฉืจืช Livy). ื”ืื•ืกืฃ ืžืฆื™ื’ ื’ื ืฉืื™ืœืชื” ืฉื ื™ื™ื” ื”ืžืืคืฉืจืช ืœืš ืœื”ืจื™ืฅ ืžืฉื™ืžื•ืช ื”ืžืืจื—ื•ืช ืžืจื—ื•ืง ืงื•ื‘ืฅ ื”ืคืขืœื” ืฉืœ Spark ื‘ืืžืฆืขื•ืช ืฉืจืช Livy.

Use case three - Spark Operator

ื›ืขืช, ืœืื—ืจ ืฉื”ืžืฉื™ืžื” ื ื‘ื“ืงื”, ืขื•ืœื” ื”ืฉืืœื” ืฉืœ ื”ืคืขืœืชื” ื‘ืื•ืคืŸ ืงื‘ื•ืข. ื”ื“ืจืš ื”ืžืงื•ืจื™ืช ืœื”ืจื™ืฅ ืžืฉื™ืžื•ืช ื‘ืื•ืคืŸ ืงื‘ื•ืข ื‘ืืฉื›ื•ืœ Kubernetes ื”ื™ื ื”ื™ืฉื•ืช CronJob ื•ืืชื” ื™ื›ื•ืœ ืœื”ืฉืชืžืฉ ื‘ื”, ืื‘ืœ ื›ืจื’ืข ื”ืฉื™ืžื•ืฉ ื‘ืื•ืคืจื˜ื•ืจื™ื ืœื ื™ื”ื•ืœ ื™ื™ืฉื•ืžื™ื ื‘- Kubernetes ื”ื•ื ืžืื•ื“ ืคื•ืคื•ืœืจื™ ื•ืขื‘ื•ืจ Spark ื™ืฉ ืื•ืคืจื˜ื•ืจ ื“ื™ ื‘ื•ื’ืจ, ืฉื”ื•ื ื’ื ืžืฉืžืฉ ื‘ืคืชืจื•ื ื•ืช ื‘ืจืžืช ื”ืืจื’ื•ืŸ (ืœื“ื•ื’ืžื”, Lightbend FastData Platform). ืื ื• ืžืžืœื™ืฆื™ื ืœื”ืฉืชืžืฉ ื‘ื• - ืœื’ืจืกื” ื”ื™ืฆื™ื‘ื” ื”ื ื•ื›ื—ื™ืช ืฉืœ Spark (2.4.5) ื™ืฉ ืืคืฉืจื•ื™ื•ืช ืชืฆื•ืจื” ืžื•ื’ื‘ืœื•ืช ืœืžื“ื™ ืœื”ืคืขืœืช ืžืฉื™ืžื•ืช Spark ื‘-Kubernetes, ื‘ืขื•ื“ ืฉื”ื’ืจืกื” ื”ื’ื“ื•ืœื” ื”ื‘ืื” (3.0.0) ืžืฆื”ื™ืจื” ืขืœ ืชืžื™ื›ื” ืžืœืื” ื‘-Kubernetes, ืืš ืชืืจื™ืš ื”ืฉื—ืจื•ืจ ืฉืœื” ืœื ื™ื“ื•ืข . Spark Operator ืžืคืฆื” ืขืœ ื—ืกืจื•ืŸ ื–ื” ืขืœ ื™ื“ื™ ื”ื•ืกืคืช ืืคืฉืจื•ื™ื•ืช ืชืฆื•ืจื” ื—ืฉื•ื‘ื•ืช (ืœื“ื•ื’ืžื”, ื”ืจื›ื‘ื” ืฉืœ ConfigMap ืขื ืชืฆื•ืจืช ื’ื™ืฉื” ืœ-Hadoop ืœ-Spark pods) ื•ื”ื™ื›ื•ืœืช ืœื”ืคืขื™ืœ ืžืฉื™ืžื” ืžืชื•ื–ืžื ืช ื‘ืื•ืคืŸ ืงื‘ื•ืข.

Let's highlight it as the third use case: running Spark jobs regularly on a Kubernetes cluster in a production loop.

Spark Operator is open source and developed within Google Cloud Platform: github.com/GoogleCloudPlatform/spark-on-k8s-operator. It can be installed in 3 ways:

  1. As part of the Lightbend FastData Platform/Cloudflow installation;
  2. Using Helm:
    helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
    helm install incubator/sparkoperator --namespace spark-operator

  3. Using manifests from the official repository (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/manifest). The following is worth noting: Cloudflow includes an operator with API version v1beta1. If this type of installation is used, the Spark application manifest descriptions should be based on the example tags in Git with the corresponding API version, for example, "v1beta1-0.9.0-2.4.0". The operator version can be found in the description of the CRD included in the operator, in the "versions" dictionary:
    oc get crd sparkapplications.sparkoperator.k8s.io -o yaml

If the operator is installed correctly, an active pod with the Spark operator will appear in the corresponding project (for example, cloudflow-fdp-sparkoperator in the Cloudflow namespace for the Cloudflow installation), and a corresponding Kubernetes resource type named "sparkapplications" will appear. You can explore the available Spark applications with the following command:

oc get sparkapplications -n {project}

ื›ื“ื™ ืœื”ืคืขื™ืœ ืžืฉื™ืžื•ืช ื‘ืืžืฆืขื•ืช Spark Operator, ืขืœื™ืš ืœืขืฉื•ืช 3 ื“ื‘ืจื™ื:

  • Create a Docker image that includes all the necessary libraries, as well as the configuration and executable files. In the target picture, this is an image built at the CI/CD stage and tested on a test cluster;
  • Publish the Docker image to a registry accessible from the Kubernetes cluster;
  • Create a manifest of type "SparkApplication" describing the job to be launched. Example manifests are available in the official repository (e.g. github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/v1beta1-0.9.0-2.4.0/examples/spark-pi.yaml). Important points to note about the manifest:
    1. the "apiVersion" dictionary must specify the API version corresponding to the operator version;
    2. the "metadata.namespace" dictionary must specify the namespace in which the application will be launched;
    3. the "spec.image" dictionary must contain the address of the created Docker image in an accessible registry;
    4. the "spec.mainClass" dictionary must contain the Spark job class to be run when the process starts;
    5. the "spec.mainApplicationFile" dictionary must contain the path to the executable jar file;
    6. the "spec.sparkVersion" dictionary must specify the version of Spark being used;
    7. the "spec.driver.serviceAccount" dictionary must specify the service account within the corresponding Kubernetes namespace that will be used to run the application;
    8. the "spec.executor" dictionary must specify the amount of resources allocated to the application;
    9. the "spec.volumeMounts" dictionary must specify the local directory in which the local Spark job files will be created.

An example manifest (here {spark-service-account} is a service account inside the Kubernetes cluster for running Spark jobs):

apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: {project}
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
  sparkVersion: "2.4.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 0.1
    coreLimit: "200m"
    memory: "512m"
    labels:
      version: 2.4.0
    serviceAccount: {spark-service-account}
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

This manifest specifies a service account for which, before publishing the manifest, you need to create the role bindings that grant the Spark application the access rights it needs to interact with the Kubernetes API (if necessary). In our case, the application needs the right to create Pods. Let's create the required role binding:

oc adm policy add-role-to-user edit system:serviceaccount:{project}:{spark-service-account} -n {project}

ืจืื•ื™ ื’ื ืœืฆื™ื™ืŸ ืฉืžืคืจื˜ ืžื ื™ืคืกื˜ ื–ื” ืขืฉื•ื™ ืœื›ืœื•ืœ ืคืจืžื˜ืจ "hadoopConfigMap", ื”ืžืืคืฉืจ ืœืš ืœืฆื™ื™ืŸ ConfigMap ืขื ืชืฆื•ืจืช Hadoop ืžื‘ืœื™ ืฉืชืฆื˜ืจืš ืœืžืงื ืชื—ื™ืœื” ืืช ื”ืงื•ื‘ืฅ ื”ืžืชืื™ื ื‘ืชืžื•ื ืช Docker. ื–ื” ืžืชืื™ื ื’ื ืœื”ืคืขืœืช ืžืฉื™ืžื•ืช ื‘ืื•ืคืŸ ืงื‘ื•ืข - ื‘ืืžืฆืขื•ืช ืคืจืžื˜ืจ "ืœื•ื— ื–ืžื ื™ื", ื ื™ืชืŸ ืœืฆื™ื™ืŸ ืœื•ื— ื–ืžื ื™ื ืœื”ืคืขืœืช ืžืฉื™ืžื” ื ืชื•ื ื”.

ืœืื—ืจ ืžื›ืŸ, ืื ื• ืฉื•ืžืจื™ื ืืช ื”ืžื ื™ืคืกื˜ ืฉืœื ื• ื‘ืงื•ื‘ืฅ spark-pi.yaml ื•ืžื—ื™ืœื™ื ืื•ืชื• ืขืœ ืืฉื›ื•ืœ Kubernetes ืฉืœื ื•:

oc apply -f spark-pi.yaml

This will create an object of type "sparkapplications":

oc get sparkapplications -n {project}
> NAME       AGE
> spark-pi   22h

A pod with the application will be created, and its status will be displayed in "sparkapplications". You can view it with the following command:

oc get sparkapplications spark-pi -o yaml -n {project}

ืขื ืกื™ื•ื ื”ืžืฉื™ืžื”, ื”-POD ื™ืขื‘ื•ืจ ืœืžืฆื‘ "ื”ื•ืฉืœื", ืืฉืจ ื™ืชืขื“ื›ืŸ ื’ื ื‘-"sparkapplications". ื ื™ืชืŸ ืœืฆืคื•ืช ื‘ื™ื•ืžื ื™ ื™ื™ืฉื•ืžื™ื ื‘ื“ืคื“ืคืŸ ืื• ื‘ืืžืฆืขื•ืช ื”ืคืงื•ื“ื” ื”ื‘ืื” (ื›ืืŸ {sparkapplications-pod-name} ื”ื•ื ืฉื ื”ืคื•ื“ ืฉืœ ื”ืžืฉื™ืžื” ื”ืคื•ืขืœืช):

oc logs {sparkapplications-pod-name} -n {project}

Spark jobs can also be managed with the dedicated sparkctl utility. To install it, clone the repository with its source code, install Go, and build the utility:

git clone https://github.com/GoogleCloudPlatform/spark-on-k8s-operator.git
cd spark-on-k8s-operator/
wget https://dl.google.com/go/go1.13.3.linux-amd64.tar.gz
tar -xzf go1.13.3.linux-amd64.tar.gz
sudo mv go /usr/local
mkdir $HOME/Projects
export GOROOT=/usr/local/go
export GOPATH=$HOME/Projects
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH
go version
cd sparkctl
go build -o sparkctl
sudo mv sparkctl /usr/local/bin

ื”ื‘ื” ื ื‘ื—ืŸ ืืช ืจืฉื™ืžืช ื”ืžืฉื™ืžื•ืช ื”ืจืฆื•ืช ืฉืœ Spark:

sparkctl list -n {project}

ื‘ื•ืื• ื ื™ืฆื•ืจ ืชื™ืื•ืจ ืขื‘ื•ืจ ื”ืžืฉื™ืžื” Spark:

vi spark-app.yaml

apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: {project}
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
  sparkVersion: "2.4.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1000m"
    memory: "512m"
    labels:
      version: 2.4.0
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

ื”ื‘ื” ื ืจื™ืฅ ืืช ื”ืžืฉื™ืžื” ื”ืžืชื•ืืจืช ื‘ืืžืฆืขื•ืช sparkctl:

sparkctl create spark-app.yaml -n {project}

ื”ื‘ื” ื ื‘ื—ืŸ ืืช ืจืฉื™ืžืช ื”ืžืฉื™ืžื•ืช ื”ืจืฆื•ืช ืฉืœ Spark:

sparkctl list -n {project}

ื”ื‘ื” ื ื‘ื—ืŸ ืืช ืจืฉื™ืžืช ื”ืื™ืจื•ืขื™ื ืฉืœ ืžืฉื™ืžืช Spark ืฉื”ื•ืฉืงื”:

sparkctl event spark-pi -n {project} -f

ื”ื‘ื” ื ื‘ื—ืŸ ืืช ื”ืกื˜ื˜ื•ืก ืฉืœ ืžืฉื™ืžืช Spark ื”ืคื•ืขืœืช:

sparkctl status spark-pi -n {project}

In conclusion, I would like to go over the drawbacks discovered while using the current stable version of Spark (2.4.5) on Kubernetes:

  1. The first and perhaps the main drawback is the lack of Data Locality. For all the shortcomings of YARN, there were also advantages to using it, such as the principle of shipping code to the data (rather than data to the code). Thanks to it, Spark jobs were executed on the nodes where the data involved in the computations lived, and the time spent transferring data over the network dropped noticeably. With Kubernetes, we face the need to move the data involved in a job across the network. If the datasets are large enough, job execution time can grow significantly, and a fairly large amount of disk space must be allocated to the Spark job instances for temporary storage. This drawback can be mitigated with specialized software that provides data locality in Kubernetes (for example, Alluxio), but that effectively means storing a full copy of the data on the nodes of the Kubernetes cluster.
  2. The second important drawback is security. By default, security-related features for running Spark jobs are disabled, the use of Kerberos is not covered in the official documentation (although the corresponding options were introduced in version 3.0.0, which will require extra work), and the Spark security documentation (https://spark.apache.org/docs/2.4.5/security.html) mentions only YARN, Mesos, and Standalone Cluster as key stores. Moreover, the user under which Spark jobs run cannot be specified directly: we only specify the service account under which the job will work, and the user is chosen based on the configured security policies. In this respect, either the root user is used, which is unsafe in a production environment, or a user with a random UID, which is inconvenient when distributing access rights to data (this can be solved by creating PodSecurityPolicies and linking them to the corresponding service accounts; a sketch of such a policy follows after this list). For now, the workaround is either to place all required files directly into the Docker image, or to modify the Spark launch script to use the mechanism for storing and retrieving secrets adopted in your organization.
  3. Running Spark jobs on Kubernetes is still officially experimental, and significant changes in the artifacts used (configuration files, Docker base images, and launch scripts) are possible in the future. Indeed, while preparing this material, versions 2.3.0 and 2.4.5 were tested, and their behavior differed significantly.
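
A minimal PodSecurityPolicy sketch for the second point above, pinning Spark pods to a fixed non-root UID (the policy name and UID range are illustrative; the policy still has to be granted to the spark service account via RBAC):

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: spark-psp
spec:
  privileged: false
  # Force a deterministic non-root UID so that file-level access rights to data can be managed
  runAsUser:
    rule: MustRunAs
    ranges:
      - min: 1000
        max: 1000
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
    - configMap
    - emptyDir
    - secret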

ื‘ื•ื ื ื—ื›ื” ืœืขื“ื›ื•ื ื™ื - ืœืื—ืจื•ื ื” ืฉื•ื—ืจืจื” ื’ืจืกื” ื—ื“ืฉื” ืฉืœ Spark (3.0.0), ืฉื”ื‘ื™ืื” ืฉื™ื ื•ื™ื™ื ืžืฉืžืขื•ืชื™ื™ื ื‘ืขื‘ื•ื“ื” ืฉืœ Spark ื‘-Kubernetes, ืืš ืฉืžืจื” ืขืœ ื”ืกื˜ื˜ื•ืก ื”ื ื™ืกื™ื•ื ื™ ืฉืœ ืชืžื™ื›ื” ื‘ืžื ื”ืœ ืžืฉืื‘ื™ื ื–ื”. ืื•ืœื™ ื”ืขื“ื›ื•ื ื™ื ื”ื‘ืื™ื ื‘ืืžืช ื™ืืคืฉืจื• ืœื”ืžืœื™ืฅ โ€‹โ€‹ื‘ืื•ืคืŸ ืžืœื ืขืœ ื ื˜ื™ืฉืช YARN ื•ื”ืคืขืœืช ืžืฉื™ืžื•ืช Spark ื‘-Kubernetes ืœืœื ื—ืฉืฉ ืœืื‘ื˜ื—ืช ื”ืžืขืจื›ืช ืฉืœื›ื ื•ืœืœื ืฆื•ืจืš ื‘ืฉื™ื ื•ื™ ืขืฆืžืื™ ืฉืœ ืจื›ื™ื‘ื™ื ืคื•ื ืงืฆื™ื•ื ืœื™ื™ื.

Fin.

Source: www.habr.com
