Dear readers, good afternoon. Today we will talk a little about Apache Spark and its development prospects.
In the modern Big Data world, Apache Spark is the de facto standard for developing batch data processing jobs. Beyond that, it is also used to build streaming applications working on the micro-batch concept, processing and shipping data in small portions (Spark Structured Streaming). Traditionally it has been part of the overall Hadoop stack, with YARN (or, in some cases, Apache Mesos) as the resource manager. By 2020, its use in this traditional form is questionable for most companies due to the lack of decent Hadoop distributions: the development of HDP has stopped, CDH is developing weakly and carries a high price, and the remaining Hadoop vendors have either ceased to exist or face a murky future.

So the community and large companies take a growing interest in running Apache Spark on Kubernetes: having become the standard for container orchestration and resource management in private and public clouds, it solves the inconvenient resource scheduling of Spark jobs on YARN and offers a steadily developing platform with many commercial and open-source distributions for companies of every size and stripe. On top of that, riding the wave of its popularity, most teams have already acquired a couple of installations of their own and built up expertise in using it, which simplifies the move.
Starting with version 2.3.0, Apache Spark gained official support for running jobs on a Kubernetes cluster, and today we will talk about the current maturity of this approach, the options for using it, and the pitfalls you will meet during implementation.
First of all, let's look at the process of developing jobs and applications based on Apache Spark and single out the typical cases in which you need to run a job on a Kubernetes cluster. In this post, OpenShift is used as the distribution, and the commands given are those of its command-line utility (oc). For other Kubernetes distributions, the corresponding commands of the standard Kubernetes command-line utility (kubectl) or their analogues (for example, for oc adm policy) can be used.
Use case one - spark-submit
While developing jobs and applications, a developer needs to run jobs in order to debug data transformations. In theory, stubs could serve this purpose, but development involving real (albeit test) instances of the target systems has proven faster and better for this class of tasks. When debugging against real instances of the target systems, two scenarios are possible:
the developer runs a Spark job locally in standalone mode;
the developer runs a Spark job on a Kubernetes cluster in a test loop.
The first option has a right to exist, but it entails a number of drawbacks:
every developer has to be given access from the workplace to all the instances of the target systems they need;
the working machine needs enough resources to run the job being developed.
The second option is free of these drawbacks, since using a Kubernetes cluster lets you allocate the necessary pool of resources for running jobs and grant it the required access to the target system instances, flexibly exposing it to every member of the development team through the Kubernetes role model. Let's single this out as the first use case: launching Spark jobs from a local developer machine on a Kubernetes cluster in a test loop.
Let's talk more about setting up Spark to run locally. To start using Spark, install it:
mkdir /opt/spark
cd /opt/spark
wget http://mirror.linux-ia64.org/apache/spark/spark-2.4.5/spark-2.4.5.tgz
tar zxvf spark-2.4.5.tgz
rm -f spark-2.4.5.tgz
Next, build the packages needed for working with Kubernetes:
cd spark-2.4.5/
./build/mvn -Pkubernetes -DskipTests clean package
A full build takes a long time, and to create Docker images and run them on a Kubernetes cluster you really only need the jar files from the "assembly/" directory, so you can build just that subproject:
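The exact command did not survive in the text; a scoped Maven invocation presumably looks like the sketch below (the assembly module path is an assumption for the 2.4.5 source tree):

```shell
# Build only the assembly subproject (module path assumed for Spark 2.4.5);
# -Pkubernetes pulls in the Kubernetes resource-manager module.
./build/mvn -f ./assembly/pom.xml -Pkubernetes -DskipTests clean package
```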
The images built this way include only Spark itself and its required dependencies; the executable code can be hosted remotely (for example, in HDFS).
To begin with, let's build a Docker image containing a test Spark job. For creating Docker images, Spark ships a utility called "docker-image-tool". Let's read its help:
./bin/docker-image-tool.sh --help
It can build Docker images and upload them to remote registries, but by default it has several drawbacks:
it creates 3 Docker images at once: for Spark, PySpark and R;
it does not allow specifying an image name.
Therefore, we will use a modified version of this utility, given below:
vi bin/docker-image-tool-upd.sh
#!/usr/bin/env bash
function error {
echo "$@" 1>&2
exit 1
}
if [ -z "${SPARK_HOME}" ]; then
SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
. "${SPARK_HOME}/bin/load-spark-env.sh"
function image_ref {
local image="$1"
local add_repo="${2:-1}"
if [ $add_repo = 1 ] && [ -n "$REPO" ]; then
image="$REPO/$image"
fi
if [ -n "$TAG" ]; then
image="$image:$TAG"
fi
echo "$image"
}
function build {
local BUILD_ARGS
local IMG_PATH
if [ ! -f "$SPARK_HOME/RELEASE" ]; then
IMG_PATH=$BASEDOCKERFILE
BUILD_ARGS=(
${BUILD_PARAMS}
--build-arg
img_path=$IMG_PATH
--build-arg
datagram_jars=datagram/runtimelibs
--build-arg
spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
)
else
IMG_PATH="kubernetes/dockerfiles"
BUILD_ARGS=(${BUILD_PARAMS})
fi
if [ -z "$IMG_PATH" ]; then
error "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark."
fi
if [ -z "$IMAGE_REF" ]; then
error "Cannot find docker image reference. Please add -i arg."
fi
local BINDING_BUILD_ARGS=(
${BUILD_PARAMS}
--build-arg
base_img=$(image_ref $IMAGE_REF)
)
local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/docker/Dockerfile"}
docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
-t $(image_ref $IMAGE_REF) \
-f "$BASEDOCKERFILE" .
}
function push {
docker push "$(image_ref $IMAGE_REF)"
}
function usage {
cat <<EOF
Usage: $0 [options] [command]
Builds or pushes the built-in Spark Docker image.
Commands:
build Build image. Requires a repository address to be provided if the image will be
pushed to a different registry.
push Push a pre-built image to a registry. Requires a repository address to be provided.
Options:
-f file Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
-p file Dockerfile to build for PySpark Jobs. Builds Python dependencies and ships with Spark.
-R file Dockerfile to build for SparkR Jobs. Builds R dependencies and ships with Spark.
-r repo Repository address.
-i name Image name to apply to the built image, or to identify the image to be pushed.
-t tag Tag to apply to the built image, or to identify the image to be pushed.
-m Use minikube's Docker daemon.
-n Build docker image with --no-cache
-b arg Build arg to build or push the image. For multiple build args, this option needs to
be used separately for each build arg.
Using minikube when building images will do so directly into minikube's Docker daemon.
There is no need to push the images into minikube in that case, they'll be automatically
available when running applications inside the minikube cluster.
Check the following documentation for more information on using the minikube Docker daemon:
https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon
Examples:
- Build image in minikube with tag "testing"
$0 -m -t testing build
- Build and push image with tag "v2.3.0" to docker.io/myrepo
$0 -r docker.io/myrepo -t v2.3.0 build
$0 -r docker.io/myrepo -t v2.3.0 push
EOF
}
if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
usage
exit 0
fi
REPO=
TAG=
BASEDOCKERFILE=
NOCACHEARG=
BUILD_PARAMS=
IMAGE_REF=
while getopts f:mr:t:nb:i: option
do
case "${option}"
in
f) BASEDOCKERFILE=${OPTARG};;
r) REPO=${OPTARG};;
t) TAG=${OPTARG};;
n) NOCACHEARG="--no-cache";;
i) IMAGE_REF=${OPTARG};;
b) BUILD_PARAMS=${BUILD_PARAMS}" --build-arg "${OPTARG};;
esac
done
case "${@: -1}" in
build)
build
;;
push)
if [ -z "$REPO" ]; then
usage
exit 1
fi
push
;;
*)
usage
exit 1
;;
esac
Using it, let's build a base Spark image containing a test job for computing Pi with Spark (here {docker-registry-url} is the URL of your Docker image registry, {repo} is the repository name inside the registry, matching the project in OpenShift, {image-name} is the image name (if three-level image naming is used, as, for example, in the integrated registry of Red Hat OpenShift images), {tag} is the tag of this image version):
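The build and push commands themselves were not preserved in the text; judging by the getopts flags of the modified script above, they presumably look like this (substitute the placeholders before running):

```shell
# Sketch based on the options of docker-image-tool-upd.sh shown above;
# all {placeholder} values must be substituted with real ones.
./bin/docker-image-tool-upd.sh -r {docker-registry-url}/{repo} -i {image-name} -t {tag} build
./bin/docker-image-tool-upd.sh -r {docker-registry-url}/{repo} -i {image-name} -t {tag} push
```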
Let's check that the built image is available in OKD. To do this, open in a browser the URL listing the images of the corresponding project (here {project} is the project name inside the OpenShift cluster, {OKD-WEBUI-URL} is the URL of the OpenShift web console): https://{OKD-WEBUI-URL}/console/project/{project}/browse/images/{image-name}.
To run jobs, a service account with the rights to run pods as root must be created (we will return to this point later). The spark-submit launch uses, among others, the following parameters:
--name is the job name that participates in forming the names of the Kubernetes pods;
--class is the class of the executable, called when the job starts;
--conf sets configuration parameters, including:
spark.executor.instances, the number of Spark executors to launch;
spark.kubernetes.authenticate.driver.serviceAccountName, the name of the Kubernetes service account used when launching pods (to define the security context and permissions when interacting with the Kubernetes API);
spark.kubernetes.namespace, the Kubernetes namespace in which the driver and executor pods will run;
spark.submit.deployMode, the Spark launch mode ("cluster" is used for standard spark-submit, "client" for the Spark Operator and later versions of Spark).
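Put together, a spark-submit invocation with these parameters might look like the sketch below; the master URL, image reference, and example jar path are assumptions, and every {placeholder} must be substituted:

```shell
# Sketch of a spark-submit launching the bundled Pi example on Kubernetes.
# {k8s-apiserver-host}:{k8s-apiserver-port} and the image reference are
# placeholders; the example jar path assumes the stock 2.4.5 image layout.
/opt/spark/spark-2.4.5/bin/spark-submit \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace={project} \
  --conf spark.submit.deployMode=cluster \
  --conf spark.kubernetes.container.image={docker-registry-url}/{repo}/{image-name}:{tag} \
  --master k8s://https://{k8s-apiserver-host}:{k8s-apiserver-port} \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
```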
To simplify the development process, another option can be used: a common base image of Spark is built and used by all jobs, while snapshots of the executable files are published to external storage (for example, HDFS) and referenced by link when calling spark-submit. In this case, different versions of Spark jobs can be run without rebuilding Docker images, using, for example, WebHDFS to publish the executables. We send a request to create a file (here {host} is the host of the WebHDFS service, {port} is the port of the WebHDFS service, {path-to-file-on-hdfs} is the desired path to the file in HDFS):
curl -i -X PUT "http://{host}:{port}/webhdfs/v1/{path-to-file-on-hdfs}?op=CREATE"
You will receive a response of the following form (here {location} is the URL to use for uploading the file):
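The response body was lost in the text; per the WebHDFS REST API, a CREATE request is answered with a temporary redirect to a datanode, roughly:

```
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: {location}
Content-Length: 0
```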
Upload the Spark executable into HDFS (here {path-to-local-file} is the path to the Spark executable on the current host):
curl -i -X PUT -T {path-to-local-file} "{location}"
After that, we can do a spark-submit using the Spark file uploaded to HDFS (here {class-name} is the name of the class that must be launched to complete the job):
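Such an invocation differs from the previous one only in the application file reference; a sketch under the same placeholder conventions (note that the HDFS RPC port referenced here generally differs from the WebHDFS port used above):

```shell
# Sketch: the application jar is now referenced in HDFS instead of being
# baked into the image. All {placeholder} values must be substituted.
/opt/spark/spark-2.4.5/bin/spark-submit \
  --name spark-job \
  --class {class-name} \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace={project} \
  --conf spark.submit.deployMode=cluster \
  --conf spark.kubernetes.container.image={docker-registry-url}/{repo}/{image-name}:{tag} \
  --master k8s://https://{k8s-apiserver-host}:{k8s-apiserver-port} \
  hdfs://{host}:{port}/{path-to-file-on-hdfs}
```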
Note that to access HDFS and make the job work, you may need to change the Dockerfile and the entrypoint.sh script: add a directive to the Dockerfile copying the dependent libraries to the /opt/spark/jars directory, and include the HDFS configuration directory in SPARK_CLASSPATH in entrypoint.sh.
Use case two - Apache Livy
Next, when a job has been developed and its result needs to be tested, the question arises of launching it within the CI/CD process and tracking its execution status. Of course, you could run it with a local spark-submit call, but this complicates the CI/CD infrastructure, since it requires installing and configuring Spark on the CI server agents/runners and setting up access to the Kubernetes API. For this case, the target implementation chose Apache Livy as a REST API for running Spark jobs, hosted inside the Kubernetes cluster. With it, you can launch Spark jobs on the Kubernetes cluster with ordinary cURL requests, which is easy to implement on any CI solution, while its placement inside the Kubernetes cluster settles the question of authentication when interacting with the Kubernetes API.
Let's single this out as the second use case: running Spark jobs within a CI/CD process on a Kubernetes cluster in a test loop.
A few words about Apache Livy: it works as an HTTP server providing a web interface and a RESTful API that lets you launch spark-submit remotely by passing the necessary parameters. Traditionally it has shipped as part of the HDP distribution, but it can also be deployed to OKD or any other Kubernetes installation using the appropriate manifest and a set of Docker images, such as this one: github.com/ttauveron/k8s-big-data-experiments/tree/master/livy-spark-2.3. For our case, a similar Docker image including Spark version 2.4.5 was built from the following Dockerfile:
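The Dockerfile itself did not survive in the text; a minimal sketch of such an image might look like this (the base image, Livy version, and download URLs are assumptions, not the authors' original):

```dockerfile
# Hypothetical sketch of a Livy image bundling Spark 2.4.5.
# Livy 0.7.0-incubating and the Apache archive URLs are assumptions.
FROM openjdk:8-jre-slim

ENV SPARK_HOME=/opt/spark
ENV LIVY_HOME=/opt/livy
ENV PATH=$PATH:$SPARK_HOME/bin

RUN apt-get update && apt-get install -y curl unzip && \
    curl -fsSL https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz \
      | tar -xz -C /opt && \
    mv /opt/spark-2.4.5-bin-hadoop2.7 $SPARK_HOME && \
    curl -fsSLo /tmp/livy.zip \
      https://archive.apache.org/dist/incubator/livy/0.7.0-incubating/apache-livy-0.7.0-incubating-bin.zip && \
    unzip /tmp/livy.zip -d /opt && \
    mv /opt/apache-livy-0.7.0-incubating-bin $LIVY_HOME && \
    rm /tmp/livy.zip && mkdir -p $LIVY_HOME/logs

EXPOSE 8998
CMD ["/opt/livy/bin/livy-server"]
```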
The image can be built and pushed to your existing Docker registry, for example the internal OKD registry. To deploy it, use the following manifest ({registry-url} is the Docker image registry URL, {image-name} is the Docker image name, {tag} is the Docker image tag, {livy-url} is the desired URL at which the Livy server will be reachable; the "Route" manifest applies if Red Hat OpenShift is used as the Kubernetes distribution; otherwise a corresponding Ingress or a Service manifest of type NodePort is used):
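The manifest was lost in the text; a sketch of what it presumably contains is below (labels, the service account name, and the lack of resource limits are assumptions to adapt to your cluster):

```yaml
# Hypothetical sketch of a Livy deployment, service and OpenShift route.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: livy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: livy
  template:
    metadata:
      labels:
        app: livy
    spec:
      serviceAccountName: spark   # assumed SA with rights to create pods
      containers:
        - name: livy
          image: {registry-url}/{image-name}:{tag}
          ports:
            - containerPort: 8998
              name: http
---
apiVersion: v1
kind: Service
metadata:
  name: livy
spec:
  selector:
    app: livy
  ports:
    - port: 8998
      targetPort: 8998
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: livy
spec:
  host: {livy-url}
  to:
    kind: Service
    name: livy
  port:
    targetPort: 8998
```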
After it is applied and the pod starts successfully, the Livy graphical interface is available at http://{livy-url}/ui. With Livy, we can publish our Spark job using a REST request from, for example, Postman. An example of a collection with requests is presented below (configuration arguments with the variables needed for the launched job can be passed in the "args" array):
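The collection itself was not preserved; the kind of request it contains can be sketched as a cURL call to Livy's batches endpoint (the jar path, class name, and conf values are placeholders):

```shell
# Sketch of a Livy batch submission; substitute all {placeholder} values.
curl -X POST "http://{livy-url}/batches" \
  -H "Content-Type: application/json" \
  -d '{
        "file": "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "numExecutors": 2,
        "name": "spark-pi",
        "args": ["1000"],
        "conf": {
          "spark.kubernetes.namespace": "{project}",
          "spark.kubernetes.container.image": "{registry-url}/{image-name}:{tag}"
        }
      }'
```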
Let's execute the first request from the collection, go to the OKD interface, and check that the job launched successfully: https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods. At the same time, a session will appear in the Livy interface (http://{livy-url}/ui), within which, via the Livy API or the graphical interface, you can follow the job's progress and study the session logs.
Now let's show how Livy works. To do this, let's examine the logs of the Livy container inside the pod with the Livy server: https://{OKD-WEBUI-URL}/console/project/{project}/browse/pods/{livy-pod-name}?tab=logs. From them we can see that calling the Livy REST API inside the container named "livy" triggers a spark-submit similar to the one we used above (here {livy-pod-name} is the name of the created pod with the Livy server). The collection also provides a second request that runs jobs whose Spark executable is hosted remotely, via the Livy server.
Use case three - Spark Operator
Now that the job has been tested, the question arises of running it regularly. The native way to run jobs regularly in a Kubernetes cluster is the CronJob entity, and you can use it, but at the moment the use of operators to manage applications in Kubernetes is very popular, and for Spark there exists a fairly mature operator, one that is also used in enterprise-level solutions (for example, Lightbend FastData Platform). We recommend using it: the current stable version of Spark (2.4.5) has rather limited configuration options for running Spark jobs in Kubernetes, while the next major version (3.0.0) declares full support for Kubernetes, but its release date remains unknown. The Spark Operator compensates for this shortcoming by adding important configuration options (for example, mounting a ConfigMap with the Hadoop access configuration into Spark pods) and the ability to run jobs on a regular schedule.
Let's single this out as the third use case: running Spark jobs regularly on a Kubernetes cluster in a production loop.
The operator can be installed using the manifests from the official repository (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/manifest). The following is worth knowing: Cloudflow includes an operator with API version v1beta1. If that kind of installation is used, the Spark application manifest descriptions should be based on the example tags in Git with the matching API version, for example "v1beta1-0.9.0-2.4.0". The operator's API version can be found in the description of the CRD included in the operator, in the "versions" dictionary:
oc get crd sparkapplications.sparkoperator.k8s.io -o yaml
If the operator is installed correctly, an active pod with the Spark operator will appear in the corresponding project (for example, cloudflow-fdp-sparkoperator in the Cloudflow namespace for a Cloudflow installation), and a corresponding Kubernetes resource type named "sparkapplications" will appear. The available Spark applications can be examined with the following command:
oc get sparkapplications -n {project}
To run a job with the Spark Operator, you need to do 3 things:
create a Docker image that includes all the necessary libraries, as well as the configuration and executable files. In the target picture, this is an image created at the CI/CD stage and tested on a test cluster;
publish the Docker image to a registry accessible from the Kubernetes cluster;
create a manifest of type "SparkApplication" describing the job to launch. The manifest specifies a service account for which, before publishing the manifest, the necessary role bindings must be created granting the Spark application the access rights it needs to interact with the Kubernetes API (if required). In our case, the application needs the right to create Pods. Let's create the necessary role binding:
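The binding commands were lost in the text; a sketch assuming the service account is named "spark" and reusing the built-in "edit" cluster role (both names are assumptions):

```shell
# Sketch: create the service account and grant it rights to create pods
# in the project namespace; names are assumptions to adapt.
oc create serviceaccount spark -n {project}
oc create rolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount={project}:spark \
  -n {project}
```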
It is also worth noting that the manifest specification can include a "hadoopConfigMap" parameter, which lets you specify a ConfigMap with the Hadoop configuration without first placing the corresponding file in the Docker image. It also suits regular job launches: using the "schedule" parameter, a schedule for running the job can be specified.
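The manifest itself was not preserved; a sketch of what spark-pi.yaml presumably contains is below. The apiVersion must match your operator installation (see the CRD earlier), and the image reference, resource sizes, and service account name are assumptions:

```yaml
# Hypothetical sketch of spark-pi.yaml for the v1beta1 operator API.
apiVersion: sparkoperator.k8s.io/v1beta1
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: {project}
spec:
  type: Scala
  mode: cluster
  image: {registry-url}/{image-name}:{tag}
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar"
  sparkVersion: "2.4.5"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark   # the SA created above
  executor:
    cores: 1
    instances: 2
    memory: "512m"
```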
After that, we save our manifest to the spark-pi.yaml file and apply it to our Kubernetes cluster:
oc apply -f spark-pi.yaml
This will create an object of type "sparkapplications":
oc get sparkapplications -n {project}
> NAME AGE
> spark-pi 22h
A pod with the application will then be created, whose status will be displayed in the created "sparkapplications" object. It can be viewed with the following command:
oc get sparkapplications spark-pi -o yaml -n {project}
On job completion, the pod will move to the "Completed" status, which will also update in "sparkapplications". The application logs can be viewed in the browser or with the following command (here {sparkapplications-pod-name} is the name of the pod of the running job):
oc logs {sparkapplications-pod-name} -n {project}
Spark jobs can also be managed with the dedicated sparkctl utility. To install it, clone the repository with its source code, install Go, and build the utility:
git clone https://github.com/GoogleCloudPlatform/spark-on-k8s-operator.git
cd spark-on-k8s-operator/
wget https://dl.google.com/go/go1.13.3.linux-amd64.tar.gz
tar -xzf go1.13.3.linux-amd64.tar.gz
sudo mv go /usr/local
mkdir $HOME/Projects
export GOROOT=/usr/local/go
export GOPATH=$HOME/Projects
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH
go version
cd sparkctl
go build -o sparkctl
sudo mv sparkctl /usr/local/bin
Let's examine the list of events of the launched Spark job:
sparkctl event spark-pi -n {project} -f
Let's examine the status of the launched Spark job:
sparkctl status spark-pi -n {project}
In conclusion, I would like to review the drawbacks discovered in operating the current stable version of Spark (2.4.5) on Kubernetes:
The first and, perhaps, main drawback is the lack of Data Locality. Despite all the shortcomings of YARN, there were advantages to using it, for example, the principle of delivering code to data (rather than data to code). Thanks to it, Spark jobs were executed on the nodes where the data involved in the computations was located, and the time spent delivering data over the network was noticeably reduced. When using Kubernetes, we face the need to move the data involved in a job across the network. If it is large enough, job execution time can grow significantly, and a fairly large amount of disk space must be allocated to the Spark job instances for temporary storage. This drawback can be mitigated with specialized software that provides data locality in Kubernetes (for example, Alluxio), but that effectively means storing a full copy of the data on the nodes of the Kubernetes cluster.
The second important drawback is security. By default, security-related features around running Spark jobs are disabled, the use of Kerberos is not covered in the official documentation (although the corresponding options were introduced in version 3.0.0, which will require additional work), and the security documentation for Spark (https://spark.apache.org/docs/2.4.5/security.html) mentions only YARN, Mesos and Standalone Cluster as key stores. At the same time, the user under which Spark jobs run cannot be specified directly: we only specify the service account under which it will work, and the user is chosen based on the configured security policies. As a result, either the root user is used, which is unsafe in a productive environment, or a user with a random UID, which is inconvenient when distributing access rights to data (this can be solved by creating PodSecurityPolicies and binding them to the corresponding service accounts). Currently, the solution is either to place all the necessary files directly into the Docker image, or to modify the Spark launch script to use the mechanism for storing and retrieving secrets adopted in your organization.
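The PodSecurityPolicy approach mentioned above can be sketched as follows; the policy name, the fixed UID, and the RBAC object names are assumptions, not part of the original material:

```yaml
# Hypothetical sketch: pin Spark pods to a fixed non-root UID via a
# PodSecurityPolicy and let the "spark" service account use it.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: spark-psp
spec:
  privileged: false
  runAsUser:
    rule: MustRunAs
    ranges:
      - min: 1000
        max: 1000
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes: ["configMap", "secret", "emptyDir"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-psp-user
  namespace: {project}
rules:
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["spark-psp"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-psp-user
  namespace: {project}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-psp-user
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: {project}
```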
The third drawback is that running Spark with Kubernetes is officially still experimental, and the artifacts used (configuration files, base Docker images, and launch scripts) may change significantly in the future. Indeed, while preparing this material, versions 2.3.0 and 2.4.5 were tested, and their behavior differed substantially.
Let's wait for updates: a fresh version of Spark (3.0.0) was released recently, bringing notable changes to Spark's work on Kubernetes yet retaining the experimental status of support for this resource manager. Perhaps the next updates will really make it possible to fully recommend abandoning YARN and running Spark jobs on Kubernetes without fearing for the safety of your system and without needing to patch the functional components yourself.