Configuring Spark on YARN

Hello, Habr! Yesterday, at a meetup on Apache Spark hosted by the guys from Rambler&Co, participants asked a lot of questions about configuring this tool. We decided to follow up and share our own experience. The topic is not an easy one, so please share your experience in the comments as well; perhaps we also understand or use something incorrectly.

A little background on how we use Spark. We run a three-month program, "Big Data Specialist", and throughout the second module our participants work with this tool. Our task as organizers is therefore to prepare the cluster for this kind of use.

What makes our case unusual is that the number of people working with Spark at the same time can equal the whole group. For example, at a seminar everyone tries something at the same moment, repeating after our teacher. And that is quite a few people: sometimes up to 40. There are probably not many companies in the world that face this kind of usage.

Next, I will explain how and why we chose certain configuration parameters.

Let's start from the very beginning. Spark has three options for running on a cluster: standalone, with Mesos, and with YARN. We chose the third option because it made sense for us. We already have a Hadoop cluster, and our participants already know its architecture well. So let's use YARN.

spark.master=yarn

Now it gets more interesting. Each of these three options has two deploy modes: client and cluster. Based on the documentation and various links on the internet, we concluded that client mode suits interactive work, for example through a Jupyter notebook, while cluster mode is better suited to production solutions. In our case we were interested in interactive work, therefore:

spark.submit.deployMode=client

In general, from this point on Spark will somehow work on YARN, but that was not enough for us. Since our program is about big data, participants sometimes did not have enough resources when the cluster was simply sliced evenly. Then we found an interesting feature: dynamic resource allocation. In short, the idea is this: if you have a heavy job and the cluster is free (for example, in the morning), then with this option Spark can give you additional resources. The demand is calculated by a rather clever formula. We will not go into the details; it works well.

spark.dynamicAllocation.enabled=true

We set this parameter, and on startup Spark crashed and did not start. Fair enough, because we should have read the documentation more carefully. It says that for everything to work, you also need to enable one more parameter.

spark.shuffle.service.enabled=true

Why is it needed? When our job no longer needs so many resources, Spark should return them to the common pool. The most time-consuming stage of almost any MapReduce job is the shuffle stage. This parameter lets Spark preserve the data produced at that stage and release the executors accordingly. An executor is the process that does all the computation on a worker. It has a certain number of processor cores and a certain amount of memory.

We added this parameter, and everything seemed to work. It became noticeable that participants really were given more resources when they needed them. But another problem arose: at some point other participants woke up and also wanted to use Spark, but everything was already taken, and they were unhappy. That is understandable. We went back to the documentation. It turned out that there are several more parameters that influence this process. For example, if an executor is idle, after how long can its resources be taken away?

spark.dynamicAllocation.executorIdleTimeout=120s

In our case, if your executors do nothing for two minutes, then kindly return them to the common pool. But this parameter was not always enough. It was obvious that a person had not been doing anything for a long time, yet the resources were not being released. It turned out that there is also a separate parameter: after how long to reclaim executors that hold cached data. By default this parameter was set to infinity! We fixed it.

spark.dynamicAllocation.cachedExecutorIdleTimeout=600s

That is, if your executors hold cached data and do nothing for 10 minutes (600 seconds), hand them back to the common pool. In this mode, resources were released and granted quickly enough for a large number of users. The amount of discontent decreased. But we decided to go further and limit the maximum number of executors per application, which in practice means per program participant.

spark.dynamicAllocation.maxExecutors=19

Now, of course, there are dissatisfied people on the other side: "the cluster is idle, and I only have 19 executors". But what can you do? You can't please everyone.
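The two idle timeouts can be pictured with a small model. This is only an illustrative Python sketch, not Spark's actual allocation code: the `Executor` class and the `releasable` function are made-up names, and the logic simply assumes that an executor holding cached data is judged by `cachedExecutorIdleTimeout` while any other executor is judged by `executorIdleTimeout`.

```python
from dataclasses import dataclass

# Values from the settings above, in seconds.
EXECUTOR_IDLE_TIMEOUT = 120
CACHED_EXECUTOR_IDLE_TIMEOUT = 600


@dataclass
class Executor:
    """Hypothetical record describing one executor; not a Spark class."""
    name: str
    idle_seconds: int      # how long the executor has had no running tasks
    has_cached_data: bool  # whether it still holds cached blocks


def releasable(executors):
    """Return the names of executors this simplified model would give back."""
    released = []
    for ex in executors:
        # Executors with cached data are judged by the longer, separate timeout.
        limit = (CACHED_EXECUTOR_IDLE_TIMEOUT if ex.has_cached_data
                 else EXECUTOR_IDLE_TIMEOUT)
        if ex.idle_seconds >= limit:
            released.append(ex.name)
    return released


pool = [
    Executor("exec-1", idle_seconds=30, has_cached_data=False),
    Executor("exec-2", idle_seconds=150, has_cached_data=False),
    Executor("exec-3", idle_seconds=150, has_cached_data=True),
    Executor("exec-4", idle_seconds=700, has_cached_data=True),
]
print(releasable(pool))
```

In this model, an infinite cached timeout would mean that `exec-3` and `exec-4` are never returned, which matches what we saw before changing the setting.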

And one more small story related to the specifics of our case. Once, several people were late for a practical session, and for some reason Spark would not start for them. We checked the amount of free resources, and there seemed to be enough. Spark should have started. Fortunately, by then the documentation had already lodged itself somewhere in the back of our minds, and we remembered that on startup Spark looks for a port to run on. If the first port in the range is busy, it moves on to the next one in order. If that one is free, it takes it. And there is a parameter that sets the maximum number of such attempts. The default is 16. That number is smaller than the number of people in our group in class. So after 16 attempts Spark gave up and said it could not start. We fixed this setting.

spark.port.maxRetries=50
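The port search described above can be sketched as follows. This is a simplified Python model rather than Spark's own code: `pick_port` and `is_free` are hypothetical names, the starting port 4040 (the default Spark UI port) is used only as an example, and the sketch assumes the first try is followed by at most `max_retries` further tries.

```python
def pick_port(start_port, max_retries, is_free):
    """Try start_port, then the following ports in order, up to max_retries extra tries."""
    for offset in range(max_retries + 1):
        candidate = start_port + offset
        if is_free(candidate):
            return candidate
    return None  # every attempt hit a busy port


# Hypothetical situation: 20 sessions already hold ports 4040..4059 on one host.
busy = set(range(4040, 4060))


def is_free(port):
    return port not in busy


print(pick_port(4040, 16, is_free))  # gives up: all probed ports are busy
print(pick_port(4040, 50, is_free))  # enough retries to reach a free port
```

With a group of about 40 people sharing one machine, the retry limit has to be comfortably larger than the number of simultaneous sessions, which is why we picked 50.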

Next, a few settings that are not closely tied to the specifics of our case.

To start Spark faster, it is recommended to archive the jars folder located in the SPARK_HOME directory and put the archive on HDFS. Then Spark will not waste time uploading these jars to the workers.

spark.yarn.archive=hdfs:///tmp/spark-archive.zip
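A minimal Python sketch of building such an archive is shown here. The function name `build_spark_archive` is hypothetical, and the sketch assumes the jars should sit at the root of the archive, which is how the Spark on YARN documentation describes `spark.yarn.archive`. Uploading the result to HDFS is a separate step.

```python
import zipfile
from pathlib import Path


def build_spark_archive(jars_dir, archive_path):
    """Zip every .jar in jars_dir, placing each file at the root of the archive."""
    jars = sorted(Path(jars_dir).glob("*.jar"))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as archive:
        for jar in jars:
            # arcname=jar.name drops the directory part, so jars sit at the root.
            archive.write(jar, arcname=jar.name)
    return [jar.name for jar in jars]


# Example call (paths are assumptions; adjust them to your installation):
#   build_spark_archive("/path/to/spark/jars", "spark-archive.zip")
# The archive can then be uploaded, for example with:
#   hdfs dfs -put spark-archive.zip /tmp/
```

The jars are stored without compression here because jar files are already compressed; a different compression mode would work as well.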

It is also recommended to use kryo as the serializer for faster operation. It is more optimized than the default one.

spark.serializer=org.apache.spark.serializer.KryoSerializer

There is also a long-standing problem with Spark: it often fails with out-of-memory errors. This frequently happens at the moment when the workers have computed everything and send the result to the driver. We increased this parameter. By default it is 1 GB; we set it to 3 GB.

spark.driver.maxResultSize=3g
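For convenience, the settings discussed so far can be collected in one place. The Python sketch below renders them as whitespace-separated `key value` lines, the layout used by `spark-defaults.conf`. The helper name `render_spark_defaults` is hypothetical. Two things here are choices of this sketch: `spark.driver.maxResultSize` is written as `3g`, with an explicit unit suffix, to express the 3 GB mentioned above, and the deploy mode is left out, on the assumption that it is chosen when the session is launched.

```python
# Settings collected from the text above; values are the ones chosen for our cluster.
SETTINGS = {
    "spark.master": "yarn",
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.executorIdleTimeout": "120s",
    "spark.dynamicAllocation.cachedExecutorIdleTimeout": "600s",
    "spark.dynamicAllocation.maxExecutors": "19",
    "spark.port.maxRetries": "50",
    "spark.yarn.archive": "hdfs:///tmp/spark-archive.zip",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.driver.maxResultSize": "3g",  # explicit unit suffix (assumption of this sketch)
}


def render_spark_defaults(settings):
    """Render settings as aligned 'key value' lines for spark-defaults.conf."""
    width = max(len(key) for key in settings)
    return "\n".join(f"{key.ljust(width)}  {value}" for key, value in settings.items())


print(render_spark_defaults(SETTINGS))
```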

And finally, for dessert: how to update Spark to version 2.1 on the HortonWorks distribution, HDP 2.5.3.0. This version of HDP ships with Spark 2.0 preinstalled, but we decided for ourselves that Spark is developing quite actively and that each new version fixes some bugs and adds new features, including for the Python API. So we decided that an update was needed.

We downloaded the build for Hadoop 2.7 from the official website, unpacked it, and put it into the HDP folder. We set up the symlinks as needed. We launched it, and it did not start. It printed a very obscure error.

java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig

After some googling, we found out that Spark decided not to wait for Hadoop to catch up and switched to a newer version of jersey. The developers themselves argue about this in JIRA. The solution was to download jersey version 1.17.1, put it into the jars folder in SPARK_HOME, zip the folder again, and upload it to HDFS.

We got past this error, but a new and rather vague one appeared.

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master

At the same time, we tried running version 2.0, and everything was fine. Try to guess what was going on. We dug into the logs of this application and saw something like this:

/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar

In short, for some reason hdp.version was not being resolved. After some googling, we found a solution. You need to go to the YARN settings in Ambari and add a parameter to the custom yarn-site section:

hdp.version=2.5.3.0-37

This magic helped, and Spark took off. We tested several of our Jupyter notebooks. Everything works. We are ready for the first Spark lesson on Saturday (tomorrow)!

UPD. During the lesson, another problem came to light. At some point YARN stopped handing out containers for Spark. We had to change a YARN parameter, which by default was 0.2:

yarn.scheduler.capacity.maximum-am-resource-percent=0.8

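That is, only 20% of the resources took part in resource allocation: this parameter caps the share of cluster resources that application masters may use, and therefore how many applications can run at once. After changing the parameter, we restarted YARN. The problem was solved, and the remaining participants were also able to start a Spark context.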
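A rough back-of-the-envelope calculation shows why this limit matters with many simultaneous users. The Python sketch below is a simplified estimate, not the scheduler's real algorithm: the function name `max_concurrent_apps`, the 40 GB cluster, and the 1 GB application master container are all hypothetical, and the real Capacity Scheduler applies the limit per queue and takes other factors into account.

```python
from decimal import Decimal


def max_concurrent_apps(cluster_memory_mb, am_container_mb, am_resource_fraction):
    """Estimate how many application masters fit into the AM share of memory."""
    # Decimal(str(...)) avoids binary floating-point rounding in the multiplication.
    am_budget_mb = int(Decimal(str(am_resource_fraction)) * cluster_memory_mb)
    return am_budget_mb // am_container_mb


# Hypothetical cluster: 40 GB of YARN memory, 1 GB per application master.
CLUSTER_MB = 40 * 1024
AM_MB = 1024

for fraction in (0.2, 0.8):
    apps = max_concurrent_apps(CLUSTER_MB, AM_MB, fraction)
    print(f"maximum-am-resource-percent={fraction}: about {apps} applications")
```

Under these assumed numbers, the default would stop admitting new applications long before a class of 40 people had all started their sessions.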

Source: www.habr.com
