Hello, Habr readers. As we have already written, this month OTUS is launching two machine learning courses at once: basic and advanced. With that in mind, we continue to share useful material.
The purpose of this article is to describe our first experience of using MLflow.
We will start our review of MLflow with its tracking server and log every iteration of a study. Then we will share our experience of connecting Spark with MLflow using UDFs.
Context
At Alpha Health we use machine learning and artificial intelligence to empower people to take charge of their health and well-being. That is why machine learning models are at the core of the data science products we develop, and why we were drawn to MLflow, an open source platform that covers every aspect of the machine learning lifecycle.
MLflow
The main goal of MLflow is to provide an additional layer on top of machine learning that lets data scientists work with almost any machine learning library (h2o, keras, mleap, pytorch, sklearn and tensorflow) while taking their work to the next level.
Models - a standard format for sending models to a variety of deployment tools.
MLflow (in alpha at the time of writing) is an open source platform that lets you manage the machine learning lifecycle, including experimentation, reuse, and deployment.
Setting up MLflow
To use MLflow you first need to set up your Python environment. We will use PyEnv for this (it can also be used to install Python on a Mac). This gives us a virtual environment in which we will install all the libraries needed to run MLflow.
```
pyenv install 3.7.0
pyenv global 3.7.0 # Use Python 3.7
mkvirtualenv mlflow # Create a Virtual Env with Python 3.7
workon mlflow
```
Note: We use PyArrow to run models as UDFs. The PyArrow and Numpy versions had to be pinned because the latest versions conflicted with each other.
Launching the Tracking UI
MLflow Tracking lets us log and query experiments using Python and the REST API. In addition, you can choose where model artifacts are stored (localhost, Amazon S3, Azure Blob Storage, Google Cloud Storage or an SFTP server). Since we use AWS at Alpha Health, our artifact store will be S3.
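```
# Running a Tracking Server
# (--file-store is the flag of the MLflow version used here;
#  later releases replaced it with --backend-store-uri)
mlflow server \
    --file-store /tmp/mlflow/fileStore \
    --default-artifact-root s3://<bucket>/mlflow/artifacts/ \
    --host localhost \
    --port 5000
```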
MLflow recommends using a persistent file store. The file store is where the server keeps run and experiment metadata. When you start the server, make sure it points to a persistent file store. For this experiment we will simply use /tmp.
Keep in mind that if we want to use the mlflow server to serve models from earlier experiments, those experiments must be present in the file store. Even without that, however, we can still use them in a UDF, since all we need is the path to the model.
Note: Both the Tracking UI and the model client must have access to the artifact location. In other words, even if the Tracking UI lives on an EC2 instance, a machine that runs MLflow locally must have direct access to S3 in order to write model artifacts.
Tracking UI stores artifacts in an S3 bucket
Running models
As soon as the tracking server is running, you can start training models.
As an example, we will use a modified version of the wine example from the MLflow Sklearn examples.
As we have already discussed, MLflow lets you log model parameters, metrics, and artifacts so that you can track how they evolve across iterations. This is extremely useful because it lets us reproduce the best model by querying the tracking server, or work out which code produced a given iteration by using the git commit hashes recorded in the logs.
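The logging flow described above can be sketched roughly as follows. This is a minimal sketch, not the training script used for this article: it assumes `mlflow` and `scikit-learn` are installed and that `MLFLOW_TRACKING_URI` points at the tracking server started earlier. The function names `regression_metrics` and `train_and_log` are hypothetical, and in MLflow releases before 1.0 the run identifier is exposed as `run_uuid` rather than `run_id`.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE and MAE for two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rmse": rmse, "mae": mae}


def train_and_log(train_x, train_y, test_x, test_y, alpha=0.5, l1_ratio=0.5):
    """Train an ElasticNet model and log it to the configured tracking server.

    Hypothetical helper: returns the ID of the MLflow run it created.
    """
    # Imported lazily so the metric helper above works without these packages.
    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import ElasticNet

    with mlflow.start_run() as run:
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        model.fit(train_x, train_y)
        metrics = regression_metrics(list(test_y), list(model.predict(test_x)))

        # Parameters, metrics and the model artifact are all attached to this run.
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        for name, value in metrics.items():
            mlflow.log_metric(name, value)
        mlflow.sklearn.log_model(model, "model")
        return run.info.run_id
```

Each call to `train_and_log` shows up as one run in the Tracking UI, so different `alpha` and `l1_ratio` values can be compared side by side.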
To serve a model, we need a running tracking server (see Launching the Tracking UI) and the Run ID of the model.
Run ID
```
# Serve a sklearn model through 127.0.0.1:5005
# (CLI of the MLflow version used here; later releases use `mlflow models serve`)
MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow sklearn serve \
    --port 5005 \
    --run_id 0f8691808e914d1087cf097a08730f17 \
    --model-path model
```
To serve models with the MLflow serve functionality, we need access to the Tracking UI, so that the model information can be retrieved simply by specifying --run_id.
Once the model server has contacted the Tracking server, we get a new model endpoint.
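Querying that endpoint could look roughly like the sketch below. It uses only the Python standard library and assumes the model server from the command above is listening on port 5005 and exposes MLflow's `/invocations` route. The function names are hypothetical, and the expected JSON layout depends on the MLflow version: a list of records is assumed here, while later releases expect a different envelope.

```python
import json
import urllib.request


def build_invocation_request(rows, host="127.0.0.1", port=5005):
    """Build (without sending) a POST request for the model server's /invocations route."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url="http://{}:{}/invocations".format(host, port),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(rows, host="127.0.0.1", port=5005):
    """Send the rows to the running model server and return the decoded JSON reply."""
    request = build_invocation_request(rows, host=host, port=port)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `predict([{...}])` with one dictionary per wine sample (keys being the feature names the model was trained on) sends the request. Building the request separately makes it easy to inspect the payload before anything goes over the network.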
Although the model server is powerful enough to serve models in real time (source: mlflow // docs // models # local), using Spark (batch or streaming) is an even more powerful solution because the work is distributed.
Imagine that you simply trained the model offline and then applied the resulting model to all of your data. This is where Spark and MLflow shine.
To show how we apply MLflow models to Spark dataframes, we need to set up Jupyter notebooks to work with PySpark.
Start by installing the latest stable version of Apache Spark:
```
cd ~/Downloads/
tar -xzf spark-2.4.3-bin-hadoop2.7.tgz
mv ~/Downloads/spark-2.4.3-bin-hadoop2.7 ~/
ln -s ~/spark-2.4.3-bin-hadoop2.7 ~/spark
```
By setting notebook-dir, we can keep our notebooks in the folder of our choice.
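The driver configuration this relies on can be sketched as a small helper that assembles the relevant environment variables. `SPARK_HOME`, `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` are standard Spark settings; the function name `pyspark_notebook_env` and the example paths are assumptions made for illustration.

```python
import os


def pyspark_notebook_env(spark_home, notebook_dir, base_env=None):
    """Return a copy of the environment that makes `pyspark` start a Jupyter notebook."""
    env = dict(os.environ if base_env is None else base_env)
    spark_home = os.path.expanduser(spark_home)
    notebook_dir = os.path.expanduser(notebook_dir)

    env["SPARK_HOME"] = spark_home
    # Spark starts this executable as the driver instead of a plain Python shell.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    # Arguments handed to the driver executable; notebook-dir is where notebooks are stored.
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook --notebook-dir={}".format(notebook_dir)

    # Put Spark's launcher scripts first on the PATH.
    path_parts = [os.path.join(spark_home, "bin")]
    if env.get("PATH"):
        path_parts.append(env["PATH"])
    env["PATH"] = os.pathsep.join(path_parts)
    return env
```

For example, `subprocess.run(["pyspark"], env=pyspark_notebook_env("~/spark", "~/Projects/notebooks"))` would launch the notebook server with these settings. Exporting the same variables in a shell profile has the same effect.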
Launching Jupyter from PySpark
Since we have configured Jupyter as the PySpark driver, we can now run Jupyter notebooks in the PySpark context.
```
(mlflow) afranzi:~$ pyspark
[I 19:05:01.572 NotebookApp] sparkmagic extension enabled!
[I 19:05:01.573 NotebookApp] Serving notebooks from local directory: /Users/afranzi/Projects/notebooks
[I 19:05:01.573 NotebookApp] The Jupyter Notebook is running at:
[I 19:05:01.573 NotebookApp] http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
[I 19:05:01.573 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 19:05:01.574 NotebookApp]
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
```
As mentioned above, MLflow can log model artifacts to S3. Once we have the selected model in hand, we can load it as a UDF using the mlflow.pyfunc module.
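Loading the model as a UDF might look like the sketch below. It relies on `mlflow.pyfunc.spark_udf`, which wraps a logged model as a Spark UDF. The helper name `add_predictions` is hypothetical, the column list is the set of feature names from the public red wine quality data set, and the model URI has to be supplied by the caller (for instance, the S3 path of the logged `model` artifact).

```python
# The eleven feature columns of the red wine quality data set.
WINE_FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def add_predictions(spark, df, model_uri, features=WINE_FEATURES, output_col="prediction"):
    """Append a column of model predictions to a Spark DataFrame.

    Hypothetical helper: `spark` is an active SparkSession, `df` holds the
    feature columns, and `model_uri` points at a model logged with MLflow.
    """
    # Imported lazily: mlflow (and pyarrow, which the UDF relies on) only needs
    # to be installed where this function is actually called.
    import mlflow.pyfunc

    # spark_udf turns the logged model into a UDF that Spark can apply per batch.
    model_udf = mlflow.pyfunc.spark_udf(spark, model_uri)

    # Column names are passed as strings; Spark resolves them against `df`.
    return df.withColumn(output_col, model_udf(*features))
```

The wine CSV uses `;` as its delimiter, so a DataFrame for it could be read with `spark.read.option("header", "true").option("delimiter", ";").option("inferSchema", "true").csv(path)` before being passed to `add_predictions`.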
So far we have talked about how to use PySpark with MLflow, running wine quality prediction over the entire wine dataset. But what if you need to use the Python MLflow modules from Scala Spark?
We tested this as well, by sharing the Spark context between Scala and Python. That is, we registered the MLflow UDF in Python and then used it from Scala (yes, perhaps not the best solution, but it is what we have).
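The Python half of that approach can be sketched as follows, assuming both kernels really do share one Spark session. `spark.udf.register` stores the UDF under a name in the session's function registry, which is what makes it reachable from other languages. The UDF name `wineQuality` and both helper names are hypothetical.

```python
def register_model_udf(spark, model_uri, udf_name="wineQuality"):
    """Register an MLflow model as a named UDF on the given SparkSession."""
    # Imported lazily so the expression helper below has no dependencies.
    import mlflow.pyfunc

    model_udf = mlflow.pyfunc.spark_udf(spark, model_uri)
    # A named UDF can be called from Spark SQL, and therefore from any
    # language that talks to the same session.
    spark.udf.register(udf_name, model_udf)
    return udf_name


def quote_column(name):
    """Quote a column name for Spark SQL, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def udf_call_expression(udf_name, columns):
    """Build a Spark SQL expression that calls a registered UDF on the given columns."""
    return "{}({})".format(udf_name, ", ".join(quote_column(c) for c in columns))
```

From a Scala cell attached to the same session, the string returned by `udf_call_expression` could then be passed to `df.selectExpr(...)`. Quoting the column names matters here because the wine columns contain spaces.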
Scala Spark + MLflow
For this example we will add the Toree kernel to the existing Jupyter installation.
Install Spark + Toree + Jupyter
```
pip install toree
jupyter toree install --spark_home=${SPARK_HOME} --sys-prefix
jupyter kernelspec list
```
```
Available kernels:
  apache_toree_scala    /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/apache_toree_scala
  python3               /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/python3
```
As you can see from the attached notebook, the UDF is shared between Spark and PySpark. We hope this part will be useful for those who love Scala and want to deploy machine learning models to production.
After this exploration of MLflow, we are confident that we will move forward and use it for our Spark pipelines and recommender systems.
It would be nice to back the file store with a database instead of the file system. That should give us multiple endpoints that can use the same file store, for example multiple instances of Presto and Athena sharing the same Glue metastore.
To sum up, we would like to thank the MLflow community for making our work with data more interesting.
If you are playing around with MLflow, do not hesitate to write to us and tell us how you use it, and even more so if you use it in production.