Hello, Habr readers. As we have already written, this month OTUS is launching two machine learning courses at once.
The purpose of this article is to share our first experience of using MLflow.
We will start the review with its Tracking server and then show how we run models from Spark.
Context
We are at
MLflow
The main goal of MLflow is to provide an additional layer on top of machine learning that lets data scientists work with almost any machine learning library.
MLflow provides three components:
- Tracking: recording and querying experiments (code, data, configuration, and results). Tracking the model-building process is very important.
- Projects: a packaging format for running on any platform (e.g. SageMaker).
- Models: a common format for sending models to various deployment tools.
MLflow (in alpha at the time of writing) is an open-source platform that lets you manage the machine learning lifecycle, including experimentation, reuse, and deployment.
Setting up MLflow
To use MLflow, you first need to set up your Python environment; for this we will use pyenv.
```
pyenv install 3.7.0
pyenv global 3.7.0 # Use Python 3.7
mkvirtualenv mlflow # Create a Virtual Env with Python 3.7
workon mlflow
```
Let's install the required libraries.
```
pip install mlflow==0.7.0 \
            Cython==0.29 \
            numpy==1.14.5 \
            pandas==0.23.4 \
            pyarrow==0.11.0
```
Note: We use PyArrow to run models as UDFs. The PyArrow and NumPy versions had to be pinned because the latest versions conflicted with each other.
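Because the pinned versions matter here, it can help to confirm that the environment really matches them. The sketch below is only an illustration: the helper names (`parse_freeze`, `find_mismatches`, `pip_freeze`) are hypothetical, and it assumes the output of `pip freeze` uses the usual `name==version` lines.

```python
import subprocess
import sys

# Pins taken from the install command above.
PINS = {
    "mlflow": "0.7.0",
    "Cython": "0.29",
    "numpy": "1.14.5",
    "pandas": "0.23.4",
    "pyarrow": "0.11.0",
}


def parse_freeze(text):
    """Turn `pip freeze` output into {lowercased name: version}."""
    installed = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # skip lines that are not plain name==version pins
            installed[name.lower()] = version
    return installed


def find_mismatches(pins, installed):
    """Return {name: (expected, found)}; found is None if the package is missing."""
    mismatches = {}
    for name, expected in pins.items():
        found = installed.get(name.lower())
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


def pip_freeze():
    """Run `pip freeze` for the current interpreter and return its output."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Inside the virtual environment, `find_mismatches(PINS, parse_freeze(pip_freeze()))` should return an empty dictionary when every pin is satisfied.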
Launching the Tracking UI
MLflow Tracking lets us log and query experiments using Python and the REST API.
# Running a Tracking Server
mlflow server \
    --file-store /tmp/mlflow/fileStore \
    --default-artifact-root s3://<bucket>/mlflow/artifacts/ \
    --host localhost \
    --port 5000
MLflow recommends using a persistent file store. The file store is where the server keeps run and experiment metadata. When starting the server, make sure it points to a persistent file store. Here, for the sake of the experiment, we simply use /tmp.
Keep in mind that if we want to use the mlflow server to run old experiments, they must be present in the file store. Even without it, however, we could still use them in a UDF, since we only need the path to the model.
Note: Keep in mind that the Tracking UI and the model client must both have access to the artifact location. That is, even if the Tracking UI runs on an EC2 instance, when you run MLflow locally your machine must have direct access to S3 to write artifact models.
Tracking UI storing artifacts in an S3 bucket
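Since only the path to the model is needed to load it, it is useful to see how that path relates to the artifact root. The helper below is a hypothetical sketch: the layout (root, experiment ID, run ID, then `artifacts/`) is inferred from the model paths used later in this article, not taken from an MLflow specification.

```python
def model_artifact_uri(artifact_root, experiment_id, run_id, artifact_path="model"):
    """Build a model location of the form <root>/<experiment>/<run>/artifacts/<path>.

    Assumption: this mirrors the paths shown in this article; other MLflow
    versions or store configurations may lay artifacts out differently.
    """
    root = artifact_root.rstrip("/")  # tolerate a trailing slash on the root
    return f"{root}/{experiment_id}/{run_id}/artifacts/{artifact_path}"


# Example with the S3 artifact root passed to `mlflow server` above.
s3_model = model_artifact_uri(
    "s3://<bucket>/mlflow/artifacts/", 1, "0f8691808e914d1087cf097a08730f17"
)
```

In practice the Tracking UI shows the artifact location of each run, so a helper like this is mostly a convenience for scripts.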
Running models
As soon as the Tracking server is running, you can start training models.
As an example, we will use the wine modification of the MLflow example for scikit-learn.
MLFLOW_TRACKING_URI=http://localhost:5000 python wine_quality.py \
    --alpha 0.9 \
    --l1_ratio 0.5 \
    --wine_file ./data/winequality-red.csv
As we have already discussed, MLflow lets you log model parameters, metrics, and artifacts so that you can track how they evolve across iterations. This feature is very useful because it lets us reproduce the best model by consulting the Tracking server, or work out which code produced a given iteration by using the git commit hashes it logs.
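import mlflow
import mlflow.sklearn

with mlflow.start_run():
    # ... train the model (lr) and compute rmse, r2 and mae here ...

    # Parameters describe the inputs of this run
    mlflow.log_param("source", wine_path)
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)

    # Metrics describe how well the trained model performed
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    # Tags make runs easier to find in the Tracking UI
    mlflow.set_tag('domain', 'wine')
    mlflow.set_tag('predict', 'quality')

    # Store the fitted model as an artifact named "model"
    mlflow.sklearn.log_model(lr, "model")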
Wine iterations
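The three metrics logged for each run (rmse, mae, r2) are standard regression metrics. As a rough, dependency-free sketch of what they measure (the function name `eval_metrics` and the sample numbers are illustrative assumptions, and a real script would more likely rely on a library implementation):

```python
import math


def eval_metrics(actual, predicted):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers.

    Note: r2 is undefined when all actual values are identical
    (the total sum of squares is zero), so that case is not handled here.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    rmse = math.sqrt(ss_res / n)                      # root mean squared error
    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    return rmse, mae, r2


# Made-up quality scores, only to exercise the function.
rmse, mae, r2 = eval_metrics([5, 6, 7], [6, 6, 6])
```

Lower rmse and mae are better, while r2 closer to 1 is better, which is what makes these values handy for comparing runs side by side.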
Serving the model
The MLflow tracking server, launched with the "mlflow server" command, has a REST API for tracking runs and writing data to the local file system. You can specify the tracking server address using the "MLFLOW_TRACKING_URI" environment variable, and the MLflow Tracking API will automatically contact the tracking server at that address to create or fetch run information, log metrics, and so on.
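To make the role of the environment variable concrete, here is a small sketch of how a script might decide where its runs will be recorded. The helper names are hypothetical, and the `./mlruns` fallback is an assumption about MLflow's local default; MLflow also offers `mlflow.set_tracking_uri` for setting the address explicitly in code.

```python
import os
from urllib.parse import urlparse


def resolve_tracking_uri(environ=None, default="./mlruns"):
    """Return the tracking address from MLFLOW_TRACKING_URI, or a local fallback.

    Assumption: without the variable, runs go to a local directory
    (here "./mlruns") instead of a remote tracking server.
    """
    environ = os.environ if environ is None else environ
    return environ.get("MLFLOW_TRACKING_URI") or default


def is_remote(uri):
    """True when the address points at an HTTP(S) tracking server."""
    return urlparse(uri).scheme in ("http", "https")


uri = resolve_tracking_uri({"MLFLOW_TRACKING_URI": "http://localhost:5000"})
```

A check like `is_remote(uri)` before training is a cheap way to notice that the variable was forgotten and that runs would otherwise be written locally.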
To serve a model, we need a running tracking server (see Launching the Tracking UI) and the Run ID of the model.
Run ID
# Serve a sklearn model through 127.0.0.1:5005
MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow sklearn serve \
    --port 5005 \
    --run_id 0f8691808e914d1087cf097a08730f17 \
    --model-path model
To serve models using the MLflow serve functionality, we need access to the Tracking UI so that the model information can be retrieved simply by specifying --run_id.
Once the model is being served, we can query the model endpoint.
# Query the model server endpoint
curl -X POST \
  http://127.0.0.1:5005/invocations \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "fixed acidity": 3.42,
    "volatile acidity": 1.66,
    "citric acid": 0.48,
    "residual sugar": 4.2,
    "chlorides": 0.229,
    "free sulfur dioxide": 19,
    "total sulfur dioxide": 25,
    "density": 1.98,
    "pH": 5.33,
    "sulphates": 4.39,
    "alcohol": 10.8
  }
]'
> {"predictions": [5.825055635303461]}
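The same request can be prepared from Python with only the standard library. This is a sketch under the assumptions of the curl call above (same URL, same JSON list of records); the helper name `build_invocation_request` is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_invocation_request(rows, url="http://127.0.0.1:5005/invocations"):
    """Build a POST request carrying a JSON list of feature records."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


sample = [{
    "fixed acidity": 3.42, "volatile acidity": 1.66, "citric acid": 0.48,
    "residual sugar": 4.2, "chlorides": 0.229, "free sulfur dioxide": 19,
    "total sulfur dioxide": 25, "density": 1.98, "pH": 5.33,
    "sulphates": 4.39, "alcohol": 10.8,
}]
request = build_invocation_request(sample)

# To actually send it (requires the model server from the previous step):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a live server.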
Running models from Spark
Although the Tracking server is powerful enough to serve models in real time, train them and use the serve functionality (source:
Imagine that you simply did the training offline and then applied the resulting model to all of your data. This is where Spark and MLflow shine.
Install PySpark + Jupyter + Spark
Source: Get started PySpark - Jupyter
To show how we apply MLflow models to Spark dataframes, we need to set up Jupyter notebooks to work together with PySpark.
Start by installing the latest stable version of Apache Spark:
cd ~/Downloads/
tar -xzf spark-2.4.3-bin-hadoop2.7.tgz
mv ~/Downloads/spark-2.4.3-bin-hadoop2.7 ~/
ln -s ~/spark-2.4.3-bin-hadoop2.7 ~/spark
Install PySpark and Jupyter in the virtual environment:
pip install pyspark jupyter
Set the environment variables:
export SPARK_HOME=~/spark
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=${HOME}/Projects/notebooks"
By defining notebook-dir, we can store our notebooks in the folder we want.
Launching Jupyter from PySpark
Since we were able to configure Jupyter as the PySpark driver, we can now run a Jupyter notebook in the PySpark context.
(mlflow) afranzi:~$ pyspark
[I 19:05:01.572 NotebookApp] sparkmagic extension enabled!
[I 19:05:01.573 NotebookApp] Serving notebooks from local directory: /Users/afranzi/Projects/notebooks
[I 19:05:01.573 NotebookApp] The Jupyter Notebook is running at:
[I 19:05:01.573 NotebookApp] http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
[I 19:05:01.573 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 19:05:01.574 NotebookApp]
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
As mentioned above, MLflow provides a feature for logging model artifacts to S3. As soon as we have the selected model in our hands, we can import it as a UDF using the mlflow.pyfunc module.
import mlflow.pyfunc
model_path = 's3://<bucket>/mlflow/artifacts/1/0f8691808e914d1087cf097a08730f17/artifacts/model'
wine_path = '/Users/afranzi/Projects/data/winequality-red.csv'
wine_udf = mlflow.pyfunc.spark_udf(spark, model_path)
df = spark.read.format("csv").option("header", "true").option('delimiter', ';').load(wine_path)
columns = [ "fixed acidity", "volatile acidity", "citric acid",
"residual sugar", "chlorides", "free sulfur dioxide",
"total sulfur dioxide", "density", "pH",
"sulphates", "alcohol"
]
df.withColumn('prediction', wine_udf(*columns)).show(100, False)
PySpark: outputting wine quality predictions
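Conceptually, a model UDF receives the selected feature columns, reassembles them into rows, and hands those rows to the model's predict function. The sketch below imitates that idea in plain Python so it can be run without Spark; it is not the MLflow implementation, and `make_columnwise_udf` and `toy_predict` are hypothetical names.

```python
def make_columnwise_udf(predict):
    """Wrap a row-based predict function so it accepts one sequence per column."""
    def udf(*columns):
        # zip(*columns) regroups column values into one record per row
        rows = [list(values) for values in zip(*columns)]
        return predict(rows)
    return udf


def toy_predict(rows):
    """Stand-in for a trained model: averages the features of each row."""
    return [sum(row) / len(row) for row in rows]


toy_udf = make_columnwise_udf(toy_predict)
predictions = toy_udf([1.0, 3.0], [3.0, 5.0])  # two columns, two rows
```

This is also why the order of the column names passed to the UDF matters: the model sees positions, so the columns should be listed in the same order that was used for training.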
Up to this point, we have talked about how to use PySpark with MLflow, running wine quality predictions on the entire wine dataset. But what if you need to use Python MLflow modules from Scala Spark?
We tested this too, by sharing the Spark context between Scala and Python. That is, we registered the MLflow UDF in Python and used it from Scala (yes, perhaps not the best solution, but it is what we have).
Scala Spark + MLflow
For this example we will add the Toree kernel to the existing Jupyter setup.
Install Spark + Toree + Jupyter
```
pip install toree
jupyter toree install --spark_home=${SPARK_HOME} --sys-prefix
jupyter kernelspec list
```
```
Available kernels:
apache_toree_scala /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/apache_toree_scala
python3 /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/python3
```
As you can see in the attached notebook, the UDF is shared between Spark and PySpark. We hope this part will be useful to those who love Scala and want to deploy machine learning models in production.
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.{Column, DataFrame}
import scala.util.matching.Regex

// Matches a single leading underscore left over after normalization
val FirstAtRe: Regex = "^_".r
// Matches runs of whitespace and separators that are awkward in SQL identifiers
val AliasRe: Regex = """[\s_.:@]+""".r

// "fixed acidity" becomes "fixed_acidity"
def getFieldAlias(field_name: String): String = {
  FirstAtRe.replaceAllIn(AliasRe.replaceAllIn(field_name, "_"), "")
}

def selectFieldsNormalized(columns: List[String])(df: DataFrame): DataFrame = {
  val fieldsToSelect: List[Column] = columns.map(field =>
    col(field).as(getFieldAlias(field))
  )
  df.select(fieldsToSelect: _*)
}

// Renames every column of the DataFrame to its normalized alias
def normalizeSchema(df: DataFrame): DataFrame = {
  val schema = df.columns.toList
  df.transform(selectFieldsNormalized(schema))
}
FirstAtRe = ^_
AliasRe = [\s_.:@]+
getFieldAlias: (field_name: String)String
selectFieldsNormalized: (columns: List[String])(df: org.apache.spark.sql.DataFrame)org.apache.spark.sql.DataFrame
normalizeSchema: (df: org.apache.spark.sql.DataFrame)org.apache.spark.sql.DataFrame
Out[1]:
[\s_.:@]+
In [2]:
val winePath = "~/Research/mlflow-workshop/examples/wine_quality/data/winequality-red.csv"
val modelPath = "/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model"
winePath = ~/Research/mlflow-workshop/examples/wine_quality/data/winequality-red.csv
modelPath = /tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model
Out[2]:
/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model
In [3]:
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("delimiter", ";")
  .load(winePath)
  .transform(normalizeSchema)
df = [fixed_acidity: string, volatile_acidity: string ... 10 more fields]
Out[3]:
[fixed_acidity: string, volatile_acidity: string ... 10 more fields]
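The normalized column names above (fixed_acidity, volatile_acidity, and so on) are what make the columns easy to reference from SQL later. For readers more at home in Python, here is a sketch of the same renaming rule; it mirrors the Scala helper getFieldAlias, and the Python names are hypothetical.

```python
import re

# Runs of whitespace, underscores, dots, colons or at-signs become one underscore.
ALIAS_RE = re.compile(r"[\s_.:@]+")
# A leading underscore left over after the replacement is dropped.
FIRST_UNDERSCORE_RE = re.compile(r"^_")


def get_field_alias(field_name):
    """Return a SQL-friendly alias for a raw column name."""
    return FIRST_UNDERSCORE_RE.sub("", ALIAS_RE.sub("_", field_name))


wine_columns = ["fixed acidity", "free sulfur dioxide", "pH"]
aliases = [get_field_alias(name) for name in wine_columns]
```

Names that are already safe, such as pH, pass through unchanged, so the function can be applied to every column without special cases.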
In [4]:
%%PySpark
import mlflow
from mlflow import pyfunc
model_path = "/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model"
wine_quality_udf = mlflow.pyfunc.spark_udf(spark, model_path)
spark.udf.register("wineQuality", wine_quality_udf)
Out[4]:
<function spark_udf.<locals>.predict at 0x1116a98c8>
In [6]:
df.createOrReplaceTempView("wines")
In [10]:
%%SQL
SELECT
    quality,
    wineQuality(
        fixed_acidity,
        volatile_acidity,
        citric_acid,
        residual_sugar,
        chlorides,
        free_sulfur_dioxide,
        total_sulfur_dioxide,
        density,
        pH,
        sulphates,
        alcohol
    ) AS prediction
FROM wines
LIMIT 10
Out[10]:
+-------+------------------+
|quality| prediction|
+-------+------------------+
| 5| 5.576883967129615|
| 5| 5.50664776916154|
| 5| 5.525504822954496|
| 6| 5.504311247097457|
| 5| 5.576883967129615|
| 5|5.5556903912725755|
| 5| 5.467882654744997|
| 7| 5.710602976324739|
| 7| 5.657319539336507|
| 5| 5.345098606538708|
+-------+------------------+
In [17]:
spark.catalog.listFunctions.filter('name like "%wineQuality%").show(20, false)
+-----------+--------+-----------+---------+-----------+
|name |database|description|className|isTemporary|
+-----------+--------+-----------+---------+-----------+
|wineQuality|null |null |null |true |
+-----------+--------+-----------+---------+-----------+
Next steps
Although MLflow is in alpha at the time of writing, it looks quite promising. Just the ability to run multiple machine learning frameworks and consume them from a single endpoint takes recommender systems to the next level.
In addition, MLflow brings Data Engineers and Data Scientists closer together by laying a common layer between them.
After this exploration of MLflow, we are confident that we will go further and use it for our Spark pipelines and recommender systems.
It would be nice to sync the file store with a database instead of the file system. This should give us multiple endpoints that can use the same file store. For example, use multiple instances
To sum up, I would like to thank the MLflow community for making our work with data more interesting.
If you are playing around with MLflow, don't hesitate to write to us and tell us how you use it, and even more so if you use it in production.
Source: www.habr.com