Hello, Habr readers. As we already wrote, this month OTUS is launching two machine learning courses at once.
The goal of this article is to share our first experience of using MLflow.
We will start the review with the MLflow tracking server and then look at running models from Spark.
Context
MLflow
The main goal of MLflow is to provide an additional layer on top of machine learning that lets data scientists work with almost any machine learning library.
MLflow provides three components:
- Tracking: recording and querying experiments: code, data, configuration, and results. Keeping track of the model-building process is very important.
- Projects: a packaging format for running on any platform (for example, SageMaker).
- Models: a common format for sending models to various deployment tools.
MLflow (in alpha at the time of writing) is an open-source platform that lets you manage the machine learning lifecycle, including experimentation, reuse, and deployment.
Setting up MLflow
To use MLflow, you first need to set up your whole Python environment. We will use pyenv for this:
```
pyenv install 3.7.0
pyenv global 3.7.0 # Use Python 3.7
mkvirtualenv mlflow # Create a Virtual Env with Python 3.7
workon mlflow
```
Let's install the required libraries.
```
pip install mlflow==0.7.0 \
            Cython==0.29 \
            numpy==1.14.5 \
            pandas==0.23.4 \
            pyarrow==0.11.0
```
Note: We use PyArrow to run models as UDFs. The PyArrow and Numpy versions had to be pinned because the latest versions conflicted with each other.
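Because the pins matter here, it can help to confirm that the active environment really matches them. The following is a minimal sketch and not part of the original setup: it assumes Python 3.8 or newer for `importlib.metadata` (on 3.7 the `importlib_metadata` backport offers the same functions), and the names `PINS`, `installed_version` and `find_mismatches` are illustrative.

```python
from importlib import metadata  # standard library since Python 3.8

# Pins copied from the pip command in this section.
PINS = {
    "mlflow": "0.7.0",
    "Cython": "0.29",
    "numpy": "1.14.5",
    "pandas": "0.23.4",
    "pyarrow": "0.11.0",
}


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (pinned, installed) pair."""
    mismatches = {}
    for package, pinned in pins.items():
        found = lookup(package)
        if found != pinned:
            mismatches[package] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in find_mismatches(PINS).items():
        print(f"{name}: pinned {pinned}, installed {found}")
```

Running the script inside the `mlflow` virtual environment prints nothing when every pin is satisfied.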
Launching the Tracking UI
MLflow Tracking lets us log and query experiments using Python and the REST API.
```
# Running a Tracking Server
mlflow server \
    --file-store /tmp/mlflow/fileStore \
    --default-artifact-root s3://<bucket>/mlflow/artifacts/ \
    --host localhost \
    --port 5000
```
MLflow recommends using persistent file storage. The file store is where the server keeps run and experiment metadata. When starting the server, make sure it points to a persistent file store. Here, for the sake of the experiment, we will simply use /tmp.
Remember that if we want to use the mlflow server to run old experiments, they must be present in the file store. However, even without this we could use them in a UDF, since we only need the path to the model.
Note: Keep in mind that the Tracking UI and the model client must have access to the artifact location. That is, even if the Tracking UI lives on an EC2 instance, when you run MLflow locally the machine must have direct access to S3 to write the artifact models.
Tracking UI storing artifacts in an S3 bucket
Running models
As soon as the Tracking server is running, you can start training models.
As an example, we will use a modification of the wine example from MLflow.
```
MLFLOW_TRACKING_URI=http://localhost:5000 python wine_quality.py \
    --alpha 0.9 \
    --l1_ratio 0.5 \
    --wine_file ./data/winequality-red.csv
```
As we have already discussed, MLflow lets you log model parameters, metrics, and artifacts so that you can track how they evolve across iterations. This feature is very useful because it lets us reproduce the best model by contacting the Tracking server, or work out which code produced a given iteration by using the git commit hashes in the logs.
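```
import mlflow
import mlflow.sklearn

with mlflow.start_run():
    # ... train the model (lr) and compute rmse, r2 and mae here ...

    # Parameters used for this run
    mlflow.log_param("source", wine_path)
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    # Evaluation metrics of the trained model
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)
    # Free-form tags that make runs easier to find
    mlflow.set_tag('domain', 'wine')
    mlflow.set_tag('predict', 'quality')
    # Store the trained model as a run artifact named "model"
    mlflow.sklearn.log_model(lr, "model")
```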
Wine iterations
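The three metrics logged for each run are standard regression metrics. As a reminder of what the numbers mean, here is a minimal pure-Python sketch of how they can be computed from actual and predicted quality scores. The original script most likely relies on a library such as scikit-learn for this, which is an assumption on our part, and `regression_metrics` is an illustrative name.

```python
import math


def regression_metrics(actual, predicted):
    """Return (rmse, mae, r2) for two equally long sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Mean absolute error: average size of the error.
    mae = sum(abs(e) for e in errors) / n
    # Root mean squared error: penalises large errors more strongly.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # R^2: share of the variance in `actual` explained by the predictions.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("r2 is undefined when all actual values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2
```

A model that always predicts the mean quality gets an R² of 0, so values only slightly above 0 indicate a weak model even when RMSE looks small.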
Serving the model
The MLflow tracking server, launched with the "mlflow server" command, has a REST API for tracking runs and writing data to the local file system. You can specify the tracking server address with the environment variable "MLFLOW_TRACKING_URI", and the MLflow tracking API will automatically contact the tracking server at that address to create or retrieve run information, log metrics, and so on.
Source: Docs // Running a tracking server
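To make the role of the environment variable concrete, here is a small sketch of the lookup logic a client follows. It only illustrates the behaviour described above and is not MLflow's own code: the helper names are hypothetical, and the fallback to a local `./mlruns` directory is our assumption about the MLflow releases of that time. In real code you would call `mlflow.set_tracking_uri(...)` or simply export the variable.

```python
import os

# Assumption: with no tracking URI configured, runs are written locally.
DEFAULT_LOCAL_STORE = "./mlruns"


def resolve_tracking_uri(environ=None):
    """Return the tracking URI from MLFLOW_TRACKING_URI, or the local default."""
    environ = os.environ if environ is None else environ
    uri = environ.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or DEFAULT_LOCAL_STORE


def is_remote(uri):
    """True when the URI points at a tracking server rather than a local folder."""
    return uri.startswith(("http://", "https://"))


if __name__ == "__main__":
    uri = resolve_tracking_uri()
    target = "tracking server" if is_remote(uri) else "local file store"
    print(f"Runs will be logged to the {target} at {uri}")
```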
To serve a model, we need a running tracking server (see the launch interface) and the Run ID of the model.
Run ID
```
# Serve a sklearn model through 127.0.0.1:5005
MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow sklearn serve \
    --port 5005 \
    --run_id 0f8691808e914d1087cf097a08730f17 \
    --model-path model
```
To serve models with the MLflow serve functionality, we need access to the Tracking UI so that information about the model can be retrieved simply by specifying --run_id.
Once the model has contacted the Tracking server, we get a new model endpoint.
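The endpoint expects a JSON list of records, one per wine sample, with the feature names as keys. As a sketch of how the request in this section could be sent from Python instead of the command line, the snippet below builds it with the standard library only. The helper names are ours, and the check for missing feature names is an addition: a misspelled key is easy to miss in hand-written JSON.

```python
import json
import urllib.request

# Feature columns of the wine quality dataset used throughout the article.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid",
    "residual sugar", "chlorides", "free sulfur dioxide",
    "total sulfur dioxide", "density", "pH",
    "sulphates", "alcohol",
]

# Sample values taken from the request shown in this section.
SAMPLE_ROW = dict(zip(FEATURES, [3.42, 1.66, 0.48, 4.2, 0.229, 19, 25, 1.98, 5.33, 4.39, 10.8]))


def build_invocation_request(rows, url="http://127.0.0.1:5005/invocations"):
    """Build (but do not send) the POST request for a list of feature dicts."""
    for row in rows:
        missing = [name for name in FEATURES if name not in row]
        if missing:
            raise KeyError(f"missing feature(s): {missing}")
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(rows, url="http://127.0.0.1:5005/invocations"):
    """Send the rows to the model server and return the decoded JSON reply."""
    with urllib.request.urlopen(build_invocation_request(rows, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the model server from the previous step running, `predict([SAMPLE_ROW])` sends one sample and returns the server's reply.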
```
# Query the model serving endpoint
curl -X POST \
    http://127.0.0.1:5005/invocations \
    -H 'Content-Type: application/json' \
    -d '[
        {
            "fixed acidity": 3.42,
            "volatile acidity": 1.66,
            "citric acid": 0.48,
            "residual sugar": 4.2,
            "chlorides": 0.229,
            "free sulfur dioxide": 19,
            "total sulfur dioxide": 25,
            "density": 1.98,
            "pH": 5.33,
            "sulphates": 4.39,
            "alcohol": 10.8
        }
    ]'

> {"predictions": [5.825055635303461]}
```
Running models from Spark
Although the Tracking server is powerful enough to serve models in real time, training them and using the serving functionality, Spark offers another way to put a model to work.
Imagine that you did the training offline and then applied the resulting model to all of your data. This is where Spark and MLflow shine.
Install PySpark + Jupyter + Spark
Source: Get started PySpark - Jupyter
To show how we apply MLflow models to Spark dataframes, we need to set up Jupyter notebooks to work together with PySpark.
Start by installing the latest stable version of Apache Spark:
```
cd ~/Downloads/
tar -xzf spark-2.4.3-bin-hadoop2.7.tgz
mv ~/Downloads/spark-2.4.3-bin-hadoop2.7 ~/
ln -s ~/spark-2.4.3-bin-hadoop2.7 ~/spark
```
Install PySpark and Jupyter in the virtual environment:
```
pip install pyspark jupyter
```
Set the environment variables:
```
export SPARK_HOME=~/spark
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=${HOME}/Projects/notebooks"
```
By setting notebook-dir, we can keep our notebooks in the folder we want.
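If you would rather not change your shell profile, the same four variables can be set for a single launch from Python. This is a sketch under our own assumptions, not something the original setup uses: the function names are hypothetical and the default paths simply mirror the exports in this section.

```python
import os
import subprocess


def pyspark_jupyter_env(spark_home, notebook_dir, base_env=None):
    """Return a copy of the environment with Jupyter configured as the PySpark driver."""
    env = dict(os.environ if base_env is None else base_env)
    spark_home = os.path.expanduser(spark_home)
    spark_bin = os.path.join(spark_home, "bin")
    old_path = env.get("PATH", "")

    env["SPARK_HOME"] = spark_home
    # Put Spark's bin directory first so its pyspark launcher is found.
    env["PATH"] = spark_bin + os.pathsep + old_path if old_path else spark_bin
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = (
        "notebook --notebook-dir=" + os.path.expanduser(notebook_dir)
    )
    return env


def launch_pyspark_notebook(spark_home="~/spark", notebook_dir="~/Projects/notebooks"):
    """Start pyspark with Jupyter as its driver; blocks until the server stops."""
    env = pyspark_jupyter_env(spark_home, notebook_dir)
    launcher = os.path.join(env["SPARK_HOME"], "bin", "pyspark")
    return subprocess.run([launcher], env=env, check=True)
```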
Launching Jupyter from PySpark
Since we have configured Jupyter as the PySpark driver, we can now run a Jupyter notebook in the PySpark context.
```
(mlflow) afranzi:~$ pyspark
[I 19:05:01.572 NotebookApp] sparkmagic extension enabled!
[I 19:05:01.573 NotebookApp] Serving notebooks from local directory: /Users/afranzi/Projects/notebooks
[I 19:05:01.573 NotebookApp] The Jupyter Notebook is running at:
[I 19:05:01.573 NotebookApp] http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
[I 19:05:01.573 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 19:05:01.574 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
```
As mentioned above, MLflow can log model artifacts to S3. As soon as we have the chosen model in our hands, we can import it as a UDF using the mlflow.pyfunc module.
```
import mlflow.pyfunc

model_path = 's3://<bucket>/mlflow/artifacts/1/0f8691808e914d1087cf097a08730f17/artifacts/model'
wine_path = '/Users/afranzi/Projects/data/winequality-red.csv'

# Wrap the logged model in a Spark UDF; `spark` is the session provided by the PySpark notebook
wine_udf = mlflow.pyfunc.spark_udf(spark, model_path)

df = spark.read.format("csv").option("header", "true").option('delimiter', ';').load(wine_path)
columns = [
    "fixed acidity", "volatile acidity", "citric acid",
    "residual sugar", "chlorides", "free sulfur dioxide",
    "total sulfur dioxide", "density", "pH",
    "sulphates", "alcohol",
]
# Add a prediction column computed by the model for every row
df.withColumn('prediction', wine_udf(*columns)).show(100, False)
```
PySpark: output of the wine quality predictions
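The model paths used in this article all follow the same pattern: artifact root, experiment ID, run ID, then `artifacts/` and the name the model was logged under. The helper below is a small sketch based only on that observed pattern. It is our inference from the paths shown here rather than a documented guarantee, and `model_artifact_uri` is a hypothetical name.

```python
def model_artifact_uri(artifact_root, experiment_id, run_id, artifact_path="model"):
    """Compose the model location from the pieces visible in the Tracking UI.

    Assumed layout: <artifact_root>/<experiment_id>/<run_id>/artifacts/<artifact_path>
    """
    root = artifact_root.rstrip("/")
    return f"{root}/{experiment_id}/{run_id}/artifacts/{artifact_path}"


if __name__ == "__main__":
    # Reproduces the S3 path passed to mlflow.pyfunc.spark_udf in this section.
    print(model_artifact_uri(
        "s3://<bucket>/mlflow/artifacts/", 1, "0f8691808e914d1087cf097a08730f17"
    ))
```

This is why the UDF only needs a path: anything that can read the artifact location can load the model, whether or not the Tracking server is running.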
Up to this point, we have talked about how to use PySpark with MLflow, running the wine quality prediction on the entire wine dataset. But what if you need to use the Python MLflow modules from Scala Spark?
We tested this too, by sharing the Spark context between Scala and Python. That is, we registered the MLflow UDF in Python and used it from Scala (yes, perhaps not the best solution, but it is what we have).
Scala Spark + MLflow
For this example, we will add the Toree kernel to the existing Jupyter setup.
Install Spark + Toree + Jupyter
```
pip install toree
jupyter toree install --spark_home=${SPARK_HOME} --sys-prefix
jupyter kernelspec list
```
```
Available kernels:
  apache_toree_scala    /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/apache_toree_scala
  python3               /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/python3
```
As you can see in the attached notebook, the UDF is shared between Spark and PySpark. We hope this part will be useful to those who love Scala and want to deploy machine learning models to production.
In [1]:
```
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.{Column, DataFrame}
import scala.util.matching.Regex

// Leading underscore left over after normalization
val FirstAtRe: Regex = "^_".r
// Runs of whitespace and of the characters _ . : @ (triple quotes keep \s intact)
val AliasRe: Regex = """[\s_.:@]+""".r

// "fixed acidity" -> "fixed_acidity"
def getFieldAlias(field_name: String): String = {
  FirstAtRe.replaceAllIn(AliasRe.replaceAllIn(field_name, "_"), "")
}

def selectFieldsNormalized(columns: List[String])(df: DataFrame): DataFrame = {
  val fieldsToSelect: List[Column] = columns.map(field =>
    col(field).as(getFieldAlias(field))
  )
  df.select(fieldsToSelect: _*)
}

// Rename every column so that it can be referenced from SQL
def normalizeSchema(df: DataFrame): DataFrame = {
  val schema = df.columns.toList
  df.transform(selectFieldsNormalized(schema))
}
```
```
FirstAtRe = ^_
AliasRe = [\s_.:@]+
getFieldAlias: (field_name: String)String
selectFieldsNormalized: (columns: List[String])(df: org.apache.spark.sql.DataFrame)org.apache.spark.sql.DataFrame
normalizeSchema: (df: org.apache.spark.sql.DataFrame)org.apache.spark.sql.DataFrame
```
Out[1]:
```
[\s_.:@]+
```
In [2]:
```
val winePath = "~/Research/mlflow-workshop/examples/wine_quality/data/winequality-red.csv"
val modelPath = "/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model"
```
```
winePath = ~/Research/mlflow-workshop/examples/wine_quality/data/winequality-red.csv
modelPath = /tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model
```
Out[2]:
```
/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model
```
In [3]:
```
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("delimiter", ";")
  .load(winePath)
  .transform(normalizeSchema)
```
```
df = [fixed_acidity: string, volatile_acidity: string ... 10 more fields]
```
Out[3]:
```
[fixed_acidity: string, volatile_acidity: string ... 10 more fields]
```
In [4]:
```
%%PySpark
import mlflow
from mlflow import pyfunc

model_path = "/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model"
wine_quality_udf = mlflow.pyfunc.spark_udf(spark, model_path)
spark.udf.register("wineQuality", wine_quality_udf)
```
Out[4]:
```
<function spark_udf.<locals>.predict at 0x1116a98c8>
```
In [6]:
```
df.createOrReplaceTempView("wines")
```
In [10]:
```
%%SQL
SELECT
    quality,
    wineQuality(
        fixed_acidity,
        volatile_acidity,
        citric_acid,
        residual_sugar,
        chlorides,
        free_sulfur_dioxide,
        total_sulfur_dioxide,
        density,
        pH,
        sulphates,
        alcohol
    ) AS prediction
FROM wines
LIMIT 10
```
Out[10]:
```
+-------+------------------+
|quality|        prediction|
+-------+------------------+
|      5| 5.576883967129615|
|      5|  5.50664776916154|
|      5| 5.525504822954496|
|      6| 5.504311247097457|
|      5| 5.576883967129615|
|      5|5.5556903912725755|
|      5| 5.467882654744997|
|      7| 5.710602976324739|
|      7| 5.657319539336507|
|      5| 5.345098606538708|
+-------+------------------+
```
In [17]:
```
spark.catalog.listFunctions.filter('name like "%wineQuality%").show(20, false)
```
```
+-----------+--------+-----------+---------+-----------+
|name       |database|description|className|isTemporary|
+-----------+--------+-----------+---------+-----------+
|wineQuality|null    |null       |null     |true       |
+-----------+--------+-----------+---------+-----------+
```
Next steps
Although MLflow is in alpha at the time of writing, it looks very promising. Just being able to run several machine learning frameworks and use them from a single endpoint takes recommender systems to the next level.
In addition, MLflow brings Data Engineers and Data Scientists closer together by laying a common layer between them.
After this exploration of MLflow, we are confident that we will go further and use it for our Spark pipelines and recommender systems.
It would be nice to back the file store with a database instead of the file system. This should give us multiple endpoints that can use the same file store.
To sum up, I would like to thank the MLflow community for making our work with data more interesting.
If you are playing around with MLflow, feel free to write to us and tell us how you use it, and even more so if you use it in production.
Source: www.habr.com