Extending Spark with MLflow

Hello, Khabrovites. As we have already written, this month OTUS is launching two machine learning courses at once: basic and advanced. With that in mind, we continue to share useful material.

The purpose of this article is to describe our first experience with MLflow.

We will start the overview of MLflow with its tracking server and log every iteration of a study. Then we will share our experience of connecting Spark with MLflow using a UDF.

Context

At Alpha Health we use machine learning and artificial intelligence to empower people to take care of their health and well-being. That is why machine learning models sit at the core of the data products we develop, and why MLflow, an open source platform covering every aspect of the machine learning lifecycle, caught our attention.

MLflow

The main goal of MLflow is to provide an additional layer on top of machine learning that lets data scientists work with almost any machine learning library (h2o, keras, mleap, pytorch, sklearn and tensorflow), taking their work to the next level.

MLflow provides three components:

  • Tracking - recording and querying experiments: code, data, configuration and results. Keeping track of the model-building process is very important.
  • Projects - a packaging format for running on any platform (for example, SageMaker).
  • Models - a common format for sending models to various deployment tools.

MLflow (in alpha at the time of writing) is an open source platform that lets you manage the machine learning lifecycle, including experimentation, reuse and deployment.

Setting up MLflow

To use MLflow, you first need to set up your whole Python environment. For this we use PyEnv (to install Python on a Mac, see here). Then we can create a virtual environment in which to install all the libraries needed to run it.

```
pyenv install 3.7.0
pyenv global 3.7.0 # Use Python 3.7
mkvirtualenv mlflow # Create a Virtual Env with Python 3.7
workon mlflow
```

Install the required libraries.

```
pip install mlflow==0.7.0 \
            Cython==0.29 \
            numpy==1.14.5 \
            pandas==0.23.4 \
            pyarrow==0.11.0
```

Note: We use PyArrow to run models as UDFs. The PyArrow and Numpy versions had to be pinned because the latest versions conflicted with each other.

Launching the Tracking UI

MLflow Tracking lets us log and query experiments using Python and the REST API. In addition, you can define where to store model artifacts (localhost, Amazon S3, Azure Blob Storage, Google Cloud Storage or an SFTP server). Since we use AWS at Alpha Health, S3 is our artifact storage.

# Running a Tracking Server
mlflow server \
    --file-store /tmp/mlflow/fileStore \
    --default-artifact-root s3://<bucket>/mlflow/artifacts/ \
    --host localhost \
    --port 5000

MLflow recommends using persistent file storage. The file store is where the server keeps run and experiment metadata. When starting the server, make sure it points to a persistent file store. Here, for the sake of the experiment, we simply use /tmp.

Keep in mind that if we want to use the mlflow server to run old experiments, they must be present in the file store. Even without this, however, we could still use them in a UDF, since all we need is the path to the model.
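
The model paths that appear later in this article seem to follow a fixed layout: artifact root, experiment ID, run ID, an `artifacts` folder, and the artifact name. Below is a minimal sketch of that layout, inferred from those example paths rather than taken from MLflow's code. The helper name `model_artifact_uri` is hypothetical.

```python
import posixpath


def model_artifact_uri(artifact_root, experiment_id, run_id, artifact_path="model"):
    """Build a model path using the layout seen in this article's examples.

    Assumption: root / experiment ID / run ID / "artifacts" / artifact path.
    This is inferred from the example paths, not from MLflow internals.
    """
    return posixpath.join(
        artifact_root.rstrip("/"),
        str(experiment_id),
        run_id,
        "artifacts",
        artifact_path,
    )


uri = model_artifact_uri(
    "s3://<bucket>/mlflow/artifacts/",   # the --default-artifact-root used above
    1,                                   # experiment ID
    "0f8691808e914d1087cf097a08730f17",  # run ID used later in this article
)
print(uri)
```

If the layout holds for your MLflow version, the run ID shown in the Tracking UI is enough to locate the model without querying the server.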

Note: Both the Tracking UI and the model client must have access to the artifact location. That is, even though the Tracking UI lives on an EC2 instance, when MLflow runs locally the machine must have direct access to S3 in order to write model artifacts.

The Tracking UI stores artifacts in an S3 bucket

Running models

Once the Tracking server is running, you can start training models.

As an example, we will use a modification of the wine example from the MLflow Sklearn examples.

MLFLOW_TRACKING_URI=http://localhost:5000 python wine_quality.py \
  --alpha 0.9 \
  --l1_ratio 0.5 \
  --wine_file ./data/winequality-red.csv

As we said, MLflow lets you log model parameters, metrics and artifacts so that you can track how they evolve across iterations. This feature is extremely useful: we can reproduce the best model by contacting the Tracking server, or find out which code produced a given iteration by using the git commit hashes that are logged.

with mlflow.start_run():

    ... model ...

    mlflow.log_param("source", wine_path)
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)

    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    mlflow.set_tag('domain', 'wine')
    mlflow.set_tag('predict', 'quality')
    mlflow.sklearn.log_model(lr, "model")

Wine iterations
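
The run above logs three metrics: rmse, mae and r2. As a reminder of what these numbers measure, here is a minimal pure-Python sketch of the standard definitions. The function name `regression_metrics` and the sample values are made up for illustration, and the original script presumably computes the metrics with a library instead.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE and R^2 for two equally long sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Mean absolute error: average size of the miss, in quality points.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalises large misses more heavily.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of the variance in `actual` explained by the predictions.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2}


# Illustrative values only, not taken from the wine dataset.
metrics = regression_metrics([5, 6, 7], [5.5, 6.0, 6.5])
print(metrics)
```

Comparing these three values across runs in the Tracking UI is what makes it possible to pick the best iteration.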

Serving the model

The MLflow tracking server launched with the "mlflow server" command has a REST API for tracking runs and writing data to the local file system. You can specify the tracking server address using the "MLFLOW_TRACKING_URI" environment variable, and the MLflow tracking API will automatically contact the tracking server at that address to create/retrieve run information, log metrics, and so on.

Source: Docs // Running a tracking server
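
To make the role of the environment variable concrete, here is a small sketch of how a client could decide where to send tracking data. It mirrors the idea in the quote above and is not MLflow's internal code. The helper names are hypothetical, and the fallback to a local `./mlruns` directory is our understanding of MLflow's default when the variable is unset.

```python
import os
from urllib.parse import urlparse

# Assumption: with no tracking URI configured, runs go to a local ./mlruns folder.
DEFAULT_LOCAL_STORE = "./mlruns"


def resolve_tracking_uri(environ=None):
    """Return the tracking URI from the environment, or the local default."""
    environ = os.environ if environ is None else environ
    return environ.get("MLFLOW_TRACKING_URI") or DEFAULT_LOCAL_STORE


def is_remote_server(uri):
    """True when the URI points at an HTTP(S) tracking server."""
    return urlparse(uri).scheme in ("http", "https")


uri = resolve_tracking_uri({"MLFLOW_TRACKING_URI": "http://localhost:5000"})
print(uri, is_remote_server(uri))
```

This is why prefixing a command with `MLFLOW_TRACKING_URI=http://localhost:5000` is enough to redirect a training script to the server without changing its code.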

To serve the model we need a running tracking server (see the launch interface) and the Run ID of the model.

Run ID

# Serve a sklearn model through 127.0.0.1:5005
MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow sklearn serve \
  --port 5005 \
  --run_id 0f8691808e914d1087cf097a08730f17 \
  --model-path model

To serve models using the MLflow serve functionality, we need access to the Tracking UI so that information about the model can be retrieved simply by specifying --run_id.

Once the model has contacted the Tracking server, we get a new model endpoint.

# Query Tracking Server Endpoint
curl -X POST \
  http://127.0.0.1:5005/invocations \
  -H 'Content-Type: application/json' \
  -d '[
	{
		"fixed acidity": 3.42, 
		"volatile acidity": 1.66, 
		"citric acid": 0.48, 
		"residual sugar": 4.2, 
		"chlorides": 0.229, 
		"free sulfur dioxide": 19, 
		"total sulfur dioxide": 25, 
		"density": 1.98, 
		"pH": 5.33, 
		"sulphates": 4.39, 
		"alcohol": 10.8
	}
]'

> {"predictions": [5.825055635303461]}
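
The same request can be prepared from Python. The sketch below uses only the standard library and builds the request without sending it, so it can be inspected even when no model server is running. The helper name `build_invocation_request` is hypothetical. The URL and the JSON list-of-records payload are copied from the curl call above.

```python
import json
import urllib.request

# One wine sample, keyed by the feature names from the dataset header.
FEATURES = {
    "fixed acidity": 3.42,
    "volatile acidity": 1.66,
    "citric acid": 0.48,
    "residual sugar": 4.2,
    "chlorides": 0.229,
    "free sulfur dioxide": 19,
    "total sulfur dioxide": 25,
    "density": 1.98,
    "pH": 5.33,
    "sulphates": 4.39,
    "alcohol": 10.8,
}


def build_invocation_request(records, url="http://127.0.0.1:5005/invocations"):
    """Build (but do not send) a POST request carrying the records as JSON."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request([FEATURES])

# To actually send it, the model server from the previous step must be running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```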

Running models from Spark

Although the Tracking server is powerful enough to serve models in real time, training them and using the server functionality (source: mlflow // docs // models #local), using Spark (batch or streaming) is an even more powerful solution thanks to its distributed nature.

Imagine you have trained a model offline and then want to apply the resulting model to all of your data. This is where Spark and MLflow come into their own.

Install PySpark + Jupyter + Spark

Source: Get started PySpark - Jupyter

To show how we apply MLflow models to Spark dataframes, we need to set up Jupyter notebooks to work together with PySpark.

Start by installing the latest stable version of Apache Spark:

cd ~/Downloads/
tar -xzf spark-2.4.3-bin-hadoop2.7.tgz
mv ~/Downloads/spark-2.4.3-bin-hadoop2.7 ~/
ln -s ~/spark-2.4.3-bin-hadoop2.7 ~/spark

Install PySpark and Jupyter into the virtual environment:

pip install pyspark jupyter

Set up the environment variables:

export SPARK_HOME=~/spark
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=${HOME}/Projects/notebooks"

With notebook-dir defined, we can store our notebooks in the folder we want.

Running Jupyter from PySpark

Since we were able to configure Jupyter as the PySpark driver, we can now run a Jupyter notebook in the PySpark context.

(mlflow) afranzi:~$ pyspark
[I 19:05:01.572 NotebookApp] sparkmagic extension enabled!
[I 19:05:01.573 NotebookApp] Serving notebooks from local directory: /Users/afranzi/Projects/notebooks
[I 19:05:01.573 NotebookApp] The Jupyter Notebook is running at:
[I 19:05:01.573 NotebookApp] http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
[I 19:05:01.573 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 19:05:01.574 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745


As mentioned above, MLflow can log model artifacts to S3. Once we have the chosen model in hand, we can import it as a UDF using the mlflow.pyfunc module.

import mlflow.pyfunc

model_path = 's3://<bucket>/mlflow/artifacts/1/0f8691808e914d1087cf097a08730f17/artifacts/model'
wine_path = '/Users/afranzi/Projects/data/winequality-red.csv'
wine_udf = mlflow.pyfunc.spark_udf(spark, model_path)

df = spark.read.format("csv").option("header", "true").option('delimiter', ';').load(wine_path)
columns = [ "fixed acidity", "volatile acidity", "citric acid",
            "residual sugar", "chlorides", "free sulfur dioxide",
            "total sulfur dioxide", "density", "pH",
            "sulphates", "alcohol"
          ]
          
df.withColumn('prediction', wine_udf(*columns)).show(100, False)

PySpark - Predicting wine quality
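
Conceptually, the UDF call above hands the model the listed feature columns, in that order, for every row. The following sketch shows the same idea without Spark, using only the standard library. `predict_rows` and `toy_predict` are hypothetical names, `toy_predict` is a stand-in and not the model trained in this article, and the single CSV row is an illustrative sample in the dataset's format.

```python
import csv
import io

COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid",
    "residual sugar", "chlorides", "free sulfur dioxide",
    "total sulfur dioxide", "density", "pH",
    "sulphates", "alcohol",
]


def predict_rows(csv_text, predict, columns=COLUMNS, delimiter=";"):
    """Apply `predict` to the selected feature columns of every CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    results = []
    for row in reader:
        # CSV values arrive as strings, so cast them before scoring.
        features = [float(row[name]) for name in columns]
        results.append(predict(features))
    return results


def toy_predict(features):
    """Hypothetical stand-in for a model: the mean of the features."""
    return sum(features) / len(features)


# Illustrative sample in the same ';'-delimited format as winequality-red.csv.
SAMPLE_CSV = (
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"\n'
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
)

predictions = predict_rows(SAMPLE_CSV, toy_predict)
print(predictions)
```

The column order matters: the target column `quality` is left out, and the features are passed positionally, just as `wine_udf(*columns)` does.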

Up to this point, we have talked about how to use PySpark with MLflow by running wine quality prediction on the whole wine dataset. But what if you need to use the Python MLflow modules from Scala Spark?

We tested this too, by sharing the Spark context between Scala and Python. That is, we registered the MLflow UDF in Python and used it from Scala (yes, perhaps not the best solution, but it is what we have).

Scala Spark + MLflow

For this example, we will add the Toree Kernel to the existing Jupyter installation.

Install Spark + Toree + Jupyter

```
pip install toree
jupyter toree install --spark_home=${SPARK_HOME} --sys-prefix
jupyter kernelspec list
```
```
Available kernels:
  apache_toree_scala    /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/apache_toree_scala
  python3               /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/python3
```

As you can see from the attached notebook, the UDF is shared between Spark and PySpark. We hope this part will be useful to those who love Scala and want to deploy machine learning models to production.

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.{Column, DataFrame}
import scala.util.matching.Regex

val FirstAtRe: Regex = "^_".r
val AliasRe: Regex = "[\\s_.:@]+".r

def getFieldAlias(field_name: String): String = {
    FirstAtRe.replaceAllIn(AliasRe.replaceAllIn(field_name, "_"), "")
}

def selectFieldsNormalized(columns: List[String])(df: DataFrame): DataFrame = {
    val fieldsToSelect: List[Column] = columns.map(field =>
        col(field).as(getFieldAlias(field))
    )
    df.select(fieldsToSelect: _*)
}

def normalizeSchema(df: DataFrame): DataFrame = {
    val schema = df.columns.toList
    df.transform(selectFieldsNormalized(schema))
}

FirstAtRe = ^_
AliasRe = [\s_.:@]+

getFieldAlias: (field_name: String)String
selectFieldsNormalized: (columns: List[String])(df: org.apache.spark.sql.DataFrame)org.apache.spark.sql.DataFrame
normalizeSchema: (df: org.apache.spark.sql.DataFrame)org.apache.spark.sql.DataFrame
Out[1]:
[\s_.:@]+
In [2]:
val winePath = "~/Research/mlflow-workshop/examples/wine_quality/data/winequality-red.csv"
val modelPath = "/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model"

winePath = ~/Research/mlflow-workshop/examples/wine_quality/data/winequality-red.csv
modelPath = /tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model
Out[2]:
/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model
In [3]:
val df = spark.read
              .format("csv")
              .option("header", "true")
              .option("delimiter", ";")
              .load(winePath)
              .transform(normalizeSchema)

df = [fixed_acidity: string, volatile_acidity: string ... 10 more fields]
Out[3]:
[fixed_acidity: string, volatile_acidity: string ... 10 more fields]
In [4]:
%%PySpark
import mlflow
from mlflow import pyfunc

model_path = "/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model"
wine_quality_udf = mlflow.pyfunc.spark_udf(spark, model_path)

spark.udf.register("wineQuality", wine_quality_udf)
Out[4]:
<function spark_udf.<locals>.predict at 0x1116a98c8>
In [6]:
df.createOrReplaceTempView("wines")
In [10]:
%%SQL
SELECT 
    quality,
    wineQuality(
        fixed_acidity,
        volatile_acidity,
        citric_acid,
        residual_sugar,
        chlorides,
        free_sulfur_dioxide,
        total_sulfur_dioxide,
        density,
        pH,
        sulphates,
        alcohol
    ) AS prediction
FROM wines
LIMIT 10
Out[10]:
+-------+------------------+
|quality|        prediction|
+-------+------------------+
|      5| 5.576883967129615|
|      5|  5.50664776916154|
|      5| 5.525504822954496|
|      6| 5.504311247097457|
|      5| 5.576883967129615|
|      5|5.5556903912725755|
|      5| 5.467882654744997|
|      7| 5.710602976324739|
|      7| 5.657319539336507|
|      5| 5.345098606538708|
+-------+------------------+

In [17]:
spark.catalog.listFunctions.filter('name like "%wineQuality%").show(20, false)

+-----------+--------+-----------+---------+-----------+
|name       |database|description|className|isTemporary|
+-----------+--------+-----------+---------+-----------+
|wineQuality|null    |null       |null     |true       |
+-----------+--------+-----------+---------+-----------+
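
The column normalization at the start of the Scala notebook is what turns a header such as "fixed acidity" into the SQL-friendly name `fixed_acidity` used in the query. For readers more at home in Python, here is a small port of that helper. The function names are our own, and it is a sketch of the same two regular expressions rather than part of the original notebook.

```python
import re

# Runs of whitespace, underscores, dots, colons or at-signs become one underscore.
ALIAS_RE = re.compile(r"[\s_.:@]+")
# A single leading underscore is dropped afterwards.
FIRST_UNDERSCORE_RE = re.compile(r"^_")


def get_field_alias(field_name):
    """Return a column name that is safe to reference in Spark SQL."""
    return FIRST_UNDERSCORE_RE.sub("", ALIAS_RE.sub("_", field_name))


def normalize_columns(columns):
    """Normalize every column name in a list, preserving order."""
    return [get_field_alias(name) for name in columns]


print(normalize_columns(["fixed acidity", "total sulfur dioxide", "pH"]))
```

In PySpark the resulting names could be applied with `df.toDF(*normalize_columns(df.columns))`, assuming a dataframe `df` loaded as in the earlier example.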

Next steps

Even though MLflow is in Alpha at the time of writing, it looks quite promising. The ability to run multiple machine learning frameworks and consume them from a single endpoint alone takes recommender systems to the next level.

In addition, MLflow brings Data Engineers and Data Scientists closer together by laying a common layer between them.

After this exploration of MLflow, we are confident that we will move forward and use it for our Spark pipelines and recommender systems.

It would be nice to synchronize the file store with a database instead of the file system. That should give us multiple endpoints that can share the same file store, in the same way that multiple instances of Presto and Athena can share the same Glue metastore.

To sum up, I would like to thank the MLflow community for making our work with data more interesting.

If you are playing around with MLflow, feel free to write to us and tell us how you use it, and even more so if you use it in production.

Learn more about the courses:
Machine learning. Basic course
Machine learning. Advanced course


Source: will.com
