Hello, Habr readers. As we have already written, this month OTUS is launching two machine learning courses at once, a basic one and an advanced one. In this connection, we continue to share useful material.
The goal of this article is to talk about our first experience with MLflow.
We will begin with an overview of MLflow.
Context
We at …
MLflow
The main goal of MLflow is to provide an additional layer on top of machine learning that lets data scientists work with almost any machine learning library.
MLflow provides three components:
- Tracking - recording and querying experiments: code, data, configuration, and results. Tracking the model-building process is very important.
- Projects - a packaging format for running on any platform (e.g. SageMaker).
- Models - a common format for sending models to various deployment tools.
MLflow (in alpha at the time of writing) is an open-source platform for managing the machine learning lifecycle, including experimentation, reuse, and deployment.
Setting up MLflow
To use MLflow, we first need to set up the whole Python environment. For this we use pyenv to install Python 3.7 and virtualenvwrapper (mkvirtualenv, workon) to create a virtual environment:
```
pyenv install 3.7.0
pyenv global 3.7.0 # Use Python 3.7
mkvirtualenv mlflow # Create a Virtual Env with Python 3.7
workon mlflow
```
Then we install the required libraries.
```
pip install mlflow==0.7.0 \
  Cython==0.29 \
  numpy==1.14.5 \
  pandas==0.23.4 \
  pyarrow==0.11.0
```
Note: we use PyArrow to run models as UDFs. The PyArrow and NumPy versions had to be pinned because the latest versions conflicted with each other.
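Because the setup relies on these exact pins, a quick check of what is actually installed can save debugging time. A minimal sketch using only the standard library (`PINS` and `check_pins` are hypothetical names, not part of MLflow; `importlib.metadata` ships with Python 3.8+, and on the Python 3.7 used here the `importlib_metadata` backport offers the same API):

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # Python 3.7: provided by the importlib_metadata backport
    import importlib_metadata as metadata

# The versions pinned in the pip command above (hypothetical constant name).
PINS = {
    "mlflow": "0.7.0",
    "Cython": "0.29",
    "numpy": "1.14.5",
    "pandas": "0.23.4",
    "pyarrow": "0.11.0",
}


def check_pins(pins, lookup=None):
    """Return {package: (wanted, found)} for every package whose installed
    version differs from its pin; found is None when it is not installed."""
    if lookup is None:
        lookup = metadata.version
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(PINS).items():
        print(f"{name}: expected {wanted}, found {found}")
```

The `lookup` parameter exists so the comparison logic can be exercised without touching the real environment.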
Launching the Tracking UI
MLflow Tracking lets us log and query experiments using Python and a REST API. In the setup below, the server keeps its metadata in a local file store and writes model artifacts to an S3 bucket:
```
# Running a Tracking Server
mlflow server \
  --file-store /tmp/mlflow/fileStore \
  --default-artifact-root s3://<bucket>/mlflow/artifacts/ \
  --host localhost \
  --port 5000
```
MLflow recommends using persistent file storage. The file store is where the server keeps run and experiment metadata, so when starting the server, make sure it points to a persistent file store. Here, for the sake of the experiment, we simply use /tmp.
Keep in mind that if we want to use the MLflow server to run old experiments, they must be present in the file store. However, even without them we could still use the models in a UDF, since all we need is the path to the model.
Note: keep in mind that both the Tracking UI and the model client must have access to the artifact location. That is, even though the Tracking UI runs on an EC2 instance, when MLflow runs locally the machine itself needs direct access to S3 to write model artifacts.
Tracking UI stores artifacts in an S3 bucket
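The model paths used later in this article (the S3 path in the PySpark section and the /tmp path in the Scala section) share one layout: artifact root, experiment ID, run ID, then `artifacts/<artifact path>`. A minimal sketch that builds and splits such paths, assuming that layout as observed in this article's examples; the helper names and the `my-bucket` bucket are hypothetical:

```python
from urllib.parse import urlparse


def model_artifact_uri(artifact_root, experiment_id, run_id, artifact_path="model"):
    """Join the pieces as <root>/<experiment_id>/<run_id>/artifacts/<artifact_path>.

    The layout is assumed from the example paths in this article."""
    return f"{artifact_root.rstrip('/')}/{experiment_id}/{run_id}/artifacts/{artifact_path}"


def split_s3_uri(uri):
    """Split s3://bucket/key into (bucket, key), i.e. what the machine must be able to reach."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


uri = model_artifact_uri("s3://my-bucket/mlflow/artifacts/", 1,
                         "0f8691808e914d1087cf097a08730f17")
print(uri)
print(split_s3_uri(uri))
```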
Running models
Once the Tracking server is running, we can start training models.
As an example, we will use a modified version of the wine example from MLflow's scikit-learn examples:
```
MLFLOW_TRACKING_URI=http://localhost:5000 python wine_quality.py \
  --alpha 0.9 \
  --l1_ration 0.5 \
  --wine_file ./data/winequality-red.csv
```
As discussed above, MLflow lets us log model parameters, metrics, and artifacts, so we can track how they evolve across iterations. This is extremely useful: we can reproduce the best model by querying the Tracking server, or find out which code produced a given iteration from the logged git commit hashes.
```
import mlflow
import mlflow.sklearn

with mlflow.start_run():
    # ... train the model `lr` and compute rmse, r2 and mae (elided in the original) ...

    mlflow.log_param("source", wine_path)
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)

    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    mlflow.set_tag('domain', 'wine')
    mlflow.set_tag('predict', 'quality')

    mlflow.sklearn.log_model(lr, "model")
```
Wine iterations
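The run above logs three metrics. As a dependency-free sketch of what they measure: the author's wine_quality.py is not shown, so this is not its code. The helper name `eval_metrics` mirrors the one in MLflow's scikit-learn wine example, which computes the same quantities with `sklearn.metrics` instead of by hand:

```python
import math


def eval_metrics(actual, predicted):
    """Return (rmse, mae, r2) using their standard definitions.

    r2 is nan when all actual values are equal (zero variance)."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)                   # root mean squared error
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mean = sum(actual) / n
    ss_tot = sum((a - mean) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return rmse, mae, r2


# Toy values, not taken from the wine dataset.
rmse, mae, r2 = eval_metrics([5, 6, 7], [5, 6, 6])
print(rmse, mae, r2)
```

Here one of three predictions is off by one, so rmse is sqrt(1/3), mae is 1/3, and r2 is 1 - 1/2 = 0.5.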
Serving the model
The MLflow tracking server, launched with the "mlflow server" command, exposes a REST API for tracking runs and writing data to the local file system. You can specify the tracking server address with the "MLFLOW_TRACKING_URI" environment variable, and the MLflow tracking API will automatically contact the tracking server at that address to create or fetch run information, log metrics, and so on.
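A small sketch of that lookup in plain Python. It assumes the `/api/2.0/mlflow/runs/get` REST path from current MLflow documentation (older 0.x servers such as the one installed above may expect `run_uuid` instead of `run_id`); the fallback of `http://localhost:5000` is this article's server, not MLflow's own default, and the helper names are hypothetical:

```python
import os
from urllib.parse import urlencode


def tracking_uri(env=None, default="http://localhost:5000"):
    """Prefer MLFLOW_TRACKING_URI when it is set; otherwise use `default`."""
    env = os.environ if env is None else env
    return env.get("MLFLOW_TRACKING_URI", default).rstrip("/")


def run_info_url(run_id, env=None):
    """URL for fetching one run's metadata from the tracking server's REST API."""
    query = urlencode({"run_id": run_id})
    return f"{tracking_uri(env)}/api/2.0/mlflow/runs/get?{query}"


print(run_info_url("0f8691808e914d1087cf097a08730f17"))
```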
Source: Docs // Running a Tracking Server
To serve the model, we need a running tracking server (see the Tracking UI launch above) and the model's Run ID.
Run ID
```
# Serve a sklearn model through 127.0.0.1:5005
MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow sklearn serve \
  --port 5005 \
  --run_id 0f8691808e914d1087cf097a08730f17 \
  --model-path model
```
To serve models with MLflow's serving functionality, we need access to the Tracking UI, from which the model's information is retrieved simply by specifying --run_id.
Once the model server has contacted the Tracking server, we get a new model endpoint.
```
# Query the model server endpoint
curl -X POST \
  http://127.0.0.1:5005/invocations \
  -H 'Content-Type: application/json' \
  -d '[
    {
      "fixed acidity": 3.42,
      "volatile acidity": 1.66,
      "citric acid": 0.48,
      "residual sugar": 4.2,
      "chlorides": 0.229,
      "free sulfur dioxide": 19,
      "total sulfur dioxide": 25,
      "density": 1.98,
      "pH": 5.33,
      "sulphates": 4.39,
      "alcohol": 10.8
    }
  ]'

> {"predictions": [5.825055635303461]}
```
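The same request can be sent from Python with only the standard library. A minimal sketch, assuming the scoring server started above is listening on 127.0.0.1:5005 and accepts a JSON list of records as in the curl call; the function names are hypothetical:

```python
import json
import urllib.request

INVOCATIONS_URL = "http://127.0.0.1:5005/invocations"


def build_invocation_request(records, url=INVOCATIONS_URL):
    """POST request whose body is the records serialized as a JSON list."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(records, url=INVOCATIONS_URL, timeout=10):
    """Send the records to the model server and return the decoded JSON response."""
    request = build_invocation_request(records, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sample = [{
    "fixed acidity": 3.42, "volatile acidity": 1.66, "citric acid": 0.48,
    "residual sugar": 4.2, "chlorides": 0.229, "free sulfur dioxide": 19,
    "total sulfur dioxide": 25, "density": 1.98, "pH": 5.33,
    "sulphates": 4.39, "alcohol": 10.8,
}]
# predict(sample) needs the server to be running; the response is a dict of predictions.
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.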
Running models from Spark
The Tracking server is powerful enough to serve models in real time, train them, and use the serving functionality (source: …).
Now imagine that you simply run the training offline and then apply the resulting model to all of your data. This is where Spark and MLflow shine.
Installing PySpark + Jupyter + Spark
Source: Get started PySpark - Jupyter
To show how we apply MLflow models to Spark DataFrames, we need to set up Jupyter notebooks to work together with PySpark.
Start by installing the latest stable version of Apache Spark (2.4.3 prebuilt for Hadoop 2.7 in the commands below, with the archive already downloaded to ~/Downloads):
```
cd ~/Downloads/
tar -xzf spark-2.4.3-bin-hadoop2.7.tgz
mv ~/Downloads/spark-2.4.3-bin-hadoop2.7 ~/
ln -s ~/spark-2.4.3-bin-hadoop2.7 ~/spark
```
Install PySpark and Jupyter in the virtual environment:
```
pip install pyspark jupyter
```
Set up the environment variables:
```
export SPARK_HOME=~/spark
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=${HOME}/Projects/notebooks"
```
By setting notebook-dir, we can keep our notebooks in the folder of our choice.
Launching Jupyter from PySpark
Since we configured Jupyter as the PySpark driver, we can now launch Jupyter notebooks in the PySpark context.
```
(mlflow) afranzi:~$ pyspark
[I 19:05:01.572 NotebookApp] sparkmagic extension enabled!
[I 19:05:01.573 NotebookApp] Serving notebooks from local directory: /Users/afranzi/Projects/notebooks
[I 19:05:01.573 NotebookApp] The Jupyter Notebook is running at:
[I 19:05:01.573 NotebookApp] http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
[I 19:05:01.573 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 19:05:01.574 NotebookApp]
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
```
As mentioned above, MLflow can log model artifacts to S3. Once we have the chosen model in hand, we can import it as a UDF using the mlflow.pyfunc module.
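```
import mlflow.pyfunc

# `spark` is the SparkSession provided by the PySpark-launched notebook.
model_path = 's3://<bucket>/mlflow/artifacts/1/0f8691808e914d1087cf097a08730f17/artifacts/model'
wine_path = '/Users/afranzi/Projects/data/winequality-red.csv'
# Wrap the logged model as a Spark UDF.
wine_udf = mlflow.pyfunc.spark_udf(spark, model_path)
# ';'-delimited CSV with a header row; inferSchema turns the features into numbers.
df = (spark.read.format("csv")
      .option("header", "true")
      .option("delimiter", ";")
      .option("inferSchema", "true")
      .load(wine_path))
columns = ["fixed acidity", "volatile acidity", "citric acid",
           "residual sugar", "chlorides", "free sulfur dioxide",
           "total sulfur dioxide", "density", "pH",
           "sulphates", "alcohol"]
# Add a prediction column for every row and print 100 rows without truncation.
df.withColumn('prediction', wine_udf(*columns)).show(100, False)
```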
PySpark - Outputs wine quality predictions
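Conceptually, the UDF applies the model's prediction to batches of rows, feeding the feature columns in the order they are passed to `wine_udf(*columns)`. The plain-Python sketch below mimics that idea outside Spark; it is a simplified illustration, not MLflow's implementation, and `predict_in_batches` and the toy model are hypothetical. It also casts each value with `float()`, because CSV columns read without schema inference arrive as strings:

```python
import csv
import io


def predict_in_batches(rows, columns, predict, batch_size=1000):
    """Yield one prediction per row, calling `predict` on lists of feature vectors.

    `rows` are dicts (e.g. from csv.DictReader); features are taken in `columns` order."""
    batch = []
    for row in rows:
        batch.append([float(row[c]) for c in columns])  # CSV values arrive as strings
        if len(batch) == batch_size:
            yield from predict(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        yield from predict(batch)


def toy_model(batch):
    """Hypothetical stand-in for a trained model: half the alcohol value."""
    return [round(0.5 * alcohol, 2) for alcohol, _ph in batch]


# A toy ';'-delimited file with a header row, shaped like winequality-red.csv.
data = io.StringIO("alcohol;pH;quality\n9.4;3.51;5\n9.8;3.2;5\n10.5;3.3;6\n")
preds = list(predict_in_batches(csv.DictReader(data, delimiter=";"),
                                ["alcohol", "pH"], toy_model, batch_size=2))
print(preds)  # [4.7, 4.9, 5.25]
```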
So far, we have shown how to use PySpark with MLflow by running wine-quality predictions on the whole wine dataset. But what if you need to use Python MLflow modules from Scala Spark?
We tested this as well, by sharing the Spark context between Scala and Python. That is, we registered the MLflow UDF in Python and used it from Scala (yes, maybe not the best solution, but it is what we have).
Scala Spark + MLflow
For this example, we will add the Apache Toree kernel to our existing Jupyter setup.
Installing Spark + Toree + Jupyter
```
pip install toree
jupyter toree install --spark_home=${SPARK_HOME} --sys-prefix
jupyter kernelspec list
```
```
Available kernels:
apache_toree_scala /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/apache_toree_scala
python3 /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/python3
```
As you can see in the attached notebook, the UDF is shared between Spark and PySpark. We hope this part will be useful for those who love Scala and want to deploy machine learning models in production.
```
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.{Column, DataFrame}
import scala.util.matching.Regex

// A single leading underscore left over after replacing separators.
val FirstAtRe: Regex = "^_".r
// Runs of whitespace, '_', '.', ':' or '@' (triple quotes keep \s as a regex escape).
val AliasRe: Regex = """[\s_.:@]+""".r

// "fixed acidity" -> "fixed_acidity": SQL-friendly column names.
def getFieldAlias(field_name: String): String = {
  FirstAtRe.replaceAllIn(AliasRe.replaceAllIn(field_name, "_"), "")
}

def selectFieldsNormalized(columns: List[String])(df: DataFrame): DataFrame = {
  val fieldsToSelect: List[Column] = columns.map(field =>
    col(field).as(getFieldAlias(field))
  )
  df.select(fieldsToSelect: _*)
}

def normalizeSchema(df: DataFrame): DataFrame = {
  val schema = df.columns.toList
  df.transform(selectFieldsNormalized(schema))
}
```
```
// Spark does not expand "~", so build the path from the user's home directory.
val winePath = s"${sys.props("user.home")}/Research/mlflow-workshop/examples/wine_quality/data/winequality-red.csv"
val modelPath = "/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model"
```
```
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("delimiter", ";")
  .load(winePath)
  .transform(normalizeSchema)
// df: [fixed_acidity: string, volatile_acidity: string ... 10 more fields]
```
```
%%PySpark
import mlflow.pyfunc

model_path = "/tmp/mlflow/artifactStore/0/96cba14c6e4b452e937eb5072467bf79/artifacts/model"
# Build the UDF in Python and register it under a name visible to Scala and SQL.
wine_quality_udf = mlflow.pyfunc.spark_udf(spark, model_path)
spark.udf.register("wineQuality", wine_quality_udf)
```
```
// Expose the Scala DataFrame to SQL as the `wines` view.
df.createOrReplaceTempView("wines")
```
```
%%SQL
SELECT
  quality,
  wineQuality(
    fixed_acidity,
    volatile_acidity,
    citric_acid,
    residual_sugar,
    chlorides,
    free_sulfur_dioxide,
    total_sulfur_dioxide,
    density,
    pH,
    sulphates,
    alcohol
  ) AS prediction
FROM wines
LIMIT 10
```
```
+-------+------------------+
|quality|        prediction|
+-------+------------------+
|      5| 5.576883967129615|
|      5|  5.50664776916154|
|      5| 5.525504822954496|
|      6| 5.504311247097457|
|      5| 5.576883967129615|
|      5|5.5556903912725755|
|      5| 5.467882654744997|
|      7| 5.710602976324739|
|      7| 5.657319539336507|
|      5| 5.345098606538708|
+-------+------------------+
```
```
spark.catalog.listFunctions.filter('name like "%wineQuality%").show(20, false)
```
```
+-----------+--------+-----------+---------+-----------+
|name       |database|description|className|isTemporary|
+-----------+--------+-----------+---------+-----------+
|wineQuality|null    |null       |null     |true       |
+-----------+--------+-----------+---------+-----------+
```
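For readers following along from the Python side, the column renaming done by getFieldAlias in the first Scala cell can be expressed with Python's `re` module. This is a translation sketch for illustration (the name `field_alias` is hypothetical) that applies the same two regular expressions in the same order:

```python
import re

ALIAS_RE = re.compile(r"[\s_.:@]+")  # runs of whitespace, '_', '.', ':' or '@'
FIRST_AT_RE = re.compile(r"^_")      # a single leading underscore


def field_alias(field_name):
    """Collapse separator runs into '_' and then drop one leading '_'."""
    return FIRST_AT_RE.sub("", ALIAS_RE.sub("_", field_name))


wine_columns = ["fixed acidity", "volatile acidity", "citric acid",
                "residual sugar", "chlorides", "free sulfur dioxide",
                "total sulfur dioxide", "density", "pH", "sulphates",
                "alcohol", "quality"]
print([field_alias(c) for c in wine_columns])
```

The resulting names match the ones used in the SQL query above, which is why that query can refer to columns such as free_sulfur_dioxide.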
Next steps
Even though MLflow is in alpha at the time of writing, it looks quite promising. The ability alone to run multiple machine learning frameworks and consume them from a single endpoint takes recommender systems to the next level.
In addition, MLflow brings data engineers and data scientists closer together by laying down a common layer between them.
After this exploration of MLflow, we are confident that we will move forward and use it for our Spark pipelines and recommender systems.
It would be nice to synchronize the file store with a database instead of the file system. That should give us multiple endpoints able to use the same file store. For example, using multiple instances …
To sum up, I would like to thank the MLflow community for making our work with data more interesting.
If you are playing with MLflow, don't hesitate to write to us and tell us how you use it, and even more so if you use it in production.
Source: www.habr.com