Hello, Khabrovites. As we have already written, this month OTUS is launching two machine learning courses at once, a basic one and an advanced one. With that in mind, we continue to share useful material.
The purpose of this article is to describe our first experience with MLflow.
We will start the review of MLflow with its tracking server and log every iteration of the study. Then we will share our experience of connecting Spark with MLflow using a UDF.
Context
At Alpha Health we use machine learning and artificial intelligence to empower people to take care of their health and well-being. That is why machine learning models sit at the core of the data products we develop, and why MLflow, an open source platform that covers every aspect of the machine learning lifecycle, caught our attention.
MLflow
The main goal of MLflow is to provide an extra layer on top of machine learning that lets data scientists work with almost any machine learning library (h2o, keras, mleap, pytorch, sklearn and tensorflow) and take their work to the next level.
MLflow provides three components:
Tracking - recording and querying experiments: code, data, configuration and results. Keeping track of how a model was built is very important.
Projects - a packaging format for running on any platform (for example, SageMaker).
Models - a common format for sending models to a variety of deployment tools.
MLflow (in alpha at the time of writing) is an open source platform that lets you manage the machine learning lifecycle, including experimentation, reuse and deployment.
Setting up MLflow
To use MLflow, you first need to set up a complete Python environment. For this we will use PyEnv (to install Python on a Mac, see here). That way we can create a virtual environment and install all the libraries needed to run MLflow in it.
```
pyenv install 3.7.0
pyenv global 3.7.0 # Use Python 3.7
mkvirtualenv mlflow # Create a Virtual Env with Python 3.7
workon mlflow
```
Note: We use PyArrow to run models as UDFs. The PyArrow and Numpy versions had to be pinned because the latest versions conflicted with each other.
Launching the Tracking UI
MLflow Tracking lets us log and query experiments using Python and the REST API. In addition, you can define where model artifacts are stored (localhost, Amazon S3, Azure Blob Storage, Google Cloud Storage or an SFTP server). Since we use AWS at Alpha Health, S3 is our artifact storage.
```
# Running a Tracking Server
mlflow server \
  --file-store /tmp/mlflow/fileStore \
  --default-artifact-root s3://<bucket>/mlflow/artifacts/ \
  --host localhost \
  --port 5000
```
MLflow recommends using persistent file storage. The file store is where the server keeps run and experiment metadata, so when you start the server, make sure it points to a persistent file store. Here, since this is only an experiment, we simply use /tmp.
Remember that if we want to use the mlflow server to serve old experiments, they must be present in the file store. Even without that, however, we can still use them in a UDF, because the UDF only needs the path to the model.
Note: Keep in mind that both the Tracking UI and the model client must have access to the artifact location. That is, even if the Tracking UI lives on an EC2 instance, when MLflow runs locally the local machine needs direct access to S3 to write model artifacts.
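The access requirement in the note above can be checked before training starts. The following is a minimal sketch, not part of the original article: the helper names `parse_s3_uri` and `can_reach_artifact_root` are hypothetical, and the check assumes the `boto3` package and locally configured AWS credentials. It only confirms that the bucket is reachable with those credentials; it does not prove write permission.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def can_reach_artifact_root(uri):
    """Return True if this machine's AWS credentials can reach the bucket.

    Assumption: boto3 is installed and credentials are configured locally.
    """
    import boto3  # imported lazily so parse_s3_uri works without boto3
    from botocore.exceptions import BotoCoreError, ClientError

    bucket, _prefix = parse_s3_uri(uri)
    try:
        boto3.client("s3").head_bucket(Bucket=bucket)
        return True
    except (BotoCoreError, ClientError):
        return False
```

Running `can_reach_artifact_root(...)` with the artifact root passed to `mlflow server` on the training machine gives an early signal that artifact logging would otherwise fail.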
The Tracking UI stores artifacts in an S3 bucket
Running Models
Once the Tracking server is running, you can start training models.
As an example, we will use a modified version of the wine example from the MLflow Sklearn examples.
As we said, MLflow lets you log model parameters, metrics and artifacts so that you can track how they evolve across iterations. This feature is extremely useful, because it lets us reproduce the best model by querying the Tracking server, or find out which code produced a given iteration using the git commit hashes that are logged.
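The logging described above could look roughly like the sketch below. It is not the article's training script: `eval_metrics` and `log_run` are hypothetical helper names, the metrics are computed in plain Python for illustration, and the `"model"` artifact path is an assumption chosen to match the `--model-path model` option used later. The MLflow calls themselves (`start_run`, `log_param`, `log_metric`, `mlflow.sklearn.log_model`) belong to the tracking API.

```python
import math


def eval_metrics(actual, predicted):
    """Return RMSE and MAE for two equal-length sequences of numbers."""
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rmse": rmse, "mae": mae}


def log_run(params, metrics, model=None):
    """Log one training run to whichever tracking server MLflow is pointed at."""
    import mlflow  # imported lazily so eval_metrics works without MLflow
    import mlflow.sklearn

    with mlflow.start_run():
        for name, value in params.items():
            mlflow.log_param(name, value)
        for name, value in metrics.items():
            mlflow.log_metric(name, value)
        if model is not None:
            # Assumption: the model is a fitted scikit-learn estimator.
            mlflow.sklearn.log_model(model, "model")
```

A training script would fit its estimator, call `eval_metrics` on a held-out set, and pass the hyperparameters, metrics and fitted model to `log_run`, producing one run per iteration in the Tracking UI.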
The MLflow tracking server launched with the "mlflow server" command exposes a REST API for tracking runs and writes its data to the local file system. You can specify the tracking server address with the "MLFLOW_TRACKING_URI" environment variable, and the MLflow tracking API will automatically contact the tracking server at that address to create or retrieve run information, log metrics, and so on.
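From Python, the same environment variable can be set before the first tracking call. This is a small sketch under the assumption that the server runs on localhost:5000, as in the launch command earlier; `tracking_uri` and `use_tracking_server` are hypothetical helper names.

```python
import os


def tracking_uri(host="localhost", port=5000):
    """Build the HTTP address of a tracking server."""
    return f"http://{host}:{port}"


def use_tracking_server(host="localhost", port=5000):
    """Point the MLflow tracking API at the server through the environment."""
    uri = tracking_uri(host, port)
    os.environ["MLFLOW_TRACKING_URI"] = uri
    return uri
```

If you prefer to be explicit rather than rely on the environment, `mlflow.set_tracking_uri(uri)` sets the same address from code.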
To serve a model, we need a running tracking server (see Launching the Tracking UI) and the Run ID of the model.
Run ID
```
# Serve a sklearn model through 127.0.0.1:5005
MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow sklearn serve \
  --port 5005 \
  --run_id 0f8691808e914d1087cf097a08730f17 \
  --model-path model
```
To serve models with the MLflow serve functionality, we need access to the Tracking UI so that information about the model can be retrieved simply by specifying --run_id.
Once the model has contacted the Tracking Server, we get a new model endpoint.
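Querying that endpoint could be sketched as follows. The `/invocations` path is the scoring route exposed by MLflow model servers, but the JSON payload shape (here a list of records) has changed between MLflow versions, so treat the body format as an assumption and check the documentation for the version you run. `SAMPLE_WINE`, `build_invocation_request` and `predict` are hypothetical names, and the feature values are illustrative only.

```python
import json
import urllib.request

# Illustrative feature values; column names follow the wine-quality dataset.
SAMPLE_WINE = {
    "fixed acidity": 7.4, "volatile acidity": 0.7, "citric acid": 0.0,
    "residual sugar": 1.9, "chlorides": 0.076, "free sulfur dioxide": 11.0,
    "total sulfur dioxide": 34.0, "density": 0.9978, "pH": 3.51,
    "sulphates": 0.56, "alcohol": 9.4,
}


def build_invocation_request(rows, host="127.0.0.1", port=5005):
    """Build (but do not send) a POST request for the /invocations endpoint."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(rows, **kwargs):
    """Send the request to a running model server and decode the JSON reply."""
    request = build_invocation_request(rows, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the previous command running, `predict([SAMPLE_WINE])` would send one row for scoring.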
Even though the Tracking server is powerful enough to serve models in real time, train them and use the server functionality (source: mlflow // docs // models #local), using Spark (batch or streaming) is an even more powerful solution because it is distributed.
Imagine that you did the training offline and then applied the resulting model to all of your data. This is where Spark and MLflow come into their own.
With notebook-dir defined, we can store our notebooks in the folder we want.
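One way the notebook-dir setting mentioned above can be supplied is through the `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` environment variables, which PySpark reads when it starts its driver. The sketch below builds such an environment from Python; `pyspark_jupyter_env` is a hypothetical helper and the notebook directory is whatever path you choose.

```python
import os


def pyspark_jupyter_env(notebook_dir, base_env=None):
    """Return an environment in which `pyspark` starts a Jupyter notebook driver."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = f"notebook --notebook-dir={notebook_dir}"
    return env
```

Passing the result as `env=` to `subprocess.run(["pyspark"], ...)` has the same effect as exporting the two variables in the shell before calling `pyspark`.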
Running Jupyter from PySpark
Since we were able to configure Jupyter as the PySpark driver, we can now run a Jupyter notebook in the PySpark context.
```
(mlflow) afranzi:~$ pyspark
[I 19:05:01.572 NotebookApp] sparkmagic extension enabled!
[I 19:05:01.573 NotebookApp] Serving notebooks from local directory: /Users/afranzi/Projects/notebooks
[I 19:05:01.573 NotebookApp] The Jupyter Notebook is running at:
[I 19:05:01.573 NotebookApp] http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
[I 19:05:01.573 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 19:05:01.574 NotebookApp]
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=c06252daa6a12cfdd33c1d2e96c8d3b19d90e9f6fc171745
```
As mentioned above, MLflow can log model artifacts to S3. Once we have the chosen model in hand, we can import it as a UDF using the mlflow.pyfunc module.
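A minimal sketch of that import is shown below, assuming a live `SparkSession` named `spark` and a semicolon-delimited wine-quality CSV. `mlflow.pyfunc.spark_udf` is the MLflow function that wraps a logged model as a Spark UDF; the `runs:/<run_id>/<path>` URI form is accepted by recent MLflow versions, while the alpha release described here took a model path and run ID instead. `WINE_FEATURES`, `model_uri_for_run` and `score_with_udf` are hypothetical names.

```python
# Feature columns of the wine-quality dataset, in file order.
WINE_FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def model_uri_for_run(run_id, artifact_path="model"):
    """Build a runs:/ URI for a model logged under the given run."""
    return f"runs:/{run_id}/{artifact_path}"


def score_with_udf(spark, model_uri, csv_path):
    """Load the wine data and add a `prediction` column computed by the model."""
    import mlflow.pyfunc  # imported lazily; requires mlflow and pyspark

    predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri)
    wines = spark.read.csv(csv_path, header=True, sep=";", inferSchema=True)
    return wines.withColumn("prediction", predict_udf(*WINE_FEATURES))
```

Because the UDF runs on the executors, the prediction is applied to the whole dataset in parallel rather than row by row through the serving endpoint.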
Up to this point, we have talked about how to use PySpark with MLflow by running the wine quality prediction over the entire wine dataset. But what if you need to use the Python MLflow modules from Scala Spark?
We tested this as well by sharing the Spark context between Scala and Python. That is, we registered the MLflow UDF in Python and used it from Scala (yes, perhaps not the best solution, but it is what we have).
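The Python half of that arrangement could be sketched as follows. The idea is to register the model UDF under a name, so that any code sharing the same Spark session can call it through a SQL expression. `register_model_udf` and `udf_call_expr` are hypothetical helpers, and whether the Scala kernel actually sees the same session depends on the notebook setup, which is an assumption here.

```python
def register_model_udf(spark, name, model_uri):
    """Register an MLflow model as a named UDF on the given Spark session."""
    import mlflow.pyfunc  # imported lazily; requires mlflow and pyspark

    spark.udf.register(name, mlflow.pyfunc.spark_udf(spark, model_uri))


def udf_call_expr(name, columns, alias="prediction"):
    """Build a SQL expression that calls a registered UDF on the given columns.

    Column names are wrapped in backticks because the wine columns contain spaces.
    """
    quoted = ", ".join(f"`{column}`" for column in columns)
    return f"{name}({quoted}) AS {alias}"
```

After registration, an expression such as the one returned by `udf_call_expr` can be passed to `selectExpr` on a DataFrame, which is a string-based entry point available from Scala as well as Python.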
Scala Spark + MLflow
For this example, we will add the Toree Kernel to the existing Jupyter installation.
Install Spark + Toree + Jupyter
```
pip install toree
jupyter toree install --spark_home=${SPARK_HOME} --sys-prefix
jupyter kernelspec list
```
```
Available kernels:
apache_toree_scala /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/apache_toree_scala
python3 /Users/afranzi/.virtualenvs/mlflow/share/jupyter/kernels/python3
```
As you can see from the attached notebook, the UDF is shared between Spark and PySpark. We hope this part will be useful to those who love Scala and want to deploy machine learning models to production.
Even though MLflow is in alpha at the time of writing, it looks quite promising. The ability to run multiple machine learning frameworks and use them from a single endpoint alone takes recommender systems to the next level.
In addition, MLflow brings Data Engineers and Data Scientists closer together by laying a common layer between them.
After this exploration of MLflow, we are confident that we will move forward and use it for our Spark pipelines and recommender systems.
It would be nice to back the file store with a database instead of the file system. That should give us multiple endpoints that can share the same file store, for example multiple instances of Presto and Athena using the same Glue metastore.
To sum up, I would like to thank the MLflow community for making our work with data more interesting.
If you are playing around with MLflow, feel free to write to us and tell us how you use it, especially if you use it in production.