DeepPavlov for developers: #1 NLP tools and chatbot creation

Hello everyone! We are opening a series of articles devoted to solving practical problems in natural language processing (NLP) and building dialogue agents (chatbots) with the open-source library DeepPavlov, which is being developed by our team at the MIPT Laboratory of Neural Systems and Deep Learning. The main goal of the series is to introduce DeepPavlov to a wide range of developers and show how to solve applied NLP problems without deep knowledge of Machine Learning or a PhD in Mathematics.

NLP tasks include determining the sentiment of a text, parsing named entities, figuring out what the other party wants from your bot (to order a pizza or to get reference information), and much more. You can read more about NLP tasks and methods here.

In this article, we show how to run a REST server with pre-trained NLP models that are ready to use without additional configuration or training.


Installing DeepPavlov

Here and below, the commands are given for Linux. For Windows, see our documentation.

  • Create and activate a virtual environment with the currently supported version of Python:
    virtualenv env -p python3.7
    source env/bin/activate
  • Install DeepPavlov into the virtual environment:
    pip install deeppavlov
    

Launching a REST server with a DeepPavlov model

Before we launch a server with a DeepPavlov model for the first time, it is useful to cover some features of the library's architecture.

Any model in DP consists of:

  • Python code;
  • Downloadable components: the serialized results of training on specific data (embeddings, neural network weights, etc.);
  • A configuration file (hereafter, the config), which contains information about the classes used by the model, the URLs of the downloadable components, Python dependencies, and so on.

We will say more about what is under the hood of DeepPavlov in later articles; for now it is enough to know that:

  • Any model in DeepPavlov is identified by the name of its config;
  • To run a model, you need to download its components from the DeepPavlov servers;
  • To run a model, you also need to install the Python libraries it uses.

The first model we will launch is multilingual Named Entity Recognition (NER). The model classifies the words of a text by the type of named entity they belong to (proper names, place names, currency names, and others). The config name of the current NER version is:

ner_ontonotes_bert_mult

We launch the REST server with the model:

  1. Install the model dependencies listed in its config into the active virtual environment:
    python -m deeppavlov install ner_ontonotes_bert_mult
    
  2. Download the serialized model components from the DeepPavlov servers:
    python -m deeppavlov download ner_ontonotes_bert_mult
    

    The serialized components are downloaded to the DeepPavlov home directory, which by default is located at

    ~/.deeppavlov

    During download, the hashes of components that are already present locally are checked against the hashes of the components on the server. If they match, the download is skipped and the existing files are used. Downloaded components typically range from 0.5 to 8 GB in size, and in some cases reach 20 GB after unpacking.

  3. Launch the REST server with the model:
    python -m deeppavlov riseapi ner_ontonotes_bert_mult -p 5005
    

As a result of running this command, a REST server with the model is launched on port 5005 of the host machine (the default port is 5000).

Once the model has been initialized, Swagger with the API documentation and the ability to test it is available at http://127.0.0.1:5005. Let's test the model by sending a POST request with the following JSON content to the endpoint http://127.0.0.1:5005/model:

{
  "x": [
    "В МФТИ можно добраться на электричке с Савёловского Вокзала.",
    "В юго-западной Руси стог жита оценен в 15 гривен"
  ]
}

In response we should receive the following JSON:

[
  [
    ["В", "МФТИ", "можно", "добраться", "на", "электричке", "с", "Савёловского", "Вокзала", "."],
    ["O", "B-FAC", "O", "O", "O", "O", "O", "B-FAC", "I-FAC", "O"]
  ],
  [
    ["В", "юго", "-", "западной", "Руси", "стог", "жита", "оценен", "в", "15", "гривен"],
    ["O", "B-LOC", "I-LOC", "I-LOC", "I-LOC", "O", "O", "O", "O", "B-MONEY", "I-MONEY"]
  ]
]
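The request above can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_request` and `query_model` are illustrative and not part of DeepPavlov, and the URL assumes the server was started on port 5005 as shown earlier.

```python
import json
import urllib.request

# Endpoint from the article; adjust the port if the server was started differently.
MODEL_URL = "http://127.0.0.1:5005/model"


def build_request(texts, url=MODEL_URL):
    """Build a POST request whose JSON body maps the argument name "x" to a batch of texts."""
    body = json.dumps({"x": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_model(texts, url=MODEL_URL, timeout=60):
    """Send the batch to a running server and decode the JSON reply."""
    with urllib.request.urlopen(build_request(texts, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `query_model(["В МФТИ можно добраться на электричке с Савёловского Вокзала."])` should return a list with one element per input string.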

Using these examples, we will take a closer look at the DeepPavlov REST API.

DeepPavlov API

Each DeepPavlov model has at least one input argument. In the REST API the arguments are named, and their names are the keys of the incoming dictionary. In most cases, the argument is the text to be processed. More information about the arguments and return values of the models can be found in the MODELS section of the DeepPavlov documentation.

In the example, a list of two strings was passed to the argument x, and each string received its own markup. In DeepPavlov, all models take as input a list (batch) of values that are processed independently.

The term "batch" comes from machine learning and refers to a set of independent input values that an algorithm or neural network processes simultaneously. This approach reduces (often significantly) the time the model spends on a single element of the batch compared with passing the same value to the input on its own. However, the result is returned only after all elements have been processed. Therefore, when forming an incoming batch, you need to take into account the speed of the model and the required processing time for each individual element.
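One way to balance throughput against latency is to split a long list of inputs into batches of bounded size and send them one at a time. A minimal sketch follows; the function name `make_batches` and the batch size are illustrative assumptions, not DeepPavlov settings.

```python
def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


texts = ["text one", "text two", "text three", "text four", "text five"]
batches = make_batches(texts, batch_size=2)
```

A smaller batch size returns the first results sooner, while a larger one usually lowers the average cost per element.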

If a DeepPavlov model has several arguments, each of them receives its own batch of values, and the model always produces a single batch of answers at the output. The elements of the output batch are the results of processing the elements of the input batches with the same index.
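The index-wise correspondence between input batches and the output batch can be sketched as follows. Both helpers, `build_payload` and `align`, are hypothetical names introduced for illustration; the argument names in the usage line are taken from the question answering example in this article.

```python
def build_payload(**batches):
    """Map argument names to equally sized batches, as the REST API expects."""
    sizes = {len(values) for values in batches.values()}
    if len(sizes) > 1:
        raise ValueError(f"all batches must have the same length, got {sorted(sizes)}")
    return {name: list(values) for name, values in batches.items()}


def align(payload, outputs):
    """Pair every output element with the input elements at the same index."""
    rows = zip(*payload.values())
    return [(dict(zip(payload, row)), out) for row, out in zip(rows, outputs)]


payload = build_payload(
    context_raw=["context 1", "context 2"],
    question_raw=["question 1", "question 2"],
)
```

Checking the batch lengths before sending the request catches mismatched inputs on the client side rather than relying on the server to reject them.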

In the example above, the model split each string into tokens (words and punctuation marks) and classified each token according to the named entity (organization name, currency) it represents. The ner_ontonotes_bert_mult model can currently recognize 18 types of named entities; a detailed description can be found here.
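The tags in the response follow the B-/I-/O pattern: B- starts an entity, I- continues it, and O marks tokens outside any entity. A minimal sketch of grouping tokens into entities from these tags is shown below; `extract_entities` is an illustrative helper, not a DeepPavlov function, and the sample data is the second sentence of the response above.

```python
def extract_entities(tokens, tags):
    """Group tokens into (entity_type, [tokens]) pairs based on B-/I-/O tags."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new or tag == "O":
            # Close the entity collected so far, if any.
            if current_tokens:
                entities.append((current_type, current_tokens))
            current_type, current_tokens = None, []
        if tag != "O":
            current_type = tag[2:]
            current_tokens.append(token)
    if current_tokens:
        entities.append((current_type, current_tokens))
    return entities


tokens = ["В", "юго", "-", "западной", "Руси", "стог", "жита", "оценен", "в", "15", "гривен"]
tags = ["O", "B-LOC", "I-LOC", "I-LOC", "I-LOC", "O", "O", "O", "O", "B-MONEY", "I-MONEY"]
entities = extract_entities(tokens, tags)
```

For this sentence the result contains one LOC entity spanning four tokens and one MONEY entity spanning two.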

Other out-of-the-box DeepPavlov models

In addition to NER, the following out-of-the-box models are available in DeepPavlov at the time of writing:

Text Question Answering

Answers a question about a text with a fragment of that text. Model config: squad_ru_bert_infer

Example request:

{
  "context_raw": [
    "DeepPavlov разрабатывается лабораторией МФТИ.",
    "В юго-западной Руси стог жита оценен в 15 гривен."
  ],
  "question_raw": [
    "Кем разрабатывается DeepPavlov?",
    "Сколько стоил стог жита на Руси?"
  ]
}

Result:

[
  ["лабораторией МФТИ", 27, 31042.484375],
  ["15 гривен", 39, 1049.598876953125]
]
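Each result holds the answer text, a number, and a floating-point value. Judging from the sample values, the number appears to be the character offset at which the answer starts in the context, and the last value appears to be a model score; both readings are inferences from this example rather than documented facts. The offset reading can be checked with a short sketch (the helper name `answer_matches_offset` is illustrative):

```python
contexts = [
    "DeepPavlov разрабатывается лабораторией МФТИ.",
    "В юго-западной Руси стог жита оценен в 15 гривен.",
]
results = [
    ["лабораторией МФТИ", 27, 31042.484375],
    ["15 гривен", 39, 1049.598876953125],
]


def answer_matches_offset(context, answer, start):
    """Return True if the answer text occurs in the context exactly at the given offset."""
    return context[start:start + len(answer)] == answer


checks = [
    answer_matches_offset(context, answer, start)
    for context, (answer, start, _score) in zip(contexts, results)
]
```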

Insult Detection

Detects whether a text contains an insult directed at the person it is addressed to (at the time of writing, English only). Model config: insults_kaggle_conv_bert

Example request:


{
  "x": [
    "Money talks, bullshit walks.",
    "You are not the brightest one."
  ]
}

Result:

[
  ["Not Insult"],
  ["Insult"]
]

Sentiment Analysis

Classifies the sentiment of a text (positive, neutral, negative). Model config: rusentiment_elmo_twitter_cnn

Example request:

{
  "x": [
    "Мне нравится библиотека DeepPavlov.",
    "Я слышал о библиотеке DeepPavlov.",
    "Меня бесят тролли и анонимусы."
  ]
}

Result:

[
  ["positive"],
  ["neutral"],
  ["negative"]
]

Paraphrase Detection

Determines whether two different texts have the same meaning. Model config: stand_paraphraser_ru

Request:

{
  "text_a": [
    "Город погружается в сон, просыпается Мафия.",
    "Президент США пригрозил расторжением договора с Германией."
  ],
  "text_b": [
    "Наступает ночь, все жители города пошли спать, а преступники проснулись.",
    "Германия не собирается поддаваться угрозам со стороны США."
  ]
}

Result:

[
  [1],
  [0]
]

The current list of all out-of-the-box DeepPavlov models can always be found here.

Conclusion

In this article, we got acquainted with the DeepPavlov API and some of the text processing capabilities the library provides out of the box. Keep in mind that for any NLP task, the best result is achieved by training the model on a dataset that matches the subject area (domain) of the task. Moreover, many models simply cannot be trained to cover every situation.

In the following articles we will look at additional library settings, launch DeepPavlov from Docker, and then move on to training models. And don't forget that DeepPavlov has a forum where you can ask your questions about the library and the models. Thank you for your attention!

Source: www.habr.com
