DeepPavlov for Developers: #1 NLP Tools and Chatbot Creation

Hello everyone! We are starting a series of articles on solving practical problems in natural language processing (NLP) and building conversational agents (chatbots) with DeepPavlov, the open-source library developed by our team at the MIPT Laboratory of Neural Systems and Deep Learning. The main goal of the series is to introduce DeepPavlov to a wide range of developers and to show how you can solve applied NLP problems without deep knowledge of Machine Learning or a PhD in Mathematics.

NLP tasks include determining the sentiment of a text, recognizing named entities, working out what the person talking to your bot wants (to order a pizza or to get background information), and much more. You can read more about NLP tasks and methods here.

In this article, we will show you how to run a REST server with pre-trained NLP models that are ready to use without any additional configuration or training.


Installing DeepPavlov

Instructions for Linux are given here and below. For Windows, see our documentation.

  • Create and activate a virtual environment with the currently supported version of Python:
    virtualenv env -p python3.7
    source env/bin/activate
  • Install DeepPavlov into the virtual environment:
    pip install deeppavlov
    

Launching a REST server with a DeepPavlov model

Before we start a server with a DeepPavlov model for the first time, it is worth describing a few features of the library's architecture.

Any model in DP consists of:

  • Python code;
  • Downloadable components: serialized results of training on specific data (embeddings, neural network weights, etc.);
  • A configuration file (hereinafter the config), which contains information about the classes used by the model, the URLs of the downloadable components, the Python dependencies, and more.

We will tell you more about what is under the hood of DeepPavlov in the following articles. For now, it is enough to know that:

  • Any model in DeepPavlov is identified by the name of its config;
  • To run a model, you need to download its components from the DeepPavlov servers;
  • To run a model, you also need to install the Python libraries it uses.
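To make these points concrete, here is a minimal Python sketch that reads a config and lists what a model needs before it can run. It assumes the layout of DeepPavlov 0.x configs, where a `chainer` section declares the model's inputs (`in`) and outputs (`out`) and a `metadata` section lists `requirements` and `download` entries. The config contents, the download URL and the `summarize_config` helper are hypothetical placeholders, not a real DeepPavlov config or API.

```python
import json

# Hypothetical, heavily trimmed config for illustration only. Real DeepPavlov
# configs contain more sections (dataset reader, pipeline components, training).
EXAMPLE_CONFIG = json.loads("""
{
  "chainer": {"in": ["x"], "pipe": [], "out": ["x_tokens", "y_pred"]},
  "metadata": {
    "requirements": ["{DEEPPAVLOV_PATH}/requirements/tf.txt"],
    "download": [
      {"url": "http://files.example.org/ner_model.tar.gz", "subdir": "{MODELS_PATH}"}
    ]
  }
}
""")


def summarize_config(config: dict) -> dict:
    """Collect a model's inputs, outputs, Python requirements and download URLs."""
    chainer = config.get("chainer", {})
    metadata = config.get("metadata", {})
    downloads = metadata.get("download", [])
    return {
        "inputs": chainer.get("in", []),
        "outputs": chainer.get("out", []),
        "requirements": metadata.get("requirements", []),
        # A download entry may be a plain URL string or a dict with a "url" key.
        "download_urls": [d["url"] if isinstance(d, dict) else d for d in downloads],
    }


print(summarize_config(EXAMPLE_CONFIG))
```

The `install` and `download` commands in the next steps work from exactly this kind of information, which is why a config name alone is enough to identify and prepare a model.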

The first model we will launch is multilingual Named Entity Recognition (NER). The model classifies the words of a text by the type of named entity they belong to (proper names, geographical names, currency names, and others). The config name of the most recent version of NER is:

ner_ontonotes_bert_mult

Launching the REST server with the model:

  1. Install the model's dependencies, listed in its config, into the active virtual environment:
    python -m deeppavlov install ner_ontonotes_bert_mult
    
  2. Download the model's serialized components from the DeepPavlov servers:
    python -m deeppavlov download ner_ontonotes_bert_mult
    

    The serialized components are downloaded to the DeepPavlov home directory, which by default is

    ~/.deeppavlov

    During the download, the hashes of components that have already been downloaded are compared with the hashes of the components on the server. If they match, the download is skipped and the existing files are used. Downloaded components typically range from 0.5 to 8 GB in size, and in some cases reach 20 GB after unpacking.

  3. Launch the REST server with the model:
    python -m deeppavlov riseapi ner_ontonotes_bert_mult -p 5005
    

Running this command launches a REST server with the model on port 5005 of the host machine (the default port is 5000).
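The hash comparison described in step 2 can be illustrated with the standard library. This is a simplified stand-in, not DeepPavlov's own download code: it assumes the server publishes an MD5 digest for each component file, and `md5_of_file`, `needs_download` and `demo` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest without loading it fully into memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path: Path, remote_md5: str) -> bool:
    """True if the file is missing locally or its hash differs from the server's."""
    if not local_path.is_file():
        return True
    return md5_of_file(local_path) != remote_md5


def demo() -> tuple:
    """Exercise needs_download on a temporary file instead of a real component."""
    with tempfile.TemporaryDirectory() as tmp:
        component = Path(tmp) / "component.bin"
        component.write_bytes(b"model weights")
        same_hash = hashlib.md5(b"model weights").hexdigest()
        return (
            needs_download(component, same_hash),  # hashes match: skip
            needs_download(component, "0" * 32),  # hashes differ: download
            needs_download(Path(tmp) / "missing.bin", same_hash),  # no file: download
        )


print(demo())  # (False, True, True)
```

Reading in chunks matters here because components can be several gigabytes; hashing them in one `read()` call would load the whole file into memory.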

Once the model has started, Swagger with the API documentation and a way to test the API is available at http://127.0.0.1:5005. Let's test the model by sending a POST request with the following JSON body to the endpoint http://127.0.0.1:5005/model:

{
  "x": [
    "В МФТИ можно добраться на электричке с Савёловского Вокзала.",
    "В юго-западной Руси стог жита оценен в 15 гривен"
  ]
}
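The same request can be sent from Python using only the standard library. This is a minimal sketch assuming the server was launched with `riseapi` on port 5005 as shown earlier; `build_request` and `query_model` are illustrative helper names, and `query_model` only works while the server is running.

```python
import json
import urllib.request

# Endpoint of the server started with `riseapi ... -p 5005`.
MODEL_URL = "http://127.0.0.1:5005/model"

PAYLOAD = {
    "x": [
        "В МФТИ можно добраться на электричке с Савёловского Вокзала.",
        "В юго-западной Руси стог жита оценен в 15 гривен",
    ]
}


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST request whose body is the payload encoded as UTF-8 JSON."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_model(url: str, payload: dict, timeout: float = 60.0):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `query_model(MODEL_URL, PAYLOAD)` should return a nested list of tokens and tags. `ensure_ascii=False` keeps the Cyrillic text readable in the request body; escaped `\uXXXX` sequences would decode to the same JSON.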

In response, we should get the following JSON:

[
  [
    ["В", "МФТИ", "можно", "добраться", "на", "электричке", "с", "Савёловского", "Вокзала", "."],
    ["O", "B-FAC", "O", "O", "O", "O", "O", "B-FAC", "I-FAC", "O"]
  ],
  [
    ["В", "юго", "-", "западной", "Руси", "стог", "жита", "оценен", "в", "15", "гривен"],
    ["O", "B-LOC", "I-LOC", "I-LOC", "I-LOC", "O", "O", "O", "O", "B-MONEY", "I-MONEY"]
  ]
]

Let's use these examples to take a closer look at the DeepPavlov REST API.

DeepPavlov API

Each DeepPavlov model has one or more input arguments. In the REST API the arguments are named, and their names are the keys of the incoming dictionary. In most cases, the argument is the text to be processed. More information about the arguments and the values returned by each model can be found in the MODELS section of the DeepPavlov documentation.

In the example, a list of two strings was passed to the argument x, and each string received its own markup. In DeepPavlov, every model takes as input a list (batch) of values that are processed independently.

The term "batch" comes from machine learning and refers to a set of independent input values that an algorithm or neural network processes simultaneously. Batching reduces (often substantially) the model's processing time per element compared with passing the same values to the model one at a time. However, the result is returned only after every element of the batch has been processed. So when you form an input batch, take into account the model's speed and the processing time you need for each element.
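When there are many texts to process, one simple way to act on this trade-off is to split them into fixed-size batches and send one request per batch. The `batches` helper below is a generic sketch, not part of DeepPavlov; the batch size is an assumption you would tune by measuring how long your model takes per batch.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = [f"text {i}" for i in range(5)]
for batch in batches(texts, batch_size=2):
    # Each chunk would become the value of "x" in one request.
    print({"x": batch})
```

Smaller batches return sooner but use the model less efficiently; larger batches do the opposite.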

If a DeepPavlov model has several arguments, each of them receives its own batch of values, and the model always returns a single batch of responses. Each element of the output batch is the result of processing the elements of the input batches that have the same index.
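The index alignment between the input batches and the output batch can be sketched as follows. `make_payload` and `pair_inputs_with_outputs` are hypothetical helpers: the first checks that every named argument carries a batch of the same length, and the second matches the i-th element of each input batch with the i-th output. The sample data uses the argument names of the question answering model described later in this article.

```python
from typing import Any, Dict, List, Tuple


def make_payload(**arguments: List[Any]) -> Dict[str, List[Any]]:
    """Build a REST payload in which every named argument carries its own batch.

    All batches must have the same length, because element i of each batch
    belongs to the same input sample.
    """
    lengths = {name: len(values) for name, values in arguments.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"input batches differ in length: {lengths}")
    return dict(arguments)


def pair_inputs_with_outputs(payload: Dict[str, List[Any]],
                             outputs: List[Any]) -> List[Tuple[Dict[str, Any], Any]]:
    """Match the i-th element of every input batch with the i-th output."""
    sizes = {len(values) for values in payload.values()}
    if sizes and sizes != {len(outputs)}:
        raise ValueError("output batch size does not match input batch size")
    return [
        ({name: values[i] for name, values in payload.items()}, output)
        for i, output in enumerate(outputs)
    ]


payload = make_payload(
    context_raw=["DeepPavlov разрабатывается лабораторией МФТИ."],
    question_raw=["Кем разрабатывается DeepPavlov?"],
)
pairs = pair_inputs_with_outputs(payload, [["лабораторией МФТИ", 27, 31042.484375]])
print(pairs)
```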

In the example above, the model split each string into tokens (words and punctuation marks) and labeled each token with the type of named entity it belongs to (here FAC for facilities, LOC for locations, MONEY for monetary amounts). The ner_ontonotes_bert_mult model can currently recognize 18 types of named entities; a detailed description can be found here.
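The tags in the response follow the BIO scheme: `B-` marks the first token of an entity, `I-` a continuation of it, and `O` a token outside any entity. Here is a short sketch that turns the token and tag lists from the response into entity spans. `bio_to_entities` is a hypothetical helper, not part of the DeepPavlov API, and it re-joins tokens with single spaces, so the original spacing (for example around the hyphen in "юго-западной") is not preserved.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    span: List[str] = []
    span_type: Optional[str] = None

    def flush() -> None:
        nonlocal span, span_type
        if span_type is not None:
            entities.append((" ".join(span), span_type))
        span, span_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == span_type:
            span.append(token)  # continue the current entity
            continue
        flush()  # any other tag ends the current entity
        if tag.startswith(("B-", "I-")):
            # "B-X" opens an entity; a stray "I-X" is treated the same way.
            span, span_type = [token], tag[2:]
    flush()
    return entities


RESPONSE = [
    [
        ["В", "МФТИ", "можно", "добраться", "на", "электричке", "с", "Савёловского", "Вокзала", "."],
        ["O", "B-FAC", "O", "O", "O", "O", "O", "B-FAC", "I-FAC", "O"],
    ],
    [
        ["В", "юго", "-", "западной", "Руси", "стог", "жита", "оценен", "в", "15", "гривен"],
        ["O", "B-LOC", "I-LOC", "I-LOC", "I-LOC", "O", "O", "O", "O", "B-MONEY", "I-MONEY"],
    ],
]

for tokens, tags in RESPONSE:
    print(bio_to_entities(tokens, tags))
```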

Other out-of-the-box DeepPavlov models

Besides NER, the following out-of-the-box models are available in DeepPavlov at the time of writing:

Context Question Answering

The answer to a question about a text is a fragment of that text. Model config: squad_en_bert_infer

Request example:

{
  "context_raw": [
    "DeepPavlov разрабатывается лабораторией МФТИ.",
    "В юго-западной Руси стог жита оценен в 15 гривен."
  ],
  "question_raw": [
    "Кем разрабатывается DeepPavlov?",
    "Сколько стоил стог жита на Руси?"
  ]
}

Result:

[
  ["лабораторией МФТИ", 27, 31042.484375],
  ["15 гривен", 39, 1049.598876953125]
]
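Judging by this example, each answer appears to be returned as `[answer, start, score]`, where `start` is the character offset of the answer in its context and `score` is an unnormalized model score rather than a probability (it is far greater than 1). Under that assumption, a quick sanity check is to slice the context at the returned offset; `answers_match_context` is a hypothetical helper.

```python
from typing import List, Sequence


def answers_match_context(contexts: Sequence[str], answers: Sequence[list]) -> List[bool]:
    """Check that each [answer, start, score] triple points into its own context.

    Assumes `start` is a character offset into the corresponding context.
    """
    checks = []
    for context, (answer, start, _score) in zip(contexts, answers):
        checks.append(context[start:start + len(answer)] == answer)
    return checks


contexts = [
    "DeepPavlov разрабатывается лабораторией МФТИ.",
    "В юго-западной Руси стог жита оценен в 15 гривен.",
]
answers = [
    ["лабораторией МФТИ", 27, 31042.484375],
    ["15 гривен", 39, 1049.598876953125],
]
print(answers_match_context(contexts, answers))  # [True, True]
```

Python string indices count code points, so the Cyrillic characters here each occupy one position, which matches the offsets in the response.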

Insult Detection

Detecting whether a text contains an insult directed at the person it is addressed to (at the time of writing, English only). Model config: insults_kaggle_conv_bert

Request example:


{
  "x": [
    "Money talks, bullshit walks.",
    "You are not the brightest one."
  ]
}

Result:

[
  ["Not Insult"],
  ["Insult"]
]
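Each prediction is a one-element list, aligned by index with the input texts. A small sketch for picking out the flagged texts, assuming exactly that response shape (`flagged_texts` is a hypothetical helper name):

```python
from typing import List, Sequence


def flagged_texts(texts: Sequence[str], predictions: Sequence[List[str]],
                  flag: str = "Insult") -> List[str]:
    """Return the input texts whose single predicted label equals `flag`."""
    if len(texts) != len(predictions):
        raise ValueError("expected exactly one prediction per input text")
    # `(label,)` unpacks the one-element list and fails loudly on any other shape.
    return [text for text, (label,) in zip(texts, predictions) if label == flag]


texts = ["Money talks, bullshit walks.", "You are not the brightest one."]
predictions = [["Not Insult"], ["Insult"]]
print(flagged_texts(texts, predictions))  # ['You are not the brightest one.']
```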

Sentiment Analysis

Classifying the sentiment of a text (positive, neutral, negative). Model config: rusentiment_elmo_twitter_cnn

Request example:

{
  "x": [
    "Мне нравится библиотека DeepPavlov.",
    "Я слышал о библиотеке DeepPavlov.",
    "Меня бесят тролли и анонимусы."
  ]
}

Result:

[
  ["positive"],
  ["neutral"],
  ["negative"]
]
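Because the labels come back as one-element lists in input order, aggregating them is straightforward. For instance, here is a sketch that counts how many texts received each sentiment, assuming the response shape shown above (`sentiment_distribution` is a hypothetical helper):

```python
from collections import Counter
from typing import List, Sequence


def sentiment_distribution(predictions: Sequence[List[str]]) -> Counter:
    """Count how many texts received each sentiment label."""
    return Counter(label for (label,) in predictions)


predictions = [["positive"], ["neutral"], ["negative"]]
print(sentiment_distribution(predictions))
```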

Paraphrase Detection

Determining whether two different texts have the same meaning. Model config: stand_paraphraser_en

Request example:

{
  "text_a": [
    "Город погружается в сон, просыпается Мафия.",
    "Президент США пригрозил расторжением договора с Германией."
  ],
  "text_b": [
    "Наступает ночь, все жители города пошли спать, а преступники проснулись.",
    "Германия не собирается поддаваться угрозам со стороны США."
  ]
}

Result:

[
  [1],
  [0]
]
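Judging by this example, the model returns `1` for a paraphrase and `0` otherwise, again one value per pair. A sketch that attaches these flags to the original text pairs, with `paraphrase_pairs` as a hypothetical helper:

```python
from typing import List, Sequence, Tuple


def paraphrase_pairs(text_a: Sequence[str], text_b: Sequence[str],
                     predictions: Sequence[List[int]]) -> List[Tuple[str, str, bool]]:
    """Attach an 'is paraphrase' flag to every (text_a, text_b) pair."""
    if not len(text_a) == len(text_b) == len(predictions):
        raise ValueError("text_a, text_b and predictions must be index-aligned")
    return [(a, b, bool(flag)) for a, b, (flag,) in zip(text_a, text_b, predictions)]


text_a = [
    "Город погружается в сон, просыпается Мафия.",
    "Президент США пригрозил расторжением договора с Германией.",
]
text_b = [
    "Наступает ночь, все жители города пошли спать, а преступники проснулись.",
    "Германия не собирается поддаваться угрозам со стороны США.",
]
for a, b, same in paraphrase_pairs(text_a, text_b, [[1], [0]]):
    print(same, "|", a, "|", b)
```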

An up-to-date list of all out-of-the-box DeepPavlov models is always available here.

Conclusion

In this article, we got acquainted with the DeepPavlov API and some of the text-processing capabilities the library provides out of the box. Keep in mind, however, that for any NLP task the best results come from a model trained on a dataset that matches the subject area (domain) of the task. Moreover, in principle, no model can be trained to cover every possible case.

In the following articles, we will look at additional library settings, launch DeepPavlov from Docker, and then move on to training models. And don't forget that DeepPavlov has a forum: ask your questions about the library and its models there. Thank you for your attention!

Source: www.habr.com
