GitHub launches an effort to apply machine learning to code search and analysis

GitHub has introduced the CodeSearchNet project, which provides machine learning models and the datasets needed to parse, classify, and analyze code in various programming languages. CodeSearchNet, by analogy with ImageNet, includes a large collection of code snippets with annotations that formalize what the code does. The components for training models and the examples of using CodeSearchNet are written in Python using the TensorFlow framework and are distributed under the MIT license.

CodeSearchNet was built using natural language text-parsing techniques, which let machine learning systems take into account not only syntactic features but also the meaning of the actions the code performs. At GitHub, the system is used in experiments on semantic code search with natural language queries (for example, the query "sort a list of strings" returns code that implements the corresponding algorithms).
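The retrieval step behind such a query can be illustrated with a small sketch. It assumes a toy token-count vector in place of a learned encoder, so it matches words rather than meaning; the mini-corpus and the names `embed`, `cosine`, and `search` are hypothetical and are not part of CodeSearchNet.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def embed(text):
    """Toy stand-in for a learned encoder: a sparse token-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mini-corpus of (description, code) pairs.
CORPUS = [
    ("Sort a list of strings alphabetically.",
     "def sort_strings(items):\n    return sorted(items)"),
    ("Read a text file and return its lines.",
     "def read_lines(path):\n    with open(path) as f:\n        return f.read().splitlines()"),
    ("Compute the sum of a list of numbers.",
     "def total(values):\n    return sum(values)"),
]


def search(query, corpus=CORPUS, top_k=1):
    """Rank snippets by similarity between the query and each description."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), code) for doc, code in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


best_score, best_code = search("sort a list of strings")[0]
print(best_code)
```

A real semantic search would replace `embed` with a trained model that maps queries and code into a shared vector space; the ranking by similarity stays the same.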

The proposed dataset includes more than 2 million code-comment pairs, prepared from the source code of existing open source libraries. The code covers the full source text of individual functions or methods, and the comment describes what the function does (detailed documentation is provided). At present, datasets have been prepared for Python, JavaScript, Ruby, Go, Java, and PHP. Examples are provided of using the proposed datasets to train various types of neural networks, including Neural-Bag-Of-Words, RNN, Self-Attention (BERT), and a 1D-CNN+Self-Attention Hybrid.
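What a code-comment pair looks like can be shown with a short sketch that pulls documented functions out of Python source using the standard `ast` module. This is an illustration only, not necessarily the pipeline the project uses; `SAMPLE_SOURCE` and `extract_pairs` are hypothetical names.

```python
import ast

# Hypothetical source text standing in for a file from an open source library.
SAMPLE_SOURCE = '''
def sort_strings(items):
    """Return the strings in alphabetical order."""
    return sorted(items)


def undocumented(x):
    return x
'''


def extract_pairs(source):
    """Collect (code, docstring) pairs for every documented function."""
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:  # functions without a docstring yield no pair
                pairs.append({
                    "name": node.name,
                    "docstring": doc,
                    # Full source text of the function (Python 3.8+).
                    "code": ast.get_source_segment(source, node),
                })
    return pairs


PAIRS = extract_pairs(SAMPLE_SOURCE)
for pair in PAIRS:
    print(pair["name"], "->", pair["docstring"])
```

Only `sort_strings` produces a pair here, because `undocumented` has no comment describing what it does.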

To advance natural language search methods, the CodeSearchNet Challenge set has also been prepared. It includes 99 typical queries with about 4 thousand expert annotations describing the most likely matches to code in the CodeSearchNet Corpus dataset, which covers about 6 million methods and functions (the set is about 20 GB in size). The CodeSearchNet Challenge can serve as a benchmark for evaluating the effectiveness of particular methods of natural language code search. An example code search engine has been prepared using the Kubeflow toolkit.
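How expert annotations can turn a ranked result list into a benchmark score is sketched below. The sketch assumes normalized discounted cumulative gain (NDCG), a common ranking metric; the text above does not say which metric the challenge uses, and the snippet identifiers and relevance grades are invented for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: later ranks contribute less."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(ranked_ids, annotations):
    """Score a ranking against graded relevance annotations (0 = irrelevant)."""
    gains = [annotations.get(snippet_id, 0) for snippet_id in ranked_ids]
    ideal = sorted(annotations.values(), reverse=True)[:len(ranked_ids)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical expert annotations for a single query.
ANNOTATIONS = {"snippet_a": 3, "snippet_b": 1, "snippet_c": 0}

perfect = ndcg(["snippet_a", "snippet_b", "snippet_c"], ANNOTATIONS)
worst = ndcg(["snippet_c", "snippet_b", "snippet_a"], ANNOTATIONS)
print(perfect, worst)
```

Averaging such a score over all queries gives a single number for comparing search methods, with a ranking that puts the most relevant snippets first scoring highest.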

Source: opennet.ru
