Microsoft open-sources the vector search library used in Bing

Microsoft has published the source code of SPTAG (Space Partition Tree And Graph), a machine learning library that implements an approximate nearest neighbor search algorithm. The library was developed by the Microsoft Research division and the Microsoft Search Technology Center. In practice, the Bing search engine uses SPTAG to determine the most relevant results based on the context of search queries. The code is written in C++ and distributed under the MIT license. Builds for Linux and Windows are supported. Bindings for Python are available.

Although the idea of using vector stores in search engines has been around for a long time, in practice their adoption has been held back by the high resource cost of vector operations and by scalability limits. Combining deep learning methods with approximate nearest neighbor search algorithms has brought the performance and scalability of vector systems to a level acceptable for large search engines. For example, in Bing, for a vector index of more than 150 billion vectors, retrieving the most relevant results takes no more than 8 ms.

The library includes tools for building an index and running vector searches, as well as a set of tools for maintaining a distributed online search system that covers very large collections of vectors. The following modules are provided: an index builder for indexing, a searcher for searching an index distributed across a cluster of several nodes, a server for running handlers on the nodes, an Aggregator for combining several servers into a single whole, and a client for sending queries. Adding new vectors to the index and deleting vectors on the fly are both supported.
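The role of an aggregator in such a setup can be illustrated with a minimal sketch. This is not SPTAG's code or API: the function name `merge_top_k` and the `(distance, vector_id)` hit format are assumptions made for illustration. It assumes each server returns its own hits already sorted by ascending distance, and merges them into one global top-k list.

```python
import heapq
import itertools


def merge_top_k(shard_results, k):
    """Merge per-server hit lists into one global top-k list.

    Each element of shard_results is a list of (distance, vector_id)
    tuples sorted by ascending distance (hypothetical format, assumed
    for this sketch).
    """
    # heapq.merge lazily interleaves already-sorted inputs in sorted order.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0])
    # Only the first k hits of the merged stream are needed.
    return list(itertools.islice(merged, k))


# Hypothetical results from three servers, each holding part of the index.
server_a = [(0.12, "a7"), (0.40, "a2"), (0.91, "a9")]
server_b = [(0.05, "b3"), (0.33, "b1")]
server_c = [(0.20, "c4")]

top3 = merge_top_k([server_a, server_b, server_c], k=3)
print(top3)  # [(0.05, 'b3'), (0.12, 'a7'), (0.2, 'c4')]
```

Because every input list is already sorted, the merge only has to look at the head of each list, so the cost grows with k and the number of servers rather than with the total number of hits.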

The library assumes that the data being processed and stored in the collection is represented as related vectors that can be compared using Euclidean (L2) or cosine distance. A search query returns the vectors whose distance to the query vector is minimal. SPTAG offers two ways of organizing the vector space: SPTAG-KDT (a k-dimensional tree (kd-tree) and a relative neighborhood graph) and SPTAG-BKT (a k-means tree and a relative neighborhood graph). The first method requires fewer resources when working with the index, while the second shows higher accuracy of search results for very large collections of vectors.
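The two distance measures, and what "returning the vectors at minimal distance" means, can be sketched with an exact brute-force search. This is a baseline for illustration only, not SPTAG's algorithm: SPTAG uses trees and a neighborhood graph to avoid scanning every vector. The function names and the toy collection are assumptions.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(collection, query, k, distance):
    """Return the ids of the k vectors closest to query (exact, full scan)."""
    scored = sorted((distance(vec, query), vec_id)
                    for vec_id, vec in collection.items())
    return [vec_id for _, vec_id in scored[:k]]


# Toy collection: "a" is close to the query in position,
# "c" points in almost the same direction but is much longer.
collection = {"a": [0.5, 0.5], "b": [0.0, 1.0], "c": [10.0, 0.5]}
query = [2.0, 0.0]

print(nearest(collection, query, 1, l2_distance))      # ['a']
print(nearest(collection, query, 1, cosine_distance))  # ['c']
```

The two calls return different neighbors because L2 distance is sensitive to vector length while cosine distance only compares direction, which is why the choice of metric depends on how the vectors were produced.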

Vector search is not limited to text: it can also be applied to multimedia data and images, as well as to systems that generate recommendations automatically. For example, one prototype based on the PyTorch framework implemented a vector system for searching by the similarity of objects in images. It was built using data from several reference collections of images of animals, cats and dogs, which were converted into sets of vectors. When an incoming image is received for a search, a machine learning model converts it into a vector; the SPTAG algorithm then selects the most similar vectors from the index, and the images associated with them are returned as the result.

source: budenet.ru
