Microsoft open-sources the vector search library used in Bing

Microsoft has published the source code of SPTAG (Space Partition Tree and Graph), a machine learning library that implements an approximate nearest neighbor search algorithm. The library was developed by the Microsoft Research division and the Microsoft Search Technology Center. In practice, the Bing search engine uses SPTAG to determine the most relevant results based on the context of a search query. The code is written in C++ and distributed under the MIT license. Builds are supported for Linux and Windows. Bindings for Python are available.

Although the idea of using vector stores in search engines has been around for a long time, in practice their adoption has been held back by the high resource cost of vector operations and by limited scalability. Combining deep learning methods with approximate nearest neighbor search algorithms has made it possible to bring the performance and scalability of vector systems to a level acceptable for large search engines. For example, in Bing, for a vector index of more than 150 billion vectors, the most relevant results are retrieved within 8 ms.

The library includes tools for building an index and organizing vector search, as well as a set of tools for maintaining a distributed online search system that covers very large collections of vectors. The following modules are provided: an index builder for indexing, a searcher for searching with an index distributed across a cluster of several nodes, a server for running the handlers on the nodes, an Aggregator for combining several servers into a single whole, and a client for sending queries. Adding new vectors to the index and deleting vectors on the fly are both supported.
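To make the aggregation step concrete, here is a minimal sketch of how per-node results might be merged into one answer. It is not SPTAG's actual Aggregator or its wire protocol: the function name `merge_top_k`, the `(distance, vector_id)` tuple layout, and the sample shard data are all assumptions made for illustration. It assumes each node returns its local best matches already sorted by ascending distance.

```python
import heapq
import itertools


def merge_top_k(shard_results, k):
    """Merge per-shard result lists into a single global top-k list.

    Hypothetical helper: each shard list holds (distance, vector_id)
    tuples and is assumed to be sorted by ascending distance.
    """
    # heapq.merge lazily interleaves the already-sorted inputs,
    # so only the first k items are ever materialized.
    merged = heapq.merge(*shard_results)
    return list(itertools.islice(merged, k))


# Illustrative local results from three hypothetical nodes.
shard_a = [(0.12, "a7"), (0.40, "a2")]
shard_b = [(0.05, "b3"), (0.33, "b9")]
shard_c = [(0.20, "c1")]

top3 = merge_top_k([shard_a, shard_b, shard_c], k=3)
print(top3)
```

Because every node only has to report its own best candidates, the merging side stays cheap regardless of how many vectors each node holds.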

The library assumes that the data being processed and stored in the collection is represented as associated vectors that can be compared using Euclidean (L2) or cosine distance. A search query returns the vectors whose distance to the query vector is smallest. SPTAG provides two methods for organizing the vector space: SPTAG-KDT (a k-dimensional tree (kd-tree) plus a relative neighborhood graph) and SPTAG-BKT (a k-means tree plus a relative neighborhood graph). The first method requires fewer resources when working with the index, while the second delivers higher accuracy of search results on very large collections of vectors.
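The two distance measures and the "smallest distance wins" rule can be illustrated with an exact, brute-force search. This is deliberately not SPTAG's algorithm: it scans every vector, whereas the tree and graph structures described above exist to avoid that scan. The function names and sample vectors are assumptions chosen for the example.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k, distance=l2_distance):
    """Return the indices of the k vectors closest to the query (exact scan)."""
    ranked = sorted(range(len(vectors)), key=lambda i: distance(query, vectors[i]))
    return ranked[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 0.5]]
query = [0.9, 0.1]

by_l2 = nearest_neighbors(query, vectors, k=2)
by_cosine = nearest_neighbors(query, vectors, k=2, distance=cosine_distance)
print(by_l2, by_cosine)
```

With this data the two measures rank the vectors differently: L2 favors vectors that are close in position, while cosine distance ignores length and favors vectors pointing in nearly the same direction, so `[5.0, 0.5]` ranks first under cosine even though it is far from the query.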

Vector search is not limited to text: it can also be applied to multimedia content and images, as well as to systems that generate recommendations automatically. For example, one of the prototypes based on the PyTorch framework implements a vector system for searching by the similarity of objects in images. It was built from several reference collections containing images of animals, cats, and dogs, which were converted into sets of vectors. When an incoming image arrives for search, a machine learning model converts it into a vector; the SPTAG algorithm then selects the most similar vectors from the index, and the images associated with them are returned as the result.
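The flow described above (image to vector, vector to nearest entries in the index, entries back to images) can be sketched as follows. Everything here is a stand-in: `embed` is a hypothetical placeholder that averages pixel colors instead of running a trained model, `ImageIndex` does an exact scan rather than calling SPTAG, and the file names and pixel values are invented for the example.

```python
import math


def embed(pixels):
    """Hypothetical stand-in for a learned model: mean RGB scaled to unit length."""
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid dividing by zero
    return [x / norm for x in mean]


class ImageIndex:
    """Toy index that keeps image names alongside their vectors."""

    def __init__(self):
        self.names = []
        self.vectors = []

    def add(self, name, pixels):
        # Indexing step: convert the image to a vector and store both.
        self.names.append(name)
        self.vectors.append(embed(pixels))

    def search(self, pixels, k=1):
        # Query step: embed the incoming image, rank stored vectors by
        # L2 distance, and return the names of the closest images.
        q = embed(pixels)
        ranked = sorted(range(len(self.vectors)),
                        key=lambda i: math.dist(q, self.vectors[i]))
        return [self.names[i] for i in ranked[:k]]


index = ImageIndex()
index.add("red_ball.png", [(250, 10, 10), (240, 20, 15)])
index.add("green_leaf.png", [(20, 230, 30), (10, 200, 25)])
index.add("blue_sky.png", [(15, 30, 240), (30, 60, 250)])

matches = index.search([(200, 40, 30), (220, 25, 35)], k=2)
print(matches)
```

In a real system the placeholder embedding would be replaced by a trained model and the exact scan by an approximate index; the mapping from vector position back to the original image is the part that stays the same.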

Source: opennet.ru
