Microsoft has opened the source code of the vector search library used in Bing

Microsoft has released the source code of SPTAG (Space Partition Tree And Graph), a machine learning library that implements an approximate nearest neighbor search algorithm. The library was developed by the Microsoft Research division and the Microsoft Search Technology Center. In practice, SPTAG is used by the Bing search engine to determine the most relevant results based on the context of search queries. The code is written in C++ and distributed under the MIT license. Building on Linux and Windows is supported, and Python bindings are available.

Although the idea of using vector stores in search engines has been around for a long time, in practice their adoption has been held back by the heavy resource cost of vector operations and by limited scalability. Combining deep machine learning methods with approximate nearest neighbor search algorithms has made it possible to bring the performance and scalability of vector systems up to a level acceptable for large search engines. For example, in Bing, with a vector index of more than 150 billion vectors, the time to fetch the most relevant results is within 150 ms.
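
To make that cost concrete, here is a minimal Python sketch of the exact (brute-force) search that approximate methods are meant to replace, together with recall@k, a common way to measure how much of the exact answer an approximate search recovers. The function names are illustrative assumptions, not part of SPTAG.

```python
import heapq
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query: Sequence[float], data: List[Sequence[float]], k: int) -> List[int]:
    """Return the ids of the k stored vectors closest to `query`.

    Every stored vector is scanned, so one query costs O(N * d) work;
    this linear cost is what approximate indexes try to avoid at scale.
    """
    return heapq.nsmallest(k, range(len(data)), key=lambda i: l2_distance(query, data[i]))


def recall_at_k(approx_ids: List[int], exact_ids: List[int]) -> float:
    """Fraction of the true k nearest neighbours that an approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
    truth = exact_knn([0.2, 0.1], data, k=2)
    print(truth)                       # [0, 1]
    print(recall_at_k([0, 3], truth))  # 0.5
```

An approximate index trades a little recall for far fewer distance computations; with billions of vectors, a full scan per query is not an option.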

The library includes tools for building an index and performing vector search, as well as a set of tools for maintaining a distributed online search system that covers very large collections of vectors. The following modules are provided: an index builder for indexing; a searcher that uses an index distributed across a cluster of several nodes; a server for running handlers on the nodes; an aggregator for combining several servers into one; and a client for sending queries. New vectors can be inserted into the index, and vectors can be deleted, on the fly.
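
The division of labour between the searchers and the aggregator can be pictured with a small in-process Python sketch. Everything here is hypothetical: `ShardNode` and `Aggregator` are illustrative names, not SPTAG classes; vectors are partitioned by id modulo the node count; and each node does brute-force search instead of querying a real index. The point is only the flow: fan the query out, collect each node's local top-k, and merge.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class ShardNode:
    """Hypothetical search node that owns one partition of the collection."""

    def __init__(self) -> None:
        self.vectors: Dict[int, Vector] = {}

    def insert(self, vid: int, vec: Vector) -> None:
        self.vectors[vid] = vec  # on-the-fly insertion

    def delete(self, vid: int) -> None:
        self.vectors.pop(vid, None)  # on-the-fly deletion

    def search(self, query: Vector, k: int) -> List[Tuple[float, int]]:
        """Local top-k as (distance, id) pairs; brute force stands in for a real index."""
        return heapq.nsmallest(
            k, ((math.dist(query, vec), vid) for vid, vec in self.vectors.items())
        )


class Aggregator:
    """Hypothetical front end: routes updates and merges per-node results."""

    def __init__(self, nodes: List[ShardNode]) -> None:
        self.nodes = nodes

    def _owner(self, vid: int) -> ShardNode:
        return self.nodes[vid % len(self.nodes)]  # partition by id

    def insert(self, vid: int, vec: Vector) -> None:
        self._owner(vid).insert(vid, vec)

    def delete(self, vid: int) -> None:
        self._owner(vid).delete(vid)

    def search(self, query: Vector, k: int) -> List[int]:
        # Fan out to every node, then keep the k closest of all partial answers.
        partial = [hit for node in self.nodes for hit in node.search(query, k)]
        return [vid for _, vid in heapq.nsmallest(k, partial)]
```

With two nodes holding the points (0, 0), (1, 1), (2, 2) and (10, 10) under ids 0 to 3, `search([0.9, 0.9], k=2)` returns `[1, 0]`; after `delete(1)` it returns `[0, 2]`. A real deployment would also have to deal with network calls and node failures, which this sketch ignores.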

The library assumes that the data in the collection is represented as vectors that can be compared using Euclidean (L2) or cosine distance. A search query returns the vectors with the smallest distance to the query vector. SPTAG offers two methods of organizing the vector space: SPTAG-KDT (a k-dimensional tree (kd-tree) combined with a relative neighborhood graph) and SPTAG-BKT (a k-means tree combined with a relative neighborhood graph). The first method requires fewer resources when working with the index, while the second delivers higher search accuracy on very large vector collections.
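
A minimal Python sketch of the two distance measures and of the graph half of the "tree and graph" idea follows. It makes deliberate simplifications: it builds a plain k-nearest-neighbour graph rather than SPTAG's relative neighborhood graph, and the caller supplies seed nodes where SPTAG would use a kd-tree or k-means tree to find good starting points. The search is a generic best-first (beam) traversal: expand the closest unexplored candidate, keep the `beam` best vectors seen so far, and stop once no candidate can improve them. All function names are illustrative.

```python
import heapq
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance."""
    return math.dist(a, b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def build_knn_graph(data: List[Vector], degree: int) -> Dict[int, List[int]]:
    """Link each vector to its `degree` nearest others (a plain kNN graph, not an RNG)."""
    graph: Dict[int, List[int]] = {}
    for i, v in enumerate(data):
        others = [j for j in range(len(data)) if j != i]
        graph[i] = heapq.nsmallest(degree, others, key=lambda j: l2_distance(v, data[j]))
    return graph


def graph_search(query: Vector, data: List[Vector], graph: Dict[int, List[int]],
                 seeds: List[int], k: int, beam: int = 8) -> List[int]:
    """Best-first search over the graph, starting from `seeds`; returns k ids."""
    visited = set(seeds)
    frontier = [(l2_distance(query, data[s]), s) for s in seeds]  # min-heap to expand
    heapq.heapify(frontier)
    best = [(-d, s) for d, s in frontier]  # max-heap (negated distances) of kept results
    heapq.heapify(best)
    while len(best) > beam:
        heapq.heappop(best)  # drop the farthest seeds
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= beam and d > -best[0][0]:
            break  # nearest candidate is already worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            nd = l2_distance(query, data[nb])
            if len(best) < beam or nd < -best[0][0]:
                heapq.heappush(frontier, (nd, nb))
                heapq.heappush(best, (-nd, nb))
                if len(best) > beam:
                    heapq.heappop(best)
    return [s for _, s in sorted((-neg, s) for neg, s in best)][:k]
```

For example, on a 10×10 grid of points `data = [(x, y) for x in range(10) for y in range(10)]` with a degree-4 graph, searching for (3.2, 4.1) from seeds `[0, 99, 50]` with `k=2` returns ids 34 and 44, the points (3, 4) and (4, 4). Good seeds shorten the walk considerably, which is the job the tree structures do in SPTAG.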

Vector search is not limited to text: it can also be applied to multimedia and images, as well as to automatic recommendation systems. For example, one prototype built on the PyTorch framework implemented a vector search system based on the similarity of objects in images. It was built from several reference collections of animal images (cats and dogs), which were converted into sets of vectors. When an image arrives as a search query, a machine learning model converts it into a vector; SPTAG is then used to select similar vectors from the index, and the associated images are returned as the result.
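
The query path described above (image, then vector, then nearest vectors, then associated images) can be sketched in a few lines of Python. The embedding is a deliberately crude, hypothetical stand-in for the PyTorch model (the mean pixel colour, normalised to unit length), and brute-force cosine similarity stands in for the SPTAG index; `embed` and `ImageIndex` are illustrative names only.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def embed(pixels: Sequence[Pixel]) -> List[float]:
    """Hypothetical stand-in for a neural network: mean RGB colour, L2-normalised."""
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    norm = math.hypot(*mean) or 1.0  # avoid dividing by zero for an all-black image
    return [m / norm for m in mean]


class ImageIndex:
    """Keeps each vector together with the image it was computed from."""

    def __init__(self) -> None:
        self.entries: List[Tuple[List[float], str]] = []

    def add(self, name: str, pixels: Sequence[Pixel]) -> None:
        self.entries.append((embed(pixels), name))

    def query(self, pixels: Sequence[Pixel], k: int = 1) -> List[str]:
        q = embed(pixels)
        # Vectors are unit length, so the dot product equals cosine similarity.
        ranked = sorted(self.entries, key=lambda e: -sum(a * b for a, b in zip(q, e[0])))
        return [name for _, name in ranked[:k]]
```

After adding a grey image, a golden one and a blue one, a query with golden-brown pixels such as `[(210, 160, 70)] * 4` returns the golden image. In the real system the embedding comes from a trained network and the nearest-vector lookup from the SPTAG index rather than a linear scan.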

Source: opennet.ru
