Microsoft
Although the idea of using vector storage in search engines has been around for a long time, its practical adoption has been held back by the high resource cost of vector operations and by limited scalability. Combining deep learning methods with approximate nearest neighbor search algorithms has brought the performance and scalability of vector systems to a level acceptable for large search engines. For example, in Bing, fetching the most relevant results from a vector index of more than 150 billion vectors takes 8 ms.
The library includes tools for building an index and searching it for vectors, as well as a set of tools for maintaining a distributed online search system that covers very large collections of vectors.
The library assumes that the data in the collection is represented as related vectors that can be compared based on
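As a minimal sketch of what comparing two vectors can look like, the following computes two widely used measures, Euclidean (L2) distance and cosine distance. The choice of these two measures and the function names `l2_distance` and `cosine_distance` are assumptions made for illustration; they are not taken from the library's code.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance, defined as 1 minus the cosine similarity.

    0 means the vectors point in the same direction, 1 means they are
    orthogonal, and 2 means they point in opposite directions.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

L2 distance is sensitive to vector length, while cosine distance depends only on direction, so the choice between them usually follows from how the vectors were produced.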
Vector search is not limited to text: it can also be applied to multimedia content and images, and to systems that generate recommendations automatically. For example, one prototype built on the PyTorch framework implements a vector system for searching by the similarity of objects in images. It was built using data from several reference collections of images of animals, cats and dogs, which were converted into sets of vectors. When an image arrives as a search query, a machine learning model converts it into a vector. The SPTAG algorithm then selects the most similar vectors from the index, and the images associated with them are returned as the result.
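The flow described above (convert images to vectors, index them, convert the query image, return the images behind the nearest vectors) can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype's code: `embed` is a hypothetical stand-in for a trained model that simply averages color channels, the file names and pixel values are made up, and `search` does an exact brute-force scan where SPTAG would use an approximate nearest neighbor index.

```python
import math


def embed(pixels):
    """Hypothetical stand-in for a trained model.

    Maps an image, given as a list of (r, g, b) tuples, to a
    3-dimensional vector holding the mean of each color channel.
    """
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def search(index, query_vector, k=1):
    """Exact nearest neighbor search by L2 distance.

    `index` is a list of (vector, image_name) pairs. Every entry is
    scanned, which is only practical for small collections.
    """
    ranked = sorted(index, key=lambda entry: math.dist(entry[0], query_vector))
    return [name for _, name in ranked[:k]]


# Made-up reference collection: image name -> pixel data.
collection = {
    "cat_01.jpg": [(200, 180, 160), (210, 190, 170)],
    "dog_01.jpg": [(90, 60, 30), (100, 70, 40)],
    "dog_02.jpg": [(80, 50, 20), (95, 65, 35)],
}

# Build step: convert every image to a vector and keep the association.
index = [(embed(pixels), name) for name, pixels in collection.items()]

# Query step: embed the incoming image, then look up its nearest neighbors.
query_image = [(92, 62, 32), (98, 68, 38)]
results = search(index, embed(query_image), k=2)
print(results)
```

The brute-force scan costs time proportional to the size of the collection for every query, which is the scalability problem that approximate nearest neighbor indexes are meant to avoid.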
Source: opennet.ru