Browse all tags in our directory
FastText is an open-source library by Facebook for efficient learning of word representations and text classification. It generates high-dimensional vector embeddings used in vector databases for tasks like semantic search and document clustering.
Gensim is a Python library for topic modeling and vector space modeling, providing tools to generate high-dimensional vector embeddings from text data. These embeddings can be stored and efficiently searched in vector databases, making Gensim directly relevant to vector search use cases.
GloVe is a widely used method for generating word embeddings using co-occurrence statistics from text corpora. These embeddings are commonly used as input to vector databases for semantic search and other vector-based information retrieval tasks.
pymilvus is the official Python SDK for Milvus, allowing developers to interact programmatically with the Milvus vector database. It provides utilities for transforming unstructured data into vector embeddings and supports advanced features such as reranking for optimized search results. The pymilvus[model] variant includes utilities for generating vector embeddings from text using built-in models.
spaCy is an industrial-strength NLP library in Python that provides advanced tools for generating word, sentence, and document embeddings. These embeddings are commonly stored and searched in vector databases for NLP and semantic search applications.
Examples and resources for Weaviate, a popular open-source vector database optimized for storing and searching vector embeddings at scale.
Word2vec is a popular machine learning technique for generating vector embeddings based on the distributional properties of words in large corpora. It is directly relevant to vector databases as it produces the high-dimensional vector representations stored and indexed by these databases for vector search and similarity tasks.