A Python library for generating high-quality sentence, text, and image embeddings. It simplifies the process of converting text into dense vector representations, which are fundamental for similarity search and storage in vector databases.
A Python library for creating sentence, text, and image embeddings, enabling the conversion of text into high-dimensional numerical vectors that capture semantic meaning. It is essential for tasks like semantic search and Retrieval-Augmented Generation (RAG), which often leverage vector databases.
spaCy is an industrial-strength NLP library in Python that provides advanced tools for generating word, sentence, and document embeddings. These embeddings are commonly stored and searched in vector databases for NLP and semantic search applications.
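As a minimal sketch (assuming the en_core_web_md pipeline, which ships with static word vectors, has been installed), document- and token-level vectors can be read directly from a processed Doc:

```python
import spacy

# Assumes the en_core_web_md model has been installed beforehand, e.g. via:
#   python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

doc = nlp("Vector databases store dense embeddings for similarity search.")

# Document-level embedding (the average of the token vectors in this pipeline)
print(doc.vector.shape)  # (300,)

# Token-level embeddings
for token in doc[:3]:
    print(token.text, token.vector[:5])
```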
A compact and efficient pre-trained sentence embedding model, widely used for generating vector representations of text. It's a popular choice for applications requiring fast and accurate semantic search, often integrated with vector databases.
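The entry above does not name a specific model; as an illustration, the sketch below assumes all-MiniLM-L6-v2 as a representative compact sentence-embedding model, loaded through the sentence-transformers API:

```python
from sentence_transformers import SentenceTransformer

# Assumption: all-MiniLM-L6-v2 stands in for the unnamed compact model;
# it produces 384-dimensional sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = model.encode([
    "How do I reset my password?",
    "Steps to recover account access",
])
print(embeddings.shape)  # (2, 384)
```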
A utility class from the Hugging Face Transformers library that automatically loads the correct tokenizer for a given pre-trained model. It ensures consistent text preprocessing and tokenization, an essential step before generating embeddings for vector database storage.
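A brief sketch of the typical pattern (bert-base-uncased is only an illustrative checkpoint):

```python
from transformers import AutoTokenizer

# AutoTokenizer resolves the matching tokenizer class for whichever
# checkpoint name it is given; the checkpoint here is illustrative.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer("Vector databases index dense embeddings.", truncation=True)

print(encoded["input_ids"])                                   # token ids, ready for the model
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))  # the corresponding subword tokens
```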
Gensim is a Python library for topic modeling and vector space modeling, offering algorithms such as Word2Vec, Doc2Vec, and FastText for generating vector embeddings from text data. These embeddings can be stored and efficiently searched in vector databases, making Gensim directly relevant to vector search use cases.
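A small, self-contained sketch of training and querying a Word2Vec model on a toy corpus (all hyperparameters shown are illustrative):

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a pre-tokenized list of words.
corpus = [
    ["vector", "databases", "store", "embeddings"],
    ["embeddings", "capture", "semantic", "meaning"],
    ["semantic", "search", "uses", "vector", "similarity"],
]

# Train a tiny Word2Vec model; vector_size, window, min_count and epochs are illustrative.
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

vector = model.wv["embeddings"]          # 50-dimensional word vector
print(vector.shape)
print(model.wv.most_similar("vector"))   # nearest neighbours in the toy vector space
```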
pymilvus is the official Python SDK for Milvus, allowing developers to interact programmatically with the Milvus vector database. It provides utilities for transforming unstructured data into vector embeddings and supports advanced features such as reranking for optimized search results. The pymilvus[model] variant includes utilities for generating vector embeddings from text using built-in models.
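A minimal sketch using Milvus Lite, the local file-backed mode that pymilvus can open from a plain file path; random vectors stand in for real embeddings here:

```python
import random
from pymilvus import MilvusClient

# Milvus Lite: a local, file-backed instance, convenient for experimentation.
client = MilvusClient("milvus_demo.db")

# Create a collection of 8-dimensional vectors (the dimension is illustrative).
client.create_collection(collection_name="demo_collection", dimension=8)

# Insert a few documents; random vectors stand in for real embeddings.
docs = ["first document", "second document", "third document"]
data = [
    {"id": i, "vector": [random.random() for _ in range(8)], "text": text}
    for i, text in enumerate(docs)
]
client.insert(collection_name="demo_collection", data=data)

# Search with a query vector; in real use the query text would be embedded first.
results = client.search(
    collection_name="demo_collection",
    data=[[random.random() for _ in range(8)]],
    limit=2,
    output_fields=["text"],
)
print(results)
```

In real use, the pymilvus[model] extra mentioned above provides built-in embedding functions that would replace the random vectors.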
Sentence Transformers (a.k.a. SBERT) is a Python module for accessing, using, and training state-of-the-art embedding and reranker models. It can be used to compute embeddings using Sentence Transformer models or to calculate similarity scores using Cross-Encoder (reranker) models. This unlocks a wide range of applications, including semantic search, semantic textual similarity, and paraphrase mining.
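A short sketch of both halves of that workflow, with illustrative model names; model.similarity() assumes sentence-transformers v3 or later:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder

# Bi-encoder: embed sentences independently and compare the resulting vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is illustrative
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(model.similarity(embeddings, embeddings))  # pairwise similarity matrix

# Cross-Encoder (reranker): score query/passage pairs jointly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative reranker
scores = reranker.predict([
    ("How is the weather?", "The weather is lovely today."),
    ("How is the weather?", "He drove to the stadium."),
])
print(scores)
```

The bi-encoder scales to large corpora because documents can be embedded once and indexed; the Cross-Encoder is slower but more accurate, so it is typically applied only to the top candidates returned by the vector search.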
Sentence Transformers also provides a set of utility functions, including:
- Embedding quantization: quantize_embeddings()
- Semantic search: semantic_search(), semantic_search_faiss(), semantic_search_usearch()
- Community detection: community_detection()
- Hard-negative mining: mine_hard_negatives()
- Embedding normalization: normalize_embeddings()
- Paraphrase mining: paraphrase_mining()
- Embedding truncation: truncate_embeddings()
- Model export: export_dynamic_quantized_onnx_model(), export_optimized_onnx_model(), export_static_quantized_openvino_model()
- Similarity scoring: cos_sim(), dot_score(), euclidean_sim(), manhattan_sim(), pairwise_cos_sim(), pairwise_dot_score(), pairwise_euclidean_sim(), pairwise_manhattan_sim()
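A minimal sketch showing semantic_search() and cos_sim() together (the model name and corpus are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is illustrative

corpus = [
    "Milvus is a vector database built for scalable similarity search.",
    "spaCy is an NLP library for Python.",
    "Gensim provides topic modeling and word embeddings.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("Which tool stores and searches vectors?", convert_to_tensor=True)

# semantic_search runs a cosine-similarity search of the query against the corpus.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))

# cos_sim computes the full similarity matrix between the two sets of embeddings.
print(util.cos_sim(query_embedding, corpus_embeddings))
```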