



A 0.9B-parameter multimodal embedding model with multilingual support for 89 languages, 512×512 image resolution, and Matryoshka representations that allow output dimensions to be truncated from 1024 down to 64 while maintaining strong performance.
Jina-CLIP v2 is a state-of-the-art multimodal embedding model that combines text and image understanding in a single unified model. It represents a significant improvement over v1 with enhanced multilingual capabilities and higher resolution image processing.
The model combines two specialized encoders: a multilingual text encoder and a vision encoder, trained jointly so that both modalities are projected into a shared embedding space.
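Because both encoders map into the same space, cross-modal retrieval reduces to cosine similarity between a text embedding and candidate image embeddings. The sketch below illustrates only the similarity step, with random unit vectors standing in for real encoder outputs (loading the actual model is out of scope here):

```python
import numpy as np

rng = np.random.default_rng(42)

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    return v / np.linalg.norm(v)

# Stand-ins for encoder outputs: one query text, three candidate images.
text_emb = normalize(rng.normal(size=1024))
image_embs = [normalize(rng.normal(size=1024)) for _ in range(3)]

# Cosine similarity of the query against each image embedding.
sims = [float(text_emb @ img) for img in image_embs]
best = int(np.argmax(sims))
print(f"best match: image {best}, similarities: {[round(s, 3) for s in sims]}")
```

With real Jina-CLIP v2 embeddings the same ranking logic applies unchanged, which is what makes a single unified model convenient: text-to-text, text-to-image, and image-to-image search all use the one similarity function.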
Even an aggressive 75% dimensional reduction maintained over 99% of performance across text, image, and cross-modal tasks, and the model shows a 3% improvement over v1 in both text-image and text-text retrieval.
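Matryoshka representations make this reduction trivial on the client side: you keep only the leading dimensions of a full embedding and re-normalize, no re-encoding required. A minimal sketch (the helper name is illustrative, and a random unit vector stands in for a real 1024-d embedding):

```python
import numpy as np

def truncate_embedding(emb, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and re-normalize
    so cosine similarity remains meaningful at the smaller size."""
    head = np.asarray(emb, dtype=np.float64)[:dim]
    norm = np.linalg.norm(head)
    return head / norm if norm > 0 else head

# A stand-in for a full 1024-d embedding from the model.
rng = np.random.default_rng(0)
full = rng.normal(size=1024)
full /= np.linalg.norm(full)

# 256 dims is the 75% reduction mentioned above; 64 is the smallest size.
for dim in (1024, 256, 64):
    small = truncate_embedding(full, dim)
    print(dim, small.shape, round(float(np.linalg.norm(small)), 6))
```

Smaller vectors cut index storage and similarity-computation cost roughly in proportion to the dimension, which is why the near-lossless 75% reduction matters in practice.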
Available through Jina Embeddings API with commercial licensing. Also available on cloud marketplaces (AWS, Azure, GCP) with usage-based pricing.
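A request to the hosted API mixes text and image inputs in a single call. The sketch below only builds the request payload; the field names (`dimensions`, the `{"text": ...}` / `{"image": ...}` input shapes) follow the embeddings-style schema Jina documents but should be verified against the current API reference, and the image URL is a placeholder:

```python
import json

# Hypothetical payload for the Jina Embeddings API endpoint
# (api.jina.ai/v1/embeddings); verify field names against the docs.
payload = {
    "model": "jina-clip-v2",
    "dimensions": 256,  # Matryoshka: request a truncated size directly
    "input": [
        {"text": "A photo of a mountain lake at sunrise"},
        {"image": "https://example.com/lake.jpg"},  # placeholder URL
    ],
}

body = json.dumps(payload)
print(body)

# Actually sending it would look like this (needs an API key, not run here):
# requests.post("https://api.jina.ai/v1/embeddings",
#               headers={"Authorization": f"Bearer {API_KEY}"},
#               json=payload)
```

The same model identifier is what the cloud-marketplace deployments expose, so client code written against the hosted API ports over with only an endpoint and credential change.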