



State-of-the-art multimodal embedding model from BAAI supporting text-to-image, image-to-text, and compositional visual search. Trained on the MegaPairs dataset with over 26 million retrieval triplets.
BGE-VL is a state-of-the-art multimodal embedding model that supports a wide range of visual search applications, including text-to-image, image-to-text, image&prompt-to-image, and text-to-image&text retrieval. It was released by BAAI in March 2025.
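All of these retrieval modes reduce to nearest-neighbor search in a shared embedding space: queries and candidates are embedded, then ranked by cosine similarity. The sketch below illustrates that ranking step with mock vectors in place of real BGE-VL embeddings (which, in practice, would come from the model's image and text encoders):

```python
import numpy as np

# Mock embeddings standing in for BGE-VL encoder outputs (hypothetical
# 8-dim vectors; real BGE-VL embeddings are much higher-dimensional).
rng = np.random.default_rng(0)

def normalize(v):
    """L2-normalize so a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# One text query embedding and a small "image" corpus of 5 items.
query = normalize(rng.normal(size=8))
corpus = normalize(rng.normal(size=(5, 8)))

# Text-to-image search: score every corpus item against the query,
# then rank from most to least similar.
scores = corpus @ query
ranking = np.argsort(-scores)
print("best match index:", ranking[0])
```

The same scoring loop serves every mode listed above; only the encoder inputs change (an image, a text prompt, or an image plus a prompt for compositional queries).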
Based on the MegaPairs dataset, the BAAI BGE team trained three multimodal retrieval models: BGE-VL-Base, BGE-VL-Large, and BGE-VL-MLLM.
BAAI released MegaPairs, a massive synthetic dataset containing over 26 million multimodal retrieval instruction-tuning triplets that powers BGE-VL.
Released under the MIT license, BGE-VL is free for both academic and commercial use.
Models and documentation available on GitHub, Hugging Face, and the official BGE documentation site.