gte-Qwen2-1.5B-instruct

A state-of-the-art multilingual text embedding model from Alibaba's GTE (General Text Embedding) series, built on the Qwen2-1.5B LLM. The model supports up to 8192 tokens and incorporates bidirectional attention mechanisms for enhanced contextual understanding across diverse domains.

Overview

gte-Qwen2-1.5B-instruct is the latest model in the GTE (General Text Embedding) model family from Alibaba, built on the Qwen2-1.5B LLM architecture. The model uses the same training data and strategies as the larger gte-Qwen2-7B-instruct model while maintaining a more compact size.

Key Features

Bidirectional Attention: Integration of bidirectional attention mechanisms enriches contextual understanding
Multilingual Support: Comprehensive training across a vast, multilingual text corpus spanning diverse domains and scenarios
Long Context: Maximum sequence length of 8192 tokens
Advanced Training: Leverages both weakly supervised and supervised data for robust performance
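The feature list does not show what "bidirectional" changes. Qwen2, like other decoder-only LLMs, is pretrained with a causal mask, in which each token attends only to itself and earlier tokens. A bidirectional mask lets every token attend to the whole input. The plain-Python sketch below contrasts the two masks. It is a conceptual illustration, not the GTE implementation, and the function names and the 1/0 padding convention are assumptions.

```python
from typing import List

Mask = List[List[bool]]  # mask[i][j] is True if token i may attend to token j


def causal_mask(n: int) -> Mask:
    """Decoder-style (unidirectional) mask: token i sees only tokens 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(attention_mask: List[int]) -> Mask:
    """Encoder-style mask: every token sees every non-padding token.

    attention_mask uses a 1 = real token, 0 = padding convention
    (an assumption made for this illustration).
    """
    n = len(attention_mask)
    return [[attention_mask[j] == 1 for j in range(n)] for _ in range(n)]


def show(mask: Mask) -> str:
    """Render a mask as rows of 'x' (may attend) and '.' (blocked)."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


# Four tokens under a causal mask, then three real tokens plus one pad token
# under a bidirectional mask.
print(show(causal_mask(4)))
print()
print(show(bidirectional_mask([1, 1, 1, 0])))
```

Under the causal mask the first token's representation can never reflect anything that follows it, whereas under the bidirectional mask every real token is conditioned on the full input.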

Model Performance

The larger gte-Qwen2-7B-instruct model achieved a score of 70.24 on the MTEB benchmark, outperforming:

NV-Embed-v1 (69.32)
gte-Qwen1.5-7B-instruct (67.34)

Availability

The GTE series models are available:

On Hugging Face for open-source use
As commercial API services on Alibaba Cloud (text-embedding-v1/v2/v3)
Compatible with the Sentence Transformers framework
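Because the series is listed as compatible with Sentence Transformers, the following is a minimal sketch of loading the 1.5B model and scoring queries against documents. It assumes the Hugging Face repo id `Alibaba-NLP/gte-Qwen2-1.5B-instruct`, a `sentence-transformers` release recent enough to accept the `prompt=` argument of `encode`, and a query prefix of the form `Instruct: <task>`, then a newline, then `Query: `, with documents encoded without a prefix. Verify all three against the model card. The helper names `query_prompt`, `rank_documents`, and `embed_and_score` are hypothetical.

```python
from typing import List, Sequence

# Hugging Face repo id; assumed here, confirm on the model page before use.
MODEL_ID = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"


def query_prompt(task: str) -> str:
    """Instruction prefix prepended to queries (format assumed from the model card)."""
    return f"Instruct: {task}\nQuery: "


def rank_documents(scores: Sequence[float]) -> List[int]:
    """Indices of documents ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def embed_and_score(task: str, queries: List[str], documents: List[str]):
    """Return a (len(queries), len(documents)) array of cosine similarities."""
    # Imported lazily so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID, trust_remote_code=True)
    model.max_seq_length = 8192  # the documented maximum sequence length
    # Queries carry the task instruction; documents are encoded as-is.
    q = model.encode(queries, prompt=query_prompt(task), normalize_embeddings=True)
    d = model.encode(documents, normalize_embeddings=True)
    # Dot products of L2-normalized vectors are cosine similarities.
    return q @ d.T
```

A call such as `scores = embed_and_score("Given a web search query, retrieve relevant passages that answer the query", ["what is a summit"], docs)` followed by `rank_documents(scores[0].tolist())` would list the indices of `docs` from most to least relevant. Running it downloads the model weights, so the call is not made inside the block.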

Use Cases

Multilingual semantic search
Cross-lingual information retrieval
RAG (Retrieval-Augmented Generation) applications
Document clustering and classification
Embedding generation for vector databases
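Several of the use cases above (semantic search, cross-lingual retrieval, the retrieval step of RAG) reduce to the same operation: embed a query, embed a collection, and rank the collection by similarity. The sketch below shows that ranking step in plain Python using cosine similarity, assuming the embeddings have already been produced by the model. The vectors are invented for illustration, not real model outputs, and `cosine` and `search` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Brute-force top-k search over every stored vector.

    Production vector databases typically use approximate nearest-neighbor
    indexes instead of scanning every entry.
    """
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented 3-dimensional vectors standing in for real embeddings (real ones have
# far more dimensions). They mimic the intended behavior of a multilingual model:
# an English passage and its Chinese translation land close together, while an
# unrelated passage does not.
index = {
    "passage_en": [0.9, 0.1, 0.0],
    "passage_zh": [0.8, 0.2, 0.1],
    "unrelated": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stand-in for an embedded query

for doc_id, score in search(query, index, k=2):
    print(f"{doc_id}: {score:.3f}")
```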

Model Variants

The GTE-Qwen2 series includes:

gte-Qwen2-1.5B-instruct (1.5 billion parameters)
gte-Qwen2-7B-instruct (7 billion parameters)

Technical Details

Developed by Tongyi Lab of Alibaba Group; last updated January 21, 2025. It is presented as a state-of-the-art multilingual embedding model.

Information

Website: huggingface.co

Published: Mar 20, 2026

Tags

#embeddings #multilingual #instruction-based #open-source

Similar Products

Qwen3 Embedding

Multilingual embedding model supporting over 100 languages and ranking #1 on MTEB multilingual leaderboard. Offers flexible model sizes from 0.6B to 8B parameters with user-defined instructions.

BGE-M3

A versatile multilingual text embedding model from BAAI that supports 100+ languages and can handle inputs up to 8192 tokens. BGE-M3 is unique in supporting three retrieval methods simultaneously: dense retrieval, multi-vector retrieval, and sparse retrieval.

gte-Qwen2-7B-instruct

A large-scale multilingual text embedding model from Alibaba's GTE series with 7 billion parameters. Built on Qwen2-7B, it achieved a score of 70.24 on MTEB, outperforming NV-Embed-v1 and supporting 100+ languages with up to 8192 token context.

INSTRUCTOR

A task-specific text embedding model that generates customized embeddings based on natural language instructions. INSTRUCTOR achieves state-of-the-art performance on 70 diverse embedding tasks by allowing users to specify the task objective and domain.

Snowflake Arctic Embed

Suite of high-quality multilingual text embedding models optimized for retrieval performance, developed by Snowflake and available as open-source for commercial use.

E5-Mistral-7B-Instruct

Open-source embedding model from Microsoft, initialized from Mistral-7B-v0.1, that achieves a state-of-the-art BEIR score of 56.9 for English text embedding and retrieval tasks and produces 4096-dimensional vectors.
