



A Google Cloud reference architecture demonstrating an end-to-end two-tower retrieval system for large-scale candidate generation. It uses Vertex AI and vector similarity search to learn and serve semantic similarity between entities.
This reference architecture describes how to design and implement an end-to-end two-tower retrieval system for large-scale candidate generation using Google Cloud, Vertex AI, and vector similarity search. It focuses on learning and serving semantic similarity between entities (for example, web queries and candidate items) in large-scale recommendation and personalization systems with low-latency serving requirements.
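At serving time, the candidate tower's embeddings do not depend on the query, so they are typically computed offline and indexed. A request then only needs to encode the query and look up its nearest candidate embeddings. The NumPy sketch below illustrates that lookup with exact (brute-force) maximum inner product search over randomly generated stand-in embeddings. It is an illustration under assumptions, not this architecture's serving code: a production system would query an approximate nearest-neighbor index such as the vector search service the architecture uses, and the function name `top_k_candidates` and all sizes are hypothetical.

```python
import numpy as np


def top_k_candidates(query_emb: np.ndarray, cand_embs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k candidates with the highest dot-product score,
    ordered best to worst. Exact search; ANN indexes approximate this result."""
    scores = cand_embs @ query_emb                 # one score per candidate
    top = np.argpartition(-scores, k - 1)[:k]      # unordered top-k in O(n)
    return top[np.argsort(-scores[top])]           # sort only the k winners


rng = np.random.default_rng(42)
num_candidates, dim = 10_000, 32

# Offline (hypothetical data): candidate-tower outputs, precomputed once and indexed.
cand_embs = rng.normal(size=(num_candidates, dim)).astype(np.float32)
cand_embs /= np.linalg.norm(cand_embs, axis=1, keepdims=True)

# Online: a single query-tower output for the incoming request.
query_emb = rng.normal(size=dim).astype(np.float32)
query_emb /= np.linalg.norm(query_emb)

ids = top_k_candidates(query_emb, cand_embs, k=10)
print(ids, (cand_embs @ query_emb)[ids])
```

The exact scan costs O(n) per query plus O(k log k) to order the winners; at the corpus sizes this architecture targets, that linear scan is the cost an approximate vector index is designed to avoid.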
Primary audience: data scientists and machine learning engineers building large-scale recommenders and retrieval systems.
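Two-tower (dual encoder) model: two separate neural networks (towers) independently encode queries and candidate items into embedding vectors in a shared space, where their similarity can be scored with a dot product or cosine similarity.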
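Candidate generation stage: retrieves a small set of potentially relevant candidates from a very large corpus; this is the stage the two-tower model serves in this architecture.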
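Ranking stage: scores and orders the retrieved candidates, typically with a more detailed and more expensive model than the one used for retrieval.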
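Semantic similarity learning: the towers are trained on <query, candidate> pairs so that related queries and candidates receive similar embeddings.
The architecture demonstrates how to learn and serve this semantic similarity between entities for low-latency, large-scale candidate generation.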
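Two-tower models are commonly trained on <query, candidate> pairs with an in-batch softmax loss: each query in a batch treats its own candidate as the positive and the other candidates in the same batch as negatives. The NumPy sketch below illustrates the forward pass and that loss under simplifying assumptions; it is not this architecture's training code. Each tower is a hypothetical single linear layer followed by L2 normalization, the feature sizes and temperature are arbitrary, the data is random, and gradient updates are left to an ML framework.

```python
import numpy as np


def tower(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical one-layer tower: linear projection, then L2 normalization."""
    emb = features @ weights
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def in_batch_softmax_loss(q_emb: np.ndarray, c_emb: np.ndarray, temperature: float = 0.1) -> float:
    """Softmax cross-entropy where row i's positive is candidate i and all
    other candidates in the batch act as negatives."""
    logits = (q_emb @ c_emb.T) / temperature              # [batch, batch] similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # positives sit on the diagonal


rng = np.random.default_rng(0)
batch, query_dim, cand_dim, emb_dim = 16, 12, 20, 8

# The two towers have separate weights and can take differently shaped inputs.
w_query = rng.normal(size=(query_dim, emb_dim))
w_cand = rng.normal(size=(cand_dim, emb_dim))

query_features = rng.normal(size=(batch, query_dim))
cand_features = rng.normal(size=(batch, cand_dim))

q_emb = tower(query_features, w_query)
c_emb = tower(cand_features, w_cand)
print("loss with untrained towers:", in_batch_softmax_loss(q_emb, c_emb))

# If the towers mapped each query exactly onto its own candidate's embedding,
# the diagonal (positive) scores would dominate each row and the loss would fall.
print("loss with aligned embeddings:", in_batch_softmax_loss(q_emb, q_emb))
```

Using the other in-batch candidates as negatives avoids scoring the full corpus during training; the temperature controls how sharply the softmax separates each positive from those negatives.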
High-level components (as shown in the diagrams and description):
The reference architecture uses Google Cloud products including Vertex AI and vector similarity search.
This reference architecture is free documentation; there is no direct charge for the document itself. Any costs come from using the underlying Google Cloud services (for example, Vertex AI and vector search), which are priced separately in Google Cloud’s standard pricing documentation.