
Implement two-tower retrieval for large-scale candidate generation

A Google Cloud reference architecture for an end-to-end two-tower retrieval system for large-scale candidate generation. It uses Vertex AI and vector similarity search to learn and serve semantic similarity between entities.


About this tool


  • Type: Reference architecture / technical guide
  • Provider: Google Cloud
  • Category: Curated resource list / Architecture pattern
  • Topics: RAG, semantic search, large-scale recommendation, two-tower retrieval
  • URL: https://cloud.google.com/architecture/implement-two-tower-retrieval-large-scale-candidate-generation

Overview

This reference architecture describes how to design and implement an end-to-end two-tower retrieval system for large-scale candidate generation using Google Cloud, Vertex AI, and vector similarity search. It focuses on learning and serving semantic similarity between entities (for example, web queries and candidate items) in large-scale recommendation and personalization systems with low-latency serving requirements.

Primary audience: data scientists and machine learning engineers building large-scale recommenders and retrieval systems.

Key Concepts

  • Two-tower (dual encoder) model: Two separate neural networks (towers) independently encode:

    • Query tower: user/query features
    • Candidate tower: item/candidate features

    The towers produce embeddings that can be compared via vector similarity.
  • Candidate generation stage:

    • First stage in a two- or multi-stage recommender architecture.
    • Filters millions of items down to hundreds of relevant candidates.
    • Designed for high throughput and low latency.
  • Ranking stage:

    • Second stage that re-ranks the candidate subset.
    • Produces the final set of recommended items.
  • Semantic similarity learning:

    • Learns similarity between <query, candidate> pairs.
    • Uses vector similarity search to efficiently retrieve nearest neighbors in embedding space.
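The dual-encoder idea above can be sketched in a few lines of plain Python. This is a toy illustration, not the reference architecture's implementation: the two "towers" are fixed linear maps with made-up weights, standing in for neural networks that would be trained jointly so that matching <query, candidate> pairs land close together in embedding space.

```python
import math

# Toy "towers": each is a separate linear map from a feature vector to an
# embedding. The weights below are fixed and purely illustrative; a real
# system learns them jointly from <query, candidate> training pairs.
QUERY_W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]   # 2 query features -> 3 dims
CAND_W = [[0.7, 0.1, -0.3], [-0.2, 0.6, 0.5]]    # 2 item features  -> 3 dims

def encode(features, weights):
    """Encode a feature vector into an embedding via a linear 'tower'."""
    dims = len(weights[0])
    return [sum(f * weights[i][d] for i, f in enumerate(features))
            for d in range(dims)]

def cosine(a, b):
    """Vector similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_emb = encode([1.0, 0.5], QUERY_W)   # query tower, at request time
cand_emb = encode([0.9, 0.4], CAND_W)     # candidate tower, typically offline
score = cosine(query_emb, cand_emb)       # higher = more semantically similar
```

Because each tower only sees its own inputs, candidate embeddings can be computed once offline while query embeddings are computed per request, which is what makes the pattern practical at scale.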

Architecture

The architecture demonstrates how to:

  • Train a two-tower model on Google Cloud.
  • Deploy each tower separately, since each handles a different serving task.
  • Use vector similarity search for large-scale retrieval from a large corpus of candidate items.

High-level components (as shown in the diagrams and description):

  • A training pipeline that learns a dual-encoder (two-tower) model.
  • A query tower service that encodes queries or user context at request time.
  • A candidate tower service that encodes items/candidates, typically offline or periodically.
  • A vector index (vector similarity search) storing candidate embeddings for large-scale approximate nearest neighbor (ANN) search.
  • A candidate generation service that retrieves top candidates from the vector index.
  • A downstream ranking component that further filters and sorts the retrieved candidates.
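The component split above can be sketched end to end in pure Python. Everything here is illustrative: the hash-free character-sum encoder stands in for the trained towers (a real system would use two different learned encoders), and a brute-force top-k scan stands in for the ANN vector index.

```python
import heapq

DIM = 8

def embed(text, dim=DIM):
    """Toy deterministic encoder standing in for a trained tower."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 100.0
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Offline: the candidate tower encodes the corpus, and the embeddings are
# loaded into a vector index (here just a dict; production uses ANN search).
corpus = {
    "item_a": "red running shoes",
    "item_b": "blue tennis racket",
    "item_c": "red trail shoes",
}
index = {item_id: embed(text) for item_id, text in corpus.items()}

# Online: the query tower encodes the request, then the candidate generation
# service retrieves the top-k nearest candidates from the index.
def retrieve(query, k=2):
    q = embed(query)
    return heapq.nlargest(k, index, key=lambda item_id: dot(q, index[item_id]))

top = retrieve("red shoes", k=2)  # candidates handed to the ranking stage
```

The key property this preserves from the architecture is the offline/online split: the index is built ahead of time from candidate embeddings, and only the query is encoded at request time.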

Products Used

The reference architecture uses the following Google Cloud products (as stated or implied):

  • Vertex AI
    • Model training for the two-tower architecture
    • Model deployment for serving each tower independently
  • Vector similarity search (Vertex AI Vector Search or equivalent Google Cloud vector indexing) for large-scale nearest neighbor retrieval in embedding space

Use Cases

  • Large-scale recommendation systems that must:
    • Serve recommendations with low latency
    • Handle millions of candidate items
    • Support two-stage or multi-stage architectures (candidate generation + ranking)
  • Personalization scenarios where you need to learn semantic similarity between:
    • Search queries and documents/items
    • Users and items
    • Any <query, candidate> entity pairs
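The two-stage flow these use cases rely on (candidate generation, then ranking) can be sketched as below. The catalog, scores, and the second-stage scorer are all synthetic stand-ins; a production ranker would be a separate model with far richer features.

```python
import random

random.seed(42)
DIM = 4

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Synthetic stand-in for a pre-embedded catalog of 10,000 candidate items.
catalog = {f"item_{i}": [random.uniform(-1, 1) for _ in range(DIM)]
           for i in range(10_000)}

def candidate_generation(query_emb, k=100):
    """Stage 1: cheap dot-product similarity filters the pool down to k."""
    return sorted(catalog, key=lambda i: dot(query_emb, catalog[i]),
                  reverse=True)[:k]

def rank(query_emb, candidate_ids, top_n=10):
    """Stage 2: re-scores only the shortlist. This scorer is a made-up
    stand-in; a real ranker uses a richer model over more features."""
    def score(item_id):
        emb = catalog[item_id]
        return dot(query_emb, emb) - 0.1 * sum(x * x for x in emb)
    return sorted(candidate_ids, key=score, reverse=True)[:top_n]

query_emb = [random.uniform(-1, 1) for _ in range(DIM)]
shortlist = candidate_generation(query_emb)   # 10,000 -> 100 candidates
final = rank(query_emb, shortlist)            # 100 -> 10 recommendations
```

The point of the split is cost: the cheap first stage touches every item, while the expensive second stage only ever sees the shortlist.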

Features

  • End-to-end reference design for a two-tower retrieval system on Google Cloud.
  • Separation of concerns:
    • Independent training and deployment of query and candidate towers.
  • Scalable candidate generation:
    • Efficiently filters a very large candidate pool to a manageable set of hundreds of items.
  • Low-latency retrieval:
    • Uses vector similarity search to support fast, high-throughput queries.
  • Semantic retrieval:
    • Embedding-based matching of queries and candidates via learned semantic similarity.
  • Integration with multi-stage recommenders:
    • Designed to feed downstream ranking models.
  • Guidance on modeling and data preparation (via linked resources):
    • Refers to “Scaling deep retrieval with TensorFlow Recommenders and Vector Search” for details on model design, problem framing, and data preparation.

Pricing

This item is a free reference architecture / documentation page. There is no direct pricing for the document itself. Any costs arise from using underlying Google Cloud services (for example, Vertex AI and vector search), which are priced separately in Google Cloud’s standard pricing documentation.


Information

Website: cloud.google.com
Published: Dec 25, 2025

Categories

Curated Resource Lists

Tags

#RAG
#semantic search
#architectures

Similar Products

DataRobot Vector Database

DataRobot Vector Database is a managed vector store capability within the DataRobot AI Platform that allows users to create, register, deploy, and update vector databases for AI workloads, including RAG and semantic search. It integrates with NVIDIA NIM embeddings and supports both built-in and bring-your-own embeddings for building production-grade vector search solutions.

Verba

Verba is a community-driven, open-source Retrieval-Augmented Generation (RAG) application that provides an end-to-end, user-friendly interface for building RAG workflows on top of a vector database, showcasing practical semantic search and retrieval patterns with Weaviate.

Mastering Multimodal RAG

A course focused on mastering multimodal Retrieval Augmented Generation (RAG) and embeddings, which are fundamental components often stored and managed by vector databases.

Awesome papers and technical blogs on vector DB

A curated collection of papers and technical blogs focused on vector databases, semantic-based vector search, and approximate nearest neighbor search (ANN Search). These resources are essential for understanding and building large-scale information retrieval systems and vector databases.

Pinecone

Pinecone is a fully managed vector database designed for high-performance semantic search and AI applications. It provides scalable, low-latency storage and retrieval of vector embeddings, allowing developers to build semantic search, recommendation, and RAG (Retrieval-Augmented Generation) systems without managing infrastructure.

DataRobot Vector Databases

The DataRobot vector databases feature provides FAISS-based internal vector databases and connections to external vector databases such as Pinecone, Elasticsearch, and Milvus. It supports creating and configuring vector databases, adding internal and external data sources, versioning internal and connected databases, and registering and deploying vector databases within the DataRobot AI platform to power retrieval-augmented generation and other AI use cases.
