
    GPTCache (Semantic Cache)

    Open-source semantic caching library for LLMs that uses embedding similarity to identify and retrieve responses for similar queries, reducing API costs by up to 70% and improving response times for ChatGPT and other language models.


    About this tool

    Overview

    GPTCache is an open-source semantic cache designed to improve the efficiency and speed of GPT-based applications by storing and retrieving responses generated by language models. Unlike traditional exact-match caching, semantic caching identifies semantically similar questions for more efficient cache hits.

    The Problem

    Approximately 31% of ChatGPT queries are semantically similar to previously submitted requests, revealing substantial inefficiency in current LLM deployment strategies. The high computational and financial cost of frequent API calls is a major bottleneck, especially for applications that handle repetitive queries.

    How It Works

    GPTCache employs embedding algorithms to convert queries into embeddings and uses a vector store for similarity search on these embeddings.

    Architecture Components

    1. Embedding Generator

    • Extracts embeddings from requests for similarity search
    • Generic interface supporting multiple embedding APIs
    • Converts text queries to vector representations

    2. Vector Store

    • Finds K most similar requests from input embedding
    • Supports Milvus, Zilliz Cloud, FAISS, and others
    • Enables efficient similarity search

    3. Cache Storage

    • Stores LLM responses
    • Retrieves cached responses for similar queries
    • Returns the cached response to the requester when a sufficiently similar match is found
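Taken together, the three components can be sketched as a toy semantic cache in plain Python. This is an illustrative sketch, not the GPTCache API: the character-bigram `embed` function stands in for a real embedding model, the linear scan stands in for a vector store, and the 0.9 threshold is an arbitrary example value.

```python
import math

def embed(text):
    # Toy embedding: normalized character-bigram counts.
    # A real deployment would call an embedding model instead.
    vec = {}
    t = text.lower()
    for i in range(len(t) - 1):
        bigram = t[i:i + 2]
        vec[bigram] = vec.get(bigram, 0) + 1
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()}

def cosine(a, b):
    return sum(a[k] * b.get(k, 0) for k in a)

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, query):
        q = embed(query)
        best_score, best_response = 0.0, None
        for emb, resp in self.entries:  # "vector store": linear scan here
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_response = score, resp
        if best_score >= self.threshold:
            return best_response  # cache hit
        return None               # cache miss -> call the LLM

    def put(self, query, response):
        self.entries.append((embed(query), response))
```

A semantically unrelated query falls below the threshold and misses, while a near-duplicate phrasing of a cached query hits without another LLM call.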

    Key Benefits

    Cost Reduction

    • Up to 70% API cost savings for repetitive queries
    • Reduces redundant API calls
    • Notable reduction in operational costs

    Performance Improvement

    • Significantly faster response times
    • Sub-second cache retrieval vs seconds for LLM calls
    • Better user experience

    Efficiency

    • 31% of queries can be served from cache
    • Reduces LLM provider load
    • Scales better with traffic

    Installation

    pip install gptcache
    

    Quick Start

    from gptcache import cache
    from gptcache.adapter import openai
    
    # Initialize the cache with default settings
    cache.init()
    cache.set_openai_key()  # reads OPENAI_API_KEY from the environment
    
    # The adapter is a drop-in replacement for the OpenAI client;
    # sufficiently similar queries are answered from the cache
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[{
            'role': 'user',
            'content': 'What is semantic caching?'
        }],
    )
    

    Configuration Options

    Similarity Threshold

    Control when cached responses are returned:

    • Higher threshold: More exact matches required
    • Lower threshold: More cache hits, less precision
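The tradeoff can be made concrete with a small sketch (plain Python, with hypothetical similarity scores, not GPTCache code):

```python
def is_hit(similarity, threshold):
    """A cached answer is returned only when similarity clears the threshold."""
    return similarity >= threshold

# Hypothetical scores between a new query and its nearest cached query
scores = {"near-duplicate": 0.97, "paraphrase": 0.88, "unrelated": 0.31}

strict = {name: is_hit(s, 0.95) for name, s in scores.items()}
loose = {name: is_hit(s, 0.80) for name, s in scores.items()}
# strict: only the near-duplicate hits; loose: the paraphrase also hits,
# trading precision for a higher hit rate
```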

    Vector Store Selection

    Choose based on scale and requirements:

    • FAISS: Fast, in-memory, good for development
    • Milvus: Production-ready, distributed
    • Zilliz Cloud: Managed service

    Embedding Models

    Supports various embedding providers:

    • OpenAI embeddings
    • Sentence Transformers
    • Custom embedding models
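Because the embedding step is behind a generic interface, a backend is essentially just "text in, vector out" and providers can be swapped without touching the cache logic. A minimal sketch of that idea (illustrative only; the word-length embedder is a toy stand-in for a real model):

```python
from typing import Callable, List

# Any provider (OpenAI, Sentence Transformers, custom) can satisfy this shape
EmbeddingFunc = Callable[[str], List[float]]

def make_length_embedder(dim: int = 4) -> EmbeddingFunc:
    # Toy backend: buckets words by length into a fixed-size vector
    def embed(text: str) -> List[float]:
        vec = [0.0] * dim
        for word in text.split():
            vec[min(len(word), dim) - 1] += 1.0
        return vec
    return embed

def cache_key(embed: EmbeddingFunc, query: str) -> List[float]:
    # The cache only calls the backend through the interface,
    # so swapping providers requires no other code changes
    return embed(query)
```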

    Use Cases

    • Customer Support: Repetitive FAQ queries
    • Documentation: Similar code/technical questions
    • Search Applications: Common search patterns
    • Chatbots: Frequently asked questions
    • Development: Testing and debugging

    Integration

    Framework Support

    • LangChain: Full integration
    • LlamaIndex: Native support
    • OpenAI: Direct adapter
    • Custom: Flexible API

    Example with LangChain

    from langchain.chains import LLMChain
    from gptcache.adapter.langchain_models import LangChainLLMs
    
    # Wrap an existing LangChain LLM so calls are served from the cache
    # when possible; `your_llm` and `prompt` are your own LLM and prompt template
    cached_llm = LangChainLLMs(llm=your_llm)
    chain = LLMChain(llm=cached_llm, prompt=prompt)
    

    Advanced Features

    • Multiple Similarity Evaluators: Combine multiple strategies
    • Custom Cache Policies: LRU, LFU, TTL
    • Distributed Caching: Multi-instance support
    • Cache Warming: Pre-populate common queries
    • Analytics: Cache hit rates and cost savings
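As a sketch of the custom cache policies mentioned above, LRU eviction and a TTL can be combined in a few lines. This is an illustrative toy, not GPTCache's implementation; the `now` parameter is there only to make expiry deterministic:

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """Toy policy: LRU eviction plus a time-to-live on every entry."""

    def __init__(self, max_entries=128, ttl_seconds=3600.0):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.entries = OrderedDict()  # key -> (response, stored_at)

    def put(self, key, response, now=None):
        now = time.time() if now is None else now
        self.entries[key] = (response, now)
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used

    def get(self, key, now=None):
        now = time.time() if now is None else now
        if key not in self.entries:
            return None
        response, stored_at = self.entries[key]
        if now - stored_at > self.ttl:   # expired: drop and report a miss
            del self.entries[key]
            return None
        self.entries.move_to_end(key)    # refresh recency on a hit
        return response
```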

    Performance Metrics

    Typical Results:

    • Cache hit rate: 20-40% depending on application
    • Response time: 90% faster for cache hits
    • Cost reduction: 30-70% of API costs
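These figures combine into a simple back-of-the-envelope model: every query pays for an embedding (to probe the cache), but only misses pay for a full LLM call. The per-call prices below are hypothetical placeholders, not real pricing:

```python
def effective_cost_per_query(hit_rate, llm_cost, embedding_cost):
    # Embedding is paid on every query; the LLM only on cache misses
    return embedding_cost + (1 - hit_rate) * llm_cost

# Hypothetical prices: $0.002 per LLM call, $0.0001 per embedding
baseline = effective_cost_per_query(0.0, 0.002, 0.0)    # no cache
cached = effective_cost_per_query(0.31, 0.002, 0.0001)  # 31% hit rate
savings = 1 - cached / baseline                          # fraction saved
```

Even after paying for embeddings on every query, a 31% hit rate cuts roughly a quarter of the per-query cost under these assumed prices; higher hit rates push savings toward the upper end of the reported range.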

    Security & Privacy

    Recent research explores privacy-aware semantic caching:

    • Encryption for sensitive queries
    • Privacy-preserving similarity search
    • Compliance with data regulations

    Related Projects

    • ModelCache: Alternative by Codefuse AI
    • LangChain Caching: Built-in caching support
    • Redis Semantic Cache: Redis-based solution

    Resources

    • GitHub: https://github.com/zilliztech/GPTCache
    • Documentation: https://gptcache.readthedocs.io/
    • Research: Multiple academic papers on semantic caching

    Pricing

    Free and open-source library. Costs only for:

    • Embedding API calls (if using external service)
    • Vector store infrastructure (if using managed service)

    Information

    Website: github.com
    Published: Mar 14, 2026

    Categories

    Llm Tools

    Tags

    #Caching #Cost Optimization #Performance

    Similar Products

    LazyGraphRAG

    Cost-optimized variant of GraphRAG that reduces indexing cost to 0.1% of full GraphRAG while maintaining retrieval quality. Designed for resource-constrained deployments where traditional GraphRAG's 100-1000x higher indexing cost is prohibitive.

    Redis LangCache

    Semantic caching solution for LLM applications that reduces API calls and costs by recognizing semantically similar queries. Achieves up to 73% cost reduction in conversational workloads with sub-millisecond cache retrieval through vector similarity search.

    ANN Algorithm Complexity Analysis

    Computational complexity comparison of approximate nearest neighbor algorithms including build time, query time, and space complexity. Essential for understanding performance characteristics and choosing appropriate algorithms for different scales.

    ANN-Benchmarks

    A comprehensive benchmarking project that evaluates and compares implementations of approximate nearest neighbor algorithms. Provides standardized datasets and metrics for comparing ANN libraries including FAISS, HNSW, Annoy, and ScaNN.

    Consistency Levels

    Configuration options in distributed vector databases that trade off between data consistency, availability, and performance. Critical for understanding read/write behavior in production systems with replication.

    Cursor-Based Pagination

    A pagination technique for efficiently scrolling through large vector database result sets using cursors instead of offsets. Essential for retrieving all vectors in a collection or iterating through search results without performance degradation.

    Copyright © 2025 Awesome Vector Databases. All rights reserved.