
GPTCache (Semantic Cache)
Open-source semantic caching library for LLMs that uses embedding similarity to identify and retrieve responses for similar queries, reducing API costs by up to 70% and improving response times for ChatGPT and other language models.
Overview
GPTCache is an open-source semantic cache designed to improve the efficiency and speed of GPT-based applications by storing and retrieving responses generated by language models. Unlike traditional exact-match caching, semantic caching identifies semantically similar questions for more efficient cache hits.
The Problem
Approximately 31% of ChatGPT queries are semantically similar to previously submitted requests, meaning a large share of LLM traffic is effectively redundant. The high computational and financial cost of repeated API calls is a significant bottleneck, especially for applications handling repetitive queries.
How It Works
GPTCache employs embedding algorithms to convert queries into embeddings and uses a vector store for similarity search on these embeddings.
Architecture Components
1. Embedding Generator
- Extracts embeddings from requests for similarity search
- Generic interface supporting multiple embedding APIs
- Converts text queries to vector representations
2. Vector Store
- Finds the K most similar stored requests for an input embedding
- Supports Milvus, Zilliz Cloud, FAISS, and others
- Enables efficient similarity search
3. Cache Storage
- Stores LLM responses
- Retrieves cached responses for similar queries
- Returns the cached response to the requester when a sufficiently similar match is found
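The three components above can be sketched as a minimal semantic cache in plain Python. This is an illustration of the flow, not GPTCache's implementation: toy hashed-bigram embeddings and a linear scan stand in for a real embedding model and vector store.

```python
import math

DIM = 256

def embed(text):
    # Toy embedding: hashed character-bigram counts. A real deployment
    # would use a model such as OpenAI embeddings or Sentence Transformers.
    vec = [0.0] * DIM
    t = text.lower()
    for a, b in zip(t, t[1:]):
        vec[(ord(a) * 31 + ord(b)) % DIM] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # (embedding, response); a vector store in practice

    def get(self, query):
        # Embed the query, find the nearest stored request, and return
        # its cached response only if similarity clears the threshold.
        q = embed(query)
        best_score, best_response = 0.0, None
        for emb, response in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        self.entries.append((embed(query), response))

sc = SemanticCache(threshold=0.8)
sc.put("What is semantic caching?", "Caching keyed on meaning, not exact text.")
hit = sc.get("what is semantic caching")    # rephrased: served from cache
miss = sc.get("How do I install Rust?")     # unrelated: falls through to the LLM
```

An exact-match cache would miss the rephrased query; the embedding-based lookup catches it while still rejecting the unrelated one.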
Key Benefits
Cost Reduction
- Up to 70% API cost savings for repetitive queries
- Reduces redundant API calls
- Notable reduction in operational costs
Performance Improvement
- Significantly faster response times
- Sub-second cache retrieval vs seconds for LLM calls
- Better user experience
Efficiency
- 31% of queries can be served from cache
- Reduces LLM provider load
- Scales better with traffic
Installation
pip install gptcache
Quick Start
from gptcache import cache
from gptcache.adapter import openai

# Initialize cache (exact match by default)
cache.init()
cache.set_openai_key()

# Use with OpenAI through the GPTCache adapter
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[{
        'role': 'user',
        'content': 'What is semantic caching?'
    }],
)
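The default cache.init() uses exact-match keys; for semantic matching, GPTCache's documented pattern wires in an embedding function, a vector-backed data manager, and a similarity evaluator. A sketch along the lines of the project's README (verify module paths and names against your installed version):

```python
from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

# Local ONNX embedding model; FAISS vector index sized to its output
onnx = Onnx()
data_manager = get_data_manager(
    CacheBase("sqlite"),
    VectorBase("faiss", dimension=onnx.dimension),
)
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
```

With this configuration, the OpenAI adapter calls from the quick start above are answered from the vector store whenever a semantically similar query was cached.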
Configuration Options
Similarity Threshold
Control when cached responses are returned:
- Higher threshold: stricter matching, fewer but more precise cache hits
- Lower threshold: more cache hits, with a higher risk of returning an answer to a slightly different question
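The trade-off between the two settings can be seen with hand-picked example vectors (the numbers below are illustrative, not real model output):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical embeddings: a cached query, a rephrasing of it,
# and a merely related question on the same topic.
cached   = [0.9, 0.1, 0.4]
rephrase = [0.85, 0.15, 0.45]
related  = [0.5, 0.6, 0.5]

sim_rephrase = cosine(cached, rephrase)   # ~0.996
sim_related  = cosine(cached, related)    # ~0.773

candidates = [("rephrase", sim_rephrase), ("related", sim_related)]
strict_hits = [name for name, s in candidates if s >= 0.95]
loose_hits  = [name for name, s in candidates if s >= 0.70]
# strict (0.95): only the rephrasing hits the cache
# loose (0.70): both hit, and the "related" hit may answer the wrong question
```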
Vector Store Selection
Choose based on scale and requirements:
- FAISS: Fast, in-memory, good for development
- Milvus: Production-ready, distributed
- Zilliz Cloud: Managed service
Embedding Models
Supports various embedding providers:
- OpenAI embeddings
- Sentence Transformers
- Custom embedding models
Use Cases
- Customer Support: Repetitive FAQ queries
- Documentation: Similar code/technical questions
- Search Applications: Common search patterns
- Chatbots: Frequently asked questions
- Development: Testing and debugging
Integration
Framework Support
- LangChain: Full integration
- LlamaIndex: Native support
- OpenAI: Direct adapter
- Custom: Flexible API
Example with LangChain
from langchain.chains import LLMChain
from gptcache.adapter.langchain_models import LangChainLLMs

# Wrap an existing LangChain LLM so its calls go through GPTCache
cached_llm = LangChainLLMs(llm=your_llm)         # your_llm: any LangChain LLM
chain = LLMChain(llm=cached_llm, prompt=prompt)  # prompt: a PromptTemplate
Advanced Features
- Multiple Similarity Evaluators: Combine multiple strategies
- Custom Cache Policies: LRU, LFU, TTL
- Distributed Caching: Multi-instance support
- Cache Warming: Pre-populate common queries
- Analytics: Cache hit rates and cost savings
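As a toy illustration of the cache-policy bullet above, the following combines LRU eviction with a TTL using an ordered dict and an explicit clock. This is a sketch of the general technique, not GPTCache's implementation:

```python
from collections import OrderedDict

class LRUTTLCache:
    def __init__(self, capacity=2, ttl=60):
        self.capacity, self.ttl = capacity, ttl
        self.data = OrderedDict()  # key -> (value, stored_at)

    def get(self, key, now):
        if key not in self.data:
            return None
        value, stored_at = self.data[key]
        if now - stored_at > self.ttl:      # expired: drop the entry
            del self.data[key]
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return value

    def put(self, key, value, now):
        self.data[key] = (value, now)
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:  # evict least recently used
            self.data.popitem(last=False)

c = LRUTTLCache(capacity=2, ttl=60)
c.put("q1", "a1", now=0)
c.put("q2", "a2", now=1)
c.get("q1", now=2)           # touch q1, so q2 becomes least recently used
c.put("q3", "a3", now=3)     # over capacity: evicts q2
evicted = c.get("q2", now=4)   # None (evicted by LRU)
expired = c.get("q1", now=100) # None (TTL exceeded)
```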
Performance Metrics
Typical Results:
- Cache hit rate: 20-40% depending on application
- Response time: 90% faster for cache hits
- Cost reduction: 30-70% of API spend saved, depending on query repetition
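A quick back-of-the-envelope check of how hit rate maps to savings, under the assumption that every cache hit replaces one paid API call and serving a hit costs effectively nothing (the volumes and per-call price below are made-up examples):

```python
def monthly_savings(hit_rate, queries_per_month, cost_per_call):
    # Spend avoided if each cache hit replaces one paid API call
    # (assumes the cost of serving a hit from cache is negligible)
    return hit_rate * queries_per_month * cost_per_call

# Example: 100,000 queries/month at $0.002/call with a 31% hit rate
saved = monthly_savings(0.31, 100_000, 0.002)   # about $62/month avoided
```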
Security & Privacy
Recent research explores privacy-aware semantic caching:
- Encryption for sensitive queries
- Privacy-preserving similarity search
- Compliance with data regulations
Related Projects
- ModelCache: Alternative by Codefuse AI
- LangChain Caching: Built-in caching support
- Redis Semantic Cache: Redis-based solution
Resources
- GitHub: https://github.com/zilliztech/GPTCache
- Documentation: https://gptcache.readthedocs.io/
- Research: Multiple academic papers on semantic caching
Pricing
Free and open-source library. Costs only for:
- Embedding API calls (if using external service)
- Vector store infrastructure (if using managed service)