    Copyright © 2025 Awesome Vector Databases. All rights reserved.

    Mastra

    AI agent framework featuring Observational Memory that achieves 95% on LongMemEval with 5-40x compression and stable, reproducible context windows.

    Overview

    Observational Memory (OM) is a memory system developed by Mastra that achieves 94.87% on LongMemEval with GPT-5-mini (the highest score ever recorded on this benchmark) and 84.23% with GPT-4o, beating the previous state-of-the-art.

    How It Works

    The architecture uses two background agents (Observer and Reflector) that watch conversations and maintain a dense text-only observation log that replaces raw message history as it grows. The context window is broken into two blocks: the first is the list of observations, and the second is raw messages that haven't yet been compressed.

    When the raw-message block reaches 30k tokens (a configurable threshold), the Observer agent compresses those messages into new observations, which are appended to the first block. When the observation block reaches 40k tokens (also configurable), the Reflector agent garbage-collects observations that no longer matter.
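The two-block flow above can be sketched in TypeScript. This is an illustrative model only, not Mastra's actual API: the class name, the stub "agents", and the assumed ~10x compression ratio are all placeholders; only the two thresholds and the block structure come from the description above.

```typescript
// Hypothetical sketch of Observational Memory's two-block context window.
// Block 1 holds dense observations; block 2 holds raw, uncompressed messages.

type Message = { role: string; content: string; tokens: number };
type Observation = { text: string; tokens: number };

const MESSAGE_THRESHOLD = 30_000;     // raw messages compressed past this point
const OBSERVATION_THRESHOLD = 40_000; // observations garbage-collected past this point

class ObservationalMemory {
  observations: Observation[] = []; // block 1: the observation log
  messages: Message[] = [];         // block 2: raw messages not yet compressed

  private tokens(items: { tokens: number }[]): number {
    return items.reduce((sum, it) => sum + it.tokens, 0);
  }

  addMessage(msg: Message): void {
    this.messages.push(msg);

    // Observer agent: once the raw block crosses its threshold, compress
    // the raw messages into a new observation appended to block 1.
    if (this.tokens(this.messages) >= MESSAGE_THRESHOLD) {
      this.observations.push(this.observe(this.messages));
      this.messages = []; // compressed messages leave the raw block
    }

    // Reflector agent: once block 1 crosses its own threshold, drop
    // observations that no longer matter.
    if (this.tokens(this.observations) >= OBSERVATION_THRESHOLD) {
      this.observations = this.reflect(this.observations);
    }
  }

  // Stand-in for the observer LLM call; assumes ~10x compression.
  private observe(msgs: Message[]): Observation {
    const raw = this.tokens(msgs);
    return { text: `[observations for ${msgs.length} messages]`, tokens: Math.ceil(raw / 10) };
  }

  // Stand-in for the reflector LLM call; keeps the most recent half.
  private reflect(obs: Observation[]): Observation[] {
    return obs.slice(Math.floor(obs.length / 2));
  }
}
```

In the real system both stubs would be LLM calls; the point of the sketch is only the control flow: the raw block fills, drains into observations, and the observation log itself is periodically pruned.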

    Key Features

    • Typically 5–40× compression of raw message history
    • Completely stable context window that is predictable, reproducible, and prompt-cacheable across many agent/user turns
    • Completely open source, end to end
    • Uses formatted text rather than structured objects, which is easier to use, better suited to LLMs, and far easier to debug
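The prompt-cacheability claim follows from the structure: the large observation block renders to an identical text prefix on consecutive turns, and only the small raw-message tail changes. A minimal sketch, with an assumed rendering format (the headings and labels below are illustrative, not Mastra's):

```typescript
// Render the two-block context as plain formatted text: a stable
// observation prefix followed by the recent raw-message tail.
function renderContext(observations: string[], rawMessages: string[]): string {
  const obsBlock = ["## Observations", ...observations].join("\n");
  const msgBlock = ["## Recent messages", ...rawMessages].join("\n");
  return obsBlock + "\n\n" + msgBlock;
}

const observations = [
  "- 2026-02-01: user prefers TypeScript examples",
  "- 2026-02-03: project uses pnpm workspaces",
];

// Two consecutive turns: only the tail changes, so the observation
// prefix is byte-identical and eligible for provider prompt caching.
const turn1 = renderContext(observations, ["user: how do I add a package?"]);
const turn2 = renderContext(observations, [
  "user: how do I add a package?",
  "assistant: add it in the relevant workspace.",
]);
const stablePrefix = "## Observations\n" + observations.join("\n");
```

Because compression only ever appends to (or prunes) the observation block at discrete threshold events, the prefix stays fixed for many turns between those events.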

    Announcement and Availability

    The system was announced in February 2026 and is available for use with Mastra's agent framework.

    Pricing

    Open-source framework.


    Information

    Website: mastra.ai
    Published: Mar 24, 2026

    Categories

    LLM Frameworks

    Tags

    #agent-framework #observational-memory #compression

    Similar Products


    CommVQ

    A commutative vector quantization method for KV cache compression that reduces FP16 cache size by 87.5% with 2-bit quantization and enables 1-bit quantization, allowing LLaMA-3.1 8B to run with 128K context on a single RTX 4090 GPU.
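The 87.5% figure is just the bit-width ratio: 2-bit codes keep 2/16 = 12.5% of an FP16 cache. A back-of-envelope check, assuming the usual LLaMA-3.1 8B shape (32 layers, 8 KV heads under GQA, head dimension 128 — these dimensions are my assumption, not stated in the blurb):

```typescript
// KV-cache size for a given bit width and context length,
// assuming LLaMA-3.1 8B geometry: 32 layers, 8 KV heads, head dim 128.
function kvCacheBytes(bitsPerValue: number, contextTokens: number): number {
  const layers = 32, kvHeads = 8, headDim = 128;
  const valuesPerToken = 2 * layers * kvHeads * headDim; // K and V
  return (valuesPerToken * contextTokens * bitsPerValue) / 8;
}

const ctx = 128 * 1024; // 128K context
const fp16GiB = kvCacheBytes(16, ctx) / 1024 ** 3; // = 16 GiB
const twoBitGiB = kvCacheBytes(2, ctx) / 1024 ** 3; // = 2 GiB
const reduction = 1 - twoBitGiB / fp16GiB;          // = 0.875, i.e. 87.5%
```

Under these assumptions the cache shrinks from 16 GiB to 2 GiB at 128K tokens, which is what makes a single 24 GB RTX 4090 plausible.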


    Residual Quantization with Implicit Neural Codebooks

    ICML 2024 paper presenting a novel residual quantization approach using implicit neural codebooks for vector compression in high-dimensional similarity search, replacing traditional fixed codebooks with learned representations.

    ConstBERT

    Novel approach to reduce storage footprint of multi-vector retrieval by encoding each document with a fixed, smaller set of learned embeddings. Reduces index sizes by over 50% compared to ColBERT while retaining most effectiveness.

    Binary Quantization for Vector Search

    Compression technique that converts full-precision vectors to binary representations, achieving 32x storage reduction while maintaining 90-95% recall for efficient large-scale vector search.
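The 32x figure comes from replacing each 32-bit float with a single sign bit. A minimal sketch of the generic technique (sign-based binarization with Hamming-distance search, not any specific library's implementation):

```typescript
// Binary quantization: keep only the sign of each dimension, packed 8 per byte.
// A float32 (32 bits) becomes 1 bit, hence the 32x storage reduction.
function binarize(vec: number[]): Uint8Array {
  const packed = new Uint8Array(Math.ceil(vec.length / 8));
  vec.forEach((v, i) => {
    if (v > 0) packed[i >> 3] |= 1 << (i & 7); // bit = 1 for positive dims
  });
  return packed;
}

// Search uses Hamming distance: count differing bits between packed codes.
function hamming(a: Uint8Array, b: Uint8Array): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) { d += x & 1; x >>= 1; }
  }
  return d;
}

const query = binarize([0.3, -1.2, 0.7, 0.1, -0.5, 0.9, -0.2, 0.4]);
const doc   = binarize([0.1, -0.8, 0.6, -0.3, -0.4, 1.1, 0.2, 0.5]);
// signs differ only at dimensions 3 and 6 → Hamming distance 2
```

In practice the Hamming scan is used as a cheap first pass, with the reported 90–95% recall typically recovered by re-ranking the top candidates against full-precision vectors.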

    Compression Ratio Optimization

    Techniques for optimizing the trade-off between memory usage and accuracy in vector quantization, achieving 5-40x compression in systems like Mastra's Observational Memory.

    Observer-Reflector Architecture

    Memory-system architecture used in Mastra's Observational Memory, with two background agents that compress and garbage-collect conversation history, achieving 5–40× compression.