
LongMemEval
A comprehensive benchmark for evaluating long-term memory in chat assistants: 500 manually created questions testing information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention across 115K to 1.5M tokens of chat history.
Overview
LongMemEval is a comprehensive benchmark designed to evaluate five core long-term memory abilities of chat assistants: information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention. The benchmark consists of 500 manually created questions and was accepted to ICLR 2025.
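To give a feel for the data layout, here is a minimal sketch of loading one of the released question files and tallying questions by ability. The file name and the question_type field are assumptions based on the public release, not details confirmed by this page.

```python
import json
from collections import Counter

# Load one of the released question files (file name is an assumption;
# check the repository for the exact names, e.g. longmemeval_s.json).
with open("longmemeval_s.json") as f:
    instances = json.load(f)

# Each instance is assumed to carry a question, its gold answer, and a
# question_type label mapping onto the five tested abilities.
by_type = Counter(inst["question_type"] for inst in instances)

print(f"{len(instances)} questions total")
for qtype, count in by_type.most_common():
    print(f"  {qtype}: {count}")
```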
Benchmark Characteristics
The benchmark ships in two settings: LongMemEval_S, with roughly 115,000 tokens of interaction history per question, and LongMemEval_M, with up to 1.5 million tokens, both built from multi-session, multi-turn interactions with realistic distractors. LongMemEval poses a significant challenge to existing systems: commercial chat assistants and long-context LLMs show a 30% accuracy drop when recalling information across sustained interactions.
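The per-question history sizes are easy to check empirically. The sketch below estimates each instance's history length with tiktoken's cl100k_base encoding; the haystack_sessions field name and its list-of-sessions structure are assumptions about the release format.

```python
import json
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

with open("longmemeval_s.json") as f:
    instances = json.load(f)

for inst in instances[:5]:
    # haystack_sessions is assumed to be a list of sessions, each a list
    # of {"role": ..., "content": ...} turns.
    history = "\n".join(
        turn["content"]
        for session in inst["haystack_sessions"]
        for turn in session
    )
    n_tokens = len(enc.encode(history))
    print(f"{inst['question_id']}: ~{n_tokens:,} tokens of history")
```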
Recent 2026 Performance Breakthroughs
Several organizations have achieved remarkable results on this benchmark in early 2026:
- Supermemory (March 2026): Supermemory reported ~99% on LongMemEval_S using its ASMR (Agentic Search and Memory Retrieval) technique, a multi-agent orchestrated pipeline rather than traditional RAG.
- Mastra's Observational Memory (February 2026): With gpt-5-mini, Observational Memory scored 94.87%, the highest score recorded on this benchmark by any system with any model at the time. With GPT-4o, it achieved 84.23%, outperforming the oracle configuration.
- Emergence AI (February 2026): Using RAG-like methods, Emergence AI reported 86% accuracy on LongMemEval at latency comparable to competing systems, which scored lower in its comparison; a minimal sketch of this class of retrieval pipeline appears after this list.
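None of the systems above publish full implementations, but the RAG-like baseline class they are measured against is straightforward to sketch: embed each past session, retrieve the sessions most similar to the question, and answer from those alone. The snippet below is a minimal illustration using sentence-transformers embeddings; it is not any vendor's pipeline, and the session texts are placeholders.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Placeholder session summaries standing in for a real chat history.
sessions = [
    "User mentioned they adopted a cat named Miso in January.",
    "User discussed a work trip to Berlin and a missed flight.",
    "User asked for pasta recipes and said they are vegetarian.",
]
question = "What is the name of the user's pet?"

model = SentenceTransformer("all-MiniLM-L6-v2")
session_emb = model.encode(sessions, normalize_embeddings=True)
question_emb = model.encode([question], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors.
scores = session_emb @ question_emb
top_k = np.argsort(scores)[::-1][:2]

# The retrieved sessions would then be packed into an LLM prompt for answering.
for i in top_k:
    print(f"{scores[i]:.3f}  {sessions[i]}")
```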
The benchmark and code are publicly available on GitHub and Hugging Face for researchers to test their memory systems.
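For example, the data can be pulled programmatically via huggingface_hub; the repository id and file name below are assumptions to be checked against the official release.

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Repo id and filename are assumptions; see the LongMemEval GitHub
# README for the authoritative download instructions.
path = hf_hub_download(
    repo_id="xiaowu0162/longmemeval",
    filename="longmemeval_s.json",
    repo_type="dataset",
)
print("Downloaded to", path)
```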
Pricing
Open-source benchmark, free to use.