Build an Agent with Memory
Free spec for building a memory-augmented agent: four explicit memory tiers (working, episodic, semantic, procedural), a write tool the agent decides when to call, retrieval that blends recency and relevance, write-time consolidation, and a forgetting primitive. Framework-agnostic: pick the substrate in STACK.md (Postgres+pgvector, SQLite+sqlite-vss, Qdrant, or a specialized store such as Mem0, Letta, or Zep). Includes typed contracts, multi-session eval, and privacy compliance.
How to use this spec
- Click any row above to open the full task — title, description, subtasks, AI instructions, the works. Same layout the product uses internally.
- Hit Copy as Prompt in the right sidebar of any task. You'll get the XML-wrapped prompt Tekk uses internally — paste it into Cursor, Claude Code, Codex, ChatGPT, or anywhere else and the agent has the full task context.
- Open in sends the same prompt directly to v0, Lovable, Bolt, Magic Patterns, Replit, or Cursor with one click.
What you're building
Goal 1: Ship a working memory-augmented agent: four explicit memory tiers (working, episodic, semantic, procedural), a write tool the model calls deliberately, retrieval that ranks by recency AND relevance, write-time consolidation that distills old episodes into durable facts, typed conflict resolution, multi-session eval, right-to-be-forgotten, and production traces.
This is a memory-augmented agent pattern packaged as a framework-agnostic starter kanban. Make five tooling decisions in STACK.md (language, memory_substrate, embedder, model, consolidation_cadence); every later card reads your picks and runs the matching variant. Acceptance bar: the eval harness passes on 12 fixtures with cross-session recall >= 90%, contradiction handling correct on 6/6, retrieval p50 < 50ms in-memory or < 200ms on Postgres, top-K precision >= 85%, forget(user) removes every direct and derived record in one call, and the +/-5% regression bar holds after production hardening.
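The typed conflict resolution named in the goal can be sketched as follows. This is a minimal in-memory illustration, assuming the rule stated in the acceptance criteria on this page (facts overwrite with the old record marked superseded_by, opinions preserve, events append). The names `Kind`, `Record`, `ConflictResolver`, and the `key` field are hypothetical and not part of the spec.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Kind(Enum):
    FACT = "fact"        # new value wins, old one is marked superseded
    OPINION = "opinion"  # both stay live, distinguished by timestamp
    EVENT = "event"      # always appended


@dataclass
class Record:
    id: int
    key: str             # hypothetical subject key, e.g. "user.units"
    value: str
    kind: Kind
    created_at: datetime
    superseded_by: Optional[int] = None

    @property
    def is_valid(self) -> bool:
        return self.superseded_by is None


class ConflictResolver:
    """Assumed in-memory stand-in for whatever substrate STACK.md selects."""

    def __init__(self) -> None:
        self.records: list[Record] = []
        self._ids = itertools.count(1)

    def remember(self, key: str, value: str, kind: Kind) -> Record:
        new = Record(next(self._ids), key, value, kind, datetime.now(timezone.utc))
        if kind is Kind.FACT:
            # Facts overwrite: keep the old row for audit, but point it at its successor.
            for old in self.records:
                if old.kind is Kind.FACT and old.key == key and old.is_valid:
                    old.superseded_by = new.id
        # Opinions and events never touch existing rows.
        self.records.append(new)
        return new

    def live(self, key: str) -> list[Record]:
        return [r for r in self.records if r.key == key and r.is_valid]
```

Keeping the superseded row instead of deleting it is a design choice in this sketch: retrieval can drop it with an is_valid filter while the history stays inspectable.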
Architecture
flowchart TB
  User[User Turn] --> Assemble[assemble_context]
  Procedural[(Procedural<br/>skills + system)] --> Assemble
  Working[(Working<br/>last N turns)] --> Assemble
  Semantic[(Semantic<br/>durable facts)] -->|retrieve top-K| Assemble
  Episodic[(Episodic<br/>past events)] -->|retrieve top-K| Assemble
  Assemble --> Model[Model]
  Model -->|remember tool call| Resolver[Conflict Resolver]
  Resolver -->|append| Episodic
  Resolver -->|overwrite or preserve| Semantic
  Episodic -.write-time consolidation.-> Consolidator[Consolidation Job]
  Consolidator -.derived facts.-> Semantic
  Forget[forget user] -.cascade via derived_from.-> Episodic
  Forget -.cascade.-> Semantic
The three load-bearing decisions:
- Explicit memory types, not one blob. Working, episodic, semantic, and procedural memory each have their own namespace, write rules, and retrieval profile. This is the difference between a memory architecture and "we save the chat history".
- Write-time consolidation, not read-time. The expensive episodic-to-semantic distillation runs on the write path or in a background job, never inside retrieve, so the read hot path stays fast.
- Recency AND relevance retrieval. The score blend is α · semantic similarity + β · recency + γ · BM25, with a hard is_valid filter that drops superseded records. Pure cosine similarity returns stale-but-similar matches in production.
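The blended retrieval score can be sketched in a few lines. This is an illustration under stated assumptions, not the spec's implementation: the weights, the exponential half-life used for recency, and the requirement that the BM25 score arrive pre-normalized to [0, 1] are all choices made for this example, and `Candidate` and `retrieve` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    embedding: list[float]
    age_hours: float
    bm25: float           # assumed already normalized to [0, 1]
    is_valid: bool = True  # False once the record has been superseded


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recency(age_hours: float, half_life_hours: float = 72.0) -> float:
    # Assumed decay shape: the score halves every half_life_hours.
    return 0.5 ** (age_hours / half_life_hours)


def retrieve(query_embedding, candidates, k=3, alpha=0.6, beta=0.25, gamma=0.15):
    # Hard filter first: superseded records never compete, however similar they are.
    live = [c for c in candidates if c.is_valid]
    scored = [
        (
            alpha * cosine(query_embedding, c.embedding)
            + beta * recency(c.age_hours)
            + gamma * c.bm25,
            c,
        )
        for c in live
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

With two equally similar records, the recency term breaks the tie in favor of the newer one, and a superseded record is excluded even if it would otherwise score highest.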
When agent memory makes sense (and when it doesn't)
Agent memory is not free. Letta's own benchmarks show a plain filesystem scores 74% on memory tasks, beating specialized memory libraries. Reach for a deliberate memory architecture when the boundary earns the cost — otherwise ship a stateless agent and add memory after evals tell you it's worth it.
✓ Use agent memory when
- The user expects continuity across sessions. A coaching agent, a personal assistant, a long-running coding agent — the entire product promise breaks if turn 1 of session 5 doesn't remember turn 12 of session 1. Memory IS the feature.
- Personalization is a product feature, not a nice-to-have. Per-user preferences ("prefers metric units", "reviewed PR #842 last week") move the agent from generic to useful. Without a memory architecture, you're re-stuffing the entire history every turn — expensive and fragile.
- Token cost matters more than substrate cost. Replaying the full history every turn makes cumulative input tokens grow O(turns²) over a conversation. A memory layer that retrieves into context sends O(retrieved_k) records per turn, independent of history length. For high-turn-count users, the substrate cost is dwarfed by the token savings.
- Compliance requires explicit forget semantics. GDPR right-to-be-forgotten, HIPAA retention windows, SOC 2 data classification: these all need forget(user) as a first-class operation with cascade-delete on derived records. Stuffing chat history into context can't satisfy that audit.
- You've shipped a stateless version and have an eval baseline. Memory failure modes (stale facts, contradictions, retrieval misses) are silent; without an eval harness, you can't tell whether your memory layer helped or just added latency. Stateless agent first, eval harness second, memory third.
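The cascade-delete on derived records mentioned above can be sketched as a transitive walk over a derived_from link. This is a minimal in-memory illustration: `Record`, `MemoryStore`, and the single-argument `forget` signature are hypothetical, and the choice to delete a derived fact as soon as any one of its sources is forgotten is a conservative assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    id: str
    user_id: str
    tier: str                            # "episodic" or "semantic" in this sketch
    text: str
    derived_from: tuple[str, ...] = ()   # ids of the records this one was distilled from


class MemoryStore:
    def __init__(self) -> None:
        self.records: dict[str, Record] = {}

    def add(self, record: Record) -> None:
        self.records[record.id] = record

    def forget(self, user_id: str) -> int:
        """Delete the user's direct records and everything derived from them."""
        doomed = {r.id for r in self.records.values() if r.user_id == user_id}
        # Repeat until no new descendants are found, so chains of derivation are covered.
        changed = True
        while changed:
            changed = False
            for r in self.records.values():
                if r.id not in doomed and doomed.intersection(r.derived_from):
                    doomed.add(r.id)
                    changed = True
        for record_id in doomed:
            del self.records[record_id]
        return len(doomed)
```

The fixed-point loop is quadratic in the worst case; a real substrate would more likely use a recursive query or an index on derived_from, but the invariant is the same: after the call, nothing traceable to the user remains.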
⚠ Ship without memory when
- Tasks are single-session by design. A code-review agent that runs once per PR doesn't need memory of previous PRs unless you've explicitly designed for it. Don't bolt on memory because "it might be useful"; add an escalate path if the rare case appears.
- The context window holds the whole interaction. Sub-50-turn conversations with a 200K-token model can fit raw. The retrieval-accuracy degradation curve in long context is real, but at 50 turns you're nowhere near it. Don't add an architectural layer to save tokens you weren't going to spend.
- You don't have an eval harness yet. Memory bugs are silent killers — a stale fact gets retrieved, the model trusts it, the user loses trust in the agent, and you have no way to detect the bug in production. Eval harness before memory architecture, every time.
- The user-volume × turns × tokens math doesn't justify the substrate. Postgres+pgvector and Qdrant are not free to run. If your product has 100 weekly active users with 5 turns each, the memory substrate will likely cost more than the tokens it saves. Run the math before committing.
- You haven't shipped a stateless version first. Stateless → memory-augmented is the right migration order, never the reverse. Memory adds load-bearing complexity (consolidation, conflict resolution, forgetting) — adding it before you understand the actual conversation patterns produces an architecture that fights the workload.
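One way to run that math is to compare cumulative input tokens for full-history replay against a bounded working window plus top-K retrieval. The sketch below counts input tokens only and ignores output tokens and substrate cost; every number in it (tokens per turn, window size, K, tokens per retrieved record) is an illustrative assumption, and the function names are hypothetical.

```python
def replay_tokens(turns: int, tokens_per_turn: int) -> int:
    # Turn t re-sends all t turns so far, so the total is a triangular sum: O(turns^2).
    return tokens_per_turn * turns * (turns + 1) // 2


def retrieval_tokens(
    turns: int,
    tokens_per_turn: int,
    working_window: int,
    k: int,
    tokens_per_record: int,
) -> int:
    # Each turn sends at most working_window recent turns plus k retrieved records.
    total = 0
    for t in range(1, turns + 1):
        total += min(t, working_window) * tokens_per_turn + k * tokens_per_record
    return total


if __name__ == "__main__":
    for turns in (5, 200):
        full = replay_tokens(turns, tokens_per_turn=300)
        mem = retrieval_tokens(
            turns, tokens_per_turn=300, working_window=10, k=5, tokens_per_record=80
        )
        print(f"{turns} turns: replay={full} tokens, memory={mem} tokens")
```

Under these assumptions a 5-turn conversation sends fewer tokens with plain replay (4,500 versus 6,500), while a 200-turn conversation sends roughly nine times more (6,030,000 versus 666,500). The crossover point depends entirely on the numbers you plug in.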
How to know it's working
Ship the spec, then measure on these criteria (the eval harness task grades them):
- Cross-session recall >= 90% on 4 user-preference fixtures (session 1 establishes a preference, session N asks; model output includes the preference)
- Contradiction handling correct on 6/6 fixtures — facts overwrite (new wins, old marked superseded_by), opinions preserve (both stay live with timestamps), events always append
- Retrieval p50 latency < 50ms in-memory or < 200ms Postgres on 1000 records per user namespace
- Retrieval top-K precision >= 85% — relevant record in top-3 of retrieve results across the 3 precision fixtures
- forget(user_id, scope=ALL) removes every direct and derived record within one call (verified by post-call retrieve returning 0 results, including consolidated semantic facts derived from the user's episodes)
- Per-user namespace stays under MAX_NAMESPACE_BYTES (default 50MB) after the storage-limit enforcer runs — eviction order is importance ASC, last_accessed_at ASC
- Substrate circuit breaker opens after 3 consecutive retrieval failures in one run_id; assemble_context falls back to procedural + working memory only and emits a memory_degraded trace event
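The circuit-breaker criterion in the last bullet can be sketched as a per-run failure counter wrapped around retrieval. This is a minimal illustration: `RetrievalBreaker`, the argument list of `assemble_context`, and the list-based trace sink are assumptions made for this example, and the sketch omits any half-open or recovery state.

```python
from collections import defaultdict

FAILURE_THRESHOLD = 3  # consecutive retrieval failures within one run_id


class RetrievalBreaker:
    def __init__(self, threshold: int = FAILURE_THRESHOLD) -> None:
        self.threshold = threshold
        self._failures = defaultdict(int)  # run_id -> consecutive failure count

    def is_open(self, run_id: str) -> bool:
        return self._failures[run_id] >= self.threshold

    def record_success(self, run_id: str) -> None:
        self._failures[run_id] = 0  # any success resets the streak

    def record_failure(self, run_id: str) -> None:
        self._failures[run_id] += 1


def assemble_context(run_id, procedural, working, retrieve, breaker, trace):
    """Build the context; `retrieve` is a zero-argument callable hitting the substrate."""
    context = {"procedural": procedural, "working": working, "retrieved": []}
    if not breaker.is_open(run_id):
        try:
            context["retrieved"] = retrieve()
            breaker.record_success(run_id)
        except Exception:
            # This turn proceeds without retrieved memory; the failure is counted.
            breaker.record_failure(run_id)
    if breaker.is_open(run_id):
        # Open breaker: procedural + working memory only, and say so in the trace.
        trace.append({"event": "memory_degraded", "run_id": run_id})
    return context
```

Once the breaker is open, the substrate is no longer called for that run_id, so a failing store cannot add latency to every remaining turn.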
Sources
Every claim, pattern, and acceptance threshold on this page maps back to one of these. Read them before deviating from the spec.
- Mem0: universal memory layer for AI agents (mem0ai, GitHub)
- Letta (formerly MemGPT): stateful agents platform (letta-ai, GitHub)
- Zep: context engineering with temporal knowledge graphs (getzep, GitHub)
- Cognee: memory control plane for AI agents (topoteretes, GitHub)
- LangGraph memory concepts: short-term vs long-term, BaseStore, namespaces (LangGraph Docs)
- MemGPT: Towards LLMs as Operating Systems, Packer et al., 2023 (arXiv)
- Launching long-term memory support in LangGraph (LangChain Blog)
- Memory blocks: Letta's architecture (Letta Blog)
- State of AI Agent Memory 2026 (Mem0 Blog)
- The New Reality of Agent Memory: The Complete Guide (2026) (Sitepoint)
- langmem summarization: short-term consolidation primitive (LangChain Docs)
- MemGPT: Towards LLMs as Operating Systems, HN discussion (Hacker News)
- AI Memory Architectures: Why MemGPT Outperformed OpenAI's Approaches (Hacker News)
- Ask HN: Mem0 stores memories, but doesn't learn user patterns (Hacker News)
Build this in your codebase tonight
Sign up — Tekk reads your repo, picks your stack from the five decisions in STACK.md, and writes a personalized version of this 10-task spec. Same architecture, your patterns, your dependencies. Want to do it yourself? Open any task above and hit Copy as Prompt — paste into Cursor, Claude Code, or Codex.