Long-term memory that actually persists — how ToRun remembers what matters
ToRun Team, Author
Most AI products claim to "remember you." What they usually mean is that the current conversation window has not scrolled out of range yet. Once you close the tab, the model has no idea who you are. You re-explain your stack, your preferences, and your constraints from scratch every single time.
ToRun's long-term memory is a different mechanism entirely. This post describes what it stores, how it gets assembled into every turn, and what you can do with it.
What gets stored and why
Memory in ToRun is a set of durable facts extracted from your conversations. Think of them as structured notes — not a raw transcript dump, but discrete observations: you prefer concise answers, your project uses MongoDB, you work in TypeScript, you have a background in ML, you want code examples in a specific style.
Facts are language-agnostic. If you write in Turkish for three sessions and English for ten, the memory system does not fracture into per-locale silos. Unicode-aware normalization (NFKD decomposition, diacritic stripping, case-folding) ensures that a fact entered in one language can surface when you query in another. This matters in practice: a 29-locale platform cannot afford a memory that only works reliably for English speakers.
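To make the normalization steps concrete, here is a minimal sketch using only Python's standard library. The function name `normalize_text` is an assumption for illustration, not ToRun's actual code. Note that normalization on its own only aligns spelling variants of the same word (accents, case, full-width forms); it does not translate between languages.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Reduce text to a comparison key: NFKD, no diacritics, case-folded."""
    # NFKD splits accented letters into a base letter plus combining marks
    # and maps compatibility forms (such as full-width letters) to plain ones.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping the combining marks is what removes the diacritics.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a stronger lower() intended for caseless matching.
    return stripped.casefold()


print(normalize_text("Müşteri"))   # musteri
print(normalize_text("İstanbul"))  # istanbul
```

With a key like this, a fact stored as "Müşteri" and a query typed as "musteri" compare equal.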
New facts can be added automatically as the model notices durable information, or you can assert them explicitly ("remember that I deploy to Azure"). Neither path is hidden from you.
How memory becomes context at call time
Every chat turn in ToRun assembles a three-layer context before the prompt reaches a model:
- System layer — product-level instructions, persona configuration, any active knowledge base instructions.
- Summary layer — a compressed representation of the current conversation so far, managed automatically so that long threads do not exhaust the model's context window.
- Recent layer — the most recent N turns verbatim, weighted toward the last thing you said.
Long-term memory slots into the system layer. The retrieval step runs against your stored facts using a relevance signal from the current message: if you ask about database indexing, facts about MongoDB preferences surface; if you switch to asking about design, those recede. The model sees only what is relevant to the current turn, not a flat dump of every fact you have ever generated. This uses the context window efficiently and avoids "memory bloat," where irrelevant personal details dilute the signal.
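The retrieval idea can be sketched with a deliberately simple relevance signal. The post does not say how ToRun scores relevance, so the keyword overlap below is a stand-in, and the `Fact` type, its `topics` field, and `select_relevant` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    topics: frozenset[str]  # hypothetical keyword tags attached to a fact


def select_relevant(facts: list[Fact], message: str, limit: int = 3) -> list[Fact]:
    """Return up to `limit` facts whose topics overlap the message's words."""
    words = {w.strip(".,?!").casefold() for w in message.split()}
    scored = [(len(f.topics & words), f) for f in facts]
    # Keep only facts sharing at least one term, highest overlap first.
    matching = [(score, f) for score, f in scored if score > 0]
    matching.sort(key=lambda pair: pair[0], reverse=True)
    return [f for _, f in matching[:limit]]


facts = [
    Fact("Project uses MongoDB", frozenset({"mongodb", "database", "indexing"})),
    Fact("Prefers minimalist UI", frozenset({"design", "ui"})),
]
for fact in select_relevant(facts, "How should I approach database indexing?"):
    print(fact.text)  # Project uses MongoDB
```

A production system would more plausibly use embeddings or a search index, but the shape is the same: score each fact against the current message and pass only the top matches into the system layer.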
Context budgets are tier-dependent: the Free tier gets 16K tokens of total context, and subscription plans scale up to 1M tokens on Enterprise. When the assembled context (memory + summary + recent turns) would exceed the model's window, ToRun clamps the summary layer first, never silently dropping your most recent messages or your most relevant memory facts.
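The clamping order can be illustrated as follows. This is a sketch under stated assumptions: `count_tokens` is a crude word count standing in for a real tokenizer, `fit_to_budget` is a hypothetical name, and keeping the leading part of the summary is an arbitrary choice made for the example.

```python
def count_tokens(text: str) -> int:
    # Stand-in only: counts whitespace-separated words, not real model tokens.
    return len(text.split())


def fit_to_budget(system: str, summary: str, recent: list[str], budget: int) -> dict[str, object]:
    """Assemble the three layers, trimming only the summary to fit `budget`."""
    # The system layer (including memory facts) and recent turns are kept whole.
    fixed = count_tokens(system) + sum(count_tokens(turn) for turn in recent)
    room_for_summary = max(budget - fixed, 0)
    # Only the summary is shortened; here we keep its leading words.
    clamped = " ".join(summary.split()[:room_for_summary])
    return {"system": system, "summary": clamped, "recent": recent}


context = fit_to_budget(
    system="Be concise. User works with MongoDB.",
    summary="Earlier the user described a sharded cluster and slow queries.",
    recent=["Which index should I add?"],
    budget=15,
)
print(context["summary"])  # Earlier the user described
```

The sketch does not cover the case where the system layer and recent turns alone exceed the budget, since the post does not describe what happens then.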
You own the data, you control the facts
Memory is user-owned. You can list everything ToRun has stored about you and delete individual facts or all of them. There is no "secret" memory that influences your conversations without your knowledge. The fact store is inspectable and forgettable on demand.
This design follows from a broader principle: ToRun should show you its real state, not hide it. Transparent memory means you can trust it. If the model uses a fact that is wrong — say it once remembered you preferred verbose explanations but now you want concise ones — you tell it to forget and the next call will not see that fact.
Deletion is permanent and immediate. There is no grace period where the old fact still surfaces.
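As a rough illustration of an inspectable, forgettable fact store, consider the sketch below. `MemoryStore` and its method names are hypothetical and do not describe ToRun's API; the point is only that listing and deletion act directly on the same data the retrieval step would read.

```python
class MemoryStore:
    """Hypothetical in-memory fact store, for illustration only."""

    def __init__(self) -> None:
        self._facts: dict[int, str] = {}
        self._next_id = 1

    def remember(self, text: str) -> int:
        """Store a fact and return its id."""
        fact_id = self._next_id
        self._next_id += 1
        self._facts[fact_id] = text
        return fact_id

    def list_facts(self) -> dict[int, str]:
        """Return a copy of everything stored, so nothing is hidden."""
        return dict(self._facts)

    def forget(self, fact_id: int) -> bool:
        """Delete one fact immediately; returns False if it did not exist."""
        return self._facts.pop(fact_id, None) is not None

    def forget_all(self) -> None:
        """Delete every stored fact."""
        self._facts.clear()


store = MemoryStore()
old = store.remember("Prefers verbose explanations")
store.remember("Deploys to Azure")
store.forget(old)
print(store.list_facts())  # {2: 'Deploys to Azure'}
```

Because `forget` removes the entry from the only place facts live, any later lookup cannot return it, which mirrors the "no grace period" behavior described above.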
Knowledge bases are not the same thing as memory
Memory stores facts about you. Knowledge bases ground answers in your documents.
When you upload documentation, code, or domain content to a knowledge base, ToRun uses retrieval-augmented generation (RAG) to pull in the relevant chunks at query time. The retrieved text gets inserted alongside the conversation context, and the model cites what it used. This is the right tool when your question requires reading source material — an API spec, a legal document, your internal runbook — rather than just recalling a personal preference.
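A toy version of the retrieve-then-cite flow might look like this. It is a sketch, not ToRun's pipeline: the term-overlap scoring stands in for whatever retrieval the product uses, and `Chunk`, `retrieve`, `build_grounded_prompt`, and the file names are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # hypothetical document name the chunk came from
    text: str


def _score(chunk: Chunk, terms: set[str]) -> int:
    # Count how many words in the chunk also appear in the query.
    return sum(1 for w in chunk.text.casefold().split() if w.strip(".,?!") in terms)


def retrieve(chunks: list[Chunk], query: str, k: int = 2) -> list[Chunk]:
    """Return up to `k` chunks with the highest term overlap with the query."""
    terms = {w.strip(".,?!").casefold() for w in query.split()}
    ranked = sorted(chunks, key=lambda c: _score(c, terms), reverse=True)
    return [c for c in ranked[:k] if _score(c, terms) > 0]


def build_grounded_prompt(question: str, retrieved: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it."""
    numbered = [f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(retrieved, start=1)]
    header = "Answer using only the sources below and cite them by number."
    return "\n".join([header, *numbered, f"Question: {question}"])


chunks = [
    Chunk("runbook.md", "Restart the worker with systemctl restart worker."),
    Chunk("api-spec.md", "The /orders endpoint returns paginated results."),
]
hits = retrieve(chunks, "restart worker", k=1)
print(build_grounded_prompt("How do I restart the worker?", hits))
```

Numbering the chunks in the prompt is one simple way to let the answer point back at its sources.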
The two systems compose: a single turn can draw on both your personal memory facts and a knowledge base retrieval. They feed different layers of the context assembly and serve different purposes. Memory answers "who are you and how do you prefer to work"; the knowledge base answers "what does this document say."
Using both together is what makes ToRun useful for real workflows rather than demo scenarios. Your preferences stay in memory so you do not re-state them; your source material stays in the knowledge base so answers stay grounded in evidence rather than the model's training data.