Decay in AI Memory Systems: From Ebbinghaus to Cognitive Sleep Cycles
A Survey of Forgetting Mechanisms and Their Application to Persistent Agent Memory
Abstract
Long-running AI agents accumulate vast amounts of information, yet biological memory systems have evolved not to maximize retention but to optimize decision-making through selective forgetting. This paper surveys decay mechanisms across psychology, neuroscience, and machine learning, demonstrating that controlled forgetting is essential for intelligent memory systems. We trace the empirical foundations from Ebbinghaus's 1885 forgetting curve through ACT-R's base-level activation equation to modern continual learning approaches. Neuroscience research reveals that sleep-dependent consolidation actively transforms memories while synaptic homeostasis prunes weak connections. Machine learning has discovered parallel benefits: forgetting acts as regularization, improving generalization while enabling privacy compliance. We synthesize these findings into the Cognitive Sleep Cycle—a biologically-inspired memory maintenance workflow implementing consolidation, decay, and synthesis phases. Our analysis establishes design principles for decay in persistent agent memory: power-law decay functions, tiered half-lives by memory type, access-based strengthening, and soft decay affecting ranking rather than deletion.
1. Introduction
1.1 The Paradox of Forgetting
Long-running AI agents face a fundamental challenge: how to maintain useful memory over time without drowning in accumulated noise. Conventional wisdom treats memory loss as a system failure, yet biological memory systems have evolved sophisticated forgetting mechanisms that actively improve cognitive function.
This paper surveys the landscape of memory decay mechanisms—from psychological foundations to neuroscience to modern AI implementations—and argues that controlled forgetting is essential for intelligent memory systems. We introduce the Cognitive Sleep Cycle as a practical implementation of biologically-inspired memory maintenance, integrating consolidation, decay, and synthesis phases.
1.2 Thesis
Effective memory systems require not just storage and retrieval, but active decay mechanisms that:
- Reduce noise by deprioritizing stale, unused content
- Create implicit pressure to validate or discard unverified information
- Maintain retrieval quality as corpus size grows
- Mirror biological memory consolidation through periodic maintenance cycles
1.3 Contributions
- A comprehensive survey of decay mechanisms across psychology, neuroscience, and AI
- A comparative analysis of decay functions: exponential vs. power law vs. tiered approaches
- Introduction of the Cognitive Sleep Cycle: a biologically-inspired memory maintenance workflow
- Design principles for decay in persistent agent memory systems
2. Background: The Psychology of Forgetting
2.1 Ebbinghaus and the Forgetting Curve
Hermann Ebbinghaus conducted the first systematic experimental study of memory between 1879 and 1885, publishing the results in his landmark work Über das Gedächtnis (Memory: A Contribution to Experimental Psychology). Using himself as the sole subject, Ebbinghaus memorized lists of nonsense syllables (consonant-vowel-consonant trigrams like "WID", "ZOF") to eliminate the confounding effects of prior associations.
His key innovation was the savings method: measuring how much faster relearning occurred compared to initial learning. This revealed the characteristic forgetting curve—rapid initial forgetting that gradually slows over time. Ebbinghaus's original formula was logarithmic:

b = 100k / ((log t)^c + k)

where b is memory savings, t is time, and k and c are fitting constants.
Key findings from Ebbinghaus (1885):
- After 20 minutes: ~42% forgotten
- After 1 hour: ~56% forgotten
- After 24 hours: ~67% forgotten
- After 1 week: ~75% forgotten
- After 31 days: ~79% forgotten
Modern replications have validated these findings. Murre & Dros (2015) successfully replicated Ebbinghaus's curve, notably observing a "jump upward" at the 24-hour mark, suggesting sleep-dependent consolidation effects that are invisible at shorter intervals.
2.2 The Power Law of Forgetting
While Ebbinghaus's original formulation was logarithmic, subsequent research established that forgetting follows a power law rather than exponential decay.
Wixted & Ebbesen (1991) demonstrated in "On the Form of Forgetting" that retention across diverse memory tasks follows:

R = a × t^(-b)

where R is retention, t is time, and a and b are constants. This power function fits data from word recall, face recognition, and even pigeon matching-to-sample experiments. Critically, Ebbinghaus's own data—when reanalyzed—better fits the power law than his original logarithmic formulation.
The distinction matters for implementation: exponential decay (R = e^(-t/τ)) implies a constant half-life, while power law decay produces rapid early forgetting followed by much slower long-term decay. The power law better matches human memory, where very old memories can persist indefinitely even without rehearsal.
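To make the contrast concrete, the following Python sketch tabulates both forms over a range of retention intervals (parameter values are illustrative, not fitted to any dataset); the exponential curve collapses toward zero within days, while the power law retains a long tail:

```python
import math

def exponential_retention(t_hours, tau=24.0):
    """R = e^(-t/tau): constant half-life of tau * ln(2) hours."""
    return math.exp(-t_hours / tau)

def power_law_retention(t_hours, a=1.0, b=0.5):
    """R = a * t^(-b): rapid early forgetting, then a long slow tail."""
    return a * max(t_hours, 1.0) ** (-b)   # clamp t >= 1 to avoid divergence at t = 0

if __name__ == "__main__":
    print(f"{'hours':>6} {'exponential':>12} {'power law':>10}")
    for hours in (1, 4, 24, 168, 720, 4320):   # 1 h, 4 h, 1 d, 1 wk, 30 d, 180 d
        print(f"{hours:>6} {exponential_retention(hours):>12.4f} "
              f"{power_law_retention(hours):>10.4f}")
```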
Wozniak (SuperMemo) argues that individual memories exhibit exponential decay when sorted by stability, but averaging across heterogeneous memories produces the observed power law. This two-component model reconciles the mathematical forms.
2.3 The Spacing Effect and Retrieval Strengthening
Ebbinghaus also discovered the spacing effect: distributed practice produces better retention than massed practice. This finding has been replicated extensively.
Cepeda et al. (2006) conducted a meta-analysis of 271 comparisons and found that spaced practice outperformed massed practice in 96% of cases. The benefit increases with longer retention intervals.
Cepeda et al. (2008) established the "10-20% Rule": the optimal inter-study interval is approximately 10-20% of the desired retention interval. For 1-year retention, study sessions should be spaced 36-73 days apart.
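As a worked check of that rule, a trivial sketch (the 10-20% band is taken directly from the text above):

```python
def optimal_spacing_days(retention_interval_days, band=(0.10, 0.20)):
    """Cepeda et al.'s (2008) 10-20% rule: the optimal gap between study
    sessions is roughly 10-20% of the desired retention interval."""
    lo, hi = band
    return lo * retention_interval_days, hi * retention_interval_days

print(optimal_spacing_days(365))   # -> (36.5, 73.0) days for one-year retention
```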
The testing effect (retrieval practice) compounds with spacing. Each successful retrieval strengthens the memory trace more than passive review. Bjork's New Theory of Disuse distinguishes:
- Storage strength: How well-learned something is (stable, only increases)
- Retrieval strength: How accessible it currently is (variable, decays without use)
Forgetting reflects temporary retrieval failure, not permanent storage loss. Critically, retrieval practice restores access—validating the design of access-based decay modulation in AI systems.
Jost's Law (1897) provides additional insight:
- If two memories have equal strength but different ages, repetition strengthens the older one more
- Given equal strength, older memories decay more slowly
This has implications for memory system design: established knowledge should be harder to overwrite than recent acquisitions.
3. Literature Review
3.1 ACT-R and Cognitive Architectures
ACT-R (Adaptive Control of Thought—Rational), a cognitive architecture developed by John Anderson at Carnegie Mellon University, provides one of the most rigorous computational models of human memory decay. Its mathematical formulations have been validated across hundreds of experiments.
Key sources:
- Anderson & Lebiere (1998) - The Atomic Components of Thought - foundational text
- Anderson et al. (2004) - An integrated theory of the mind - Psychological Review
- Anderson (2007) - How Can the Human Mind Occur in the Physical Universe? - ACT-R 6.0
Base-Level Activation Equation
The core of ACT-R's declarative memory is the base-level activation equation:

B_i = ln(Σ t_j^(-d))

Where:
- B_i = base-level activation of memory chunk i
- n = number of presentations/accesses (frequency effect)
- t_j = time since the j-th presentation (recency effect), summed over j = 1…n
- d = decay parameter (empirically d = 0.5)

For uniformly distributed accesses, this simplifies to the optimized learning approximation:

B_i = ln(n / (1 - d)) - d·ln(L)

where L is the lifetime of the memory (time elapsed since its creation).
The 0.5 Decay Parameter
The decay parameter d = 0.5 is not arbitrary but empirically determined across extensive experimentation. When d = 0.5:
- Memory strength decays as 1/√t
- Memory contribution halves when time quadruples (not doubles)
- This produces the characteristic "rapid early forgetting, slow late forgetting" curve
| Time since access | Relative contribution |
|---|---|
| 1 second | 1.000 |
| 4 seconds | 0.500 |
| 16 seconds | 0.250 |
| 100 seconds | 0.100 |
Fisher et al. (2018) validated approximations of the base-level activation equation in Computational Brain & Behavior.
Retrieval Strength and Access Frequency
ACT-R's retrieval probability follows a logistic function:

P = 1 / (1 + e^((τ - A)/s))

Where τ is the retrieval threshold, A is total activation, and s is noise. When activation equals threshold, retrieval probability is 50%.
The equation captures how both frequency (more accesses add terms to the sum) and recency (recent accesses have larger t^{-d} contributions) determine retrieval success. This directly informs the design of access-based decay modulation in AI memory systems.
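A minimal Python sketch of these two equations (function and parameter names are assumptions; access times are supplied as ages in seconds):

```python
import math

def base_level_activation(ages_seconds, d=0.5):
    """B_i = ln(Σ t_j^(-d)): more presentations (frequency) and smaller
    ages (recency) both raise activation."""
    return math.log(sum(t ** (-d) for t in ages_seconds))

def retrieval_probability(activation, threshold=0.0, noise_s=0.4):
    """Logistic retrieval probability: P = 1 / (1 + e^((τ - A)/s))."""
    return 1.0 / (1.0 + math.exp((threshold - activation) / noise_s))

# A chunk accessed 10 s, 100 s, and 1000 s ago
ages = [10.0, 100.0, 1000.0]
A = base_level_activation(ages)
print(f"activation = {A:.3f}, P(retrieval) = {retrieval_probability(A):.3f}")
```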
3.2 Sleep and Memory Consolidation
Neuroscience research has revealed that sleep is not merely passive protection of memories but an active transformation process that consolidates, integrates, and prunes memory traces.
Key sources:
- Diekelmann & Born (2010) - The memory function of sleep - Nature Reviews Neuroscience
- Rasch & Born (2013) - About Sleep's Role in Memory - Physiological Reviews
- Tononi & Cirelli (2014) - Sleep and the Price of Plasticity - Neuron
- Richards & Frankland (2017) - The Persistence and Transience of Memory - Neuron
- Walker & Stickgold (2006) - Sleep, Memory, and Plasticity - Annual Review of Psychology
Hippocampal Replay
During sleep, memories are reactivated through coordinated neural oscillations:
- Sharp-wave ripples (100-300 Hz) in hippocampus coordinate memory reactivation
- Thalamo-cortical spindles (10-15 Hz) facilitate transfer to neocortex
- Slow oscillations (~0.8 Hz) provide top-down timing coordination
- Spindle-ripple coupling: ripples nest in spindle troughs for precise memory transfer
This replay is not mere repetition—it promotes both quantitative and qualitative changes to memory representations, extracting gist and integrating with prior knowledge (schema formation).
Slow-Wave Sleep and Memory Transfer
Rasch & Born's two-stage memory model distinguishes:
- Stage 1 (Fast): Hippocampus performs rapid encoding, temporary storage
- Stage 2 (Slow): Neocortex learns gradually, providing long-term storage
The hippocampus acts as an "internal trainer" of the neocortex during sleep. This is not passive protection but active transformation—memories are restructured, generalized, and integrated with existing knowledge.
Evidence: Odor-cued reactivation during slow-wave sleep enhanced declarative memory consolidation. TMS-enhanced slow oscillations improved memory outcomes.
The Synaptic Homeostasis Hypothesis
Tononi & Cirelli (2014) proposed the Synaptic Homeostasis Hypothesis (SHY):
"Sleep is the price the brain pays for plasticity."
Core claims:
- Wake = net synaptic potentiation: Learning requires strengthening connections
- Sleep = net synaptic depression: Renormalization restores homeostasis
- Down-selection: Fittest synapses survive, weak ones are pruned
Rationale:
- Stronger synapses consume more energy and resources
- Increased synaptic strength reduces neuronal selectivity (noise)
- Wake provides "current sampling" (biased by recent experience)
- Sleep provides "comprehensive sampling" (brain's entire knowledge base)
Empirical evidence:
- GluA1-AMPAR receptor levels 30-40% higher after wake than sleep
- Cortical evoked response slopes increase with wake, decrease with sleep
- Dendritic spine density increases during wake, decreases during sleep (in adolescent mice)
- Drosophila: spines increase with enriched waking, decrease only if sleep is allowed
REM Sleep and Memory Integration
Walker & Stickgold documented REM sleep's distinct roles:
Emotional memory: REM preferentially consolidates emotionally-charged memories, correlated with right-dominant prefrontal theta power. The "sleep to forget, sleep to remember" hypothesis suggests REM strengthens memory content while decreasing emotional reactivity.
Procedural memory: Visual discrimination correlates with SWS plus late-night REM. Motor sequence learning correlates with stage 2 NREM (spindle-rich). Complex motor skills benefit from REM.
The Role of Forgetting in Memory Optimization
Richards & Frankland (2017) made a revolutionary argument:
"The goal of memory is NOT information transmission through time. The goal is to optimize decision-making."
Two benefits of forgetting:
- Enhances flexibility: Reduces influence of outdated information
- Prevents overfitting: Promotes generalization over specific episodes
Neurobiological mechanisms:
- Adult neurogenesis in hippocampus remodels circuits and overwrites old memories
- This explains high childhood forgetting (high neurogenesis rate)
- Synaptic decay through disuse
Machine learning parallel: Forgetting acts as regularization—it prevents memorization of noise and enables extraction of statistical regularities. This insight directly validates decay mechanisms in AI memory systems.
3.3 Forgetting in Machine Learning
Machine learning has grappled with forgetting from two perspectives: preventing catastrophic forgetting (harmful) and enabling intentional forgetting (beneficial).
Key sources:
- McCloskey & Cohen (1989) - Catastrophic Interference in Connectionist Networks - Psychology of Learning and Motivation
- Kirkpatrick et al. (2017) - Overcoming catastrophic forgetting - PNAS
- Bourtoule et al. (2021) - Machine Unlearning - IEEE S&P
- Yang et al. (2024) - A Comprehensive Survey of Forgetting in Deep Learning - IEEE TPAMI
- Wang et al. (2024) - A Comprehensive Survey of Continual Learning - IEEE TPAMI
Catastrophic Forgetting
McCloskey & Cohen (1989) first documented catastrophic interference: learning new tasks completely destroyed prior knowledge in backpropagation networks. In their experiment, training a network on "twos addition" completely erased its ability to perform "ones addition."
This established the stability-plasticity dilemma: networks must balance sensitivity to new information against stability of old knowledge. As they noted: "At least some interference will occur whenever new learning alters weights involved in representing old learning."
Continual Learning and Elastic Weight Consolidation
Kirkpatrick et al. (2017) introduced Elastic Weight Consolidation (EWC), inspired by synaptic consolidation in biological brains:
Mechanism: Uses the Fisher information matrix to identify important weights, then implements a soft quadratic constraint pulling weights toward old values proportional to their importance.
Storage: Three values per synapse—weight, variance, mean—mirroring biological synapses.
Result: Enables continual learning across sequential tasks without catastrophic forgetting.
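As an illustration of the regularization term only (not the authors' reference implementation), a NumPy sketch of the EWC penalty, where `fisher_diag` is a diagonal Fisher information estimate and `lam` is a hypothetical importance coefficient:

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher_diag, lam=1000.0):
    """Quadratic EWC penalty: (lam / 2) * Σ_i F_i * (θ_i - θ*_i)^2.

    Weights with high Fisher information (important for the old task) are
    pulled strongly toward their previous values; unimportant weights stay
    free to adapt to the new task."""
    return 0.5 * lam * np.sum(fisher_diag * (theta - theta_old) ** 2)

theta = np.array([0.9, -0.2])        # current weights while learning task B
theta_old = np.array([1.0, 0.0])     # weights after task A
fisher = np.array([5.0, 0.01])       # first weight mattered for task A
print(ewc_penalty(theta, theta_old, fisher))  # penalty dominated by the important weight
```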
Wang et al. (2024) surveyed five main continual learning approaches:
| Method | Description |
|---|---|
| Memory-based | Replay old data during new task training |
| Architecture-based | Expand network capacity for new tasks |
| Regularization-based | Constrain weight updates (EWC, SI, MAS) |
| Subspace-based | Learn in orthogonal subspaces |
| Bayesian | Probabilistic uncertainty estimation |
Intentional Forgetting and Machine Unlearning
GDPR Article 17 establishes the "right to be forgotten," creating a legal requirement for machines to unlearn specific data.
Bourtoule et al. (2021) introduced the SISA framework (Sharded, Isolated, Sliced, Aggregated):
How it works: Training data is divided into shards (separate models), then slices (with checkpoints). Unlearning requires only retraining the affected shard from the relevant checkpoint.
Performance: 4.63× speedup on Purchase dataset, 2.45× on SVHN versus full retraining.
The challenge: Model weights are not structured like databases—knowledge is embedded in distributed representations, making true "deletion" difficult. Solutions include differential privacy (to limit memorization), output filtering, and federated learning.
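A toy sketch of the shard-slice-checkpoint structure (purely illustrative: the per-shard "model" here is just a running mean, whereas SISA trains a real learner per shard and aggregates their predictions):

```python
def train_shard(slices):
    """Return per-slice checkpoints (count, total) for a running-mean 'model'."""
    count, total, checkpoints = 0, 0.0, []
    for sl in slices:
        for x in sl:
            count, total = count + 1, total + x
        checkpoints.append((count, total))
    return checkpoints

def unlearn(shards, checkpoints, shard_id, slice_id, value):
    """Remove one training value: roll back to the checkpoint preceding its
    slice, then retrain only the remaining slices of that single shard."""
    count, total = checkpoints[shard_id][slice_id - 1] if slice_id > 0 else (0, 0.0)
    shards[shard_id][slice_id].remove(value)
    for sl in shards[shard_id][slice_id:]:
        for x in sl:
            count, total = count + 1, total + x
    return total / count if count else 0.0   # retrained shard "model"

shards = [[[1.0, 2.0], [3.0]], [[10.0], [20.0, 30.0]]]   # 2 shards x 2 slices
ckpts = [train_shard(s) for s in shards]
print(unlearn(shards, ckpts, shard_id=0, slice_id=1, value=3.0))  # shard 1 untouched
```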
The Benefits of Forgetting
Yang et al. (2024) distinguished harmful forgetting (unwanted knowledge loss) from beneficial forgetting (improves performance):
Benefits of controlled forgetting:
- Regularization: Reduces overfitting to noisy examples
- Generalization: Removing noise improves test performance
- Privacy compliance: Legitimate removal of sensitive data
- Efficiency: Smaller effective training set
Their research found that some training examples are forgotten frequently while others are never forgotten. Crucially, "unforgettable" examples can be removed from training without hurting generalization—forgetting patterns generalize across neural architectures.
3.4 Temporal Knowledge Graphs
Knowledge graphs increasingly incorporate temporal dimensions to model evolving facts and relationships.
Key sources:
- Jiang et al. (2016) - Towards Time-Aware Knowledge Graph Completion - COLING
- A Survey on Temporal Knowledge Graphs (2024) - arXiv
- EAGLE (2025) - Temporal Link Prediction - VLDB
Temporal Knowledge Graph Embeddings
Translation-based methods extend classic KG embeddings with temporal information:
| Model | Year | Innovation |
|---|---|---|
| TTransE | 2018 | Concatenates temporal info to relations: score = ||h + r + τ - t|| |
| TA-TransE | 2018 | Uses LSTM to learn temporal relation sequences |
| HyTE | 2018 | Projects entities onto temporal hyperplanes |
| TE-TransR/TE-TransT | 2024 | Elevates timestamps to same significance as entities/relations |
Jiang et al. (2016) introduced the first model using both facts AND temporal information for knowledge graph completion, combining temporal order embedding with ILP consistency constraints.
Recency Weighting in Information Retrieval
Modern retrieval systems increasingly incorporate time decay:
| Framework | Contribution |
|---|---|
| EAGLE (VLDB 2025) | Adaptive weighting between short-term recency and long-term structure |
| TR-GAT | Timestamps as attentional link properties |
| GAT-TD | Recency-aware attention that downweights older events |
| Solving Freshness in RAG (2025) | Half-life scoring: score = α·cos(q,d) + (1-α)·0.5^(age/h) |
Decay Functions in Graph Systems
| Function | Formula | Use Case |
|---|---|---|
| Exponential | W = e^(-λt) | Simple time-based weighting |
| Half-life | score = α·cos(q,d) + (1-α)·0.5^(age/h) | RAG freshness ranking |
| Power Law (ACT-R) | B_i = ln(Σ t_j^(-d)) | Cognitive memory modeling |
Research consensus: Decay should affect ranking, not deletion. Content remains accessible via direct query even as it is down-weighted in search rankings.
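A minimal sketch of half-life ranking in this style, assuming precomputed cosine similarities (variable names are illustrative); note that decay only reorders results, it never removes documents:

```python
def freshness_score(cosine_sim, age_days, half_life_days=30.0, alpha=0.7):
    """score = α·cos(q, d) + (1 - α)·0.5^(age / half_life)"""
    return alpha * cosine_sim + (1.0 - alpha) * 0.5 ** (age_days / half_life_days)

docs = [
    {"id": "old-but-relevant", "cos": 0.90, "age_days": 180},
    {"id": "fresh-and-close",  "cos": 0.80, "age_days": 2},
    {"id": "fresh-but-vague",  "cos": 0.40, "age_days": 1},
]
# Soft decay: every document stays retrievable; only the ranking changes.
for d in sorted(docs, key=lambda d: freshness_score(d["cos"], d["age_days"]), reverse=True):
    print(d["id"], round(freshness_score(d["cos"], d["age_days"]), 3))
```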
3.5 Memory in AI Agent Systems
Long-running AI agents require sophisticated memory architectures to maintain context beyond single sessions.
MemGPT and Virtual Context Management
MemGPT (2023) introduced virtual context management for LLM agents, treating the context window like an operating system manages virtual memory:
- Main context: Active working memory (limited by context window)
- External storage: Archival memory and conversation history
- Memory management: Self-directed paging between tiers
This enables "unbounded" memory through intelligent retrieval, though it does not explicitly model decay.
Episodic Memory in Generative Agents
Park et al. (2023) introduced generative agents with explicit memory architectures:
- Memory stream: Timestamped observations and reflections
- Retrieval: Combines recency, importance, and relevance scoring
- Reflection: Periodic synthesis of higher-level insights from memories
Their recency scoring implements exponential decay, with more recent memories scoring higher. This directly influences which memories surface during agent decision-making.
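A sketch in the spirit of that scoring scheme (the weights and decay constant below are illustrative assumptions, not the values used by Park et al.):

```python
def retrieval_score(hours_since_access, importance, relevance,
                    decay_per_hour=0.99, weights=(1.0, 1.0, 1.0)):
    """Combine recency (exponential decay), importance, and relevance.

    importance and relevance are assumed to be pre-normalized to [0, 1]."""
    recency = decay_per_hour ** hours_since_access
    w_r, w_i, w_v = weights
    return w_r * recency + w_i * importance + w_v * relevance

# A recent mundane observation vs. an older but important, relevant reflection
print(retrieval_score(hours_since_access=1,  importance=0.2, relevance=0.3))
print(retrieval_score(hours_since_access=72, importance=0.9, relevance=0.8))
```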
Memory Consolidation in Long-Running Agents
The emerging consensus for agent memory architectures includes:
- Tiered storage: Working memory, episodic memory, semantic memory, procedural memory
- Consolidation: Periodic transfer from episodic to semantic representations
- Decay: Time-based deprioritization with access-based strengthening
- Pruning: Removal or archival of low-value content
These patterns mirror biological memory systems, validating the cross-disciplinary foundations established earlier.
4. Comparative Analysis: Decay Functions
4.1 Exponential vs. Power Law Decay
| Aspect | Exponential Decay | Power Law Decay |
|---|---|---|
| Formula | R = e^(-t/τ) | R = a × t^(-b) |
| Half-life | Constant | Increases over time |
| Short-term | Matches observations | Matches observations |
| Long-term | Too aggressive | Better fit to data |
| Biological fit | Individual memories (by stability) | Aggregated across memories |
| Implementation | Simple, one parameter | Requires fitting |
Evidence for power law: Wixted & Ebbesen (1991) showed power law fits across word recall, face recognition, and animal studies. ACT-R's empirically-validated d=0.5 produces power-law-like behavior.
Practical compromise: Many systems use exponential decay for implementation simplicity while acknowledging power law provides better biological fit. The difference is most significant for very old memories (>6 months).
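The half-life row of the table can be checked directly: for exponential decay, the time for retention to halve from its current value is constant, while for a power law it grows in proportion to the memory's age. A short sketch with illustrative parameters:

```python
import math

def exponential_half_life(tau=24.0):
    """R = e^(-t/tau): retention halves every tau * ln(2) hours, regardless of age."""
    return tau * math.log(2)

def power_law_half_life(age_hours, b=0.5):
    """R = t^(-b): from age t, retention halves after a further t * (2^(1/b) - 1) hours."""
    return age_hours * (2 ** (1.0 / b) - 1)

for age in (1, 10, 100, 1000):
    print(f"age {age:>5} h -> exponential: {exponential_half_life():6.1f} h, "
          f"power law: {power_law_half_life(age):8.1f} h")
```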
4.2 Tiered Decay by Memory Type
| Memory Type | Characteristics | Recommended Decay |
|---|---|---|
| Episodic (what happened) | Task-bound, context-specific | Fast (7-14d grace, 30d half-life) |
| Semantic (facts, concepts) | Abstracted, context-independent | Moderate (14d grace, 30d half-life) |
| Procedural (skills, how-to) | Automatized, resistant to forgetting | Slow (30d+ grace, 90d+ half-life) |
| Working (current task context) | Transient by design | Very fast (24h) or session-bound |
Neurobiological basis: Procedural memories rely on cerebellum and basal ganglia—separate from hippocampal-neocortical declarative memory system. Skills persist even in amnesia patients who cannot form new episodic memories.
Implementation guidance: Jeff_Homelab (Moltbook community): "Procedural memory (How-To) needs infinite half-life. Episodic memory (What-Happened) needs 30-day decay."
4.3 Access-Based Decay Modulation
Bjork's retrieval strength theory and ACT-R's base-level activation both support access-based strengthening:
- Each retrieval adds a new term to the activation sum
- Recent retrievals contribute more than old ones (t^{-d} weighting)
- Memories can recover from low accessibility through successful retrieval
Design principle: Only deliberate access should reset decay clocks. Passive appearance in search results, automated enrichment, or background indexing should NOT strengthen memories—this would grant "accidental immortality" to content that merely co-occurs with active queries.
Implementation:
- ReadMemory (explicit access) → resets decay clock
- SearchMemories (appearing in results) → no effect
- BuildContext (automated traversal) → no effect
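A minimal sketch of this policy, assuming a hypothetical in-memory store in which each record carries a `last_accessed` timestamp (the method names mirror the tools listed above, but the code is not maenifold's implementation):

```python
import time

class MemoryStore:
    """Illustrates which operations reset the decay clock."""

    def __init__(self):
        self.records = {}   # id -> {"content": str, "last_accessed": float}

    def write(self, mem_id, content):
        self.records[mem_id] = {"content": content, "last_accessed": time.time()}

    def read_memory(self, mem_id):
        """Deliberate access: resets the decay clock."""
        record = self.records[mem_id]
        record["last_accessed"] = time.time()
        return record["content"]

    def search_memories(self, query):
        """Appearing in results is not being read: last_accessed is untouched."""
        return [mid for mid, rec in self.records.items() if query in rec["content"]]

    def build_context(self, mem_ids):
        """Automated traversal: also leaves last_accessed untouched."""
        return [self.records[mid]["content"] for mid in mem_ids if mid in self.records]
```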
5. Novel Contribution: The Cognitive Sleep Cycle
5.1 Biological Inspiration
The Cognitive Sleep Cycle mirrors mammalian sleep architecture, implementing distinct phases for memory maintenance: consolidation (distilling high-value episodic content into semantic knowledge, analogous to slow-wave replay), decay (deprioritizing stale content, analogous to synaptic downscaling), and synthesis (generating higher-level insights, analogous to REM-stage integration).
5.2 Implementation: maenifold's Decay Architecture
Tiered Grace Periods
Content decays differently based on cognitive type:
| Path | Grace Period | Half-Life | Rationale |
|---|---|---|---|
| thinking/sequential/ | 7 days | 30 days | Episodic task-bound reasoning |
| thinking/workflows/ | 14 days | 30 days | Procedural multi-step processes |
| Other memory | 14 days | 30 days | General semantic knowledge |
Access Boosting
Only deliberate access (ReadMemory) resets the decay clock:
| Tool | Updates last_accessed? | Rationale |
|---|---|---|
| ReadMemory | Yes | Explicit, intentional access |
| SearchMemories | No | Appearing in results ≠ being read |
| BuildContext | No | Automated enrichment would grant accidental immortality |
Principle: Access boosting rewards deliberate use, not passive appearance.
Assumption Decay by Epistemic Status
| Status | Decay Behavior | Rationale |
|---|---|---|
validated | No decay | Confirmed knowledge; treat as permanent |
active | 14d grace, 30d half-life | Pressure to validate |
refined | 14d grace, 30d half-life | Superseded; should fade |
invalidated | 7d grace, 14d half-life | Historical record; aggressive decay |
Principle: Epistemic hygiene through decay. Unvalidated assumptions naturally lose priority.
5.3 The ACT-R Connection
maenifold's decay parameters align with ACT-R cognitive architecture:
- 30-day half-life: Empirically validated in ACT-R literature (d=0.5 produces similar curves)
- Access frequency boosting: Mirrors base-level activation strengthening
- Soft decay (ranking only): Preserves provenance while improving retrieval signal
Decay Weight Calculation
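maenifold's exact implementation is not reproduced here; the following is a minimal sketch of a grace-period-plus-half-life weight consistent with the parameters in the tables above (function and field names are assumptions):

```python
def decay_weight(age_days, days_since_last_read, grace_days=14.0, half_life_days=30.0):
    """Soft decay weight in (0, 1]: full weight inside the grace period, then
    exponential half-life decay keyed to the time since last deliberate access.
    The weight only scales ranking; content is never deleted."""
    effective_age = min(age_days, days_since_last_read)   # ReadMemory resets the clock
    if effective_age <= grace_days:
        return 1.0
    return 0.5 ** ((effective_age - grace_days) / half_life_days)

# Tier parameters drawn from the tables above (validated assumptions never decay)
TIERS = {
    "thinking/sequential/":   {"grace_days": 7.0,  "half_life_days": 30.0},
    "thinking/workflows/":    {"grace_days": 14.0, "half_life_days": 30.0},
    "default":                {"grace_days": 14.0, "half_life_days": 30.0},
    "assumption:invalidated": {"grace_days": 7.0,  "half_life_days": 14.0},
}

print(decay_weight(age_days=60, days_since_last_read=60))  # stale note: ~0.35
print(decay_weight(age_days=60, days_since_last_read=3))   # recently read: 1.0
```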
This exponential decay approximates ACT-R's power-law base-level activation for practical implementation.
6. Discussion
6.1 Forgetting as Feature, Not Bug
The convergent evidence from psychology, neuroscience, and machine learning establishes that controlled forgetting is essential for intelligent systems:
- Psychology: Ebbinghaus's curve is not a design flaw but reflects optimal resource allocation
- Neuroscience: Sleep exists partly to downscale synapses and prune weak connections (SHY)
- Machine learning: Forgetting acts as regularization, improving generalization
- Cognitive science: Richards & Frankland's insight that memory optimizes decision-making, not information transmission
For AI memory systems, this means decay is not a limitation to overcome but a feature to implement deliberately.
6.2 The Consolidation Imperative
Long-running agents must transfer valuable episodic experience into durable semantic knowledge:
- Episodic memories are rich but noisy—task-bound context that loses relevance
- Semantic memories are abstracted and generalized—the "gist" that persists
- Consolidation performs this transfer through deliberate reflection and linking
The Cognitive Sleep Cycle's Phase 2 (Consolidation) implements this: identifying high-value episodic content and distilling it into concept-linked semantic notes. Without consolidation, agents either lose valuable experience (aggressive decay) or drown in accumulated episodes (no decay).
6.3 Epistemic Pressure Through Decay
A novel application of decay is epistemic hygiene: creating implicit pressure to validate assumptions.
Unvalidated assumptions (status: "active") face normal decay. This creates a natural pressure:
- Validate the assumption → it becomes permanent
- Ignore the assumption → it fades from prominence
- Invalidate the assumption → it decays aggressively but remains for audit
This mirrors Bjork's retrieval strength theory: assumptions must be actively accessed/validated to maintain accessibility, but their underlying storage (for provenance) remains intact.
6.4 Limitations and Future Work
Current limitations and future research directions:
- Per-file decay configuration: Currently all files in a tier share decay rates. Future work: individual files could specify custom half-lives based on content type.
- Cluster-based decay coherence: Related content should decay together. If one note in a concept cluster is accessed, semantically-related notes might receive partial access credit.
- Adaptive decay parameters: Self-tuning systems that adjust half-lives based on observed access patterns—similar to SuperMemo's EF optimization.
- Integration with continual learning: Combining decay-based memory maintenance with EWC-style weight consolidation for embedded knowledge.
7. Conclusion
Effective AI memory systems must embrace forgetting as a core capability, not a limitation to overcome. The biological brain's sophisticated memory maintenance—from Ebbinghaus's forgetting curves to sleep-dependent consolidation—provides a blueprint for artificial systems.
The Cognitive Sleep Cycle implements these principles:
- Consolidation transfers valuable episodic experience to durable semantic knowledge
- Decay reduces noise by deprioritizing stale content (without deletion)
- Access boosting rewards deliberate use, creating natural relevance signals
- Epistemic hygiene through assumption decay creates pressure to validate beliefs
The evidence converges: from Ebbinghaus (1885) through ACT-R (1998) to Richards & Frankland (2017), forgetting is not failure but optimization. Memory systems that embrace controlled decay will outperform those that merely accumulate.
References
Psychology and Cognitive Science
- Ebbinghaus, H. (1885/1913). Memory: A Contribution to Experimental Psychology. New York: Teachers College, Columbia University.
- Wixted, J. T., & Ebbesen, E. B. (1991). On the form of forgetting. Psychological Science, 2(6), 409-415.
- Murre, J. M., & Dros, J. (2015). Replication and analysis of Ebbinghaus' forgetting curve. PLoS ONE, 10(7), e0120644.
- Cepeda, N. J., et al. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354-380.
- Cepeda, N. J., et al. (2008). Spacing effects in learning: A temporal ridgeline of optimal retention. Psychological Science, 19(11), 1095-1102.
- Anderson, J. R., & Lebiere, C. (1998). The Atomic Components of Thought. Mahwah, NJ: Lawrence Erlbaum.
- Anderson, J. R., et al. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036-1060.
- Anderson, J. R. (2007). How Can the Human Mind Occur in the Physical Universe? Oxford University Press.
- Fisher, C. R., et al. (2018). A comparison of approximations for base-level activation in ACT-R. Computational Brain & Behavior, 1, 228-236.
- Wixted, J. T. (2004). On common ground: Jost's (1897) law of forgetting and Ribot's (1881) law of retrograde amnesia. Psychological Review, 111(4), 864-879.
Neuroscience
- Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126.
- Rasch, B., & Born, J. (2013). About sleep's role in memory. Physiological Reviews, 93(2), 681-766.
- Tononi, G., & Cirelli, C. (2014). Sleep and the price of plasticity: From synaptic and cellular homeostasis to memory consolidation and integration. Neuron, 81(1), 12-34.
- Richards, B. A., & Frankland, P. W. (2017). The persistence and transience of memory. Neuron, 94(6), 1071-1084.
- Walker, M. P., & Stickgold, R. (2006). Sleep, memory, and plasticity. Annual Review of Psychology, 57, 139-166.
Machine Learning
- McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation, 24, 109-165.
- Kirkpatrick, J., et al. (2017). Overcoming catastrophic forgetting in neural networks. PNAS, 114(13), 3521-3526.
- Bourtoule, L., et al. (2021). Machine unlearning. IEEE Symposium on Security and Privacy, 141-159.
- Yang, E., et al. (2024). A comprehensive survey of forgetting in deep learning beyond continual learning. IEEE TPAMI.
- Wang, L., et al. (2024). A comprehensive survey of continual learning: Theory, method and application. IEEE TPAMI, 46(8), 5362-5383.
Knowledge Graphs and Retrieval
- Jiang, T., et al. (2016). Towards time-aware knowledge graph completion. COLING 2016, 1715-1724.
- A Survey on Temporal Knowledge Graph (2024). arXiv:2403.04782.
- EAGLE: Temporal Link Prediction (2025). VLDB 2025.
- Solving Freshness in RAG (2025). arXiv:2509.19376.
AI Agent Systems
- Packer, C., et al. (2023). MemGPT: Towards LLMs as operating systems. arXiv:2310.08560.
- Park, J. S., et al. (2023). Generative agents: Interactive simulacra of human behavior. UIST 2023.
Spaced Repetition Systems
- Pimsleur, P. (1967). A memory schedule. Modern Language Journal, 51(2), 73-75.
- Leitner, S. (1972). So lernt man lernen. Freiburg: Herder.
- Wozniak, P. A. (1990). Optimization of repetition spacing in the practice of learning. SuperMemo.
Appendix A: maenifold Sleep Cycle Workflow
The complete workflow specification is available at:
/assets/workflows/memory-cycle.json
This workflow triggers periodically to perform cognitive maintenance, implementing the phases described in Section 5.