Multi-Tier Memory System Technical Documentation
Overview
The Multi-Tier Memory System is a cognitively inspired memory architecture that lets Marcus learn from past experience, predict task outcomes, and optimize agent-task assignments. It is modeled on human memory and organized into four tiers: Working Memory, Episodic Memory, Semantic Memory, and Procedural Memory.
Architecture
Core Components
Base Memory System (memory.py)
Implements the foundational four-tier memory architecture
Handles task outcome recording and basic predictions
Manages agent performance profiling
Provides cascade effect analysis for project dependencies
Advanced Memory System (memory_advanced.py)
Extends base system with enhanced prediction capabilities
Implements confidence intervals and complexity adjustments
Adds time-based relevance weighting
Provides risk factor analysis with mitigation suggestions
Memory Tiers
1. Working Memory (Volatile, Current State)
self.working = {
"active_tasks": {}, # agent_id -> current task
"recent_events": [], # last N events
"system_state": {}, # current system metrics
}
# Note: "all_tasks" is added dynamically via update_project_tasks() method
Maintains real-time state of active operations
Tracks which agents are working on what tasks
Stores recent events for immediate context
Project tasks added via update_project_tasks() for dependency analysis
2. Episodic Memory (Task Execution History)
self.episodic = {
"outcomes": [], # List of TaskOutcome objects
"timeline": defaultdict(list), # date -> events
}
Records specific task execution outcomes
Maintains chronological timeline of events
Preserves detailed context of each task execution
Enables pattern recognition across similar experiences
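The episodic structure above can be illustrated with a minimal sketch that records one outcome into both the outcome list and the date-keyed timeline. The `EpisodicStore` class, its `record` method, the ISO-date string keys, and the shape of the timeline event are hypothetical; only the `outcomes` and `timeline` keys come from the structure shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class TaskOutcome:
    # Trimmed stand-in for the TaskOutcome model described under Data Models
    task_id: str
    agent_id: str
    success: bool
    completed_at: Optional[datetime] = None


class EpisodicStore:
    """Hypothetical wrapper around the episodic dict shown above."""

    def __init__(self) -> None:
        self.episodic: Dict[str, Any] = {
            "outcomes": [],  # full execution history
            "timeline": defaultdict(list),  # assumed "YYYY-MM-DD" -> events
        }

    def record(self, outcome: TaskOutcome) -> None:
        # Keep the complete outcome for later pattern matching
        self.episodic["outcomes"].append(outcome)
        # Index a lightweight event by calendar date for chronological lookups
        when = outcome.completed_at or datetime.now()
        self.episodic["timeline"][when.date().isoformat()].append(
            {"event": "task_completed", "task_id": outcome.task_id, "success": outcome.success}
        )


store = EpisodicStore()
store.record(TaskOutcome("t1", "agent-a", True, datetime(2024, 5, 1, 14, 30)))
print(len(store.episodic["outcomes"]))  # 1
print(list(store.episodic["timeline"]))  # ['2024-05-01']
```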
3. Semantic Memory (Learned Facts)
self.semantic = {
"agent_profiles": {}, # agent_id -> AgentProfile
"task_patterns": {}, # pattern_id -> TaskPattern
"success_factors": {}, # factor -> impact
}
Stores extracted knowledge and patterns
Maintains agent capability profiles
Identifies task type patterns and success factors
Builds knowledge base from experience
4. Procedural Memory (Workflows and Strategies)
self.procedural = {
"workflows": {}, # workflow_id -> steps
"strategies": {}, # situation -> strategy
"optimizations": {}, # pattern -> optimization
}
Captures learned workflows and best practices
Stores situation-specific strategies
Maintains optimization patterns
Integration with the Marcus Ecosystem
Event System Integration
The Memory system publishes events through the Marcus Events system:
TASK_STARTED: When an agent begins a task
TASK_COMPLETED: When a task finishes (successfully or not)
Persistence Integration
Automatically loads historical data on initialization
Persists task outcomes and agent profiles
Enables long-term learning across system restarts
Workflow Integration
The Memory system is invoked at key points in the typical Marcus workflow:
create_project: No direct involvement
register_agent: Creates new agent profile if needed
request_next_task: Uses predictions to optimize task assignment
report_progress: Updates working memory with progress events
report_blocker: Records blockers in agent profiles and task outcomes
finish_task: Records complete task outcome and triggers learning
Key Features
1. Predictive Analytics
Task Outcome Prediction
async def predict_task_outcome(agent_id: str, task: Task) -> Dict[str, Any]
Provides:
Success probability (0-1)
Estimated duration with adjustments
Blockage risk assessment
Identification of risk factors
Enhanced Predictions (Advanced System)
async def predict_task_outcome_v2(agent_id: str, task: Task) -> Dict[str, Any]
Adds:
Confidence intervals based on sample size
Complexity factor adjustments
Time-based relevance weighting
Detailed risk analysis with mitigation suggestions
2. Agent Performance Tracking
Agent Profiles
Maintains comprehensive profiles including:
Total/successful/failed/blocked task counts
Skill-specific success rates
Average estimation accuracy
Common blockers encountered
Peak performance patterns
Performance Trajectory Analysis
async def calculate_agent_performance_trajectory(agent_id: str) -> Dict[str, Any]
Provides:
Current skill levels
Improving vs struggling skills
30-day skill projections
Personalized recommendations
3. Additional Public Prediction Methods
predict_completion_time
async def predict_completion_time(self, agent_id: str, task: Task) -> Dict[str, Any]
Returns estimated completion time with confidence intervals based on historical performance data.
Returns:
Estimated duration in hours
Confidence interval (lower and upper bounds)
Sample size used for estimation
Confidence level
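One way to produce the fields listed above is a normal-approximation confidence interval around the mean of past durations. This is a minimal sketch, not Marcus's documented method: the `completion_time_interval` helper and the choice of a z-based interval are assumptions.

```python
import math
import statistics
from typing import Dict, List


def completion_time_interval(durations: List[float], level: float = 0.95) -> Dict[str, float]:
    """Hypothetical helper: mean duration with a normal-approximation confidence interval."""
    n = len(durations)
    if n == 0:
        raise ValueError("no historical durations for this agent/task type")
    mean = statistics.fmean(durations)
    # A single sample gives no spread estimate, so the interval collapses to the mean
    stdev = statistics.stdev(durations) if n > 1 else 0.0
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z * stdev / math.sqrt(n)  # shrinks as the sample size grows
    return {
        "estimated_hours": mean,
        "lower": max(0.0, mean - margin),  # durations cannot be negative
        "upper": mean + margin,
        "sample_size": n,
        "confidence_level": level,
    }


print(completion_time_interval([2.0, 3.0, 4.0]))
```

This interval describes uncertainty in the average duration. For a single upcoming task, a prediction interval, which also accounts for task-to-task variation, would be wider.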
predict_blockage_probability
async def predict_blockage_probability(self, agent_id: str, task: Task) -> Dict[str, Any]
Returns the probability that a task will be blocked, along with a breakdown of risk factors.
Returns:
Blockage probability (0.0–1.0)
Risk breakdown by category
Historical blocker patterns for this agent/task type
Mitigation suggestions
find_similar_outcomes
async def find_similar_outcomes(self, task: Task, limit: int = 5) -> List[TaskOutcome]
Finds historically similar task outcomes from episodic memory.
Parameters:
task: The task to find similar outcomes for
limit: Maximum number of similar outcomes to return (default 5)
Returns:
List of TaskOutcome objects from similar historical tasks, ordered by similarity
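The document describes the matching only as simple similarity matching. A minimal sketch of one such approach ranks past tasks by Jaccard overlap of their label sets. The label-based representation, the `find_similar` function, and the Jaccard metric are assumptions, not the confirmed implementation.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Share of labels in common: |A & B| / |A | B|
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar(task_labels: Set[str], history: List[Tuple[str, Set[str]]], limit: int = 5) -> List[str]:
    """Hypothetical: return ids of past tasks, most similar first."""
    scored = [(jaccard(task_labels, labels), task_id) for task_id, labels in history]
    # Ignore tasks with no label overlap at all
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [task_id for _, task_id in scored[:limit]]


history = [
    ("t1", {"backend", "api"}),
    ("t2", {"frontend"}),
    ("t3", {"backend", "api", "auth"}),
]
print(find_similar({"backend", "api"}, history))  # ['t1', 't3']
```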
4. Cascade Effect Analysis
async def predict_cascade_effects(self, task_id: str, delay_hours: float) -> Dict[str, Any]
An instance method on the Memory class. It calculates:
Tasks affected by delays
Total project delay impact
Critical path implications
Suggested mitigation strategies (returned under the "mitigation_options" key)
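Finding the tasks affected by a delay amounts to walking the dependency graph downstream from the delayed task. The sketch below does this with a breadth-first search; the `downstream_tasks` function and the "task id -> ids it depends on" dictionary are assumptions, since the document does not show how Task stores dependencies.

```python
from collections import deque
from typing import Dict, List


def downstream_tasks(task_id: str, dependencies: Dict[str, List[str]]) -> List[str]:
    """Hypothetical: every task that directly or transitively depends on task_id."""
    # Invert "task -> tasks it depends on" into "task -> tasks that depend on it"
    dependents: Dict[str, List[str]] = {}
    for tid, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(tid)

    # Breadth-first walk so nearer tasks are listed before farther ones
    affected: List[str] = []
    seen = {task_id}
    queue = deque([task_id])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


deps = {"design": [], "implement": ["design"], "test": ["implement"], "docs": ["design"]}
print(downstream_tasks("design", deps))  # ['implement', 'docs', 'test']
```

A delay estimate would then be layered on top, for example by assuming each affected task slips by delay_hours unless it has slack.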
5. Learning Algorithms
Exponential Moving Average for Skill Updates
new_rate = old_rate * (1 - learning_rate) + new_value * learning_rate
Learning rate: 0.1 (10% weight to new experiences)
Provides smooth skill evolution tracking
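The update rule above can be applied directly. This sketch assumes a success is encoded as new_value = 1.0 and a failure as 0.0; the `ema_update` name is illustrative.

```python
def ema_update(old_rate: float, new_value: float, learning_rate: float = 0.1) -> float:
    # The new observation gets 10% of the weight; accumulated history keeps 90%
    return old_rate * (1 - learning_rate) + new_value * learning_rate


rate = 0.5
for outcome in [1.0, 1.0, 0.0]:  # assumed encoding: success, success, failure
    rate = ema_update(rate, outcome)
print(round(rate, 4))  # 0.5355
```

A single failure moves the rate by only 10% of the gap to 0.0, which is what keeps skill tracking smooth.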
Time-Based Relevance Weighting
weight = recency_decay ** weeks_old # recency_decay = 0.95
Recent experiences weighted more heavily
Older data gradually loses influence
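To see how the decay weight feeds a prediction, here is a minimal sketch of a recency-weighted success rate. The `weighted_success_rate` function and its (weeks_old, success) input format are hypothetical; only the weight formula and the 0.95 decay come from the text.

```python
from typing import List, Tuple

RECENCY_DECAY = 0.95  # value given in the text


def weighted_success_rate(history: List[Tuple[float, bool]]) -> float:
    """Hypothetical: history holds (weeks_old, success) pairs."""
    total = 0.0
    weighted = 0.0
    for weeks_old, success in history:
        weight = RECENCY_DECAY ** weeks_old  # 1.0 this week, about 0.60 after 10 weeks
        total += weight
        weighted += weight * (1.0 if success else 0.0)
    return weighted / total if total else 0.0


# A recent success outweighs a failure from 10 weeks ago (unweighted rate would be 0.5)
print(weighted_success_rate([(0, True), (10, False)]))
```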
Implementation Details
Data Models
TaskOutcome
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

@dataclass
class TaskOutcome:
    task_id: str
    agent_id: str
    task_name: str
    estimated_hours: float
    actual_hours: float
    success: bool
    blockers: List[str] = field(default_factory=list)
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
AgentProfile
@dataclass
class AgentProfile:
    agent_id: str
    total_tasks: int
    successful_tasks: int
    failed_tasks: int
    blocked_tasks: int
    skill_success_rates: Dict[str, float]
    average_estimation_accuracy: float
    common_blockers: Dict[str, int]
    peak_performance_hours: List[int]
TaskPattern
@dataclass
class TaskPattern:
    pattern_type: str
    task_labels: List[str]
    recent_durations: List[float]
    success_rate: float
    common_blockers: List[str]
    prerequisites: List[str]
    best_agents: List[str]
    max_samples: int = 100  # Keep last 100 samples for median calculation
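The max_samples field implies a bounded window of recent durations from which the median is taken. The sketch below shows that behavior in isolation; `DurationSamples` is a hypothetical helper, and trimming the oldest samples first is an assumption.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DurationSamples:
    # Hypothetical helper mirroring TaskPattern.recent_durations / max_samples
    recent_durations: List[float] = field(default_factory=list)
    max_samples: int = 100

    def add(self, hours: float) -> None:
        self.recent_durations.append(hours)
        # Drop the oldest samples once the cap is exceeded
        overflow = len(self.recent_durations) - self.max_samples
        if overflow > 0:
            del self.recent_durations[:overflow]

    def median(self) -> Optional[float]:
        return statistics.median(self.recent_durations) if self.recent_durations else None


samples = DurationSamples(max_samples=3)
for hours in [1.0, 2.0, 10.0, 4.0]:
    samples.add(hours)
print(samples.recent_durations, samples.median())  # [2.0, 10.0, 4.0] 4.0
```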
Confidence Calculation
Confidence grows logarithmically with sample size:
Fewer than 10 samples: Low confidence (0.1-0.5)
10-19 samples: Medium confidence (0.5-0.8)
20 or more samples: High confidence (0.8-0.95)
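The document gives the confidence bands but not the exact formula. One hypothetical curve that grows logarithmically within each band and meets the stated edges (0.1 at 0 samples, 0.5 at 10, 0.8 at 20, capped at 0.95) could look like this; `sample_confidence` and its constants are assumptions.

```python
import math


def sample_confidence(n: int) -> float:
    """Hypothetical curve that lands on the documented band edges."""
    if n <= 0:
        return 0.1
    if n < 10:
        # 0.1 at n=0, rising logarithmically toward 0.5 at n=10
        return 0.1 + 0.4 * math.log1p(n) / math.log1p(10)
    if n < 20:
        # 0.5 at n=10, rising toward 0.8 at n=20
        return 0.5 + 0.3 * math.log2(n / 10)
    # 0.8 at n=20, then +0.05 per doubling of the sample size, capped at 0.95
    return min(0.95, 0.8 + 0.05 * math.log2(n / 20))


for n in (0, 5, 10, 15, 20, 80, 500):
    print(n, round(sample_confidence(n), 3))
```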
Complexity Assessment
The complexity factor considers:
Task duration relative to the agent’s typical tasks
Task labels (complex, advanced, integration, etc.)
Number and nature of dependencies
Historical performance on similar tasks
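A minimal sketch of how the four inputs above might combine into a single multiplier. The `complexity_factor` function, the multiplicative form, the clamping range, and every weight are illustrative assumptions; only the list of inputs and the example labels come from the text. The "nature" of dependencies is not modeled here, only their count.

```python
from typing import Iterable

COMPLEX_LABELS = {"complex", "advanced", "integration"}  # labels named in the text


def complexity_factor(
    estimated_hours: float,
    typical_hours: float,
    labels: Iterable[str],
    dependency_count: int,
    similar_success_rate: float,
) -> float:
    """Hypothetical: 1.0 is a typical task; larger means harder."""
    factor = 1.0
    # Size relative to the agent's typical task, clamped to [0.5, 2.0]
    if typical_hours > 0:
        factor *= min(2.0, max(0.5, estimated_hours / typical_hours))
    # Each complexity label adds 10% (illustrative weight)
    factor *= 1.0 + 0.1 * len(COMPLEX_LABELS & {label.lower() for label in labels})
    # Each dependency adds 5% (illustrative weight)
    factor *= 1.0 + 0.05 * dependency_count
    # Weak history on similar tasks raises the factor
    factor *= 1.0 + (1.0 - similar_success_rate) * 0.5
    return factor


# 2.0 (size) * 1.1 (label) * 1.1 (deps) * 1.1 (history), roughly 2.662
print(complexity_factor(8, 4, ["Integration", "api"], 2, 0.8))
```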
Pros and Cons
Pros
Data-Driven Decision Making: All predictions are based on actual historical performance
Continuous Learning: The system improves with every completed task
Risk Awareness: Proactively identifies risks and suggests mitigations
Personalized: Adapts to individual agent capabilities and patterns
Holistic View: Considers project-wide impacts of individual decisions
Resilience: Fallback mechanisms keep the system working even with limited data
Transparency: Provides reasoning and confidence levels for all predictions
Cons
Cold Start Problem: Limited effectiveness with new agents or task types
Memory Growth: Episodic memory grows unbounded without cleanup
Computational Overhead: Complex predictions can be resource-intensive
Limited Pattern Recognition: Simple similarity matching (no ML yet)
No Cross-Project Learning: Memory isolated per Marcus instance
Manual Workflow Capture: Procedural memory not auto-populated
Dependency on Historical Accuracy: Bad early data can skew predictions
Why This Approach
The multi-tier cognitive model was chosen for several reasons:
Biological Inspiration: Mirrors proven human memory systems
Separation of Concerns: Each tier serves distinct purposes
Temporal Flexibility: Handles both immediate and long-term needs
Graceful Degradation: System functions even with missing tiers
Extensibility: Easy to add new memory types or learning algorithms
Interpretability: Clear what each component does and why
Future Evolution
Short-term Enhancements
ML Integration: Replace similarity matching with trained models
Cross-Project Learning: Share learned patterns across projects
Automated Workflow Mining: Extract procedures from execution patterns
Memory Pruning: Implement forgetting mechanisms for old data
Real-time Adaptation: Adjust predictions during task execution
Long-term Vision
Predictive Project Planning: Generate optimal task sequences
Agent Team Composition: Suggest ideal team configurations
Anomaly Detection: Identify unusual patterns requiring attention
Knowledge Transfer: Export/import learned knowledge
Causal Reasoning: Understand why certain approaches succeed
Additional Utility Methods
get_median_duration_by_type
def get_median_duration_by_type(task_type: str) -> Optional[float]
Returns the median task duration for a specific task type label.
Parameters:
task_type: Task type label (e.g., “design”, “implement”, “test”)
Returns:
Median duration in hours, or None if no historical data available
Notes:
Uses median instead of average to be robust to outliers
First tries exact match on pattern type
Falls back to patterns containing the task type
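The lookup order described in the notes (exact match first, then any pattern whose type contains the label) can be sketched as follows. To keep it self-contained, patterns are simplified to a hypothetical dict of pattern type to duration list rather than TaskPattern objects, and `median_duration_by_type` is an illustrative name.

```python
import statistics
from typing import Dict, List, Optional


def median_duration_by_type(patterns: Dict[str, List[float]], task_type: str) -> Optional[float]:
    """Hypothetical: patterns maps pattern_type -> recent durations in hours."""
    # 1) Exact match on the pattern type
    durations = patterns.get(task_type)
    if not durations:
        # 2) Fall back to every pattern whose type contains the requested label
        durations = [d for ptype, ds in patterns.items() if task_type in ptype for d in ds]
    # Median rather than mean, so one 30-hour outlier does not dominate
    return statistics.median(durations) if durations else None


patterns = {"implement": [2.0, 4.0, 30.0], "implement_api": [3.0, 5.0], "test": [1.0]}
print(median_duration_by_type(patterns, "implement"))  # 4.0 (exact match)
print(median_duration_by_type(patterns, "api"))        # 4.0 (substring fallback)
print(median_duration_by_type(patterns, "deploy"))     # None
```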
get_global_median_duration
async def get_global_median_duration() -> float
Returns the global median task duration from all completed tasks.
Returns:
Median task duration in hours (defaults to 1.0 if no historical data)
Notes:
Prefers SQL-based calculation from persistence layer for efficiency
Falls back to in-memory calculation if persistence unavailable
Only considers successful completions with actual_hours > 0
More robust than the mean to outliers, such as tasks that sat waiting for input
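The in-memory fallback path can be sketched in a few lines. This is a synchronous illustration only: `global_median_duration` and the minimal `Outcome` stand-in are hypothetical, and the SQL-based path through the persistence layer is not shown.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    # Minimal stand-in for TaskOutcome
    success: bool
    actual_hours: float


def global_median_duration(outcomes: List[Outcome], default: float = 1.0) -> float:
    """Hypothetical in-memory fallback for get_global_median_duration."""
    # Only successful completions with a positive recorded duration count
    hours = [o.actual_hours for o in outcomes if o.success and o.actual_hours > 0]
    return statistics.median(hours) if hours else default


history = [Outcome(True, 2.0), Outcome(True, 3.0), Outcome(True, 40.0),
           Outcome(False, 1.0), Outcome(True, 0.0)]
print(global_median_duration(history))  # 3.0 (the mean of 2, 3, 40 would be 15.0)
print(global_median_duration([]))       # 1.0
```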
update_project_tasks
def update_project_tasks(tasks: List[Task]) -> None
Updates working memory with current project tasks for cascade analysis.
Parameters:
tasks: List of all project tasks
Notes:
Stores tasks in self.working["all_tasks"]
Required for dependency analysis and cascade effect predictions
Should be called when project task list changes
get_memory_stats
def get_memory_stats() -> Dict[str, Any]
Returns memory system statistics across all tiers.
Returns: Dictionary with:
working_memory: Active tasks, recent events, project tasks count
episodic_memory: Total outcomes, days tracked
semantic_memory: Agent profiles count, task patterns count
procedural_memory: Workflows count, strategies count
Task Complexity Handling
Simple Tasks
Rely more on the agent’s general success rate
Use basic duration estimates
Minimal risk factor analysis
Quick predictions with lower computational cost
Complex Tasks
Deep analysis of similar historical tasks
Multiple risk factors considered
Detailed mitigation strategies provided
Cascade effect analysis for dependencies
Higher confidence thresholds required
Board-Specific Considerations
While the Memory system is board-agnostic, it can adapt to different board types:
Kanban Boards: Track cycle time and throughput patterns
Sprint Boards: Learn velocity and burndown patterns
Custom Workflows: Adapt to board-specific state transitions
Cato Integration
The Memory system is designed to integrate with Cato (Marcus’s reasoning engine):
Context Provider: Supplies historical context for decisions
Constraint Input: Provides performance constraints for optimization
Feedback Loop: Learns from Cato’s assignment outcomes
Prediction Enhancement: Cato can override Memory predictions with reasoning
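Technical Excellence
Async-First Design
The prediction, learning, and analysis operations are async; a few utility methods (get_median_duration_by_type, update_project_tasks, get_memory_stats) are synchronous. The async design enables: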
Non-blocking predictions during task assignment
Parallel learning from multiple outcomes
Efficient integration with external services
Error Resilience
Graceful handling of missing data
Fallback predictions when history unavailable
Continued operation despite persistence failures
Performance Optimization
Lazy loading of historical data
Caching of frequently accessed profiles
Efficient similarity calculations
Bounded search spaces for predictions
Conclusion
The Multi-Tier Memory System represents a sophisticated approach to organizational learning in autonomous agent systems. By combining cognitive psychology principles with modern software architecture, it provides Marcus with the ability to continuously improve task assignments, predict problems before they occur, and optimize team performance over time. The system’s extensible design ensures it can evolve alongside Marcus’s capabilities while maintaining its core mission of turning past experience into future success.