Multi-Tier Memory System Technical Documentation

Overview

The Multi-Tier Memory System is a cognitively inspired memory architecture that enables Marcus to learn from past experiences, predict task outcomes, and optimize agent-task assignments. It is modeled on human memory and has four distinct tiers: Working Memory, Episodic Memory, Semantic Memory, and Procedural Memory.

Architecture

Core Components

Base Memory System (memory.py)
- Implements the foundational four-tier memory architecture
- Handles task outcome recording and basic predictions
- Manages agent performance profiling
- Provides cascade effect analysis for project dependencies
Advanced Memory System (memory_advanced.py)
- Extends the base system with enhanced prediction capabilities
- Implements confidence intervals and complexity adjustments
- Adds time-based relevance weighting
- Provides risk factor analysis with mitigation suggestions

Memory Tiers

1. Working Memory (Volatile, Current State)

self.working = {
    "active_tasks": {},     # agent_id -> current task
    "recent_events": [],    # last N events
    "system_state": {},     # current system metrics
}
# Note: "all_tasks" is added dynamically via update_project_tasks() method

Maintains real-time state of active operations
Tracks which agents are working on what tasks
Stores recent events for immediate context
Project tasks added via update_project_tasks() for dependency analysis
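A minimal sketch of the working tier follows. It assumes a bounded `collections.deque` implements "last N events" (N = 100 is a hypothetical value) and represents tasks by their IDs for brevity. `WorkingMemory`, `record_event`, and `start_task` are illustrative names, not the Marcus API.

```python
from collections import deque
from typing import Any, Dict, List

MAX_RECENT_EVENTS = 100  # hypothetical value for "last N events"


class WorkingMemory:
    """Illustrative stand-in for the working tier (not the Marcus class)."""

    def __init__(self) -> None:
        self.working: Dict[str, Any] = {
            "active_tasks": {},  # agent_id -> current task id
            # A deque with maxlen silently drops the oldest event when full.
            "recent_events": deque(maxlen=MAX_RECENT_EVENTS),
            "system_state": {},  # current system metrics
        }

    def record_event(self, event: Dict[str, Any]) -> None:
        self.working["recent_events"].append(event)

    def start_task(self, agent_id: str, task_id: str) -> None:
        self.working["active_tasks"][agent_id] = task_id
        self.record_event({"type": "task_started", "agent_id": agent_id, "task_id": task_id})

    def update_project_tasks(self, task_ids: List[str]) -> None:
        # "all_tasks" appears only after this call, matching the note above.
        self.working["all_tasks"] = list(task_ids)
```

Bounding the event buffer keeps working memory volatile and small; older context is expected to live in the episodic tier instead.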

2. Episodic Memory (Task Execution History)

self.episodic = {
    "outcomes": [],                    # List of TaskOutcome objects
    "timeline": defaultdict(list),     # date -> events
}

Records specific task execution outcomes
Maintains chronological timeline of events
Preserves detailed context of each task execution
Enables pattern recognition across similar experiences
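The sketch below shows one way an outcome could be appended to the episodic tier and indexed in the timeline by completion date. The `record_outcome` helper, the ISO-date timeline keys, and the event dictionary shape are assumptions; `TaskOutcome` repeats the fields documented under Data Models.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class TaskOutcome:  # same fields as the Data Models section
    task_id: str
    agent_id: str
    task_name: str
    estimated_hours: float
    actual_hours: float
    success: bool
    blockers: List[str] = field(default_factory=list)
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None


episodic: Dict[str, Any] = {"outcomes": [], "timeline": defaultdict(list)}


def record_outcome(outcome: TaskOutcome) -> None:
    """Hypothetical helper: store the outcome and index it by completion date."""
    episodic["outcomes"].append(outcome)
    when = outcome.completed_at or datetime.now()  # fall back to "now" if unset
    episodic["timeline"][when.date().isoformat()].append(
        {"event": "task_completed", "task_id": outcome.task_id, "success": outcome.success}
    )
```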

3. Semantic Memory (Learned Facts)

self.semantic = {
    "agent_profiles": {},     # agent_id -> AgentProfile
    "task_patterns": {},      # pattern_id -> TaskPattern
    "success_factors": {},    # factor -> impact
}

Stores extracted knowledge and patterns
Maintains agent capability profiles
Identifies task type patterns and success factors
Builds knowledge base from experience

4. Procedural Memory (Workflows and Strategies)

self.procedural = {
    "workflows": {},        # workflow_id -> steps
    "strategies": {},       # situation -> strategy
    "optimizations": {},    # pattern -> optimization
}

Captures learned workflows and best practices
Stores situation-specific strategies
Maintains optimization patterns

Integration with Marcus Ecosystem

Event System Integration

The Memory system publishes events through the Marcus Events system:

TASK_STARTED: When an agent begins a task
TASK_COMPLETED: When a task is finished (success or failure)

Persistence Integration

Automatically loads historical data on initialization
Persists task outcomes and agent profiles
Enables long-term learning across system restarts

Workflow Integration

The Memory system is invoked at key points in the typical Marcus workflow:

create_project: No direct involvement
register_agent: Creates new agent profile if needed
request_next_task: Uses predictions to optimize task assignment
report_progress: Updates working memory with progress events
report_blocker: Records blockers in agent profiles and task outcomes
finish_task: Records complete task outcome and triggers learning

Key Features

1. Predictive Analytics

Task Outcome Prediction

async def predict_task_outcome(self, agent_id: str, task: Task) -> Dict[str, Any]

Provides:

Success probability (0-1)
Estimated duration with adjustments
Blockage risk assessment
Risk factors identification

Enhanced Predictions (Advanced System)

async def predict_task_outcome_v2(self, agent_id: str, task: Task) -> Dict[str, Any]

Adds:

Confidence intervals based on sample size
Complexity factor adjustments
Time-based relevance weighting
Detailed risk analysis with mitigation suggestions

2. Agent Performance Tracking

Agent Profiles

Maintains comprehensive profiles including:

Total/successful/failed/blocked task counts
Skill-specific success rates
Average estimation accuracy
Common blockers encountered
Peak performance patterns

Performance Trajectory Analysis

async def calculate_agent_performance_trajectory(self, agent_id: str) -> Dict[str, Any]

Provides:

Current skill levels
Improving vs struggling skills
30-day skill projections
Personalized recommendations

3. Additional Public Prediction Methods

predict_completion_time

async def predict_completion_time(self, agent_id: str, task: Task) -> Dict[str, Any]

Returns an estimated completion time with a confidence interval based on historical performance data.

Returns:

Estimated duration in hours
Confidence interval (lower and upper bounds)
Sample size used for estimation
Confidence level
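As an illustration of the fields listed above, here is a sketch that builds a normal-approximation confidence interval from an agent's past durations. The helper name, the returned keys, and the choice of a normal approximation are assumptions, not the documented implementation.

```python
import math
import statistics
from typing import Any, Dict, List


def completion_time_interval(durations: List[float], confidence_level: float = 0.95) -> Dict[str, Any]:
    """Normal-approximation interval for the mean of past durations (hours)."""
    n = len(durations)
    if n == 0:
        raise ValueError("need at least one historical duration")
    mean = statistics.fmean(durations)
    z = statistics.NormalDist().inv_cdf((1 + confidence_level) / 2)  # about 1.96 for 95%
    # A single sample gives no spread estimate, so the interval collapses to the mean.
    half_width = z * statistics.stdev(durations) / math.sqrt(n) if n > 1 else 0.0
    return {
        "estimated_hours": mean,
        "lower": max(0.0, mean - half_width),  # durations cannot be negative
        "upper": mean + half_width,
        "sample_size": n,
        "confidence_level": confidence_level,
    }
```

This interval describes the average duration. A single task varies more; a prediction interval for one task would use `stdev * sqrt(1 + 1/n)` in place of `stdev / sqrt(n)`.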

predict_blockage_probability

async def predict_blockage_probability(self, agent_id: str, task: Task) -> Dict[str, Any]

Returns the probability that a task will be blocked, along with a breakdown of risk factors.

Returns:

Blockage probability (0.0–1.0)
Risk breakdown by category
Historical blocker patterns for this agent/task type
Mitigation suggestions
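A sketch of the probability and risk breakdown follows. It assumes a hypothetical history of `(was_blocked, blocker_categories)` pairs for this agent and task type, and uses additive smoothing toward a prior rate (0.1 with weight 2, both assumptions) so a short history never yields exactly 0% or 100%. The smoothing is chosen for illustration and may not match Marcus.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def blockage_probability(
    history: List[Tuple[bool, List[str]]],
    prior: float = 0.1,
    prior_weight: float = 2.0,
) -> Dict[str, Any]:
    """Smoothed blocked-task rate plus a count of blocker categories."""
    n = len(history)
    blocked = sum(1 for was_blocked, _ in history if was_blocked)
    # Blend the observed rate with the prior; the prior dominates when n is small.
    probability = (blocked + prior * prior_weight) / (n + prior_weight)
    breakdown = Counter(category for _, categories in history for category in categories)
    return {"probability": probability, "risk_breakdown": dict(breakdown)}
```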

find_similar_outcomes

async def find_similar_outcomes(self, task: Task, limit: int = 5) -> List[TaskOutcome]

Finds historically similar task outcomes from episodic memory.

Parameters:

task: The task to find similar outcomes for
limit: Maximum number of similar outcomes to return (default 5)

Returns:

List of TaskOutcome objects from similar historical tasks, ordered by similarity
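The documented `TaskOutcome` carries a `task_name` but no labels, so this sketch assumes similarity is measured by Jaccard overlap of lowercase name tokens. `PastOutcome` and `find_similar` are hypothetical stand-ins; the real similarity measure may differ.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class PastOutcome:  # minimal hypothetical stand-in for TaskOutcome
    task_name: str
    actual_hours: float


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_similar(task_name: str, outcomes: List[PastOutcome], limit: int = 5) -> List[PastOutcome]:
    """Return up to `limit` outcomes ordered by token Jaccard similarity."""
    query = _tokens(task_name)

    def jaccard(outcome: PastOutcome) -> float:
        other = _tokens(outcome.task_name)
        union = query | other
        return len(query & other) / len(union) if union else 0.0

    scored = [(jaccard(o), o) for o in outcomes]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated tasks
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep input order
    return [o for _, o in scored[:limit]]
```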

4. Cascade Effect Analysis

async def predict_cascade_effects(self, task_id: str, delay_hours: float) -> Dict[str, Any]

An instance method of the Memory class. It calculates:

Tasks affected by delays
Total project delay impact
Critical path implications
Mitigation options: a list of suggested mitigation strategies, returned under the "mitigation_options" key
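The "tasks affected by delays" step amounts to a graph walk: invert the dependency map, then traverse breadth-first from the delayed task. This sketch assumes a hypothetical `dependencies` dict (task ID to the IDs it waits on) in place of the stored project tasks, and treats the full delay as an upper bound on project slip (no slack). It omits critical-path and mitigation analysis.

```python
from collections import deque
from typing import Any, Dict, List


def cascade_effects(dependencies: Dict[str, List[str]], task_id: str, delay_hours: float) -> Dict[str, Any]:
    """Find every task downstream of task_id; dependencies maps task -> tasks it waits on."""
    # Invert the graph so we can walk from a task to the tasks that wait on it.
    dependents: Dict[str, List[str]] = {}
    for tid, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(tid)

    affected: List[str] = []
    seen = {task_id}  # also guards against cycles
    queue = deque([task_id])
    while queue:  # breadth-first: direct dependents first, then theirs
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)

    return {
        "affected_tasks": affected,
        # Upper bound: assumes no slack, so the full delay propagates to the project end.
        "max_project_delay_hours": delay_hours,
    }
```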

5. Learning Algorithms

Exponential Moving Average for Skill Updates

new_rate = old_rate * (1 - learning_rate) + new_value * learning_rate

Learning rate: 0.1 (10% weight to new experiences)
Provides smooth skill evolution tracking
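The update rule above as a function, assuming a success is encoded as 1.0 and a failure as 0.0 (the encoding is an assumption). Starting from a skill rate of 0.8, one success moves it to 0.82 and one failure moves it to 0.72.

```python
def ema_update(old_rate: float, new_value: float, learning_rate: float = 0.1) -> float:
    """Exponential moving average: new evidence gets `learning_rate` of the weight."""
    return old_rate * (1 - learning_rate) + new_value * learning_rate


after_success = ema_update(0.8, 1.0)  # 0.8 * 0.9 + 1.0 * 0.1
after_failure = ema_update(0.8, 0.0)  # 0.8 * 0.9 + 0.0 * 0.1
```

Because each step keeps 90% of the previous value, a single outlier task shifts the rate by at most 0.1.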

Time-Based Relevance Weighting

weight = recency_decay ** weeks_old  # recency_decay = 0.95

Recent experiences weighted more heavily
Older data gradually loses influence
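A sketch of the weighting applied to a success rate. It assumes `weeks_old` is fractional (computed from timestamps) and that weights feed a weighted average; `recency_weight` and `weighted_success_rate` are hypothetical names. With a decay of 0.95, a sample's weight halves roughly every 13.5 weeks (ln 0.5 / ln 0.95).

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

RECENCY_DECAY = 0.95
SECONDS_PER_WEEK = timedelta(weeks=1).total_seconds()


def recency_weight(observed_at: datetime, now: datetime) -> float:
    # Fractional weeks (assumption); an implementation might floor to whole weeks.
    weeks_old = (now - observed_at).total_seconds() / SECONDS_PER_WEEK
    return RECENCY_DECAY ** weeks_old


def weighted_success_rate(samples: List[Tuple[datetime, bool]], now: datetime) -> Optional[float]:
    """Recency-weighted share of successful outcomes; None when there is no history."""
    if not samples:
        return None
    weights = [recency_weight(when, now) for when, _ in samples]
    successes = sum(w for w, (_, ok) in zip(weights, samples) if ok)
    return successes / sum(weights)
```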

Implementation Details

Data Models

TaskOutcome

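from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

@dataclass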
class TaskOutcome:
    task_id: str
    agent_id: str
    task_name: str
    estimated_hours: float
    actual_hours: float
    success: bool
    blockers: List[str] = field(default_factory=list)
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None

AgentProfile

@dataclass
class AgentProfile:
    agent_id: str
    total_tasks: int
    successful_tasks: int
    failed_tasks: int
    blocked_tasks: int
    skill_success_rates: Dict[str, float]
    average_estimation_accuracy: float
    common_blockers: Dict[str, int]
    peak_performance_hours: List[int]

TaskPattern

@dataclass
class TaskPattern:
    pattern_type: str
    task_labels: List[str]
    recent_durations: List[float]
    success_rate: float
    common_blockers: List[str]
    prerequisites: List[str]
    best_agents: List[str]
    max_samples: int = 100  # Keep last 100 samples for median calculation
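One way to honor `max_samples` is to trim the oldest durations on every insert and take the median of what remains. `DurationWindow` is a hypothetical helper that mirrors only the `recent_durations` and `max_samples` fields of `TaskPattern`.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DurationWindow:  # hypothetical helper mirroring TaskPattern's duration fields
    recent_durations: List[float] = field(default_factory=list)
    max_samples: int = 100

    def add(self, hours: float) -> None:
        self.recent_durations.append(hours)
        # Drop the oldest samples once the window exceeds max_samples.
        overflow = len(self.recent_durations) - self.max_samples
        if overflow > 0:
            del self.recent_durations[:overflow]

    def median(self) -> Optional[float]:
        return statistics.median(self.recent_durations) if self.recent_durations else None
```

With `max_samples=3`, adding 1, 2, 3, and 40 keeps `[2, 3, 40]`, whose median is 3 while the mean is 15; the median resists the single long task.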

Confidence Calculation

Confidence grows logarithmically with the number of historical samples, so each additional sample adds less than the one before:

0-10 samples: Low confidence (0.1-0.5)
10-20 samples: Medium confidence (0.5-0.8)
20+ samples: High confidence (0.8-0.95)
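The bands above can be reproduced by interpolating linearly in log(1 + n) between their edges. This is a sketch under assumptions: the anchor points come from the band boundaries, the ceiling anchor at 100 samples is invented for illustration, and the exact formula in Marcus may differ.

```python
import math

# (sample count, confidence) anchors taken from the band edges above;
# the ceiling anchor at 100 samples is an assumption.
ANCHORS = [(0, 0.1), (10, 0.5), (20, 0.8), (100, 0.95)]


def sample_confidence(n: int) -> float:
    """Interpolate confidence linearly in log(1 + n) between the anchors."""
    if n <= 0:
        return ANCHORS[0][1]
    x = math.log1p(n)
    for (n0, c0), (n1, c1) in zip(ANCHORS, ANCHORS[1:]):
        if n <= n1:
            x0, x1 = math.log1p(n0), math.log1p(n1)
            return c0 + (c1 - c0) * (x - x0) / (x1 - x0)
    return ANCHORS[-1][1]  # cap once past the last anchor
```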

Complexity Assessment

Complexity factor calculation considers:

Task duration vs agent’s typical tasks
Task labels (complex, advanced, integration, etc.)
Number and nature of dependencies
Historical performance on similar tasks
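A sketch combining the four considerations into a single multiplier, where 1.0 means "typical for this agent". Every weight, the clamp range, and the label set beyond the listed examples are hypothetical; dependencies are counted but their nature is ignored.

```python
from typing import List, Optional

COMPLEX_LABELS = {"complex", "advanced", "integration"}  # label examples from the list above


def complexity_factor(
    estimated_hours: float,
    agent_typical_hours: float,
    labels: List[str],
    dependency_count: int,
    similar_success_rate: Optional[float] = None,
) -> float:
    """Return a multiplier where 1.0 is typical; all weights here are hypothetical."""
    factor = 1.0
    if agent_typical_hours > 0:
        # Longer-than-usual tasks count as harder, clamped to [0.5, 2.0].
        factor *= min(2.0, max(0.5, estimated_hours / agent_typical_hours))
    factor += 0.2 * sum(1 for label in labels if label.lower() in COMPLEX_LABELS)
    factor += 0.05 * dependency_count
    if similar_success_rate is not None:
        # Poor results on similar tasks raise the factor by up to 50%.
        factor *= 1.0 + 0.5 * (1.0 - similar_success_rate)
    return factor
```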

Pros and Cons

Pros

Data-Driven Decision Making: All predictions based on actual historical performance
Continuous Learning: System improves with every completed task
Risk Awareness: Proactively identifies and suggests mitigations for risks
Personalized: Adapts to individual agent capabilities and patterns
Holistic View: Considers project-wide impacts of individual decisions
Resilience: Fallback mechanisms ensure system continues even with limited data
Transparency: Provides reasoning and confidence levels for all predictions

Cons

Cold Start Problem: Limited effectiveness with new agents or task types
Memory Growth: Episodic memory grows unbounded without cleanup
Computational Overhead: Complex predictions can be resource-intensive
Limited Pattern Recognition: Simple similarity matching (no ML yet)
No Cross-Project Learning: Memory isolated per Marcus instance
Manual Workflow Capture: Procedural memory not auto-populated
Dependency on Historical Accuracy: Bad early data can skew predictions

Why This Approach

The multi-tier cognitive model was chosen for several reasons:

Biological Inspiration: Mirrors the well-studied structure of human memory
Separation of Concerns: Each tier serves distinct purposes
Temporal Flexibility: Handles both immediate and long-term needs
Graceful Degradation: System functions even with missing tiers
Extensibility: Easy to add new memory types or learning algorithms
Interpretability: Clear what each component does and why

Future Evolution

Short-term Enhancements

ML Integration: Replace similarity matching with trained models
Cross-Project Learning: Share learned patterns across projects
Automated Workflow Mining: Extract procedures from execution patterns
Memory Pruning: Implement forgetting mechanisms for old data
Real-time Adaptation: Adjust predictions during task execution

Long-term Vision

Predictive Project Planning: Generate optimal task sequences
Agent Team Composition: Suggest ideal team configurations
Anomaly Detection: Identify unusual patterns requiring attention
Knowledge Transfer: Export/import learned knowledge
Causal Reasoning: Understand why certain approaches succeed

Additional Utility Methods

get_median_duration_by_type

def get_median_duration_by_type(self, task_type: str) -> Optional[float]

Returns the median task duration for a specific task type label.

Parameters:

task_type: Task type label (e.g., “design”, “implement”, “test”)

Returns:

Median duration in hours, or None if no historical data available

Notes:

Uses median instead of average to be robust to outliers
First tries exact match on pattern type
Falls back to patterns containing the task type
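The lookup order described in the notes can be sketched as an exact match followed by a substring fallback. This assumes patterns are reduced to a hypothetical `pattern_type -> durations` dict, and that the fallback pools durations from every matching pattern; the real method might instead use the first match.

```python
import statistics
from typing import Dict, List, Optional


def median_duration_by_type(patterns: Dict[str, List[float]], task_type: str) -> Optional[float]:
    """patterns maps pattern_type -> recent durations (simplified stand-in for TaskPattern)."""
    if patterns.get(task_type):
        return statistics.median(patterns[task_type])  # exact match on pattern type
    # Fallback: pool durations from every pattern whose type contains the label.
    pooled = [d for ptype, durations in patterns.items() if task_type in ptype for d in durations]
    return statistics.median(pooled) if pooled else None
```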

get_global_median_duration

async def get_global_median_duration(self) -> float

Returns the global median task duration from all completed tasks.

Returns:

Median task duration in hours (defaults to 1.0 if no historical data)

Notes:

Prefers SQL-based calculation from persistence layer for efficiency
Falls back to in-memory calculation if persistence unavailable
Only considers successful completions with actual_hours > 0
More robust than the mean to outliers, such as tasks that sat waiting for input
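A sketch of the in-memory fallback path only (the SQL path is omitted). `Outcome` is a minimal hypothetical stand-in for `TaskOutcome`; the filter and the 1.0 default follow the notes above.

```python
import statistics
from typing import Iterable, NamedTuple


class Outcome(NamedTuple):  # minimal hypothetical stand-in for TaskOutcome
    success: bool
    actual_hours: float


def global_median_duration(outcomes: Iterable[Outcome]) -> float:
    """Median of successful, positive-duration outcomes; 1.0 when there is no data."""
    hours = [o.actual_hours for o in outcomes if o.success and o.actual_hours > 0]
    return float(statistics.median(hours)) if hours else 1.0
```

For successful durations of 2, 3, and 40 hours the median is 3.0, whereas the mean (15.0) would be dominated by the one task that sat waiting.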

update_project_tasks

def update_project_tasks(self, tasks: List[Task]) -> None

Updates working memory with current project tasks for cascade analysis.

Parameters:

tasks: List of all project tasks

Notes:

Stores tasks in self.working["all_tasks"]
Required for dependency analysis and cascade effect predictions
Should be called when project task list changes

get_memory_stats

def get_memory_stats(self) -> Dict[str, Any]

Returns memory system statistics across all tiers.

Returns: Dictionary with:

working_memory: Active tasks, recent events, project tasks count
episodic_memory: Total outcomes, days tracked
semantic_memory: Agent profiles count, task patterns count
procedural_memory: Workflows count, strategies count
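A sketch of how these counts could be derived from the four tier dictionaries shown under Memory Tiers. The nested key names are hypothetical; note that `all_tasks` must be read defensively because it only exists after `update_project_tasks()` has run.

```python
from typing import Any, Dict


def memory_stats(
    working: Dict[str, Any],
    episodic: Dict[str, Any],
    semantic: Dict[str, Any],
    procedural: Dict[str, Any],
) -> Dict[str, Dict[str, int]]:
    """Summarize each tier by counting entries (nested key names are hypothetical)."""
    return {
        "working_memory": {
            "active_tasks": len(working["active_tasks"]),
            "recent_events": len(working["recent_events"]),
            # "all_tasks" only exists after update_project_tasks() has run.
            "project_tasks": len(working.get("all_tasks", [])),
        },
        "episodic_memory": {
            "total_outcomes": len(episodic["outcomes"]),
            "days_tracked": len(episodic["timeline"]),  # one timeline key per date
        },
        "semantic_memory": {
            "agent_profiles": len(semantic["agent_profiles"]),
            "task_patterns": len(semantic["task_patterns"]),
        },
        "procedural_memory": {
            "workflows": len(procedural["workflows"]),
            "strategies": len(procedural["strategies"]),
        },
    }
```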

Task Complexity Handling

Simple Tasks

Rely more on the agent’s general success rate
Use basic duration estimates
Minimal risk factor analysis
Quick predictions with lower computational cost

Complex Tasks

Deep analysis of similar historical tasks
Multiple risk factors considered
Detailed mitigation strategies provided
Cascade effect analysis for dependencies
Higher confidence thresholds required

Board-Specific Considerations

While the Memory system is board-agnostic, it can adapt to different board types:

Kanban Boards: Track cycle time and throughput patterns
Sprint Boards: Learn velocity and burndown patterns
Custom Workflows: Adapt to board-specific state transitions

Cato Integration

The Memory system is designed to integrate with Cato (Marcus’s reasoning engine):

Context Provider: Supplies historical context for decisions
Constraint Input: Provides performance constraints for optimization
Feedback Loop: Learns from Cato’s assignment outcomes
Prediction Enhancement: Cato can override Memory predictions with reasoning

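Technical Excellence

Async-First Design

Most operations, including all predictions, are async; the utility methods get_median_duration_by_type, update_project_tasks, and get_memory_stats are synchronous. The async design enables:

Non-blocking predictions during task assignment
Concurrent learning from multiple outcomes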
Efficient integration with external services

Error Resilience

Graceful handling of missing data
Fallback predictions when history unavailable
Continued operation despite persistence failures

Performance Optimization

Lazy loading of historical data
Caching of frequently accessed profiles
Efficient similarity calculations
Bounded search spaces for predictions

Conclusion

The Multi-Tier Memory System represents a sophisticated approach to organizational learning in autonomous agent systems. By combining cognitive psychology principles with modern software architecture, it provides Marcus with the ability to continuously improve task assignments, predict problems before they occur, and optimize team performance over time. The system’s extensible design ensures it can evolve alongside Marcus’s capabilities while maintaining its core mission of turning past experience into future success.