Multi-Tier Memory System Technical Documentation#

Overview#

The Multi-Tier Memory System is a sophisticated cognitive-inspired memory architecture that enables Marcus to learn from past experiences, predict task outcomes, and optimize agent-task assignments. The system models itself after human memory structures with four distinct tiers: Working Memory, Episodic Memory, Semantic Memory, and Procedural Memory.

Architecture#

Core Components#

  1. Base Memory System (memory.py)

    • Implements the foundational four-tier memory architecture

    • Handles task outcome recording and basic predictions

    • Manages agent performance profiling

    • Provides cascade effect analysis for project dependencies

  2. Advanced Memory System (memory_advanced.py)

    • Extends base system with enhanced prediction capabilities

    • Implements confidence intervals and complexity adjustments

    • Adds time-based relevance weighting

    • Provides risk factor analysis with mitigation suggestions

Memory Tiers#

1. Working Memory (Volatile, Current State)#

self.working = {
    "active_tasks": {},     # agent_id -> current task
    "recent_events": [],    # last N events
    "system_state": {},     # current system metrics
}
# Note: "all_tasks" is added dynamically via update_project_tasks() method
  • Maintains real-time state of active operations

  • Tracks which agents are working on what tasks

  • Stores recent events for immediate context

  • Project tasks added via update_project_tasks() for dependency analysis

2. Episodic Memory (Task Execution History)#

self.episodic = {
    "outcomes": [],                    # List of TaskOutcome objects
    "timeline": defaultdict(list),     # date -> events
}
  • Records specific task execution outcomes

  • Maintains chronological timeline of events

  • Preserves detailed context of each task execution

  • Enables pattern recognition across similar experiences

3. Semantic Memory (Learned Facts)#

self.semantic = {
    "agent_profiles": {},     # agent_id -> AgentProfile
    "task_patterns": {},      # pattern_id -> TaskPattern
    "success_factors": {},    # factor -> impact
}
  • Stores extracted knowledge and patterns

  • Maintains agent capability profiles

  • Identifies task type patterns and success factors

  • Builds knowledge base from experience

4. Procedural Memory (Workflows and Strategies)#

self.procedural = {
    "workflows": {},        # workflow_id -> steps
    "strategies": {},       # situation -> strategy
    "optimizations": {},    # pattern -> optimization
}
  • Captures learned workflows and best practices

  • Stores situation-specific strategies

  • Maintains optimization patterns

Integration with Marcus Ecosystem#

Event System Integration#

The Memory system publishes events through the Marcus Events system:

  • TASK_STARTED: When an agent begins a task

  • TASK_COMPLETED: When a task is finished (success or failure)

Persistence Integration#

  • Automatically loads historical data on initialization

  • Persists task outcomes and agent profiles

  • Enables long-term learning across system restarts

Workflow Integration#

The Memory system is invoked at key points in the typical Marcus workflow:

  1. create_project: No direct involvement

  2. register_agent: Creates new agent profile if needed

  3. request_next_task: Uses predictions to optimize task assignment

  4. report_progress: Updates working memory with progress events

  5. report_blocker: Records blockers in agent profiles and task outcomes

  6. finish_task: Records complete task outcome and triggers learning

Key Features#

1. Predictive Analytics#

Task Outcome Prediction#

async def predict_task_outcome(agent_id: str, task: Task) -> Dict[str, Any]

Provides:

  • Success probability (0-1)

  • Estimated duration with adjustments

  • Blockage risk assessment

  • Risk factors identification

Enhanced Predictions (Advanced System)#

async def predict_task_outcome_v2(agent_id: str, task: Task) -> Dict[str, Any]

Adds:

  • Confidence intervals based on sample size

  • Complexity factor adjustments

  • Time-based relevance weighting

  • Detailed risk analysis with mitigation suggestions

2. Agent Performance Tracking#

Agent Profiles#

Maintains comprehensive profiles including:

  • Total/successful/failed/blocked task counts

  • Skill-specific success rates

  • Average estimation accuracy

  • Common blockers encountered

  • Peak performance patterns

Performance Trajectory Analysis#

async def calculate_agent_performance_trajectory(agent_id: str) -> Dict[str, Any]

Provides:

  • Current skill levels

  • Improving vs struggling skills

  • 30-day skill projections

  • Personalized recommendations

3. Additional Public Prediction Methods#

predict_completion_time#

async def predict_completion_time(self, agent_id: str, task: Task) -> Dict[str, Any]

Returns estimated completion time with confidence intervals based on historical performance data.

Returns:

  • Estimated duration in hours

  • Confidence interval (lower and upper bounds)

  • Sample size used for estimation

  • Confidence level

predict_blockage_probability#

async def predict_blockage_probability(self, agent_id: str, task: Task) -> Dict[str, Any]

Returns the probability that a task will be blocked, along with a breakdown of risk factors.

Returns:

  • Blockage probability (0.0–1.0)

  • Risk breakdown by category

  • Historical blocker patterns for this agent/task type

  • Mitigation suggestions

find_similar_outcomes#

async def find_similar_outcomes(self, task: Task, limit: int = 5) -> List[TaskOutcome]

Finds historically similar task outcomes from episodic memory.

Parameters:

  • task: The task to find similar outcomes for

  • limit: Maximum number of similar outcomes to return (default 5)

Returns:

  • List of TaskOutcome objects from similar historical tasks, ordered by similarity

4. Cascade Effect Analysis#

async def predict_cascade_effects(self, task_id: str, delay_hours: float) -> Dict[str, Any]

Method on the Memory class (requires self). Calculates:

  • Tasks affected by delays

  • Total project delay impact

  • Critical path implications

  • "mitigation_options": list of suggested mitigation strategies (dict key is "mitigation_options")

4. Learning Algorithms#

Exponential Moving Average for Skill Updates#

new_rate = old_rate * (1 - learning_rate) + new_value * learning_rate
  • Learning rate: 0.1 (10% weight to new experiences)

  • Provides smooth skill evolution tracking

Time-Based Relevance Weighting#

weight = recency_decay ** weeks_old  # recency_decay = 0.95
  • Recent experiences weighted more heavily

  • Older data gradually loses influence

Implementation Details#

Data Models#

TaskOutcome#

@dataclass
class TaskOutcome:
    task_id: str
    agent_id: str
    task_name: str
    estimated_hours: float
    actual_hours: float
    success: bool
    blockers: List[str] = field(default_factory=list)
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None

AgentProfile#

@dataclass
class AgentProfile:
    agent_id: str
    total_tasks: int
    successful_tasks: int
    failed_tasks: int
    blocked_tasks: int
    skill_success_rates: Dict[str, float]
    average_estimation_accuracy: float
    common_blockers: Dict[str, int]
    peak_performance_hours: List[int]

TaskPattern#

@dataclass
class TaskPattern:
    pattern_type: str
    task_labels: List[str]
    recent_durations: List[float]
    success_rate: float
    common_blockers: List[str]
    prerequisites: List[str]
    best_agents: List[str]
    max_samples: int = 100  # Keep last 100 samples for median calculation

Confidence Calculation#

The system uses logarithmic growth for confidence:

  • 0-10 samples: Low confidence (0.1-0.5)

  • 10-20 samples: Medium confidence (0.5-0.8)

  • 20+ samples: High confidence (0.8-0.95)

Complexity Assessment#

Complexity factor calculation considers:

  1. Task duration vs agent’s typical tasks

  2. Task labels (complex, advanced, integration, etc.)

  3. Number and nature of dependencies

  4. Historical performance on similar tasks

Pros and Cons#

Pros#

  1. Data-Driven Decision Making: All predictions based on actual historical performance

  2. Continuous Learning: System improves with every completed task

  3. Risk Awareness: Proactively identifies and suggests mitigations for risks

  4. Personalized: Adapts to individual agent capabilities and patterns

  5. Holistic View: Considers project-wide impacts of individual decisions

  6. Resilience: Fallback mechanisms ensure system continues even with limited data

  7. Transparency: Provides reasoning and confidence levels for all predictions

Cons#

  1. Cold Start Problem: Limited effectiveness with new agents or task types

  2. Memory Growth: Episodic memory grows unbounded without cleanup

  3. Computational Overhead: Complex predictions can be resource-intensive

  4. Limited Pattern Recognition: Simple similarity matching (no ML yet)

  5. No Cross-Project Learning: Memory isolated per Marcus instance

  6. Manual Workflow Capture: Procedural memory not auto-populated

  7. Dependency on Historical Accuracy: Bad early data can skew predictions

Why This Approach#

The multi-tier cognitive model was chosen for several reasons:

  1. Biological Inspiration: Mirrors proven human memory systems

  2. Separation of Concerns: Each tier serves distinct purposes

  3. Temporal Flexibility: Handles both immediate and long-term needs

  4. Graceful Degradation: System functions even with missing tiers

  5. Extensibility: Easy to add new memory types or learning algorithms

  6. Interpretability: Clear what each component does and why

Future Evolution#

Short-term Enhancements#

  1. ML Integration: Replace similarity matching with trained models

  2. Cross-Project Learning: Share learned patterns across projects

  3. Automated Workflow Mining: Extract procedures from execution patterns

  4. Memory Pruning: Implement forgetting mechanisms for old data

  5. Real-time Adaptation: Adjust predictions during task execution

Long-term Vision#

  1. Predictive Project Planning: Generate optimal task sequences

  2. Agent Team Composition: Suggest ideal team configurations

  3. Anomaly Detection: Identify unusual patterns requiring attention

  4. Knowledge Transfer: Export/import learned knowledge

  5. Causal Reasoning: Understand why certain approaches succeed

Additional Utility Methods#

get_median_duration_by_type#

def get_median_duration_by_type(task_type: str) -> Optional[float]

Returns the median task duration for a specific task type label.

Parameters:

  • task_type: Task type label (e.g., “design”, “implement”, “test”)

Returns:

  • Median duration in hours, or None if no historical data available

Notes:

  • Uses median instead of average to be robust to outliers

  • First tries exact match on pattern type

  • Falls back to patterns containing the task type

get_global_median_duration#

async def get_global_median_duration() -> float

Returns the global median task duration from all completed tasks.

Returns:

  • Median task duration in hours (defaults to 1.0 if no historical data)

Notes:

  • Prefers SQL-based calculation from persistence layer for efficiency

  • Falls back to in-memory calculation if persistence unavailable

  • Only considers successful completions with actual_hours > 0

  • More robust to outliers than mean (tasks that sat waiting for input)

update_project_tasks#

def update_project_tasks(tasks: List[Task]) -> None

Updates working memory with current project tasks for cascade analysis.

Parameters:

  • tasks: List of all project tasks

Notes:

  • Stores tasks in self.working["all_tasks"]

  • Required for dependency analysis and cascade effect predictions

  • Should be called when project task list changes

get_memory_stats#

def get_memory_stats() -> Dict[str, Any]

Returns memory system statistics across all tiers.

Returns: Dictionary with:

  • working_memory: Active tasks, recent events, project tasks count

  • episodic_memory: Total outcomes, days tracked

  • semantic_memory: Agent profiles count, task patterns count

  • procedural_memory: Workflows count, strategies count

Task Complexity Handling#

Simple Tasks#

  • Rely more on agent’s general success rate

  • Use basic duration estimates

  • Minimal risk factor analysis

  • Quick predictions with lower computational cost

Complex Tasks#

  • Deep analysis of similar historical tasks

  • Multiple risk factors considered

  • Detailed mitigation strategies provided

  • Cascade effect analysis for dependencies

  • Higher confidence thresholds required

Board-Specific Considerations#

While the Memory system is board-agnostic, it can adapt to different board types:

  1. Kanban Boards: Track cycle time and throughput patterns

  2. Sprint Boards: Learn velocity and burndown patterns

  3. Custom Workflows: Adapt to board-specific state transitions

Cato Integration#

The Memory system is designed to integrate with Cato (Marcus’s reasoning engine):

  1. Context Provider: Supplies historical context for decisions

  2. Constraint Input: Provides performance constraints for optimization

  3. Feedback Loop: Learns from Cato’s assignment outcomes

  4. Prediction Enhancement: Cato can override Memory predictions with reasoning

Technical Excellence#

Async-First Design#

All operations are async, enabling:

  • Non-blocking predictions during task assignment

  • Parallel learning from multiple outcomes

  • Efficient integration with external services

Error Resilience#

  • Graceful handling of missing data

  • Fallback predictions when history unavailable

  • Continued operation despite persistence failures

Performance Optimization#

  • Lazy loading of historical data

  • Caching of frequently accessed profiles

  • Efficient similarity calculations

  • Bounded search spaces for predictions

Conclusion#

The Multi-Tier Memory System represents a sophisticated approach to organizational learning in autonomous agent systems. By combining cognitive psychology principles with modern software architecture, it provides Marcus with the ability to continuously improve task assignments, predict problems before they occur, and optimize team performance over time. The system’s extensible design ensures it can evolve alongside Marcus’s capabilities while maintaining its core mission of turning past experience into future success.