# Multi-Tier Memory System Technical Documentation

## Overview

The Multi-Tier Memory System is a cognitively inspired memory architecture that enables Marcus to learn from past experiences, predict task outcomes, and optimize agent-task assignments. It is modeled on human memory structures and has four distinct tiers: Working Memory, Episodic Memory, Semantic Memory, and Procedural Memory.

## Architecture

### Core Components

1. **Base Memory System (`memory.py`)**
   - Implements the foundational four-tier memory architecture
   - Handles task outcome recording and basic predictions
   - Manages agent performance profiling
   - Provides cascade effect analysis for project dependencies

2. **Advanced Memory System (`memory_advanced.py`)**
   - Extends base system with enhanced prediction capabilities
   - Implements confidence intervals and complexity adjustments
   - Adds time-based relevance weighting
   - Provides risk factor analysis with mitigation suggestions

### Memory Tiers

#### 1. Working Memory (Volatile, Current State)
```python
self.working = {
    "active_tasks": {},     # agent_id -> current task
    "recent_events": [],    # last N events
    "system_state": {},     # current system metrics
}
# Note: "all_tasks" is added dynamically via update_project_tasks() method
```
- Maintains real-time state of active operations
- Tracks which agents are working on what tasks
- Stores recent events for immediate context
- Holds the project task list, added via `update_project_tasks()`, for dependency analysis
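
One way to keep only the last N events is a bounded buffer. The sketch below is an illustration rather than the code in `memory.py`: it swaps the plain list shown above for a `collections.deque`, and `MAX_RECENT_EVENTS`, `record_event`, and the event fields are hypothetical names.

```python
from collections import deque
from typing import Any, Deque, Dict

MAX_RECENT_EVENTS = 3  # hypothetical N, kept small for the demo

# A deque with maxlen drops the oldest entry automatically once it is full.
recent_events: Deque[Dict[str, Any]] = deque(maxlen=MAX_RECENT_EVENTS)

working: Dict[str, Any] = {
    "active_tasks": {},
    "recent_events": recent_events,
    "system_state": {},
}


def record_event(event_type: str, task_id: str) -> None:
    """Append an event; the oldest event is discarded when the buffer is full."""
    working["recent_events"].append({"type": event_type, "task_id": task_id})


for n in range(5):
    record_event("progress", f"task-{n}")

kept = [event["task_id"] for event in working["recent_events"]]
print(kept)  # ['task-2', 'task-3', 'task-4']
```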

#### 2. Episodic Memory (Task Execution History)
```python
self.episodic = {
    "outcomes": [],                    # List of TaskOutcome objects
    "timeline": defaultdict(list),     # date -> events
}
```
- Records specific task execution outcomes
- Maintains chronological timeline of events
- Preserves detailed context of each task execution
- Enables pattern recognition across similar experiences
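
As a rough sketch of how an outcome could land in both structures, the example below appends to `outcomes` and indexes an event under its completion date. The `Outcome` class is a simplified stand-in for `TaskOutcome`, and `record_outcome`, the ISO date key, and the event fields are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict


@dataclass
class Outcome:
    """Simplified stand-in for TaskOutcome (hypothetical subset of fields)."""
    task_id: str
    agent_id: str
    success: bool
    completed_at: datetime


episodic: Dict[str, Any] = {"outcomes": [], "timeline": defaultdict(list)}


def record_outcome(outcome: Outcome) -> None:
    """Store the outcome and add an event under its completion date."""
    episodic["outcomes"].append(outcome)
    day = outcome.completed_at.date().isoformat()  # assumed key format
    episodic["timeline"][day].append(
        {"task_id": outcome.task_id, "agent_id": outcome.agent_id, "success": outcome.success}
    )


record_outcome(Outcome("t1", "agent-a", True, datetime(2024, 3, 1, 9, 30)))
record_outcome(Outcome("t2", "agent-b", False, datetime(2024, 3, 1, 17, 0)))
record_outcome(Outcome("t3", "agent-a", True, datetime(2024, 3, 2, 11, 15)))

days = sorted(episodic["timeline"])
```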

#### 3. Semantic Memory (Learned Facts)
```python
self.semantic = {
    "agent_profiles": {},     # agent_id -> AgentProfile
    "task_patterns": {},      # pattern_id -> TaskPattern
    "success_factors": {},    # factor -> impact
}
```
- Stores extracted knowledge and patterns
- Maintains agent capability profiles
- Identifies task type patterns and success factors
- Builds knowledge base from experience

#### 4. Procedural Memory (Workflows and Strategies)
```python
self.procedural = {
    "workflows": {},        # workflow_id -> steps
    "strategies": {},       # situation -> strategy
    "optimizations": {},    # pattern -> optimization
}
```
- Captures learned workflows and best practices
- Stores situation-specific strategies
- Maintains optimization patterns

## Integration with Marcus Ecosystem

### Event System Integration
The Memory system publishes events through the Marcus Events system:
- `TASK_STARTED`: When an agent begins a task
- `TASK_COMPLETED`: When a task is finished (success or failure)

### Persistence Integration
- Automatically loads historical data on initialization
- Persists task outcomes and agent profiles
- Enables long-term learning across system restarts

### Workflow Integration
The Memory system is invoked at key points in the typical Marcus workflow:

1. **`create_project`**: No direct involvement
2. **`register_agent`**: Creates new agent profile if needed
3. **`request_next_task`**: Uses predictions to optimize task assignment
4. **`report_progress`**: Updates working memory with progress events
5. **`report_blocker`**: Records blockers in agent profiles and task outcomes
6. **`finish_task`**: Records complete task outcome and triggers learning

## Key Features

### 1. Predictive Analytics

#### Task Outcome Prediction
```python
async def predict_task_outcome(self, agent_id: str, task: Task) -> Dict[str, Any]
```
Provides:
- Success probability (0-1)
- Estimated duration with adjustments
- Blockage risk assessment
- Risk factors identification

#### Enhanced Predictions (Advanced System)
```python
async def predict_task_outcome_v2(self, agent_id: str, task: Task) -> Dict[str, Any]
```
Adds:
- Confidence intervals based on sample size
- Complexity factor adjustments
- Time-based relevance weighting
- Detailed risk analysis with mitigation suggestions

### 2. Agent Performance Tracking

#### Agent Profiles
Maintains comprehensive profiles including:
- Total/successful/failed/blocked task counts
- Skill-specific success rates
- Average estimation accuracy
- Common blockers encountered
- Peak performance patterns

#### Performance Trajectory Analysis
```python
async def calculate_agent_performance_trajectory(self, agent_id: str) -> Dict[str, Any]
```
Provides:
- Current skill levels
- Improving vs struggling skills
- 30-day skill projections
- Personalized recommendations

### 3. Additional Public Prediction Methods

#### predict_completion_time
```python
async def predict_completion_time(self, agent_id: str, task: Task) -> Dict[str, Any]
```
Returns estimated completion time with confidence intervals based on historical performance data.

**Returns:**
- Estimated duration in hours
- Confidence interval (lower and upper bounds)
- Sample size used for estimation
- Confidence level

#### predict_blockage_probability
```python
async def predict_blockage_probability(self, agent_id: str, task: Task) -> Dict[str, Any]
```
Returns the probability that a task will be blocked, along with a breakdown of risk factors.

**Returns:**
- Blockage probability (0.0–1.0)
- Risk breakdown by category
- Historical blocker patterns for this agent/task type
- Mitigation suggestions

#### find_similar_outcomes
```python
async def find_similar_outcomes(self, task: Task, limit: int = 5) -> List[TaskOutcome]
```
Finds historically similar task outcomes from episodic memory.

**Parameters:**
- `task`: The task to find similar outcomes for
- `limit`: Maximum number of similar outcomes to return (default 5)

**Returns:**
- List of `TaskOutcome` objects from similar historical tasks, ordered by similarity
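
The similarity measure itself is not specified here. As a minimal sketch, assuming similarity is judged by word overlap between task names, a Jaccard score can rank past tasks. `name_similarity` and `most_similar` are hypothetical helpers, not part of the Memory API.

```python
from typing import List


def name_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two task names."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def most_similar(task_name: str, past_names: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` past task names, most similar first, skipping zero scores."""
    scored = [(name_similarity(task_name, name), name) for name in past_names]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:limit]]


past = ["Implement signup API", "Design login page", "Write docs"]
matches = most_similar("Implement login API", past, limit=2)
print(matches)  # ['Implement signup API', 'Design login page']
```

Here "Implement signup API" shares two of four distinct words with the query (score 0.5), "Design login page" shares one of five (score 0.2), and "Write docs" shares none and is dropped.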

### 4. Cascade Effect Analysis

```python
async def predict_cascade_effects(self, task_id: str, delay_hours: float) -> Dict[str, Any]
```
A method on the `Memory` class. It calculates:
- Tasks affected by delays
- Total project delay impact
- Critical path implications
- Suggested mitigation strategies, returned as a list under the `"mitigation_options"` key
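
To illustrate the first item, the sketch below finds every task downstream of a delayed one with a breadth-first walk over the dependency graph. It assumes dependencies are available as a mapping from task ID to the IDs it depends on. The `affected_tasks` helper and the sample graph are hypothetical, and slack and critical-path calculations are left out.

```python
from collections import deque
from typing import Dict, List


def affected_tasks(dependencies: Dict[str, List[str]], delayed_task_id: str) -> List[str]:
    """Return all tasks that directly or transitively depend on the delayed task."""
    # Invert "task -> its dependencies" into "task -> tasks that depend on it".
    dependents: Dict[str, List[str]] = {}
    for task_id, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(task_id)

    seen = {delayed_task_id}  # also guards against dependency cycles
    ordered: List[str] = []
    queue = deque([delayed_task_id])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                ordered.append(nxt)
                queue.append(nxt)
    return ordered


graph = {
    "design": [],
    "api": ["design"],
    "ui": ["design"],
    "tests": ["api", "ui"],
    "docs": [],
}
print(affected_tasks(graph, "design"))  # ['api', 'ui', 'tests']
```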

### 5. Learning Algorithms

#### Exponential Moving Average for Skill Updates
```python
new_rate = old_rate * (1 - learning_rate) + new_value * learning_rate
```
- Learning rate: 0.1 (10% weight to new experiences)
- Provides smooth skill evolution tracking
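
A short worked example of this update rule follows. It assumes a success is encoded as 1.0 and a failure as 0.0, and that the skill starts at 0.5; both are illustrative choices, and `update_skill_rate` is a hypothetical helper name.

```python
LEARNING_RATE = 0.1  # 10% weight to each new experience


def update_skill_rate(old_rate: float, new_value: float,
                      learning_rate: float = LEARNING_RATE) -> float:
    """Blend the previous rate with the newest observation."""
    return old_rate * (1 - learning_rate) + new_value * learning_rate


rate = 0.5  # assumed starting point for a skill with no history
for succeeded in [True, True, False]:
    rate = update_skill_rate(rate, 1.0 if succeeded else 0.0)
    print(round(rate, 4))  # 0.55, then 0.595, then 0.5355
```

Two successes lift the rate from 0.5 to 0.595, and a single failure then pulls it back only to 0.5355, which shows how the low learning rate damps the effect of any one outcome.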

#### Time-Based Relevance Weighting
```python
weight = recency_decay ** weeks_old  # recency_decay = 0.95
```
- Recent experiences weighted more heavily
- Older data gradually loses influence
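
The sketch below applies this weight to a recency-weighted success rate. It assumes `weeks_old` may be fractional and that the weights are normalized by their sum; `recency_weight` and `weighted_success_rate` are hypothetical helpers.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

RECENCY_DECAY = 0.95
SECONDS_PER_WEEK = 7 * 24 * 3600


def recency_weight(completed_at: datetime, now: datetime) -> float:
    """Weight of an outcome: 1.0 for brand-new data, decaying 5% per week of age."""
    age_seconds = max((now - completed_at).total_seconds(), 0.0)
    return RECENCY_DECAY ** (age_seconds / SECONDS_PER_WEEK)


def weighted_success_rate(history: List[Tuple[datetime, bool]], now: datetime) -> float:
    """Success rate in which each outcome counts in proportion to its recency weight."""
    if not history:
        return 0.0
    weights = [recency_weight(ts, now) for ts, _ in history]
    weighted_successes = sum(w for w, (_, ok) in zip(weights, history) if ok)
    return weighted_successes / sum(weights)


now = datetime(2024, 6, 1)
history = [
    (now - timedelta(weeks=10), False),  # old failure, weight 0.95 ** 10
    (now - timedelta(weeks=1), True),    # recent success, weight 0.95
    (now, True),                         # fresh success, weight 1.0
]
rate = weighted_success_rate(history, now)
```

Because the only failure is ten weeks old, the weighted rate comes out higher than the unweighted two-out-of-three.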

## Implementation Details

### Data Models

#### TaskOutcome
```python
@dataclass
class TaskOutcome:
    task_id: str
    agent_id: str
    task_name: str
    estimated_hours: float
    actual_hours: float
    success: bool
    blockers: List[str] = field(default_factory=list)
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
```

#### AgentProfile
```python
@dataclass
class AgentProfile:
    agent_id: str
    total_tasks: int
    successful_tasks: int
    failed_tasks: int
    blocked_tasks: int
    skill_success_rates: Dict[str, float]
    average_estimation_accuracy: float
    common_blockers: Dict[str, int]
    peak_performance_hours: List[int]
```

#### TaskPattern
```python
@dataclass
class TaskPattern:
    pattern_type: str
    task_labels: List[str]
    recent_durations: List[float]
    success_rate: float
    common_blockers: List[str]
    prerequisites: List[str]
    best_agents: List[str]
    max_samples: int = 100  # Keep last 100 samples for median calculation
```

### Confidence Calculation

Confidence grows logarithmically with the number of historical samples:
- 0-10 samples: Low confidence (0.1-0.5)
- 10-20 samples: Medium confidence (0.5-0.8)
- 20+ samples: High confidence (0.8-0.95)
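
The exact formula is not given here. One logarithmic curve that passes through the band boundaries (0.5 at 10 samples, 0.8 at 20 samples) and is clamped to the stated range of 0.1 to 0.95 can be sketched as follows; `sample_confidence` and its constants are assumptions chosen to fit those bands, not the implementation in `memory_advanced.py`.

```python
import math


def sample_confidence(sample_count: int) -> float:
    """Log-shaped confidence: 0.5 at 10 samples, 0.8 at 20, clamped to [0.1, 0.95]."""
    if sample_count <= 0:
        return 0.1  # no history: lowest confidence
    raw = 0.5 + 0.3 * math.log2(sample_count / 10)
    return max(0.1, min(0.95, raw))


for n in (0, 5, 10, 20, 40):
    print(n, round(sample_confidence(n), 2))
```

Each doubling of the sample count adds 0.3 until the 0.95 cap is reached, so early samples move confidence much more than later ones.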

### Complexity Assessment

Complexity factor calculation considers:
1. Task duration vs agent's typical tasks
2. Task labels (complex, advanced, integration, etc.)
3. Number and nature of dependencies
4. Historical performance on similar tasks

## Pros and Cons

### Pros

1. **Data-Driven Decision Making**: All predictions based on actual historical performance
2. **Continuous Learning**: System improves with every completed task
3. **Risk Awareness**: Proactively identifies and suggests mitigations for risks
4. **Personalized**: Adapts to individual agent capabilities and patterns
5. **Holistic View**: Considers project-wide impacts of individual decisions
6. **Resilience**: Fallback mechanisms ensure system continues even with limited data
7. **Transparency**: Provides reasoning and confidence levels for all predictions

### Cons

1. **Cold Start Problem**: Limited effectiveness with new agents or task types
2. **Memory Growth**: Episodic memory grows without bound unless old data is cleaned up
3. **Computational Overhead**: Complex predictions can be resource-intensive
4. **Limited Pattern Recognition**: Simple similarity matching (no ML yet)
5. **No Cross-Project Learning**: Memory isolated per Marcus instance
6. **Manual Workflow Capture**: Procedural memory not auto-populated
7. **Dependency on Historical Accuracy**: Bad early data can skew predictions

## Why This Approach

The multi-tier cognitive model was chosen for several reasons:

1. **Biological Inspiration**: Mirrors proven human memory systems
2. **Separation of Concerns**: Each tier serves distinct purposes
3. **Temporal Flexibility**: Handles both immediate and long-term needs
4. **Graceful Degradation**: System functions even with missing tiers
5. **Extensibility**: Easy to add new memory types or learning algorithms
6. **Interpretability**: Clear what each component does and why

## Future Evolution

### Short-term Enhancements

1. **ML Integration**: Replace similarity matching with trained models
2. **Cross-Project Learning**: Share learned patterns across projects
3. **Automated Workflow Mining**: Extract procedures from execution patterns
4. **Memory Pruning**: Implement forgetting mechanisms for old data
5. **Real-time Adaptation**: Adjust predictions during task execution

### Long-term Vision

1. **Predictive Project Planning**: Generate optimal task sequences
2. **Agent Team Composition**: Suggest ideal team configurations
3. **Anomaly Detection**: Identify unusual patterns requiring attention
4. **Knowledge Transfer**: Export/import learned knowledge
5. **Causal Reasoning**: Understand why certain approaches succeed

## Additional Utility Methods

### get_median_duration_by_type
```python
def get_median_duration_by_type(self, task_type: str) -> Optional[float]
```
Returns the median task duration for a specific task type label.

**Parameters:**
- `task_type`: Task type label (e.g., "design", "implement", "test")

**Returns:**
- Median duration in hours, or None if no historical data available

**Notes:**
- Uses median instead of average to be robust to outliers
- First tries exact match on pattern type
- Falls back to patterns containing the task type

### get_global_median_duration
```python
async def get_global_median_duration(self) -> float
```
Returns the global median task duration from all completed tasks.

**Returns:**
- Median task duration in hours (defaults to 1.0 if no historical data)

**Notes:**
- Prefers SQL-based calculation from persistence layer for efficiency
- Falls back to in-memory calculation if persistence unavailable
- Only considers successful completions with actual_hours > 0
- Uses the median because it is more robust than the mean to outliers, such as tasks that sat waiting for input
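
The in-memory fallback described in these notes might look like the sketch below, which filters to successful outcomes with positive hours and defaults to 1.0 when nothing qualifies. Outcomes are reduced to `(success, actual_hours)` pairs for brevity, the SQL path is omitted, and `global_median_hours` is a hypothetical name.

```python
from statistics import median
from typing import List, Tuple

DEFAULT_MEDIAN_HOURS = 1.0  # returned when no usable history exists


def global_median_hours(outcomes: List[Tuple[bool, float]]) -> float:
    """Median actual hours across successful outcomes with a positive duration."""
    durations = [hours for success, hours in outcomes if success and hours > 0]
    return median(durations) if durations else DEFAULT_MEDIAN_HOURS


outcomes = [
    (True, 2.0),
    (True, 3.0),
    (True, 50.0),   # outlier: task sat waiting for input
    (False, 1.0),   # failed, ignored
    (True, 0.0),    # no recorded duration, ignored
]
print(global_median_hours(outcomes))  # 3.0
print(global_median_hours([]))        # 1.0
```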

### update_project_tasks
```python
def update_project_tasks(self, tasks: List[Task]) -> None
```
Updates working memory with current project tasks for cascade analysis.

**Parameters:**
- `tasks`: List of all project tasks

**Notes:**
- Stores tasks in `self.working["all_tasks"]`
- Required for dependency analysis and cascade effect predictions
- Should be called when project task list changes

### get_memory_stats
```python
def get_memory_stats(self) -> Dict[str, Any]
```
Returns memory system statistics across all tiers.

**Returns:**
Dictionary with:
- `working_memory`: Active tasks, recent events, project tasks count
- `episodic_memory`: Total outcomes, days tracked
- `semantic_memory`: Agent profiles count, task patterns count
- `procedural_memory`: Workflows count, strategies count

## Task Complexity Handling

### Simple Tasks
- Rely more on agent's general success rate
- Use basic duration estimates
- Minimal risk factor analysis
- Quick predictions with lower computational cost

### Complex Tasks
- Deep analysis of similar historical tasks
- Multiple risk factors considered
- Detailed mitigation strategies provided
- Cascade effect analysis for dependencies
- Higher confidence thresholds required

## Board-Specific Considerations

While the Memory system is board-agnostic, it can adapt to different board types:

1. **Kanban Boards**: Track cycle time and throughput patterns
2. **Sprint Boards**: Learn velocity and burndown patterns
3. **Custom Workflows**: Adapt to board-specific state transitions

## Cato Integration

The Memory system is designed to integrate with Cato (Marcus's reasoning engine):

1. **Context Provider**: Supplies historical context for decisions
2. **Constraint Input**: Provides performance constraints for optimization
3. **Feedback Loop**: Learns from Cato's assignment outcomes
4. **Prediction Enhancement**: Cato can override Memory predictions with reasoning

## Technical Excellence

### Async-First Design
Prediction and learning operations are async, enabling:
- Non-blocking predictions during task assignment
- Parallel learning from multiple outcomes
- Efficient integration with external services
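
As an illustration of non-blocking predictions during assignment, the sketch below scores several candidate agents concurrently with `asyncio.gather` and picks the best one. `predict_success` is a stub with made-up scores standing in for a real prediction call, and `best_agent` is a hypothetical helper.

```python
import asyncio
from typing import List


async def predict_success(agent_id: str, task_name: str) -> float:
    """Stub predictor; a real system would consult the memory tiers here."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    stub_scores = {"agent-a": 0.9, "agent-b": 0.6}  # made-up values
    return stub_scores.get(agent_id, 0.5)


async def best_agent(agent_ids: List[str], task_name: str) -> str:
    """Run one prediction per agent concurrently and return the top scorer."""
    scores = await asyncio.gather(
        *(predict_success(agent_id, task_name) for agent_id in agent_ids)
    )
    # gather preserves input order, so scores line up with agent_ids.
    return max(zip(agent_ids, scores), key=lambda pair: pair[1])[0]


chosen = asyncio.run(best_agent(["agent-a", "agent-b", "agent-c"], "Implement login API"))
print(chosen)  # agent-a
```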

### Error Resilience
- Graceful handling of missing data
- Fallback predictions when history unavailable
- Continued operation despite persistence failures

### Performance Optimization
- Lazy loading of historical data
- Caching of frequently accessed profiles
- Efficient similarity calculations
- Bounded search spaces for predictions

## Conclusion

The Multi-Tier Memory System represents a sophisticated approach to organizational learning in autonomous agent systems. By combining cognitive psychology principles with modern software architecture, it provides Marcus with the ability to continuously improve task assignments, predict problems before they occur, and optimize team performance over time. The system's extensible design ensures it can evolve alongside Marcus's capabilities while maintaining its core mission of turning past experience into future success.