Core Models System#

Overview#

The Core Models system (src/core/models.py) is the foundational data modeling layer of the Marcus architecture. It defines the fundamental data structures and enumerations that represent all core business entities within the Marcus ecosystem, providing type-safe, well-documented abstractions for tasks, projects, workers, assignments, and system state.

System Architecture#

Core Data Models#

TaskStatus       TaskAssignment      WorkerStatus
    ↓                  ↓                 ↓
   Task ──────────────────────────── ProjectState
    ↓                                     ↓
BlockerReport                      ProjectRisk

The system is built around six primary data classes and three enumerations:

Enumerations:

  • TaskStatus - Lifecycle states (TODO, IN_PROGRESS, DONE, BLOCKED)

  • Priority - Urgency levels (LOW, MEDIUM, HIGH, URGENT)

  • RiskLevel - Impact severity (LOW, MEDIUM, HIGH, CRITICAL)
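As a rough illustration of the enumerations listed above, the sketch below defines Priority and RiskLevel and adds a rank table for sorting by urgency. The member names come from this page; the lowercase string values, `PRIORITY_RANK`, and `most_urgent_first` are assumptions made for the example, not confirmed contents of src/core/models.py.

```python
from enum import Enum
from typing import Dict, List


class Priority(Enum):
    # Member names are documented; the string values are assumed.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"


class RiskLevel(Enum):
    # Member names are documented; the string values are assumed.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Hypothetical rank table: string-valued members have no natural ordering,
# so definition order is used as the urgency rank.
PRIORITY_RANK: Dict[Priority, int] = {p: i for i, p in enumerate(Priority)}


def most_urgent_first(priorities: List[Priority]) -> List[Priority]:
    """Sort priorities from most to least urgent."""
    return sorted(priorities, key=PRIORITY_RANK.__getitem__, reverse=True)
```

Keeping the rank outside the enum leaves the serialized values readable while still allowing comparisons where ordering matters.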

Core Models:

  • Task - Individual work items with dependencies and metadata

  • ProjectState - Aggregate project health and metrics

  • WorkerStatus - Agent capabilities, workload, and performance

  • TaskAssignment - Execution context for assigned work

  • BlockerReport - Issue tracking and resolution

  • ProjectRisk - Risk assessment and mitigation planning

Marcus Ecosystem Integration#

The Core Models system sits at the architectural center of Marcus, serving as the common language between all subsystems:

graph TD
    A[Core Models] --> B[AI Engine]
    A --> C[Task Assignment]
    A --> D[Kanban Integration]
    A --> E[MCP Server]
    A --> F[Workflow Management]
    A --> G[Monitoring Systems]
    A --> H[Learning Systems]
    A --> I[Context Management]

Workflow Position#

In a typical Marcus workflow, the Core Models are used at every stage:

1. create_project#

  • ProjectState instances created to track metrics

  • ProjectRisk models initialized for assessment

  • Project metadata stored in model structures

2. register_agent#

  • WorkerStatus model populated with agent capabilities

  • Skills, availability, and performance scores recorded

  • Agent pool state maintained through models

3. request_next_task#

  • Task models filtered and analyzed for assignment

  • TaskAssignment created with execution context

  • Dependency graphs traversed using model relationships

4. report_progress#

  • Task.actual_hours and status updated

  • ProjectState metrics recalculated automatically

  • Progress tracking through model state transitions

5. report_blocker#

  • BlockerReport instances created with AI analysis

  • RiskLevel assessed and propagated to project state

  • Resolution workflows tracked through model lifecycle

6. finish_task#

  • Task.status transitioned to DONE

  • ProjectState.completed_tasks incremented

  • Dependency chains unblocked automatically
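The finish_task stage can be sketched as follows. This is a minimal illustration under stated assumptions: the `Task` class here is a cut-down stand-in, and `finish_task` is a hypothetical helper, not the actual Marcus tool implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class TaskStatus(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    BLOCKED = "blocked"


@dataclass
class Task:
    # Cut-down stand-in for the real Task model
    id: str
    status: TaskStatus = TaskStatus.TODO
    dependencies: List[str] = field(default_factory=list)


def finish_task(task_id: str, tasks: Dict[str, Task]) -> List[str]:
    """Mark a task DONE and return IDs of TODO tasks it has just unblocked."""
    tasks[task_id].status = TaskStatus.DONE
    unblocked = []
    for task in tasks.values():
        # Only TODO tasks that were waiting on the finished task are candidates
        if task.status is not TaskStatus.TODO or task_id not in task.dependencies:
            continue
        # A task is ready once every dependency is known and DONE
        if all(d in tasks and tasks[d].status is TaskStatus.DONE
               for d in task.dependencies):
            unblocked.append(task.id)
    return unblocked
```

Because dependencies are plain ID strings, the unblocking check needs only a dictionary lookup per dependency.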

What Makes This System Special#

1. Type Safety with Flexibility#

The system uses Python dataclasses with type hints. Python does not enforce hints at runtime, so the safety comes from static analysis (type checkers and IDEs) while the models stay flexible at runtime:

@dataclass
class Task:
    id: str
    name: str
    description: str
    status: TaskStatus
    priority: Priority
    # Optional fields with smart defaults
    actual_hours: float = 0.0
    dependencies: List[str] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)

2. Temporal Awareness#

All models include temporal tracking for audit trails and analytics:

  • created_at and updated_at for lifecycle tracking

  • assigned_at for assignment timing

  • reported_at for issue tracking

  • identified_at for risk management
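A minimal sketch of how such timestamp fields can be declared is shown below. `TimestampedRecord`, `_utc_now`, and `touch` are hypothetical names for illustration, and the use of timezone-aware UTC values is an assumption rather than a documented property of the Marcus models.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


@dataclass
class TimestampedRecord:
    # Hypothetical model showing the created_at / updated_at pattern
    name: str
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)

    def touch(self) -> None:
        """Record that the instance was modified."""
        self.updated_at = _utc_now()
```

Using `default_factory` means each instance gets its own timestamp at construction time rather than a value fixed when the class is defined.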

3. Hierarchical Context#

Models support both flat and hierarchical project structures:

  • Optional project_id and project_name for multi-project support

  • Backward compatibility with single-project deployments

  • Context propagation through model relationships

4. Performance Optimization#

  • Immutable enum members, which can be compared cheaply by identity

  • Default factory functions, which give each instance its own list instead of a shared mutable default

  • Low object-creation overhead, since dataclasses generate ordinary __init__ methods

Technical Implementation Details#

Enum Design Philosophy#

class TaskStatus(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    BLOCKED = "blocked"

String-based enums were chosen over integers for:

  • JSON serialization compatibility with external systems

  • Human-readable database storage

  • API transparency and debugging

  • Internationalization support
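The JSON point can be illustrated with a short round trip. The payload shape here (`id`, `status`) is a made-up example, not a documented Marcus wire format.

```python
import json
from enum import Enum


class TaskStatus(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    BLOCKED = "blocked"


# Serialize: the enum's .value is already a JSON-friendly string
payload = json.dumps({"id": "task-1", "status": TaskStatus.IN_PROGRESS.value})

# Deserialize: look the member up again by its value
restored = TaskStatus(json.loads(payload)["status"])
```

Note that a plain `Enum` member is not itself JSON-serializable, so the `.value` access (or a custom encoder) is still required.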

Dataclass Field Strategies#

# Safe mutable defaults
dependencies: List[str] = field(default_factory=list)

# Immutable values can be assigned directly as defaults
actual_hours: float = 0.0

# Optional context fields
project_id: Optional[str] = None

Dependency Graph Modeling#

Tasks use string-based dependency references rather than object references:

  • Prevents circular import issues

  • Enables lazy loading and partial graphs

  • Supports distributed task storage

  • Simplifies serialization/deserialization
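One way to work with ID-based dependencies is to resolve them on demand against whatever tasks are currently loaded. The sketch below assumes a simple in-memory dictionary as the store; `resolve_dependencies` is a hypothetical helper, and `Task` is reduced to the fields needed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Task:
    # Reduced stand-in for the real Task model
    id: str
    name: str
    dependencies: List[str] = field(default_factory=list)


def resolve_dependencies(
    task: Task, store: Dict[str, Task]
) -> Tuple[List[Task], List[str]]:
    """Split a task's dependency IDs into loaded tasks and unresolved IDs."""
    found: List[Task] = []
    missing: List[str] = []
    for dep_id in task.dependencies:
        if dep_id in store:
            found.append(store[dep_id])
        else:
            # Partial graphs are allowed: report the ID instead of failing
            missing.append(dep_id)
    return found, missing
```

Returning the unresolved IDs separately lets a caller decide whether to fetch them from another store or treat them as an error.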

Security Isolation Features#

The TaskAssignment model includes security-focused fields:

workspace_path: Optional[str] = None
forbidden_paths: List[str] = field(default_factory=list)

These enable sandbox isolation for worker agents, preventing unauthorized file system access.
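How these fields might be enforced is sketched below. The check itself is an assumption: `is_path_allowed` and `_is_within` are hypothetical helpers, the `TaskAssignment` class is reduced to the relevant fields, and denying access when no workspace is set is a design choice made for this example only.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class TaskAssignment:
    # Reduced stand-in containing only the isolation-related fields
    task_id: str
    workspace_path: Optional[str] = None
    forbidden_paths: List[str] = field(default_factory=list)


def _is_within(target: Path, root: Path) -> bool:
    """True if target is root itself or located somewhere beneath it."""
    return root == target or root in target.parents


def is_path_allowed(assignment: TaskAssignment, candidate: str) -> bool:
    """Allow a path only inside the workspace and outside forbidden paths."""
    if assignment.workspace_path is None:
        return False  # assumed policy: no workspace means no file access
    # resolve() normalizes ".." segments and symlinks before comparing
    target = Path(candidate).resolve()
    for forbidden in assignment.forbidden_paths:
        if _is_within(target, Path(forbidden).resolve()):
            return False
    return _is_within(target, Path(assignment.workspace_path).resolve())
```

Resolving paths before comparison matters: a plain string-prefix check would accept a path such as `<workspace>/../other`.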

Simple vs Complex Task Handling#

Simple Tasks#

For basic tasks, the models provide lightweight tracking:

  • Minimal required fields (id, name, description, status, priority)

  • Default values for complex fields

  • Direct status transitions

Complex Tasks#

For sophisticated workflows, models scale up naturally:

  • Rich dependency networks through dependencies lists

  • Multi-project context via project_id/project_name

  • Performance tracking via estimated_hours/actual_hours

  • Risk assessment through BlockerReport and ProjectRisk

Adaptive Complexity#

The AI-powered task assignment system (src/core/ai_powered_task_assignment.py) uses model metadata to determine task complexity:

# Phase 1: Safety filtering uses task.labels and dependencies
safe_tasks = await self._filter_safe_tasks(available_tasks)

# Phase 2: Dependency analysis leverages model relationships
dependency_scores = await self._analyze_dependencies(safe_tasks)

# Phase 3: AI matching considers all model attributes
ai_scores = await self._get_ai_recommendations(safe_tasks, agent_info)

Board-Specific Considerations#

Kanban Integration#

Models provide seamless integration with external Kanban systems:

# Planka mapping
task.status -> Planka card status
task.labels -> Planka tags
task.description -> Planka card content

# Linear mapping
task.priority -> Linear priority levels
task.dependencies -> Linear parent/child relationships
task.estimated_hours -> Linear time estimates

Board Quality Validation#

The system includes board-specific quality checks:

  • Required field validation per provider

  • Status transition rules enforcement

  • Dependency cycle detection

  • Data consistency verification
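Dependency cycle detection, one of the checks listed above, can be sketched as a depth-first search over the ID graph. `find_cycle` is a hypothetical helper written for this page; it is not taken from the Marcus code base, and its input is assumed to be a mapping from task ID to that task's dependency IDs.

```python
from typing import Dict, List


def find_cycle(deps: Dict[str, List[str]]) -> List[str]:
    """Return one dependency cycle as a list of IDs, or [] if there is none."""
    UNVISITED, IN_PROGRESS, FINISHED = 0, 1, 2
    state: Dict[str, int] = {node: UNVISITED for node in deps}
    path: List[str] = []  # current DFS path from the start node

    def visit(node: str) -> List[str]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in deps.get(node, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = FINISHED
        return []

    for node in list(deps):
        if state[node] == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return []
```

The recursive form is fine for typical boards; very deep dependency chains would call for an iterative version to stay within Python's recursion limit.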

Provider Abstraction#

Models abstract away provider-specific details:

  • Normalized priority levels across systems

  • Standardized status workflows

  • Common dependency representations

  • Unified metadata handling
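Priority normalization could look roughly like the following. The provider names (`provider_a`, `provider_b`) and their raw priority vocabularies are invented for illustration and do not describe any real board API; the Priority string values are likewise assumed.

```python
from enum import Enum
from typing import Dict, Hashable


class Priority(Enum):
    # String values are assumed for this sketch
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"


# Hypothetical provider vocabularies mapped onto the shared Priority enum
_PROVIDER_PRIORITIES: Dict[str, Dict[Hashable, Priority]] = {
    "provider_a": {1: Priority.LOW, 2: Priority.MEDIUM,
                   3: Priority.HIGH, 4: Priority.URGENT},
    "provider_b": {"minor": Priority.LOW, "normal": Priority.MEDIUM,
                   "major": Priority.HIGH, "blocker": Priority.URGENT},
}


def normalize_priority(
    provider: str, raw: Hashable, default: Priority = Priority.MEDIUM
) -> Priority:
    """Translate a provider-specific priority into the shared enum."""
    # Unknown providers or values fall back to the default
    return _PROVIDER_PRIORITIES.get(provider, {}).get(raw, default)
```

Keeping the translation tables in one place means the rest of the system only ever sees `Priority` members.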

Current Implementation: Pros and Cons#

Pros#

1. Consistency Across System

  • Single source of truth for data structures

  • Consistent field naming and types

  • Unified validation rules

2. Type Safety

  • Error detection before runtime via static type checkers

  • IDE autocomplete support

  • Reduced runtime type errors

3. Documentation Integration

  • Numpy-style docstrings for all models

  • Field-level documentation

  • Usage examples included

4. Extensibility

  • Easy to add new fields without breaking changes

  • Optional fields support gradual feature rollout

  • Enum values can be extended safely

5. Performance

  • Lightweight dataclass implementation

  • Efficient enum comparisons

  • Minimal memory overhead

Cons#

1. Coupling Risk

  • Central models create dependency bottlenecks

  • Changes require careful impact analysis

  • Version compatibility challenges

2. Serialization Complexity

  • Datetime handling across timezones

  • Enum serialization for different targets

  • Nested object serialization overhead
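A common way to handle the datetime and enum cases is a custom `default` hook for `json.dumps`. The sketch below is one possible approach, not the serializer Marcus uses; `TaskRecord`, `_encode`, and `to_json` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from enum import Enum
from typing import Any


class TaskStatus(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    BLOCKED = "blocked"


@dataclass
class TaskRecord:
    # Hypothetical model with one enum field and one datetime field
    id: str
    status: TaskStatus
    created_at: datetime


def _encode(value: Any) -> Any:
    """Fallback encoder for types json does not handle natively."""
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, datetime):
        # ISO 8601 keeps the UTC offset when the datetime is timezone-aware
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def to_json(obj: Any) -> str:
    """Serialize a dataclass instance, including nested dataclasses."""
    return json.dumps(asdict(obj), default=_encode)
```

Storing timezone-aware datetimes avoids ambiguity when the serialized data is read in a different timezone.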

3. Validation Limitations

  • No built-in field validation

  • Complex constraint checking requires external logic

  • Cross-field validation not enforced
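Until a validation framework is adopted, basic checks can live in a dataclass `__post_init__` hook. The rules below (non-empty name, non-negative hours, no self-dependency) are illustrative assumptions, and `ValidatedTask` is a hypothetical class rather than the real Task model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidatedTask:
    # Hypothetical model; the real Task does not necessarily validate like this
    id: str
    name: str
    estimated_hours: float = 0.0
    actual_hours: float = 0.0
    dependencies: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Field-level checks
        if not self.name.strip():
            raise ValueError("name must be a non-empty string")
        if self.estimated_hours < 0 or self.actual_hours < 0:
            raise ValueError("hours must be non-negative")
        # Cross-field check: a task cannot depend on itself
        if self.id in self.dependencies:
            raise ValueError("a task cannot list itself as a dependency")
```

`__post_init__` runs only at construction, so later attribute assignments are not re-validated unless the class is frozen or uses properties.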

4. Database Mapping

  • ORM impedance mismatch potential

  • No built-in persistence layer

  • Manual mapping to storage formats

Design Decision Rationale#

Why Dataclasses Over Classes?#

  • Reduced boilerplate: Automatic __init__, __repr__, __eq__

  • Type safety: Native type hint support

  • Immutability options: frozen=True when needed

  • Performance: Low overhead compared with hand-written classes, with __slots__ available when a compact memory layout is needed

Why String Enums Over IntEnum?#

  • API clarity: Self-documenting values

  • Debugging ease: Human-readable in logs

  • JSON compatibility: Each member's .value is already a JSON-ready string

  • Database storage: Readable column values

Why Optional Fields Over Required?#

  • Backward compatibility: Existing code continues working

  • Gradual migration: Features can be adopted incrementally

  • Flexibility: Different use cases have different requirements

  • Default handling: Sensible defaults reduce configuration burden

Why Flat Dependencies Over Object References?#

  • Serialization: Easy JSON conversion

  • Distributed systems: Works across process boundaries

  • Lazy loading: Dependencies resolved on demand

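  • Circular reference avoidance: Tasks never hold references to each other, so there are no object cycles for serializers or the garbage collector to handle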

Future Evolution#

Planned Enhancements#

1. Validation Framework

  • Pydantic integration for field validation

  • Cross-field constraint checking

  • Custom validation rules per provider

2. Persistence Layer

  • SQLAlchemy model mapping

  • Document database support

  • Caching layer integration

3. Event Sourcing

  • Model state change events

  • Audit trail automation

  • Replay capability for debugging

4. Schema Evolution

  • Automatic migration support

  • Version compatibility checking

  • Backward compatibility guarantees

Performance Optimizations#

1. Memory Efficiency

  • __slots__ for reduced memory usage

  • Interning for common string values

  • Lazy property evaluation
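The `__slots__` idea can be tried out today on a small dataclass. This is only a sketch of the planned optimization: `TaskRef` is a hypothetical class, and the manual `__slots__` declaration works here because none of the fields has a default value.

```python
from dataclasses import dataclass


@dataclass
class TaskRef:
    # Declaring __slots__ removes the per-instance __dict__.
    # Fields must not have defaults, or they would clash with the slots.
    __slots__ = ("task_id", "status")
    task_id: str
    status: str


ref = TaskRef("task-1", "todo")
```

On Python 3.10 and later, `@dataclass(slots=True)` generates the slots automatically and also supports fields with default values.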

2. Serialization Speed

  • Custom serializers for hot paths

  • Binary serialization options

  • Compression for large datasets

3. Caching Strategy

  • Model instance caching

  • Computed property memoization

  • Query result caching

Integration Points#

Cato Integration#

The Core Models system does not currently integrate directly with Cato, which appears to be a planned future enhancement. The models are nevertheless designed to support AI coaching systems through:

Context Awareness:

# Models provide rich context for AI analysis
task_context = {
    "complexity": len(task.dependencies),
    "urgency": task.priority.value,
    # Guard against a zero or missing estimate before dividing
    "historical_performance": (
        task.actual_hours / task.estimated_hours
        if task.estimated_hours
        else None
    ),
}

Coaching Metadata:

  • performance_score in WorkerStatus for coaching recommendations

  • BlockerReport patterns for learning opportunities

  • ProjectRisk analysis for proactive coaching

Future Cato Integration Points:

  • Agent performance coaching based on WorkerStatus metrics

  • Task difficulty prediction using Task historical data

  • Risk mitigation coaching through ProjectRisk patterns

AI Engine Integration#

Models serve as the primary interface to Marcus’s AI capabilities:

Analysis Context:

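# Task complexity analysis
# avg() and max_dependency_depth() are helper functions defined elsewhere.
# Priority members cannot be averaged directly, so map them to numbers first
# (PRIORITY_WEIGHTS is an illustrative mapping, not part of models.py).
analysis_context = AnalysisContext(
    task_count=len(tasks),
    avg_priority=avg([PRIORITY_WEIGHTS[t.priority] for t in tasks]),
    dependency_depth=max_dependency_depth(tasks)
)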

Assignment Context:

# Agent matching context
assignment_context = AssignmentContext(
    agent_skills=worker.skills,
    current_workload=len(worker.current_tasks),
    performance_history=worker.performance_score
)

Error Framework Integration#

Models integrate with Marcus’s error framework for robust error handling:

# Model validation errors
from src.core.error_framework import ValidationError

if not task.name.strip():
    raise ValidationError(
        field_name="name",
        field_value=task.name,
        validation_rule="non_empty_string"
    )

Best Practices#

Model Usage Guidelines#

1. Immutability Where Possible

# Good: Use frozen dataclasses for immutable data
@dataclass(frozen=True)
class TaskSnapshot:
    task_id: str
    status: TaskStatus
    timestamp: datetime

2. Defensive Field Access

# Good: Handle optional fields safely, falling back to the enum's own value
priority_value = task.priority.value if task.priority else Priority.MEDIUM.value

3. Type Validation

# Good: Validate enum assignments
if status_value in [s.value for s in TaskStatus]:
    task.status = TaskStatus(status_value)

Integration Patterns#

1. Model Conversion

# Converting to external format
def to_kanban_card(task: Task) -> Dict[str, Any]:
    return {
        "title": task.name,
        "description": task.description,
        "status": task.status.value,
        "labels": task.labels
    }

2. Batch Operations

# Efficient bulk operations
completed_tasks = [t for t in tasks if t.status == TaskStatus.DONE]
total_effort = sum(t.actual_hours for t in completed_tasks)

The Core Models system balances simplicity with extensibility, type safety with flexibility, and performance with maintainability. Because every Marcus subsystem depends on these models, their stability directly affects the reliability and evolution of the system as a whole.