# Core Models System ## Overview The Core Models system (`src/core/models.py`) is the foundational data modeling layer of the Marcus architecture. It defines the fundamental data structures and enumerations that represent all core business entities within the Marcus ecosystem, providing type-safe, well-documented abstractions for tasks, projects, workers, assignments, and system state. ## System Architecture ### Core Data Models ``` TaskStatus TaskAssignment WorkerStatus ↓ ↓ ↓ Task ──────────────────────────── ProjectState ↓ ↓ BlockerReport ProjectRisk ``` The system is built around six primary data classes and three enumerations: **Enumerations:** - `TaskStatus` - Lifecycle states (TODO, IN_PROGRESS, DONE, BLOCKED) - `Priority` - Urgency levels (LOW, MEDIUM, HIGH, URGENT) - `RiskLevel` - Impact severity (LOW, MEDIUM, HIGH, CRITICAL) **Core Models:** - `Task` - Individual work items with dependencies and metadata - `ProjectState` - Aggregate project health and metrics - `WorkerStatus` - Agent capabilities, workload, and performance - `TaskAssignment` - Execution context for assigned work - `BlockerReport` - Issue tracking and resolution - `ProjectRisk` - Risk assessment and mitigation planning ### Marcus Ecosystem Integration The Core Models system sits at the architectural center of Marcus, serving as the common language between all subsystems: ```mermaid graph TD A[Core Models] --> B[AI Engine] A --> C[Task Assignment] A --> D[Kanban Integration] A --> E[MCP Server] A --> F[Workflow Management] A --> G[Monitoring Systems] A --> H[Learning Systems] A --> I[Context Management] ``` ## Workflow Position In the typical Marcus scenario workflow, Core Models are invoked at every stage: ### 1. create_project - `ProjectState` instances created to track metrics - `ProjectRisk` models initialized for assessment - Project metadata stored in model structures ### 2. register_agent - `WorkerStatus` model populated with agent capabilities - Skills, availability, and performance scores recorded - Agent pool state maintained through models ### 3. request_next_task - `Task` models filtered and analyzed for assignment - `TaskAssignment` created with execution context - Dependency graphs traversed using model relationships ### 4. report_progress - `Task.actual_hours` and status updated - `ProjectState` metrics recalculated automatically - Progress tracking through model state transitions ### 5. report_blocker - `BlockerReport` instances created with AI analysis - `RiskLevel` assessed and propagated to project state - Resolution workflows tracked through model lifecycle ### 6. finish_task - `Task.status` transitioned to DONE - `ProjectState.completed_tasks` incremented - Dependency chains unblocked automatically ## What Makes This System Special ### 1. Type Safety with Flexibility The system uses Python dataclasses with type hints for compile-time safety while maintaining runtime flexibility: ```python @dataclass class Task: id: str name: str description: str status: TaskStatus priority: Priority # Optional fields with smart defaults actual_hours: float = 0.0 dependencies: List[str] = field(default_factory=list) labels: List[str] = field(default_factory=list) ``` ### 2. Temporal Awareness All models include temporal tracking for audit trails and analytics: - `created_at` and `updated_at` for lifecycle tracking - `assigned_at` for assignment timing - `reported_at` for issue tracking - `identified_at` for risk management ### 3. Hierarchical Context Models support both flat and hierarchical project structures: - Optional `project_id` and `project_name` for multi-project support - Backward compatibility with single-project deployments - Context propagation through model relationships ### 4. Performance Optimization - Immutable enum types for efficient comparison - Default factory functions prevent mutable default arguments - Minimal object creation overhead through dataclass optimization ## Technical Implementation Details ### Enum Design Philosophy ```python class TaskStatus(Enum): TODO = "todo" IN_PROGRESS = "in_progress" DONE = "done" BLOCKED = "blocked" ``` String-based enums chosen over integers for: - JSON serialization compatibility with external systems - Human-readable database storage - API transparency and debugging - Internationalization support ### Dataclass Field Strategies ```python # Safe mutable defaults dependencies: List[str] = field(default_factory=list) # Performance-optimized defaults actual_hours: float = 0.0 # Optional context fields project_id: Optional[str] = None ``` ### Dependency Graph Modeling Tasks use string-based dependency references rather than object references: - Prevents circular import issues - Enables lazy loading and partial graphs - Supports distributed task storage - Simplifies serialization/deserialization ### Security Isolation Features The `TaskAssignment` model includes security-focused fields: ```python workspace_path: Optional[str] = None forbidden_paths: List[str] = field(default_factory=list) ``` These enable sandbox isolation for worker agents, preventing unauthorized file system access. ## Simple vs Complex Task Handling ### Simple Tasks For basic tasks, the models provide lightweight tracking: - Minimal required fields (id, name, description, status, priority) - Default values for complex fields - Direct status transitions ### Complex Tasks For sophisticated workflows, models scale up naturally: - Rich dependency networks through `dependencies` lists - Multi-project context via `project_id`/`project_name` - Performance tracking via `estimated_hours`/`actual_hours` - Risk assessment through `BlockerReport` and `ProjectRisk` ### Adaptive Complexity The AI-powered task assignment system (`src/core/ai_powered_task_assignment.py`) uses model metadata to determine task complexity: ```python # Phase 1: Safety filtering uses task.labels and dependencies safe_tasks = await self._filter_safe_tasks(available_tasks) # Phase 2: Dependency analysis leverages model relationships dependency_scores = await self._analyze_dependencies(safe_tasks) # Phase 3: AI matching considers all model attributes ai_scores = await self._get_ai_recommendations(safe_tasks, agent_info) ``` ## Board-Specific Considerations ### Kanban Integration Models provide seamless integration with external Kanban systems: ```python # Planka mapping task.status -> Planka card status task.labels -> Planka tags task.description -> Planka card content # Linear mapping task.priority -> Linear priority levels task.dependencies -> Linear parent/child relationships task.estimated_hours -> Linear time estimates ``` ### Board Quality Validation The system includes board-specific quality checks: - Required field validation per provider - Status transition rules enforcement - Dependency cycle detection - Data consistency verification ### Provider Abstraction Models abstract away provider-specific details: - Normalized priority levels across systems - Standardized status workflows - Common dependency representations - Unified metadata handling ## Current Implementation: Pros and Cons ### Pros **1. Consistency Across System** - Single source of truth for data structures - Consistent field naming and types - Unified validation rules **2. Type Safety** - Compile-time error detection - IDE autocomplete support - Reduced runtime type errors **3. Documentation Integration** - Numpy-style docstrings for all models - Field-level documentation - Usage examples included **4. Extensibility** - Easy to add new fields without breaking changes - Optional fields support gradual feature rollout - Enum values can be extended safely **5. Performance** - Lightweight dataclass implementation - Efficient enum comparisons - Minimal memory overhead ### Cons **1. Coupling Risk** - Central models create dependency bottlenecks - Changes require careful impact analysis - Version compatibility challenges **2. Serialization Complexity** - Datetime handling across timezones - Enum serialization for different targets - Nested object serialization overhead **3. Validation Limitations** - No built-in field validation - Complex constraint checking requires external logic - Cross-field validation not enforced **4. Database Mapping** - ORM impedance mismatch potential - No built-in persistence layer - Manual mapping to storage formats ## Design Decision Rationale ### Why Dataclasses Over Classes? - **Reduced boilerplate**: Automatic `__init__`, `__repr__`, `__eq__` - **Type safety**: Native type hint support - **Immutability options**: `frozen=True` when needed - **Performance**: Optimized memory layout ### Why String Enums Over IntEnum? - **API clarity**: Self-documenting values - **Debugging ease**: Human-readable in logs - **JSON compatibility**: No conversion needed - **Database storage**: Readable column values ### Why Optional Fields Over Required? - **Backward compatibility**: Existing code continues working - **Gradual migration**: Features can be adopted incrementally - **Flexibility**: Different use cases have different requirements - **Default handling**: Sensible defaults reduce configuration burden ### Why Flat Dependencies Over Object References? - **Serialization**: Easy JSON conversion - **Distributed systems**: Works across process boundaries - **Lazy loading**: Dependencies resolved on demand - **Circular reference avoidance**: No memory leaks ## Future Evolution ### Planned Enhancements **1. Validation Framework** - Pydantic integration for field validation - Cross-field constraint checking - Custom validation rules per provider **2. Persistence Layer** - SQLAlchemy model mapping - Document database support - Caching layer integration **3. Event Sourcing** - Model state change events - Audit trail automation - Replay capability for debugging **4. Schema Evolution** - Automatic migration support - Version compatibility checking - Backward compatibility guarantees ### Performance Optimizations **1. Memory Efficiency** - `__slots__` for reduced memory usage - Interning for common string values - Lazy property evaluation **2. Serialization Speed** - Custom serializers for hot paths - Binary serialization options - Compression for large datasets **3. Caching Strategy** - Model instance caching - Computed property memoization - Query result caching ## Integration Points ### Cato Integration Currently, the Core Models system does not have direct Cato integration, as Cato appears to be a future enhancement. However, the models are designed to support AI coaching systems through: **Context Awareness:** ```python # Models provide rich context for AI analysis task_context = { "complexity": len(task.dependencies), "urgency": task.priority.value, "historical_performance": task.actual_hours / task.estimated_hours } ``` **Coaching Metadata:** - `performance_score` in `WorkerStatus` for coaching recommendations - `BlockerReport` patterns for learning opportunities - `ProjectRisk` analysis for proactive coaching **Future Cato Integration Points:** - Agent performance coaching based on `WorkerStatus` metrics - Task difficulty prediction using `Task` historical data - Risk mitigation coaching through `ProjectRisk` patterns ### AI Engine Integration Models serve as the primary interface to Marcus's AI capabilities: **Analysis Context:** ```python # Task complexity analysis analysis_context = AnalysisContext( task_count=len(project_tasks), avg_priority=avg([t.priority for t in tasks]), dependency_depth=max_dependency_depth(tasks) ) ``` **Assignment Context:** ```python # Agent matching context assignment_context = AssignmentContext( agent_skills=worker.skills, current_workload=len(worker.current_tasks), performance_history=worker.performance_score ) ``` ### Error Framework Integration Models integrate with Marcus's error framework for robust error handling: ```python # Model validation errors from src.core.error_framework import ValidationError if not task.name.strip(): raise ValidationError( field_name="name", field_value=task.name, validation_rule="non_empty_string" ) ``` ## Best Practices ### Model Usage Guidelines **1. Immutability Where Possible** ```python # Good: Use frozen dataclasses for immutable data @dataclass(frozen=True) class TaskSnapshot: task_id: str status: TaskStatus timestamp: datetime ``` **2. Defensive Field Access** ```python # Good: Handle optional fields safely priority_weight = task.priority.value if task.priority else "medium" ``` **3. Type Validation** ```python # Good: Validate enum assignments if status_value in [s.value for s in TaskStatus]: task.status = TaskStatus(status_value) ``` ### Integration Patterns **1. Model Conversion** ```python # Converting to external format def to_kanban_card(task: Task) -> Dict[str, Any]: return { "title": task.name, "description": task.description, "status": task.status.value, "labels": task.labels } ``` **2. Batch Operations** ```python # Efficient bulk operations completed_tasks = [t for t in tasks if t.status == TaskStatus.DONE] total_effort = sum(t.actual_hours for t in completed_tasks) ``` The Core Models system represents a mature, well-architected foundation that successfully balances simplicity with extensibility, type safety with flexibility, and performance with maintainability. Its position at the center of the Marcus architecture makes it a critical success factor for the entire system's reliability and evolution.