# 37. Board Health Analyzer System

## Executive Summary

The Board Health Analyzer System is a diagnostic tool that detects six kinds of board health issues: skill mismatches, circular dependencies, bottlenecks, chain blocks, stale tasks, and workload imbalances. It exposes on-demand analysis through MCP tools and returns a health report in which every detected issue carries a severity and actionable recommendations for resolving it.

## System Architecture

### Core Components

The Board Health Analyzer consists of:

```
Board Health Analyzer Architecture
├── board_health_analyzer.py (Core Analysis)
│   ├── BoardHealthAnalyzer (Main analyzer class)
│   ├── HealthIssue (Issue data structure)
│   ├── HealthIssueType (Enum of issue types)
│   ├── IssueSeverity (LOW, MEDIUM, HIGH, CRITICAL)
│   ├── BoardHealth (Result container)
│   └── Six Analysis Methods:
│       ├── _detect_skill_mismatches()
│       ├── _detect_circular_dependencies()
│       ├── _detect_bottlenecks()
│       ├── _detect_chain_blocks()
│       ├── _detect_stale_tasks()
│       └── _analyze_agent_workload()
└── tools/board_health.py (MCP Tool Integration)
    ├── check_board_health (Full health analysis)
    └── check_task_dependencies (Dependency graph)
```

### Analysis Flow

```
Board State (Tasks + Agents)
            │
            ▼
    Load Board Data
            │
            ├─► Skill Analysis ──────► Mismatches Found
            │
            ├─► Dependency Graph ────► Circular Deps
            │
            ├─► Column Analysis ─────► Bottlenecks
            │
            ├─► Chain Detection ─────► Blocked Chains
            │
            ├─► Time Analysis ───────► Stale Tasks
            │
            └─► Workload Check ──────► Imbalances
                        │
                        ▼
                 Aggregate Issues
                        │
                        ▼
                Generate Recommendations
                        │
                        ▼
                  Health Report
```

## Core Health Issues Detected

### 1. Skill Mismatches

Flags TODO and BLOCKED tasks whose labels, treated as required skills, are not offered by any available agent (comparison is case-insensitive):

```python
async def _detect_skill_mismatches(
    self,
    tasks: List[Task],
    agents: Dict[str, WorkerStatus]
) -> List[HealthIssue]:
    """Detect tasks that cannot be handled by available agents."""
    issues = []

    # Collect all available skills from active agents
    available_skills = set()
    for agent in agents.values():
        available_skills.update(skill.lower() for skill in agent.skills)

    # Check each TODO/BLOCKED task
    for task in tasks:
        if task.status in [TaskStatus.TODO, TaskStatus.BLOCKED]:
            if hasattr(task, 'labels') and task.labels:
                missing = set(s.lower() for s in task.labels) - available_skills
                if missing:
                    issues.append(HealthIssue(
                        type=HealthIssueType.SKILL_MISMATCH,
                        severity=IssueSeverity.HIGH,
                        title="Missing Required Skills",
                        description=f"Task '{task.name}' requires {missing} but no active agents have these skills",
                        affected_tasks=[task.id],
                        recommendations=[
                            f"Register agents with skills: {', '.join(missing)}",
                            "Break down task to use available skills"
                        ]
                    ))

    return issues
```
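The check reduces to a case-insensitive set difference between a task's labels and the union of all agents' skills. The stand-alone sketch below isolates that step; `missing_skills` is a hypothetical helper, and plain lists of strings stand in for `Task.labels` and `WorkerStatus.skills`.

```python
from typing import Iterable, Set


def missing_skills(
    task_labels: Iterable[str], agent_skill_lists: Iterable[Iterable[str]]
) -> Set[str]:
    """Return the lowercased labels that no agent lists as a skill (hypothetical helper)."""
    # Union of every agent's skills, normalized to lowercase
    available = {skill.lower() for skills in agent_skill_lists for skill in skills}
    # Every label is treated as a required skill, also compared in lowercase
    return {label.lower() for label in task_labels} - available


agents = [["Python", "FastAPI"], ["react", "CSS"]]
print(sorted(missing_skills(["OAuth", "Security", "python"], agents)))
# ['oauth', 'security']
```

Because every label is treated as a required skill, labels that describe something else (for example `bug` or `urgent`) will also be reported as missing unless some agent happens to list them.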

### 2. Circular Dependencies

Detects dependency cycles using DFS:

```python
async def _detect_circular_dependencies(
    self,
    tasks: List[Task]
) -> List[HealthIssue]:
    """Detect circular dependencies in task graph"""
    # Build dependency graph
    graph: Dict[str, List[str]] = {}
    task_map = {task.id: task for task in tasks}

    for task in tasks:
        if hasattr(task, 'dependencies') and task.dependencies:
            graph[task.id] = task.dependencies
        else:
            graph[task.id] = []

    # Find cycles using DFS
    cycles = []
    visited = set()
    rec_stack = set()

    def dfs(node: str, path: List[str]) -> None:
        visited.add(node)
        rec_stack.add(node)
        path.append(node)

        for neighbor in graph.get(node, []):
            if neighbor in rec_stack:
                # Found cycle
                cycle_start = path.index(neighbor)
                cycle = path[cycle_start:]
                cycles.append(cycle)
            elif neighbor not in visited and neighbor in task_map:
                dfs(neighbor, path.copy())

        rec_stack.remove(node)

    # Check all nodes
    for task_id in graph:
        if task_id not in visited:
            dfs(task_id, [])

    # Create issues for cycles
    if cycles:
        return [HealthIssue(
            type=HealthIssueType.CIRCULAR_DEPENDENCY,
            severity=IssueSeverity.CRITICAL,
            title="Circular Dependency Detected",
            description=f"Tasks form a dependency cycle: {' → '.join(cycle + [cycle[0]])}",
            affected_tasks=cycle,
            recommendations=[
                "Break the cycle by removing one dependency",
                "Restructure tasks to eliminate circular references",
                "Consider merging related tasks"
            ]
        ) for cycle in cycles]

    return []
```
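The recursive `dfs` above can exceed Python's default recursion limit (1000 frames) on very long dependency chains. The sketch below is an iterative variant of the same back-edge search that keeps an explicit stack of neighbor iterators; `find_cycles` is a hypothetical helper that takes the same `task_id -> dependencies` mapping the method builds.

```python
from typing import Dict, List

_EXHAUSTED = object()  # sentinel returned by next() when a node has no more neighbors


def find_cycles(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Iterative DFS that reports every back edge as a cycle (hypothetical helper)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    color = {node: WHITE for node in graph}
    cycles: List[List[str]] = []

    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        path = [root]                # current DFS path, like `path` in the method above
        stack = [iter(graph[root])]  # one neighbor iterator per node on the path
        while stack:
            neighbor = next(stack[-1], _EXHAUSTED)
            if neighbor is _EXHAUSTED:
                color[path.pop()] = BLACK  # all neighbors done: leave the path
                stack.pop()
            elif neighbor not in color:
                continue  # dependency on a task that is not on the board
            elif color[neighbor] == GRAY:
                # Back edge: the cycle is the path suffix starting at `neighbor`
                cycles.append(path[path.index(neighbor):])
            elif color[neighbor] == WHITE:
                color[neighbor] = GRAY
                path.append(neighbor)
                stack.append(iter(graph[neighbor]))
    return cycles


print(find_cycles({"a": ["b"], "b": ["c"], "c": ["a"], "d": []}))  # [['a', 'b', 'c']]
```

Cycle members come out in DFS path order, so `a → b → c → a` is reported as `['a', 'b', 'c']`, which fits the `' → '.join(cycle + [cycle[0]])` format used in the issue description.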

### 3. Bottleneck Detection

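Flags any status column other than DONE whose task count exceeds its threshold (15 for columns without an explicit one). A count above 1.5 times the threshold raises the severity from MEDIUM to HIGH: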

```python
async def _detect_bottlenecks(self, tasks: List[Task]) -> List[HealthIssue]:
    """Identify bottlenecks in the workflow"""
    issues = []

    def status_key(status: Any) -> str:
        # Normalize enum members and plain strings to "IN_REVIEW"-style keys
        return (status.value if hasattr(status, 'value') else str(status)).upper()

    # Count tasks by status, ignoring finished work
    status_counts: Dict[str, int] = {}
    for task in tasks:
        key = status_key(task.status)
        if key != 'DONE':
            status_counts[key] = status_counts.get(key, 0) + 1

    # Thresholds for bottlenecks
    thresholds = {
        'TODO': 20,
        'IN_PROGRESS': 10,
        'BLOCKED': 5,
        'IN_REVIEW': 8
    }

    for status, count in status_counts.items():
        threshold = thresholds.get(status, 15)
        if count > threshold:
            severity = IssueSeverity.HIGH if count > threshold * 1.5 else IssueSeverity.MEDIUM

            issues.append(HealthIssue(
                type=HealthIssueType.BOTTLENECK,
                severity=severity,
                title=f"Bottleneck in {status}",
                description=f"{count} tasks in {status} (threshold: {threshold})",
                # Same normalization as the counts, so enum statuses match too
                affected_tasks=[t.id for t in tasks if status_key(t.status) == status],
                recommendations=[
                    f"Review and prioritize {status} tasks",
                    "Assign more resources to this stage",
                    "Identify and remove blockers",
                    "Consider work-in-progress limits"
                ]
            ))

    return issues
```
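The severity rule is easy to check by hand: a column is flagged when its count exceeds the threshold and escalates to HIGH when it exceeds 1.5 times the threshold. A minimal sketch of just that rule; `bottleneck_severity` is a hypothetical helper that returns plain strings instead of `IssueSeverity` members.

```python
from typing import Optional


def bottleneck_severity(count: int, threshold: int) -> Optional[str]:
    """Mirror the threshold logic of _detect_bottlenecks (hypothetical helper)."""
    if count <= threshold:
        return None       # not a bottleneck
    if count > threshold * 1.5:
        return "high"     # more than 50% over the limit
    return "medium"


# IN_REVIEW has a threshold of 8, so HIGH starts above 12 tasks
print(bottleneck_severity(8, 8), bottleneck_severity(12, 8), bottleneck_severity(18, 8))
# None medium high
```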

## Additional Health Checks

### 4. Chain Block Detection

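Finds BLOCKED tasks that other tasks depend on directly. The reported chain length is the blocked task plus its direct dependents; transitive dependents are not counted: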

```python
async def _detect_chain_blocks(
    self,
    tasks: List[Task],
    active_assignments: Dict[str, str]
) -> List[HealthIssue]:
    """Find chains where blocked tasks block other tasks"""
    issues = []
    task_map = {task.id: task for task in tasks}

    # Find blocked tasks that have dependents
    blocked_tasks = [t for t in tasks if t.status == TaskStatus.BLOCKED]

    for blocked_task in blocked_tasks:
        # Find tasks depending on this blocked task
        dependent_tasks = [
            t for t in tasks
            if getattr(t, 'dependencies', None) and
            blocked_task.id in t.dependencies
        ]

        if dependent_tasks:
            chain_length = 1 + len(dependent_tasks)
            severity = IssueSeverity.HIGH if chain_length > 3 else IssueSeverity.MEDIUM

            issues.append(HealthIssue(
                type=HealthIssueType.CHAIN_BLOCK,
                severity=severity,
                title="Blocked Task Creating Chain",
                description=(
                    f"Blocked task '{blocked_task.name}' is blocking "
                    f"{len(dependent_tasks)} other tasks"
                ),
                affected_tasks=[blocked_task.id] + [t.id for t in dependent_tasks],
                recommendations=[
                    f"Prioritize unblocking '{blocked_task.name}'",
                    "Consider alternative approaches for dependent tasks",
                    "Review if dependencies can be relaxed"
                ]
            ))

    return issues
```
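Because only direct dependents are counted, a blocked task at the head of a long chain (A blocks B, B blocks C) reports a chain length of 2. If the full downstream impact matters, a breadth-first walk over reversed dependency edges gives it. A minimal sketch, assuming the same `task_id -> dependencies` mapping used by the cycle check; `downstream_of` is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_of(task_id: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Every task that directly or transitively depends on task_id (hypothetical helper)."""
    # Invert the graph: dependency -> tasks that depend on it
    dependents: Dict[str, List[str]] = {}
    for tid, requires in deps.items():
        for req in requires:
            dependents.setdefault(req, []).append(tid)

    seen: Set[str] = set()
    queue = deque([task_id])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:  # `seen` also stops the walk from looping on cycles
                seen.add(child)
                queue.append(child)
    seen.discard(task_id)  # a cycle can lead back to the start; exclude it
    return seen


deps = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": []}
print(sorted(downstream_of("A", deps)))  # ['B', 'C', 'D']
```

Using `1 + len(downstream_of(...))` as the chain length would let the HIGH cut-off (`chain_length > 3`) reflect the whole blocked subtree rather than the first level only.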

### 5. Stale Task Detection

Identifies open tasks whose last update is older than a per-status limit, and reports them as a single issue that lists the five oldest:

```python
async def _detect_stale_tasks(self, tasks: List[Task]) -> List[HealthIssue]:
    """Find tasks that haven't been updated recently"""
    issues = []

    # Thresholds by status; other statuses fall back to stale_task_days
    staleness_thresholds = {
        TaskStatus.IN_PROGRESS: timedelta(days=3),
        TaskStatus.IN_REVIEW: timedelta(days=2),
        TaskStatus.BLOCKED: timedelta(days=7),
        TaskStatus.TODO: timedelta(days=14)
    }

    stale_tasks = []
    for task in tasks:
        if task.status == TaskStatus.DONE:
            continue

        threshold = staleness_thresholds.get(
            task.status, timedelta(days=self.stale_task_days)
        )
        # Fall back to created_at when updated_at is missing or None
        last_update = getattr(task, 'updated_at', None) or task.created_at
        # Read the clock in the task's own timezone so aware and naive
        # datetimes are never subtracted from each other
        age = datetime.now(last_update.tzinfo) - last_update

        if age > threshold:
            stale_tasks.append((task, age))

    if stale_tasks:
        stale_tasks.sort(key=lambda x: x[1], reverse=True)

        description_parts = []
        for task, age in stale_tasks[:5]:  # Show top 5
            age_days = age.days
            description_parts.append(f"• '{task.name}' ({age_days} days old)")

        issues.append(HealthIssue(
            type=HealthIssueType.STALE_TASK,
            severity=IssueSeverity.MEDIUM,
            title=f"{len(stale_tasks)} Stale Tasks Detected",
            description="\n".join(description_parts),
            affected_tasks=[t[0].id for t in stale_tasks],
            recommendations=[
                "Review and update stale tasks",
                "Close tasks that are no longer relevant",
                "Reassign tasks that are stuck",
                "Add progress updates to active tasks"
            ]
        ))

    return issues
```
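Reading the clock inside the check makes the rule awkward to unit-test. A minimal sketch of the same per-status comparison with `now` passed in explicitly; `is_stale`, the `STALENESS` table, and the lowercase string statuses are hypothetical simplifications of the `TaskStatus` mapping above.

```python
from datetime import datetime, timedelta, timezone

# Same per-status limits as _detect_stale_tasks, keyed by plain strings here
STALENESS = {
    "in_progress": timedelta(days=3),
    "in_review": timedelta(days=2),
    "blocked": timedelta(days=7),
    "todo": timedelta(days=14),
}


def is_stale(status: str, last_update: datetime, now: datetime,
             default: timedelta = timedelta(days=7)) -> bool:
    """True when a task that is not done has gone longer than its status allows."""
    if status == "done":
        return False
    return now - last_update > STALENESS.get(status, default)


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
print(is_stale("in_review", now - timedelta(days=3), now))  # True  (limit: 2 days)
print(is_stale("todo", now - timedelta(days=10), now))      # False (limit: 14 days)
```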

### 6. Workload Balance Analysis

Flags uneven distribution of in-progress work: the busiest agent must hold more than twice the average load and at least three tasks:

```python
async def _analyze_agent_workload(
    self,
    agents: Dict[str, WorkerStatus],
    active_assignments: Dict[str, str]
) -> List[HealthIssue]:
    """Analyze if workload is balanced across agents"""
    issues = []

    # Start every agent passed in at zero so idle agents count as underutilized
    agent_task_count: Dict[str, int] = {agent_id: 0 for agent_id in agents}

    # This method does not receive the task list, so fetch it from the kanban
    # client, then count IN_PROGRESS tasks per assignee
    tasks = await self.kanban_client.get_all_tasks()
    for task in tasks:
        if task.status == TaskStatus.IN_PROGRESS and task.assigned_to in agent_task_count:
            agent_task_count[task.assigned_to] += 1

    if not agent_task_count:
        return issues

    # Calculate statistics
    counts = list(agent_task_count.values())
    avg_tasks = sum(counts) / len(counts) if counts else 0
    max_tasks = max(counts) if counts else 0
    min_tasks = min(counts) if counts else 0

    # Check for imbalance
    if max_tasks > avg_tasks * 2 and max_tasks >= 3:
        overloaded = [aid for aid, count in agent_task_count.items() if count == max_tasks]
        underutilized = [aid for aid, count in agent_task_count.items() if count <= 1]

        issues.append(HealthIssue(
            type=HealthIssueType.WORKLOAD_IMBALANCE,
            severity=IssueSeverity.MEDIUM,
            title="Uneven Workload Distribution",
            description=(
                f"Some agents have {max_tasks} tasks while others have {min_tasks}. "
                f"Average is {avg_tasks:.1f} tasks per agent."
            ),
            affected_tasks=[],
            affected_agents=overloaded,
            recommendations=[
                f"Reassign tasks from overloaded agents: {', '.join(overloaded)}",
                f"Utilize available agents: {', '.join(underutilized)}",
                "Review task assignment algorithm",
                "Consider agent skills when distributing tasks"
            ]
        ))

    return issues
```
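The imbalance rule can be isolated as a pure function over per-agent counts. A minimal sketch; `find_imbalance` is a hypothetical helper, and its `factor` and `min_tasks` parameters correspond to the `workload_imbalance_factor` and `min_tasks_for_imbalance_check` settings shown under Configuration.

```python
from typing import Dict, List, Tuple


def find_imbalance(
    counts: Dict[str, int], factor: float = 2.0, min_tasks: int = 3
) -> Tuple[List[str], List[str]]:
    """Return (overloaded, underutilized) agent ids, or two empty lists (hypothetical helper)."""
    if not counts:
        return [], []
    avg = sum(counts.values()) / len(counts)
    peak = max(counts.values())
    # Same rule as _analyze_agent_workload: the peak must exceed factor x average
    # and reach an absolute minimum, so lightly loaded boards are not flagged
    if peak > avg * factor and peak >= min_tasks:
        overloaded = [a for a, c in counts.items() if c == peak]
        underutilized = [a for a, c in counts.items() if c <= 1]
        return overloaded, underutilized
    return [], []


print(find_imbalance({"agent-1": 5, "agent-2": 0, "agent-3": 1}))
# (['agent-1'], ['agent-2', 'agent-3'])
```

With the default factor of 2.0, the rule can never fire on a board with one or two agents: the maximum can never exceed the sum, and with two agents the sum is exactly twice the average.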

## Issue Data Structure

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


# Enums are defined first because the dataclass annotations below reference them
class HealthIssueType(Enum):
    SKILL_MISMATCH = "skill_mismatch"
    CIRCULAR_DEPENDENCY = "circular_dependency"
    BOTTLENECK = "bottleneck"
    CHAIN_BLOCK = "chain_block"
    STALE_TASK = "stale_task"
    WORKLOAD_IMBALANCE = "workload_imbalance"


class IssueSeverity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class HealthIssue:
    """Represents a board health issue."""
    type: HealthIssueType
    severity: IssueSeverity
    title: str
    description: str
    affected_tasks: List[str]
    affected_agents: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)
```
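`BoardHealth`, the result container listed in the architecture, is not shown in this section. The sketch below infers its fields from the constructor call in `analyze_board_health` (`health_score`, `issues`, `metrics`, `recommendations`, `timestamp`); the field types, in particular a float score on a 0 to 100 scale, and the defaults are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class BoardHealth:
    """Result of one analysis run (sketch; field types and defaults are assumptions)."""
    health_score: float                  # assumed 0-100, higher is healthier
    issues: List[Any]                    # List[HealthIssue] in the real module
    metrics: Dict[str, Any] = field(default_factory=dict)
    recommendations: List[str] = field(default_factory=list)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


report = BoardHealth(health_score=100.0, issues=[])
print(report.health_score, len(report.issues), report.timestamp.tzinfo)
# 100.0 0 UTC
```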

## MCP Tool Integration

### check_board_health Tool

Runs the full analysis and serializes the result into a JSON-friendly dictionary:

```python
async def check_board_health(state: Any) -> Dict[str, Any]:
    """Analyze board health and return issues with recommendations."""
    analyzer = BoardHealthAnalyzer(kanban_client=state.kanban_client)

    # agents: Dict[str, WorkerStatus], active_assignments: Dict[str, str]
    active_assignments = {
        agent_id: assignment.task_id
        for agent_id, assignment in state.agent_tasks.items()
    }
    board_health = await analyzer.analyze_board_health(
        agents=state.agent_status,
        active_assignments=active_assignments,
    )

    return {
        "health_score": board_health.health_score,
        "issue_count": len(board_health.issues),
        "critical_issues": sum(
            1 for i in board_health.issues if i.severity == IssueSeverity.CRITICAL
        ),
        "issues": [
            {
                "type": issue.type.value,
                "severity": issue.severity.value,
                "title": issue.title,
                "description": issue.description,
                "affected_tasks": issue.affected_tasks,
                "recommendations": issue.recommendations,
            }
            for issue in board_health.issues
        ],
        "recommendations": board_health.recommendations,
    }
```

### check_task_dependencies Tool

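Reports the dependency neighborhood of a single task: its direct and transitive dependencies and dependents, whether it is blocked, and whether it sits on a dependency cycle: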

```python
async def check_task_dependencies(
    task_id: str,
    kanban_client: KanbanInterface
) -> Dict[str, Any]:
    """Check dependencies for a specific task"""

    tasks = await kanban_client.get_all_tasks()
    task_map = {t.id: t for t in tasks}

    if task_id not in task_map:
        raise ValueError(f"Task {task_id} not found")

    target_task = task_map[task_id]

    # Build dependency information
    dependencies = {
        "direct_dependencies": [],
        "direct_dependents": [],
        "transitive_dependencies": [],
        "transitive_dependents": [],
        "is_blocked": False,
        "blocking_tasks": [],
        "is_part_of_cycle": False,
        "cycle_tasks": []
    }

    # Analyze dependencies
    # ... (implementation details)

    return dependencies
```
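The analysis itself is elided above. One way to fill in the returned fields is sketched below, working on a plain `task_id -> dependencies` mapping and a `task_id -> status` mapping. `analyze_dependencies` and `_reach` are hypothetical helpers, and treating a task as blocked when any direct dependency is not `"done"` is an assumption about what `is_blocked` means.

```python
from collections import deque
from typing import Any, Dict, List, Set


def _reach(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """All nodes reachable from start (start itself only if it lies on a cycle)."""
    seen: Set[str] = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, []))
    return seen


def analyze_dependencies(
    task_id: str, deps: Dict[str, List[str]], status: Dict[str, str]
) -> Dict[str, Any]:
    """Fill the check_task_dependencies fields for one task (hypothetical helper)."""
    # Reverse edges: dependency -> tasks that depend on it
    rdeps: Dict[str, List[str]] = {t: [] for t in deps}
    for t, requires in deps.items():
        for r in requires:
            rdeps.setdefault(r, []).append(t)

    upstream = _reach(task_id, deps)     # everything task_id waits on
    downstream = _reach(task_id, rdeps)  # everything waiting on task_id
    blocking = [d for d in deps.get(task_id, []) if status.get(d) != "done"]
    # task_id is on a cycle iff it can reach itself; the tasks that are both
    # upstream and downstream of it form its strongly connected component
    in_cycle = task_id in upstream
    cycle = sorted(upstream & downstream) if in_cycle else []

    return {
        "direct_dependencies": list(deps.get(task_id, [])),
        "direct_dependents": list(rdeps.get(task_id, [])),
        "transitive_dependencies": sorted(upstream - {task_id}),
        "transitive_dependents": sorted(downstream - {task_id}),
        "is_blocked": bool(blocking),
        "blocking_tasks": blocking,
        "is_part_of_cycle": in_cycle,
        "cycle_tasks": cycle,
    }
```

Inside the tool, the two mappings could be built from `task_map`, for example `{t.id: t.dependencies for t in tasks}` and `{t.id: t.status.value for t in tasks}`; that wiring, including the `"done"` status value, is likewise an assumption.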

## Real-World Examples

### Example 1: Circular Dependency Detection

```bash
$ check_board_health

🚑 CRITICAL: Circular Dependency Detected
Tasks form a dependency cycle: task-123 → task-456 → task-789 → task-123

Recommendations:
• Break the cycle by removing one dependency
• Restructure tasks to eliminate circular references
• Consider merging related tasks
```

### Example 2: Skill Mismatch Alert

```bash
$ check_board_health

⚠️  HIGH: Missing Required Skills
Task 'Implement OAuth2' requires {'oauth', 'security'} but no active agents have these skills

Recommendations:
• Register agents with skills: oauth, security
• Break down task to use available skills
```

### Example 3: Bottleneck Warning

```bash
$ check_board_health

⚠️  HIGH: Bottleneck in IN_REVIEW
18 tasks in IN_REVIEW (threshold: 8)

Recommendations:
• Review and prioritize IN_REVIEW tasks
• Assign more resources to this stage
• Identify and remove blockers
• Consider work-in-progress limits
```

## Implementation Details

### Complete Analysis Method

```python
class BoardHealthAnalyzer:
    """Analyzes board-level health and detects various types of deadlocks."""

    def __init__(
        self,
        kanban_client: KanbanInterface,
        stale_task_days: int = 7,
        max_tasks_per_agent: int = 3,
    ):
        self.kanban_client = kanban_client
        self.stale_task_days = stale_task_days
        self.max_tasks_per_agent = max_tasks_per_agent

    async def analyze_board_health(
        self,
        agents: Dict[str, WorkerStatus],
        active_assignments: Dict[str, str],  # agent_id -> task_id
    ) -> BoardHealth:
        """Run all health checks and return a BoardHealth result."""
        # Fetches tasks from kanban internally
        all_tasks = await self.kanban_client.get_all_tasks()
        issues = []

        issues.extend(await self._detect_skill_mismatches(all_tasks, agents))
        issues.extend(await self._detect_circular_dependencies(all_tasks))
        issues.extend(await self._detect_bottlenecks(all_tasks))
        issues.extend(await self._detect_chain_blocks(all_tasks, active_assignments))
        issues.extend(await self._detect_stale_tasks(all_tasks))
        issues.extend(await self._analyze_agent_workload(agents, active_assignments))

        metrics = self._calculate_health_metrics(all_tasks, agents, issues)
        recommendations = self._generate_overall_recommendations(issues, metrics)
        health_score = self._calculate_health_score(issues, metrics)

        return BoardHealth(
            health_score=health_score,
            issues=issues,
            metrics=metrics,
            recommendations=recommendations,
            timestamp=datetime.now(timezone.utc),
        )
```
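The helpers `_calculate_health_metrics`, `_generate_overall_recommendations`, and `_calculate_health_score` are not shown here. Purely to illustrate how a score could be derived from the issue list, the sketch below starts at 100 and subtracts a fixed penalty per issue by severity, clamped at 0. The `PENALTIES` weights and the `health_score` function are invented for this example and are not Marcus's actual formula.

```python
from typing import Dict, Iterable

# Hypothetical penalty per issue, keyed by IssueSeverity value
PENALTIES: Dict[str, float] = {"critical": 25.0, "high": 10.0, "medium": 5.0, "low": 1.0}


def health_score(severities: Iterable[str]) -> float:
    """100 minus the summed penalties, never below 0 (illustrative only)."""
    penalty = sum(PENALTIES.get(s, 0.0) for s in severities)
    return max(0.0, 100.0 - penalty)


print(health_score(["critical", "high", "medium"]))  # 60.0
```

A subtractive score is easy to explain, but with fixed weights many MEDIUM issues can outweigh a single CRITICAL cycle, so the weights should reflect which issues actually stall work.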

### Summary Generation

```python
def _generate_health_summary(issues: List[HealthIssue]) -> str:
    """Generate a human-readable summary of board health"""
    if not issues:
        return "🎉 Board is healthy! No issues detected."

    summary_parts = []

    # Count by severity
    severity_counts = {}
    for issue in issues:
        severity_counts[issue.severity] = severity_counts.get(issue.severity, 0) + 1

    # Build summary
    if IssueSeverity.CRITICAL in severity_counts:
        summary_parts.append(
            f"🚑 {severity_counts[IssueSeverity.CRITICAL]} CRITICAL issues"
        )
    if IssueSeverity.HIGH in severity_counts:
        summary_parts.append(
            f"⚠️  {severity_counts[IssueSeverity.HIGH]} HIGH priority issues"
        )
    if IssueSeverity.MEDIUM in severity_counts:
        summary_parts.append(
            f"🟡 {severity_counts[IssueSeverity.MEDIUM]} MEDIUM priority issues"
        )
    if IssueSeverity.LOW in severity_counts:
        summary_parts.append(
            f"🟢 {severity_counts[IssueSeverity.LOW]} LOW priority issues"
        )

    return " | ".join(summary_parts)
```

## Configuration

### Analysis Thresholds

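The detection thresholds are configured in the `board_health` section of `config_marcus.json`; the values below match the defaults used by the detection methods above: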

```json
{
  "board_health": {
    "enabled": true,
    "bottleneck_thresholds": {
      "TODO": 20,
      "IN_PROGRESS": 10,
      "BLOCKED": 5,
      "IN_REVIEW": 8
    },
    "staleness_days": {
      "IN_PROGRESS": 3,
      "IN_REVIEW": 2,
      "BLOCKED": 7,
      "TODO": 14
    },
    "workload_imbalance_factor": 2.0,
    "min_tasks_for_imbalance_check": 3
  }
}
```
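A minimal sketch of reading the `board_health` section and falling back to the defaults above when a key is missing. The `load_board_health_config` name, taking the file contents as a string, and one-level merging of the nested threshold maps are assumptions; how Marcus itself loads `config_marcus.json` is not described here.

```python
import json
from typing import Any, Dict

# Defaults matching the values shown in the JSON above
DEFAULTS: Dict[str, Any] = {
    "enabled": True,
    "bottleneck_thresholds": {"TODO": 20, "IN_PROGRESS": 10, "BLOCKED": 5, "IN_REVIEW": 8},
    "staleness_days": {"IN_PROGRESS": 3, "IN_REVIEW": 2, "BLOCKED": 7, "TODO": 14},
    "workload_imbalance_factor": 2.0,
    "min_tasks_for_imbalance_check": 3,
}


def load_board_health_config(raw_json: str) -> Dict[str, Any]:
    """Merge the board_health section over DEFAULTS (hypothetical loader)."""
    section = json.loads(raw_json).get("board_health", {})
    merged: Dict[str, Any] = {}
    for key, default in DEFAULTS.items():
        value = section.get(key, default)
        # Nested threshold maps are merged key by key so partial overrides work
        if isinstance(default, dict) and isinstance(value, dict):
            value = {**default, **value}
        merged[key] = value
    return merged


cfg = load_board_health_config('{"board_health": {"bottleneck_thresholds": {"IN_REVIEW": 12}}}')
print(cfg["bottleneck_thresholds"]["IN_REVIEW"], cfg["bottleneck_thresholds"]["TODO"])  # 12 20
```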

## Pros and Cons

### Advantages

1. **Comprehensive Detection**: Covers 6 major types of board issues
2. **Actionable Insights**: Each issue comes with specific recommendations
3. **Severity Ranking**: Prioritizes issues by impact
4. **Dependency Analysis**: Detects complex circular dependencies
5. **Resource Optimization**: Identifies skill gaps and workload imbalances
6. **Easy Integration**: Simple MCP tool interface
7. **On-Demand Analysis**: A full health check runs whenever the tool is called

### Disadvantages

1. **Static Thresholds**: Fixed limits may not suit all projects
2. **No Historical Tracking**: Doesn't track health trends over time
3. **Limited Context**: May miss project-specific nuances
4. **Manual Invocation**: Requires explicit tool calls
5. **No Auto-Remediation**: Provides recommendations but doesn't fix issues

## Why This Approach

The focused issue detection approach was chosen because:

1. **Specific Problems**: Targets known pain points in Kanban boards
2. **Actionable Results**: Each issue has clear remediation steps
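3. **Quick Analysis**: Each check is a simple scan of the task list or dependency graph, so a full run is cheap enough to call on demand
4. **Developer-Friendly**: Clear categories match developer mental models
5. **Integration**: Exposed as MCP tools that reuse Marcus's existing kanban client and agent state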
6. **Practical Focus**: Addresses real problems teams face daily

## Usage Examples

### Basic Health Check

```python
# From MCP client
result = await client.call_tool(
    "check_board_health",
    {}
)

if result["issue_count"] > 0:
    print(f"Found {result['issue_count']} issues:")
    for issue in result["issues"]:
        print(f"- [{issue['severity']}] {issue['title']}")
```

### Dependency Analysis

```python
# Check specific task dependencies
result = await client.call_tool(
    "check_task_dependencies",
    {"task_id": "task-123"}
)

if result["is_blocked"]:
    print(f"Task is blocked by: {result['blocking_tasks']}")

if result["is_part_of_cycle"]:
    print(f"WARNING: Task is in a dependency cycle with: {result['cycle_tasks']}")
```

### Automated Health Monitoring

```python
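import asyncio


# Set up periodic health checks. `client` is a connected MCP client and
# `notify_team` is a placeholder for your own alerting hook (chat, email, ...).
async def monitor_board_health(client, notify_team, interval_seconds: int = 300):
    while True:
        result = await client.call_tool("check_board_health", {})

        critical_count = result["critical_issues"]
        if critical_count > 0:
            # Send alert
            await notify_team(
                f"CRITICAL: {critical_count} critical board health issues detected!"
            )

        await asyncio.sleep(interval_seconds)  # default: every 5 minutes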
```

## Integration with Other Systems

### Assignment Lease System

The health analyzer can also surface stuck tasks reported by the lease manager:

```python
# Detect tasks with too many lease renewals
if hasattr(state, 'lease_manager'):
    lease_stats = state.lease_manager.get_statistics()
    if lease_stats['stuck_tasks'] > 0:
        issues.append(HealthIssue(
            type=HealthIssueType.STALE_TASK,
            severity=IssueSeverity.HIGH,
            title=f"{lease_stats['stuck_tasks']} Stuck Tasks (Lease System)",
            description="Tasks have been renewed too many times",
            affected_tasks=[],  # required field; the lease stats only give a count
            recommendations=["Review stuck tasks", "Consider reassignment"]
        ))
```

### Assignment Monitor

Integrates with the assignment monitor to detect orphaned assignments:

```python
# Check for orphaned assignments
if hasattr(state, 'assignment_monitor'):
    health = await state.assignment_monitor.check_assignment_health()
    if not health['healthy']:
        for issue in health['issues']:
            if issue['type'] == 'orphaned_assignments':
                # Add to board health issues
                ...
```

## Future Enhancements

### Short-term Improvements

1. **Auto-Remediation**: Automatically fix simple issues (e.g., unblock tasks)
2. **Health Trends**: Track health over time for pattern detection
3. **Custom Checks**: Allow project-specific health checks
4. **Integration API**: Webhook notifications for critical issues

### Long-term Vision

1. **Predictive Analysis**: Forecast future bottlenecks
2. **AI Recommendations**: Use machine learning to improve the quality of recommendations
3. **Team Analytics**: Correlate health with team performance
4. **Automated Workflows**: Trigger actions based on health status

## Conclusion

The Board Health Analyzer System provides Marcus with targeted diagnostic capabilities that identify and help resolve six critical board health issues. By analyzing skill mismatches, circular dependencies, bottlenecks, chain blocks, stale tasks, and workload imbalances, the system helps teams maintain healthy, efficient Kanban boards.

The analyzer's focus on concrete, recurring problems, combined with actionable recommendations for each issue type, makes it a practical tool for project managers and team leads. Because it is exposed as simple MCP tools, both human users and AI agents can run it, which helps teams catch common workflow problems before they affect project delivery.