37. Board Health Analyzer System#
Executive Summary#
The Board Health Analyzer System is a diagnostic tool that identifies six critical board health issues: skill mismatches, circular dependencies, bottlenecks, chain blocks, stale tasks, and workload imbalances. It provides on-demand analysis through MCP tools and comprehensive health reports with actionable recommendations for resolving detected issues.
System Architecture#
Core Components#
The Board Health Analyzer consists of:
Board Health Analyzer Architecture
├── board_health_analyzer.py (Core Analysis)
│   ├── BoardHealthAnalyzer (Main analyzer class)
│   ├── HealthIssue (Issue data structure)
│   ├── HealthIssueType (Enum of issue types)
│   ├── IssueSeverity (LOW, MEDIUM, HIGH, CRITICAL)
│   ├── BoardHealth (Result container)
│   └── Six Analysis Methods:
│       ├── _detect_skill_mismatches()
│       ├── _detect_circular_dependencies()
│       ├── _detect_bottlenecks()
│       ├── _detect_chain_blocks()
│       ├── _detect_stale_tasks()
│       └── _analyze_agent_workload()
└── tools/board_health.py (MCP Tool Integration)
    ├── check_board_health (Full health analysis)
    └── check_task_dependencies (Dependency graph)
Analysis Flow#
Board State (Tasks + Agents)
    │
    ▼
Load Board Data
    │
    ├──► Skill Analysis ──────► Mismatches Found
    │
    ├──► Dependency Graph ────► Circular Deps
    │
    ├──► Column Analysis ─────► Bottlenecks
    │
    ├──► Chain Detection ─────► Blocked Chains
    │
    ├──► Time Analysis ───────► Stale Tasks
    │
    ├──► Workload Check ──────► Imbalances
    │
    ▼
Aggregate Issues
    │
    ▼
Generate Recommendations
    │
    ▼
Health Report
Core Health Issues Detected#
1. Skill Mismatches#
Detects when required skills aren't available:
async def _detect_skill_mismatches(
    self,
    tasks: List[Task],
    agents: Dict[str, WorkerStatus]
) -> List[HealthIssue]:
    """Detect tasks that cannot be handled by available agents."""
    issues = []
    # Collect all available skills from active agents
    available_skills = set()
    for agent in agents.values():
        available_skills.update(skill.lower() for skill in agent.skills)
    # Check each TODO/BLOCKED task
    for task in tasks:
        if task.status in [TaskStatus.TODO, TaskStatus.BLOCKED]:
            if hasattr(task, 'labels') and task.labels:
                missing = set(s.lower() for s in task.labels) - available_skills
                if missing:
                    issues.append(HealthIssue(
                        type=HealthIssueType.SKILL_MISMATCH,
                        severity=IssueSeverity.HIGH,
                        title="Missing Required Skills",
                        description=f"Task '{task.name}' requires {missing} but no active agents have these skills",
                        affected_tasks=[task.id],
                        recommendations=[
                            f"Register agents with skills: {', '.join(missing)}",
                            "Break down task to use available skills"
                        ]
                    ))
    return issues
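The core of this check is a case-insensitive set difference between a task's labels and the union of all agent skills. A minimal standalone sketch, assuming labels and skills are plain strings; the helper name `missing_skills` is illustrative and not part of the analyzer:

```python
from typing import Iterable, Set


def missing_skills(
    task_labels: Iterable[str],
    agent_skills: Iterable[Iterable[str]],
) -> Set[str]:
    """Return the task labels that no agent covers, compared case-insensitively."""
    # Union of every agent's skills, lowercased
    available = {skill.lower() for skills in agent_skills for skill in skills}
    return {label.lower() for label in task_labels} - available


# Hypothetical data: two agents, one task with three labels
gaps = missing_skills(["OAuth", "Security", "python"], [["Python", "SQL"], ["security"]])
print(sorted(gaps))  # ['oauth']
```

Lowercasing both sides means a label such as `OAuth` matches a skill registered as `oauth`, but not a differently named skill such as `oauth2`.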
2. Circular Dependencies#
Detects dependency cycles using DFS:
async def _detect_circular_dependencies(
    self,
    tasks: List[Task]
) -> List[HealthIssue]:
    """Detect circular dependencies in task graph"""
    # Build dependency graph
    graph: Dict[str, List[str]] = {}
    task_map = {task.id: task for task in tasks}
    for task in tasks:
        if hasattr(task, 'dependencies') and task.dependencies:
            graph[task.id] = task.dependencies
        else:
            graph[task.id] = []
    # Find cycles using DFS
    cycles = []
    visited = set()
    rec_stack = set()
    def dfs(node: str, path: List[str]) -> None:
        visited.add(node)
        rec_stack.add(node)
        path.append(node)
        for neighbor in graph.get(node, []):
            if neighbor in rec_stack:
                # Found cycle
                cycle_start = path.index(neighbor)
                cycle = path[cycle_start:]
                cycles.append(cycle)
            elif neighbor not in visited and neighbor in task_map:
                dfs(neighbor, path.copy())
        rec_stack.remove(node)
    # Check all nodes
    for task_id in graph:
        if task_id not in visited:
            dfs(task_id, [])
    # Create issues for cycles
    if cycles:
        return [HealthIssue(
            type=HealthIssueType.CIRCULAR_DEPENDENCY,
            severity=IssueSeverity.CRITICAL,
            title="Circular Dependency Detected",
            description=f"Tasks form a dependency cycle: {' → '.join(cycle + [cycle[0]])}",
            affected_tasks=cycle,
            recommendations=[
                "Break the cycle by removing one dependency",
                "Restructure tasks to eliminate circular references",
                "Consider merging related tasks"
            ]
        ) for cycle in cycles]
    return []
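The DFS above can be tried outside the analyzer. The following is a minimal sketch that works on a plain `task_id -> dependencies` mapping; `find_cycles` is an illustrative name, and unlike the method above it shares one path list and backtracks instead of copying the path on each call:

```python
from typing import Dict, List


def find_cycles(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return dependency cycles found by depth-first search."""
    cycles: List[List[str]] = []
    visited = set()
    on_stack = set()

    def dfs(node: str, path: List[str]) -> None:
        visited.add(node)
        on_stack.add(node)
        path.append(node)
        for neighbor in graph.get(node, []):
            if neighbor in on_stack:
                # Back edge: the cycle is the path from neighbor to node
                cycles.append(path[path.index(neighbor):])
            elif neighbor not in visited and neighbor in graph:
                dfs(neighbor, path)
        # Backtrack so sibling branches see the correct path
        path.pop()
        on_stack.remove(node)

    for node in graph:
        if node not in visited:
            dfs(node, [])
    return cycles


graph = {
    "task-123": ["task-456"],
    "task-456": ["task-789"],
    "task-789": ["task-123"],
    "task-999": [],
}
print(find_cycles(graph))  # [['task-123', 'task-456', 'task-789']]
```

Because the search is recursive, very long dependency chains could hit Python's recursion limit; an explicit stack would avoid that.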
3. Bottleneck Detection#
Identifies columns with too many tasks:
async def _detect_bottlenecks(self, tasks: List[Task]) -> List[HealthIssue]:
    """Identify bottlenecks in the workflow"""
    issues = []
    # Normalize a task's status to a string the same way everywhere
    def status_key(task: Task) -> str:
        return task.status.value if hasattr(task.status, 'value') else str(task.status)
    # Count tasks by status
    status_counts = {}
    for task in tasks:
        status = status_key(task)
        status_counts[status] = status_counts.get(status, 0) + 1
    # Thresholds for bottlenecks
    thresholds = {
        'TODO': 20,
        'IN_PROGRESS': 10,
        'BLOCKED': 5,
        'IN_REVIEW': 8
    }
    for status, count in status_counts.items():
        threshold = thresholds.get(status.upper(), 15)
        if count > threshold:
            severity = IssueSeverity.HIGH if count > threshold * 1.5 else IssueSeverity.MEDIUM
            issues.append(HealthIssue(
                type=HealthIssueType.BOTTLENECK,
                severity=severity,
                title=f"Bottleneck in {status}",
                description=f"{count} tasks in {status} (threshold: {threshold})",
                # Compare normalized values; str() of an Enum member includes the class name
                affected_tasks=[t.id for t in tasks if status_key(t) == status],
                recommendations=[
                    f"Review and prioritize {status} tasks",
                    "Assign more resources to this stage",
                    "Identify and remove blockers",
                    "Consider work-in-progress limits"
                ]
            ))
    return issues
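The severity rule used here is: no issue at or below the threshold, MEDIUM above it, and HIGH above 1.5 times the threshold. A small sketch of that rule in isolation, using the thresholds listed above; the function name and the plain-string severities are illustrative:

```python
from typing import Optional

DEFAULT_THRESHOLD = 15
THRESHOLDS = {"TODO": 20, "IN_PROGRESS": 10, "BLOCKED": 5, "IN_REVIEW": 8}


def bottleneck_severity(status: str, count: int) -> Optional[str]:
    """Return 'high', 'medium', or None when the column is within its limit."""
    threshold = THRESHOLDS.get(status.upper(), DEFAULT_THRESHOLD)
    if count <= threshold:
        return None
    return "high" if count > threshold * 1.5 else "medium"


print(bottleneck_severity("in_review", 18))  # high   (18 > 12.0)
print(bottleneck_severity("in_review", 12))  # medium (12 > 8, but not > 12.0)
print(bottleneck_severity("todo", 20))       # None   (20 is not above 20)
```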
Additional Health Checks#
4. Chain Block Detection#
Finds chains of blocked dependencies:
async def _detect_chain_blocks(
    self,
    tasks: List[Task],
    active_assignments: Dict[str, str]
) -> List[HealthIssue]:
    """Find chains where blocked tasks block other tasks"""
    issues = []
    task_map = {task.id: task for task in tasks}
    # Find blocked tasks that have dependents
    blocked_tasks = [t for t in tasks if t.status == TaskStatus.BLOCKED]
    for blocked_task in blocked_tasks:
        # Find tasks depending directly on this blocked task
        # (a missing or None dependencies attribute counts as no dependencies)
        dependent_tasks = [
            t for t in tasks
            if blocked_task.id in (getattr(t, 'dependencies', None) or [])
        ]
        if dependent_tasks:
            chain_length = 1 + len(dependent_tasks)
            severity = IssueSeverity.HIGH if chain_length > 3 else IssueSeverity.MEDIUM
            issues.append(HealthIssue(
                type=HealthIssueType.CHAIN_BLOCK,
                severity=severity,
                title="Blocked Task Creating Chain",
                description=(
                    f"Blocked task '{blocked_task.title}' is blocking "
                    f"{len(dependent_tasks)} other tasks"
                ),
                affected_tasks=[blocked_task.id] + [t.id for t in dependent_tasks],
                recommendations=[
                    f"Prioritize unblocking '{blocked_task.title}'",
                    "Consider alternative approaches for dependent tasks",
                    "Review if dependencies can be relaxed"
                ]
            ))
    return issues
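The method above counts only tasks that depend directly on a blocked task. If the full downstream chain is of interest, a breadth-first walk over inverted dependency edges is one option. This is a sketch under that assumption, not the analyzer's behavior; `transitive_dependents` is a hypothetical helper:

```python
from collections import deque
from typing import Dict, List, Set


def transitive_dependents(blocked_id: str, dependencies: Dict[str, List[str]]) -> Set[str]:
    """Return every task that depends, directly or indirectly, on blocked_id."""
    # Invert the edges: task -> tasks that depend on it
    dependents: Dict[str, List[str]] = {}
    for task_id, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(task_id)

    seen: Set[str] = set()
    queue = deque([blocked_id])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen and child != blocked_id:
                seen.add(child)
                queue.append(child)
    return seen


deps = {"design": [], "build": ["design"], "test": ["build"], "docs": ["design"]}
print(sorted(transitive_dependents("design", deps)))  # ['build', 'docs', 'test']
```

The `seen` set also keeps the walk finite when the graph contains a cycle.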
5. Stale Task Detection#
Identifies tasks that haven't been updated:
async def _detect_stale_tasks(self, tasks: List[Task]) -> List[HealthIssue]:
    """Find tasks that haven't been updated recently"""
    issues = []
    # Naive local time: task timestamps must also be naive for the subtraction below
    now = datetime.now()
    # Thresholds by status
    staleness_thresholds = {
        TaskStatus.IN_PROGRESS: timedelta(days=3),
        TaskStatus.IN_REVIEW: timedelta(days=2),
        TaskStatus.BLOCKED: timedelta(days=7),
        TaskStatus.TODO: timedelta(days=14)
    }
    stale_tasks = []
    for task in tasks:
        if task.status == TaskStatus.DONE:
            continue
        threshold = staleness_thresholds.get(task.status, timedelta(days=7))
        # Fall back to created_at when updated_at is missing or None
        last_update = getattr(task, 'updated_at', None) or task.created_at
        if now - last_update > threshold:
            stale_tasks.append((task, now - last_update))
    if stale_tasks:
        # Oldest first
        stale_tasks.sort(key=lambda x: x[1], reverse=True)
        description_parts = []
        for task, age in stale_tasks[:5]:  # Show top 5
            age_days = age.days
            description_parts.append(f"• '{task.title}' ({age_days} days old)")
        issues.append(HealthIssue(
            type=HealthIssueType.STALE_TASK,
            severity=IssueSeverity.MEDIUM,
            title=f"{len(stale_tasks)} Stale Tasks Detected",
            description="\n".join(description_parts),
            affected_tasks=[t[0].id for t in stale_tasks],
            recommendations=[
                "Review and update stale tasks",
                "Close tasks that are no longer relevant",
                "Reassign tasks that are stuck",
                "Add progress updates to active tasks"
            ]
        ))
    return issues
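Subtracting a timezone-aware timestamp from a naive `datetime.now()` raises `TypeError` in Python, so the age calculation is sensitive to how the board stores timestamps. A minimal sketch of a tolerant age check, assuming naive timestamps should be read as UTC; that assumption and the helper names are illustrative:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def task_age(last_update: datetime, now: Optional[datetime] = None) -> timedelta:
    """Return the time since last_update, treating naive datetimes as UTC."""
    if last_update.tzinfo is None:
        last_update = last_update.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - last_update


def is_stale(last_update: datetime, threshold: timedelta, now: Optional[datetime] = None) -> bool:
    """A task is stale once its age strictly exceeds the threshold."""
    return task_age(last_update, now) > threshold


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(is_stale(datetime(2024, 1, 5), timedelta(days=3), now))                       # True
print(is_stale(datetime(2024, 1, 8, tzinfo=timezone.utc), timedelta(days=3), now))  # False
```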
6. Workload Balance Analysis#
Checks for uneven task distribution:
async def _analyze_agent_workload(
    self,
    agents: Dict[str, WorkerStatus],
    active_assignments: Dict[str, str]
) -> List[HealthIssue]:
    """Analyze if workload is balanced across agents"""
    issues = []
    # Start every registered agent at zero tasks (agents is keyed by agent ID)
    agent_task_count = {agent_id: 0 for agent_id in agents}
    # Count active assignments. active_assignments maps agent_id -> task_id,
    # so this mapping contributes at most one task per agent.
    for agent_id in active_assignments:
        if agent_id in agent_task_count:
            agent_task_count[agent_id] += 1
    if not agent_task_count:
        return issues
    # Calculate statistics
    counts = list(agent_task_count.values())
    avg_tasks = sum(counts) / len(counts)
    max_tasks = max(counts)
    min_tasks = min(counts)
    # Check for imbalance
    if max_tasks > avg_tasks * 2 and max_tasks >= 3:
        overloaded = [aid for aid, count in agent_task_count.items() if count == max_tasks]
        underutilized = [aid for aid, count in agent_task_count.items() if count <= 1]
        issues.append(HealthIssue(
            type=HealthIssueType.WORKLOAD_IMBALANCE,
            severity=IssueSeverity.MEDIUM,
            title="Uneven Workload Distribution",
            description=(
                f"Some agents have {max_tasks} tasks while others have {min_tasks}. "
                f"Average is {avg_tasks:.1f} tasks per agent."
            ),
            affected_tasks=[],
            recommendations=[
                f"Reassign tasks from overloaded agents: {', '.join(overloaded)}",
                f"Utilize available agents: {', '.join(underutilized)}",
                "Review task assignment algorithm",
                "Consider agent skills when distributing tasks"
            ]
        ))
    return issues
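The imbalance rule itself (the busiest agent has more than twice the average load and at least three tasks) can be exercised on plain per-agent counts. A sketch assuming the counts have already been gathered; `find_imbalance` and its parameter names are illustrative:

```python
from typing import Dict, List, Tuple


def find_imbalance(
    task_counts: Dict[str, int],
    factor: float = 2.0,
    min_tasks: int = 3,
) -> Tuple[List[str], List[str]]:
    """Return (overloaded, underutilized) agent IDs, or two empty lists if balanced."""
    if not task_counts:
        return [], []
    counts = list(task_counts.values())
    avg = sum(counts) / len(counts)
    peak = max(counts)
    if peak > avg * factor and peak >= min_tasks:
        overloaded = [a for a, c in task_counts.items() if c == peak]
        underutilized = [a for a, c in task_counts.items() if c <= 1]
        return overloaded, underutilized
    return [], []


print(find_imbalance({"agent-1": 5, "agent-2": 1, "agent-3": 0}))
# (['agent-1'], ['agent-2', 'agent-3'])
print(find_imbalance({"agent-1": 3, "agent-2": 3}))
# ([], [])
```

In the first call the average is 2.0, so a peak of 5 exceeds both twice the average and the three-task minimum.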
Issue Data Structure#
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List
# The enums are defined first because HealthIssue's annotations refer to them
class HealthIssueType(Enum):
    SKILL_MISMATCH = "skill_mismatch"
    CIRCULAR_DEPENDENCY = "circular_dependency"
    BOTTLENECK = "bottleneck"
    CHAIN_BLOCK = "chain_block"
    STALE_TASK = "stale_task"
    WORKLOAD_IMBALANCE = "workload_imbalance"
class IssueSeverity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"
@dataclass
class HealthIssue:
    """Represents a board health issue."""
    type: HealthIssueType
    severity: IssueSeverity
    title: str
    description: str
    affected_tasks: List[str]
    affected_agents: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)
MCP Tool Integration#
check_board_health Tool#
Provides comprehensive board analysis:
async def check_board_health(state: Any) -> Dict[str, Any]:
    """Analyze board health and return issues with recommendations."""
    analyzer = BoardHealthAnalyzer(kanban_client=state.kanban_client)
    # agents: Dict[str, WorkerStatus], active_assignments: Dict[str, str]
    active_assignments = {
        agent_id: assignment.task_id
        for agent_id, assignment in state.agent_tasks.items()
    }
    board_health = await analyzer.analyze_board_health(
        agents=state.agent_status,
        active_assignments=active_assignments,
    )
    return {
        "health_score": board_health.health_score,
        "issue_count": len(board_health.issues),
        "critical_issues": sum(
            1 for i in board_health.issues if i.severity == IssueSeverity.CRITICAL
        ),
        "issues": [
            {
                "type": issue.type.value,
                "severity": issue.severity.value,
                "title": issue.title,
                "description": issue.description,
                "affected_tasks": issue.affected_tasks,
                "recommendations": issue.recommendations,
            }
            for issue in board_health.issues
        ],
        "recommendations": board_health.recommendations,
    }
check_task_dependencies Tool#
Analyzes task dependency graph:
async def check_task_dependencies(
    task_id: str,
    kanban_client: KanbanInterface
) -> Dict[str, Any]:
    """Check dependencies for a specific task"""
    tasks = await kanban_client.get_all_tasks()
    task_map = {t.id: t for t in tasks}
    if task_id not in task_map:
        raise ValueError(f"Task {task_id} not found")
    target_task = task_map[task_id]
    # Build dependency information
    dependencies = {
        "direct_dependencies": [],
        "direct_dependents": [],
        "transitive_dependencies": [],
        "transitive_dependents": [],
        "is_blocked": False,
        "blocking_tasks": [],
        "is_part_of_cycle": False,
        "cycle_tasks": []
    }
    # Analyze dependencies
    # ... (implementation details)
    return dependencies
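The analysis step is elided above. As an illustration only, fields such as `transitive_dependencies`, `blocking_tasks`, and `is_part_of_cycle` could be derived from a plain `task_id -> dependencies` mapping along these lines. The helper names and the `done` set are assumptions, not the tool's actual implementation:

```python
from typing import Dict, List, Set


def upstream_tasks(task_id: str, dependencies: Dict[str, List[str]]) -> Set[str]:
    """Collect all direct and transitive dependencies of task_id."""
    found: Set[str] = set()
    stack = list(dependencies.get(task_id, []))
    while stack:
        dep = stack.pop()
        if dep in found:
            continue
        found.add(dep)
        stack.extend(dependencies.get(dep, []))
    return found


def blocking_tasks(task_id: str, dependencies: Dict[str, List[str]], done: Set[str]) -> List[str]:
    """Upstream tasks that are not finished yet, sorted for stable output."""
    return sorted(upstream_tasks(task_id, dependencies) - done)


deps = {"deploy": ["test"], "test": ["build"], "build": []}
print(sorted(upstream_tasks("deploy", deps)))          # ['build', 'test']
print(blocking_tasks("deploy", deps, done={"build"}))  # ['test']

# A task that appears among its own upstream tasks is part of a cycle
cyclic = {"a": ["b"], "b": ["a"]}
print("a" in upstream_tasks("a", cyclic))  # True
```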
Real-World Examples#
Example 1: Circular Dependency Detection#
$ check_board_health
CRITICAL: Circular Dependency Detected
Tasks form a dependency cycle: task-123 → task-456 → task-789 → task-123
Recommendations:
• Break the cycle by removing one dependency
• Restructure tasks to eliminate circular references
• Consider merging related tasks
Example 2: Skill Mismatch Alert#
$ check_board_health
⚠️ HIGH: Missing Required Skills
Task 'Implement OAuth2' requires {'oauth', 'security'} but no active agents have these skills
Recommendations:
• Register agents with skills: oauth, security
• Break down task to use available skills
Example 3: Bottleneck Warning#
$ check_board_health
⚠️ HIGH: Bottleneck in IN_REVIEW
18 tasks in IN_REVIEW (threshold: 8)
Recommendations:
• Review and prioritize IN_REVIEW tasks
• Assign more resources to this stage
• Identify and remove blockers
• Consider work-in-progress limits
Implementation Details#
Complete Analysis Method#
class BoardHealthAnalyzer:
    """Analyzes board-level health and detects various types of deadlocks."""
    def __init__(
        self,
        kanban_client: KanbanInterface,
        stale_task_days: int = 7,
        max_tasks_per_agent: int = 3,
    ):
        self.kanban_client = kanban_client
        self.stale_task_days = stale_task_days
        self.max_tasks_per_agent = max_tasks_per_agent
    async def analyze_board_health(
        self,
        agents: Dict[str, WorkerStatus],
        active_assignments: Dict[str, str],  # agent_id -> task_id
    ) -> BoardHealth:
        """Run all health checks and return a BoardHealth result."""
        # Fetches tasks from kanban internally
        all_tasks = await self.kanban_client.get_all_tasks()
        issues = []
        issues.extend(await self._detect_skill_mismatches(all_tasks, agents))
        issues.extend(await self._detect_circular_dependencies(all_tasks))
        issues.extend(await self._detect_bottlenecks(all_tasks))
        issues.extend(await self._detect_chain_blocks(all_tasks, active_assignments))
        issues.extend(await self._detect_stale_tasks(all_tasks))
        issues.extend(await self._analyze_agent_workload(agents, active_assignments))
        metrics = self._calculate_health_metrics(all_tasks, agents, issues)
        recommendations = self._generate_overall_recommendations(issues, metrics)
        health_score = self._calculate_health_score(issues, metrics)
        return BoardHealth(
            health_score=health_score,
            issues=issues,
            metrics=metrics,
            recommendations=recommendations,
            timestamp=datetime.now(timezone.utc),
        )
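The body of `_calculate_health_score` is not shown in this document. One plausible shape, offered purely as an assumption, is a penalty model: start from 100, subtract a weight per issue severity, and clamp at zero. The weights and the function name below are hypothetical:

```python
from typing import Iterable

# Hypothetical penalty weights; the real scoring rules are not documented here
PENALTIES = {"low": 2, "medium": 5, "high": 10, "critical": 25}


def health_score(severities: Iterable[str]) -> int:
    """Return a 0-100 score that drops with each issue, weighted by severity."""
    score = 100 - sum(PENALTIES[s] for s in severities)
    return max(0, score)


print(health_score([]))                              # 100
print(health_score(["critical", "high", "medium"]))  # 60
print(health_score(["critical"] * 5))                # 0
```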
Summary Generation#
def _generate_health_summary(issues: List[HealthIssue]) -> str:
    """Generate a human-readable summary of board health"""
    if not issues:
        return "Board is healthy! No issues detected."
    summary_parts = []
    # Count by severity
    severity_counts = {}
    for issue in issues:
        severity_counts[issue.severity] = severity_counts.get(issue.severity, 0) + 1
    # Build summary, most severe first
    if IssueSeverity.CRITICAL in severity_counts:
        summary_parts.append(
            f"{severity_counts[IssueSeverity.CRITICAL]} CRITICAL issues"
        )
    if IssueSeverity.HIGH in severity_counts:
        summary_parts.append(
            f"⚠️ {severity_counts[IssueSeverity.HIGH]} HIGH priority issues"
        )
    if IssueSeverity.MEDIUM in severity_counts:
        summary_parts.append(
            f"🟡 {severity_counts[IssueSeverity.MEDIUM]} MEDIUM priority issues"
        )
    if IssueSeverity.LOW in severity_counts:
        summary_parts.append(
            f"🟢 {severity_counts[IssueSeverity.LOW]} LOW priority issues"
        )
    return " | ".join(summary_parts)
Configuration#
Analysis Thresholds#
Configurable in config_marcus.json:
{
  "board_health": {
    "enabled": true,
    "bottleneck_thresholds": {
      "TODO": 20,
      "IN_PROGRESS": 10,
      "BLOCKED": 5,
      "IN_REVIEW": 8
    },
    "staleness_days": {
      "IN_PROGRESS": 3,
      "IN_REVIEW": 2,
      "BLOCKED": 7,
      "TODO": 14
    },
    "workload_imbalance_factor": 2.0,
    "min_tasks_for_imbalance_check": 3
  }
}
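How the analyzer reads this file is not shown here. A minimal sketch of one way to apply such a section, assuming partial overrides should be merged over built-in defaults; the loader name and merge behavior are assumptions:

```python
import json
from typing import Any, Dict

# Defaults mirror the values shown in the configuration above
DEFAULTS: Dict[str, Any] = {
    "enabled": True,
    "bottleneck_thresholds": {"TODO": 20, "IN_PROGRESS": 10, "BLOCKED": 5, "IN_REVIEW": 8},
    "staleness_days": {"IN_PROGRESS": 3, "IN_REVIEW": 2, "BLOCKED": 7, "TODO": 14},
    "workload_imbalance_factor": 2.0,
    "min_tasks_for_imbalance_check": 3,
}


def load_board_health_config(raw_json: str) -> Dict[str, Any]:
    """Merge the "board_health" section of a config document over the defaults."""
    user = json.loads(raw_json).get("board_health", {})
    merged: Dict[str, Any] = {}
    for key, default in DEFAULTS.items():
        value = user.get(key, default)
        if isinstance(default, dict):
            # Per-status maps are merged key by key so partial overrides work
            value = {**default, **value}
        merged[key] = value
    return merged


config = load_board_health_config(
    '{"board_health": {"bottleneck_thresholds": {"IN_REVIEW": 12}}}'
)
print(config["bottleneck_thresholds"]["IN_REVIEW"])  # 12
print(config["bottleneck_thresholds"]["TODO"])       # 20
print(config["staleness_days"]["BLOCKED"])           # 7
```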
Pros and Cons#
Advantages#
Comprehensive Detection: Covers 6 major types of board issues
Actionable Insights: Each issue comes with specific recommendations
Severity Ranking: Prioritizes issues by impact
Dependency Analysis: Detects complex circular dependencies
Resource Optimization: Identifies skill gaps and workload imbalances
Easy Integration: Simple MCP tool interface
Real-Time Analysis: On-demand health checks
Disadvantages#
Static Thresholds: Fixed limits may not suit all projects
No Historical Tracking: Doesn't track health trends over time
Limited Context: May miss project-specific nuances
Manual Invocation: Requires explicit tool calls
No Auto-Remediation: Provides recommendations but doesn't fix issues
Why This Approach#
The focused issue detection approach was chosen because:
Specific Problems: Targets known pain points in Kanban boards
Actionable Results: Each issue has clear remediation steps
Quick Analysis: Fast execution for real-time feedback
Developer-Friendly: Clear categories match developer mental models
Integration: Works seamlessly with existing Marcus workflow
Practical Focus: Addresses real problems teams face daily
Usage Examples#
Basic Health Check#
# From MCP client
result = await client.call_tool(
    "check_board_health",
    {}
)
# The tool's result has no "healthy" flag; use the issue count instead
if result["issue_count"] > 0:
    print(f"Found {result['issue_count']} issues:")
    for issue in result["issues"]:
        print(f"- [{issue['severity']}] {issue['title']}")
Dependency Analysis#
# Check specific task dependencies
result = await client.call_tool(
    "check_task_dependencies",
    {"task_id": "task-123"}
)
if result["is_blocked"]:
    print(f"Task is blocked by: {result['blocking_tasks']}")
if result["is_part_of_cycle"]:
    print(f"WARNING: Task is in a dependency cycle with: {result['cycle_tasks']}")
Automated Health Monitoring#
# Set up periodic health checks
async def monitor_board_health():
    while True:
        result = await client.call_tool("check_board_health", {})
        critical_count = result["critical_issues"]
        if critical_count > 0:
            # Send alert
            await notify_team(
                f"CRITICAL: {critical_count} critical board health issues detected!"
            )
        await asyncio.sleep(300)  # Check every 5 minutes
Integration with Other Systems#
Assignment Lease System#
The health analyzer can detect stuck tasks from lease data:
# Detect tasks with too many lease renewals
if hasattr(state, 'lease_manager'):
    lease_stats = state.lease_manager.get_statistics()
    if lease_stats['stuck_tasks'] > 0:
        issues.append(HealthIssue(
            type=HealthIssueType.STALE_TASK,
            severity=IssueSeverity.HIGH,
            title=f"{lease_stats['stuck_tasks']} Stuck Tasks (Lease System)",
            description="Tasks have been renewed too many times",
            # Required field; the lease statistics used here expose only a count
            affected_tasks=[],
            recommendations=["Review stuck tasks", "Consider reassignment"]
        ))
Assignment Monitor#
The analyzer integrates with the assignment monitor for orphan detection:
# Check for orphaned assignments
if hasattr(state, 'assignment_monitor'):
    health = await state.assignment_monitor.check_assignment_health()
    if not health['healthy']:
        for issue in health['issues']:
            if issue['type'] == 'orphaned_assignments':
                # Add to board health issues
                ...
Future Enhancements#
Short-term Improvements#
Auto-Remediation: Automatically fix simple issues (e.g., unblock tasks)
Health Trends: Track health over time for pattern detection
Custom Checks: Allow project-specific health checks
Integration API: Webhook notifications for critical issues
Long-term Vision#
Predictive Analysis: Forecast future bottlenecks
AI Recommendations: ML-based suggestion improvements
Team Analytics: Correlate health with team performance
Automated Workflows: Trigger actions based on health status
Conclusion#
The Board Health Analyzer System provides Marcus with targeted diagnostic capabilities that identify and help resolve six critical board health issues. By analyzing skill mismatches, circular dependencies, bottlenecks, chain blocks, stale tasks, and workload imbalances, the system helps teams maintain healthy, efficient Kanban boards.
The analyzer's practical focus on real-world problems, combined with actionable recommendations for each issue type, makes it an essential tool for project managers and team leads. Its integration as simple MCP tools ensures easy access for both human users and AI agents, enabling proactive board management and preventing common workflow problems before they impact project delivery.