# Activity Tracking vs. Diagnostics ## Core Philosophy Marcus employs a clear separation between **activity tracking** (recording what happened) and **diagnostics** (analyzing why it happened). This separation improves system maintainability, reduces false assumptions, and provides appropriate information to different audiences. ## The Problem: Everything Mixed Together ### Anti-Pattern: Diagnostic Logging Everywhere A common mistake in distributed systems is mixing activity tracking with diagnostic analysis: ```python # ❌ BAD: Mixed concerns def handle_task_request(agent_id): result = find_task_for_agent(agent_id) if not result: # Logging mixes WHAT with assumed WHY if has_dependency_keywords(error): logger.critical("DEPENDENCY ISSUE: ...") # Assumption! elif has_busy_keywords(error): logger.warning("Agent busy: ...") else: logger.error("Unknown failure: ...") # Diagnostic analysis in logging code run_dependency_analysis() # Wrong place! check_agent_skills() # Wrong place! return result ``` **Problems:** 1. **False assumptions:** Keywords don't equal root cause 2. **Mixed audiences:** Operators need different info than agents 3. **Maintenance burden:** Changes require updating multiple locations 4. **Poor separation:** Activity tracking code becomes diagnostic code ## Marcus Solution: Clear Separation ### Two-Layer Design ``` ┌─────────────────────────────────────────────────────┐ │ ACTIVITY TRACKING LAYER │ │ Records: WHAT happened, WHEN, WHO was involved │ │ Purpose: Index/table of contents for operations │ │ Audience: Quick overview, correlation │ │ Example: MCP Tool Logger │ │ Location: Conversation logs │ └─────────────────────────────────────────────────────┘ ↓ Points to (when needed) ↓ ┌─────────────────────────────────────────────────────┐ │ DIAGNOSTIC LAYER │ │ Analyzes: WHY it happened, root cause, context │ │ Purpose: Deep investigation, problem solving │ │ Audience: Operators fixing issues │ │ Example: Task Diagnostics, Dependency Analyzer │ │ Location: Python logs, specialized reports │ └─────────────────────────────────────────────────────┘ ``` ### Activity Tracking Layer **Characteristics:** - **Simple:** Records events as they happen - **Factual:** No interpretation or analysis - **Fast:** Minimal processing overhead - **Consistent:** Same format for all events - **Indexable:** Easy to search and correlate **Example: MCP Tool Logger** ```python # ✅ GOOD: Just record the activity def log_mcp_tool_response(tool_name, arguments, response): """Record WHAT failed and WHEN.""" if response["success"]: logger.debug(f"Tool '{tool_name}' succeeded") else: logger.warning( f"Tool '{tool_name}' returned failure", tool_name=tool_name, arguments=arguments, error=response.get("error"), response=response, # Full context preserved ) # Point to diagnostics (don't run them here!) if tool_name == "request_next_task": logger.debug("Check Python logs for 'Diagnostic Report'") ``` **Benefits:** - ✅ No assumptions about cause - ✅ Consistent WARNING level - ✅ Full context preserved - ✅ Fast execution - ✅ Easy to maintain ### Diagnostic Layer **Characteristics:** - **Deep:** Analyzes context and relationships - **Specialized:** Purpose-built for specific problems - **Selective:** Only runs when needed - **Detailed:** Provides actionable insights - **Separate:** Runs in appropriate context **Example: Task Diagnostics** ```python # ✅ GOOD: Separate diagnostic system async def run_automatic_diagnostics(project_tasks, completed_ids, assigned_ids): """ Deep analysis of WHY tasks can't be assigned. Runs automatically when request_next_task fails, NOT during every tool call. """ # Collect comprehensive data collector = TaskDiagnosticCollector(project_tasks) stats = collector.collect_filtering_stats(completed_ids, assigned_ids) # Analyze dependencies analyzer = DependencyChainAnalyzer(project_tasks) dependency_issues = analyzer.analyze_chains() # Analyze skills skill_mismatches = analyzer.analyze_skill_requirements() # Generate actionable report report = DiagnosticReportGenerator( project_tasks, stats, dependency_issues, skill_mismatches ) # Log to Python logs (separate stream) logger.info(f"Diagnostic Report (for operators):\n{report.format()}") return report ``` **Benefits:** - ✅ Accurate root cause analysis - ✅ Runs in appropriate context - ✅ Separate log stream - ✅ Actionable recommendations - ✅ Purpose-built for problem ## Real-World Example: request_next_task ### The Scenario Agent calls `request_next_task` → receives `{"success": false, "error": "No suitable tasks available"}` ### Why is it failing? (Multiple possible causes) 1. **All tasks assigned** to other agents 2. **Dependencies blocking** - tasks depend on incomplete work 3. **Skill mismatch** - agent lacks required skills 4. **Circular dependencies** - deadlock in task chain 5. **Tasks filtered** by other criteria (status, priority, etc.) The MCP response doesn't tell us which! ### Activity Tracking Records WHAT/WHEN **Conversation Log Entry:** ```json { "timestamp": "2025-01-15T10:35:22.456Z", "level": "warning", "message": "MCP tool 'request_next_task' returned failure", "tool_name": "request_next_task", "arguments": {"agent_id": "agent_002"}, "error": "No suitable tasks available" } ``` **What we know:** - ✅ WHAT: request_next_task failed - ✅ WHEN: 10:35:22 on Jan 15 - ✅ WHO: agent_002 - ❌ WHY: Unknown (need diagnostics) ### Diagnostics Analyze WHY **Automatic Trigger:** ```python # In src/marcus_mcp/tools/task.py if todo_tasks and not assignable_tasks: # Activity tracker already logged the failure # Now run diagnostics to understand WHY diagnostic_report = await run_automatic_diagnostics(...) ``` **Python Log Entry:** ``` 2025-01-15 10:35:22,450 INFO - Diagnostic Report (for operators): === Task Assignment Diagnostics === Total Tasks: 5 TODO Tasks: 3 In Progress: 1 Completed: 1 Filtering Results: - Started with: 3 TODO tasks - After dependency filter: 0 tasks ← HERE'S WHY! - After skill filter: 0 tasks - After assignment filter: 0 tasks Dependency Chain Analysis: - Task "Implement API" (task_456) blocked by incomplete "Setup Database" (task_123) - Task "Write Tests" (task_789) blocked by incomplete "Implement API" (task_456) Root Cause: Dependencies blocking Recommendation: Complete "Setup Database" to unblock chain ``` **Now we know:** - ✅ WHY: Dependencies blocking - ✅ WHICH tasks: Specific IDs and names - ✅ WHAT to do: Complete task_123 - ✅ CONTEXT: Full dependency chain ## Why Mixing is Bad: Real Examples ### Example 1: Keyword-Based Categorization ```python # ❌ BAD: Activity tracker tries to diagnose def log_failure(error_msg): if "dependency" in error_msg.lower(): logger.critical("DEPENDENCY ISSUE!") # Assumption! return "dependency_issue" # Reality: error_msg = "No suitable tasks available" # Could be dependency issue, but also could be: # - All tasks assigned # - Skill mismatch # - No tasks exist # We can't tell from the message! ``` **Problem:** Keyword ≠ Root cause ### Example 2: Wrong Log Level Escalation ```python # ❌ BAD: Escalate based on keywords if "blocked by" in error: logger.critical("Critical dependency issue!") # Misleading! else: logger.warning("Normal failure") # Reality: # "blocked by" might appear in retry message # But diagnostic shows: "Actually, all tasks are assigned" # Operator sees CRITICAL but it's not a dependency issue! ``` **Problem:** False urgency, wasted investigation time ### Example 3: Analysis in Wrong Place ```python # ❌ BAD: Diagnostic logic in logging code def log_mcp_failure(tool_name, response): logger.warning(f"{tool_name} failed") # Diagnostic work in logging layer! if tool_name == "request_next_task": tasks = get_all_tasks() # Expensive! deps = analyze_dependencies(tasks) # More expensive! logger.info(f"Dependency analysis: {deps}") # Problems: # 1. Runs on EVERY failure (wasteful) # 2. Mixed in logging code (wrong place) # 3. Duplicate work (diagnostics run elsewhere too) # 4. Can't leverage existing diagnostic context ``` **Problem:** Wrong layer, duplicate work, performance impact ## Design Principles ### 1. Single Responsibility **Activity Tracking:** - Records events - Preserves context - Points to diagnostics **Diagnostics:** - Analyzes problems - Determines root cause - Recommends actions Don't mix them! ### 2. Appropriate Timing **Activity Tracking:** - Runs: Always (low overhead) - When: During/after operation - Fast: < 1ms **Diagnostics:** - Runs: When needed (selective) - When: After failure detected - Thorough: May take seconds ### 3. Correct Audience **Activity Tracking:** - **For:** Quick overview, correlation - **Format:** Structured logs, searchable - **Location:** Conversation logs (indexed) **Diagnostics:** - **For:** Deep investigation - **Format:** Detailed reports - **Location:** Python logs (detailed) ### 4. No Assumptions **Activity Tracking:** - Records facts only - No interpretation - No categorization - Full context preserved **Diagnostics:** - Makes informed analysis - Uses full system context - Considers relationships - Provides evidence ## Practical Benefits ### For Operators **Quick Investigation:** ```bash # Step 1: What failed recently? grep 'returned failure' logs/conversations/marcus_*.log | tail -10 # Step 2: Lots of request_next_task failures? grep 'request_next_task.*failure' logs/conversations/marcus_*.log | wc -l # Step 3: Why? Check diagnostics near that time grep -A 30 'Diagnostic Report' logs/marcus_*.log | tail -50 ``` **Benefits:** - ✅ Fast triage (activity logs) - ✅ Deep dive when needed (diagnostics) - ✅ Clear correlation (timestamps) - ✅ No false assumptions ### For Developers **Maintainability:** ```python # Activity tracker: Simple, stable def log_activity(tool, result): """Just record what happened.""" logger.log(level, message, **context) # Diagnostics: Complex, evolving class TaskDiagnostics: """Deep analysis, can evolve independently.""" def analyze_dependencies(self): ... def analyze_skills(self): ... def generate_report(self): ... ``` **Benefits:** - ✅ Clear separation of concerns - ✅ Easy to test independently - ✅ Can evolve separately - ✅ Diagnostic complexity doesn't affect logging ### For System Performance **Efficient Resource Usage:** ```python # Activity logging: Always on, minimal cost log_activity() # < 1ms, always safe # Diagnostics: Selective, when needed if failure_needs_investigation: run_diagnostics() # 10-100ms, but selective ``` **Benefits:** - ✅ Low overhead for activity tracking - ✅ Expensive analysis only when needed - ✅ No performance impact on happy path ## Anti-Patterns to Avoid ### ❌ Diagnostic Logic in Activity Tracking ```python # BAD: Mixing concerns def log_tool_failure(tool_name, response): logger.warning(f"{tool_name} failed") # Don't do diagnostic work here! if "dependency" in str(response): # Keyword matching category = "dependency_issue" # Assumption! logger.critical("Dependency problem detected!") # Wrong level! analyze_dependencies() # Wrong place! ``` ### ❌ Activity Tracking in Diagnostic Code ```python # BAD: Diagnostics shouldn't do activity logging def run_diagnostics(): # Diagnostic code deps = analyze_dependencies() # Don't log activity here! logger.warning("Tool failed because dependencies") # Wrong place! return report ``` ### ❌ Duplicate Information ```python # BAD: Logging same info in multiple places def handle_failure(): # Activity tracker logs it log_activity("tool_name", result) # Diagnostics also log the failure logger.warning("Tool failed") # Duplicate! # Analysis repeats failure details report = f"Tool failed because..." # More duplication! ``` ## Implementation Checklist When building a new feature, ensure clear separation: ### Activity Tracking Implementation - [ ] Records event occurrence (WHAT/WHEN) - [ ] Uses consistent log level - [ ] Preserves full context - [ ] No interpretation or analysis - [ ] Fast execution (< 1ms) - [ ] Points to diagnostics if available - [ ] Uses conversation logs (or appropriate stream) ### Diagnostic Implementation - [ ] Runs selectively (when needed) - [ ] Analyzes root cause (WHY) - [ ] Provides actionable recommendations - [ ] Uses full system context - [ ] Separate log stream (Python logs, reports) - [ ] Can take time (thorough analysis) - [ ] Purpose-built for specific problems ## Related Concepts ### Activity Tracking in Marcus - **MCP Tool Logger** - Tracks MCP tool operations - **Agent Event Logs** - Tracks agent lifecycle - **Conversation Logs** - Tracks PM decisions, worker messages ### Diagnostics in Marcus - **Task Diagnostics** - Analyzes task assignment failures - **Dependency Analyzer** - Analyzes dependency chains - **Assignment Monitor** - Detects assignment issues - **Error Predictor** - Predicts potential failures ### Integration Patterns - **Hybrid Monitoring** - Activity + diagnostics working together - **Correlation IDs** - Linking activity to diagnostic reports - **Layered Logging** - Multiple log streams for different purposes ## Conclusion The separation between activity tracking and diagnostics is a fundamental design principle in Marcus. It ensures: 1. **Appropriate information** for different audiences 2. **Clear responsibilities** for each component 3. **No false assumptions** about failure causes 4. **Efficient resource usage** (selective expensive analysis) 5. **Maintainability** (concerns evolve independently) When in doubt: - **Activity tracking:** Record what happened (facts only) - **Diagnostics:** Analyze why it happened (when needed) Never mix them in the same code!