Activity Tracking vs. Diagnostics#
Core Philosophy#
Marcus employs a clear separation between activity tracking (recording what happened) and diagnostics (analyzing why it happened). This separation improves system maintainability, reduces false assumptions, and provides appropriate information to different audiences.
The Problem: Everything Mixed Together#
Anti-Pattern: Diagnostic Logging Everywhere#
A common mistake in distributed systems is mixing activity tracking with diagnostic analysis:
# ❌ BAD: Mixed concerns
def handle_task_request(agent_id):
    result = find_task_for_agent(agent_id)
    if not result:
        # Logging mixes WHAT with assumed WHY
        if has_dependency_keywords(error):
            logger.critical("DEPENDENCY ISSUE: ...")  # Assumption!
        elif has_busy_keywords(error):
            logger.warning("Agent busy: ...")
        else:
            logger.error("Unknown failure: ...")
        # Diagnostic analysis in logging code
        run_dependency_analysis()  # Wrong place!
        check_agent_skills()  # Wrong place!
    return result
Problems:
False assumptions: Keywords don't equal root cause
Mixed audiences: Operators need different info than agents
Maintenance burden: Changes require updating multiple locations
Poor separation: Activity tracking code becomes diagnostic code
Marcus Solution: Clear Separation#
Two-Layer Design#
┌─────────────────────────────────────────────────────┐
│               ACTIVITY TRACKING LAYER               │
│ Records: WHAT happened, WHEN, WHO was involved      │
│ Purpose: Index/table of contents for operations     │
│ Audience: Quick overview, correlation               │
│ Example: MCP Tool Logger                            │
│ Location: Conversation logs                         │
└─────────────────────────────────────────────────────┘
                           │
                Points to (when needed)
                           ▼
┌─────────────────────────────────────────────────────┐
│                  DIAGNOSTIC LAYER                   │
│ Analyzes: WHY it happened, root cause, context      │
│ Purpose: Deep investigation, problem solving        │
│ Audience: Operators fixing issues                   │
│ Example: Task Diagnostics, Dependency Analyzer      │
│ Location: Python logs, specialized reports          │
└─────────────────────────────────────────────────────┘
Activity Tracking Layer#
Characteristics:
Simple: Records events as they happen
Factual: No interpretation or analysis
Fast: Minimal processing overhead
Consistent: Same format for all events
Indexable: Easy to search and correlate
Example: MCP Tool Logger
# ✅ GOOD: Just record the activity
def log_mcp_tool_response(tool_name, arguments, response):
    """Record WHAT failed and WHEN."""
    if response["success"]:
        logger.debug(f"Tool '{tool_name}' succeeded")
    else:
        logger.warning(
            f"Tool '{tool_name}' returned failure",
            tool_name=tool_name,
            arguments=arguments,
            error=response.get("error"),
            response=response,  # Full context preserved
        )
        # Point to diagnostics (don't run them here!)
        if tool_name == "request_next_task":
            logger.debug("Check Python logs for 'Diagnostic Report'")
Benefits:
✓ No assumptions about cause
✓ Consistent WARNING level
✓ Full context preserved
✓ Fast execution
✓ Easy to maintain
Diagnostic Layer#
Characteristics:
Deep: Analyzes context and relationships
Specialized: Purpose-built for specific problems
Selective: Only runs when needed
Detailed: Provides actionable insights
Separate: Runs in appropriate context
Example: Task Diagnostics
# ✅ GOOD: Separate diagnostic system
async def run_automatic_diagnostics(project_tasks, completed_ids, assigned_ids):
    """
    Deep analysis of WHY tasks can't be assigned.

    Runs automatically when request_next_task fails,
    NOT during every tool call.
    """
    # Collect comprehensive data
    collector = TaskDiagnosticCollector(project_tasks)
    stats = collector.collect_filtering_stats(completed_ids, assigned_ids)
    # Analyze dependencies
    analyzer = DependencyChainAnalyzer(project_tasks)
    dependency_issues = analyzer.analyze_chains()
    # Analyze skills
    skill_mismatches = analyzer.analyze_skill_requirements()
    # Generate actionable report
    report = DiagnosticReportGenerator(
        project_tasks, stats, dependency_issues, skill_mismatches
    )
    # Log to Python logs (separate stream)
    logger.info(f"Diagnostic Report (for operators):\n{report.format()}")
    return report
Benefits:
✓ Accurate root cause analysis
✓ Runs in appropriate context
✓ Separate log stream
✓ Actionable recommendations
✓ Purpose-built for problem
Real-World Example: request_next_task#
The Scenario#
Agent calls request_next_task → receives {"success": false, "error": "No suitable tasks available"}
Why is it failing? (Multiple possible causes)#
All tasks assigned to other agents
Dependencies blocking - tasks depend on incomplete work
Skill mismatch - agent lacks required skills
Circular dependencies - deadlock in task chain
Tasks filtered by other criteria (status, priority, etc.)
The MCP response doesn't tell us which!
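As one illustration of why the cause cannot be read off the response, the circular-dependency case needs a graph walk over the task data. The following is a minimal sketch, not Marcus's actual analyzer: the function name `find_dependency_cycle` and the dict shape (task ID mapped to the IDs it depends on) are assumptions made for this example.

```python
from typing import Dict, List, Optional


def find_dependency_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of task IDs, or None.

    `deps` maps a task ID to the IDs it depends on (hypothetical shape).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color = {task: WHITE for task in deps}
    path: List[str] = []

    def visit(task: str) -> Optional[List[str]]:
        color[task] = GRAY
        path.append(task)
        for dep in deps.get(task, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so the chain loops back
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[task] = BLACK
        return None

    for task in list(deps):
        if color[task] == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return None


# A blocked (but healthy) chain versus a true deadlock.
blocked_chain = {"task_789": ["task_456"], "task_456": ["task_123"], "task_123": []}
deadlock = {"task_a": ["task_b"], "task_b": ["task_c"], "task_c": ["task_a"]}

print(find_dependency_cycle(blocked_chain))
print(find_dependency_cycle(deadlock))
```

Both inputs would produce the same "No suitable tasks available" response, yet only the second one is a deadlock. Telling them apart requires the full task graph, which is exactly the context the activity tracker does not have.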
Activity Tracking Records WHAT/WHEN#
Conversation Log Entry:
{
"timestamp": "2025-01-15T10:35:22.456Z",
"level": "warning",
"message": "MCP tool 'request_next_task' returned failure",
"tool_name": "request_next_task",
"arguments": {"agent_id": "agent_002"},
"error": "No suitable tasks available"
}
What we know:
✓ WHAT: request_next_task failed
✓ WHEN: 10:35:22 on Jan 15
✓ WHO: agent_002
❌ WHY: Unknown (need diagnostics)
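The facts listed above can be pulled out of the entry mechanically. Here is a small sketch assuming the conversation log stores one JSON object per entry, as in the sample; the helper name `extract_facts` is hypothetical.

```python
import json
from datetime import datetime

ENTRY_TEXT = """
{
  "timestamp": "2025-01-15T10:35:22.456Z",
  "level": "warning",
  "message": "MCP tool 'request_next_task' returned failure",
  "tool_name": "request_next_task",
  "arguments": {"agent_id": "agent_002"},
  "error": "No suitable tasks available"
}
"""


def extract_facts(entry: dict) -> dict:
    """Read WHAT/WHEN/WHO from an activity entry; WHY is deliberately absent."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    when = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
    return {
        "what": f"{entry['tool_name']} failed",
        "when": when,
        "who": entry["arguments"].get("agent_id"),
        "why": None,  # the activity layer records no cause
    }


facts = extract_facts(json.loads(ENTRY_TEXT))
print(facts["what"], facts["when"].isoformat(), facts["who"], facts["why"])
```

Nothing in the entry supports filling in `why`, so the sketch leaves it as `None` rather than guessing from the error text.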
Diagnostics Analyze WHY#
Automatic Trigger:
# In src/marcus_mcp/tools/task.py
if todo_tasks and not assignable_tasks:
    # Activity tracker already logged the failure
    # Now run diagnostics to understand WHY
    diagnostic_report = await run_automatic_diagnostics(...)
Python Log Entry:
2025-01-15 10:35:22,450 INFO - Diagnostic Report (for operators):
=== Task Assignment Diagnostics ===
Total Tasks: 5
TODO Tasks: 3
In Progress: 1
Completed: 1
Filtering Results:
- Started with: 3 TODO tasks
- After dependency filter: 0 tasks ← HERE'S WHY!
- After skill filter: 0 tasks
- After assignment filter: 0 tasks
Dependency Chain Analysis:
- Task "Implement API" (task_456) blocked by incomplete "Setup Database" (task_123)
- Task "Write Tests" (task_789) blocked by incomplete "Implement API" (task_456)
Root Cause: Dependencies blocking
Recommendation: Complete "Setup Database" to unblock chain
Now we know:
✓ WHY: Dependencies blocking
✓ WHICH tasks: Specific IDs and names
✓ WHAT to do: Complete task_123
✓ CONTEXT: Full dependency chain
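The "Filtering Results" section of the report suggests a staged filter that counts how many tasks survive each step. The sketch below shows that idea under assumptions: the `Task` dataclass, the stage names, and the sample tasks `task_100` and `task_900` are invented for illustration and do not reflect Marcus's real data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Task:
    """Hypothetical task record used only for this sketch."""
    id: str
    status: str  # "todo", "in_progress", or "done"
    depends_on: List[str] = field(default_factory=list)
    required_skills: Set[str] = field(default_factory=set)


def filtering_stats(
    tasks: List[Task], agent_skills: Set[str], assigned_ids: Set[str]
) -> List[Tuple[str, int]]:
    """Count how many TODO tasks remain after each filter stage."""
    done = {t.id for t in tasks if t.status == "done"}
    remaining = [t for t in tasks if t.status == "todo"]
    stats = [("todo", len(remaining))]
    # Stage 1: every dependency must already be completed.
    remaining = [t for t in remaining if all(d in done for d in t.depends_on)]
    stats.append(("dependency", len(remaining)))
    # Stage 2: the agent must have all required skills.
    remaining = [t for t in remaining if t.required_skills <= agent_skills]
    stats.append(("skill", len(remaining)))
    # Stage 3: the task must not be assigned to someone else.
    remaining = [t for t in remaining if t.id not in assigned_ids]
    stats.append(("assignment", len(remaining)))
    return stats


def first_empty_stage(stats: List[Tuple[str, int]]) -> Optional[str]:
    """Return the first stage at which no tasks remain, if any."""
    for stage, count in stats:
        if count == 0:
            return stage
    return None


tasks = [
    Task("task_100", "done"),
    Task("task_123", "in_progress"),
    Task("task_456", "todo", depends_on=["task_123"]),
    Task("task_789", "todo", depends_on=["task_456"]),
    Task("task_900", "todo", depends_on=["task_456"]),
]
stats = filtering_stats(tasks, agent_skills={"python"}, assigned_ids={"task_123"})
print(stats, first_empty_stage(stats))
```

The stage where the count first drops to zero is the evidence behind the root cause, which is why the report can state "Dependencies blocking" instead of inferring it from keywords.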
Why Mixing is Bad: Real Examples#
Example 1: Keyword-Based Categorization#
# ❌ BAD: Activity tracker tries to diagnose
def log_failure(error_msg):
    if "dependency" in error_msg.lower():
        logger.critical("DEPENDENCY ISSUE!")  # Assumption!
        return "dependency_issue"

# Reality:
error_msg = "No suitable tasks available"
# Could be dependency issue, but also could be:
# - All tasks assigned
# - Skill mismatch
# - No tasks exist
# We can't tell from the message!
Problem: Keyword ≠ Root cause
Example 2: Wrong Log Level Escalation#
# ❌ BAD: Escalate based on keywords
if "blocked by" in error:
    logger.critical("Critical dependency issue!")  # Misleading!
else:
    logger.warning("Normal failure")

# Reality:
# "blocked by" might appear in retry message
# But diagnostic shows: "Actually, all tasks are assigned"
# Operator sees CRITICAL but it's not a dependency issue!
Problem: False urgency, wasted investigation time
Example 3: Analysis in Wrong Place#
# ❌ BAD: Diagnostic logic in logging code
def log_mcp_failure(tool_name, response):
    logger.warning(f"{tool_name} failed")
    # Diagnostic work in logging layer!
    if tool_name == "request_next_task":
        tasks = get_all_tasks()  # Expensive!
        deps = analyze_dependencies(tasks)  # More expensive!
        logger.info(f"Dependency analysis: {deps}")

# Problems:
# 1. Runs on EVERY failure (wasteful)
# 2. Mixed in logging code (wrong place)
# 3. Duplicate work (diagnostics run elsewhere too)
# 4. Can't leverage existing diagnostic context
Problem: Wrong layer, duplicate work, performance impact
Design Principles#
1. Single Responsibility#
Activity Tracking:
Records events
Preserves context
Points to diagnostics
Diagnostics:
Analyzes problems
Determines root cause
Recommends actions
Don't mix them!
2. Appropriate Timing#
Activity Tracking:
Runs: Always (low overhead)
When: During/after operation
Fast: < 1ms
Diagnostics:
Runs: When needed (selective)
When: After failure detected
Thorough: May take seconds
3. Correct Audience#
Activity Tracking:
For: Quick overview, correlation
Format: Structured logs, searchable
Location: Conversation logs (indexed)
Diagnostics:
For: Deep investigation
Format: Detailed reports
Location: Python logs (detailed)
4. No Assumptions#
Activity Tracking:
Records facts only
No interpretation
No categorization
Full context preserved
Diagnostics:
Makes informed analysis
Uses full system context
Considers relationships
Provides evidence
Practical Benefits#
For Operators#
Quick Investigation:
# Step 1: What failed recently?
grep 'returned failure' logs/conversations/marcus_*.log | tail -10
# Step 2: Lots of request_next_task failures?
grep 'request_next_task.*failure' logs/conversations/marcus_*.log | wc -l
# Step 3: Why? Check diagnostics near that time
grep -A 30 'Diagnostic Report' logs/marcus_*.log | tail -50
Benefits:
✓ Fast triage (activity logs)
✓ Deep dive when needed (diagnostics)
✓ Clear correlation (timestamps)
✓ No false assumptions
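Timestamp correlation between the two streams can also be scripted. The sketch below parses the two timestamp formats shown in the sample entries and checks whether they fall within a window. It assumes both streams log in UTC, which the source does not state, and the function names and the five-second window are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_activity_ts(value: str) -> datetime:
    """Parse the ISO 8601 timestamp used in conversation log entries."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_python_log_ts(value: str) -> datetime:
    """Parse the 'YYYY-MM-DD HH:MM:SS,mmm' prefix of a Python log line.

    Assumption: the Python log is written in UTC.
    """
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S,%f")
    return parsed.replace(tzinfo=timezone.utc)


def is_correlated(
    activity_ts: str,
    diagnostic_ts: str,
    window: timedelta = timedelta(seconds=5),
) -> bool:
    """True when the two entries were logged within `window` of each other."""
    delta = parse_activity_ts(activity_ts) - parse_python_log_ts(diagnostic_ts)
    return abs(delta) <= window


print(is_correlated("2025-01-15T10:35:22.456Z", "2025-01-15 10:35:22,450"))
print(is_correlated("2025-01-15T10:35:22.456Z", "2025-01-15 10:40:00,000"))
```

Using an absolute difference matters here: in the sample entries the diagnostic report is stamped a few milliseconds before the activity entry, so a one-directional "diagnostics after failure" check would miss the match.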
For Developers#
Maintainability:
# Activity tracker: Simple, stable
def log_activity(tool, result):
    """Just record what happened."""
    logger.log(level, message, **context)

# Diagnostics: Complex, evolving
class TaskDiagnostics:
    """Deep analysis, can evolve independently."""
    def analyze_dependencies(self): ...
    def analyze_skills(self): ...
    def generate_report(self): ...
Benefits:
✓ Clear separation of concerns
✓ Easy to test independently
✓ Can evolve separately
✓ Diagnostic complexity doesn't affect logging
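To make "easy to test independently" concrete, here is a sketch of checking a fact-only tracker with nothing but the standard `logging` module. The function `log_tool_failure`, the `ListHandler` class, and the logger name are hypothetical stand-ins; the examples in this document pass keyword arguments directly to the logger, whereas this sketch uses stdlib `extra=` so that it runs without other dependencies.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so a test can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def log_tool_failure(logger: logging.Logger, tool_name: str, response: dict) -> None:
    """Hypothetical fact-only tracker: same level for every failure."""
    logger.warning(
        "Tool '%s' returned failure",
        tool_name,
        extra={"tool_name": tool_name, "error": response.get("error")},
    )


logger = logging.getLogger("activity_tracking_sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the sketch's output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

# The error text contains a tempting keyword; the tracker must not react to it.
log_tool_failure(
    logger,
    "request_next_task",
    {"success": False, "error": "blocked by dependency"},
)
record = handler.records[0]
print(record.levelname, record.getMessage(), record.error)
```

The check needs no task data, no analyzer, and no diagnostic fixtures, which is the practical payoff of keeping interpretation out of the tracking layer.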
For System Performance#
Efficient Resource Usage:
# Activity logging: Always on, minimal cost
log_activity() # < 1ms, always safe
# Diagnostics: Selective, when needed
if failure_needs_investigation:
    run_diagnostics()  # 10-100ms, but selective
Benefits:
✓ Low overhead for activity tracking
✓ Expensive analysis only when needed
✓ No performance impact on happy path
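The always-on versus selective split can be demonstrated with two counters. This is a simplified sketch: `handle_response` and both callbacks are hypothetical, and the trigger here is just `success == False`, whereas the real trigger shown earlier checks `todo_tasks and not assignable_tasks`.

```python
from typing import Callable, Dict, Optional

calls: Dict[str, int] = {"activity": 0, "diagnostics": 0}


def log_activity(response: dict) -> None:
    """Stand-in for the cheap, always-on activity tracker."""
    calls["activity"] += 1


def run_diagnostics(response: dict) -> str:
    """Stand-in for the expensive, selective diagnostic run."""
    calls["diagnostics"] += 1
    return "diagnostic report placeholder"


def handle_response(
    response: dict,
    track: Callable[[dict], None] = log_activity,
    diagnose: Callable[[dict], str] = run_diagnostics,
) -> Optional[str]:
    """Always record the activity; diagnose only when the call failed."""
    track(response)
    if not response.get("success"):
        return diagnose(response)
    return None


responses = [
    {"success": True},
    {"success": True},
    {"success": False, "error": "No suitable tasks available"},
]
reports = [handle_response(r) for r in responses]
print(calls)
```

Every response is tracked, but only the failing one pays for analysis, so the happy path carries just the tracking cost.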
Anti-Patterns to Avoid#
❌ Diagnostic Logic in Activity Tracking#
# BAD: Mixing concerns
def log_tool_failure(tool_name, response):
    logger.warning(f"{tool_name} failed")
    # Don't do diagnostic work here!
    if "dependency" in str(response):  # Keyword matching
        category = "dependency_issue"  # Assumption!
        logger.critical("Dependency problem detected!")  # Wrong level!
        analyze_dependencies()  # Wrong place!
❌ Activity Tracking in Diagnostic Code#
# BAD: Diagnostics shouldn't do activity logging
def run_diagnostics():
    # Diagnostic code
    deps = analyze_dependencies()
    # Don't log activity here!
    logger.warning("Tool failed because dependencies")  # Wrong place!
    return report
❌ Duplicate Information#
# BAD: Logging same info in multiple places
def handle_failure():
    # Activity tracker logs it
    log_activity("tool_name", result)
    # Diagnostics also log the failure
    logger.warning("Tool failed")  # Duplicate!
    # Analysis repeats failure details
    report = f"Tool failed because..."  # More duplication!
Implementation Checklist#
When building a new feature, ensure clear separation:
Activity Tracking Implementation#
Records event occurrence (WHAT/WHEN)
Uses consistent log level
Preserves full context
No interpretation or analysis
Fast execution (< 1ms)
Points to diagnostics if available
Uses conversation logs (or appropriate stream)
Diagnostic Implementation#
Runs selectively (when needed)
Analyzes root cause (WHY)
Provides actionable recommendations
Uses full system context
Separate log stream (Python logs, reports)
Can take time (thorough analysis)
Purpose-built for specific problems
Conclusion#
The separation between activity tracking and diagnostics is a fundamental design principle in Marcus. It ensures:
Appropriate information for different audiences
Clear responsibilities for each component
No false assumptions about failure causes
Efficient resource usage (selective expensive analysis)
Maintainability (concerns evolve independently)
When in doubt:
Activity tracking: Record what happened (facts only)
Diagnostics: Analyze why it happened (when needed)
Never mix them in the same code!