MCP Tool Logging System#

Overview#

The MCP Tool Logging System provides comprehensive activity tracking for all Marcus MCP (Model Context Protocol) tool operations. It acts as a “table of contents” for MCP tool activity, logging WHAT failed and WHEN, while pointing operators to detailed diagnostic logs for WHY root cause analysis.

This logging system is intentionally simplified to avoid making assumptions about failure causes, instead serving as an index that guides operators to the appropriate diagnostic systems for investigation.

Purpose and Philosophy#

Design Principle: Activity Tracking, Not Diagnosis#

The MCP tool logger follows a clear separation of concerns:

Activity Tracker (WHAT/WHEN): Records that a tool was called and whether it succeeded or failed
Diagnostic Systems (WHY): Provide detailed root cause analysis (e.g., task_diagnostics.py)

This separation is crucial because:

MCP responses are intentionally vague - They don’t contain enough context to determine root cause
Agents shouldn’t overthink - Detailed failure analysis in MCP responses causes agents to try alternative strategies instead of following instructions
Diagnostics already exist - Detailed analysis systems run automatically and log to Python logs

Target Audience#

Operators/Developers: Need to investigate why tools fail to fix code or configuration issues
NOT Agents: Agents receive simple retry instructions, not diagnostic details

Architecture#

Integration Point#

The logger integrates at the centralized MCP handler dispatch point:

┌─────────────────────────────────────────────────────────┐
│              MCP Server (src/marcus_mcp/)               │
│                                                         │
│  ┌───────────────────────────────────────────────────┐ │
│  │  handlers.py::handle_tool_call()                  │ │
│  │  • Dispatches ALL MCP tool calls                  │ │
│  │  • Returns result to agent                        │ │
│  │  • Logs result via mcp_tool_logger ←────────────┐ │ │
│  └───────────────────────────────────────────────────┘ │ │
│                                                         │ │
│  Tool Functions:                                        │ │
│  • request_next_task()                                  │ │
│  • register_agent()                                     │ │
│  • create_project()                                     │ │
│  • [30+ other tools]                                    │ │
│                                                         │ │
└─────────────────────────────────────────────────────────┘
         │
         ├─── Agent receives simple response
         │    (success/failure + retry instructions)
         │
         └─── Logger captures activity
              │
              ├─── Conversation logs (logs/conversations/)
              │    • WARNING entries for failures
              │    • Full response context
              │    • Diagnostic pointers
              │
              └─── Python logs (logs/marcus_*.log)
                   • Detailed diagnostic reports
                   • Root cause analysis
                   • Automatic when needed

Log Stream Architecture#

Marcus uses multiple specialized log streams:

Conversation Logs (logs/conversations/marcus_*.log)
- Structured JSON logs (structlog)
- PM decisions, worker messages, MCP tool activity
- MCP tool logger writes here
Python Logs (logs/marcus_*.log)
- Traditional Python logging
- Diagnostic reports from task_diagnostics.py
- Dependency analysis, filtering statistics
Agent Event Logs (logs/agent_events/*.jsonl)
- Lightweight event tracking (JSON Lines)
- Agent lifecycle events
- Used by visualization systems

The MCP tool logger bridges streams by recording activity in conversation logs and pointing to diagnostic details in Python logs.

Implementation#

Core Module: `src/logging/mcp_tool_logger.py`#

Main Function: `log_mcp_tool_response()`#

def log_mcp_tool_response(
    tool_name: str,
    arguments: dict[str, Any],
    response: dict[str, Any],
) -> None:
    """
    Log MCP tool response for activity tracking.

    Records WHAT failed and WHEN. Does not attempt to determine
    WHY failures occurred (root cause analysis is in diagnostic logs).
    """
    success = response.get("success", False)

    if success:
        # DEBUG level - low noise
        conversation_logger.log_pm_thinking(
            f"MCP tool '{tool_name}' succeeded",
            context={
                "tool_name": tool_name,
                "arguments": arguments,
                "response": response,
            },
        )
    else:
        # WARNING level - needs attention
        _log_mcp_tool_failure(tool_name, arguments, response)

Key Design Decisions:

Success = DEBUG level: Low noise, only for detailed debugging
Failure = WARNING level: Consistent level for all failures
No categorization: Removed aggressive keyword-based classification
Full context: Entire response preserved for investigation

Failure Logging: `_log_mcp_tool_failure()`#

def _log_mcp_tool_failure(
    tool_name: str,
    arguments: dict[str, Any],
    response: dict[str, Any],
) -> None:
    """Log MCP tool failure for activity tracking."""
    error_msg = response.get("error", "Unknown error")
    retry_reason = response.get("retry_reason", "")

    # Log at WARNING level (no categorization)
    conversation_logger.pm_logger.warning(
        f"MCP tool '{tool_name}' returned failure",
        tool_name=tool_name,
        arguments=arguments,
        error=error_msg,
        retry_reason=retry_reason,
        response=response,
    )

    # Special case: request_next_task gets diagnostic pointer
    if tool_name == "request_next_task":
        conversation_logger.log_pm_thinking(
            "For root cause analysis of request_next_task failure, "
            "check Python logs (logs/marcus_*.log) for 'Diagnostic Report' "
            "entries near this timestamp",
            context={"hint": "Diagnostics run automatically when no tasks assignable"},
        )

Key Features:

Consistent WARNING level: No escalation to CRITICAL based on keywords
Diagnostic pointer: Only for request_next_task (most critical tool)
No assumptions: Logs error message as-is without interpretation
Full response preserved: All fields included for operator investigation

Integration: `src/marcus_mcp/handlers.py`#

The logger integrates at the centralized handler dispatch point (line 1414-1420):

# After tool execution
response = [types.TextContent(type="text", text=json.dumps(result, indent=2))]

# Log MCP tool response (especially failures) for diagnostics
if isinstance(result, dict):
    log_mcp_tool_response(
        tool_name=name,
        arguments=arguments,
        response=result,
    )

Benefits of this integration point:

✅ Automatic coverage: All 30+ MCP tools logged automatically
✅ Centralized: Single integration point, easy to maintain
✅ Future-proof: New tools automatically logged
✅ Post-execution: Logs actual results, not assumptions

Logging Behavior#

Success Cases#

Log Entry (DEBUG level):

{
  "timestamp": "2025-01-15T10:30:45.123Z",
  "event": "pm_thinking",
  "message": "MCP tool 'request_next_task' succeeded",
  "context": {
    "tool_name": "request_next_task",
    "arguments": {"agent_id": "agent_001"},
    "response": {
      "success": true,
      "task": {
        "id": "task_123",
        "name": "Implement API endpoint",
        "description": "...",
        "status": "todo"
      }
    }
  }
}

Failure Cases#

request_next_task Failure#

Conversation Log (WARNING level):

{
  "timestamp": "2025-01-15T10:35:22.456Z",
  "level": "warning",
  "event": "pm_logger",
  "message": "MCP tool 'request_next_task' returned failure",
  "tool_name": "request_next_task",
  "arguments": {"agent_id": "agent_002"},
  "error": "No suitable tasks available",
  "retry_reason": "All tasks assigned or blocked",
  "response": {
    "success": false,
    "error": "No suitable tasks available",
    "retry_reason": "All tasks assigned or blocked",
    "retry_after_seconds": 30
  }
}

Diagnostic Pointer (DEBUG level):

{
  "timestamp": "2025-01-15T10:35:22.457Z",
  "event": "pm_thinking",
  "message": "For root cause analysis of request_next_task failure, check Python logs (logs/marcus_*.log) for 'Diagnostic Report' entries near this timestamp",
  "context": {
    "hint": "Diagnostics run automatically when no tasks assignable"
  }
}

Python Log (Automatic Diagnostic Report):

2025-01-15 10:35:22,450 INFO - Diagnostic Report (for operators):
=== Task Assignment Diagnostics ===

Total Tasks: 5
TODO Tasks: 3
In Progress: 1
Completed: 1

Filtering Results:
- Started with: 3 TODO tasks
- After dependency filter: 0 tasks (blocked by incomplete dependencies)
- After skill filter: 0 tasks
- After assignment filter: 0 tasks

Dependency Chain Analysis:
- Task "Implement API" (task_456) blocked by incomplete "Setup Database" (task_123)
- Task "Write Tests" (task_789) blocked by incomplete "Implement API" (task_456)

Recommendation: Review dependency chain for potential circular dependencies

Other Tool Failures#

Conversation Log (WARNING level, NO diagnostic pointer):

{
  "timestamp": "2025-01-15T10:40:10.789Z",
  "level": "warning",
  "event": "pm_logger",
  "message": "MCP tool 'create_project' returned failure",
  "tool_name": "create_project",
  "arguments": {"name": "test-project", "description": "..."},
  "error": "Invalid project configuration",
  "retry_reason": "",
  "response": {
    "success": false,
    "error": "Invalid project configuration"
  }
}

Note: No diagnostic pointer for non-request_next_task tools.

Relationship to Diagnostic Systems#

Separation of Concerns#

The MCP tool logger is NOT a diagnostic system. It is an activity tracker that points to diagnostic systems:

Aspect	MCP Tool Logger	Diagnostic Systems
Purpose	Activity tracking	Root cause analysis
Answers	WHAT failed, WHEN	WHY it failed
Location	`src/logging/mcp_tool_logger.py`	`src/core/task_diagnostics.py`
Log Stream	Conversation logs	Python logs
Trigger	Every MCP tool call	Automatic when failures occur
Assumptions	None	Deep analysis
Audience	Quick overview	Detailed investigation

Diagnostic System: `task_diagnostics.py`#

When request_next_task cannot find suitable tasks, it automatically runs diagnostics:

# From src/marcus_mcp/tools/task.py (lines 946-989)
if todo_tasks:
    # Tasks exist but can't be assigned - run diagnostics
    logger.warning(f"No tasks assignable but {len(todo_tasks)} TODO tasks exist")

    diagnostic_report = await run_automatic_diagnostics(
        project_tasks=state.project_tasks,
        completed_task_ids=completed_task_ids,
        assigned_task_ids=assigned_task_ids,
    )

    # Format and log for operators
    formatted_report = format_diagnostic_report(diagnostic_report)
    logger.info(f"Diagnostic Report (for operators):\n{formatted_report}")

What Diagnostics Provide:

Dependency chain analysis (blocking tasks)
Task filtering statistics (why tasks were excluded)
Skill mismatch detection
Circular dependency detection
Recommended actions

Why Separate:

Diagnostics run automatically when needed
Detailed analysis logged to Python logs
MCP responses stay simple for agents
Operators get full context in separate stream

Why Categorization Was Removed#

Original Design (Removed)#

The initial implementation attempted to categorize failures:

# REMOVED: Aggressive categorization
def _categorize_failure(response: dict[str, Any]) -> str:
    """Categorize based on keywords."""
    error_msg = response.get("error", "").lower()

    if "dependency" in error_msg or "blocked by" in error_msg:
        return "dependency_issue"  # Assumption!
    if "busy" in error_msg:
        return "agent_busy"
    if "no suitable" in error_msg:
        return "no_suitable_tasks"
    return "unknown"

Problems Discovered#

MCP responses lack context:

# Response says:
{"error": "No suitable tasks available"}

# But WHY? Could be:
# - All tasks assigned
# - Dependencies blocking
# - Skill mismatch
# - Tasks filtered by other criteria
# - Circular dependencies

# We can't tell from the response!

Keyword matching is unreliable:
- “Task waiting for dependencies” → Labeled as dependency issue
- But diagnostic might show: “Actually all tasks are assigned”
- Keyword doesn’t equal root cause

Decision happens elsewhere:

# From src/core/ai_powered_task_assignment.py
unblocked_tasks = await self._filter_unblocked_tasks(...)
if not unblocked_tasks:
    logger.info("No unblocked tasks available")  # Logs here
    return None  # But doesn't pass info back!

The filtering logic knows WHY but doesn’t include it in the MCP response.

Diagnostics already exist:
- run_automatic_diagnostics() provides accurate analysis
- Runs automatically when needed
- No need to guess from keywords

Simplified Design (Current)#

# CURRENT: No categorization, just facts
def _log_mcp_tool_failure(...):
    """Log the failure as-is, no assumptions."""
    conversation_logger.pm_logger.warning(
        f"MCP tool '{tool_name}' returned failure",
        error=error_msg,  # Log what we know
        # NO failure_category field
    )

    # Point to real diagnostics
    if tool_name == "request_next_task":
        log_pm_thinking("Check Python logs for 'Diagnostic Report'...")

Benefits:

✅ No false assumptions
✅ Clear separation: activity tracking vs. diagnostics
✅ Points to accurate analysis
✅ Simpler, more maintainable

Usage Patterns#

For Operators#

When investigating MCP tool failures:

Check conversation logs for activity overview:

grep 'returned failure' logs/conversations/marcus_*.log | tail -20

For request_next_task failures, check Python logs for diagnostics:

grep -A 30 'Diagnostic Report' logs/marcus_*.log | tail -50

Filter by tool:

grep 'create_project.*returned failure' logs/conversations/marcus_*.log

Time-based investigation:

# Find failures around 10:35 AM
grep '10:35:' logs/conversations/marcus_*.log | grep 'returned failure'

For Developers#

Adding new MCP tools:

No changes needed! The logger automatically handles all tools via the centralized dispatch point.

Changing log format:

Update structured logging fields in _log_mcp_tool_failure():

context = {
    "tool_name": tool_name,
    "arguments": arguments,
    "error": error_msg,
    "retry_reason": retry_reason,
    "response": response,
    # Add new fields here
}

Adding diagnostic pointers for other tools:

# In _log_mcp_tool_failure()
if tool_name in ["request_next_task", "your_new_tool"]:
    conversation_logger.log_pm_thinking(
        "Check diagnostics at location X...",
        context={"hint": "..."}
    )

Testing#

The MCP tool logger has comprehensive test coverage across three levels:

1. Unit Tests (14 tests)#

Location: tests/unit/logging/test_mcp_tool_logger.py

Run:

pytest tests/unit/logging/test_mcp_tool_logger.py -v

Coverage:

Success/failure logging
Log levels (DEBUG/WARNING)
Diagnostic pointers
Integration with conversation logger
Activity tracking behavior (WHAT/WHEN, not WHY)
No categorization assumptions

2. Integration Tests (11 tests)#

Location: tests/integration/logging/test_mcp_tool_logger_integration.py

Run:

pytest tests/integration/logging/test_mcp_tool_logger_integration.py -v

Coverage:

Real-world failure scenarios
Log entry creation with real logger
Structured logging format
End-to-end convenience functions

3. Live System Testing#

Location: dev-tools/examples/test_mcp_logger_live.py

Run:

# Terminal 1: Start Marcus
python -m src.marcus_mcp.server --http

# Terminal 2: Run live test
python dev-tools/examples/test_mcp_logger_live.py

Tests 4 scenarios:

No tasks available
Agent already busy
All tasks assigned
Other tool failures

Verification:

# Activity tracking
grep 'returned failure' logs/conversations/marcus_*.log

# Diagnostic reports
grep -A 30 'Diagnostic Report' logs/marcus_*.log

For complete testing guide, see: docs/testing/MCP_TOOL_LOGGER_TESTING.md

Performance Characteristics#

Low Overhead Design#

Async-friendly: Uses structlog’s async logging
Minimal processing: No complex categorization logic
Conditional logging: DEBUG logs only if enabled
Efficient serialization: JSON serialization by structlog

Performance Metrics#

# Measured performance (1000 log entries)
Average: 0.8ms per log entry
Total overhead: <1% of MCP tool execution time

Why it’s fast:

Single log write per tool call
No synchronous I/O blocking
No expensive analysis or categorization
Structlog handles async buffering

Monitoring and Maintenance#

Health Indicators#

Healthy System:

# Mostly successes, few failures
grep 'MCP tool' logs/conversations/marcus_*.log | \
    grep -c 'succeeded'
# Example output: 245

grep 'returned failure' logs/conversations/marcus_*.log | wc -l
# Example output: 12

Problem Indicators:

# High failure rate for specific tool
grep 'request_next_task.*returned failure' logs/conversations/marcus_*.log | wc -l
# If consistently high: investigate diagnostics

# Unexpected tool failures
grep 'create_project.*returned failure' logs/conversations/marcus_*.log
# If frequent: check project configuration

Log Rotation#

Conversation logs use automatic rotation (via structlog configuration):

Daily rotation
Compression after 7 days
Retention: 30 days

Configuration: src/logging/conversation_logger.py

Troubleshooting#

No log entries created:

# Check logger initialization
python -c "from src.logging.mcp_tool_logger import log_mcp_tool_response; print('OK')"

# Check log directory
ls -la logs/conversations/

Missing diagnostic pointers:

Verify tool name is exactly "request_next_task" (case-sensitive)
Check _log_mcp_tool_failure() logic at line 102-108

Wrong log levels:

Verify conversation logger configuration
Check structlog setup in conversation_logger.py

Comparison to Other Monitoring Systems#

vs. Assignment Monitor#

Aspect	MCP Tool Logger	Assignment Monitor
Scope	All MCP tool calls	Task assignments only
Trigger	Every tool call	Assignment state changes
Purpose	Activity tracking	Detect assignment reversions
Log Type	Structured JSON	Specialized alerts
File	`mcp_tool_logger.py`	`assignment_monitor.py`

vs. Error Predictor#

Aspect	MCP Tool Logger	Error Predictor
Scope	Historical activity	Predictive analysis
Timing	Real-time (during execution)	Proactive (before execution)
Purpose	Record what happened	Predict what might happen
Method	Logging	AI pattern analysis
File	`mcp_tool_logger.py`	`error_predictor.py`

vs. Task Diagnostics#

Aspect	MCP Tool Logger	Task Diagnostics
Scope	All tools	Task assignment only
Depth	Activity tracking	Deep analysis
Answers	WHAT/WHEN	WHY
Trigger	Every call	Automatic on failures
Log Stream	Conversation logs	Python logs
File	`mcp_tool_logger.py`	`task_diagnostics.py`

Complementary Design: These systems work together, each serving a specific purpose.

Future Enhancements#

Potential Improvements#

Metrics Collection:
- Tool success/failure rates
- Average response times
- Most common failure patterns
- Export to dashboard
Enhanced Diagnostic Linking:
- Include diagnostic report IDs in logs
- Direct links between activity and diagnostics
- Correlation IDs across log streams
Alert Integration:
- Trigger alerts on sustained high failure rates
- Integrate with monitoring dashboards
- Webhook notifications for critical tools
Structured Metrics:
- Prometheus/OpenTelemetry integration
- Real-time dashboards
- Trend analysis

Non-Goals#

❌ Root cause analysis (handled by diagnostics)
❌ Failure categorization (unreliable from MCP responses)
❌ Predictive analysis (handled by error predictor)
❌ Assignment tracking (handled by assignment monitor)

The MCP tool logger maintains its focused role as an activity tracker, leaving specialized analysis to purpose-built systems.

MCP Tool Logging System#

Overview#

Purpose and Philosophy#

Design Principle: Activity Tracking, Not Diagnosis#

Target Audience#

Architecture#

Integration Point#

Log Stream Architecture#

Implementation#

Core Module: src/logging/mcp_tool_logger.py#

Main Function: log_mcp_tool_response()#

Failure Logging: _log_mcp_tool_failure()#

Integration: src/marcus_mcp/handlers.py#

Logging Behavior#

Success Cases#

Failure Cases#

request_next_task Failure#

Other Tool Failures#

Relationship to Diagnostic Systems#

Separation of Concerns#

Diagnostic System: task_diagnostics.py#

Why Categorization Was Removed#

Original Design (Removed)#

Problems Discovered#

Simplified Design (Current)#

Usage Patterns#

For Operators#

For Developers#

Testing#

1. Unit Tests (14 tests)#

2. Integration Tests (11 tests)#

3. Live System Testing#

Performance Characteristics#

Low Overhead Design#

Performance Metrics#

Monitoring and Maintenance#

Health Indicators#

Log Rotation#

Troubleshooting#

Comparison to Other Monitoring Systems#

vs. Assignment Monitor#

vs. Error Predictor#

vs. Task Diagnostics#

Related Documentation#

Implementation#

Testing#

Related Systems#

Future Enhancements#

Potential Improvements#

Non-Goals#

Core Module: `src/logging/mcp_tool_logger.py`#

Main Function: `log_mcp_tool_response()`#

Failure Logging: `_log_mcp_tool_failure()`#

Integration: `src/marcus_mcp/handlers.py`#

Diagnostic System: `task_diagnostics.py`#