MCP Tool Logging System#

Overview#

The MCP Tool Logging System provides comprehensive activity tracking for all Marcus MCP (Model Context Protocol) tool operations. It acts as a β€œtable of contents” for MCP tool activity, logging WHAT failed and WHEN, while pointing operators to detailed diagnostic logs for WHY root cause analysis.

This logging system is intentionally simplified to avoid making assumptions about failure causes, instead serving as an index that guides operators to the appropriate diagnostic systems for investigation.

Purpose and Philosophy#

Design Principle: Activity Tracking, Not Diagnosis#

The MCP tool logger follows a clear separation of concerns:

  • Activity Tracker (WHAT/WHEN): Records that a tool was called and whether it succeeded or failed

  • Diagnostic Systems (WHY): Provide detailed root cause analysis (e.g., task_diagnostics.py)

This separation is crucial because:

  1. MCP responses are intentionally vague - They don’t contain enough context to determine root cause

  2. Agents shouldn’t overthink - Detailed failure analysis in MCP responses causes agents to try alternative strategies instead of following instructions

  3. Diagnostics already exist - Detailed analysis systems run automatically and log to Python logs

Target Audience#

  • Operators/Developers: Need to investigate why tools fail to fix code or configuration issues

  • NOT Agents: Agents receive simple retry instructions, not diagnostic details

Architecture#

Integration Point#

The logger integrates at the centralized MCP handler dispatch point:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              MCP Server (src/marcus_mcp/)               β”‚
β”‚                                                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  handlers.py::handle_tool_call()                  β”‚ β”‚
β”‚  β”‚  β€’ Dispatches ALL MCP tool calls                  β”‚ β”‚
β”‚  β”‚  β€’ Returns result to agent                        β”‚ β”‚
β”‚  β”‚  β€’ Logs result via mcp_tool_logger ←────────────┐ β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚                                                         β”‚ β”‚
β”‚  Tool Functions:                                        β”‚ β”‚
β”‚  β€’ request_next_task()                                  β”‚ β”‚
β”‚  β€’ register_agent()                                     β”‚ β”‚
β”‚  β€’ create_project()                                     β”‚ β”‚
β”‚  β€’ [30+ other tools]                                    β”‚ β”‚
β”‚                                                         β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β”œβ”€β”€β”€ Agent receives simple response
         β”‚    (success/failure + retry instructions)
         β”‚
         └─── Logger captures activity
              β”‚
              β”œβ”€β”€β”€ Conversation logs (logs/conversations/)
              β”‚    β€’ WARNING entries for failures
              β”‚    β€’ Full response context
              β”‚    β€’ Diagnostic pointers
              β”‚
              └─── Python logs (logs/marcus_*.log)
                   β€’ Detailed diagnostic reports
                   β€’ Root cause analysis
                   β€’ Automatic when needed

Log Stream Architecture#

Marcus uses multiple specialized log streams:

  1. Conversation Logs (logs/conversations/marcus_*.log)

    • Structured JSON logs (structlog)

    • PM decisions, worker messages, MCP tool activity

    • MCP tool logger writes here

  2. Python Logs (logs/marcus_*.log)

    • Traditional Python logging

    • Diagnostic reports from task_diagnostics.py

    • Dependency analysis, filtering statistics

  3. Agent Event Logs (logs/agent_events/*.jsonl)

    • Lightweight event tracking (JSON Lines)

    • Agent lifecycle events

    • Used by visualization systems

The MCP tool logger bridges streams by recording activity in conversation logs and pointing to diagnostic details in Python logs.

Implementation#

Core Module: src/logging/mcp_tool_logger.py#

Main Function: log_mcp_tool_response()#

def log_mcp_tool_response(
    tool_name: str,
    arguments: dict[str, Any],
    response: dict[str, Any],
) -> None:
    """
    Log MCP tool response for activity tracking.

    Records WHAT failed and WHEN. Does not attempt to determine
    WHY failures occurred (root cause analysis is in diagnostic logs).
    """
    success = response.get("success", False)

    if success:
        # DEBUG level - low noise
        conversation_logger.log_pm_thinking(
            f"MCP tool '{tool_name}' succeeded",
            context={
                "tool_name": tool_name,
                "arguments": arguments,
                "response": response,
            },
        )
    else:
        # WARNING level - needs attention
        _log_mcp_tool_failure(tool_name, arguments, response)

Key Design Decisions:

  • Success = DEBUG level: Low noise, only for detailed debugging

  • Failure = WARNING level: Consistent level for all failures

  • No categorization: Removed aggressive keyword-based classification

  • Full context: Entire response preserved for investigation

Failure Logging: _log_mcp_tool_failure()#

def _log_mcp_tool_failure(
    tool_name: str,
    arguments: dict[str, Any],
    response: dict[str, Any],
) -> None:
    """Log MCP tool failure for activity tracking."""
    error_msg = response.get("error", "Unknown error")
    retry_reason = response.get("retry_reason", "")

    # Log at WARNING level (no categorization)
    conversation_logger.pm_logger.warning(
        f"MCP tool '{tool_name}' returned failure",
        tool_name=tool_name,
        arguments=arguments,
        error=error_msg,
        retry_reason=retry_reason,
        response=response,
    )

    # Special case: request_next_task gets diagnostic pointer
    if tool_name == "request_next_task":
        conversation_logger.log_pm_thinking(
            "For root cause analysis of request_next_task failure, "
            "check Python logs (logs/marcus_*.log) for 'Diagnostic Report' "
            "entries near this timestamp",
            context={"hint": "Diagnostics run automatically when no tasks assignable"},
        )

Key Features:

  • Consistent WARNING level: No escalation to CRITICAL based on keywords

  • Diagnostic pointer: Only for request_next_task (most critical tool)

  • No assumptions: Logs error message as-is without interpretation

  • Full response preserved: All fields included for operator investigation

Integration: src/marcus_mcp/handlers.py#

The logger integrates at the centralized handler dispatch point (line 1414-1420):

# After tool execution
response = [types.TextContent(type="text", text=json.dumps(result, indent=2))]

# Log MCP tool response (especially failures) for diagnostics
if isinstance(result, dict):
    log_mcp_tool_response(
        tool_name=name,
        arguments=arguments,
        response=result,
    )

Benefits of this integration point:

  • βœ… Automatic coverage: All 30+ MCP tools logged automatically

  • βœ… Centralized: Single integration point, easy to maintain

  • βœ… Future-proof: New tools automatically logged

  • βœ… Post-execution: Logs actual results, not assumptions

Logging Behavior#

Success Cases#

Log Entry (DEBUG level):

{
  "timestamp": "2025-01-15T10:30:45.123Z",
  "event": "pm_thinking",
  "message": "MCP tool 'request_next_task' succeeded",
  "context": {
    "tool_name": "request_next_task",
    "arguments": {"agent_id": "agent_001"},
    "response": {
      "success": true,
      "task": {
        "id": "task_123",
        "name": "Implement API endpoint",
        "description": "...",
        "status": "todo"
      }
    }
  }
}

Failure Cases#

request_next_task Failure#

Conversation Log (WARNING level):

{
  "timestamp": "2025-01-15T10:35:22.456Z",
  "level": "warning",
  "event": "pm_logger",
  "message": "MCP tool 'request_next_task' returned failure",
  "tool_name": "request_next_task",
  "arguments": {"agent_id": "agent_002"},
  "error": "No suitable tasks available",
  "retry_reason": "All tasks assigned or blocked",
  "response": {
    "success": false,
    "error": "No suitable tasks available",
    "retry_reason": "All tasks assigned or blocked",
    "retry_after_seconds": 30
  }
}

Diagnostic Pointer (DEBUG level):

{
  "timestamp": "2025-01-15T10:35:22.457Z",
  "event": "pm_thinking",
  "message": "For root cause analysis of request_next_task failure, check Python logs (logs/marcus_*.log) for 'Diagnostic Report' entries near this timestamp",
  "context": {
    "hint": "Diagnostics run automatically when no tasks assignable"
  }
}

Python Log (Automatic Diagnostic Report):

2025-01-15 10:35:22,450 INFO - Diagnostic Report (for operators):
=== Task Assignment Diagnostics ===

Total Tasks: 5
TODO Tasks: 3
In Progress: 1
Completed: 1

Filtering Results:
- Started with: 3 TODO tasks
- After dependency filter: 0 tasks (blocked by incomplete dependencies)
- After skill filter: 0 tasks
- After assignment filter: 0 tasks

Dependency Chain Analysis:
- Task "Implement API" (task_456) blocked by incomplete "Setup Database" (task_123)
- Task "Write Tests" (task_789) blocked by incomplete "Implement API" (task_456)

Recommendation: Review dependency chain for potential circular dependencies

Other Tool Failures#

Conversation Log (WARNING level, NO diagnostic pointer):

{
  "timestamp": "2025-01-15T10:40:10.789Z",
  "level": "warning",
  "event": "pm_logger",
  "message": "MCP tool 'create_project' returned failure",
  "tool_name": "create_project",
  "arguments": {"name": "test-project", "description": "..."},
  "error": "Invalid project configuration",
  "retry_reason": "",
  "response": {
    "success": false,
    "error": "Invalid project configuration"
  }
}

Note: No diagnostic pointer for non-request_next_task tools.

Relationship to Diagnostic Systems#

Separation of Concerns#

The MCP tool logger is NOT a diagnostic system. It is an activity tracker that points to diagnostic systems:

Aspect

MCP Tool Logger

Diagnostic Systems

Purpose

Activity tracking

Root cause analysis

Answers

WHAT failed, WHEN

WHY it failed

Location

src/logging/mcp_tool_logger.py

src/core/task_diagnostics.py

Log Stream

Conversation logs

Python logs

Trigger

Every MCP tool call

Automatic when failures occur

Assumptions

None

Deep analysis

Audience

Quick overview

Detailed investigation

Diagnostic System: task_diagnostics.py#

When request_next_task cannot find suitable tasks, it automatically runs diagnostics:

# From src/marcus_mcp/tools/task.py (lines 946-989)
if todo_tasks:
    # Tasks exist but can't be assigned - run diagnostics
    logger.warning(f"No tasks assignable but {len(todo_tasks)} TODO tasks exist")

    diagnostic_report = await run_automatic_diagnostics(
        project_tasks=state.project_tasks,
        completed_task_ids=completed_task_ids,
        assigned_task_ids=assigned_task_ids,
    )

    # Format and log for operators
    formatted_report = format_diagnostic_report(diagnostic_report)
    logger.info(f"Diagnostic Report (for operators):\n{formatted_report}")

What Diagnostics Provide:

  • Dependency chain analysis (blocking tasks)

  • Task filtering statistics (why tasks were excluded)

  • Skill mismatch detection

  • Circular dependency detection

  • Recommended actions

Why Separate:

  • Diagnostics run automatically when needed

  • Detailed analysis logged to Python logs

  • MCP responses stay simple for agents

  • Operators get full context in separate stream

Why Categorization Was Removed#

Original Design (Removed)#

The initial implementation attempted to categorize failures:

# REMOVED: Aggressive categorization
def _categorize_failure(response: dict[str, Any]) -> str:
    """Categorize based on keywords."""
    error_msg = response.get("error", "").lower()

    if "dependency" in error_msg or "blocked by" in error_msg:
        return "dependency_issue"  # Assumption!
    if "busy" in error_msg:
        return "agent_busy"
    if "no suitable" in error_msg:
        return "no_suitable_tasks"
    return "unknown"

Problems Discovered#

  1. MCP responses lack context:

    # Response says:
    {"error": "No suitable tasks available"}
    
    # But WHY? Could be:
    # - All tasks assigned
    # - Dependencies blocking
    # - Skill mismatch
    # - Tasks filtered by other criteria
    # - Circular dependencies
    
    # We can't tell from the response!
    
  2. Keyword matching is unreliable:

    • β€œTask waiting for dependencies” β†’ Labeled as dependency issue

    • But diagnostic might show: β€œActually all tasks are assigned”

    • Keyword doesn’t equal root cause

  3. Decision happens elsewhere:

    # From src/core/ai_powered_task_assignment.py
    unblocked_tasks = await self._filter_unblocked_tasks(...)
    if not unblocked_tasks:
        logger.info("No unblocked tasks available")  # Logs here
        return None  # But doesn't pass info back!
    

    The filtering logic knows WHY but doesn’t include it in the MCP response.

  4. Diagnostics already exist:

    • run_automatic_diagnostics() provides accurate analysis

    • Runs automatically when needed

    • No need to guess from keywords

Simplified Design (Current)#

# CURRENT: No categorization, just facts
def _log_mcp_tool_failure(...):
    """Log the failure as-is, no assumptions."""
    conversation_logger.pm_logger.warning(
        f"MCP tool '{tool_name}' returned failure",
        error=error_msg,  # Log what we know
        # NO failure_category field
    )

    # Point to real diagnostics
    if tool_name == "request_next_task":
        log_pm_thinking("Check Python logs for 'Diagnostic Report'...")

Benefits:

  • βœ… No false assumptions

  • βœ… Clear separation: activity tracking vs. diagnostics

  • βœ… Points to accurate analysis

  • βœ… Simpler, more maintainable

Usage Patterns#

For Operators#

When investigating MCP tool failures:

  1. Check conversation logs for activity overview:

    grep 'returned failure' logs/conversations/marcus_*.log | tail -20
    
  2. For request_next_task failures, check Python logs for diagnostics:

    grep -A 30 'Diagnostic Report' logs/marcus_*.log | tail -50
    
  3. Filter by tool:

    grep 'create_project.*returned failure' logs/conversations/marcus_*.log
    
  4. Time-based investigation:

    # Find failures around 10:35 AM
    grep '10:35:' logs/conversations/marcus_*.log | grep 'returned failure'
    

For Developers#

Adding new MCP tools:

No changes needed! The logger automatically handles all tools via the centralized dispatch point.

Changing log format:

Update structured logging fields in _log_mcp_tool_failure():

context = {
    "tool_name": tool_name,
    "arguments": arguments,
    "error": error_msg,
    "retry_reason": retry_reason,
    "response": response,
    # Add new fields here
}

Adding diagnostic pointers for other tools:

# In _log_mcp_tool_failure()
if tool_name in ["request_next_task", "your_new_tool"]:
    conversation_logger.log_pm_thinking(
        "Check diagnostics at location X...",
        context={"hint": "..."}
    )

Testing#

The MCP tool logger has comprehensive test coverage across three levels:

1. Unit Tests (14 tests)#

Location: tests/unit/logging/test_mcp_tool_logger.py

Run:

pytest tests/unit/logging/test_mcp_tool_logger.py -v

Coverage:

  • Success/failure logging

  • Log levels (DEBUG/WARNING)

  • Diagnostic pointers

  • Integration with conversation logger

  • Activity tracking behavior (WHAT/WHEN, not WHY)

  • No categorization assumptions

2. Integration Tests (11 tests)#

Location: tests/integration/logging/test_mcp_tool_logger_integration.py

Run:

pytest tests/integration/logging/test_mcp_tool_logger_integration.py -v

Coverage:

  • Real-world failure scenarios

  • Log entry creation with real logger

  • Structured logging format

  • End-to-end convenience functions

3. Live System Testing#

Location: dev-tools/examples/test_mcp_logger_live.py

Run:

# Terminal 1: Start Marcus
python -m src.marcus_mcp.server --http

# Terminal 2: Run live test
python dev-tools/examples/test_mcp_logger_live.py

Tests 4 scenarios:

  1. No tasks available

  2. Agent already busy

  3. All tasks assigned

  4. Other tool failures

Verification:

# Activity tracking
grep 'returned failure' logs/conversations/marcus_*.log

# Diagnostic reports
grep -A 30 'Diagnostic Report' logs/marcus_*.log

For complete testing guide, see: docs/testing/MCP_TOOL_LOGGER_TESTING.md

Performance Characteristics#

Low Overhead Design#

  • Async-friendly: Uses structlog’s async logging

  • Minimal processing: No complex categorization logic

  • Conditional logging: DEBUG logs only if enabled

  • Efficient serialization: JSON serialization by structlog

Performance Metrics#

# Measured performance (1000 log entries)
Average: 0.8ms per log entry
Total overhead: <1% of MCP tool execution time

Why it’s fast:

  • Single log write per tool call

  • No synchronous I/O blocking

  • No expensive analysis or categorization

  • Structlog handles async buffering

Monitoring and Maintenance#

Health Indicators#

Healthy System:

# Mostly successes, few failures
grep 'MCP tool' logs/conversations/marcus_*.log | \
    grep -c 'succeeded'
# Example output: 245

grep 'returned failure' logs/conversations/marcus_*.log | wc -l
# Example output: 12

Problem Indicators:

# High failure rate for specific tool
grep 'request_next_task.*returned failure' logs/conversations/marcus_*.log | wc -l
# If consistently high: investigate diagnostics

# Unexpected tool failures
grep 'create_project.*returned failure' logs/conversations/marcus_*.log
# If frequent: check project configuration

Log Rotation#

Conversation logs use automatic rotation (via structlog configuration):

  • Daily rotation

  • Compression after 7 days

  • Retention: 30 days

Configuration: src/logging/conversation_logger.py

Troubleshooting#

No log entries created:

# Check logger initialization
python -c "from src.logging.mcp_tool_logger import log_mcp_tool_response; print('OK')"

# Check log directory
ls -la logs/conversations/

Missing diagnostic pointers:

  • Verify tool name is exactly "request_next_task" (case-sensitive)

  • Check _log_mcp_tool_failure() logic at line 102-108

Wrong log levels:

  • Verify conversation logger configuration

  • Check structlog setup in conversation_logger.py

Comparison to Other Monitoring Systems#

vs. Assignment Monitor#

Aspect

MCP Tool Logger

Assignment Monitor

Scope

All MCP tool calls

Task assignments only

Trigger

Every tool call

Assignment state changes

Purpose

Activity tracking

Detect assignment reversions

Log Type

Structured JSON

Specialized alerts

File

mcp_tool_logger.py

assignment_monitor.py

vs. Error Predictor#

Aspect

MCP Tool Logger

Error Predictor

Scope

Historical activity

Predictive analysis

Timing

Real-time (during execution)

Proactive (before execution)

Purpose

Record what happened

Predict what might happen

Method

Logging

AI pattern analysis

File

mcp_tool_logger.py

error_predictor.py

vs. Task Diagnostics#

Aspect

MCP Tool Logger

Task Diagnostics

Scope

All tools

Task assignment only

Depth

Activity tracking

Deep analysis

Answers

WHAT/WHEN

WHY

Trigger

Every call

Automatic on failures

Log Stream

Conversation logs

Python logs

File

mcp_tool_logger.py

task_diagnostics.py

Complementary Design: These systems work together, each serving a specific purpose.

Future Enhancements#

Potential Improvements#

  1. Metrics Collection:

    • Tool success/failure rates

    • Average response times

    • Most common failure patterns

    • Export to dashboard

  2. Enhanced Diagnostic Linking:

    • Include diagnostic report IDs in logs

    • Direct links between activity and diagnostics

    • Correlation IDs across log streams

  3. Alert Integration:

    • Trigger alerts on sustained high failure rates

    • Integrate with monitoring dashboards

    • Webhook notifications for critical tools

  4. Structured Metrics:

    • Prometheus/OpenTelemetry integration

    • Real-time dashboards

    • Trend analysis

Non-Goals#

  • ❌ Root cause analysis (handled by diagnostics)

  • ❌ Failure categorization (unreliable from MCP responses)

  • ❌ Predictive analysis (handled by error predictor)

  • ❌ Assignment tracking (handled by assignment monitor)

The MCP tool logger maintains its focused role as an activity tracker, leaving specialized analysis to purpose-built systems.