MCP Tool Logging System#
Overview#
The MCP Tool Logging System provides comprehensive activity tracking for all Marcus MCP (Model Context Protocol) tool operations. It acts as a βtable of contentsβ for MCP tool activity, logging WHAT failed and WHEN, while pointing operators to detailed diagnostic logs for WHY root cause analysis.
This logging system is intentionally simplified to avoid making assumptions about failure causes, instead serving as an index that guides operators to the appropriate diagnostic systems for investigation.
Purpose and Philosophy#
Design Principle: Activity Tracking, Not Diagnosis#
The MCP tool logger follows a clear separation of concerns:
Activity Tracker (WHAT/WHEN): Records that a tool was called and whether it succeeded or failed
Diagnostic Systems (WHY): Provide detailed root cause analysis (e.g.,
task_diagnostics.py)
This separation is crucial because:
MCP responses are intentionally vague - They donβt contain enough context to determine root cause
Agents shouldnβt overthink - Detailed failure analysis in MCP responses causes agents to try alternative strategies instead of following instructions
Diagnostics already exist - Detailed analysis systems run automatically and log to Python logs
Target Audience#
Operators/Developers: Need to investigate why tools fail to fix code or configuration issues
NOT Agents: Agents receive simple retry instructions, not diagnostic details
Architecture#
Integration Point#
The logger integrates at the centralized MCP handler dispatch point:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β MCP Server (src/marcus_mcp/) β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β handlers.py::handle_tool_call() β β
β β β’ Dispatches ALL MCP tool calls β β
β β β’ Returns result to agent β β
β β β’ Logs result via mcp_tool_logger ββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β
β Tool Functions: β β
β β’ request_next_task() β β
β β’ register_agent() β β
β β’ create_project() β β
β β’ [30+ other tools] β β
β β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββββ Agent receives simple response
β (success/failure + retry instructions)
β
ββββ Logger captures activity
β
ββββ Conversation logs (logs/conversations/)
β β’ WARNING entries for failures
β β’ Full response context
β β’ Diagnostic pointers
β
ββββ Python logs (logs/marcus_*.log)
β’ Detailed diagnostic reports
β’ Root cause analysis
β’ Automatic when needed
Log Stream Architecture#
Marcus uses multiple specialized log streams:
Conversation Logs (
logs/conversations/marcus_*.log)Structured JSON logs (structlog)
PM decisions, worker messages, MCP tool activity
MCP tool logger writes here
Python Logs (
logs/marcus_*.log)Traditional Python logging
Diagnostic reports from
task_diagnostics.pyDependency analysis, filtering statistics
Agent Event Logs (
logs/agent_events/*.jsonl)Lightweight event tracking (JSON Lines)
Agent lifecycle events
Used by visualization systems
The MCP tool logger bridges streams by recording activity in conversation logs and pointing to diagnostic details in Python logs.
Implementation#
Core Module: src/logging/mcp_tool_logger.py#
Main Function: log_mcp_tool_response()#
def log_mcp_tool_response(
tool_name: str,
arguments: dict[str, Any],
response: dict[str, Any],
) -> None:
"""
Log MCP tool response for activity tracking.
Records WHAT failed and WHEN. Does not attempt to determine
WHY failures occurred (root cause analysis is in diagnostic logs).
"""
success = response.get("success", False)
if success:
# DEBUG level - low noise
conversation_logger.log_pm_thinking(
f"MCP tool '{tool_name}' succeeded",
context={
"tool_name": tool_name,
"arguments": arguments,
"response": response,
},
)
else:
# WARNING level - needs attention
_log_mcp_tool_failure(tool_name, arguments, response)
Key Design Decisions:
Success = DEBUG level: Low noise, only for detailed debugging
Failure = WARNING level: Consistent level for all failures
No categorization: Removed aggressive keyword-based classification
Full context: Entire response preserved for investigation
Failure Logging: _log_mcp_tool_failure()#
def _log_mcp_tool_failure(
tool_name: str,
arguments: dict[str, Any],
response: dict[str, Any],
) -> None:
"""Log MCP tool failure for activity tracking."""
error_msg = response.get("error", "Unknown error")
retry_reason = response.get("retry_reason", "")
# Log at WARNING level (no categorization)
conversation_logger.pm_logger.warning(
f"MCP tool '{tool_name}' returned failure",
tool_name=tool_name,
arguments=arguments,
error=error_msg,
retry_reason=retry_reason,
response=response,
)
# Special case: request_next_task gets diagnostic pointer
if tool_name == "request_next_task":
conversation_logger.log_pm_thinking(
"For root cause analysis of request_next_task failure, "
"check Python logs (logs/marcus_*.log) for 'Diagnostic Report' "
"entries near this timestamp",
context={"hint": "Diagnostics run automatically when no tasks assignable"},
)
Key Features:
Consistent WARNING level: No escalation to CRITICAL based on keywords
Diagnostic pointer: Only for
request_next_task(most critical tool)No assumptions: Logs error message as-is without interpretation
Full response preserved: All fields included for operator investigation
Integration: src/marcus_mcp/handlers.py#
The logger integrates at the centralized handler dispatch point (line 1414-1420):
# After tool execution
response = [types.TextContent(type="text", text=json.dumps(result, indent=2))]
# Log MCP tool response (especially failures) for diagnostics
if isinstance(result, dict):
log_mcp_tool_response(
tool_name=name,
arguments=arguments,
response=result,
)
Benefits of this integration point:
β Automatic coverage: All 30+ MCP tools logged automatically
β Centralized: Single integration point, easy to maintain
β Future-proof: New tools automatically logged
β Post-execution: Logs actual results, not assumptions
Logging Behavior#
Success Cases#
Log Entry (DEBUG level):
{
"timestamp": "2025-01-15T10:30:45.123Z",
"event": "pm_thinking",
"message": "MCP tool 'request_next_task' succeeded",
"context": {
"tool_name": "request_next_task",
"arguments": {"agent_id": "agent_001"},
"response": {
"success": true,
"task": {
"id": "task_123",
"name": "Implement API endpoint",
"description": "...",
"status": "todo"
}
}
}
}
Failure Cases#
request_next_task Failure#
Conversation Log (WARNING level):
{
"timestamp": "2025-01-15T10:35:22.456Z",
"level": "warning",
"event": "pm_logger",
"message": "MCP tool 'request_next_task' returned failure",
"tool_name": "request_next_task",
"arguments": {"agent_id": "agent_002"},
"error": "No suitable tasks available",
"retry_reason": "All tasks assigned or blocked",
"response": {
"success": false,
"error": "No suitable tasks available",
"retry_reason": "All tasks assigned or blocked",
"retry_after_seconds": 30
}
}
Diagnostic Pointer (DEBUG level):
{
"timestamp": "2025-01-15T10:35:22.457Z",
"event": "pm_thinking",
"message": "For root cause analysis of request_next_task failure, check Python logs (logs/marcus_*.log) for 'Diagnostic Report' entries near this timestamp",
"context": {
"hint": "Diagnostics run automatically when no tasks assignable"
}
}
Python Log (Automatic Diagnostic Report):
2025-01-15 10:35:22,450 INFO - Diagnostic Report (for operators):
=== Task Assignment Diagnostics ===
Total Tasks: 5
TODO Tasks: 3
In Progress: 1
Completed: 1
Filtering Results:
- Started with: 3 TODO tasks
- After dependency filter: 0 tasks (blocked by incomplete dependencies)
- After skill filter: 0 tasks
- After assignment filter: 0 tasks
Dependency Chain Analysis:
- Task "Implement API" (task_456) blocked by incomplete "Setup Database" (task_123)
- Task "Write Tests" (task_789) blocked by incomplete "Implement API" (task_456)
Recommendation: Review dependency chain for potential circular dependencies
Other Tool Failures#
Conversation Log (WARNING level, NO diagnostic pointer):
{
"timestamp": "2025-01-15T10:40:10.789Z",
"level": "warning",
"event": "pm_logger",
"message": "MCP tool 'create_project' returned failure",
"tool_name": "create_project",
"arguments": {"name": "test-project", "description": "..."},
"error": "Invalid project configuration",
"retry_reason": "",
"response": {
"success": false,
"error": "Invalid project configuration"
}
}
Note: No diagnostic pointer for non-request_next_task tools.
Relationship to Diagnostic Systems#
Separation of Concerns#
The MCP tool logger is NOT a diagnostic system. It is an activity tracker that points to diagnostic systems:
Aspect |
MCP Tool Logger |
Diagnostic Systems |
|---|---|---|
Purpose |
Activity tracking |
Root cause analysis |
Answers |
WHAT failed, WHEN |
WHY it failed |
Location |
|
|
Log Stream |
Conversation logs |
Python logs |
Trigger |
Every MCP tool call |
Automatic when failures occur |
Assumptions |
None |
Deep analysis |
Audience |
Quick overview |
Detailed investigation |
Diagnostic System: task_diagnostics.py#
When request_next_task cannot find suitable tasks, it automatically runs diagnostics:
# From src/marcus_mcp/tools/task.py (lines 946-989)
if todo_tasks:
# Tasks exist but can't be assigned - run diagnostics
logger.warning(f"No tasks assignable but {len(todo_tasks)} TODO tasks exist")
diagnostic_report = await run_automatic_diagnostics(
project_tasks=state.project_tasks,
completed_task_ids=completed_task_ids,
assigned_task_ids=assigned_task_ids,
)
# Format and log for operators
formatted_report = format_diagnostic_report(diagnostic_report)
logger.info(f"Diagnostic Report (for operators):\n{formatted_report}")
What Diagnostics Provide:
Dependency chain analysis (blocking tasks)
Task filtering statistics (why tasks were excluded)
Skill mismatch detection
Circular dependency detection
Recommended actions
Why Separate:
Diagnostics run automatically when needed
Detailed analysis logged to Python logs
MCP responses stay simple for agents
Operators get full context in separate stream
Why Categorization Was Removed#
Original Design (Removed)#
The initial implementation attempted to categorize failures:
# REMOVED: Aggressive categorization
def _categorize_failure(response: dict[str, Any]) -> str:
"""Categorize based on keywords."""
error_msg = response.get("error", "").lower()
if "dependency" in error_msg or "blocked by" in error_msg:
return "dependency_issue" # Assumption!
if "busy" in error_msg:
return "agent_busy"
if "no suitable" in error_msg:
return "no_suitable_tasks"
return "unknown"
Problems Discovered#
MCP responses lack context:
# Response says: {"error": "No suitable tasks available"} # But WHY? Could be: # - All tasks assigned # - Dependencies blocking # - Skill mismatch # - Tasks filtered by other criteria # - Circular dependencies # We can't tell from the response!
Keyword matching is unreliable:
βTask waiting for dependenciesβ β Labeled as dependency issue
But diagnostic might show: βActually all tasks are assignedβ
Keyword doesnβt equal root cause
Decision happens elsewhere:
# From src/core/ai_powered_task_assignment.py unblocked_tasks = await self._filter_unblocked_tasks(...) if not unblocked_tasks: logger.info("No unblocked tasks available") # Logs here return None # But doesn't pass info back!
The filtering logic knows WHY but doesnβt include it in the MCP response.
Diagnostics already exist:
run_automatic_diagnostics()provides accurate analysisRuns automatically when needed
No need to guess from keywords
Simplified Design (Current)#
# CURRENT: No categorization, just facts
def _log_mcp_tool_failure(...):
"""Log the failure as-is, no assumptions."""
conversation_logger.pm_logger.warning(
f"MCP tool '{tool_name}' returned failure",
error=error_msg, # Log what we know
# NO failure_category field
)
# Point to real diagnostics
if tool_name == "request_next_task":
log_pm_thinking("Check Python logs for 'Diagnostic Report'...")
Benefits:
β No false assumptions
β Clear separation: activity tracking vs. diagnostics
β Points to accurate analysis
β Simpler, more maintainable
Usage Patterns#
For Operators#
When investigating MCP tool failures:
Check conversation logs for activity overview:
grep 'returned failure' logs/conversations/marcus_*.log | tail -20
For request_next_task failures, check Python logs for diagnostics:
grep -A 30 'Diagnostic Report' logs/marcus_*.log | tail -50
Filter by tool:
grep 'create_project.*returned failure' logs/conversations/marcus_*.log
Time-based investigation:
# Find failures around 10:35 AM grep '10:35:' logs/conversations/marcus_*.log | grep 'returned failure'
For Developers#
Adding new MCP tools:
No changes needed! The logger automatically handles all tools via the centralized dispatch point.
Changing log format:
Update structured logging fields in _log_mcp_tool_failure():
context = {
"tool_name": tool_name,
"arguments": arguments,
"error": error_msg,
"retry_reason": retry_reason,
"response": response,
# Add new fields here
}
Adding diagnostic pointers for other tools:
# In _log_mcp_tool_failure()
if tool_name in ["request_next_task", "your_new_tool"]:
conversation_logger.log_pm_thinking(
"Check diagnostics at location X...",
context={"hint": "..."}
)
Testing#
The MCP tool logger has comprehensive test coverage across three levels:
1. Unit Tests (14 tests)#
Location: tests/unit/logging/test_mcp_tool_logger.py
Run:
pytest tests/unit/logging/test_mcp_tool_logger.py -v
Coverage:
Success/failure logging
Log levels (DEBUG/WARNING)
Diagnostic pointers
Integration with conversation logger
Activity tracking behavior (WHAT/WHEN, not WHY)
No categorization assumptions
2. Integration Tests (11 tests)#
Location: tests/integration/logging/test_mcp_tool_logger_integration.py
Run:
pytest tests/integration/logging/test_mcp_tool_logger_integration.py -v
Coverage:
Real-world failure scenarios
Log entry creation with real logger
Structured logging format
End-to-end convenience functions
3. Live System Testing#
Location: dev-tools/examples/test_mcp_logger_live.py
Run:
# Terminal 1: Start Marcus
python -m src.marcus_mcp.server --http
# Terminal 2: Run live test
python dev-tools/examples/test_mcp_logger_live.py
Tests 4 scenarios:
No tasks available
Agent already busy
All tasks assigned
Other tool failures
Verification:
# Activity tracking
grep 'returned failure' logs/conversations/marcus_*.log
# Diagnostic reports
grep -A 30 'Diagnostic Report' logs/marcus_*.log
For complete testing guide, see: docs/testing/MCP_TOOL_LOGGER_TESTING.md
Performance Characteristics#
Low Overhead Design#
Async-friendly: Uses structlogβs async logging
Minimal processing: No complex categorization logic
Conditional logging: DEBUG logs only if enabled
Efficient serialization: JSON serialization by structlog
Performance Metrics#
# Measured performance (1000 log entries)
Average: 0.8ms per log entry
Total overhead: <1% of MCP tool execution time
Why itβs fast:
Single log write per tool call
No synchronous I/O blocking
No expensive analysis or categorization
Structlog handles async buffering
Monitoring and Maintenance#
Health Indicators#
Healthy System:
# Mostly successes, few failures
grep 'MCP tool' logs/conversations/marcus_*.log | \
grep -c 'succeeded'
# Example output: 245
grep 'returned failure' logs/conversations/marcus_*.log | wc -l
# Example output: 12
Problem Indicators:
# High failure rate for specific tool
grep 'request_next_task.*returned failure' logs/conversations/marcus_*.log | wc -l
# If consistently high: investigate diagnostics
# Unexpected tool failures
grep 'create_project.*returned failure' logs/conversations/marcus_*.log
# If frequent: check project configuration
Log Rotation#
Conversation logs use automatic rotation (via structlog configuration):
Daily rotation
Compression after 7 days
Retention: 30 days
Configuration: src/logging/conversation_logger.py
Troubleshooting#
No log entries created:
# Check logger initialization
python -c "from src.logging.mcp_tool_logger import log_mcp_tool_response; print('OK')"
# Check log directory
ls -la logs/conversations/
Missing diagnostic pointers:
Verify tool name is exactly
"request_next_task"(case-sensitive)Check
_log_mcp_tool_failure()logic at line 102-108
Wrong log levels:
Verify conversation logger configuration
Check structlog setup in
conversation_logger.py
Comparison to Other Monitoring Systems#
vs. Assignment Monitor#
Aspect |
MCP Tool Logger |
Assignment Monitor |
|---|---|---|
Scope |
All MCP tool calls |
Task assignments only |
Trigger |
Every tool call |
Assignment state changes |
Purpose |
Activity tracking |
Detect assignment reversions |
Log Type |
Structured JSON |
Specialized alerts |
File |
|
|
vs. Error Predictor#
Aspect |
MCP Tool Logger |
Error Predictor |
|---|---|---|
Scope |
Historical activity |
Predictive analysis |
Timing |
Real-time (during execution) |
Proactive (before execution) |
Purpose |
Record what happened |
Predict what might happen |
Method |
Logging |
AI pattern analysis |
File |
|
|
vs. Task Diagnostics#
Aspect |
MCP Tool Logger |
Task Diagnostics |
|---|---|---|
Scope |
All tools |
Task assignment only |
Depth |
Activity tracking |
Deep analysis |
Answers |
WHAT/WHEN |
WHY |
Trigger |
Every call |
Automatic on failures |
Log Stream |
Conversation logs |
Python logs |
File |
|
|
Complementary Design: These systems work together, each serving a specific purpose.
Future Enhancements#
Potential Improvements#
Metrics Collection:
Tool success/failure rates
Average response times
Most common failure patterns
Export to dashboard
Enhanced Diagnostic Linking:
Include diagnostic report IDs in logs
Direct links between activity and diagnostics
Correlation IDs across log streams
Alert Integration:
Trigger alerts on sustained high failure rates
Integrate with monitoring dashboards
Webhook notifications for critical tools
Structured Metrics:
Prometheus/OpenTelemetry integration
Real-time dashboards
Trend analysis
Non-Goals#
β Root cause analysis (handled by diagnostics)
β Failure categorization (unreliable from MCP responses)
β Predictive analysis (handled by error predictor)
β Assignment tracking (handled by assignment monitor)
The MCP tool logger maintains its focused role as an activity tracker, leaving specialized analysis to purpose-built systems.