Service Registry System#

Overview#

The Marcus Service Registry is a lightweight, filesystem-based service discovery mechanism that enables multiple clients (like Cato, Claude Desktop, and other integrations) to automatically discover and connect to running Marcus instances without manual configuration.

What the System Does#

The Service Registry provides a decentralized approach to service discovery by:

  1. Service Advertisement: Each Marcus instance registers itself in a discoverable location when it starts

  2. Automatic Discovery: Clients can find available Marcus instances without knowing connection details beforehand

  3. Health Monitoring: Tracks service health and automatically cleans up stale registrations

  4. Connection Information: Provides MCP command strings and metadata for establishing connections

  5. Multi-Instance Support: Handles multiple concurrent Marcus instances across different projects

Architecture#

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Marcus        │    │  Service         │    │   Client        │
│   Instance 1    │───▶│  Registry        │◀───│   (Cato)        │
│                 │    │  (~/.marcus/     │    │                 │
└─────────────────┘    │   services/)     │    └─────────────────┘
                       │                  │
┌─────────────────┐    │  ┌─────────────┐ │    ┌─────────────────┐
│   Marcus        │───▶│  │ marcus_1234 │ │    │   Client        │
│   Instance 2    │    │  │ .json       │ │    │   (Claude       │
│                 │    │  └─────────────┘ │    │    Desktop)     │
└─────────────────┘    │                  │    │                 │
                       │  ┌─────────────┐ │    └─────────────────┘
┌─────────────────┐    │  │ marcus_5678 │ │
│   Marcus        │───▶│  │ .json       │ │
│   Instance 3    │    │  └─────────────┘ │
│                 │    │                  │
└─────────────────┘    └──────────────────┘

Core Components#

  1. MarcusServiceRegistry Class: Main registry management

  2. Service Files: One JSON file per instance in the ~/.marcus/services/ directory (%APPDATA%\.marcus\services on Windows)

  3. Global Registry Instance: Singleton pattern for process-wide access

  4. Convenience Functions: Simplified API for common operations

Marcus Ecosystem Integration#

The Service Registry serves as the discovery backbone for the Marcus ecosystem:

  • Marcus Server: Registers itself on startup, updates heartbeat periodically

  • Cato: Discovers available Marcus instances for GUI connections

  • Claude Desktop: Can auto-connect to running Marcus without manual MCP configuration

  • CLI Tools: Development tools can find running instances for debugging

  • Monitoring Systems: External monitoring can discover and health-check Marcus instances

Workflow Integration#

In the typical Marcus workflow, the Service Registry operates in parallel with the main task flow:

create_project → register_agent → request_next_task → report_progress → report_blocker → finish_task
        ↓              ↓                ↓                    ↓               ↓             ↓
   [Service Registration] ────────── [Heartbeat Updates] ──────────────── [Cleanup]

When It’s Invoked#

  1. Startup Registration: When Marcus MCP server starts (src/marcus_mcp/server.py)

  2. Heartbeat Updates: Periodic updates during operation (optional)

  3. Shutdown Cleanup: Automatic cleanup via atexit handler

  4. Client Discovery: When external clients need to find Marcus instances
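
The startup and shutdown steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Marcus implementation: write_service_file, remove_service_file, the temporary directory standing in for the registry, and the example command string are all hypothetical.

```python
import atexit
import json
import os
import tempfile
from pathlib import Path


def write_service_file(registry_dir: Path, mcp_command: str) -> Path:
    """Write a minimal service record named after the current PID."""
    registry_dir.mkdir(parents=True, exist_ok=True)
    service_file = registry_dir / f"marcus_{os.getpid()}.json"
    record = {"pid": os.getpid(), "mcp_command": mcp_command, "status": "running"}
    service_file.write_text(json.dumps(record, indent=2))
    return service_file


def remove_service_file(service_file: Path) -> None:
    """Delete the record; a file that is already gone is not an error."""
    service_file.unlink(missing_ok=True)


# Hypothetical registry location so the sketch never touches ~/.marcus.
demo_dir = Path(tempfile.mkdtemp()) / "services"
# The command string is a placeholder, not the real Marcus launch command.
service_file = write_service_file(demo_dir, "python -m marcus_mcp.server")

# atexit handlers run on normal interpreter exit. They do not run after
# SIGKILL or a hard crash, which is why discovery also prunes files whose
# PID no longer exists.
atexit.register(remove_service_file, service_file)
```

Registering the cleanup immediately after the file is written keeps the window in which a crash can leave a stale file as small as possible.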

What Makes This System Special#

1. Zero-Configuration Discovery#

Unlike traditional service discovery, which typically requires a dedicated registry server or hand-maintained configuration files, this system works automatically:

  • No external dependencies (Redis, Consul, etc.)

  • No network configuration required

  • Works across terminal sessions and environments of the same user, since the registry lives under that user's home directory

2. Process-Aware Cleanup#

# Automatic stale service detection
if cls._is_process_running(service_info.get("pid")):
    services.append(service_info)
else:
    # Clean up stale service file
    service_file.unlink()

3. Cross-Platform Compatibility#

def _get_registry_dir(self) -> Path:
    if platform.system() == "Windows":
        base_dir = Path(os.environ.get("APPDATA", tempfile.gettempdir()))
    else:
        base_dir = Path.home()

    registry_dir = base_dir / ".marcus" / "services"
    return registry_dir

4. Rich Service Metadata#

Each service registration includes:

  • Connection details (MCP command)

  • Project context (current project, provider)

  • Runtime information (PID, working directory, Python version)

  • Lifecycle timestamps (started_at, last_heartbeat)

Technical Implementation Details#

Registration Process#

def register_service(self, mcp_command: str, log_dir: str, project_name: Optional[str] = None,
                    provider: Optional[str] = None, **kwargs: Any) -> Dict[str, Any]:
    service_info = {
        "instance_id": self.instance_id,
        "pid": os.getpid(),
        "mcp_command": mcp_command,  # Key for client connections
        "log_dir": str(Path(log_dir).absolute()),
        "project_name": project_name,
        "provider": provider,
        "status": "running",
        "started_at": datetime.now().isoformat(),
        "last_heartbeat": datetime.now().isoformat(),
        "platform": platform.system(),
        "python_version": platform.python_version(),
        "working_directory": str(Path.cwd()),
        **kwargs,  # Extra metadata; may override the fields above
    }

    # Single small write. Note: open() followed by json.dump() is not atomic,
    # so a concurrent reader may briefly see a partially written file.
    with open(self.registry_file, "w") as f:
        json.dump(service_info, f, indent=2)

    return service_info
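
A plain open(..., "w") followed by json.dump() can leave a partially written file visible to a concurrent reader. One common way to make such a write atomic is to write to a temporary file in the same directory and rename it over the target. The helper below is a sketch of that technique; write_json_atomically is a hypothetical name, not part of the registry API.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_json_atomically(target: Path, data: Dict[str, Any]) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory (same filesystem) as the
    # target, otherwise the final rename cannot be atomic. The ".tmp_" prefix
    # keeps it out of a "marcus_*.json" glob.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp_", suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # os.replace overwrites the target in one step (atomic on POSIX).
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

With this pattern a discovery scan never parses half a file, so it would not mistake an in-progress registration for a corrupted one.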

Discovery Algorithm#

@classmethod
def discover_services(cls) -> List[Dict[str, Any]]:
    registry = cls()  # assumed: throwaway instance, used only to locate registry_dir
    services = []

    # Scan all service files
    for service_file in registry.registry_dir.glob("marcus_*.json"):
        try:
            with open(service_file, "r") as f:
                service_info = json.load(f)

            # Health check via process existence
            if cls._is_process_running(service_info.get("pid")):
                services.append(service_info)
            else:
                try:
                    service_file.unlink()  # Cleanup stale entries
                except (OSError, PermissionError):
                    pass

        except (json.JSONDecodeError, FileNotFoundError):
            try:
                service_file.unlink()  # Cleanup corrupted files
            except (OSError, PermissionError):
                pass

    return sorted(services, key=lambda x: x.get("started_at", ""))

Instance Identification#

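def __init__(self, instance_id: Optional[str] = None):
    # Default ID derives from the PID, which is unique among live processes on
    # this machine (the OS may reuse a PID after the process exits)
    self.instance_id = instance_id or f"marcus_{os.getpid()}"
    self.registry_dir = self._get_registry_dir()  # assumed; not shown in the original excerpt
    self.registry_file = self.registry_dir / f"{self.instance_id}.json"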

Pros and Cons#

Advantages#

  1. Simplicity: No external dependencies or complex setup

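  2. Reliability: Each instance writes only its own file, and discovery removes files it cannot parse (a plain file write is not atomic by itself, so a reader can briefly see a partial file)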

  3. Performance: Fast discovery via filesystem globbing

  4. Debugging: Human-readable JSON files for troubleshooting

  5. Security: Uses user’s home directory with standard file permissions

  6. Multi-Platform: Works consistently across Windows, macOS, Linux

Disadvantages#

  1. Local Only: Cannot discover services across network boundaries

  2. File System Dependency: Requires writable filesystem access

  3. Cleanup Timing: Stale entries persist until next discovery operation

  4. Concurrency: No locking mechanism for concurrent registration/discovery

  5. Scale Limitations: Not designed for high-frequency operations or many services

Why This Approach Was Chosen#

Design Rationale#

  1. Developer Experience: Eliminates manual MCP server configuration for common use cases

  2. Zero Dependencies: Avoids external service registry dependencies that would complicate deployment

  3. Debugging Friendly: Service files can be inspected directly for troubleshooting

  4. Graceful Degradation: System continues working even if some service files are corrupted

Alternative Approaches Considered#

  • Network-based discovery (mDNS/Bonjour): Too complex for local development use case

  • Database registry: Overkill and would require database setup

  • Configuration files: Would require manual management and updates

  • Environment variables: Not dynamic enough for multiple instances

Evolution and Future Directions#

Planned Enhancements#

  1. Network Discovery: Support for remote Marcus instances via optional network protocols

  2. Service Metadata: Enhanced metadata for capability-based discovery

  3. Health Monitoring: More sophisticated health checks beyond process existence

  4. Load Balancing: Client-side load balancing for multiple available instances

Potential Improvements#

# Future: Enhanced service metadata
service_info = {
    # Current fields...
    "capabilities": ["project_management", "ai_analysis", "kanban_integration"],
    "load_metrics": {"active_agents": 3, "cpu_usage": 15.2, "memory_mb": 128},
    "api_version": "2.1.0",
    "supported_providers": ["github", "jira", "trello"],
}

# Future: Service selection by capability
def find_service_with_capability(capability: str) -> Optional[Dict[str, Any]]:
    services = discover_services()
    return next((s for s in services if capability in s.get("capabilities", [])), None)

Integration with Service Mesh#

As Marcus scales, the Service Registry could evolve to integrate with service mesh technologies:

  • Service discovery integration with Consul, etcd

  • Health check endpoints for external monitoring

  • Metrics exposure for observability platforms

  • Circuit breaker integration for resilience

Task Complexity Handling#

The Service Registry operates independently of task complexity:

Simple Tasks#

  • Registration occurs once at startup regardless of task complexity

  • Same discovery mechanism for all clients

  • No task-specific metadata in service registration

Complex Tasks#

  • Service registration includes project context that may be relevant for complex, multi-project scenarios

  • Heartbeat updates could include progress information for long-running operations

  • Multiple Marcus instances can handle different complexity levels simultaneously

Board-Specific Considerations#

Provider Integration#

# Service registration includes provider information.
# Note: register_marcus_service() is a module-level convenience wrapper with
# signature register_marcus_service(**kwargs: Any) -> Dict[str, Any].
# The named parameters (mcp_command, log_dir, project_name, provider) are
# forwarded to MarcusServiceRegistry.register_service(), not accepted by
# the wrapper's own signature.
register_marcus_service(
    mcp_command=command,
    log_dir=log_directory,
    project_name="my_project",
    provider="github",  # or "jira", "trello", etc.
)

Multi-Board Support#

  • Each Marcus instance can register with different provider information

  • Clients can filter discovered instances by the provider field in each registration

  • Supports scenarios where different boards require different Marcus configurations

Board-Aware Discovery#

# Future: Board-specific service discovery
def discover_services_by_provider(provider: str) -> List[Dict[str, Any]]:
    all_services = discover_services()
    return [s for s in all_services if s.get("provider") == provider]

Cato Integration#

The Service Registry is crucial for Cato’s auto-connection capability:

Discovery Flow#

  1. Cato calls MarcusServiceRegistry.discover_services()

  2. Gets list of available Marcus instances with connection details

  3. Uses mcp_command from service info to establish MCP connection

  4. Can present user with choice of multiple available instances

Connection Establishment#

import shlex

# Cato discovers Marcus instances
services = MarcusServiceRegistry.discover_services()
preferred = MarcusServiceRegistry.get_preferred_service()

if preferred:
    mcp_command = preferred["mcp_command"]
    # shlex.split keeps quoted arguments together (POSIX quoting rules),
    # unlike str.split(), which breaks on every space
    client = MCPClient(command=shlex.split(mcp_command))

GUI Integration#

  • Service metadata provides rich information for Cato’s GUI

  • Project names, providers, and status information for user selection

  • Log directory paths for integrated log viewing
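
As a rough illustration of how a client might turn this metadata into selection labels, the sketch below formats one line per discovered service. summarize_services is a hypothetical helper, not part of Cato or the registry; the field names are the ones written by register_service.

```python
from typing import Any, Dict, List


def summarize_services(services: List[Dict[str, Any]]) -> List[str]:
    """Build one human-readable label per service record."""
    lines = []
    for s in services:
        lines.append(
            f"{s.get('instance_id', '?')}  pid={s.get('pid', '?')}  "
            f"project={s.get('project_name') or '-'}  "
            f"provider={s.get('provider') or '-'}  logs={s.get('log_dir', '?')}"
        )
    return lines
```

Missing or null fields fall back to placeholders, so a record registered without a project or provider still produces a usable label.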

Monitoring and Observability#

Health Monitoring#

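import psutil

@staticmethod  # called as cls._is_process_running(pid), so it takes no self/cls
def _is_process_running(pid: Optional[int]) -> bool: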
    """Check if a process is running by PID"""
    if not pid:
        return False

    try:
        return psutil.pid_exists(pid)
    except Exception:
        return False

Service Lifecycle Tracking#

  • started_at: Service startup timestamp

  • last_heartbeat: Most recent activity indicator

  • status: Current service state

  • Automatic cleanup of dead services
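
A client that wants a stricter liveness signal than process existence could also check how old last_heartbeat is. The following is a sketch only: is_heartbeat_fresh and the five-minute default are assumptions made for illustration, and it expects the naive ISO timestamps that register_service writes with datetime.now().isoformat().

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def is_heartbeat_fresh(
    service_info: Dict[str, Any],
    max_age: timedelta = timedelta(minutes=5),  # arbitrary example threshold
    now: Optional[datetime] = None,
) -> bool:
    """Return True if last_heartbeat is present, parseable, and recent enough."""
    raw = service_info.get("last_heartbeat")
    if not raw:
        return False
    try:
        last = datetime.fromisoformat(raw)
    except (TypeError, ValueError):
        return False
    now = now or datetime.now()
    return now - last <= max_age
```

Because heartbeat updates are optional, an instance that never refreshes last_heartbeat would eventually look stale under this check, so it is best treated as a hint rather than a reason to delete a registration.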

Debugging Support#

  • Human-readable JSON service files

  • Rich metadata for troubleshooting connection issues

  • Log directory references for detailed investigation

Error Handling and Resilience#

Graceful Degradation#

  • Continues operation if some service files are corrupted

  • Automatic cleanup of invalid registrations

  • No cascading failures from registry issues

Recovery Mechanisms#

try:
    with open(service_file, "r") as f:
        service_info = json.load(f)
except (json.JSONDecodeError, FileNotFoundError):
    # Clean up invalid service files
    try:
        service_file.unlink()
    except (OSError, PermissionError):
        pass  # Fail silently for cleanup operations

Security Considerations#

File System Security#

  • Uses user’s home directory with standard file permissions

  • No network exposure reduces attack surface

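  • Parsing JSON never executes code, so a malformed service file cannot inject code into the reader; clients do, however, run the mcp_command they read, so the files are only as trustworthy as the directory's permissions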

Process Isolation#

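  • PID-based health checking confirms only that some process with that PID exists; it does not verify that the process is Marcus, and PID reuse can keep a stale entry alive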

  • Each service registration is isolated to its own file

  • No shared state between different Marcus instances

Performance Characteristics#

Discovery Performance#

  • O(n) file system scan where n = number of registered services

  • Typically very fast for expected number of services (< 10)

  • Caching possible at client level for high-frequency discovery
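
Client-level caching could look like the sketch below, which wraps any discovery callable with a short time-to-live. CachedDiscovery and its two-second default are hypothetical; nothing like it is described as part of the registry itself.

```python
import time
from typing import Any, Callable, Dict, List

ServiceList = List[Dict[str, Any]]


class CachedDiscovery:
    """Cache the result of a discovery callable for ttl_seconds."""

    def __init__(
        self,
        discover: Callable[[], ServiceList],
        ttl_seconds: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._discover = discover
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._cached: ServiceList = []
        self._expires_at = float("-inf")  # forces a scan on first use

    def get(self) -> ServiceList:
        now = self._clock()
        if now >= self._expires_at:
            self._cached = self._discover()
            self._expires_at = now + self._ttl
        return list(self._cached)  # copy so callers cannot mutate the cache
```

A client would pass MarcusServiceRegistry.discover_services as the callable. The trade-off is that an instance that starts or stops inside the TTL window is not noticed until the next refresh.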

Registration Performance#

  • Single small file write per registration (not atomic unless written via a temporary file and rename)

  • No network round-trips required

  • Minimal overhead during Marcus startup

Resource Usage#

  • Minimal memory footprint (small JSON files)

  • No persistent connections or background processes

  • Automatic cleanup of stale files keeps the registry directory from accumulating dead entries

Integration Testing#

The Service Registry system should be tested with:

  1. Multi-instance scenarios: Multiple Marcus instances registering simultaneously

  2. Crash recovery: Service cleanup after ungraceful shutdown

  3. Client discovery: Various clients finding and connecting to services

  4. Cross-platform: Registry behavior on Windows, macOS, Linux

  5. File corruption: Recovery from corrupted service files

  6. Permission issues: Handling of read-only filesystems or permission errors
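
The crash-recovery and file-corruption scenarios can be exercised without starting real Marcus processes. The sketch below uses a simplified stand-in for discovery (scan_registry, a hypothetical function that mirrors the logic shown earlier) and an injected liveness check, so it runs against a temporary directory only.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List


def scan_registry(
    registry_dir: Path, is_alive: Callable[[Any], bool]
) -> List[Dict[str, Any]]:
    """Simplified stand-in for discovery: keep live records, delete the rest."""
    services = []
    for service_file in sorted(registry_dir.glob("marcus_*.json")):
        try:
            info = json.loads(service_file.read_text())
            keep = isinstance(info, dict) and is_alive(info.get("pid"))
        except (json.JSONDecodeError, OSError):
            keep = False
        if keep:
            services.append(info)
        else:
            service_file.unlink(missing_ok=True)
    return services


def run_recovery_scenario() -> List[Dict[str, Any]]:
    """Create one live, one stale, and one corrupted file, then scan."""
    registry_dir = Path(tempfile.mkdtemp())
    live_pid = os.getpid()
    (registry_dir / "marcus_live.json").write_text(json.dumps({"pid": live_pid}))
    (registry_dir / "marcus_stale.json").write_text(json.dumps({"pid": -1}))
    (registry_dir / "marcus_corrupt.json").write_text("{not json")
    # Only the current process counts as alive in this scenario.
    found = scan_registry(registry_dir, is_alive=lambda pid: pid == live_pid)
    remaining = sorted(p.name for p in registry_dir.iterdir())
    assert remaining == ["marcus_live.json"], remaining
    return found
```

Injecting is_alive keeps the test deterministic; a fuller test would call the real discover_services against a registry directory redirected to a temporary location.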

This service registry system provides the foundational infrastructure that makes Marcus’s multi-client ecosystem possible, enabling seamless service discovery while maintaining simplicity and reliability.