35. Assignment Lease System#

Primary spec: This document focuses on the lease data model, adaptive duration math, and renewal mechanics. For the end-to-end liveness detection, cadence-based recovery, touch-on-any-tool behavior, and the recovery handoff flow, see Resilience and Task Recovery System.

Executive Summary#

The Assignment Lease System tracks a short-lived lease for every in-progress task assignment. A lease is Marcus’s proof-of-life record for an agent: while the lease is fresh, Marcus trusts that the agent is working; once the lease expires past its grace period, the lease is evaluated for recovery.

Marcus runs in an aggressive lease mode β€” lease durations are measured in seconds to minutes, not hours. The defaults are tuned to detect dead tmux panes, killed processes, and dropped network connections within one to two monitor cycles. Adaptive duration math, priority and complexity multipliers, and renewal decay still exist, but they operate within a much tighter timing envelope than older versions of the system.

Source Files#

Component

Path

Lease data model + manager + monitor

src/core/assignment_lease.py

Default configuration

src/config/marcus_config.py β€” TaskLeaseSettings

Server wiring + recovery callback

src/marcus_mcp/server.py

Lease creation + progress renewal + recreation

src/marcus_mcp/tools/task.py

Touch-on-any-tool-call

src/marcus_mcp/handlers.py

Lease Data Model#

@dataclass
class AssignmentLease:
    task_id: str
    agent_id: str
    assigned_at: datetime
    lease_expires: datetime
    last_renewed: datetime
    renewal_count: int = 0
    estimated_hours: float = 4.0
    progress_percentage: int = 0
    last_progress_message: str = ""
    grace_period_seconds: Optional[float] = None
    update_timestamps: list[datetime] = field(default_factory=list)

Key properties:

  • update_timestamps β€” every progress update appends a timestamp. The lease exposes median_update_interval, the median seconds between updates, which powers cadence-based recovery (see Resilience and Task Recovery System).

  • grace_period_seconds β€” per-lease adaptive grace, set by the current progressive timeout phase.

  • renewal_count β€” used by renewal decay and by the stuck-task detector.

Aggressive Defaults#

All tuning lives in TaskLeaseSettings in src/config/marcus_config.py. The current defaults are:

Setting

Default

Meaning

default_hours

0.025

~90 seconds base lease.

grace_period_minutes

0.5

30 seconds of grace after expiry.

min_lease_hours

0.0167

60 seconds β€” the floor.

max_lease_hours

0.0833

5 minutes β€” the ceiling.

warning_hours

0.01

~36 seconds before expiry, emit a warning.

max_renewals

10

Safety cap on renewal count.

stuck_threshold_renewals

5

Flag for stuck-task detection.

silence_multiplier

1.5

Cadence threshold multiplier.

enable_adaptive

true

Enable progressive phases.

renewal_decay_factor

0.9

Decay applied on each renewal.

The dict-path fallback in server.py mirrors these values so config-less startup still matches the dataclass defaults.

Progressive Timeout Phases#

When default_hours < 1.0 (the aggressive mode path), create_lease and renew_lease call calculate_adaptive_timeout to pick a phase-specific lease + grace based on where the task is in its lifecycle:

Phase

Trigger

Lease

Grace

Total

Rationale

1. Unproven

0 updates

60s

20s

80s

Detect startup failures fast.

2. Working

1 update

90s

30s

120s

Agent is alive, be moderate.

3. Proven

25–75% progress

120s

30s

150s

Protect in-flight work.

4. Finishing

>75% progress

60s

15s

75s

Detect final stalls quickly.

A new task starts in Phase 1. As soon as the first report_task_progress arrives, the lease transitions to Phase 2 and subsequent phases follow the progress percentage.

Lease Lifecycle#

Task Assigned
     β”‚
     β–Ό
create_lease() ──────────► Phase 1 lease (60s + 20s grace)
     β”‚
     β”‚ ◄─── touch_lease(agent_id)    (any MCP tool call extends lease)
     β”‚
     β–Ό
Agent reports progress
     β”‚
     β–Ό
renew_lease(task_id, progress) ──► Phase 2–4 based on progress
     β”‚
     β”œβ”€β”€β”€β”€ on each renewal: update_timestamps append
     β”‚     renewal_count += 1
     β”‚     renewal_decay applied
     β”‚
     β–Ό
Task completes β†’ lease removed from active_leases

touch_lease(agent_id)#

A cheap extension triggered by the MCP dispatch loop in handlers.py. Any tool call from an agent (including log_decision, log_artifact, report_blocker, get_task_context, etc.) counts as a touch β€” the agent proves it is alive by working. No explicit heartbeat is ever required.

# src/marcus_mcp/handlers.py
agent_id = arguments.get("agent_id") if arguments else None
if agent_id and hasattr(state, "lease_manager") and state.lease_manager:
    await state.lease_manager.touch_lease(agent_id)

renew_lease(task_id, progress, message)#

The full renewal path used by report_task_progress. It appends a new update_timestamps entry, recomputes the phase, applies decay, and pushes out lease_expires.

Lease Recreation on Progress Report#

If report_task_progress runs and the lease is missing (because the monitor already recovered the task due to a cadence false-positive), the progress handler recreates the lease instead of failing:

# src/marcus_mcp/tools/task.py
renewed_lease = await state.lease_manager.renew_lease(task_id, progress, message)
if renewed_lease is None:
    # Lease was recovered β€” recreate so monitoring resumes.
    task_obj = next((t for t in state.project_tasks if t.id == task_id), None)
    new_lease = await state.lease_manager.create_lease(task_id, agent_id, task_obj)

This makes false positives self-healing: the next progress update re-attaches a lease and the monitor resumes watching.

recover_expired_lease(lease)#

Resets the task to TODO, clears assigned_to, builds a RecoveryInfo, dual-writes to the board (field + comment), removes the lease from active_leases, and finally invokes on_recovery_callback(agent_id, task_id).

See Resilience and Task Recovery System for the full recovery flow, including RecoveryInfo, worktree-aware handoff instructions, and the cadence check that gates recovery.

LeaseMonitor#

class LeaseMonitor:
    """Background asyncio task that polls every 60 seconds."""

    check_interval = 60  # seconds

On each poll it walks active_leases, identifies expired leases (past grace), calls should_recover_expired_lease for the cadence-aware check, and invokes recover_expired_lease when the check says to recover.

Lazy start on the correct event loop#

The monitor must not be started during server setup. The HTTP transport spins up its own event loop per request context, and a monitor task created during setup is bound to a loop that no longer exists by the time a request arrives. Instead the server exposes ensure_lease_monitor_running() and the first request_next_task call lazily starts the monitor on the live request loop:

# src/marcus_mcp/tools/task.py
if hasattr(state, "ensure_lease_monitor_running"):
    await state.ensure_lease_monitor_running()

Recovery Callback#

AssignmentLeaseManager does not import server state. The server sets a callback at initialization so in-memory assignment tracking (agent_tasks, tasks_being_assigned) is cleaned whenever a lease is recovered:

# src/marcus_mcp/server.py
def _on_recovery(agent_id: str, task_id: str) -> None:
    if agent_id in self.agent_tasks:
        del self.agent_tasks[agent_id]
    self.tasks_being_assigned.discard(task_id)

self.lease_manager.on_recovery_callback = _on_recovery

Without this, the assignment filter would keep the recovered task marked as taken and no agent could pick it up.

Adaptive Duration (Conservative Mode Only)#

When default_hours >= 1.0, the manager falls back to classic adaptive duration math using the task’s estimated_hours, priority_multipliers, and complexity_multipliers, bounded by min_lease_hours and max_lease_hours. The Marcus production default is aggressive mode, so this path is primarily a configuration escape hatch for long-running non-agent workflows.

Priority Multipliers#

"priority_multipliers": {
    "critical": 0.5,
    "high": 0.75,
    "medium": 1.0,
    "low": 1.5,
}

Complexity Multipliers#

"complexity_multipliers": {
    "simple": 0.5,
    "complex": 1.5,
    "research": 2.0,
    "epic": 3.0,
}

Note on default inconsistency: AssignmentLeaseManager.__init__ has its own inline default of max_lease_hours = 0.0333 (120 seconds), which differs from the TaskLeaseSettings dataclass default of 0.0833 (5 minutes). When the server initializes the manager using TaskLeaseSettings, the TaskLeaseSettings values take effect. The manager’s inline default only applies if the manager is constructed without configuration arguments.

Renewal Decay#

# With decay factor 0.9:
# Renewal 0: 1.00 * base
# Renewal 1: 0.90 * base
# Renewal 2: 0.81 * base
# ...
# bounded by min_lease_hours

Decay discourages tasks from being held indefinitely by agents that keep renewing without finishing.

Configuration#

{
  "task_lease": {
    "default_hours": 0.025,
    "max_renewals": 10,
    "warning_hours": 0.01,
    "grace_period_minutes": 0.5,
    "renewal_decay_factor": 0.9,
    "min_lease_hours": 0.0167,
    "max_lease_hours": 0.0833,
    "stuck_threshold_renewals": 5,
    "silence_multiplier": 1.5,
    "enable_adaptive": true,
    "priority_multipliers": {
      "critical": 0.5,
      "high": 0.75,
      "medium": 1.0,
      "low": 1.5
    },
    "complexity_multipliers": {
      "simple": 0.5,
      "complex": 1.5,
      "research": 2.0,
      "epic": 3.0
    }
  }
}

Project-specific overrides live under each project in the project registry; the global fallback lives in src/config/marcus_config.py.

Statistics and Health#

AssignmentLeaseManager.get_statistics() returns counts for active, expiring-soon, expired, and stuck leases plus average and max renewal counts. These are exposed through the ping health tool for operational visibility.