Contract-First Pipeline#

Status#

Field	Value
Status	Implemented
Version	1.0
Date	2026-04-11
Issue	GH-320 (PRs #330-#335)

Problem#

Feature-based decomposition produces Single-Author Products on tightly-coupled projects. One agent absorbs shared infrastructure and the contribution split skews to 90/10 or worse. See Contract-First Decomposition for the conceptual background.

Solution#

Contract-first decomposition generates interface contracts between functional domains before splitting the project into tasks. The pipeline lives in _try_contract_first_decomposition in src/integrations/nlp_tools.py and integrates with the existing design autocomplete system via a pre_generated_content parameter on _run_design_phase.

Pipeline Stages#

_try_contract_first_decomposition()
  ├── 1. Domain discovery from PRD analysis
  ├── 2. _generate_contracts_by_domain()
  │     └── one LLM call per domain, FORBIDDEN PATTERNS enforced
  ├── 3. check_contract_cross_file_consistency()  [Invariant 5]
  │     ├── FAIL → fallback to feature-based
  │     └── PASS → continue
  ├── 4. decompose_by_contract()
  │     └── splits requirements into domain-scoped tasks
  ├── 5. Ghost synthesis
  │     └── one DONE design task per domain
  └── 6. Return tasks + pre_generated_content for Phase B

Stage 1: Domain Discovery#

The PRD analysis identifies functional domains in the project. Each domain represents a coherent area of responsibility (e.g., “weather-service”, “time-widget”, “dashboard-layout”). Domain boundaries are derived from the project requirements, not from file structure or technology choices.

Stage 2: Contract Generation#

_generate_contracts_by_domain makes one LLM call per domain. Each call produces an interface contract document specifying:

Data types shared with other domains
API surface (endpoints, function signatures, event schemas)
Integration points (what this domain consumes from others)

Scope clamping (PR #330): The _ARTIFACT_PROMPT and _INTERFACE_CONTRACTS_PROMPT templates include FORBIDDEN PATTERNS that prevent the LLM from generating contracts outside the domain’s scope. Without this, the LLM would routinely generate contracts for adjacent domains, causing duplication and divergence.

The generated contracts are stored as an artifacts dict keyed by domain name on the NaturalLanguageProjectCreator instance.
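
The per-domain loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in nlp_tools.py: `build_contract_prompt`, `call_llm`, and the prompt wording are hypothetical, and the real `_ARTIFACT_PROMPT` and `_INTERFACE_CONTRACTS_PROMPT` templates are not reproduced here.

```python
from typing import Callable, Dict, List


def build_contract_prompt(domain: str, all_domains: List[str]) -> str:
    """Build a scope-clamped prompt for one domain (hypothetical wording)."""
    others = [d for d in all_domains if d != domain]
    forbidden = "\n".join(f"- Do not define contracts for '{d}'" for d in others)
    return (
        f"Write the interface contract for the '{domain}' domain only.\n"
        f"FORBIDDEN PATTERNS:\n{forbidden}"
    )


def generate_contracts_by_domain(
    domains: List[str], call_llm: Callable[[str], str]
) -> Dict[str, str]:
    """Make one LLM call per domain and key the results by domain name."""
    artifacts: Dict[str, str] = {}
    for domain in domains:
        artifacts[domain] = call_llm(build_contract_prompt(domain, domains))
    return artifacts


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM client so the sketch runs offline."""
    return f"contract generated from a {len(prompt)}-character prompt"


contracts = generate_contracts_by_domain(["weather-service", "time-widget"], fake_llm)
```

Passing the LLM client in as a callable keeps the loop testable without network access.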

Stage 3: Invariant 5 Gate#

check_contract_cross_file_consistency (in src/integrations/contract_validation.py) scans all generated contracts for type contradictions. A type contradiction occurs when two domains define the same named type with incompatible field types (e.g., domain A says WidgetPosition.x: int while domain B says WidgetPosition.x: string).

If contradictions are found: the entire contract-first attempt is abandoned and the system falls back to feature-based decomposition. The rationale is that type contradictions are correctness failures that no amount of downstream work can fix.
If no contradictions: the pipeline continues.

The filter matches contracts by filename pattern ("interface-contracts" in filename), not by artifact_type. This is important because the live generator emits artifact_type="specification" despite the contracts being interface contracts by content. This mismatch was caught by Codex P1 review.
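
A minimal sketch of the gate, assuming contracts declare fields in the `Type.field: type` form used in the example above and that "incompatible" means the declared types differ as strings. The function name, filenames, and declaration syntax are assumptions for illustration; the real check in contract_validation.py may parse contracts differently.

```python
import re
from typing import Dict, List, Tuple

# Assumed declaration syntax, borrowed from the example above: "Type.field: type".
_FIELD_DECL = re.compile(r"\b([A-Z]\w*)\.(\w+):\s*(\w+)")


def find_type_contradictions(files: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (field, first_type, conflicting_type) for each contradiction."""
    seen: Dict[str, str] = {}
    contradictions: List[Tuple[str, str, str]] = []
    for filename, text in files.items():
        # Match on the filename, not on artifact_type.
        if "interface-contracts" not in filename:
            continue
        for type_name, field, field_type in _FIELD_DECL.findall(text):
            key = f"{type_name}.{field}"
            previous = seen.setdefault(key, field_type)
            if previous != field_type:
                contradictions.append((key, previous, field_type))
    return contradictions


files = {
    "dashboard-layout-interface-contracts.md": "WidgetPosition.x: int",
    "time-widget-interface-contracts.md": "WidgetPosition.x: string",
    "notes.md": "WidgetPosition.x: float",  # ignored: filename does not match
}
conflicts = find_type_contradictions(files)
use_contract_first = not conflicts  # any contradiction triggers the fallback
```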

For the full specification, see Contract Validation.

Stage 4: Decomposition#

decompose_by_contract receives the validated contracts and splits the project requirements into domain-scoped implementation tasks. Each task:

Owns one domain
References its domain’s contract as the source of truth for interfaces
Has a dependency on its domain’s design ghost (Stage 5)

Stage 5: Ghost Synthesis#

For each usable contract domain, Marcus synthesizes one DONE design ghost task:

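domain = "weather-service"  # example domain name
ghost_task = {
    "name": f"Design {domain}",
    "assigned_to": "Marcus",
    "labels": ["design", "auto_completed"],
    "source_type": "contract_first_design",  # provenance marker
    "status": "done",  # created already completed
}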

Implementation tasks are linked to their domain’s ghost via source_context["contract_file"]. This provides:

Cato observability: ghosts appear in the DAG, showing that contract generation happened
Artifact discovery: implementation agents call get_task_context, which walks dependencies and finds contract artifacts on the ghost
Provenance: source_type="contract_first_design" identifies ghosts as contract-generated rather than manually created
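
The ghost-to-implementation link and the dependency walk can be illustrated as below. This is a sketch only: the `id`, `artifacts`, and `dependencies` fields, the id scheme, the filenames, and `collect_dependency_artifacts` are hypothetical stand-ins, and the real `get_task_context` is not reproduced here.

```python
from typing import Dict, List


def synthesize_ghost(domain: str, contract_file: str) -> dict:
    """Build one already-completed design task for a domain."""
    return {
        "id": f"ghost-{domain}",  # hypothetical id scheme
        "name": f"Design {domain}",
        "assigned_to": "Marcus",
        "labels": ["design", "auto_completed"],
        "source_type": "contract_first_design",
        "status": "done",
        "artifacts": [contract_file],  # hypothetical field
        "dependencies": [],
    }


def collect_dependency_artifacts(task_id: str, tasks: Dict[str, dict]) -> List[str]:
    """Walk the dependency graph and gather artifacts from upstream tasks."""
    found: List[str] = []
    stack = list(tasks[task_id]["dependencies"])
    visited = set()
    while stack:
        dep_id = stack.pop()
        if dep_id in visited:
            continue
        visited.add(dep_id)
        found.extend(tasks[dep_id].get("artifacts", []))
        stack.extend(tasks[dep_id]["dependencies"])
    return found


ghost = synthesize_ghost("time-widget", "time-widget-interface-contracts.md")
impl = {
    "id": "impl-time-widget",
    "name": "Implement time-widget",
    "dependencies": [ghost["id"]],
    "source_context": {"contract_file": "time-widget-interface-contracts.md"},
}
tasks = {ghost["id"]: ghost, impl["id"]: impl}
artifacts = collect_dependency_artifacts("impl-time-widget", tasks)
```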

Label design (Codex P2 fix): An earlier design put a shared contract_first label on all ghosts. Because label matching uses set intersection, this caused SafetyChecker._find_related_tasks to over-link every implementation task to every ghost. The fix was to drop the shared label entirely. Provenance lives on source_type, not labels.
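
The over-linking effect is easy to reproduce with plain sets. The sketch below is not `SafetyChecker._find_related_tasks`; `find_related_by_labels` is a hypothetical reduction of it to the intersection test, and it assumes implementation tasks also carried the shared label in the earlier design.

```python
from typing import Dict, List, Set


def find_related_by_labels(task_labels: Set[str], others: Dict[str, Set[str]]) -> List[str]:
    """Return every task that shares at least one label (set intersection)."""
    return sorted(name for name, labels in others.items() if task_labels & labels)


# Earlier design: every ghost (and, we assume, every implementation task)
# carried the shared contract_first label.
old_ghosts = {
    "Design weather-service": {"design", "auto_completed", "contract_first"},
    "Design time-widget": {"design", "auto_completed", "contract_first"},
}
old_links = find_related_by_labels({"implementation", "contract_first"}, old_ghosts)

# Current design: the shared label is gone, so label matching finds nothing.
new_ghosts = {name: labels - {"contract_first"} for name, labels in old_ghosts.items()}
new_links = find_related_by_labels({"implementation"}, new_ghosts)
```

With the shared label, the single implementation task matches both ghosts. Without it, label matching returns no links, leaving the explicit per-domain link as the only connection.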

Stage 6: Phase B Handoff#

The generated contracts are passed to _run_design_phase via the pre_generated_content parameter. When this parameter is supplied:

Phase A is skipped entirely (no additional LLM calls needed; contracts already exist)
Phase B runs normally, registering the pre-generated contracts as artifacts via log_artifact and log_decision

This reuses the existing design autocomplete infrastructure without modification. See Design Autocomplete for the Phase A/B lifecycle.
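
The effect of `pre_generated_content` can be sketched as a simple branch. The function below is a simplified, synchronous stand-in for `_run_design_phase`: the `generate` and `register` callables are hypothetical placeholders for the Phase A LLM calls and the `log_artifact`/`log_decision` registration.

```python
from typing import Callable, Dict, List, Optional


def run_design_phase(
    domains: List[str],
    generate: Callable[[str], str],
    register: Callable[[str, str], None],
    pre_generated_content: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Phase A generates content unless it is supplied; Phase B registers it."""
    if pre_generated_content is not None:
        content = pre_generated_content  # Phase A skipped: no LLM calls
    else:
        content = {domain: generate(domain) for domain in domains}  # Phase A
    for domain, text in content.items():  # Phase B runs either way
        register(domain, text)
    return content


llm_calls: List[str] = []
registered: List[str] = []


def fake_generate(domain: str) -> str:
    llm_calls.append(domain)
    return f"design for {domain}"


def fake_register(domain: str, text: str) -> None:
    registered.append(domain)


run_design_phase(
    ["time-widget"],
    fake_generate,
    fake_register,
    pre_generated_content={"time-widget": "contract text"},
)
```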

Fallback Behavior#

If any stage fails, the system falls back gracefully:

Failure	Behavior
Domain discovery returns no domains	Feature-based decomposition
LLM call fails during contract generation	Feature-based decomposition
Invariant 5 detects type contradiction	Feature-based decomposition
`decompose_by_contract` fails	Feature-based decomposition

Fallback is always to feature-based decomposition, never to an error state. The experiment continues regardless of which decomposition strategy runs.
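
One way to get this "never an error state" behavior is a single wrapper around the whole contract-first attempt. The sketch below assumes the contract-first path signals failure either by raising or by returning nothing; `decompose_with_fallback` and both callables are hypothetical names, not the actual functions in nlp_tools.py.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def decompose_with_fallback(
    contract_first: Callable[[], Optional[List[dict]]],
    feature_based: Callable[[], List[dict]],
) -> List[dict]:
    """Try contract-first; on an exception or an empty result use feature-based."""
    try:
        tasks = contract_first()
    except Exception as exc:  # broad on purpose: no failure may escape
        logger.warning("Contract-first failed, falling back: %s", exc)
        tasks = None
    return tasks if tasks else feature_based()


def failing_contract_first() -> Optional[List[dict]]:
    raise RuntimeError("Invariant 5: type contradiction")


def feature_based() -> List[dict]:
    return [{"name": "Build dashboard", "strategy": "feature_based"}]


tasks = decompose_with_fallback(failing_contract_first, feature_based)
```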

Supporting Fixes (PR #331, #333)#

Three pre-existing bugs, spread across two PRs, were fixed as part of the GH-320 work because they affected contract-first experiment evaluation.

Validator Parser Rewrite (PR #331)#

_parse_text_response in src/ai/validation/work_analyzer.py was silently dropping the evidence and remediation fields for every validation issue. Only a ### CRITERION header should start a new block, but the parser also treated ### Evidence and ### Remediation subheadings as block terminators, so the text under them was cut off from the issue it belonged to.

The fix is a two-pass block-based parser:

First pass: split text into blocks at ### CRITERION headers only (not generic markdown subheadings)
Second pass: within each block, case-insensitive keyword scan extracts evidence and remediation

A prose fallback handles LLM responses that do not follow the expected heading structure.
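
The two passes can be sketched as below, assuming a response layout in which each issue starts with a `### CRITERION` line followed by `### Evidence` and `### Remediation` subsections. `parse_issues`, the regular expressions, and the sample text are illustrative assumptions rather than the code in work_analyzer.py, and the prose fallback is omitted.

```python
import re
from typing import Dict, List

_CRITERION_SPLIT = re.compile(r"^(?=### CRITERION)", re.MULTILINE)
_SECTION = re.compile(
    r"^#+[ \t]*(evidence|remediation)\b[^\n]*\n(.*?)(?=^#|\Z)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def parse_issues(text: str) -> List[Dict[str, str]]:
    """Split into criterion blocks, then extract fields inside each block."""
    issues: List[Dict[str, str]] = []
    # First pass: only "### CRITERION" headers start a new block.
    for block in _CRITERION_SPLIT.split(text):
        if not block.startswith("### CRITERION"):
            continue  # preamble before the first criterion
        title = block.splitlines()[0][len("### CRITERION"):].strip(" :")
        issue = {"criterion": title, "evidence": "", "remediation": ""}
        # Second pass: case-insensitive keyword scan inside the block.
        for keyword, body in _SECTION.findall(block):
            issue[keyword.lower()] = body.strip()
        issues.append(issue)
    return issues


sample = (
    "### CRITERION: Error handling\n"
    "Missing retries.\n"
    "### Evidence\n"
    "No try/except in fetch().\n"
    "### REMEDIATION\n"
    "Wrap the call.\n"
    "### CRITERION: Logging\n"
    "### evidence\n"
    "print() used.\n"
)
issues = parse_issues(sample)
```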

Integration Task Suppression Fix (PR #333)#

should_add_integration_task in src/integrations/integration_verification.py used a substring match on "test" to detect test-only projects. This suppressed integration tasks for any project description mentioning “test suite”, “unit tests”, “test coverage”, or similar phrases.

The fix uses word-boundary regex plus compound phrase scrubbing: known compound phrases like “test suite” and “unit tests” are stripped before checking for standalone “test” as a keyword.
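
A minimal sketch of the keyword check, assuming the compound-phrase list contains the three phrases named above (the real list may differ). `mentions_standalone_test` is a hypothetical helper; how `should_add_integration_task` uses the result is not shown.

```python
import re

# Compound phrases taken from the text above; the real list may be longer.
_COMPOUND_PHRASES = ("test suite", "unit tests", "test coverage")
_STANDALONE_TEST = re.compile(r"\btest\b", re.IGNORECASE)


def mentions_standalone_test(description: str) -> bool:
    """Return True only when "test" appears as its own keyword."""
    scrubbed = description.lower()
    for phrase in _COMPOUND_PHRASES:
        # Strip known compounds first so their "test" does not count.
        scrubbed = scrubbed.replace(phrase, " ")
    return bool(_STANDALONE_TEST.search(scrubbed))


with_compounds = mentions_standalone_test("A dashboard with unit tests and a test suite")
with_substring = mentions_standalone_test("Deploy the latest build")
standalone = mentions_standalone_test("This is a test project")
```

The word boundary alone already rejects "latest" and "tests"; the scrubbing step is what handles phrases such as “test suite”, where "test" is a whole word.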

Task Type Breakdown Fix (PR #333)#

The task type breakdown in experiment logging always showed {'unknown': N} because getattr(task, "task_type", "unknown") read a non-existent attribute. The fix extracts a _task_type_breakdown helper that uses EnhancedTaskClassifier to infer task types from task metadata.
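
The bug and the shape of the fix can be shown with a toy task class. `Task`, `classify_by_keywords`, and `task_type_breakdown` are hypothetical stand-ins: the real helper delegates to EnhancedTaskClassifier, whose API is not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """Toy task with no task_type attribute, like the one in the bug."""

    name: str
    labels: List[str] = field(default_factory=list)


def classify_by_keywords(task: Task) -> str:
    """Toy stand-in for a classifier that infers a type from metadata."""
    text = " ".join([task.name, *task.labels]).lower()
    for task_type in ("design", "implement"):
        if task_type in text:
            return task_type
    return "other"


def task_type_breakdown(
    tasks: List[Task], classify: Callable[[Task], str]
) -> Dict[str, int]:
    """Count tasks per inferred type instead of reading a missing attribute."""
    return dict(Counter(classify(task) for task in tasks))


tasks = [Task("Design time-widget", ["design"]), Task("Implement weather-service")]

# Buggy version: the attribute never exists, so everything is "unknown".
broken = dict(Counter(getattr(task, "task_type", "unknown") for task in tasks))
fixed = task_type_breakdown(tasks, classify_by_keywords)
```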

Skill Integration (PR #332)#

The /marcus skill accepts a --decomposer contract_first flag:

skills/marcus/SKILL.md documents the flag
dev-tools/experiments/templates/config.yaml.template includes a decomposer field
spawn_agents.py forwards project_options (including decomposer) via json.dumps to create_project
No runner code changes were needed because the project_options forwarding path already existed

Test-Mode Event Suppression#

MarcusServer.publish_event and Memory.__init__ use fire-and-forget asyncio.create_task calls that produce “Task was destroyed but it is pending” warnings at test teardown. These are suppressed in test mode to eliminate noise in the test suite. This is not specific to contract-first but was fixed as part of the GH-320 test health work (PR #323).
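
One plausible reading of the suppression is that the background task is simply not created in test mode. The sketch below illustrates that reading only: `EventPublisher`, its `test_mode` constructor flag, and `_deliver` are hypothetical, and the way Marcus actually detects test mode is not shown here.

```python
import asyncio
from typing import List, Optional


class EventPublisher:
    """Toy publisher with a fire-and-forget delivery path."""

    def __init__(self, test_mode: bool = False) -> None:
        self.test_mode = test_mode
        self.delivered: List[str] = []

    async def _deliver(self, event: str) -> None:
        await asyncio.sleep(0)  # yield once, as real I/O would
        self.delivered.append(event)

    def publish_event(self, event: str) -> Optional[asyncio.Task]:
        if self.test_mode:
            return None  # no background task, so nothing is pending at teardown
        return asyncio.get_running_loop().create_task(self._deliver(event))


async def demo() -> tuple:
    quiet = EventPublisher(test_mode=True)
    live = EventPublisher()
    skipped = quiet.publish_event("task_created")
    task = live.publish_event("task_created")
    await task  # awaited here only so the demo can inspect the result
    return skipped, live.delivered, quiet.delivered


skipped, live_events, quiet_events = asyncio.run(demo())
```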

Implementation Files#

File	Purpose
`src/integrations/nlp_tools.py`	Pipeline orchestration, contract generation, ghost synthesis, Phase B handoff
`src/integrations/contract_validation.py`	Invariant 5 cross-contract consistency check
`src/integrations/integration_verification.py`	Integration task generation with “test” substring fix
`src/ai/validation/work_analyzer.py`	Validator parser (two-pass block-based)
`src/marcus_mcp/server.py`	Test-mode event suppression
`src/core/memory.py`	Test-mode persistence task suppression
`skills/marcus/SKILL.md`	`--decomposer` flag documentation
`dev-tools/experiments/templates/config.yaml.template`	Decomposer config field

Test Files#

File	Test count	What is covered
`tests/unit/integrations/test_contract_first_fallback.py`	14	Decomposition fallback, ghost synthesis, safety checker
`tests/unit/integrations/test_contract_validation.py`	7	Invariant 5 type contradiction detection
`tests/unit/integrations/test_design_autocomplete.py`	3	`pre_generated_content` Phase A skip
`tests/unit/integrations/test_nlp_tools.py`	3	Task type breakdown helper
`tests/unit/ai/validation/test_work_analyzer.py`	18	Validator parser block splitting and extraction
`tests/unit/integrations/test_integration_verification.py`	43	Integration task generation edge cases