Harness Engineering
DevSpark's harness runtime is an optional CLI execution layer for repeatable engineering workflows. It is additive: the prompt-first slash-command workflow remains unchanged, while the CLI adds a way to validate, execute, and inspect declarative workflow specs.
This page documents what is currently implemented in the repository.
CLI required: everything on this page requires the DevSpark CLI. Install it once with:
uv tool install devspark-cli --force --from git+https://github.com/markhazleton/devspark.git
Prompt-first slash commands (/devspark.*) work without the CLI and are documented in the Implementation Lifecycle Guide.
devspark.run — Development Workflow Aliases
devspark run <alias> is the fastest path through the spec-driven development cycle when the CLI is installed. It chains atomic prompts into a single terminal command with built-in pause points and structured artifact output.
These are CLI-only commands. There is no /devspark.run slash command and no backing file exists in .claude/commands/. Without the CLI, run the atomic commands manually in sequence (see Without the CLI below).
Available Aliases
| Alias | Chains | Pause point | Output |
|---|---|---|---|
| create-spec | specify → plan → tasks → analyze | After analyze: review before implementing | Reviewable spec artifact |
| execute-plan | implement → create-pr → pr-review | After create-pr: confirm PR before review runs | Pull request |
| suggest-improvement | capture-context → classify-improvement → create-issue → (assign-agent) → (implement) | None by default; pass --yes to skip confirmation | GitHub issue link |
Usage
# Start a new feature from scratch
devspark run create-spec
# Execute an existing plan through to a reviewed PR
devspark run execute-plan
# File a workflow improvement against markhazleton/devspark
devspark run suggest-improvement
devspark run suggest-improvement --yes # skip confirmation prompt
Pause and Resume
create-spec and execute-plan pause at defined checkpoints so a human can review before the workflow continues. When a pause fires, the CLI prints the exact resume command:
devspark resume <run_id>
Pause state is saved at .documentation/telemetry/runs/<run_id>.json. On resume, DevSpark verifies the persisted schema_version, workflow_id, and context_checksum — any mismatch exits with code 25 (EXIT_RESUME_FAILED).
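The resume check described above can be sketched as a comparison of the three guarded fields. This is a minimal illustration, not the DevSpark implementation: the `verify_resume` and `context_checksum` helpers, the SHA-256 checksum scheme, and the sample field values are all assumptions. Only the three field names and exit code 25 come from the text.

```python
import hashlib
import json

EXIT_RESUME_FAILED = 25  # exit code named in the text above


def context_checksum(context: dict) -> str:
    """Hypothetical checksum: SHA-256 over a canonical JSON dump of the context."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_resume(persisted: dict, expected: dict) -> int:
    """Return 0 when all guarded fields match, otherwise EXIT_RESUME_FAILED."""
    for field in ("schema_version", "workflow_id", "context_checksum"):
        if persisted.get(field) != expected.get(field):
            return EXIT_RESUME_FAILED
    return 0


# Illustrative pause state; real run files may carry more fields.
context = {"feature": "login", "branch": "001-login"}
saved = {
    "schema_version": 1,
    "workflow_id": "create-spec",
    "context_checksum": context_checksum(context),
}

ok = verify_resume(saved, dict(saved))
tampered = verify_resume(saved, {**saved, "workflow_id": "execute-plan"})
print(ok, tampered)
```

Checking every field before resuming means a run file that was edited by hand or produced by a different workflow is rejected up front instead of failing halfway through.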
Active paused runs can be listed with:
devspark runs list
Full Development Cycle with devspark.run
The two aliases cover the entire feature lifecycle when used in sequence:
devspark run create-spec
# → review analyze output, then resume or continue:
devspark run execute-plan
# → review and merge PR, then release:
devspark release <version>
For the full command order including release, see the Full Development → Release Cycle on the home page.
Without the CLI
Use the atomic slash commands directly in your agent:
| devspark.run alias | Manual equivalent (no CLI required) |
|---|---|
| create-spec | /devspark.specify → /devspark.plan → /devspark.tasks → /devspark.analyze |
| execute-plan | /devspark.implement → /devspark.create-pr → /devspark.pr-review |
| suggest-improvement | /devspark.specify with improvement framing, then file the issue manually |
When to Use It
Use the harness runtime when you need terminal-driven execution, repeatable local validation, or a structured audit trail for a workflow that should run the same way more than once.
Good fits:
- validate a harness spec before using it in a repeatable workflow
- run a repo or app-scoped engineering sequence and capture artifacts
- inspect why a prior run failed, retried, or aborted
- verify adapter availability on a new machine
Less suitable fits:
- ad hoc product work that already fits the prompt-first /devspark.* flow
- one-off changes where a full execution spec would add more overhead than value
Command Surface
These are CLI commands, not slash commands.
devspark doctor
devspark harness validate sample.harness.yaml
devspark harness run sample.harness.yaml --dry-run
devspark harness trace latest
devspark adapter list
devspark adapter default claude_code
devspark doctor
Checks whether the current environment is ready for harness workflows.
Current checks include:
- Python 3.11+
- pydantic importability
- compatible project layout
- readable and valid agents-registry.json
- git availability
- required local CLIs for agent integrations that declare requires_cli
The command accepts both installed-project layouts with .devspark/ and compatible source checkouts with .documentation/, pyproject.toml, and src/devspark_cli/.
devspark harness validate
Loads a YAML or JSON harness spec, validates it against the current Pydantic model and schema expectations, and exits without executing any steps.
Use it before committing a new spec or before a real run.
devspark harness run
Executes a harness spec sequentially, evaluates validations after each step, persists artifacts, and returns structured exit codes.
Important current behavior:
- exit codes are 0 complete, 1 failed, 2 aborted, 3 validation error
- --dry-run writes a run record without executing step actions
- --adapter overrides the adapter for executable steps
- --adapter-default uses the saved user default adapter when present
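A wrapper script that calls the harness may want readable names for the exit codes listed above. The sketch below is one way to model them. The `HarnessExit` enum and `describe_exit` helper are hypothetical names, not part of the DevSpark CLI. Only the four code values and their meanings come from this page.

```python
from enum import IntEnum


class HarnessExit(IntEnum):
    """Exit codes documented for `devspark harness run` (names are illustrative)."""

    COMPLETE = 0
    FAILED = 1
    ABORTED = 2
    VALIDATION_ERROR = 3


def describe_exit(code: int) -> str:
    """Map a process return code to a readable label."""
    try:
        return HarnessExit(code).name.lower()
    except ValueError:
        # Codes outside the documented set are reported as-is.
        return f"unknown ({code})"


for code in (0, 1, 2, 3, 25):
    print(code, describe_exit(code))
```

In a CI script, the return code of the harness process could be passed to `describe_exit` to produce a log line without hard-coding numbers in several places.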
devspark harness trace
Reads events.jsonl from a prior run and renders the recorded event stream. Use an explicit run ID or latest.
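Because events.jsonl is a JSON Lines file, it can also be inspected outside the CLI. The following sketch reads one JSON object per line and prints a compact view. The event field names (`ts`, `event`, `step`) and the sample records are assumptions made for illustration; the actual event schema is not documented on this page.

```python
import json
import tempfile
from pathlib import Path


def read_events(path: Path) -> list[dict]:
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    events = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


def render(events: list[dict]) -> list[str]:
    """Format events as text lines; the field names here are hypothetical."""
    return [
        f"{e.get('ts', '-')} {e.get('event', '?')} {e.get('step', '')}".rstrip()
        for e in events
    ]


# Demo with a throwaway file standing in for a real run directory.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "events.jsonl"
    log.write_text(
        '{"ts": "t0", "event": "run_started"}\n'
        '{"ts": "t1", "event": "step_completed", "step": "build"}\n',
        encoding="utf-8",
    )
    rendered = render(read_events(log))

print("\n".join(rendered))
```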
devspark adapter list
Lists the built-in adapters, whether each is available on the current machine, and the currently saved default.
devspark adapter doctor
Produces normalized readiness states for each adapter:
- ready
- write_approval_required
- write_incompatible
- unavailable
Use this before hands-off lifecycle runs to confirm the selected adapter can execute write-required stages without interactive approval.
devspark adapter default
Persists a local default adapter in the user's config directory. This does not modify .devspark/ or .documentation/, so upgrades do not overwrite the preference.
Built-In Adapters
The current built-in adapters are:
- noop
- manual
- claude_code
- copilot
- cursor
noop
Safe default for contract tests, dry runs, and environments without an AI tool installed.
manual
Displays the prompt for a human operator and waits for an acknowledgement keypress. It requires a TTY. In non-interactive contexts it fails clearly instead of silently skipping the gate.
claude_code, copilot, cursor
These adapters call the corresponding local CLI if it is installed. Prompt content is sent through standard input rather than as a command-line argument, which avoids Windows command-length issues for larger prompts.
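The stdin approach described above can be illustrated with Python's `subprocess.run`. This is a sketch of the general technique, not the adapters' actual code: the `send_prompt` helper is a hypothetical name, and a small Python one-liner stands in for the agent CLI so the example runs anywhere.

```python
import subprocess
import sys


def send_prompt(command: list[str], prompt: str) -> str:
    """Run a CLI and pass the prompt on standard input instead of argv."""
    completed = subprocess.run(
        command,
        input=prompt,          # delivered on stdin, so argv length limits do not apply
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


# Stand-in for an agent CLI: reads stdin and reports how many characters arrived.
stand_in = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]

large_prompt = "x" * 100_000  # far longer than a typical Windows command line allows
reply = send_prompt(stand_in, large_prompt).strip()
print(reply)
```

Passing the same prompt as a command-line argument would depend on the platform's maximum command length, which is the failure mode the adapters avoid.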
Spec Model
Harness specs are YAML or JSON documents with:
- apiVersion: devspark.ai/v1
- kind: HarnessSpec
- name
- scope
- defaults
- steps
- telemetry
The checked-in example is sample.harness.yaml.
Step types currently implemented:
- agent_task
- validation
- human_gate
Validation rule types currently implemented:
- always.pass
- file.exists
- file.contains
- command.exit_code
- json.schema
- git.clean
- regex.match
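To make the spec shape concrete, the sketch below checks a spec that has already been loaded into a Python dict. It is not the Pydantic model the CLI uses. It assumes every top-level key listed above is required and that each step names its type under a `type` key; both are assumptions, as are the `check_spec` helper and the sample step fields.

```python
REQUIRED_KEYS = ("apiVersion", "kind", "name", "scope", "defaults", "steps", "telemetry")
STEP_TYPES = {"agent_task", "validation", "human_gate"}


def check_spec(spec: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the shape looks right."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in spec]
    if spec.get("apiVersion") != "devspark.ai/v1":
        problems.append("apiVersion must be devspark.ai/v1")
    if spec.get("kind") != "HarnessSpec":
        problems.append("kind must be HarnessSpec")
    for index, step in enumerate(spec.get("steps", [])):
        # The 'type' key name is an assumption made for this sketch.
        if step.get("type") not in STEP_TYPES:
            problems.append(f"steps[{index}]: unknown type {step.get('type')!r}")
    return problems


good = {
    "apiVersion": "devspark.ai/v1",
    "kind": "HarnessSpec",
    "name": "sample",
    "scope": {"type": "repo"},
    "defaults": {},
    "steps": [{"id": "build", "type": "agent_task"}],
    "telemetry": {},
}
bad = {**good, "steps": [{"id": "build", "type": "shell"}]}

print(check_spec(good))
print(check_spec(bad))
```

For real specs, `devspark harness validate` remains the authoritative check; this sketch only shows the kind of structure it is validating.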
Scope Resolution
Harness runs support repository scope and application scope.
- scope.type: repo writes under the repository's .documentation/devspark/runs/
- scope.type: app requires a valid multi-app registry and resolves the documentation root through the existing scope system
Current guardrails:
- the repository root is derived from the spec path, not the caller's current working directory
- malformed or path-invalid multi-app registries fail clearly instead of being treated as missing
- ambiguous scope resolution is surfaced as a harness spec error
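The first guardrail, deriving the repository root from the spec path, can be sketched as a walk up the directory tree from the spec file. How DevSpark actually locates the root is not described here. Using a `.documentation` directory as the marker, and the `find_repo_root` helper itself, are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def find_repo_root(spec_path: Path, marker: str = ".documentation") -> Path:
    """Walk upward from the spec file until a directory containing the marker is found."""
    start = spec_path.resolve().parent
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no {marker} directory above {spec_path}")


# Demo layout: <repo>/.documentation and <repo>/specs/sample.harness.yaml
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    (repo / ".documentation").mkdir(parents=True)
    spec = repo / "specs" / "sample.harness.yaml"
    spec.parent.mkdir()
    spec.write_text("name: sample\n", encoding="utf-8")

    # The result depends only on the spec path, never on the current working directory.
    root = find_repo_root(spec)
    found_expected = root == repo.resolve()

print(found_expected)
```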
Run Artifacts
By default, telemetry writes to .documentation/devspark/runs/<run-id>/.
Current artifact layout includes:
- spec.resolved.yaml
- context.json
- events.jsonl
- result.json
- adapter-doctor.json
- decision-packet.json
- steps/<step-id>/prompt.md when a prompt was materialized
- steps/<step-id>/output.txt when an adapter produced output
- steps/<step-id>/stdout.txt for command.exit_code validation output
Conditional artifacts:
- no-change-explainer.md when the workflow completed but delivery evidence was unmet
- max-pass-failure-report.md when hands-off convergence reaches max passes without resolution
Runs are retained with a user-configurable limit. The default retention limit is 20.
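A retention limit like this is commonly enforced by deleting the oldest run directories once the count exceeds the limit. The sketch below shows that idea only. Ordering runs by modification time, and the `prune_runs` helper, are assumptions; the page does not say how DevSpark orders or removes runs.

```python
import os
import shutil
import tempfile
from pathlib import Path

DEFAULT_RETENTION = 20  # default limit stated above


def prune_runs(runs_dir: Path, limit: int = DEFAULT_RETENTION) -> list[str]:
    """Keep the `limit` most recently modified run directories and delete the rest."""
    run_dirs = [p for p in runs_dir.iterdir() if p.is_dir()]
    run_dirs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    removed = []
    for stale in run_dirs[limit:]:
        shutil.rmtree(stale)
        removed.append(stale.name)
    return sorted(removed)


# Demo: five fake runs with increasing timestamps, limit of three.
with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp)
    for index in range(5):
        run = runs / f"run-{index}"
        run.mkdir()
        os.utime(run, (1_000 + index, 1_000 + index))  # run-4 is the newest
    removed = prune_runs(runs, limit=3)
    remaining = sorted(p.name for p in runs.iterdir())

print(removed, remaining)
```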
Retry and Validation Behavior
After each executable step, the runner evaluates the declared validations.
- error-severity failures block success
- warning-severity failures are recorded but do not block the run
- retry policies can request another attempt on validation failure
- retry repair prompts append a ## Validation Errors section to the next adapter prompt
- requireHumanAfter can force a manual pause after a configured attempt count
If a run is interrupted, the current implementation preserves the artifacts already written and records the run as aborted.
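The retry and severity rules in this section can be sketched as a small loop. This is an illustration under stated assumptions, not the runner's code: `run_with_retry`, `Finding`, and `max_attempts` are hypothetical names, the adapter and validator are injected fakes, and requireHumanAfter is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    severity: str  # "error" or "warning"
    message: str


def run_with_retry(
    prompt: str,
    execute: Callable[[str], str],
    validate: Callable[[str], list[Finding]],
    max_attempts: int = 3,
) -> dict:
    """Execute a step, validate its output, and retry with a repair prompt on errors."""
    warnings: list[str] = []
    current = prompt
    for attempt in range(1, max_attempts + 1):
        findings = validate(execute(current))
        # Warnings are recorded but never block the run.
        warnings.extend(f.message for f in findings if f.severity == "warning")
        errors = [f.message for f in findings if f.severity == "error"]
        if not errors:
            return {"status": "complete", "attempts": attempt, "warnings": warnings}
        # Repair prompt: original prompt plus a "## Validation Errors" section.
        bullets = "\n".join(f"- {message}" for message in errors)
        current = f"{prompt}\n\n## Validation Errors\n{bullets}"
    return {"status": "failed", "attempts": max_attempts, "warnings": warnings}


# Fake adapter: only produces good output once it has seen the repair section.
def fake_execute(text: str) -> str:
    return "fixed" if "## Validation Errors" in text else "broken"


def fake_validate(output: str) -> list[Finding]:
    findings = [Finding("warning", "step was slow")]
    if output == "broken":
        findings.append(Finding("error", "tests failed"))
    return findings


result = run_with_retry("Implement the feature.", fake_execute, fake_validate)
print(result)
```

In the demo, the first attempt fails on an error-severity finding, the second attempt receives the appended section and passes, and the warning is recorded on both attempts without affecting the outcome.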
Operator Guidance
Recommended flow for a new spec:
- Run devspark doctor on the target machine.
- Validate the spec with devspark harness validate <spec.yaml>.
- Run a dry run first with devspark harness run <spec.yaml> --dry-run.
devspark harness run <spec.yaml> --dry-run. - Inspect the generated artifacts and the resolved spec.
- Execute a real run only after the adapter and validation behavior are what you expect.
For adapter-driven runs, prefer explicit adapters in the spec when reproducibility matters across machines. Use a saved adapter default when you want a machine-local convenience setting.
Hands-Off Troubleshooting
- If a run fails with write_incompatible_adapter, switch to a write-capable non-interactive adapter and rerun devspark adapter doctor.
- If delivery_status is unmet, review no-change-explainer.md and ensure changes exist under src/ or test/.
- If convergence fails after max passes, inspect max-pass-failure-report.md and resolve remaining findings manually before retrying.
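The delivery check behind delivery_status can be approximated as a test on the changed file paths. The sketch below follows the rule stated above (changes must exist under src/ or test/). The `delivery_status` function and the assumption that paths are repository-relative with forward slashes are illustrative, not taken from the implementation.

```python
from pathlib import PurePosixPath

DELIVERY_ROOTS = ("src", "test")  # directories named in the guidance above


def delivery_status(changed_paths: list[str]) -> str:
    """Return "met" if any changed path sits under src/ or test/, else "unmet"."""
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if parts and parts[0] in DELIVERY_ROOTS:
            return "met"
    return "unmet"


print(delivery_status(["docs/readme.md"]))
print(delivery_status(["docs/readme.md", "src/app.py"]))
```

A run that only touched documentation would report unmet under this rule, which matches the case where no-change-explainer.md is produced.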
Relationship to the Prompt Workflow
The harness runtime does not replace DevSpark's prompt-first lifecycle.
- use slash commands to define, plan, implement, review, and release work
- use the harness runtime when you need repeatable terminal-driven execution and traceable run artifacts
That separation is intentional: prompt workflows manage human and agent collaboration, while the harness runtime executes declarative engineering flows.
Test Coverage
The harness runtime is covered by two kinds of tests, both under tests/:
pytest test modules (run via pytest tests/)
These use standard def test_* functions and are picked up automatically by the test runner.
| File | Tests | What it covers |
|---|---|---|
| test_delivery_status_contract.py | 2 | Delivery gate logic: unmet when no src/ or test/ changes; met when src/ changes present |
| test_convergence_loop_contract.py | 2 | Finding state transitions (open, resolved, deferred); stage iteration record structure |
Run: pytest tests/ -v
Runnable contract scripts (run directly via python)
These use a main() entry point and validate end-to-end CLI behavior through typer.testing.CliRunner or subprocess. CI runs them in the contract-validation job.
| File | What it covers |
|---|---|
| test_harness_validation_contract.py | devspark harness validate: loads and validates a spec YAML against the schema |
| test_harness_spec_contract.py | Spec model parsing, field validation, and constraint checking |
| test_harness_runner_contract.py | Full harness run lifecycle: artifacts written, exit codes, retry and abort paths |
| test_harness_adapters_contract.py | Adapter routing via agents-registry.json, step-level adapter resolution |
| test_adapter_doctor_contract.py | devspark adapter doctor: normalized readiness states (ready, write_approval_required, write_incompatible, unavailable) |
| test_hands_off_lifecycle_contract.py | --hands-off flag: write-incompatible adapter triggers abort; decision-packet.json and result.json artifacts created |
Run individually: python tests/test_harness_runner_contract.py
Run all: python tests/test_harness_validation_contract.py && python tests/test_harness_spec_contract.py && python tests/test_harness_runner_contract.py && python tests/test_harness_adapters_contract.py && python tests/test_adapter_doctor_contract.py && python tests/test_hands_off_lifecycle_contract.py