Checkmate: Autonomous Task Completion and Quality Enforcement - Openclaw Skills

Author: Internet

2026-03-30

AI Tutorials

What is Checkmate?

Checkmate is a high-privilege skill built for the Openclaw Skills ecosystem that puts correctness ahead of speed. It runs as a deterministic Python orchestrator managing a recursive loop between a worker agent and a judge agent. By turning a general goal into concrete pass/fail criteria, Checkmate guarantees that no work is treated as finished until every requirement is met.

Unlike a standard one-shot AI prompt, Checkmate maintains a stateful workspace that tracks progress across multiple iterations. It supports both interactive human-in-the-loop checkpoints and a fully autonomous batch mode, making it an essential tool for developers who need high-fidelity output and strict validation in their AI workflows.

Download: https://github.com/openclaw/skills/tree/main/skills/insipidpoint/checkmate

Installation & Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install checkmate

2. Manual Installation

Copy the skill folder to one of the following locations:

Global mode: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt Installation

Copy this prompt into OpenClaw and it will install automatically.

Please install checkmate for me using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).

Checkmate Use Cases

  • Developing code that must pass specific unit tests or conform to a strict technical spec.
  • Generating technical documentation or reports that must meet a defined quality bar.
  • Performing deep research tasks that must cover specific ground without missing key details.
  • Handling any task that would otherwise require manual iteration until the result is satisfactory.

How Checkmate Works

  1. The orchestrator starts an intake loop that converts the user's task into a draft of pass/fail criteria.
  2. The user reviews the criteria and either provides feedback or approves them to lock the requirements.
  3. A worker agent with full host-agent privileges is spawned to execute the task against the criteria.
  4. An independent judge agent evaluates the worker's output and issues a PASS or FAIL verdict.
  5. On a FAIL verdict, the judge identifies the specific gaps, which are fed back into the next iteration.
  6. The process loops until every criterion is satisfied or the maximum iteration limit is reached.
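The worker/judge cycle above can be sketched as a plain Python loop. This is an illustrative sketch, not the real scripts/run.py: the toy worker and judge below stand in for the spawned agents.

```python
def checkmate_loop(task, criteria, worker, judge, max_iter=10):
    """Run worker → judge rounds until PASS or the iteration budget runs out."""
    feedback = []
    for iteration in range(1, max_iter + 1):
        output = worker(task, criteria, feedback)   # would write iter-N/output.md
        verdict, gaps = judge(output, criteria)     # would write iter-N/verdict.md
        if verdict == "PASS":
            return output, iteration                # would write final-output.md
        feedback.extend(gaps)                       # judge gaps feed the next round
    return None, max_iter                           # budget exhausted

def make_toy_worker():
    """A stand-in worker that addresses one outstanding criterion per round."""
    done = set()
    def worker(task, criteria, feedback):
        missing = [c for c in criteria if c not in done]
        if missing:
            done.add(missing[0])
        return set(done)
    return worker

def toy_judge(output, criteria):
    """PASS only when every criterion appears in the output."""
    gaps = [c for c in criteria if c not in output]
    return ("PASS" if not gaps else "FAIL"), gaps
```

With three criteria and a worker that closes one gap per round, the loop passes on the third iteration, which is exactly the feedback-driven convergence the steps describe.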

Checkmate Configuration Guide

To use Checkmate, the Openclaw platform CLI must be available in your PATH. The skill runs on Python 3 and uses only the standard library.

Follow these steps to initialize the skill:

# 1. Resolve your session UUID to allow turn injection
openclaw gateway call sessions.list --params '{"limit":1}' --json

# 2. Create a dedicated workspace for the task
bash scripts/workspace.sh /tmp "TASK_DESCRIPTION"

# 3. Launch the orchestrator in the background
python3 scripts/run.py --workspace /path/to/work --task "DESCRIPTION" --session-uuid UUID

Checkmate Data Architecture & Taxonomy

Checkmate maintains a structured workspace for persistence and transparency. The data is organized as follows:

| File/Folder | Purpose |
| --- | --- |
| task.md | The primary task description. |
| criteria.md | The locked requirements used by the judge. |
| state.json | Stores the current iteration and status to support resume. |
| iter-N/ | Per-round subdirectories containing output.md (worker) and verdict.md (judge). |
| final-output.md | The final validated result, written after a PASS verdict. |
| pending-input.json | Bridge file signaling that the orchestrator is waiting for human input. |
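A workspace with these files can be bootstrapped in a few lines of Python. This is a sketch of the idea only; the real scripts/workspace.sh may name and initialize things differently, and the initial "intake" status value is our assumption.

```python
import json
import pathlib
import time

def create_workspace(root, task):
    """Bootstrap a Checkmate-style workspace (illustrative layout)."""
    ws = pathlib.Path(root) / time.strftime("checkmate-%Y%m%d-%H%M%S")
    ws.mkdir(parents=True)
    (ws / "task.md").write_text(task)                      # primary task description
    (ws / "state.json").write_text(                        # resume support
        json.dumps({"iteration": 0, "status": "intake"}))  # "intake" is assumed
    return ws
```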
The skill's manifest:

name: checkmate
description: "Enforces task completion: turns your goal into pass/fail criteria, runs a worker, judges the output, feeds back what's missing, and loops until every criterion passes. Nothing ships until it's truly done. Trigger: 'checkmate: TASK'"
requires:
  cli:
    - openclaw  # platform CLI: sessions.list, agent spawn, message send
privileges: high  # spawned workers inherit full host-agent runtime (exec, OAuth, all skills)

Checkmate

A deterministic Python loop (scripts/run.py) calls an LLM for worker and judge roles. Nothing leaves until it passes — and you stay in control at every checkpoint.

Requirements

  • OpenClaw platform CLI (openclaw) — must be available in PATH. Used for:
    • openclaw gateway call sessions.list — resolve session UUID for turn injection
    • openclaw agent --session-id — inject checkpoint messages into the live session
    • openclaw message send — fallback channel delivery (e.g. Telegram, Signal)
  • Python 3: run.py is pure stdlib; no pip packages required
  • No separate API keys or env vars needed — routes through the gateway's existing OAuth

Security & Privilege Model

⚠️ This is a high-privilege skill. Read before using in batch/automated mode.

Spawned workers and judges inherit full host-agent runtime, including:

  • exec (arbitrary shell commands)
  • web_search, web_fetch
  • All installed skills (including those with OAuth-bound credentials — Gmail, Drive, etc.)
  • sessions_spawn (workers can spawn further sub-agents)

This means the task description you provide directly controls what the worker does — treat it like code you're about to run, not a message you're about to send.

Batch mode (--no-interactive) removes all human gates. In interactive mode (default), you approve criteria and each checkpoint before the loop continues. In batch mode, criteria are auto-approved and the loop runs to completion autonomously — only use this for tasks and environments you fully trust.

User-input bridging writes arbitrary content to disk. When you reply to a checkpoint, the main agent writes your reply verbatim to user-input.md in the workspace. The orchestrator reads it and acts on it. Don't relay untrusted third-party content as checkpoint replies.

When to Use

Use checkmate when correctness matters more than speed — when "good enough on the first try" isn't acceptable.

Good fits:

  • Code that must pass tests or meet a spec
  • Docs or reports that must hit a defined quality bar
  • Research that must be thorough and cover specific ground
  • Any task where you'd otherwise iterate manually until satisfied

Trigger phrases (say any of these):

  • checkmate: TASK
  • keep iterating until it passes
  • don't stop until done
  • until it passes
  • quality loop: TASK
  • iterate until satisfied
  • judge and retry
  • keep going until done

Architecture

scripts/run.py  (deterministic Python while loop — the orchestrator)
  ├─ Intake loop [up to max_intake_iter, default 5]:
  │    ├─ Draft criteria (intake prompt + task + refinement feedback)
  │    ├─ USER REVIEW: show draft → wait for approval or feedback
  │    │     approved? → lock criteria.md
  │    │     feedback? → refine, next intake iteration
  │    └─ (non-interactive: criteria-judge gates instead of user)
  │
  ├─ PRE-START GATE: show final task + criteria → user confirms "go"
  │         (edit task / cancel supported here)
  │
  └─ Main loop [up to max_iter, default 10]:
       ├─ Worker: spawn agent session → iter-N/output.md
       │          (full runtime: exec, web_search, all skills, OAuth auth)
       ├─ Judge:  spawn agent session → iter-N/verdict.md
       ├─ PASS?  → write final-output.md, notify user, exit
       └─ FAIL?  → extract gaps → CHECKPOINT: show score + gaps to user
                     continue?  → next iteration (with judge gaps)
                     redirect:X → next iteration (with user direction appended)
                     stop?      → end loop, take best result so far

Interactive mode (default): user approves criteria, confirms pre-start, and reviews each FAIL checkpoint. Batch mode (--no-interactive): fully autonomous; criteria-judge gates intake, no checkpoints.

User Input Bridge

When the orchestrator needs user input, it:

  1. Writes workspace/pending-input.json (kind + workspace path)
  2. Sends a notification via --recipient and --channel
  3. Polls workspace/user-input.md every 5s (up to --checkpoint-timeout minutes)

The main agent acts as the bridge: when pending-input.json exists and the user replies, the agent writes their response to user-input.md. The orchestrator picks it up automatically.
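The main agent's side of this bridge can be sketched as below. Only the kind field and workspace path are documented for pending-input.json; the function name and everything else here are illustrative.

```python
import json
import pathlib

def bridge_reply(workspace, reply):
    """Main-agent side: relay a user's checkpoint reply to the orchestrator."""
    ws = pathlib.Path(workspace)
    pending = ws / "pending-input.json"
    if not pending.exists():
        return False                            # no checkpoint is waiting
    request = json.loads(pending.read_text())   # {"kind": ..., "workspace": ...}
    (ws / "user-input.md").write_text(reply)    # orchestrator polls for this file
    return True
```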

Each agent session is spawned via:

openclaw agent --session-id UUID --message MESSAGE --timeout SECONDS --json

Routes through the gateway WebSocket using existing OAuth — no separate API key. Workers get full agent runtime: exec, web_search, web_fetch, all skills, sessions_spawn.
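Spawning a worker or judge then amounts to shelling out to that invocation. A sketch of how the command line might be assembled (the helper name is ours, not from run.py); it would be executed with subprocess.run:

```python
def build_spawn_cmd(session_id, message, timeout_s):
    """Build the openclaw agent invocation used to spawn a worker/judge turn."""
    return [
        "openclaw", "agent",
        "--session-id", session_id,
        "--message", message,
        "--timeout", str(timeout_s),
        "--json",
    ]

# e.g. subprocess.run(build_spawn_cmd(uuid, prompt, 3600),
#                     capture_output=True, text=True)
```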

Your Job (main agent)

When checkmate is triggered:

  1. Get your session UUID (for direct agent-turn injection):

    openclaw gateway call sessions.list --params '{"limit":1}' --json \
      | python3 -c "import json,sys; s=json.load(sys.stdin)['sessions'][0]; print(s['sessionId'])"
    

    Also note your --recipient (channel user/chat ID) and --channel as fallback.

  2. Create workspace:

    bash /scripts/workspace.sh /tmp "TASK"
    

    Prints the workspace path. Write the full task to workspace/task.md if needed.

  3. Run the orchestrator (background exec):

    python3 /scripts/run.py \
      --workspace /tmp/checkmate-TIMESTAMP \
      --task "FULL TASK DESCRIPTION" \
      --max-iter 10 \
      --session-uuid YOUR_SESSION_UUID \
      --recipient YOUR_RECIPIENT_ID \
      --channel YOUR_CHANNEL
    

    Use exec with background=true. This runs for as long as needed. Add --no-interactive for fully autonomous runs (no user checkpoints).

  4. Tell the user checkmate is running, what it's working on, and that they'll receive criteria drafts and checkpoint messages via your configured channel to review and approve.

  5. Bridge user replies: When user responds to a checkpoint message, check for pending-input.json and write their response to workspace/user-input.md.

Bridging User Input

When a checkpoint message arrives (the orchestrator sent the user a criteria/approval/checkpoint request), bridge their reply:

# Find active pending input
cat /checkmate-*/pending-input.json 2>/dev/null

# Route user's reply
echo "USER REPLY HERE" > /path/to/workspace/user-input.md

The orchestrator polls for this file every 5 seconds. Once written, it resumes automatically and deletes the file.
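The orchestrator's side of the exchange (poll every 5 seconds, consume user-input.md, then delete it) can be sketched as follows; the function name and the injectable sleep are illustrative, not the real run.py code.

```python
import pathlib
import time

def wait_for_reply(workspace, timeout_s=3600, interval_s=5, sleep=time.sleep):
    """Orchestrator side: poll for user-input.md, consume it, clean up."""
    ws = pathlib.Path(workspace)
    waited = 0
    while waited <= timeout_s:
        reply_file = ws / "user-input.md"
        if reply_file.exists():
            reply = reply_file.read_text()
            reply_file.unlink()                               # read, then deleted
            (ws / "pending-input.json").unlink(missing_ok=True)
            return reply
        sleep(interval_s)
        waited += interval_s
    return None                                               # checkpoint timed out
```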

Accepted replies at each gate:

| Gate | Continue | Redirect | Cancel |
| --- | --- | --- | --- |
| Criteria review | "ok", "approve", "lgtm" | any feedback text | |
| Pre-start | "go", "start", "ok" | "edit task: NEW TASK" | "cancel" |
| Iteration checkpoint | "continue", (empty) | "redirect: DIRECTION" | "stop" |
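The accepted replies for the iteration-checkpoint gate can be expressed as a small classifier. This sketch covers only that gate, and treating any other text as extra direction is our assumption, not documented behavior.

```python
def classify_checkpoint_reply(reply):
    """Map an iteration-checkpoint reply to an (action, payload) pair."""
    text = reply.strip()
    if text == "" or text.lower() == "continue":
        return ("continue", None)
    if text.lower().startswith("redirect:"):
        return ("redirect", text.split(":", 1)[1].strip())
    if text.lower() == "stop":
        return ("stop", None)
    # Assumption: any other text is treated as extra direction for the next round.
    return ("redirect", text)
```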

Parameters

| Flag | Default | Notes |
| --- | --- | --- |
| --max-intake-iter | 5 | Intake criteria refinement iterations |
| --max-iter | 10 | Main loop iterations (increase to 20 for complex tasks) |
| --worker-timeout | 3600s | Per worker session |
| --judge-timeout | 300s | Per judge session |
| --session-uuid | | Agent session UUID (from sessions.list); used for direct turn injection, the primary notification path |
| --recipient | | Channel recipient ID (e.g. user/chat ID, E.164 phone number); fallback if injection fails |
| --channel | | Delivery channel for fallback notifications (e.g. telegram, whatsapp, signal) |
| --no-interactive | off | Disable user checkpoints (batch mode) |
| --checkpoint-timeout | 60 | Minutes to wait for user reply at each checkpoint |
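For reference, the flags above map onto a stdlib argparse definition roughly like this; defaults are taken from the table, and the exact types or extra flags in the real run.py may differ.

```python
import argparse

def build_parser():
    """Sketch of run.py's CLI surface, with defaults from the table above."""
    p = argparse.ArgumentParser(prog="run.py")
    p.add_argument("--workspace", required=True)
    p.add_argument("--task", required=True)
    p.add_argument("--max-intake-iter", type=int, default=5)
    p.add_argument("--max-iter", type=int, default=10)
    p.add_argument("--worker-timeout", type=int, default=3600)    # seconds
    p.add_argument("--judge-timeout", type=int, default=300)      # seconds
    p.add_argument("--session-uuid")
    p.add_argument("--recipient")
    p.add_argument("--channel")
    p.add_argument("--no-interactive", action="store_true")
    p.add_argument("--checkpoint-timeout", type=int, default=60)  # minutes
    return p
```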

Workspace layout

memory/checkmate-YYYYMMDD-HHMMSS/
├── task.md               # task description (user may edit pre-start)
├── criteria.md           # locked after intake
├── feedback.md           # accumulated judge gaps + user direction
├── state.json            # {iteration, status} — resume support
├── pending-input.json    # written when waiting for user; deleted after response
├── user-input.md         # agent writes user's reply here; read + deleted by orchestrator
├── intake-01/
│   ├── criteria-draft.md
│   ├── criteria-verdict.md  (non-interactive only)
│   └── user-feedback.md     (interactive: user's review comments)
├── iter-01/
│   ├── output.md         # worker output
│   └── verdict.md        # judge verdict
└── final-output.md       # written on completion

Resume

If the script is interrupted, just re-run it with the same --workspace. It reads state.json and skips completed steps. Locked criteria.md is reused; completed iter-N/output.md files are not re-run.
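Resume support falls out of state.json plus the per-iteration outputs. A sketch of how a restart could pick the next iteration to run; the function name and the exact fields consulted are illustrative, not the real implementation.

```python
import json
import pathlib

def load_resume_point(workspace):
    """Return the first iteration that still needs to run after a restart."""
    ws = pathlib.Path(workspace)
    state_file = ws / "state.json"
    if not state_file.exists():
        return 1
    state = json.loads(state_file.read_text())
    n = state.get("iteration", 0)
    # An iteration counts as done only if its worker output was written.
    while n >= 1 and not (ws / f"iter-{n:02d}" / "output.md").exists():
        n -= 1
    return n + 1
```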

Prompts

Active prompts called by run.py:

  • prompts/intake.md — converts task → criteria draft
  • prompts/criteria-judge.md — evaluates criteria quality (APPROVED / NEEDS_WORK) — used in non-interactive mode
  • prompts/worker.md — worker prompt (variables: TASK, CRITERIA, FEEDBACK, ITERATION, MAX_ITER, OUTPUT_PATH)
  • prompts/judge.md — evaluates output against criteria (PASS / FAIL)

Reference only (not called by run.py):

  • prompts/orchestrator.md — architecture documentation explaining the design rationale