Checkmate: Autonomous Task Completion & Quality Enforcement - Openclaw Skills
Author: Internet
2026-03-30
What is Checkmate?
Checkmate is a high-privilege skill for the Openclaw Skills ecosystem that puts correctness above speed. It runs as a deterministic Python orchestrator managing a recursive loop between a worker agent and a judge agent. By converting a general goal into concrete pass/fail criteria, Checkmate guarantees that no work is considered complete until every requirement is met.
Unlike a standard one-shot AI prompt, Checkmate maintains a stateful workspace that tracks progress across multiple iterations. It supports both interactive human-in-the-loop checkpoints and a fully autonomous batch mode, making it an essential tool for developers who need high-fidelity output and strict validation in their AI workflows.
Download: https://github.com/openclaw/skills/tree/main/skills/insipidpoint/checkmate
Installation & Download
1. ClawHub CLI
The fastest way to install the skill directly from source.

```shell
npx clawhub@latest install checkmate
```

2. Manual installation
Copy the skill folder to one of the following locations:
- Global: `~/.openclaw/skills/`
- Workspace: `/skills/`
Priority: workspace > local > built-in
3. Prompt installation
Copy this prompt into OpenClaw to install automatically:
Please install checkmate for me using Clawhub. If Clawhub is not yet installed, install it first (npm i -g clawhub).
Checkmate Use Cases
- Developing code that must pass specific unit tests or conform to a strict technical spec.
- Producing technical documents or reports that must meet defined quality standards.
- Running deep-research tasks that must cover specific ground without missing key details.
- Handling any task that would otherwise take manual iteration to reach a satisfying result.

How the loop works:
1. The orchestrator starts an intake loop that converts the user's task into a draft of pass/fail criteria.
2. The user reviews the criteria and provides feedback, or approves them to lock the requirements.
3. A worker agent with full host-agent privileges is spawned to execute the task against the criteria.
4. An independent judge agent evaluates the worker's output and issues a PASS or FAIL verdict.
5. On FAIL, the judge identifies the specific gaps, which feed into the next iteration.
6. The process loops until every criterion is met or the maximum iteration limit is reached.
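The iteration loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the real `run.py`: `run_worker` and `run_judge` are hypothetical stand-ins for the actual agent-spawning calls, with toy implementations so the sketch is runnable.

```python
# Minimal sketch of the Checkmate main loop. run_worker / run_judge are
# hypothetical stand-ins for the real agent-spawning calls in run.py.
def checkmate_loop(task, criteria, max_iter=10):
    feedback = []
    for iteration in range(1, max_iter + 1):
        output = run_worker(task, criteria, feedback)   # worker writes iter-N/output.md
        verdict, gaps = run_judge(output, criteria)     # judge returns PASS/FAIL + gaps
        if verdict == "PASS":
            return output                               # becomes final-output.md
        feedback.extend(gaps)                           # gaps feed the next iteration
    return output                                       # best effort after max_iter

# Toy stand-ins: the worker folds accumulated feedback into its output,
# and the judge passes once the output mentions every criterion.
def run_worker(task, criteria, feedback):
    return task + " " + " ".join(feedback)

def run_judge(output, criteria):
    gaps = [c for c in criteria if c not in output]
    return ("PASS", []) if not gaps else ("FAIL", gaps)
```

With the toy stand-ins, a run with two unmet criteria fails round one, receives both gaps as feedback, and passes on round two.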
Checkmate Setup Guide
To use Checkmate, the Openclaw platform CLI must be available on your PATH. The skill runs on Python 3 using only the standard library.
Follow these steps to initialize the skill:

```shell
# 1. Resolve your session UUID to allow turn injection
openclaw gateway call sessions.list --params '{"limit":1}' --json

# 2. Create a dedicated workspace for the task
bash scripts/workspace.sh /tmp "TASK_DESCRIPTION"

# 3. Launch the orchestrator in the background
python3 scripts/run.py --workspace /path/to/work --task "DESCRIPTION" --session-uuid UUID
```
Checkmate Data Schema & Taxonomy
Checkmate maintains a structured workspace for persistence and transparency. The data is organized as follows:

| File/Folder | Purpose |
|---|---|
| `task.md` | Contains the primary task description. |
| `criteria.md` | The locked requirements used by the judge. |
| `state.json` | Stores the current iteration and status to support resume. |
| `iter-N/` | Subdirectories containing each round's `output.md` (worker) and `verdict.md` (judge). |
| `final-output.md` | The final validated result, generated after a PASS verdict. |
| `pending-input.json` | A bridge file that signals when the orchestrator is waiting for human input. |
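As a concrete illustration of this layout, a workspace can be bootstrapped with a few stdlib calls. This is a sketch only: the file names follow the table above, but the exact `state.json` schema used by `run.py` may differ.

```python
# Sketch: bootstrap a Checkmate-style workspace with the files from the
# table above. The {"iteration", "status"} fields are an assumption based
# on the documented layout, not the verified run.py schema.
import json
import pathlib
import tempfile

def init_workspace(root, task):
    ws = pathlib.Path(root)
    ws.mkdir(parents=True, exist_ok=True)
    (ws / "task.md").write_text(task)                      # primary task description
    (ws / "state.json").write_text(
        json.dumps({"iteration": 0, "status": "intake"})   # resume support
    )
    return ws

ws = init_workspace(tempfile.mkdtemp() + "/checkmate-demo",
                    "Write docs that pass review")
state = json.loads((ws / "state.json").read_text())
```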
```yaml
name: checkmate
description: "Enforces task completion: turns your goal into pass/fail criteria, runs a worker, judges the output, feeds back what's missing, and loops until every criterion passes. Nothing ships until it's truly done. Trigger: 'checkmate: TASK'"
requires:
  cli:
    - openclaw  # platform CLI: sessions.list, agent spawn, message send
privileges: high  # spawned workers inherit full host-agent runtime (exec, OAuth, all skills)
```
Checkmate
A deterministic Python loop (scripts/run.py) calls an LLM for worker and judge roles. Nothing leaves until it passes — and you stay in control at every checkpoint.
Requirements
- OpenClaw platform CLI (`openclaw`) — must be available in `PATH`. Used for:
  - `openclaw gateway call sessions.list` — resolve session UUID for turn injection
  - `openclaw agent --session-id` — inject checkpoint messages into the live session
  - `openclaw message send` — fallback channel delivery (e.g. Telegram, Signal)
- Python 3 — `run.py` is pure stdlib; no pip packages required
- No separate API keys or env vars needed — routes through the gateway's existing OAuth
Security & Privilege Model
⚠️ This is a high-privilege skill. Read before using in batch/automated mode.
Spawned workers and judges inherit the full host-agent runtime, including:
- `exec` (arbitrary shell commands)
- `web_search`, `web_fetch`
- All installed skills (including those with OAuth-bound credentials — Gmail, Drive, etc.)
- `sessions_spawn` (workers can spawn further sub-agents)
This means the task description you provide directly controls what the worker does — treat it like code you're about to run, not a message you're about to send.
Batch mode (--no-interactive) removes all human gates. In interactive mode (default), you approve criteria and each checkpoint before the loop continues. In batch mode, criteria are auto-approved and the loop runs to completion autonomously — only use this for tasks and environments you fully trust.
User-input bridging writes arbitrary content to disk. When you reply to a checkpoint, the main agent writes your reply verbatim to user-input.md in the workspace. The orchestrator reads it and acts on it. Don't relay untrusted third-party content as checkpoint replies.
When to Use
Use checkmate when correctness matters more than speed — when "good enough on the first try" isn't acceptable.
Good fits:
- Code that must pass tests or meet a spec
- Docs or reports that must hit a defined quality bar
- Research that must be thorough and cover specific ground
- Any task where you'd otherwise iterate manually until satisfied
Trigger phrases (say any of these):
- `checkmate: TASK`
- "keep iterating until it passes"
- "don't stop until done"
- "until it passes"
- `quality loop: TASK`
- "iterate until satisfied"
- "judge and retry"
- "keep going until done"
Architecture
```
scripts/run.py (deterministic Python while loop — the orchestrator)
├─ Intake loop [up to max_intake_iter, default 5]:
│  ├─ Draft criteria (intake prompt + task + refinement feedback)
│  ├─ USER REVIEW: show draft → wait for approval or feedback
│  │    approved? → lock criteria.md
│  │    feedback? → refine, next intake iteration
│  └─ (non-interactive: criteria-judge gates instead of user)
│
├─ PRE-START GATE: show final task + criteria → user confirms "go"
│    (edit task / cancel supported here)
│
└─ Main loop [up to max_iter, default 10]:
   ├─ Worker: spawn agent session → iter-N/output.md
   │    (full runtime: exec, web_search, all skills, OAuth auth)
   ├─ Judge: spawn agent session → iter-N/verdict.md
   ├─ PASS? → write final-output.md, notify user, exit
   └─ FAIL? → extract gaps → CHECKPOINT: show score + gaps to user
        continue?   → next iteration (with judge gaps)
        redirect:X  → next iteration (with user direction appended)
        stop?       → end loop, take best result so far
```
Interactive mode (default): user approves criteria, confirms pre-start, and reviews each FAIL checkpoint. Batch mode (--no-interactive): fully autonomous; criteria-judge gates intake, no checkpoints.
User Input Bridge
When the orchestrator needs user input, it:
- Writes `workspace/pending-input.json` (kind + workspace path)
- Sends a notification via `--recipient` and `--channel`
- Polls `workspace/user-input.md` every 5 s (up to `--checkpoint-timeout` minutes)
The main agent acts as the bridge: when pending-input.json exists and the user replies, the agent writes their response to user-input.md. The orchestrator picks it up automatically.
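The orchestrator side of this bridge is essentially a poll-read-delete loop. Here is a hedged sketch (the function name is illustrative; the 5 s interval and minute-scale timeout mirror the docs, shortened via parameters so the demo returns quickly):

```python
# Sketch of the bridge polling described above: wait for user-input.md,
# read it, delete it. wait_for_user_input is an illustrative helper, not
# the actual function in run.py.
import pathlib
import tempfile
import time

def wait_for_user_input(workspace, poll_s=5, timeout_s=3600):
    reply_file = pathlib.Path(workspace) / "user-input.md"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if reply_file.exists():
            reply = reply_file.read_text()
            reply_file.unlink()        # orchestrator deletes the file after reading
            return reply
        time.sleep(poll_s)
    return None                        # timed out waiting for the user

# Simulate the main agent writing the user's reply into the workspace.
ws = tempfile.mkdtemp()
pathlib.Path(ws, "user-input.md").write_text("approve")
```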
Each agent session is spawned via:

```shell
openclaw agent --session-id --message --timeout --json
```
Routes through the gateway WebSocket using existing OAuth — no separate API key. Workers get full agent runtime: exec, web_search, web_fetch, all skills, sessions_spawn.
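Presumably `run.py` assembles this CLI invocation with the stdlib `subprocess` module. A hedged sketch of how such a spawn call might be built (the helper and its return shape are illustrative, not the real implementation; the actual run is left commented out so the sketch has no side effects):

```python
# Sketch: assembling the agent-spawn command from the line above.
# spawn_agent_cmd is a hypothetical helper, not a function in run.py.
import subprocess

def spawn_agent_cmd(session_id, message, timeout_s):
    return [
        "openclaw", "agent",
        "--session-id", session_id,
        "--message", message,
        "--timeout", str(timeout_s),
        "--json",
    ]

cmd = spawn_agent_cmd("abc-123", "run iteration 1", 3600)
# Real code would then execute it, e.g.:
# subprocess.run(cmd, capture_output=True, text=True, check=True)
```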
Your Job (main agent)
When checkmate is triggered:

1. Get your session UUID (for direct agent-turn injection):

   ```shell
   openclaw gateway call sessions.list --params '{"limit":1}' --json \
     | python3 -c "import json,sys; s=json.load(sys.stdin)['sessions'][0]; print(s['sessionId'])"
   ```

   Also note your `--recipient` (channel user/chat ID) and `--channel` as fallback.

2. Create the workspace:

   ```shell
   bash scripts/workspace.sh /tmp "TASK"
   ```

   Prints the workspace path. Write the full task to `workspace/task.md` if needed.

3. Run the orchestrator (background exec):

   ```shell
   python3 scripts/run.py \
     --workspace /tmp/checkmate-TIMESTAMP \
     --task "FULL TASK DESCRIPTION" \
     --max-iter 10 \
     --session-uuid YOUR_SESSION_UUID \
     --recipient YOUR_RECIPIENT_ID \
     --channel YOUR_CHANNEL
   ```

   Use `exec` with `background=true`. This runs for as long as needed. Add `--no-interactive` for fully autonomous runs (no user checkpoints).

4. Tell the user checkmate is running, what it's working on, and that they'll receive criteria drafts and checkpoint messages via your configured channel to review and approve.

5. Bridge user replies: when the user responds to a checkpoint message, check for `pending-input.json` and write their response to `workspace/user-input.md`.
Bridging User Input
When a checkpoint message arrives (the orchestrator sent the user a criteria/approval/checkpoint request), bridge their reply:
```shell
# Find active pending input
cat /checkmate-*/pending-input.json 2>/dev/null

# Route user's reply
echo "USER REPLY HERE" > /path/to/workspace/user-input.md
```
The orchestrator polls for this file every 5 seconds. Once written, it resumes automatically and deletes the file.
Accepted replies at each gate:
| Gate | Continue | Redirect | Cancel |
|---|---|---|---|
| Criteria review | "ok", "approve", "lgtm" | any feedback text | — |
| Pre-start | "go", "start", "ok" | "edit task: NEW TASK" | "cancel" |
| Iteration checkpoint | "continue", (empty) | "redirect: DIRECTION" | "stop" |
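The iteration-checkpoint row of the table above maps naturally onto a small reply classifier. A sketch under stated assumptions: the keyword set comes straight from the table, but the function name and the fallback for unrecognized input (treat as "continue") are illustrative choices, not the documented behavior.

```python
# Sketch: classify an iteration-checkpoint reply per the gate table.
# classify_checkpoint_reply is illustrative; the fallback behavior for
# unrecognized input is an assumption.
def classify_checkpoint_reply(reply):
    text = reply.strip().lower()
    if text in ("", "continue"):
        return ("continue", None)            # proceed with judge gaps
    if text == "stop":
        return ("stop", None)                # end loop, keep best result
    if text.startswith("redirect:"):
        # Append the user's direction to the next iteration.
        return ("redirect", reply.split(":", 1)[1].strip())
    return ("continue", None)                # assumption: default to continue
```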
Parameters

| Flag | Default | Notes |
|---|---|---|
| `--max-intake-iter` | 5 | Intake criteria refinement iterations |
| `--max-iter` | 10 | Main loop iterations (increase to 20 for complex tasks) |
| `--worker-timeout` | 3600 s | Per worker session |
| `--judge-timeout` | 300 s | Per judge session |
| `--session-uuid` | — | Agent session UUID (from sessions.list); used for direct turn injection — primary notification path |
| `--recipient` | — | Channel recipient ID (e.g. user/chat ID, E.164 phone number); fallback if injection fails |
| `--channel` | — | Delivery channel for fallback notifications (e.g. telegram, whatsapp, signal) |
| `--no-interactive` | off | Disable user checkpoints (batch mode) |
| `--checkpoint-timeout` | 60 | Minutes to wait for user reply at each checkpoint |
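The flag surface above can be expressed as an `argparse` parser; a sketch with defaults mirroring the table (the real parser in `run.py` may differ in types and validation):

```python
# Sketch: the documented flag surface as an argparse parser. Defaults
# mirror the parameter table; run.py's actual parser may differ.
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="run.py")
    p.add_argument("--workspace", required=True)
    p.add_argument("--task")
    p.add_argument("--session-uuid")
    p.add_argument("--recipient")
    p.add_argument("--channel")
    p.add_argument("--max-intake-iter", type=int, default=5)
    p.add_argument("--max-iter", type=int, default=10)
    p.add_argument("--worker-timeout", type=int, default=3600)   # seconds
    p.add_argument("--judge-timeout", type=int, default=300)     # seconds
    p.add_argument("--no-interactive", action="store_true")      # batch mode
    p.add_argument("--checkpoint-timeout", type=int, default=60) # minutes
    return p

args = build_parser().parse_args(["--workspace", "/tmp/ws", "--max-iter", "20"])
```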
Workspace layout
```
memory/checkmate-YYYYMMDD-HHMMSS/
├── task.md              # task description (user may edit pre-start)
├── criteria.md          # locked after intake
├── feedback.md          # accumulated judge gaps + user direction
├── state.json           # {iteration, status} — resume support
├── pending-input.json   # written when waiting for user; deleted after response
├── user-input.md        # agent writes user's reply here; read + deleted by orchestrator
├── intake-01/
│   ├── criteria-draft.md
│   ├── criteria-verdict.md   (non-interactive only)
│   └── user-feedback.md      (interactive: user's review comments)
├── iter-01/
│   ├── output.md        # worker output
│   └── verdict.md       # judge verdict
└── final-output.md      # written on completion
```
Resume
If the script is interrupted, just re-run it with the same --workspace. It reads state.json and skips completed steps. Locked criteria.md is reused; completed iter-N/output.md files are not re-run.
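The skip-completed-steps behavior can be sketched as a small resume check: read `state.json`, then advance past any iteration whose `output.md` is already on disk. The field names follow the workspace layout above; `next_iteration` is an illustrative helper, and the real `run.py` logic may differ in detail.

```python
# Sketch of resume: consult state.json, then skip iterations whose
# output.md already exists on disk. next_iteration is illustrative.
import json
import pathlib
import tempfile

def next_iteration(workspace):
    ws = pathlib.Path(workspace)
    state = json.loads((ws / "state.json").read_text())
    n = state["iteration"]
    # Skip forward past completed iterations already written to disk.
    while (ws / f"iter-{n + 1:02d}" / "output.md").exists():
        n += 1
    return n + 1

# Simulate an interrupted run: iteration 1 completed, then a crash.
ws = pathlib.Path(tempfile.mkdtemp())
(ws / "state.json").write_text(json.dumps({"iteration": 0, "status": "running"}))
(ws / "iter-01").mkdir()
(ws / "iter-01" / "output.md").write_text("done")
```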
Prompts
Active prompts called by run.py:
- `prompts/intake.md` — converts task → criteria draft
- `prompts/criteria-judge.md` — evaluates criteria quality (APPROVED / NEEDS_WORK) — used in non-interactive mode
- `prompts/worker.md` — worker prompt (variables: TASK, CRITERIA, FEEDBACK, ITERATION, MAX_ITER, OUTPUT_PATH)
- `prompts/judge.md` — evaluates output against criteria (PASS / FAIL)

Reference only (not called by run.py):
- `prompts/orchestrator.md` — architecture documentation explaining the design rationale