Token Saver 75+: Advanced Model Routing for Openclaw Skills

Author: Internet

2026-03-28

AI Tutorials

What Is Token Saver 75+?

Token Saver 75+ is a model routing and optimization protocol designed for developers working with Openclaw Skills. It aims to cut operating costs dramatically while preserving high performance. By implementing a tiered classification system, the skill ensures that expensive models are reserved for complex reasoning, while routine work such as code generation, summarization, and formatting is delegated to specialized or free model endpoints.

The skill follows the principle of "understand fully, execute cheaply." It acts as an intelligent traffic controller inside your agent workflow, using session spawning to distribute workloads across model clusters including Groq, Codex, and Claude Opus. Beyond routing, it enforces strict output compression rules so that every generated token delivers maximum value, eliminating redundancy and unnecessary narration.

Download: https://github.com/openclaw/skills/tree/main/skills/mariovallereyes/token-saver-75plus

Installation and Download

1. ClawHub CLI

The fastest way to install the skill directly from source.

npx clawhub@latest install token-saver-75plus

2. Manual Installation

Copy the skill folder to one of the following locations:

Global mode: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt-Based Installation

Copy this prompt into OpenClaw to install automatically.

Please install token-saver-75plus for me using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).

Token Saver 75+ Use Cases

  • Cut API overhead on large-scale code refactors by routing all generation tasks to a specialized Codex model.
  • Lower total cost of ownership by handling high-volume text analysis and formatting on a free inference tier.
  • Manage complex agent strategies: let high-reasoning models such as Opus handle only top-level orchestration while cheap agents are spawned for subtasks.
  • Enforce concise technical communication in development environments, reducing distracting AI chatter.
  • Implement automatic failure recovery, escalating a task to a more capable model only after a low-cost attempt fails.

How Token Saver 75+ Works

  1. A request classifier analyzes each incoming message to determine whether it is Tier 1 (simple), Tier 2 (bulk), Tier 3 (structured/code), or Tier 4 (strategic).
  2. Tool-gating logic checks whether the request can be satisfied from existing memory, or whether multiple tool calls can be batched to save tokens.
  3. The orchestrator identifies the cheapest capable model available and delegates the execution layer via the sessions_spawn command.
  4. For code-specific tasks, the system automatically spawns a Codex instance so the main orchestrator stays focused on high-level goals.
  5. All model output is filtered through compression templates (such as CAUSE-FIX-VERIFY) to keep responses within the token budget assigned to their tier.
  6. Optionally, a measurement tail is appended for transparency into token usage and the specific routing path taken.

Token Saver 75+ Configuration Guide

To integrate this protocol into your Openclaw Skills workflow, define the model routing table and make sure the API credentials for each provider are active.

# Install the protocol into your agent directory
openclaw install token-saver-75plus

# Configure your model endpoints
export GROQ_API_KEY="your_key"
export OPENAI_API_KEY="your_key"
export ANTHROPIC_API_KEY="your_key"

Token Saver 75+ Data Schema and Classification

The protocol uses a structured classification and routing schema to manage the lifecycle of each request. It tracks the following metadata per interaction:

Attribute | Description
Request Tier | T1 to T4 classification based on complexity and reasoning requirements.
Spawn Task | The specific, context-rich instruction string passed to the sub-agent.
Route Map | The sequence of models used (e.g., Llama-3 for bulk work, Codex for implementation).
Token Budget | Maximum allowed output length for the identified tier.
Measurement | Final token count and efficiency delta reported at session end.
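
The tracked attributes might be modeled as a plain record. This dataclass is an illustrative sketch of the schema above, not the protocol's real data structure:

```python
from dataclasses import dataclass, field

@dataclass
class RequestRecord:
    """Per-interaction metadata tracked by the protocol (illustrative)."""
    tier: str                  # "T1".."T4"
    spawn_task: str            # context-rich instruction passed to the sub-agent
    route_map: list[str] = field(default_factory=list)  # models used, in order
    token_budget: int = 0      # max allowed output tokens for this tier
    measured_tokens: int = 0   # final count reported at session end

    def under_budget(self) -> bool:
        return self.measured_tokens <= self.token_budget

# Example: a T3 code task routed to Codex.
rec = RequestRecord(tier="T3",
                    spawn_task="Refactor the parser module; full repo context attached.",
                    route_map=["openai/gpt-5.3-codex"],
                    token_budget=400, measured_tokens=310)
```
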

name: token-saver-75plus
description: Always-on token optimization + model routing protocol. Auto-classifies requests (T1-T4), routes execution to the cheapest capable model via sessions_spawn, and applies maximum output compression. Target: 75%+ token savings.

Token Saver 75+ with Model Routing

Core Principle

Understand fully, execute cheaply. The orchestrator must fully understand the task before routing. Never sacrifice comprehension for speed.

Request Classifier (silent, every message)

Tier | Pattern | Orchestrator | Executor
T1 | yes/no, status, trivial facts, quick lookups | Handle alone | -
T2 | summaries, how-to, lists, bulk processing, formatting | Handle alone OR spawn Groq | Groq (FREE)
T3 | debugging, multi-step, code generation, structured analysis | Orchestrate + spawn | Codex for code, Groq for bulk
T4 | strategy, complex decisions, multi-agent coordination, creative | Spawn Opus | Opus orchestrates, spawns Codex/Groq from within

Model Routing Table

Model | Use For | Cost | Spawn with
groq/llama-3.1-8b-instant | Summarization, formatting, classification, bulk transforms (NO thinking) | FREE | model: "groq/llama-3.1-8b-instant"
openai/gpt-5.3-codex | ALL code generation, code review, refactoring | $$$ | model: "openai/gpt-5.3-codex"
openai/gpt-5.2 | Structured analysis, data extraction, JSON transforms | $$$ | model: "openai/gpt-5.2"
anthropic/claude-opus-4-6 | Strategy, complex orchestration, failure recovery (T4 only) | $$$$ | model: "anthropic/claude-opus-4-6"
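
The routing table can be held as a simple mapping so the orchestrator can look up the `model:` argument for sessions_spawn. A sketch: the table contents come from above, while the helper name `spawn_model` and the task-kind keys are assumptions:

```python
# Illustrative in-memory form of the routing table above.
ROUTING_TABLE = {
    "bulk":     ("groq/llama-3.1-8b-instant", "FREE"),
    "code":     ("openai/gpt-5.3-codex", "$$$"),
    "analysis": ("openai/gpt-5.2", "$$$"),
    "strategy": ("anthropic/claude-opus-4-6", "$$$$"),
}

def spawn_model(task_kind: str) -> str:
    """Return the model id to pass as sessions_spawn's `model` argument."""
    model, _cost = ROUTING_TABLE[task_kind]
    return model
```
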

Routing via sessions_spawn

When to spawn (MANDATORY)

  • Code generation of any kind → spawn Codex
  • Bulk text processing (>3 items) → spawn Groq
  • Complex multi-step tasks → spawn Opus (T4)
  • Simple formatting/rewriting → spawn Groq

When NOT to spawn

  • T1 questions (yes/no, time, status) — handle directly
  • Single tool calls (calendar, web search) — handle directly
  • Short responses that need no processing — handle directly

Spawn patterns

Groq (free bulk work):

sessions_spawn(
  task: "<full context + bulk instructions>",
  model: "groq/llama-3.1-8b-instant"
)

Codex (all code):

sessions_spawn(
  task: "Write <language> code that <meets the requirements>. Include comments. Output the complete file.",
  model: "openai/gpt-5.3-codex"
)

Opus (T4 strategy):

sessions_spawn(
  task: "<strategic task>. You have full tool access. Use sessions_spawn with Codex for code and Groq for bulk subtasks.",
  model: "anthropic/claude-opus-4-6"
)

Critical spawn rules

  1. Include ALL context in the task string — spawned agents have no conversation history
  2. Be specific — vague tasks waste tokens on clarification
  3. One task per spawn — don't bundle unrelated work
  4. For code: always use Codex — never write code yourself
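
Rule 1 is the easiest to violate, so a guard helps: a spawned agent sees no conversation history, so the task string must carry all context. A sketch with an assumed helper name:

```python
# Illustrative builder enforcing spawn rule 1: bundle ALL context into
# the task string, because the spawned agent has no conversation history.

def build_spawn_task(context: str, instruction: str) -> str:
    if not context.strip():
        raise ValueError("spawned agent gets no history; context is required")
    return f"Context: {context}\nTask: {instruction}"
```
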

Output Compression (applies to ALL tiers, ALL models)

Templates

  • STATUS: OK/WARN/FAIL one-liner
  • CHOICE: A vs B → Recommend: X (1 line why)
  • CAUSE→FIX→VERIFY: 3 bullets max
  • RESULT: data/output directly, no wrap-up
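
Two of these templates are mechanical enough to express as formatters. An illustrative sketch (helper names are assumed):

```python
# Formatters for the STATUS and CAUSE->FIX->VERIFY templates above.

def status_line(level: str, detail: str) -> str:
    """STATUS: OK/WARN/FAIL one-liner."""
    assert level in ("OK", "WARN", "FAIL")
    return f"STATUS: {level}: {detail}"

def cause_fix_verify(cause: str, fix: str, verify: str) -> str:
    """Exactly three bullets, by construction."""
    return "\n".join(f"- {label}: {text}" for label, text in
                     (("CAUSE", cause), ("FIX", fix), ("VERIFY", verify)))
```
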

Rules

  • No filler. No restating the question. Lead with the answer.
  • Bullets/tables/code > prose.
  • Do not narrate routine tool calls.
  • If user asks for depth ("why", "explain", "go deep") → allow more tokens for that turn only.

Budget by tier

Tier | Max output
T1 | 1-3 lines
T2 | 5-15 bullets
T3 | Structured sections, <400 words
T4 | Longer allowed, still dense
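
Because the budgets use different units (lines, bullets, words), a checker needs a per-tier rule. A rough sketch; the counting heuristics here are assumptions, not the skill's actual enforcement:

```python
# Per-tier output budget check mirroring the table above (heuristic).

def within_budget(tier: str, text: str) -> bool:
    if tier == "T1":   # 1-3 lines
        return len(text.splitlines()) <= 3
    if tier == "T2":   # 5-15 bullets (count lines starting with "-")
        bullets = sum(1 for line in text.splitlines()
                      if line.lstrip().startswith("-"))
        return bullets <= 15
    if tier == "T3":   # structured sections, <400 words
        return len(text.split()) < 400
    return True        # T4: longer allowed, still dense (not enforced here)
```
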

Tool Gating (before ANY tool call)

  1. Already known? → No tool.
  2. Batchable? → Parallelize.
  3. Can a spawned Groq handle it? → Spawn instead of doing it yourself.
  4. Cheapest path? → memory_search > partial read > full read > web.
  5. Needed? → Do not fetch "just in case."
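
The five-step gate reads as one ordered decision. A sketch, with the boolean inputs and return labels assumed for illustration:

```python
# The tool gate above as a single ordered decision (illustrative labels).

def tool_gate(needed: bool, known: bool, batch_size: int,
              groq_capable: bool) -> str:
    if not needed:
        return "skip"          # 5. don't fetch "just in case"
    if known:
        return "no_tool"       # 1. answer from what's already known
    if batch_size > 1:
        return "parallelize"   # 2. batch the calls
    if groq_capable:
        return "spawn_groq"    # 3. delegate to a free Groq session
    return "memory_search"     # 4. cheapest path first, then reads, then web
```
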

Failure Protocol

  • If Groq spawn fails → retry with GPT-5.2
  • If Codex spawn fails → retry with GPT-5.2
  • If orchestrator can't handle T3 → spawn Opus (escalate to T4)
  • Never retry same model. Escalate.
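
The escalation rules form a small fallback chain. A sketch; note that the GPT-5.2 to Opus step is an assumption extrapolated from "Never retry same model. Escalate." rather than stated above:

```python
# Fallback chain from the failure protocol above.
FALLBACK = {
    "groq/llama-3.1-8b-instant": "openai/gpt-5.2",
    "openai/gpt-5.3-codex": "openai/gpt-5.2",
    "openai/gpt-5.2": "anthropic/claude-opus-4-6",  # assumed final escalation
}

def escalate(failed_model: str) -> str:
    """Return the next model to try; never retry the one that failed."""
    nxt = FALLBACK.get(failed_model)
    if nxt is None:
        raise RuntimeError(f"no escalation path beyond {failed_model}")
    assert nxt != failed_model  # never retry same model
    return nxt
```
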

Measurement (when asked or during testing)

Append: [~X tokens | Tier: Tn | Route: model(s) used]
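
The tail format is simple enough to pin down as a formatter (the function name is assumed):

```python
# Formatter for the measurement tail: [~X tokens | Tier: Tn | Route: ...]

def measurement_tail(tokens: int, tier: int, route: list[str]) -> str:
    return f"[~{tokens} tokens | Tier: T{tier} | Route: {' -> '.join(route)}]"
```
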