Agent Audit: Optimizing AI Agent Cost and Performance - Openclaw Skills
Author: Internet
2026-03-26
What Is Agent Audit?
Agent Audit is a comprehensive diagnostic utility designed to help users understand the financial and operational efficiency of their AI workflows. By scanning configurations and analyzing execution history, the skill pinpoints how efficiently resources are being used and identifies where costs can be cut without sacrificing performance within the Openclaw Skills ecosystem. Whether you are running complex coding agents or simple status checks, the tool provides the transparency needed to manage AI model sprawl.
It focuses on the balance between model capability and task complexity, ensuring users do not overpay for premium models when a more efficient alternative exists. As part of the broader Openclaw Skills suite, Agent Audit delivers actionable insights through detailed Markdown reports, enabling developers to make data-driven decisions about their agent infrastructure. The skill maximizes return on investment (ROI) by classifying tasks and recommending model-task fit optimizations across providers such as Anthropic, OpenAI, and Google.
Download: https://github.com/openclaw/skills/tree/main/skills/sharbelayy/agent-audit
Installation & Download
1. ClawHub CLI
The fastest way to install the skill directly from the source.
npx clawhub@latest install agent-audit
2. Manual Installation
Copy the skill folder to one of the following locations:
- Global: ~/.openclaw/skills/
- Workspace: /skills/
Resolution priority: workspace > local > built-in
3. Prompt Installation
Copy this prompt into OpenClaw to install automatically.
Please install agent-audit using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).
Agent Audit Use Cases
- Audit agent configurations to uncover cost-saving opportunities.
- Analyze cron job history to identify expensive or inefficient tasks.
- Map specific tasks to the most cost-effective model tier (e.g. Haiku vs. Opus).
- Calculate ROI and monthly token spend across model providers.
- Determine task complexity to ensure model-task fit and performance.
- Discovery phase: scans local config files, maps agents to specific tasks, and detects the provider from model names.
- History analysis: pulls the last seven days of cron and session history to compute average token usage and success rates.
- Task classification: groups workflows into simple, medium, or complex tiers based on reasoning requirements and output patterns.
- Recommendation engine: identifies potential model downgrades for simple tasks while preserving premium models for critical ones.
- Report generation: produces a detailed Markdown summary, including potential cost savings and the exact configuration strings to change in Openclaw Skills.
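The phases above can be condensed into a minimal pipeline sketch. This is an illustrative outline, not the skill's actual source: the function name `run_audit`, the record fields (`name`, `model`, `agent`, `tokens`), and the 500-token threshold are all assumptions for demonstration.

```python
def run_audit(agents, history):
    """Illustrative pipeline: discovery -> history analysis -> classification -> advice -> report.
    All field names and thresholds are hypothetical."""
    report_lines = ["# Agent Audit Report"]
    for agent in agents:
        # History analysis: gather this agent's runs and average its token usage.
        runs = [r for r in history if r["agent"] == agent["name"]]
        avg_tokens = sum(r["tokens"] for r in runs) / len(runs) if runs else 0
        # Classification: a crude output-length heuristic for the sketch.
        tier = "simple" if avg_tokens < 500 else "complex"
        # Recommendation: flag premium models running simple tasks.
        if tier == "simple" and agent["model"] == "opus":
            advice = "consider a cheaper model"
        else:
            advice = "keep current model"
        report_lines.append(f"- {agent['name']}: avg {avg_tokens:.0f} tokens, {tier} tier, {advice}")
    return "\n".join(report_lines)
```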
Agent Audit Configuration Guide
To run a full audit of your setup, use the following command:
python3 {baseDir}/scripts/audit.py
For a specific output format or a dry run, the following options are available:
# Generate a quick summary only
python3 {baseDir}/scripts/audit.py --format summary
# Preview what would be analyzed without generating a report
python3 {baseDir}/scripts/audit.py --dry-run
# Save the report to a specific file path
python3 {baseDir}/scripts/audit.py --output /path/to/report.md
Agent Audit Data Schema & Classification Taxonomy
The skill groups data into complexity tiers to determine model recommendations. The classification scheme is as follows:
| Tier | Recommended Models | Criteria |
|---|---|---|
| Simple | Haiku, GPT-4o-mini, Flash | Short output (<500 tokens), repetitive patterns, health checks |
| Medium | Sonnet, GPT-4o, Pro, Grok | Medium output, reasoning required, research tasks |
| Complex | Opus, GPT-4.5, Ultra, Grok-2 | Long output, multi-step reasoning, coding, security audits |
All Openclaw Skills data is processed locally to produce a Markdown report containing a per-agent breakdown, cron job frequencies, and a monthly spend estimate.
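The monthly spend estimate is straightforward arithmetic over average token usage and per-million-token prices. A minimal sketch, assuming the skill computes something along these lines (the function name and parameters are hypothetical):

```python
def monthly_cost_usd(runs_per_day, avg_in_tokens, avg_out_tokens,
                     in_price_per_m, out_price_per_m, days=30):
    """Estimate monthly spend for one recurring job.
    Prices are USD per million tokens; all parameter names are illustrative."""
    runs = runs_per_day * days
    return runs * (avg_in_tokens * in_price_per_m +
                   avg_out_tokens * out_price_per_m) / 1_000_000

# Example: a daily Opus job ($15/M input, $75/M output) averaging
# 2,000 input and 300 output tokens per run.
cost = monthly_cost_usd(1, 2000, 300, 15, 75)  # 1.575 -> about $1.58/month
```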
name: agent-audit
description: >
Audit your AI agent setup for performance, cost, and ROI. Scans OpenClaw config, cron jobs,
session history, and model usage to find waste and recommend optimizations.
Works with any model provider (Anthropic, OpenAI, Google, xAI, etc.).
Use when: (1) user says "audit my agents", "optimize my costs", "am I overspending on AI",
"check my model usage", "agent audit", "cost optimization", (2) user wants to know which
cron jobs are expensive vs cheap, (3) user wants model-task fit recommendations,
(4) user wants ROI analysis of their agent setup, (5) user says "where am I wasting tokens".
Agent Audit
Scan your entire OpenClaw setup and get actionable cost/performance recommendations.
What This Skill Does
- Scans config — reads OpenClaw config to map models to agents/tasks
- Analyzes cron history — checks every cron job's model, token usage, runtime, success rate
- Classifies tasks — determines complexity level of each task
- Calculates costs — per agent, per cron, per task type using provider pricing
- Recommends changes — with confidence levels and risk warnings
- Generates report — markdown report with specific savings estimates
Running the Audit
python3 {baseDir}/scripts/audit.py
Options:
python3 {baseDir}/scripts/audit.py --format markdown # Full report (default)
python3 {baseDir}/scripts/audit.py --format summary # Quick summary only
python3 {baseDir}/scripts/audit.py --dry-run # Show what would be analyzed
python3 {baseDir}/scripts/audit.py --output /path/to/report.md # Save to file
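The flags above map naturally onto a standard-library CLI parser. A sketch of what the option handling could look like, assuming `argparse` (this is not the skill's actual source):

```python
import argparse

def build_parser():
    """Illustrative parser mirroring the documented options; not audit.py's real code."""
    p = argparse.ArgumentParser(prog="audit.py",
                                description="Audit agent setup for cost and performance.")
    p.add_argument("--format", choices=["markdown", "summary"], default="markdown",
                   help="full markdown report (default) or quick summary")
    p.add_argument("--dry-run", action="store_true",
                   help="show what would be analyzed without generating a report")
    p.add_argument("--output", default=None,
                   help="save the report to a specific file path")
    return p
```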
How It Works
Phase 1: Discovery
- Read OpenClaw config (~/.openclaw/openclaw.json or similar)
- List all cron jobs and their configurations
- List all agents and their default models
- Detect provider (Anthropic, OpenAI, Google, xAI) from model names
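Provider detection from model names can be done with simple string heuristics: explicit `provider/model` prefixes first, then well-known model-name stems. A sketch under that assumption (the prefix table is illustrative, not exhaustive):

```python
def detect_provider(model_name):
    """Guess the provider from a model identifier string (heuristics are illustrative)."""
    name = model_name.lower()
    # Explicit "provider/model" form, e.g. "anthropic/claude-opus-4".
    if "/" in name:
        return name.split("/", 1)[0]
    # Fall back to well-known model-name stems.
    prefixes = {
        "claude": "anthropic",
        "gpt": "openai",
        "gemini": "google",
        "grok": "xai",
    }
    for prefix, provider in prefixes.items():
        if name.startswith(prefix):
            return provider
    return "unknown"
```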
Phase 2: History Analysis
- Pull cron job run history (last 7 days by default)
- Calculate per-job: avg tokens, avg runtime, success rate, model used
- Pull session history where available
- Calculate total token spend by model tier
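The per-job metrics in Phase 2 are plain aggregations over run records. A minimal sketch, assuming each run record carries `tokens`, `runtime_s`, and an `ok` flag (field names are assumptions, not the skill's real schema):

```python
def job_stats(runs):
    """Aggregate per-job metrics from raw run records (field names are assumed)."""
    n = len(runs)
    if n == 0:
        return {"avg_tokens": 0.0, "avg_runtime_s": 0.0, "success_rate": 0.0}
    return {
        "avg_tokens": sum(r["tokens"] for r in runs) / n,
        "avg_runtime_s": sum(r["runtime_s"] for r in runs) / n,
        "success_rate": sum(1 for r in runs if r["ok"]) / n,
    }
```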
Phase 3: Task Classification
Classify each task into complexity tiers:
| Tier | Examples | Recommended Models |
|---|---|---|
| Simple | Health checks, status reports, reminders, notifications | Cheapest tier (Haiku, GPT-4o-mini, Flash, Grok-mini) |
| Medium | Content drafts, research, summarization, data analysis | Mid tier (Sonnet, GPT-4o, Pro, Grok) |
| Complex | Coding, architecture, security review, nuanced writing | Top tier (Opus, GPT-4.5, Ultra, Grok-2) |
Classification signals:
- Simple: Short output (<500 tokens), low thinking requirement, repetitive pattern, status/health tasks
- Medium: Medium output, some reasoning needed, creative but templated, research tasks
- Complex: Long output, multi-step reasoning, code generation, security-critical, tasks that previously failed on weaker models
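The signals above can be combined into a tiering function. A sketch only: the exact heuristics live in references/task-classification.md, and the boolean flags and 500-token cutoff here are illustrative.

```python
def classify_task(avg_output_tokens, needs_multistep_reasoning=False,
                  is_code_or_security=False, failed_on_weak_model=False):
    """Map classification signals to a complexity tier (thresholds are illustrative)."""
    # Hard signals force the complex tier regardless of output length.
    if is_code_or_security or needs_multistep_reasoning or failed_on_weak_model:
        return "complex"
    # Short, routine output is the simple tier; everything else is medium.
    if avg_output_tokens < 500:
        return "simple"
    return "medium"
```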
Phase 4: Recommendations
For each task where the model tier doesn't match complexity:
RECOMMENDATION: Downgrade "Knox Bot Health Check" from opus to haiku
Current: anthropic/claude-opus-4 ($15/M input, $75/M output)
Suggested: anthropic/claude-haiku ($0.25/M input, $1.25/M output)
Reason: Simple status check averaging 300 output tokens
Estimated savings: $X.XX/month
Risk: LOW — task is simple pattern matching
Confidence: HIGH
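The savings estimate in a recommendation like the one above is the cost difference between the current and suggested tiers over a month of runs. A hedged sketch, using the Opus and Haiku prices quoted in the example (function and parameter names are hypothetical):

```python
def downgrade_savings(avg_in, avg_out, runs_per_month,
                      current=(15.0, 75.0), suggested=(0.25, 1.25)):
    """Estimated monthly USD savings from a model downgrade.
    Price tuples are (input, output) per million tokens; defaults match the
    Opus -> Haiku example above. All names are illustrative."""
    def cost(prices):
        return runs_per_month * (avg_in * prices[0] + avg_out * prices[1]) / 1_000_000
    return cost(current) - cost(suggested)

# A daily health check (30 runs/month, 1,000 in / 300 out tokens):
saved = downgrade_savings(1000, 300, 30)  # 1.125 - 0.01875 = 1.10625 -> ~$1.11/month
```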
Safety Rules — NEVER Recommend Downgrading:
- Coding/development tasks
- Security reviews or audits
- Tasks that have previously failed on weaker models
- Tasks where the user explicitly chose a higher model
- Complex multi-step reasoning tasks
- Anything the user flagged as critical
Phase 5: Report Generation
Output a clean markdown report with:
- Overview — total agents, crons, monthly spend estimate
- Per-agent breakdown — model, usage, cost
- Per-cron breakdown — model, frequency, avg tokens, cost
- Recommendations — sorted by savings potential
- Total potential savings — monthly estimate
- One-liner config changes — exact model strings to swap
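Assembling those sections into markdown is mechanical string building. A minimal sketch of the report structure, with recommendations sorted by savings as described (the data shapes are assumptions):

```python
def render_report(overview, recommendations):
    """Assemble a markdown report; section layout mirrors the list above.
    `overview` is a dict of summary stats, `recommendations` a list of
    {"task": str, "savings": float} dicts (shapes are illustrative)."""
    lines = ["# Agent Audit Report", "", "## Overview"]
    lines += [f"- {key}: {value}" for key, value in overview.items()]
    lines += ["", "## Recommendations (by savings potential)"]
    for rec in sorted(recommendations, key=lambda r: r["savings"], reverse=True):
        lines.append(f"- {rec['task']}: save ${rec['savings']:.2f}/month")
    return "\n".join(lines)
```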
Model Pricing Reference
See references/model-pricing.md for current pricing across all providers. Update this file when prices change.
Task Classification Details
See references/task-classification.md for detailed heuristics on how tasks are classified into complexity tiers.
Important Notes
- This skill is read-only — it never changes your config automatically
- All recommendations include risk levels and confidence scores
- When unsure about a task's complexity, it defaults to keeping the current model
- The audit should be re-run periodically (monthly) as usage patterns change
- Token counts are estimates based on cron history — actual costs depend on your provider's billing