Agent Audit: Optimizing AI Agent Cost and Performance - Openclaw Skills
Author: Internet
2026-03-26
What Is Agent Audit?
Agent Audit is a comprehensive diagnostic utility designed to help users understand the financial and operational efficiency of their AI workflows. By scanning configurations and analyzing execution history, the skill pinpoints how efficiently resources are being used and identifies where costs can be cut without sacrificing performance within the Openclaw Skills ecosystem. Whether you are running complex coding agents or simple status checks, the tool provides the transparency needed to manage AI model sprawl.
It focuses on the balance between model capability and task complexity, ensuring users do not overpay for premium models when a more efficient alternative exists. As part of the broader Openclaw Skills suite, Agent Audit delivers actionable insights through detailed Markdown reports, enabling developers to make data-driven decisions about their agent infrastructure. The skill maximizes return on investment (ROI) by classifying tasks and recommending model-task fit optimizations across providers such as Anthropic, OpenAI, and Google.
Download: https://github.com/openclaw/skills/tree/main/skills/sharbelayy/agent-audit
Installation & Download
1. ClawHub CLI
The fastest way to install the skill directly from the source.
npx clawhub@latest install agent-audit
2. Manual Installation
Copy the skill folder to one of the following locations:
- Global: ~/.openclaw/skills/
- Workspace: /skills/
Resolution priority: workspace > local > built-in
3. Prompt Installation
Copy this prompt into OpenClaw to install automatically.
Please install agent-audit using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).
Agent Audit Use Cases
- Audit agent configurations to uncover cost-saving opportunities.
- Analyze cron job history to identify expensive or inefficient tasks.
- Map specific tasks to the most cost-effective model tier (e.g. Haiku vs. Opus).
- Calculate ROI and monthly token spend across model providers.
- Determine task complexity to ensure model-task fit and performance.
- Discovery phase: scans local config files, maps agents to specific tasks, and detects the provider from model names.
- History analysis: pulls the last seven days of cron and session history to compute average token usage and success rates.
- Task classification: groups workflows into simple, medium, or complex tiers based on reasoning requirements and output patterns.
- Recommendation engine: identifies potential model downgrades for simple tasks while preserving premium models for critical ones.
- Report generation: produces a detailed Markdown summary, including potential cost savings and the exact configuration strings to change in Openclaw Skills.
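The phases above can be condensed into a minimal pipeline sketch. This is an illustrative outline, not the skill's actual source: the function name `run_audit`, the record fields (`name`, `model`, `agent`, `tokens`), and the 500-token threshold are all assumptions for demonstration.

```python
def run_audit(agents, history):
    """Illustrative pipeline: discovery -> history analysis -> classification -> advice -> report.
    All field names and thresholds are hypothetical."""
    report_lines = ["# Agent Audit Report"]
    for agent in agents:
        # History analysis: gather this agent's runs and average its token usage.
        runs = [r for r in history if r["agent"] == agent["name"]]
        avg_tokens = sum(r["tokens"] for r in runs) / len(runs) if runs else 0
        # Classification: a crude output-length heuristic for the sketch.
        tier = "simple" if avg_tokens < 500 else "complex"
        # Recommendation: flag premium models running simple tasks.
        if tier == "simple" and agent["model"] == "opus":
            advice = "consider a cheaper model"
        else:
            advice = "keep current model"
        report_lines.append(f"- {agent['name']}: avg {avg_tokens:.0f} tokens, {tier} tier, {advice}")
    return "\n".join(report_lines)
```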
Agent Audit Configuration Guide
To run a full audit of your setup, use the following command:
python3 {baseDir}/scripts/audit.py
For a specific output format or a dry run, the following options are available:
# Generate a quick summary only
python3 {baseDir}/scripts/audit.py --format summary
# Preview what would be analyzed without generating a report
python3 {baseDir}/scripts/audit.py --dry-run
# Save the report to a specific file path
python3 {baseDir}/scripts/audit.py --output /path/to/report.md
Agent Audit Data Schema & Classification Taxonomy
The skill groups data into complexity tiers to determine model recommendations. The classification scheme is as follows:
| Tier | Recommended Models | Criteria |
|---|---|---|
| Simple | Haiku, GPT-4o-mini, Flash | Short output (<500 tokens), repetitive patterns, health checks |
| Medium | Sonnet, GPT-4o, Pro, Grok | Medium output, reasoning required, research tasks |
| Complex | Opus, GPT-4.5, Ultra, Grok-2 | Long output, multi-step reasoning, coding, security audits |
All Openclaw Skills data is processed locally to produce a Markdown report containing a per-agent breakdown, cron job frequencies, and a monthly spend estimate.
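The monthly spend estimate is straightforward arithmetic over average token usage and per-million-token prices. A minimal sketch, assuming the skill computes something along these lines (the function name and parameters are hypothetical):

```python
def monthly_cost_usd(runs_per_day, avg_in_tokens, avg_out_tokens,
                     in_price_per_m, out_price_per_m, days=30):
    """Estimate monthly spend for one recurring job.
    Prices are USD per million tokens; all parameter names are illustrative."""
    runs = runs_per_day * days
    return runs * (avg_in_tokens * in_price_per_m +
                   avg_out_tokens * out_price_per_m) / 1_000_000

# Example: a daily Opus job ($15/M input, $75/M output) averaging
# 2,000 input and 300 output tokens per run.
cost = monthly_cost_usd(1, 2000, 300, 15, 75)  # 1.575 -> about $1.58/month
```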
name: agent-audit
description: >
Audit your AI agent setup for performance, cost, and ROI. Scans OpenClaw config, cron jobs,
session history, and model usage to find waste and recommend optimizations.
Works with any model provider (Anthropic, OpenAI, Google, xAI, etc.).
Use when: (1) user says "audit my agents", "optimize my costs", "am I overspending on AI",
"check my model usage", "agent audit", "cost optimization", (2) user wants to know which
cron jobs are expensive vs cheap, (3) user wants model-task fit recommendations,
(4) user wants ROI analysis of their agent setup, (5) user says "where am I wasting tokens".
Agent Audit
Scan your entire OpenClaw setup and get actionable cost/performance recommendations.
What This Skill Does
- Scans config — reads OpenClaw config to map models to agents/tasks
- Analyzes cron history — checks every cron job's model, token usage, runtime, success rate
- Classifies tasks — determines complexity level of each task
- Calculates costs — per agent, per cron, per task type using provider pricing
- Recommends changes — with confidence levels and risk warnings
- Generates report — markdown report with specific savings estimates
Running the Audit
python3 {baseDir}/scripts/audit.py
Options:
python3 {baseDir}/scripts/audit.py --format markdown # Full report (default)
python3 {baseDir}/scripts/audit.py --format summary # Quick summary only
python3 {baseDir}/scripts/audit.py --dry-run # Show what would be analyzed
python3 {baseDir}/scripts/audit.py --output /path/to/report.md # Save to file
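The flags above map naturally onto a standard-library CLI parser. A sketch of what the option handling could look like, assuming `argparse` (this is not the skill's actual source):

```python
import argparse

def build_parser():
    """Illustrative parser mirroring the documented options; not audit.py's real code."""
    p = argparse.ArgumentParser(prog="audit.py",
                                description="Audit agent setup for cost and performance.")
    p.add_argument("--format", choices=["markdown", "summary"], default="markdown",
                   help="full markdown report (default) or quick summary")
    p.add_argument("--dry-run", action="store_true",
                   help="show what would be analyzed without generating a report")
    p.add_argument("--output", default=None,
                   help="save the report to a specific file path")
    return p
```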
How It Works
Phase 1: Discovery
- Read OpenClaw config (~/.openclaw/openclaw.json or similar)
- List all cron jobs and their configurations
- List all agents and their default models
- Detect provider (Anthropic, OpenAI, Google, xAI) from model names
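Provider detection from model names can be done with simple string heuristics: explicit `provider/model` prefixes first, then well-known model-name stems. A sketch under that assumption (the prefix table is illustrative, not exhaustive):

```python
def detect_provider(model_name):
    """Guess the provider from a model identifier string (heuristics are illustrative)."""
    name = model_name.lower()
    # Explicit "provider/model" form, e.g. "anthropic/claude-opus-4".
    if "/" in name:
        return name.split("/", 1)[0]
    # Fall back to well-known model-name stems.
    prefixes = {
        "claude": "anthropic",
        "gpt": "openai",
        "gemini": "google",
        "grok": "xai",
    }
    for prefix, provider in prefixes.items():
        if name.startswith(prefix):
            return provider
    return "unknown"
```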
Phase 2: History Analysis
- Pull cron job run history (last 7 days by default)
- Calculate per-job: avg tokens, avg runtime, success rate, model used
- Pull session history where available
- Calculate total token spend by model tier
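The per-job metrics in Phase 2 are plain aggregations over run records. A minimal sketch, assuming each run record carries `tokens`, `runtime_s`, and an `ok` flag (field names are assumptions, not the skill's real schema):

```python
def job_stats(runs):
    """Aggregate per-job metrics from raw run records (field names are assumed)."""
    n = len(runs)
    if n == 0:
        return {"avg_tokens": 0.0, "avg_runtime_s": 0.0, "success_rate": 0.0}
    return {
        "avg_tokens": sum(r["tokens"] for r in runs) / n,
        "avg_runtime_s": sum(r["runtime_s"] for r in runs) / n,
        "success_rate": sum(1 for r in runs if r["ok"]) / n,
    }
```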
Phase 3: Task Classification
Classify each task into complexity tiers:
| Tier | Examples | Recommended Models |
|---|---|---|
| Simple | Health checks, status reports, reminders, notifications | Cheapest tier (Haiku, GPT-4o-mini, Flash, Grok-mini) |
| Medium | Content drafts, research, summarization, data analysis | Mid tier (Sonnet, GPT-4o, Pro, Grok) |
| Complex | Coding, architecture, security review, nuanced writing | Top tier (Opus, GPT-4.5, Ultra, Grok-2) |
Classification signals:
- Simple: Short output (<500 tokens), low thinking requirement, repetitive pattern, status/health tasks
- Medium: Medium output, some reasoning needed, creative but templated, research tasks
- Complex: Long output, multi-step reasoning, code generation, security-critical, tasks that previously failed on weaker models
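The signals above can be combined into a tiering function. A sketch only: the exact heuristics live in references/task-classification.md, and the boolean flags and 500-token cutoff here are illustrative.

```python
def classify_task(avg_output_tokens, needs_multistep_reasoning=False,
                  is_code_or_security=False, failed_on_weak_model=False):
    """Map classification signals to a complexity tier (thresholds are illustrative)."""
    # Hard signals force the complex tier regardless of output length.
    if is_code_or_security or needs_multistep_reasoning or failed_on_weak_model:
        return "complex"
    # Short, routine output is the simple tier; everything else is medium.
    if avg_output_tokens < 500:
        return "simple"
    return "medium"
```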
Phase 4: Recommendations
For each task where the model tier doesn't match complexity:
RECOMMENDATION: Downgrade "Knox Bot Health Check" from opus to haiku
Current: anthropic/claude-opus-4 ($15/M input, $75/M output)
Suggested: anthropic/claude-haiku ($0.25/M input, $1.25/M output)
Reason: Simple status check averaging 300 output tokens
Estimated savings: $X.XX/month
Risk: LOW — task is simple pattern matching
Confidence: HIGH
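The savings estimate in a recommendation like the one above is the cost difference between the current and suggested tiers over a month of runs. A hedged sketch, using the Opus and Haiku prices quoted in the example (function and parameter names are hypothetical):

```python
def downgrade_savings(avg_in, avg_out, runs_per_month,
                      current=(15.0, 75.0), suggested=(0.25, 1.25)):
    """Estimated monthly USD savings from a model downgrade.
    Price tuples are (input, output) per million tokens; defaults match the
    Opus -> Haiku example above. All names are illustrative."""
    def cost(prices):
        return runs_per_month * (avg_in * prices[0] + avg_out * prices[1]) / 1_000_000
    return cost(current) - cost(suggested)

# A daily health check (30 runs/month, 1,000 in / 300 out tokens):
saved = downgrade_savings(1000, 300, 30)  # 1.125 - 0.01875 = 1.10625 -> ~$1.11/month
```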
Safety Rules — NEVER Recommend Downgrading:
- Coding/development tasks
- Security reviews or audits
- Tasks that have previously failed on weaker models
- Tasks where the user explicitly chose a higher model
- Complex multi-step reasoning tasks
- Anything the user flagged as critical
Phase 5: Report Generation
Output a clean markdown report with:
- Overview — total agents, crons, monthly spend estimate
- Per-agent breakdown — model, usage, cost
- Per-cron breakdown — model, frequency, avg tokens, cost
- Recommendations — sorted by savings potential
- Total potential savings — monthly estimate
- One-liner config changes — exact model strings to swap
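Assembling those sections into markdown is mechanical string building. A minimal sketch of the report structure, with recommendations sorted by savings as described (the data shapes are assumptions):

```python
def render_report(overview, recommendations):
    """Assemble a markdown report; section layout mirrors the list above.
    `overview` is a dict of summary stats, `recommendations` a list of
    {"task": str, "savings": float} dicts (shapes are illustrative)."""
    lines = ["# Agent Audit Report", "", "## Overview"]
    lines += [f"- {key}: {value}" for key, value in overview.items()]
    lines += ["", "## Recommendations (by savings potential)"]
    for rec in sorted(recommendations, key=lambda r: r["savings"], reverse=True):
        lines.append(f"- {rec['task']}: save ${rec['savings']:.2f}/month")
    return "\n".join(lines)
```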
Model Pricing Reference
See references/model-pricing.md for current pricing across all providers. Update this file when prices change.
Task Classification Details
See references/task-classification.md for detailed heuristics on how tasks are classified into complexity tiers.
Important Notes
- This skill is read-only — it never changes your config automatically
- All recommendations include risk levels and confidence scores
- When unsure about a task's complexity, it defaults to keeping the current model
- The audit should be re-run periodically (monthly) as usage patterns change
- Token counts are estimates based on cron history — actual costs depend on your provider's billing